Fundamentals of AI & ML Flashcards

Question

An ecommerce company wants to build an AI-driven recommendation engine that suggests products to customers based on their past browsing behavior and purchase history. The company needs a solution that can deliver personalized recommendations in real time.

**Which AWS service meets these requirements?**

1. Amazon Rekognition
2. Amazon Personalize
3. Amazon Polly
4. Amazon Transcribe

Answer 1

**2.** Amazon Personalize

## Footnote

Amazon Personalize is designed for building real-time recommendation systems based on user interactions, such as browsing history and past purchases. It personalizes customer experiences by providing relevant recommendations that can boost engagement and sales.

* Amazon Rekognition is incorrect because it is used for image and video analysis, not for generating product recommendations.
* Amazon Polly is incorrect because it converts text to speech and does not generate personalized recommendations.
* Amazon Transcribe is incorrect because it converts speech to text and does not deal with customer behavior or recommendation systems.

**Reference:** [Amazon Personalize](https://aws.amazon.com/personalize/)

Answer 2

**2.** Amazon Textract

**3.** Amazon Polly

## Footnote

Amazon Textract extracts text and data from scanned documents, making it ideal for processing physical forms. Amazon Polly converts text into lifelike speech, which helps the company make the extracted content accessible to users with visual impairments.

* Amazon Comprehend is incorrect because it is used for natural language processing tasks like sentiment analysis and entity recognition, not for extracting text from scanned documents or converting text to speech.
* Amazon Lex is incorrect because it is used for building conversational chatbots, not for extracting or converting text.
* Amazon Rekognition is incorrect because it is used for image and video analysis, not for text extraction or speech synthesis.

**References:**

* [Amazon Textract](https://aws.amazon.com/textract/)
* [Amazon Polly - AI Voice Generator](https://aws.amazon.com/polly/)

Answer 3

**4.** Implement Amazon Fraud Detector to create, train, and deploy machine learning models that automatically identify and flag fraudulent transactions in real time.

## Footnote

Amazon Fraud Detector is specifically designed for detecting and preventing fraud in real-time transactions by leveraging machine learning models. It simplifies the process by offering pre-built fraud detection templates, allowing the company to deploy fraud prevention mechanisms quickly without the need to create custom models from scratch.

* "Use Amazon SageMaker to build custom machine learning models and deploy them for fraud detection" is incorrect because SageMaker requires significant effort to build custom models from scratch, whereas Amazon Fraud Detector is purpose-built for fraud detection, making it a more efficient solution for this specific need.
* "Use AWS Glue to preprocess transaction data and send it to a custom fraud detection model" is incorrect because AWS Glue is an ETL service for extracting, transforming, and loading data, not for real-time fraud detection. On its own, it would not help with creating or deploying machine learning models for fraud detection.
* "Integrate Amazon Comprehend to analyze customer transaction data for sentiment and flag potential fraud" is incorrect because Amazon Comprehend is a natural language processing service designed for sentiment analysis and entity recognition, not for analyzing transactional data or detecting fraudulent activities.

**Reference:** [Amazon Fraud Detector](https://aws.amazon.com/fraud-detector/)

Answer 4

**4.** Refine the prompt to instruct the model to provide clear and concise responses.

## Footnote

By refining the prompt, the company can guide the foundation model to produce responses that are tailored to specific requirements, such as being concise and factual. Clear prompt instructions help the model stay focused on providing the desired output.

* "Decrease the temperature to reduce randomness in the responses" is incorrect because lowering the temperature controls randomness in response generation but does not directly affect the length or level of detail in the answers.
* "Add detailed examples to the prompt to make responses longer" is incorrect because the goal is to make responses more concise, not longer. Adding detailed examples would lead to lengthier outputs.
* "Increase the token limit to allow for more detailed answers" is incorrect because increasing the token limit would allow the model to generate longer responses, which is contrary to the goal of keeping answers concise.

**Reference:** [Prompt Engineering Concepts](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-engineering-guidelines.html)
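To illustrate why lowering the temperature changes randomness rather than response length, here is a minimal sketch. It assumes the common softmax-with-temperature formulation for token sampling; the logits and the function name `softmax_with_temperature` are illustrative and are not taken from any specific foundation model or from the Amazon Bedrock API.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, temperature=0.2)   # sharper, less random
high = softmax_with_temperature(logits, temperature=2.0)  # flatter, more random

print("low temperature: ", [round(p, 3) for p in low])
print("high temperature:", [round(p, 3) for p in high])
```

In both cases the distribution covers the same three tokens. Only how strongly the top token is favored changes, which is why temperature alone does not make answers shorter.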

Answer 5

**3.** Unsupervised learning

## Footnote

Unsupervised learning is the appropriate approach for analyzing unstructured or unlabeled data. In this case, it can help the organization identify patterns and group similar patient records based on shared characteristics without the need for labeled data.

* Data augmentation is incorrect because it involves creating new, modified versions of the data to expand the dataset, not grouping unstructured data.
* Supervised learning is incorrect because it requires labeled data, which the organization does not have.
* Feature engineering is incorrect because it involves creating new features from existing data, not grouping or clustering unstructured data.

**Reference:** [What’s the Difference Between Supervised and Unsupervised Learning?](https://aws.amazon.com/compare/the-difference-between-machine-learning-supervised-and-unsupervised/)
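As a rough illustration of grouping records without labels, the sketch below runs a small k-means clustering in plain Python. The records (age, visits per year), the choice of k-means, and the deterministic initialization are all assumptions made for the example; the card does not prescribe a particular clustering algorithm.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    distances = [squared_distance(point, c) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, k, iterations=10):
    # Deterministic initialization: use the first k points as centroids.
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return [nearest(p, centroids) for p in points]


# Hypothetical unlabeled records: (age, visits per year).
records = [(25, 1), (70, 12), (27, 2), (68, 10), (30, 1), (72, 11)]
labels = kmeans(records, k=2)
print(labels)
```

No record carries a label going in; the cluster assignments come only from how close the records are to each other.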

Answer 6

**2.** Feature engineering

## Footnote

Feature engineering involves selecting, modifying, or creating new variables (features) from the dataset that are most relevant to improving the model’s performance. This stage helps the company enhance the model’s ability to make accurate predictions while reducing complexity.

* Model deployment is incorrect because it refers to the process of making the trained model available in production for real-world use, not selecting features.
* Data collection is incorrect because it involves gathering raw data, and the company is already past this stage, working on refining the data.
* Model evaluation is incorrect because it occurs after training to assess the model’s performance, not while selecting relevant features for training.

**Reference:** [Feature Engineering](https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/feature-engineering.html)
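A minimal sketch of what creating new features from existing data can look like. The raw order record, its field names, and the derived features are hypothetical and chosen only to illustrate the idea.

```python
from datetime import datetime


def engineer_features(raw_order):
    """Derive model-ready features from a raw order record."""
    placed_at = datetime.fromisoformat(raw_order["placed_at"])
    return {
        # Combine two raw columns into one more informative value.
        "order_total": raw_order["unit_price"] * raw_order["quantity"],
        # Extract parts of the timestamp that may carry signal.
        "hour_of_day": placed_at.hour,
        "is_weekend": placed_at.weekday() >= 5,  # 5 = Saturday, 6 = Sunday
    }


# Hypothetical raw record.
raw_order = {"placed_at": "2024-03-09T14:30:00", "unit_price": 20.0, "quantity": 3}
features = engineer_features(raw_order)
print(features)
```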

Answer 7

**3.** Reinforcement learning based on customer engagement and purchase feedback

## Footnote

Reinforcement learning helps improve the recommendation engine by using customer interactions (such as purchases and clicks) as feedback to optimize future recommendations. The engine is continuously learning from this feedback to enhance its performance over time.

* Supervised learning with a static dataset of customer interactions is incorrect because static datasets don’t allow the model to continuously learn from new interactions, limiting its ability to improve over time.
* Transfer learning with periodic updates from pre-trained models is incorrect because transfer learning involves applying a pre-trained model to a different domain and does not inherently support continuous learning from new data.
* Unsupervised learning to categorize products without customer data is incorrect because unsupervised learning focuses on finding patterns in data, but without customer interaction, it wouldn’t improve recommendations over time.

**Reference:** [What is Reinforcement Learning?](https://aws.amazon.com/what-is/reinforcement-learning/)
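The feedback loop described above can be sketched with an epsilon-greedy multi-armed bandit, one of the simplest reinforcement learning setups. This is an assumption chosen for illustration, not the method a production recommendation engine necessarily uses; the click rates, the function name `run_bandit`, and the simulated customers are all hypothetical.

```python
import random


def run_bandit(true_click_rates, steps=5000, epsilon=0.1, seed=0):
    """Learn which product to recommend from simulated click feedback."""
    rng = random.Random(seed)
    n = len(true_click_rates)
    pulls = [0] * n
    estimates = [0.0] * n  # running estimate of each product's click rate
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: try a random product
        else:
            arm = estimates.index(max(estimates))  # exploit: best so far
        # Simulated customer feedback: 1.0 for a click, 0.0 otherwise.
        reward = 1.0 if rng.random() < true_click_rates[arm] else 0.0
        pulls[arm] += 1
        # Incremental mean update of the estimate for the chosen product.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return estimates


# Hypothetical true click rates, unknown to the learner.
estimates = run_bandit([0.05, 0.10, 0.50])
best_product = estimates.index(max(estimates))
print("learned estimates:", [round(e, 2) for e in estimates])
print("recommended product index:", best_product)
```

Each recommendation produces feedback that updates the estimates, so later recommendations improve without any retraining on a static dataset.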

Answer 8

**1.** Batch inference

## Footnote

Batch inference is ideal for scenarios where predictions are generated for large amounts of data at specific intervals rather than in real time. It allows the company to process patient records in batches, making it a cost-effective solution for handling large datasets that do not require immediate predictions.

* Serverless inference is incorrect because it focuses on scaling automatically without managing servers but does not specifically address batch processing.
* Real-time inference is incorrect because it is designed for scenarios where immediate predictions are needed for individual records or requests, which is not the case here.
* Edge inference is incorrect because it is used for running predictions on devices located at the edge of the network, closer to where the data is generated, and is not suited for batch processing.

**Reference:** [Deploy models for inference](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html)

Answer 9

**2.** Unlabeled, text data

## Footnote

The company is working with unlabeled, text data because the customer feedback consists of written text that hasn’t been categorized or labeled as positive or negative.

* Labeled, tabular data is incorrect because the text data is not labeled, and tabular data refers to data organized in rows and columns, which isn’t relevant in this scenario.
* Time-series data is incorrect because time-series data refers to data points indexed in time order, which isn’t applicable to text-based customer feedback.
* Structured, image data is incorrect because the data is text-based, not image-based, and is unstructured.

**Reference:** [Data Labeling with a Human-in-the-Loop](https://docs.aws.amazon.com/sagemaker/latest/dg/data-label.html)

Answer 10

**3.** Time-series data

## Footnote

The company is working with time-series data because it consists of daily sales amounts recorded over time. Time-series data is used to analyze and forecast patterns based on historical trends indexed by time.

* Labeled, text data is incorrect because the data is numerical and indexed by time, not labeled or text-based.
* Unstructured, image data is incorrect because the data is structured and numerical, not image-based or unstructured.
* Unlabeled, tabular data is incorrect because, although the data might be tabular, the key characteristic is that it's indexed over time, making it time-series data.

**Reference:** [Data Labeling with a Human-in-the-Loop](https://docs.aws.amazon.com/sagemaker/latest/dg/data-label.html)
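A small sketch of what "indexed by time" means in practice: each daily sales amount is paired with its date, and a forecast is derived from the most recent values. The sales figures are made up, and the moving-average forecast is just one simple technique picked for illustration.

```python
from datetime import date, timedelta

# Hypothetical daily sales amounts, one per consecutive day.
start = date(2024, 1, 1)
daily_sales = [120.0, 135.0, 128.0, 150.0, 160.0, 155.0, 170.0]
series = [(start + timedelta(days=i), amount) for i, amount in enumerate(daily_sales)]


def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = [amount for _, amount in series[-window:]]
    return sum(recent) / len(recent)


forecast = moving_average_forecast(series, window=3)
next_day = series[-1][0] + timedelta(days=1)
print(f"forecast for {next_day}: {forecast:.2f}")
```

The order of the observations matters here: shuffling the rows would change the forecast, which is the key difference from ordinary tabular data.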

Answer 11

**1.** Model re-training

## Footnote

Model re-training is an MLOps practice where a model is periodically updated with new data to ensure it maintains high performance as data distributions change over time. This is essential for keeping the model production-ready and accurate.

* Feature engineering is incorrect because it involves creating new input variables from raw data, but it doesn’t address the need to monitor and update models.
* Data augmentation is incorrect because it refers to increasing the size of the training dataset by creating variations of the existing data, not monitoring or updating the model.
* Batch inference is incorrect because it refers to making predictions on large datasets, not maintaining or updating models in production.

**Reference:** [MLOps Checklist Components](https://docs.aws.amazon.com/prescriptive-guidance/latest/mlops-checklist/mlops-checklist-components.html)
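One way to decide when re-training is due is to check whether recent input data has drifted away from the training data. The sketch below uses a simple mean-shift heuristic; the threshold, the function name `needs_retraining`, and the sample values are assumptions for illustration, not a check prescribed by the referenced MLOps guidance.

```python
from statistics import mean, stdev


def needs_retraining(training_values, recent_values, threshold=2.0):
    """Flag drift when the recent mean moves more than `threshold`
    training standard deviations away from the training mean."""
    baseline_mean = mean(training_values)
    baseline_std = stdev(training_values)
    shift = abs(mean(recent_values) - baseline_mean) / baseline_std
    return shift > threshold


# Hypothetical values of one input feature.
training = [10, 12, 11, 9, 10, 11, 12, 10]
recent_stable = [10, 11, 12, 10]
recent_drifted = [18, 20, 19, 21]

print(needs_retraining(training, recent_stable))
print(needs_retraining(training, recent_drifted))
```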

Answer 12

**3.** Area Under the ROC Curve (AUC)

## Footnote

AUC measures the ability of a binary classification model to distinguish between positive and negative classes at various threshold settings. It is a commonly used metric to evaluate model performance beyond simple accuracy.

* Accuracy is incorrect because it measures the proportion of correct predictions but does not account for the model's performance across different thresholds.
* F1 score is incorrect because it is the harmonic mean of precision and recall, focusing on balancing false positives and false negatives, but it doesn’t evaluate performance at different thresholds.
* Precision is incorrect because it measures the proportion of true positives among predicted positives but does not assess the model's overall performance across thresholds.

**Reference:** [Types of ML Models](https://docs.aws.amazon.com/machine-learning/latest/dg/types-of-ml-models.html)
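AUC can also be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python; the labels and scores are made up, and the pairwise approach is chosen for clarity rather than efficiency.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Ties count as half a correct ranking."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical true labels and model scores.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Because the calculation only compares scores against each other, it needs no classification threshold, unlike accuracy, precision, or F1.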

Answer 13

**3.** AWS PrivateLink

## Footnote

AWS PrivateLink allows secure access to AWS services like Amazon S3 over a private network, ensuring data doesn’t traverse the public internet.

* AWS Shield is incorrect because it provides DDoS protection, not private access to AWS services.
* Amazon GuardDuty is incorrect because it is a threat detection service and does not secure network traffic.
* AWS WAF is incorrect because it is a web application firewall and does not provide private connections to AWS services.

**Reference:** [AWS PrivateLink](https://aws.amazon.com/privatelink/)

Answer 14

**4.** Amazon EC2 Trn series instances for energy-efficient model training

## Footnote

Amazon EC2 Trn series instances are designed for high-performance machine learning training with a focus on energy efficiency, making them ideal for reducing the environmental impact of model training.

* Amazon EC2 C series instances for cost-efficiency is incorrect because C series instances are optimized for compute, not energy efficiency, which is the primary concern here.
* Amazon EC2 G series instances for graphics processing is incorrect because G series instances are optimized for graphics-intensive workloads, not for energy-efficient AI training.
* Amazon EC2 P series instances for large-scale ML training is incorrect because P series instances are used for large-scale ML training but do not prioritize energy efficiency for sustainability.

**Reference:** [AWS Sustainability](https://aws.amazon.com/sustainability/)

Answer 15

**3.** Reinforcement learning from human feedback (RLHF)

## Footnote

Reinforcement learning from human feedback (RLHF) allows the model to learn by incorporating human feedback into its reward system. This technique is ideal for aligning the AI model’s behavior with human preferences, as it uses evaluators to guide the model’s decisions.

* Supervised learning is incorrect because it requires labeled data but does not involve feedback from human evaluators during the training process.
* Transfer learning is incorrect because it involves reusing a pre-trained model on a different task, but it doesn’t involve human feedback during training.
* Unsupervised learning is incorrect because it identifies patterns in data without any labeled examples or human feedback.

**Reference:** [What is RLHF?](https://aws.amazon.com/what-is/reinforcement-learning-from-human-feedback/)

Answer 16

**2.** Amazon Neptune

## Footnote

Amazon Neptune is a graph database service optimized for storing and querying relationships between entities. It is ideal for use cases that involve complex relationships, such as recommendation systems, social networks, or knowledge graphs.

* Amazon RDS is incorrect because it is a relational database service and is not designed for efficiently handling complex relationships and graph data.
* Amazon Redshift is incorrect because it is a data warehousing service optimized for analytics, not for storing and querying graph-based relationships.
* Amazon DynamoDB is incorrect because it is a NoSQL database optimized for key-value and document-based use cases, but it is not ideal for querying graph-based data.

**Reference:** [What Is Amazon Neptune?](https://docs.aws.amazon.com/neptune/latest/userguide/intro.html)
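To give a feel for the kind of relationship query a graph database is built for, the sketch below walks a tiny customer-to-product graph held in plain Python dictionaries. It does not call Amazon Neptune or use any of its query languages; the customers, products, and the function name `also_bought` are hypothetical.

```python
from collections import Counter

# Hypothetical graph: each customer node links to the products they bought.
purchases = {
    "alice": {"laptop", "mouse"},
    "bob": {"laptop", "keyboard"},
    "carol": {"mouse", "monitor"},
}


def also_bought(product):
    """Two-hop traversal: product -> customers who bought it -> their other products."""
    counts = Counter()
    for items in purchases.values():
        if product in items:
            counts.update(items - {product})
    return counts


related = also_bought("laptop")
print(dict(related))
```

In a relational database the same question typically requires self-joins on a purchases table, which is part of why graph stores are favored when relationships are the main thing being queried.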

Understand fundamental AI and machine learning concepts, including learning types, model training, and inference. (40 cards)