Fundamentals of AI & ML Flashcards

Understand fundamental AI and machine learning concepts, including learning types, model training, and inference. (40 cards)

1
Q

A business is developing a machine learning model to analyze large archived datasets. These datasets are several gigabytes in size, and the company does not require immediate access to the predictions.

What Amazon SageMaker inference option should the company choose?

  1. Batch transform
  2. Real-time inference
  3. Serverless inference
  4. Asynchronous inference
A

1. Batch transform

Batch transform is the ideal solution when you need to perform inference on large datasets without needing real-time results. It allows for processing data in bulk and is optimized for situations where immediate model predictions aren’t required, making it suitable for analyzing multiple gigabytes of archived data.

  • Real-time inference is incorrect because real-time inference is designed for low-latency scenarios where predictions are needed almost immediately after a request is made. This option is not necessary when there is no need for instant access to predictions.
  • Serverless inference is incorrect because serverless inference is best suited for sporadic workloads where the model needs to scale up and down without managing infrastructure. It’s not the best option for processing large, non-urgent datasets in bulk.
  • Asynchronous inference is incorrect because asynchronous inference queues individual requests that have large payloads or long processing times and returns each result when it is ready. It targets single prediction requests rather than scoring an entire archived dataset in one bulk job, which is what batch transform does.

Reference:
Deploy models for inference

2
Q

A company is developing a machine learning model using Amazon SageMaker and needs a solution to store and share feature sets across different teams for collaborative model building.

Which Amazon SageMaker feature should the company use?

  1. Amazon SageMaker Feature Store
  2. Amazon SageMaker Data Wrangler
  3. Amazon SageMaker Clarify
  4. Amazon SageMaker Model Registry
A

1. Amazon SageMaker Feature Store

SageMaker Feature Store is designed to allow teams to store, manage, and share features (attributes or variables) in a central repository. This ensures consistency across models and helps teams collaborate by reusing the same features across multiple projects.

  • Amazon SageMaker Data Wrangler is incorrect because Data Wrangler is used for data transformation and preparation, helping users to clean and structure data before training models. It does not provide a mechanism to store and share features across teams.
  • Amazon SageMaker Clarify is incorrect because Clarify is focused on detecting bias in machine learning models and ensuring explainability. It is not used for storing or sharing features between teams.
  • Amazon SageMaker Model Registry is incorrect because the Model Registry is designed for managing and versioning machine learning models, not for storing or sharing feature sets used during model development.

Reference:
Create, store, and share features with Feature Store

3
Q

A business uses Amazon SageMaker to run its machine learning pipeline in a production environment. The company processes large inputs, sometimes reaching 1 GB in size, with processing times that can take up to an hour. To support its operations, the company requires near real-time latency.

Which Amazon SageMaker inference option should the company choose?

  1. Real-time inference
  2. Serverless inference
  3. Asynchronous inference
  4. Batch transform
A

3. Asynchronous inference

Asynchronous inference queues incoming requests and processes them in the background. It supports payloads up to 1 GB and processing times up to one hour while still returning results in near real time, which matches the company’s input sizes and processing times.

  • Real-time inference is incorrect because real-time endpoints are built for small payloads and short, synchronous requests: an invocation must complete within 60 seconds. They cannot accept 1 GB inputs or run for up to an hour.
  • Serverless inference is incorrect because serverless inference is optimized for intermittent workloads with small payloads and short processing times. It is cost-effective for occasional requests, but it cannot handle inputs of this size or processing of this length.
  • Batch transform is incorrect because batch transform processes entire datasets offline, without a persistent endpoint. It is better suited for use cases where predictions are produced in bulk and are not needed shortly after each input arrives.

Reference:
Deploy models for inference

4
Q

A company is utilizing machine learning models for specialized tasks in a specific domain. To save time and resources, the company prefers to modify existing pre-trained models instead of building new ones from scratch.

Which machine learning approach should the company use?

  1. Increase the number of training iterations.
  2. Apply transfer learning.
  3. Reduce the number of training iterations.
  4. Implement unsupervised learning.
A

2. Apply transfer learning.

Transfer learning allows a company to leverage pre-trained models and adapt them to new, related tasks. Instead of training a model from scratch, the pre-trained model’s knowledge is fine-tuned for the new task, significantly reducing the training time and required data.

  • Increase the number of training iterations is incorrect because increasing the number of epochs (iterations) only affects the training of the model, but it doesn’t allow you to reuse knowledge from pre-trained models. This would require starting with a model from scratch, which the company wants to avoid.
  • Reduce the number of training iterations is incorrect because reducing the number of epochs does not address the need for reusing pre-trained models. While this may shorten training time, it doesn’t leverage pre-existing model knowledge, which is the main objective.
  • Implement unsupervised learning is incorrect because unsupervised learning involves training a model without labeled data. The company’s goal is to adapt pre-trained models for new tasks, which typically means fine-tuning them on task-specific data rather than discovering structure in unlabeled data.

Reference:
Maximize business outcomes with machine learning on AWS

5
Q

A company has developed a chatbot that responds to user queries with images. The company needs to ensure that the chatbot avoids displaying inappropriate or offensive images.

Which approach should the company take to achieve this?

  1. Use content moderation tools to filter image responses.
  2. Retrain the model using larger, publicly available datasets.
  3. Implement regular performance checks on the chatbot.
  4. Enable automatic updates from user feedback.
A

1. Use content moderation tools to filter image responses.

Content moderation tools can scan and block images that are inappropriate, ensuring that only safe and relevant images are returned by the chatbot.

  • Retrain the model using larger, publicly available datasets is incorrect because retraining with a public dataset will not directly address the issue of inappropriate images unless the dataset itself is highly curated.
  • Implement regular performance checks on the chatbot is incorrect because performance checks assess the chatbot’s overall functioning but won’t specifically prevent inappropriate images from being displayed.
  • Enable automatic updates from user feedback is incorrect because relying on user feedback is reactive. The company needs a proactive solution like moderation to prevent inappropriate content from being displayed in the first place.

Reference:
Amazon Rekognition Content Moderation

6
Q

A company is developing a machine learning model. The company has gathered new data and is analyzing it by generating correlation matrices, calculating statistics, and visualizing patterns in the dataset.

What stage of the machine learning pipeline is the company in?

  1. Data cleansing
  2. Feature extraction
  3. Exploratory data analysis
  4. Model evaluation
A

3. Exploratory data analysis

The company is focused on understanding the data by visualizing relationships and calculating statistical measures. These activities are key components of exploratory data analysis (EDA), which helps identify patterns and guide further steps in the pipeline.

  • Data cleansing is incorrect because data cleansing focuses on correcting or removing inaccurate records from the dataset. The scenario describes analysis and visualization, not the cleaning of data.
  • Feature extraction is incorrect because feature extraction refers to creating or selecting specific attributes (features) to improve the model’s performance. The company is still in the data exploration phase and hasn’t started engineering or extracting features yet.
  • Model evaluation is incorrect because model evaluation is the process of assessing the performance of a trained model. The company has not yet built or trained a model, so evaluation is not relevant at this stage.
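
The EDA activities named in this card (summary statistics and a correlation matrix) can be sketched in a few lines of plain Python. The dataset and column names below are hypothetical, chosen only for illustration; in practice a library such as pandas would usually do this work.

```python
from statistics import mean, stdev

# Hypothetical dataset (column name -> values); not taken from the card.
data = {
    "price": [10.0, 12.0, 14.0, 16.0, 18.0],
    "units_sold": [50.0, 45.0, 40.0, 35.0, 30.0],
    "ad_spend": [1.0, 3.0, 2.0, 5.0, 4.0],
}


def summarize(values):
    """Basic descriptive statistics for one column."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def correlation_matrix(columns):
    """Pairwise correlations as a nested dict: matrix[a][b]."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


matrix = correlation_matrix(data)
for name, values in data.items():
    print(name, summarize(values))
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

In this made-up data, price and units_sold are perfectly negatively correlated, which is the kind of pattern EDA is meant to surface before any feature engineering or training begins.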

Reference:
Maximize business outcomes with machine learning on AWS

7
Q

A company is using Amazon SageMaker Studio notebooks to build and train machine learning models. The data is stored in an Amazon S3 bucket, and the company needs to manage the data flow between Amazon S3 and SageMaker Studio notebooks.

Which solution will meet this requirement?

  1. Use Amazon Inspector to monitor SageMaker Studio.
  2. Use Amazon Macie to track data flow in SageMaker Studio.
  3. Configure SageMaker to use a VPC with an S3 VPC endpoint.
  4. Configure SageMaker to use S3 Glacier Deep Archive for data access.
A

3. Configure SageMaker to use a VPC with an S3 VPC endpoint.

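Setting up a VPC with an S3 VPC endpoint lets SageMaker Studio notebooks reach data in Amazon S3 over the AWS network instead of the public internet. Because the traffic passes through the endpoint, the company can also attach an endpoint policy to control which buckets the notebooks are allowed to access.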

  • Use Amazon Inspector to monitor SageMaker Studio is incorrect because Amazon Inspector is a security assessment service, not a service for managing data flow between S3 and SageMaker.
  • Use Amazon Macie to track data flow in SageMaker Studio is incorrect because Amazon Macie is used for data security and privacy, specifically for identifying sensitive data in S3. It doesn’t manage the flow of data between SageMaker and S3.
  • Configure SageMaker to use S3 Glacier Deep Archive for data access is incorrect because S3 Glacier Deep Archive is used for long-term, low-cost storage of infrequently accessed data. It’s not suitable for active data access and flow management between SageMaker and S3.

Reference:
Amazon SageMaker Documentation

8
Q

A logistics company has thousands of warehouse images and wants to automatically identify and classify different types of items stored in the images without manual effort.

Which strategy will help the company achieve this?

  1. Anomaly detection
  2. Object detection
  3. Named entity recognition
  4. Semantic segmentation
A

2. Object detection

Object detection is a computer vision technique used to identify and classify multiple objects within an image. In this case, it can automatically identify and categorize different items stored in the warehouse.

  • Anomaly detection is incorrect because anomaly detection is used to identify unusual patterns or outliers in data, not to recognize or classify objects in images.
  • Named entity recognition is incorrect because named entity recognition (NER) is a natural language processing technique used to identify entities in text, not for identifying objects in images.
  • Semantic segmentation is incorrect because while semantic segmentation labels each pixel in an image to classify different parts of the image, it is more detailed than what is needed for simply identifying and categorizing items.

Reference:
How Object Detection Works

9
Q

A healthtech startup has created a machine learning model that analyzes X-ray images to detect potential signs of illness. The company wants to deploy the model to production so that doctors can upload X-rays via a web application and receive predictions in real-time. The company prefers a solution that does not require managing underlying infrastructure.

Which solution should the company use?

  1. Use Amazon SageMaker Serverless Inference to deploy the model.
  2. Use Amazon CloudFront to serve the model for real-time predictions.
  3. Use Amazon API Gateway to deploy the model and serve predictions.
  4. Use AWS Batch to deploy the model for processing X-ray images.
A

1. Use Amazon SageMaker Serverless Inference to deploy the model.

SageMaker Serverless Inference provides a fully managed, serverless environment for hosting and serving machine learning models. It allows the company to focus on deploying the model without managing any underlying infrastructure, which meets the company’s needs.

  • Use Amazon CloudFront to serve the model for real-time predictions is incorrect because CloudFront is a content delivery network (CDN) used to deliver data with low latency but is not designed for hosting and serving machine learning models.
  • Use Amazon API Gateway to deploy the model and serve predictions is incorrect because API Gateway is used to expose APIs, but it doesn’t host or serve machine learning models. It can be used alongside SageMaker but isn’t sufficient on its own for model hosting.
  • Use AWS Batch to deploy the model for processing X-ray images is incorrect because AWS Batch is designed for batch processing large jobs, not for real-time inference or serving predictions for a web application.

Reference:
Deploy models with Amazon SageMaker Serverless Inference

10
Q

A retail company has collected terabytes of customer purchase data but the data is not labeled. The company wants to segment its customers into groups for a targeted marketing campaign based on their purchasing patterns.

Which machine learning approach should the company use to achieve this?

  1. Data clustering
  2. Unsupervised learning
  3. Semi-supervised learning
  4. Deep reinforcement learning
A

2. Unsupervised learning

Unsupervised learning is ideal for this task because it works with unlabeled data and can identify patterns in the data to group customers based on their purchasing behaviors. This method will allow the company to divide its customers into segments for targeted marketing without defining the segments in advance.

  • Data clustering is incorrect because clustering is a technique used in unsupervised learning but is not a learning methodology itself. The broader approach should be unsupervised learning.
  • Semi-supervised learning is incorrect because semi-supervised learning works with a mix of labeled and unlabeled data. Since the company’s data is entirely unlabeled, this method would not be the best fit.
  • Deep reinforcement learning is incorrect because deep reinforcement learning is focused on learning through interactions in an environment with rewards, not on data classification or segmentation of large, unlabeled datasets.
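
To make the idea concrete, here is a minimal sketch of k-means clustering, one common unsupervised technique, applied to unlabeled customer data. The customer values, the starting centroids, and the function name are all hypothetical; a production workload of this size would use a managed or library implementation instead.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
            )
            clusters[nearest].append(point)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled customers: (orders per month, average basket value).
customers = [(1, 20), (2, 25), (1, 22), (10, 200), (12, 220), (11, 210)]

# Fixed starting centroids keep the example deterministic (an assumption;
# real implementations usually pick starting points randomly).
centroids, segments = kmeans(customers, [(1, 20), (10, 200)])

for centroid, members in zip(centroids, segments):
    print(centroid, members)
```

No labels are supplied at any point: the two segments (occasional small-basket buyers and frequent large-basket buyers) emerge from the data alone.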

Reference:
What’s the Difference Between Supervised and Unsupervised Learning?

11
Q

A healthcare organization is handling a large number of patient records in PDF format. As the volume of records continues to grow, the organization needs an automated system to convert these PDF documents into plain text format for integration into their electronic health record (EHR) system.

Which AWS service meets this requirement?

  1. Amazon Personalize
  2. Amazon Lex
  3. Amazon Textract
  4. Amazon Transcribe
A

3. Amazon Textract

Amazon Textract is the ideal solution because it can automatically extract text, tables, and forms from PDFs and other scanned documents. This allows the healthcare organization to convert patient records into plain text, making it easier to integrate the data into their EHR system.

  • Amazon Personalize is incorrect because Amazon Personalize is used to provide personalized recommendations, not for extracting text from documents.
  • Amazon Lex is incorrect because Amazon Lex is focused on building conversational interfaces and chatbots, not on converting documents into text.
  • Amazon Transcribe is incorrect because Amazon Transcribe is designed for converting speech to text from audio files, not from PDF documents.

Reference:
Amazon Textract

12
Q

A retail company wants to predict customer demand for seasonal products. The company lacks coding experience and knowledge of machine learning algorithms but needs to build a predictive model using internal sales data and external market data.

Which solution will meet these requirements?

  1. Import the data into Amazon SageMaker Studio. Build ML models and predict demand using built-in SageMaker algorithms.
  2. Import the data into Amazon SageMaker Data Wrangler and build a demand forecasting model with SageMaker JumpStart.
  3. Import the data into Amazon SageMaker Canvas. Build ML models and predict demand by selecting values in the data from SageMaker Canvas.
  4. Use Amazon Lex to analyze the data and automatically generate predictions for product demand.
A

3. Import the data into Amazon SageMaker Canvas. Build ML models and predict demand by selecting values in the data from SageMaker Canvas.

SageMaker Canvas allows users with no coding or machine learning experience to create models by simply interacting with the data via a point-and-click interface. This makes it the best choice for the retail company to generate demand forecasts without technical expertise.

  • Import the data into Amazon SageMaker Studio. Build ML models and predict demand using built-in SageMaker algorithms is incorrect because using SageMaker Studio and its algorithms requires coding and machine learning knowledge, which the company does not have.
  • Import the data into Amazon SageMaker Data Wrangler and build a demand forecasting model with SageMaker JumpStart is incorrect because, although SageMaker JumpStart simplifies access to pre-built models, using Data Wrangler and tuning these models still requires some machine learning understanding.
  • Use Amazon Lex to analyze the data and automatically generate predictions for product demand is incorrect because Amazon Lex is designed for building conversational AI, like chatbots, and is not suitable for forecasting product demand.

Reference:
Amazon SageMaker Canvas

13
Q

A food processing company has built an AI model to classify different types of fruits based on images. The company wants to evaluate how many images the model has correctly classified into the right fruit categories.

Which evaluation metric should the company use to measure the model’s performance?

  1. F1 score
  2. Accuracy
  3. Mean Absolute Error (MAE)
  4. Dropout rate
A

2. Accuracy

Accuracy is the appropriate metric because it measures the proportion of images that the model classified correctly out of the total number of images. This is the most straightforward metric for evaluating how well a classification model is performing.

  • F1 score is incorrect because while the F1 score balances precision and recall, it is more commonly used when dealing with imbalanced datasets. In this case, accuracy is a simpler and more direct measure for evaluating overall classification performance.
  • Mean Absolute Error (MAE) is incorrect because MAE is used to measure errors in regression tasks, not classification. It calculates the difference between predicted and actual continuous values, not category labels.
  • Dropout rate is incorrect because dropout rate is a parameter used during the training of neural networks to prevent overfitting. It is not an evaluation metric for measuring model performance.
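
As a small worked example, accuracy is simply the number of correct predictions divided by the total number of predictions. The fruit labels below are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for five fruit images.
actual = ["apple", "banana", "orange", "apple", "banana"]
predicted = ["apple", "banana", "apple", "apple", "orange"]

score = accuracy(actual, predicted)
print(score)  # 3 of 5 images are classified correctly -> 0.6
```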

Reference:
Maximize business outcomes with machine learning on AWS

14
Q

A biotechnology company needs to classify human genes into 20 categories based on various gene characteristics. The company also requires a machine learning algorithm that can clearly document how the inner workings of the model influence its decisions and outputs.

Which machine learning algorithm should the company use?

  1. K-means clustering
  2. Support vector machines (SVM)
  3. Decision trees
  4. Neural networks
A

3. Decision trees

Decision trees are well-suited for this task because they provide transparency in their decision-making process. The structure of a decision tree allows the company to trace how the input characteristics lead to specific classifications, making it easy to document the inner mechanism of the model and its impact on the output.

  • K-means clustering is incorrect because K-means is used for unsupervised learning and doesn’t inherently document how specific features influence its clustering results.
  • Support vector machines (SVM) is incorrect because while SVMs are effective for classification, they are less interpretable and harder to document in terms of how inputs lead to decisions compared to decision trees.
  • Neural networks is incorrect because neural networks, while powerful, are often seen as “black box” models, making it difficult to document and explain the inner workings and how inputs affect outputs.

Reference:
Maximize business outcomes with machine learning on AWS

15
Q

A company is developing an educational app where users solve basic math problems such as: “A bag contains 8 blue balls, 5 red balls, and 2 yellow balls. What is the probability of picking a red ball?” The company needs a solution that minimizes operational overhead.

Which solution will meet these requirements with the least operational complexity?

  1. Use supervised learning to create a classification model for probability prediction.
  2. Use reinforcement learning to teach a model to compute probabilities.
  3. Use a simple algorithm that calculates probability using basic rules and formulas.
  4. Use unsupervised learning to generate a model for probability estimation.
A

3. Use a simple algorithm that calculates probability using basic rules and formulas.

Using a simple algorithm is the best solution because probability problems can be solved using basic math formulas, without the need for complex machine learning models. This approach requires minimal operational overhead and ensures accurate results through straightforward computations.

  • Use supervised learning to create a classification model for probability prediction is incorrect because supervised learning is unnecessary for basic probability calculations, which are deterministic and do not require a model.
  • Use reinforcement learning to teach a model to compute probabilities is incorrect because reinforcement learning is used for decision-making tasks where an agent interacts with an environment, which is excessive for simple probability problems.
  • Use unsupervised learning to generate a model for probability estimation is incorrect because unsupervised learning is meant for discovering patterns in unlabeled data, not for calculating explicit probabilities based on known values.
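
The "simple algorithm" here can be as short as the sketch below, which solves the example from the question: 5 red balls out of 8 + 5 + 2 = 15 in total. The function name is illustrative; exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def probability_of(color, counts):
    """Probability of drawing one item of `color` from a bag described
    by a {color: count} mapping."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the bag is empty")
    return Fraction(counts.get(color, 0), total)


bag = {"blue": 8, "red": 5, "yellow": 2}
p_red = probability_of("red", bag)
print(p_red)  # 5/15 reduces to 1/3
```

There is no training data, model, or endpoint to maintain, which is why this option has the least operational overhead.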

Reference:
Maximize business outcomes with machine learning on AWS

16
Q

An AI researcher has developed a deep learning model to identify different types of textures in images. The researcher now wants to assess how well the model performs in classifying these textures.

Which metric will help the researcher evaluate the model’s performance?

  1. Confusion matrix
  2. Correlation matrix
  3. R2 score
  4. Mean absolute error (MAE)
A

1. Confusion matrix

A confusion matrix is a tool used to evaluate the performance of a classification model by showing the true positives, true negatives, false positives, and false negatives. This helps the AI researcher understand how well the model is classifying the textures in the images and where it may be making mistakes.

  • Correlation matrix is incorrect because a correlation matrix shows relationships between variables in a dataset but does not help evaluate classification performance.
  • R2 score is incorrect because R2 score is used to measure the goodness-of-fit for regression models, not classification models.
  • Mean absolute error (MAE) is incorrect because MAE is used to measure error in regression models, not classification tasks like image classification.
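
For a multi-class problem such as texture classification, the confusion matrix is a table of counts with one row per actual class and one column per predicted class. A minimal sketch, using made-up texture labels:

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes,
    both in the order given by `labels`."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]


# Hypothetical texture classes and predictions.
labels = ["smooth", "rough", "bumpy"]
actual = ["smooth", "smooth", "rough", "rough", "bumpy", "bumpy"]
predicted = ["smooth", "rough", "rough", "rough", "bumpy", "smooth"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>6}: {row}")
```

The diagonal holds the correct predictions; off-diagonal cells show which textures the model confuses with each other, for example one smooth image predicted as rough.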

Reference:
Viewing the confusion matrix for a model

17
Q

A retail company has terabytes of data stored in its database, which can be used for business analysis. The company wants to develop an AI application that can generate SQL queries from simple text inputs provided by employees with minimal technical experience.

Which solution meets these requirements?

  1. Generative pre-trained transformers (GPT)
  2. Convolutional neural network (CNN)
  3. Random forest
  4. Recurrent neural network (RNN)
A

1. Generative pre-trained transformers (GPT)

Generative pre-trained transformers (GPT) are ideal for this task because they are designed for natural language processing (NLP) tasks, such as converting human language into structured queries like SQL. GPT models can understand and interpret employee text inputs and generate the appropriate SQL queries, even for users with minimal technical skills.

  • Convolutional neural network (CNN) is incorrect because CNNs are typically used for image processing tasks, not for generating SQL queries from text inputs.
  • Random forest is incorrect because random forests are used for classification and regression tasks, not for natural language understanding or generating queries from text.
  • Recurrent neural network (RNN) is incorrect because while RNNs can process sequential data such as time series or text, they struggle with long-range context. Transformer-based models such as GPT are pre-trained on large text corpora and are better suited to NLP tasks such as converting text to SQL.

Reference:
What is Natural Language Processing (NLP)?

18
Q

A healthcare company is developing an application that needs to generate synthetic medical data based on patterns observed in existing patient datasets.

Which type of model should the company use to meet this requirement?

  1. Generative adversarial network (GAN)
  2. Support vector machine (SVM)
  3. Convolutional neural network (CNN)
  4. Decision tree
A

1. Generative adversarial network (GAN)

A generative adversarial network (GAN) is ideal for generating synthetic data that mimics real data by learning the patterns from existing datasets. GANs are commonly used for creating synthetic images, text, and other types of data, making them well-suited for this task.

  • Support vector machine (SVM) is incorrect because SVM is used for classification and regression tasks, not for generating synthetic data.
  • Convolutional neural network (CNN) is incorrect because CNNs are typically used for image recognition and processing tasks, not for generating synthetic data.
  • Decision tree is incorrect because decision trees are used for decision-making and classification tasks, but they do not generate synthetic data.

Reference:
What is a GAN?

19
Q

A research company has historical transcripts of interviews, but some portions of the text are missing due to errors in data collection. The company needs to build a machine learning model that can predict and fill in the missing words based on the surrounding context.

Which type of model meets this requirement?

  1. LDA (Latent Dirichlet Allocation) models
  2. K-means clustering models
  3. BERT-based models
  4. Time series models
A

3. BERT-based models

BERT-based models are designed for natural language processing tasks, including predicting missing words in a sentence. BERT’s ability to understand context by looking at the words before and after the missing sections makes it highly effective for tasks such as filling in gaps in transcripts.

  • LDA (Latent Dirichlet Allocation) models is incorrect because LDA is used for topic modeling, which helps in identifying themes in a text but is not suited for predicting missing words.
  • K-means clustering models is incorrect because K-means is a clustering algorithm that groups similar data points together, not for completing text or predicting missing words.
  • Time series models is incorrect because time series models are used for predicting future values based on historical data trends, not for completing sentences or predicting words in text.

Reference:
Fine-tune and host Hugging Face BERT models on Amazon SageMaker

20
Q

A video game development company is building an AI system that can generate responses based on player interactions. The AI needs to represent in-game objects and player actions numerically to understand their relationships and meanings.

Which term describes these numerical representations that AI models use to enhance their understanding of in-game interactions?

  1. Embeddings
  2. Frames
  3. Layers
  4. Buffers
A

1. Embeddings

Embeddings are the numerical representations that AI systems use to capture the relationships between different objects, actions, or concepts. In this case, embeddings can help the AI understand the interactions between players and in-game objects by mapping them to a numerical space.

  • Frames is incorrect because frames refer to visual or graphical content in games, not the numerical representation of objects or actions used by AI models.
  • Layers is incorrect because layers refer to levels of computation in neural networks, not the representation of data itself.
  • Buffers is incorrect because buffers are used to temporarily hold data in computing systems and are not related to how AI models understand relationships between objects and actions.
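
A rough sketch of how embeddings capture relationships: each object is a vector, and related objects point in similar directions. The three-dimensional vectors below are invented for illustration; real embeddings are learned by a model and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings for in-game objects (values are made up).
embeddings = {
    "sword": [0.9, 0.1, 0.0],
    "axe": [0.8, 0.2, 0.1],
    "healing_potion": [0.0, 0.1, 0.9],
}

weapon_pair = cosine_similarity(embeddings["sword"], embeddings["axe"])
mixed_pair = cosine_similarity(embeddings["sword"], embeddings["healing_potion"])

print(round(weapon_pair, 3), round(mixed_pair, 3))
```

The two weapons score much closer to 1.0 than the sword and the potion do, which is how a model can tell that some objects or actions are related without being told so explicitly.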

Reference:
What is Natural Language Processing (NLP)?

21
Q

A telecommunications company is developing a customer support chatbot. The company wants the chatbot to learn from past interactions and online knowledge bases to continuously improve its responses over time.

Which AI learning strategy provides this self-improvement capability?

  1. Supervised learning with a manually labeled dataset of customer responses.
  2. Unsupervised learning to identify common patterns in customer queries.
  3. Transfer learning by updating the model with a new training dataset each week.
  4. Reinforcement learning with rewards based on customer satisfaction and feedback.
A

4. Reinforcement learning with rewards based on customer satisfaction and feedback.

Reinforcement learning enables the chatbot to improve by learning from its interactions. The chatbot receives feedback in the form of rewards or penalties based on customer satisfaction, which helps it adjust its responses over time to become more effective.

  • Supervised learning with a manually labeled dataset of customer responses is incorrect because supervised learning relies on fixed datasets, and it does not allow the chatbot to improve dynamically from real-time interactions.
  • Unsupervised learning to identify common patterns in customer queries is incorrect because unsupervised learning groups data but does not provide the feedback mechanism necessary for self-improvement.
  • Transfer learning by updating the model with a new training dataset each week is incorrect because transfer learning helps a model adapt to new tasks but doesn’t offer the self-improvement capability needed for continuous learning from interactions.

Reference:
What is Reinforcement Learning?
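
The reward-and-penalty loop described above can be sketched as a small epsilon-greedy bandit. This is a simplified illustration, not how a production chatbot is trained: the three "response styles", their satisfaction rates, and the `run_bandit` helper are all assumptions made for the example.

```python
import random


def run_bandit(true_satisfaction, rounds=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: pick a response style, observe a 0/1 reward, update estimates."""
    rng = random.Random(seed)
    n_arms = len(true_satisfaction)
    counts = [0] * n_arms      # how often each style was chosen
    values = [0.0] * n_arms    # running average reward per style
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore a random style
        else:
            arm = max(range(n_arms), key=lambda i: values[i])  # exploit the best estimate
        # Simulated customer feedback: 1.0 if satisfied, 0.0 otherwise.
        reward = 1.0 if rng.random() < true_satisfaction[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts, values


# Hypothetical satisfaction rates for three response styles.
counts, values = run_bandit([0.1, 0.3, 0.9])
```

Over time the agent shifts most of its choices toward the style that earns the highest reward, which mirrors the self-improvement behavior the question asks about.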

22
Q

A company is developing a new model to predict the prices of specific items. The model performed well during training, but its performance dropped significantly once deployed to production.

What should the company do to mitigate this problem?

  1. Reduce the volume of data used in training.
  2. Add hyperparameters to the model.
  3. Increase the diversity of the training data to better match real-world conditions.
  4. Use a simpler model architecture to avoid overfitting.
A

3. Increase the diversity of the training data to better match real-world conditions.

The issue likely arises because the training data did not fully represent the real-world data the model encounters in production. Increasing the diversity of the training data will help the model generalize better and improve its performance in production.

  • Reduce the volume of data used in training is incorrect because reducing data volume can worsen performance by limiting the information available for the model to learn.
  • Add hyperparameters to the model is incorrect because adding hyperparameters alone won’t necessarily address the problem if the training data doesn’t match real-world conditions.
  • Use a simpler model architecture to avoid overfitting is incorrect because the scenario points to training data that does not represent production conditions. A simpler model can reduce overfitting, but it cannot compensate for data the model never saw during training.

Reference:
Analytics and AI/ML Solutions

23
Q

A healthcare company is building an application to analyze patient-doctor conversations. The company wants to extract key medical details from the audio recordings of consultations for further analysis.

Which solution meets these requirements?

  1. Build a voice-controlled assistant using Amazon Alexa.
  2. Transcribe medical consultations using Amazon Transcribe.
  3. Monitor audio quality using Amazon SageMaker Model Monitor.
  4. Analyze images from consultation notes using Amazon Rekognition.
A

2. Transcribe medical consultations using Amazon Transcribe.

Amazon Transcribe is designed to convert audio recordings into text. In this case, it allows the company to transcribe patient-doctor conversations into text, enabling further analysis of the medical details discussed during consultations.

  • Build a voice-controlled assistant using Amazon Alexa is incorrect because Amazon Alexa is designed for building voice-controlled applications, not for transcribing or analyzing recorded conversations.
  • Monitor audio quality using Amazon SageMaker Model Monitor is incorrect because SageMaker Model Monitor is used for detecting data drift and monitoring machine learning models, not for audio transcription.
  • Analyze images from consultation notes using Amazon Rekognition is incorrect because Amazon Rekognition is used for image and video analysis, not for processing audio recordings or extracting text from speech.

Reference:
Amazon Transcribe

24
Q

A legal firm wants to build an AI-powered search tool to help employees quickly find relevant information from large sets of legal documents. The tool must provide accurate answers to employee queries by understanding the context of the search terms and extracting relevant information from the documents.

Which AWS service meets these requirements?

  1. Amazon Personalize
  2. Amazon Kendra
  3. Amazon Polly
  4. Amazon Lex
A

2. Amazon Kendra

Amazon Kendra is an AI-powered search service designed to provide accurate, contextual search results across large sets of documents. It helps users find relevant information quickly by understanding the intent behind search queries and returning precise answers from complex data sources.

  • Amazon Personalize is incorrect because Amazon Personalize is focused on real-time recommendation systems, not intelligent document search.
  • Amazon Polly is incorrect because Amazon Polly converts text to speech, which is not relevant for document search.
  • Amazon Lex is incorrect because Amazon Lex is used for building conversational chatbots, not for document search and information retrieval.

Reference:
Amazon Kendra

25
Q

An ecommerce company wants to build an AI-driven recommendation engine that suggests products to customers based on their past browsing behavior and purchase history. The company needs a solution that can deliver personalized recommendations in real time.

Which AWS service meets these requirements?

  1. Amazon Rekognition
  2. Amazon Personalize
  3. Amazon Polly
  4. Amazon Transcribe
A

2. Amazon Personalize

Amazon Personalize is designed for building real-time recommendation systems based on user interactions, such as browsing history and past purchases. It personalizes customer experiences by providing relevant recommendations that can boost engagement and sales.

  • Amazon Rekognition is incorrect because Amazon Rekognition is used for image and video analysis, not for generating product recommendations.
  • Amazon Polly is incorrect because Amazon Polly converts text to speech and does not generate personalized recommendations.
  • Amazon Transcribe is incorrect because Amazon Transcribe converts speech to text and does not deal with customer behavior or recommendation systems.

Reference:
Amazon Personalize (https://aws.amazon.com/personalize/)

26
Q

A company is building an AI application to process and read physical customer forms. The company needs to extract text from the scanned forms and convert the extracted text into lifelike speech for users with visual impairments.

Which AWS services meet these requirements? (Select TWO.)

  1. Amazon Comprehend
  2. Amazon Textract
  3. Amazon Polly
  4. Amazon Lex
  5. Amazon Rekognition
A

2. Amazon Textract
3. Amazon Polly

Amazon Textract extracts text and data from scanned documents, making it ideal for processing physical forms. Amazon Polly converts text into lifelike speech, which helps the company make the extracted content accessible to users with visual impairments.

  • Amazon Comprehend is incorrect because Amazon Comprehend is used for natural language processing tasks like sentiment analysis and entity recognition, not for extracting text from scanned documents or converting text to speech.
  • Amazon Lex is incorrect because Amazon Lex is used for building conversational chatbots, not for extracting or converting text.
  • Amazon Rekognition is incorrect because Amazon Rekognition is used for image and video analysis, not for text extraction or speech synthesis.

References:
  • Amazon Textract (https://aws.amazon.com/textract/)
  • Amazon Polly - AI Voice Generator (https://aws.amazon.com/polly/)

27
Q

A financial services company wants to automatically detect potentially fraudulent transactions in real time to protect its customers. The company needs a solution that uses machine learning to flag suspicious activities and prevent fraud as transactions occur.

Which solution will meet these requirements?

  1. Use Amazon SageMaker to build custom machine learning models and deploy them for fraud detection.
  2. Use AWS Glue to preprocess transaction data and send it to a custom fraud detection model.
  3. Integrate Amazon Comprehend to analyze customer transaction data for sentiment and flag potential fraud.
  4. Implement Amazon Fraud Detector to create, train, and deploy machine learning models that automatically identify and flag fraudulent transactions in real time.
A

4. Implement Amazon Fraud Detector to create, train, and deploy machine learning models that automatically identify and flag fraudulent transactions in real time.

Amazon Fraud Detector is specifically designed for detecting and preventing fraud in real-time transactions by leveraging machine learning models. It simplifies the process by offering pre-built fraud detection templates, allowing the company to deploy fraud prevention mechanisms quickly without the need to create custom models from scratch.

  • Use Amazon SageMaker to build custom machine learning models and deploy them for fraud detection: SageMaker requires significant effort to build custom models from scratch, whereas Amazon Fraud Detector is purpose-built for fraud detection, making it a more efficient solution for this specific need.
  • Use AWS Glue to preprocess transaction data and send it to a custom fraud detection model: AWS Glue is an ETL service for extracting, transforming, and loading data, not for real-time fraud detection. It would not help with creating or deploying machine learning models for fraud detection on its own.
  • Integrate Amazon Comprehend to analyze customer transaction data for sentiment and flag potential fraud: Amazon Comprehend is a natural language processing service designed for sentiment analysis and entity recognition, not for analyzing transactional data or detecting fraudulent activities.

Reference:
Amazon Fraud Detector (https://aws.amazon.com/fraud-detector/)

28
Q

A customer service company is building a virtual assistant to help users resolve common technical issues. The company wants to ensure that the assistant provides concise, factual answers without unnecessary details.

What should the company do to achieve this?

  1. Decrease the temperature to reduce randomness in the responses.
  2. Add detailed examples to the prompt to make responses longer.
  3. Increase the token limit to allow for more detailed answers.
  4. Refine the prompt to instruct the model to provide clear and concise responses.
A

4. Refine the prompt to instruct the model to provide clear and concise responses.

By refining the prompt, the company can guide the foundation model to produce responses that are tailored to specific requirements, such as being concise and factual. Clear prompt instructions help the model stay focused on providing the desired output.

  • Decrease the temperature to reduce randomness in the responses is incorrect because lowering the temperature controls randomness in response generation but does not directly affect the length or level of detail in the answers.
  • Add detailed examples to the prompt to make responses longer is incorrect because the goal is to make responses more concise, not longer. Adding detailed examples would lead to lengthier outputs.
  • Increase the token limit to allow for more detailed answers is incorrect because increasing the token limit would allow the model to generate longer responses, which is contrary to the goal of keeping answers concise.

Reference:
Prompt Engineering Concepts (https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-engineering-guidelines.html)

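
To illustrate why temperature affects randomness rather than length, here is a minimal sketch of temperature-scaled softmax over token scores. The logits are hypothetical, and the function is a generic illustration rather than any specific model's or service's sampling code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # nearly always picks the top token
hot = softmax_with_temperature(logits, 2.0)   # spreads probability more evenly
```

A low temperature concentrates probability on the most likely token, so output becomes more predictable, but nothing in this calculation shortens or lengthens the response.
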
29
Q

A healthcare organization has collected a large amount of medical data from patient records but the data is unstructured and lacks predefined labels. The organization wants to group similar patient records together to identify patterns in medical conditions and improve treatment strategies.

Which machine learning approach should the organization use to achieve this?

  1. Data augmentation
  2. Supervised learning
  3. Unsupervised learning
  4. Feature engineering
A

3. Unsupervised learning

Unsupervised learning is the appropriate approach for analyzing unstructured or unlabeled data. In this case, it can help the organization identify patterns and group similar patient records based on shared characteristics without the need for labeled data.

  • Data augmentation is incorrect because data augmentation involves creating new, modified versions of the data to expand the dataset, not for grouping unstructured data.
  • Supervised learning is incorrect because supervised learning requires labeled data, which the organization does not have.
  • Feature engineering is incorrect because feature engineering involves creating new features from existing data, not grouping or clustering unstructured data.

Reference:
What’s the Difference Between Supervised and Unsupervised Learning? (https://aws.amazon.com/compare/the-difference-between-machine-learning-supervised-and-unsupervised/)

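
Grouping records without labels can be sketched with a plain k-means loop. This is one common clustering technique, offered as an illustration; the card does not name a specific algorithm. The two-feature "patient records" (age, systolic blood pressure) and the starting centroids are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical records: (age, systolic blood pressure). No labels are provided.
records = [(25, 118), (30, 121), (28, 115), (62, 150), (67, 158), (70, 149)]
centroids, clusters = kmeans(records, centroids=[records[0], records[3]])
```

The algorithm separates the younger, lower-pressure records from the older, higher-pressure ones using only the feature values, which is the defining trait of unsupervised learning.
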
30
Q

A company is building a machine learning model to predict customer churn. After cleaning and preparing the dataset, the company is now selecting the most relevant variables from the data to improve model performance and reduce complexity.

Which stage of the ML pipeline is the company currently in?

  1. Model deployment
  2. Feature engineering
  3. Data collection
  4. Model evaluation
A

2. Feature engineering

Feature engineering involves selecting, modifying, or creating new variables (features) from the dataset that are most relevant to improving the model’s performance. This stage helps the company enhance the model’s ability to make accurate predictions while reducing complexity.

  • Model deployment is incorrect because model deployment refers to the process of making the trained model available in production for real-world use, not selecting features.
  • Data collection is incorrect because data collection involves gathering raw data, and the company is already past this stage, working on refining the data.
  • Model evaluation is incorrect because model evaluation occurs after training to assess the model’s performance, not while selecting relevant features for training.

Reference:
Feature Engineering (https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/feature-engineering.html)

31
Q

An e-commerce company is developing a product recommendation engine that suggests items based on customer browsing behavior and purchase history. The company wants the engine to improve continuously by learning from new customer interactions and product updates.

Which AI learning strategy will help the company achieve this goal?

  1. Supervised learning with a static dataset of customer interactions
  2. Transfer learning with periodic updates from pre-trained models
  3. Reinforcement learning based on customer engagement and purchase feedback
  4. Unsupervised learning to categorize products without customer data
A

3. Reinforcement learning based on customer engagement and purchase feedback

Reinforcement learning helps improve the recommendation engine by using customer interactions (such as purchases and clicks) as feedback to optimize future recommendations. The engine is continuously learning from this feedback to enhance its performance over time.

  • Supervised learning with a static dataset of customer interactions is incorrect because static datasets don’t allow the model to continuously learn from new interactions, limiting its ability to improve over time.
  • Transfer learning with periodic updates from pre-trained models is incorrect because transfer learning involves applying a pre-trained model to a different domain and does not inherently support continuous learning from new data.
  • Unsupervised learning to categorize products without customer data is incorrect because unsupervised learning focuses on finding patterns in data, but without customer interaction, it wouldn’t improve recommendations over time.

Reference:
What is Reinforcement Learning? (https://aws.amazon.com/what-is/reinforcement-learning/)

32
Q

A healthcare company is developing a machine learning model to analyze patient data. The model will be used to generate predictions for large batches of patient records at regular intervals, rather than needing immediate results for each individual record. The company needs a cost-effective inference solution to process these batches of data.

Which type of inference should the company use?

  1. Batch inference
  2. Serverless inference
  3. Real-time inference
  4. Edge inference
A

1. Batch inference

Batch inference is ideal for scenarios where predictions are generated for large amounts of data at specific intervals rather than in real time. It allows the company to process patient records in batches, making it a cost-effective solution for handling large datasets that do not require immediate predictions.

  • Serverless inference is incorrect because serverless inference focuses on scaling automatically without managing servers but does not specifically address batch processing.
  • Real-time inference is incorrect because real-time inference is designed for scenarios where immediate predictions are needed for individual records or requests, which is not the case here.
  • Edge inference is incorrect because edge inference is used for running predictions on devices located at the edge of the network, closer to where the data is generated, and is not suited for batch processing.

Reference:
Deploy models for inference (https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html)

33
Q

A company is building an AI model to classify customer feedback into positive or negative sentiments. The feedback data consists of written text without any predefined labels.

What type of data is the company working with?

  1. Labeled, tabular data
  2. Unlabeled, text data
  3. Time-series data
  4. Structured, image data
A

2. Unlabeled, text data

The company is working with unlabeled, text data because the customer feedback consists of written text that hasn’t been categorized or labeled as positive or negative.

  • Labeled, tabular data is incorrect because the text data is not labeled, and tabular data refers to data organized in rows and columns, which isn’t relevant in this scenario.
  • Time-series data is incorrect because time-series data refers to data points indexed in time order, which isn’t applicable to text-based customer feedback.
  • Structured, image data is incorrect because the data is text-based, not image-based, and is unstructured.

Reference:
Data Labeling with a Human-in-the-Loop (https://docs.aws.amazon.com/sagemaker/latest/dg/data-label.html)

34
Q

A retail company is developing an AI model to forecast future sales based on historical transaction records. The records consist of daily sales amounts over several years.

What type of data is the company working with?

  1. Labeled, text data
  2. Unstructured, image data
  3. Time-series data
  4. Unlabeled, tabular data
A

3. Time-series data

The company is working with time-series data because it consists of daily sales amounts recorded over time. Time-series data is used to analyze and forecast patterns based on historical trends indexed by time.

  • Labeled, text data is incorrect because the data is numerical and indexed by time, not labeled or text-based.
  • Unstructured, image data is incorrect because the data is structured and numerical, not image-based or unstructured.
  • Unlabeled, tabular data is incorrect because, although the data might be tabular, the key characteristic is that it's indexed over time, making it time-series data.

Reference:
Data Labeling with a Human-in-the-Loop (https://docs.aws.amazon.com/sagemaker/latest/dg/data-label.html)

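
Because the order of observations matters in time-series data, even a very simple forecast depends on the most recent values. The sketch below uses a moving average as a naive baseline; the sales figures and the `moving_average_forecast` helper are hypothetical, and real forecasting models are considerably more sophisticated.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or len(series) < window:
        raise ValueError("window must be positive and no longer than the series")
    return sum(series[-window:]) / window


# Hypothetical daily sales amounts, ordered oldest to newest.
daily_sales = [100, 120, 110, 130, 125]
next_day = moving_average_forecast(daily_sales, window=3)
```

Shuffling `daily_sales` would change the forecast, which is exactly what distinguishes time-series data from an unordered table of rows.
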
35
Q

A company has successfully deployed a machine learning model to production. To ensure that the model continues performing well as data changes over time, the company needs a process to monitor the model's performance and update it when necessary.

Which MLOps practice should the company implement?

  1. Model re-training
  2. Feature engineering
  3. Data augmentation
  4. Batch inference
A

1. Model re-training

Model re-training is an MLOps practice where a model is periodically updated with new data to ensure it maintains high performance as data distributions change over time. This is essential for keeping the model production-ready and accurate.

  • Feature engineering is incorrect because feature engineering involves creating new input variables from raw data, but it doesn’t address the need to monitor and update models.
  • Data augmentation is incorrect because data augmentation refers to increasing the size of the training dataset by creating variations of the existing data, not monitoring or updating the model.
  • Batch inference is incorrect because batch inference refers to making predictions on large datasets, not about maintaining or updating models in production.

Reference:
MLOps Checklist Components (https://docs.aws.amazon.com/prescriptive-guidance/latest/mlops-checklist/mlops-checklist-components.html)

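
One simple way to decide when to re-train is to compare recent accuracy against the accuracy measured at deployment. The helper below is a hypothetical sketch of that check; the `needs_retraining` name, the 0.05 tolerance, and the sample outcomes are assumptions, and production setups typically rely on a dedicated monitoring service instead.

```python
def needs_retraining(recent_outcomes, baseline_accuracy, tolerance=0.05):
    """Return True when recent accuracy falls more than `tolerance` below the baseline.

    `recent_outcomes` holds 1 for each correct prediction and 0 for each incorrect one.
    """
    if not recent_outcomes:
        return False  # nothing observed yet, so no evidence of degradation
    recent_accuracy = sum(recent_outcomes) / len(recent_outcomes)
    return recent_accuracy < baseline_accuracy - tolerance


# Hypothetical monitoring windows for a model that scored 0.90 at deployment.
healthy_window = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]   # 90% correct
degraded_window = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]  # 40% correct
```

Calling `needs_retraining(degraded_window, 0.90)` flags the model for re-training, while the healthy window does not.
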
36
Q

A company is evaluating the performance of its binary classification model and wants to measure how well the model distinguishes between the positive and negative classes across various thresholds.

Which metric should the company use?

  1. Accuracy
  2. F1 score
  3. Area Under the ROC Curve (AUC)
  4. Precision
A

3. Area Under the ROC Curve (AUC)

AUC measures the ability of a binary classification model to distinguish between positive and negative classes at various threshold settings. It is a commonly used metric to evaluate model performance beyond simple accuracy.

  • Accuracy is incorrect because accuracy measures the proportion of correct predictions but does not account for the model's performance across different thresholds.
  • F1 score is incorrect because F1 score is the harmonic mean of precision and recall, focusing on balancing false positives and false negatives, but it doesn’t evaluate performance at different thresholds.
  • Precision is incorrect because precision measures the proportion of true positives among predicted positives but does not assess the model's overall performance across thresholds.

Reference:
Types of ML Models (https://docs.aws.amazon.com/machine-learning/latest/dg/types-of-ml-models.html)

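
AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the labels and scores are hypothetical, and the pairwise loop is written for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical true labels and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

No classification threshold appears anywhere in the calculation, which is why AUC summarizes performance across all thresholds, unlike accuracy, precision, or F1.
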
37
Q

A company needs to securely access Amazon S3 from its machine learning application without exposing the data to the public internet.

Which AWS service should the company use?

  1. AWS Shield
  2. Amazon GuardDuty
  3. AWS PrivateLink
  4. AWS WAF
A

3. AWS PrivateLink

AWS PrivateLink allows secure access to AWS services like Amazon S3 over a private network, ensuring data doesn’t traverse the public internet.

  • AWS Shield is incorrect because AWS Shield is for DDoS protection, not private access to AWS services.
  • Amazon GuardDuty is incorrect because GuardDuty is for threat detection, not securing network traffic.
  • AWS WAF is incorrect because AWS WAF is a web application firewall, not for securing private connections to AWS services.

Reference:
AWS PrivateLink (https://aws.amazon.com/privatelink/)

38
Q

A company is building an AI application and wants to minimize the environmental impact during model training. The company is looking for an AI model that consumes less energy while maintaining high performance.

Which AWS service or approach should the company choose for sustainability?

  1. Amazon EC2 C series instances for cost-efficiency
  2. Amazon EC2 G series instances for graphics processing
  3. Amazon EC2 P series instances for large-scale ML training
  4. Amazon EC2 Trn series instances for energy-efficient model training
A

4. Amazon EC2 Trn series instances for energy-efficient model training

Amazon EC2 Trn series instances are designed for high-performance machine learning training with a focus on energy efficiency, making them ideal for reducing the environmental impact of model training.

  • Amazon EC2 C series instances for cost-efficiency is incorrect because C series instances are optimized for compute, not energy efficiency, which is the primary concern here.
  • Amazon EC2 G series instances for graphics processing is incorrect because G series instances are optimized for graphics-intensive workloads, not for energy-efficient AI training.
  • Amazon EC2 P series instances for large-scale ML training is incorrect because P series instances are used for large-scale ML training but do not prioritize energy efficiency for sustainability.

Reference:
AWS Sustainability (https://aws.amazon.com/sustainability/)

39
Q

A robotics company is training an AI model to improve the decision-making capabilities of its autonomous robots. The company wants to ensure that the model learns behaviors aligned with human preferences by receiving feedback from human evaluators during the training process.

Which technique should the company use?

  1. Supervised learning
  2. Transfer learning
  3. Reinforcement learning from human feedback (RLHF)
  4. Unsupervised learning
A

3. Reinforcement learning from human feedback (RLHF)

Reinforcement learning from human feedback (RLHF) allows the model to learn by incorporating human feedback into its reward system. This technique is ideal for aligning the AI model’s behavior with human preferences, as it uses evaluators to guide the model’s decisions.

  • Supervised learning is incorrect because supervised learning requires labeled data but does not involve feedback from human evaluators during the training process.
  • Transfer learning is incorrect because transfer learning involves reusing a pre-trained model on a different task, but it doesn’t involve human feedback during training.
  • Unsupervised learning is incorrect because unsupervised learning identifies patterns in data without any labeled examples or human feedback.

Reference:
What is RLHF? (https://aws.amazon.com/what-is/reinforcement-learning-from-human-feedback/)

40
Q

A research company needs to build an AI-powered recommendation system that can analyze relationships between different entities such as research papers, authors, and topics. The system must store and query complex relationships and make recommendations based on these connections.

Which AWS service should the company use to store and query this relationship data?

  1. Amazon RDS
  2. Amazon Neptune
  3. Amazon Redshift
  4. Amazon DynamoDB
A

2. Amazon Neptune

Amazon Neptune is a graph database service optimized for storing and querying relationships between entities. It is ideal for use cases that involve complex relationships, such as recommendation systems, social networks, or knowledge graphs.

  • Amazon RDS is incorrect because RDS is a relational database service and not designed for efficiently handling complex relationships and graph data.
  • Amazon Redshift is incorrect because Redshift is a data warehousing service optimized for analytics, not for storing and querying graph-based relationships.
  • Amazon DynamoDB is incorrect because DynamoDB is a NoSQL database optimized for key-value and document-based use cases, but it is not ideal for querying graph-based data.

Reference:
What Is Amazon Neptune? (https://docs.aws.amazon.com/neptune/latest/userguide/intro.html)
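
To show the kind of relationship query a graph model makes natural, the sketch below ranks papers by how many authors or topics they share. It is an in-memory Python illustration only and does not use the Neptune API or its query languages; the papers, authors, topics, and the `recommend` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical edges: each paper links to its authors and topics.
EDGES = {
    "paper_a": {"author_1", "topic_ml"},
    "paper_b": {"author_1", "topic_nlp"},
    "paper_c": {"author_2", "topic_ml", "topic_nlp"},
    "paper_d": {"author_3", "topic_db"},
}


def recommend(paper, edges, top_n=2):
    """Rank other papers by how many neighbours (authors or topics) they share with `paper`."""
    shared = Counter()
    for other, neighbours in edges.items():
        if other != paper:
            overlap = len(edges[paper] & neighbours)
            if overlap:
                shared[other] = overlap
    return [name for name, _ in shared.most_common(top_n)]


related_to_c = recommend("paper_c", EDGES)
```

A graph database answers this sort of "what is connected to what" question by traversing stored relationships, rather than by joining tables or scanning key-value items.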