A business is developing a machine learning model to analyze large archived datasets. These datasets are several gigabytes in size, and the company does not require immediate access to the predictions.
What Amazon SageMaker inference option should the company choose?
1. Batch transform
Batch transform is the ideal solution for running inference on large datasets when real-time results are not needed. It processes data in bulk without a persistent endpoint, which makes it suitable for analyzing multiple gigabytes of archived data.
Reference:
Deploy models for inference
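As a rough sketch of how such a job might be described, the helper below builds the request for the `create_transform_job` call in boto3. The job name, model name, S3 URIs, content type, and instance type are placeholders assumed for illustration; none of them come from the scenario.

```python
def build_transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri):
    """Build keyword arguments for SageMaker's create_transform_job API."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",  # assumed input format
            "SplitType": "Line",        # treat each line as one record
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
    }


# Placeholder names and buckets, for illustration only.
request = build_transform_job_request(
    "archive-scoring-job",
    "my-model",
    "s3://example-bucket/archive/",
    "s3://example-bucket/predictions/",
)

# With AWS credentials configured, the job could be submitted like this:
# import boto3
# boto3.client("sagemaker").create_transform_job(**request)
```

Predictions are written to the output S3 prefix when the job finishes, so nothing has to stay running between jobs.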
A company is developing a machine learning model using Amazon SageMaker and needs a solution to store and share feature sets across different teams for collaborative model building.
Which Amazon SageMaker feature should the company use?
1. Amazon SageMaker Feature Store
SageMaker Feature Store is designed to allow teams to store, manage, and share features (attributes or variables) in a central repository. This ensures consistency across models and helps teams collaborate by reusing the same features across multiple projects.
Reference:
Create, store, and share features with Feature Store
A business uses Amazon SageMaker to run its machine learning pipeline in a production environment. The company processes large datasets, sometimes reaching 1 GB in size, with processing times that can take up to an hour. To support its operations, the company requires low-latency predictions.
Which Amazon SageMaker inference option should the company choose?
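1. Asynchronous inference
Asynchronous inference queues incoming requests and processes them in the background. It is the SageMaker option built for large payloads (up to 1 GB) and long processing times (up to one hour) while still returning results with near-real-time latency. Real-time inference targets interactive, low-latency traffic, but its much smaller payload limit and short invocation timeout rule it out for this workload.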
Reference:
Deploy models for inference
A company is utilizing machine learning models for specialized tasks in a specific domain. To save time and resources, the company prefers to modify existing pre-trained models instead of building new ones from scratch.
Which machine learning approach should the company use?
2. Apply transfer learning.
Transfer learning lets the company start from a pre-trained model and adapt it to a new, related task. Instead of training from scratch, the company fine-tunes the pre-trained model on domain-specific data, which significantly reduces the training time and the amount of data required.
Reference:
Maximize business outcomes with machine learning on AWS
A company has developed a chatbot that responds to user queries with images. The company needs to ensure that the chatbot avoids displaying inappropriate or offensive images.
Which approach should the company take to achieve this?
1. Use content moderation tools to filter image responses.
Content moderation tools can scan and block images that are inappropriate, ensuring that only safe and relevant images are returned by the chatbot.
Reference:
Amazon Rekognition Content Moderation
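One possible filtering step is sketched below. It assumes the chatbot has already sent the image to Amazon Rekognition's DetectModerationLabels API and holds the response; the 80% threshold and the sample responses are assumptions made up for illustration.

```python
def is_image_safe(moderation_response, min_confidence=80.0):
    """Return True when no moderation label meets the confidence threshold.

    `moderation_response` is assumed to have the shape returned by
    Amazon Rekognition's DetectModerationLabels API.
    """
    labels = moderation_response.get("ModerationLabels", [])
    return not any(label["Confidence"] >= min_confidence for label in labels)


# Hand-written sample responses for illustration (not real API output).
flagged = {"ModerationLabels": [{"Name": "Violence", "Confidence": 97.2, "ParentName": ""}]}
clean = {"ModerationLabels": []}

print(is_image_safe(flagged))  # False
print(is_image_safe(clean))    # True
```

The chatbot would return an image only when the check passes, and fall back to a text response otherwise.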
A company is developing a machine learning model. The company has gathered new data and is analyzing it by generating correlation matrices, calculating statistics, and visualizing patterns in the dataset.
What stage of the machine learning pipeline is the company in?
3. Exploratory data analysis
The company is focused on understanding the data by visualizing relationships and calculating statistical measures. These activities are key components of exploratory data analysis (EDA), which helps identify patterns and guide further steps in the pipeline.
Reference:
Maximize business outcomes with machine learning on AWS
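To make the correlation-matrix step concrete, here is a minimal sketch that computes Pearson correlations with the Python standard library. The column names and values are invented for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict holding the correlation of every column pair."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up columns for illustration.
data = {
    "price": [10.0, 12.0, 14.0, 16.0],
    "units_sold": [40.0, 36.0, 32.0, 28.0],
}
matrix = correlation_matrix(data)
print(matrix["price"]["units_sold"])  # -1.0: units fall linearly as price rises
```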
A company is using Amazon SageMaker Studio notebooks to build and train machine learning models. The data is stored in an Amazon S3 bucket, and the company needs to manage the data flow between Amazon S3 and SageMaker Studio notebooks.
Which solution will meet this requirement?
3. Configure SageMaker to use a VPC with an S3 VPC endpoint.
Setting up a VPC with an S3 VPC endpoint allows secure and efficient access to data stored in Amazon S3 without using the public internet, ensuring smooth data flow between SageMaker and S3.
Reference:
Amazon SageMaker Documentation
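A gateway endpoint for S3 could be requested roughly as follows. This sketch only builds the arguments for the EC2 `create_vpc_endpoint` call in boto3; the VPC ID, Region, and route table ID are placeholders, not values from the scenario.

```python
def build_s3_gateway_endpoint_request(vpc_id, region, route_table_ids):
    """Build keyword arguments for EC2's create_vpc_endpoint API (S3 gateway endpoint)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


# Placeholder identifiers, for illustration only.
request = build_s3_gateway_endpoint_request(
    "vpc-0123456789abcdef0", "us-east-1", ["rtb-0123456789abcdef0"]
)

# With AWS credentials configured:
# import boto3
# boto3.client("ec2", region_name="us-east-1").create_vpc_endpoint(**request)
```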
A logistics company has thousands of warehouse images and wants to automatically identify and classify different types of items stored in the images without manual effort.
Which strategy will help the company achieve this?
2. Object detection
Object detection is a computer vision technique used to identify and classify multiple objects within an image. In this case, it can automatically identify and categorize different items stored in the warehouse.
Reference:
How Object Detection Works
A healthtech startup has created a machine learning model that analyzes X-ray images to detect potential signs of illness. The company wants to deploy the model to production so that doctors can upload X-rays via a web application and receive predictions in real-time. The company prefers a solution that does not require managing underlying infrastructure.
Which solution should the company use?
1. Use Amazon SageMaker Serverless Inference to deploy the model.
SageMaker Serverless Inference provides a fully managed, serverless environment for hosting and serving machine learning models. It allows the company to focus on deploying the model without managing any underlying infrastructure, which meets the company’s needs.
Reference:
Deploy models with Amazon SageMaker Serverless Inference
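A serverless endpoint is configured by attaching a `ServerlessConfig` to the production variant instead of choosing an instance type. The sketch below builds such a request for the `create_endpoint_config` call in boto3; the names, memory size, and concurrency values are assumed for illustration.

```python
# Memory sizes that Serverless Inference accepts, in MB.
ALLOWED_MEMORY_MB = {1024, 2048, 3072, 4096, 5120, 6144}


def build_serverless_endpoint_config(config_name, model_name, memory_mb=2048, max_concurrency=5):
    """Build keyword arguments for SageMaker's create_endpoint_config API."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {sorted(ALLOWED_MEMORY_MB)}")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                # No instance type or count: capacity is managed by the service.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


# Placeholder names, for illustration only.
config = build_serverless_endpoint_config("xray-serverless-config", "xray-model")
```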
A retail company has collected terabytes of customer purchase data but the data is not labeled. The company wants to segment its customers into groups for a targeted marketing campaign based on their purchasing patterns.
Which machine learning approach should the company use to achieve this?
2. Unsupervised learning
Unsupervised learning is ideal for this task because it works with unlabeled data and can identify patterns that group customers by purchasing behavior. Clustering, a common unsupervised technique, will let the company divide its customers into segments for targeted marketing.
Reference:
What’s the Difference Between Supervised and Unsupervised Learning?
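As one illustration of segmenting unlabeled data, the sketch below runs a plain k-means loop on a handful of invented customers. The two features, the starting centroids, and the choice of k-means itself are assumptions; the scenario does not name a specific algorithm.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Made-up (orders per month, average basket value) pairs; no labels are provided.
customers = [(1, 20), (2, 25), (1, 22), (9, 110), (10, 120), (8, 115)]
centroids, clusters = kmeans(customers, centroids=[(1, 20), (9, 110)])

for i, segment in enumerate(clusters):
    print(f"segment {i}: {segment}")
```

The segments emerge from the data alone: occasional low-spend shoppers end up in one group and frequent high-spend shoppers in the other.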
A healthcare organization is handling a large number of patient records in PDF format. As the volume of records continues to grow, the organization needs an automated system to convert these PDF documents into plain text format for integration into their electronic health record (EHR) system.
Which AWS service meets this requirement?
3. Amazon Textract
Amazon Textract is the ideal solution because it can automatically extract text, tables, and forms from PDFs and other scanned documents. This allows the healthcare organization to convert patient records into plain text, making it easier to integrate the data into their EHR system.
Reference:
Amazon Textract
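Once Textract has analyzed a document, the plain text can be assembled from the `LINE` blocks in its response. The sketch below assumes a response shaped like the output of Textract's text detection operations; the sample response is hand-written for illustration.

```python
def extract_lines(textract_response):
    """Join the text of all LINE blocks in a Textract-style response."""
    return "\n".join(
        block["Text"]
        for block in textract_response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    )


# Hand-written sample for illustration (not real API output).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Patient: Jane Doe"},
        {"BlockType": "WORD", "Text": "Patient:"},
        {"BlockType": "LINE", "Text": "Diagnosis: seasonal allergy"},
    ]
}
plain_text = extract_lines(sample_response)
print(plain_text)
```

`WORD` blocks are skipped because each `LINE` block already contains the words on that line.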
A retail company wants to predict customer demand for seasonal products. The company lacks coding experience and knowledge of machine learning algorithms but needs to build a predictive model using internal sales data and external market data.
Which solution will meet these requirements?
3. Import the data into Amazon SageMaker Canvas. Build ML models and predict demand by selecting values in the data from SageMaker Canvas.
SageMaker Canvas allows users with no coding or machine learning experience to create models by simply interacting with the data via a point-and-click interface. This makes it the best choice for the retail company to generate demand forecasts without technical expertise.
Reference:
Amazon SageMaker Canvas
A food processing company has built an AI model to classify different types of fruits based on images. The company wants to evaluate how many images the model has correctly classified into the right fruit categories.
Which evaluation metric should the company use to measure the model’s performance?
2. Accuracy
Accuracy is the appropriate metric because it measures the proportion of images that the model classified correctly out of the total number of images. This is the most straightforward metric for evaluating how well a classification model is performing.
Reference:
Maximize business outcomes with machine learning on AWS
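The calculation itself is short. A minimal sketch, using invented labels and predictions:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration.
actual = ["apple", "banana", "apple", "cherry", "banana"]
predicted = ["apple", "banana", "cherry", "cherry", "apple"]

score = accuracy(actual, predicted)
print(score)  # 0.6, because 3 of the 5 images were classified correctly
```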
A biotechnology company needs to classify human genes into 20 categories based on various gene characteristics. The company also requires a machine learning algorithm that can clearly document how the inner workings of the model influence its decisions and outputs.
Which machine learning algorithm should the company use?
3. Decision trees
Decision trees are well-suited for this task because they provide transparency in their decision-making process. The structure of a decision tree allows the company to trace how the input characteristics lead to specific classifications, making it easy to document the inner mechanism of the model and its impact on the output.
Reference:
Maximize business outcomes with machine learning on AWS
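The traceability described above can be seen in a toy example. The tree below is hand-written with hypothetical features, thresholds, and only three categories (the scenario has 20); a real tree would be learned from data, but its decisions could be documented in the same way.

```python
def classify_gene(gene):
    """Return (label, path), where path lists every rule applied on the way to the leaf."""
    path = []
    if gene["gc_content"] > 0.5:
        path.append("gc_content > 0.5")
        if gene["length_kb"] > 10:
            path.append("length_kb > 10")
            return "category_A", path
        path.append("length_kb <= 10")
        return "category_B", path
    path.append("gc_content <= 0.5")
    return "category_C", path


# Hypothetical gene record, for illustration only.
label, path = classify_gene({"gc_content": 0.62, "length_kb": 4})
print(label)  # category_B
print(path)   # ['gc_content > 0.5', 'length_kb <= 10']
```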
A company is developing an educational app where users solve basic math problems such as: “A bag contains 8 blue balls, 5 red balls, and 2 yellow balls. What is the probability of picking a red ball?” The company needs a solution that minimizes operational overhead.
Which solution will meet these requirements with the least operational complexity?
3. Use a simple algorithm that calculates probability using basic rules and formulas.
Using a simple algorithm is the best solution because probability problems can be solved using basic math formulas, without the need for complex machine learning models. This approach requires minimal operational overhead and ensures accurate results through straightforward computations.
Reference:
Maximize business outcomes with machine learning on AWS
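For the example problem, the whole "model" is one division: favorable outcomes over total outcomes. A minimal sketch (the function name is an illustrative choice):

```python
from fractions import Fraction


def probability(favorable, counts):
    """Probability of drawing one item of the given color from the bag."""
    return Fraction(counts[favorable], sum(counts.values()))


bag = {"blue": 8, "red": 5, "yellow": 2}
p_red = probability("red", bag)
print(p_red)  # 1/3, because 5 of the 15 balls are red
```

Using `Fraction` keeps the answer exact, which suits an educational app better than a rounded decimal.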
An AI researcher has developed a deep learning model to identify different types of textures in images. The researcher now wants to assess how well the model performs in classifying these textures.
Which metric will help the researcher evaluate the model’s performance?
1. Confusion matrix
A confusion matrix is a tool used to evaluate the performance of a classification model by showing the true positives, true negatives, false positives, and false negatives. This helps the AI researcher understand how well the model is classifying the textures in the images and where it may be making mistakes.
Reference:
Viewing the confusion matrix for a model
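A confusion matrix can be built by counting (actual, predicted) pairs. The texture labels and predictions below are invented for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(y_true, y_pred))


# Made-up labels for illustration.
actual = ["rough", "smooth", "rough", "woven", "smooth", "woven"]
predicted = ["rough", "smooth", "smooth", "woven", "smooth", "rough"]
matrix = confusion_matrix(actual, predicted)

# Rows are actual labels, columns are predicted labels.
labels = sorted(set(actual) | set(predicted))
for a in labels:
    print(a, [matrix[(a, p)] for p in labels])
```

Counts on the diagonal are correct predictions; off-diagonal counts show which textures the model confuses with each other.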
A retail company has terabytes of data stored in its database, which can be used for business analysis. The company wants to develop an AI application that can generate SQL queries from simple text inputs provided by employees with minimal technical experience.
Which solution meets these requirements?
1. Generative pre-trained transformers (GPT)
Generative pre-trained transformers (GPT) are ideal for this task because they are designed for natural language processing (NLP) tasks, such as converting human language into structured queries like SQL. GPT models can understand and interpret employee text inputs and generate the appropriate SQL queries, even for users with minimal technical skills.
A healthcare company is developing an application that needs to generate synthetic medical data based on patterns observed in existing patient datasets.
Which type of model should the company use to meet this requirement?
1. Generative adversarial network (GAN)
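A generative adversarial network (GAN) is ideal for generating synthetic data that mimics real data by learning the patterns in existing datasets. A GAN trains two networks against each other: a generator that produces synthetic samples and a discriminator that tries to tell them apart from real ones. GANs are commonly used to create synthetic images and other types of data, making them well-suited for this task.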
Reference:
What is a GAN?
A research company has historical transcripts of interviews, but some portions of the text are missing due to errors in data collection. The company needs to build a machine learning model that can predict and fill in the missing words based on the surrounding context.
Which type of model meets this requirement?
3. BERT-based models
BERT-based models are designed for natural language processing tasks, including predicting missing words in a sentence. BERT’s ability to understand context by looking at the words before and after the missing sections makes it highly effective for tasks such as filling in gaps in transcripts.
Reference:
Fine-tune and host Hugging Face BERT models on Amazon SageMaker
A video game development company is building an AI system that can generate responses based on player interactions. The AI needs to represent in-game objects and player actions numerically to understand their relationships and meanings.
Which term describes these numerical representations that AI models use to enhance their understanding of in-game interactions?
1. Embeddings
Embeddings are the numerical representations that AI systems use to capture the relationships between different objects, actions, or concepts. In this case, embeddings can help the AI understand the interactions between players and in-game objects by mapping them to a numerical space.
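To show what "mapping to a numerical space" buys, the sketch below compares hand-made three-dimensional vectors with cosine similarity. Real embeddings are learned by a model and have far more dimensions; the item names and values here are invented.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (values near 1.0 mean very similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


# Hypothetical embeddings, for illustration only.
embeddings = {
    "sword": [0.9, 0.1, 0.0],
    "axe": [0.8, 0.2, 0.1],
    "healing_potion": [0.0, 0.1, 0.9],
}

sim_weapons = cosine_similarity(embeddings["sword"], embeddings["axe"])
sim_mixed = cosine_similarity(embeddings["sword"], embeddings["healing_potion"])
print(sim_weapons > sim_mixed)  # True: the two weapons sit closer together
```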
A telecommunications company is developing a customer support chatbot. The company wants the chatbot to learn from past interactions and online knowledge bases to continuously improve its responses over time.
Which AI learning strategy provides this self-improvement capability?
4. Reinforcement learning with rewards based on customer satisfaction and feedback.
Reinforcement learning enables the chatbot to improve by learning from its interactions. The chatbot receives feedback in the form of rewards or penalties based on customer satisfaction, which helps it adjust its responses over time to become more effective.
Reference:
What is Reinforcement Learning?
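The reward loop can be illustrated with a bandit-style value update, one of the simplest forms of reward-driven learning. The response styles and feedback values are invented, and a production chatbot would need exploration and a much richer state; this is only a sketch of the idea.

```python
def update_value(values, counts, action, reward):
    """Incrementally average the rewards observed for one action."""
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]


values = {"short_answer": 0.0, "detailed_answer": 0.0}
counts = {"short_answer": 0, "detailed_answer": 0}

# Made-up feedback: 1 = customer satisfied, 0 = not satisfied.
feedback = [
    ("short_answer", 0),
    ("detailed_answer", 1),
    ("short_answer", 1),
    ("detailed_answer", 1),
]
for action, reward in feedback:
    update_value(values, counts, action, reward)

best = max(values, key=values.get)
print(values)  # {'short_answer': 0.5, 'detailed_answer': 1.0}
print(best)    # detailed_answer
```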
A company is developing a new model to predict the prices of specific items. The model performed well during training, but its performance dropped significantly once deployed to production.
What should the company do to mitigate this problem?
3. Increase the diversity of the training data to better match real-world conditions.
The issue likely arises because the training data did not fully represent the real-world data the model encounters in production. Increasing the diversity of the training data will help the model generalize better and improve its performance in production.
Reference:
Analytics and AI/ML Solutions
A healthcare company is building an application to analyze patient-doctor conversations. The company wants to extract key medical details from the audio recordings of consultations for further analysis.
Which solution meets these requirements?
2. Transcribe medical consultations using Amazon Transcribe.
Amazon Transcribe is designed to convert audio recordings into text. In this case, it allows the company to transcribe patient-doctor conversations into text, enabling further analysis of the medical details discussed during consultations.
Reference:
Amazon Transcribe
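A transcription job might be requested along these lines. The helper only builds the arguments for the `start_transcription_job` call in boto3; the job name, S3 locations, media format, and language code are placeholders assumed for illustration.

```python
def build_transcription_request(job_name, media_uri, output_bucket):
    """Build keyword arguments for Amazon Transcribe's start_transcription_job API."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "MediaFormat": "mp3",     # assumed recording format
        "LanguageCode": "en-US",  # assumed language
        "OutputBucketName": output_bucket,
    }


# Placeholder names and buckets, for illustration only.
request = build_transcription_request(
    "consultation-0001",
    "s3://example-bucket/recordings/consultation-0001.mp3",
    "example-transcripts-bucket",
)

# With AWS credentials configured:
# import boto3
# boto3.client("transcribe").start_transcription_job(**request)
```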
A legal firm wants to build an AI-powered search tool to help employees quickly find relevant information from large sets of legal documents. The tool must provide accurate answers to employee queries by understanding the context of the search terms and extracting relevant information from the documents.
Which AWS service meets these requirements?
2. Amazon Kendra
Amazon Kendra is an AI-powered search service designed to provide accurate, contextual search results across large sets of documents. It helps users find relevant information quickly by understanding the intent behind search queries and returning precise answers from complex data sources.
Reference:
Amazon Kendra
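A search tool built on Kendra would send the employee's question to an index and read the returned excerpts. The sketch below assumes a response shaped like the output of Kendra's Query API; the sample response, index ID placeholder, and helper name are hypothetical.

```python
def top_answers(kendra_response, limit=3):
    """Return (title, excerpt) pairs from a Kendra Query-style response."""
    results = []
    for item in kendra_response.get("ResultItems", [])[:limit]:
        title = item.get("DocumentTitle", {}).get("Text", "")
        excerpt = item.get("DocumentExcerpt", {}).get("Text", "")
        results.append((title, excerpt))
    return results


# Hand-written sample for illustration (not real API output).
sample_response = {
    "ResultItems": [
        {
            "Type": "ANSWER",
            "DocumentTitle": {"Text": "Lease Agreement 2021"},
            "DocumentExcerpt": {"Text": "The notice period is 60 days."},
        }
    ]
}
answers = top_answers(sample_response)
print(answers)

# A live query could look like this:
# import boto3
# boto3.client("kendra").query(IndexId="<index-id>", QueryText="What is the notice period?")
```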