Unlock Amazon AWS ML Specialty Success: MLS-C01 Mastery Awaits
A law firm handles thousands of contracts every day. Every contract must be signed. Currently, a lawyer manually checks all contracts for signatures.
The law firm is developing a machine learning (ML) solution to automate signature detection for each contract. The ML solution must also provide a confidence score for each contract page.
Which Amazon Textract API action can the law firm use to generate a confidence score for each page of each contract?
Correct : A
The AnalyzeDocument API action is the best option to generate a confidence score for each page of each contract. This API action analyzes an input document for relationships between detected items. The input document can be an image file in JPEG or PNG format, or a PDF file. The output is a JSON structure that contains the extracted data from the document. The FeatureTypes parameter specifies the types of analysis to perform on the document; the feature types include TABLES, FORMS, and SIGNATURES. When FeatureTypes includes SIGNATURES, the API action detects signatures and returns each one as a Block object whose BlockType is SIGNATURE. Each of these blocks contains the location of the signature (Geometry), the page on which it was detected (Page), and a Confidence value between 0 and 100 that indicates how confident Amazon Textract is in the detection. Because every block carries its page number, the law firm can group the signature blocks by Page and report a confidence score for each page of each contract.
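As a minimal sketch of the grouping step, the snippet below collects signature confidence scores per page from an AnalyzeDocument-style response. It assumes that signature detections appear as entries in the Blocks list with BlockType set to SIGNATURE; the helper name signature_confidence_by_page and the sample response are hypothetical and stand in for a real call such as boto3.client("textract").analyze_document(Document=..., FeatureTypes=["SIGNATURES"]).

```python
from collections import defaultdict


def signature_confidence_by_page(response):
    """Group the Confidence of SIGNATURE blocks by page number."""
    pages = defaultdict(list)
    for block in response.get("Blocks", []):
        if block.get("BlockType") == "SIGNATURE":
            # Fall back to page 1 if Page is absent (assumption for single-page input).
            pages[block.get("Page", 1)].append(block["Confidence"])
    return dict(pages)


# Hypothetical, heavily trimmed response used only for illustration.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Page": 1},
        {"BlockType": "SIGNATURE", "Page": 1, "Confidence": 98.5},
        {"BlockType": "LINE", "Page": 2, "Confidence": 99.1, "Text": "Signed by"},
        {"BlockType": "SIGNATURE", "Page": 2, "Confidence": 71.0},
    ]
}

scores = signature_confidence_by_page(sample_response)
for page, confidences in sorted(scores.items()):
    print(f"page {page}: signature confidence {confidences}")
```

A page that is missing from the result has no detected signature, which is the case the law firm would want to flag for manual review.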
The other options are less suitable for generating a confidence score for each page of each contract. The Prediction API call is not an Amazon Textract API action, but a generic term for making inference requests to a machine learning model. The StartDocumentAnalysis API action only starts an asynchronous job to analyze a document; its output is a job identifier (JobId), not the analysis itself. The GetDocumentAnalysis API action retrieves the results of a job that StartDocumentAnalysis started, so it cannot analyze a contract on its own. Neither action by itself takes a contract as input and returns signature detections with confidence scores, which AnalyzeDocument does in a single call.
References:
* AnalyzeDocument
* SignatureDetection
* Block
* Amazon Textract launches the ability to detect signatures on any document
An ecommerce company has developed an XGBoost model in Amazon SageMaker to predict whether a customer will return a purchased item. The dataset is imbalanced. Only 5% of customers return items.
A data scientist must find the hyperparameters to capture as many instances of returned items as possible. The company has a small budget for compute.
How should the data scientist meet these requirements MOST cost-effectively?
Correct : B
The best solution to meet the requirements is to tune the csv_weight hyperparameter and the scale_pos_weight hyperparameter by using automatic model tuning (AMT). Optimize on {"HyperParameterTuningJobObjective": {"MetricName": "validation:f1", "Type": "Maximize"}}.
The csv_weight hyperparameter (listed as csv_weights in the SageMaker XGBoost documentation) is a flag that tells XGBoost to read per-instance weights from the training data in CSV format, taking the column after the label as the weight. This can help handle imbalanced data by assigning higher weights to the minority class examples and lower weights to the majority class examples. The scale_pos_weight hyperparameter controls the balance of positive and negative weights. A typical value is the ratio of the number of negative class examples to the number of positive class examples. Setting a higher value increases the importance of the positive class and can improve recall. Both hyperparameters can help the XGBoost model capture as many instances of returned items as possible.
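To make the ratio concrete, here is a small sketch that computes a starting value for scale_pos_weight from class counts. The dataset size of 10,000 customers is a hypothetical figure chosen for illustration; only the 5% return rate comes from the scenario.

```python
def typical_scale_pos_weight(n_negative, n_positive):
    """Return the negative-to-positive ratio commonly used as a starting value."""
    if n_positive <= 0:
        raise ValueError("need at least one positive example")
    return n_negative / n_positive


n_customers = 10_000                  # hypothetical dataset size
n_returns = n_customers * 5 // 100    # 5% of customers return items
n_kept = n_customers - n_returns

ratio = typical_scale_pos_weight(n_kept, n_returns)
print(f"negatives={n_kept}, positives={n_returns}, scale_pos_weight={ratio}")
```

With a 5% positive class the ratio is 19, so a tuning range centered around that value is a reasonable place for AMT to start searching.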
Automatic model tuning (AMT) is a feature of Amazon SageMaker that automates the process of finding the best hyperparameter values for a machine learning model. By default, AMT uses Bayesian optimization to search the hyperparameter space and evaluates model performance against a predefined objective metric, which is the metric that AMT tries to optimize by adjusting the hyperparameter values. For imbalanced classification problems, accuracy is not a good objective metric because it is biased towards the majority class: a model that never predicts a return is still 95% accurate on this dataset. A better objective metric is the F1 score, which is the harmonic mean of precision and recall and therefore reflects the balance between the two. The F1 score ranges from 0 to 1, where 1 is the best possible value, so the type of the objective should be "Maximize".
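The difference between accuracy and F1 on imbalanced data can be checked with a few lines of arithmetic. The sketch below uses a hypothetical validation set of 1,000 customers with 50 returns and a model that never predicts a return; the objective dictionary at the end repeats the one quoted in the answer.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical model that predicts "no return" for every customer.
always_negative = {"tp": 0, "tn": 950, "fp": 0, "fn": 50}
acc = accuracy(**always_negative)
f1 = f1_score(always_negative["tp"], always_negative["fp"], always_negative["fn"])
print(f"accuracy={acc:.2f}, f1={f1:.2f}")

# Objective from the answer text, expressed as a Python dictionary.
objective = {
    "HyperParameterTuningJobObjective": {
        "MetricName": "validation:f1",
        "Type": "Maximize",
    }
}
```

The model scores well on accuracy while finding no returned items at all, which is why the tuning job optimizes F1 instead.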
By tuning the csv_weight and scale_pos_weight hyperparameters and optimizing on the F1 score, the data scientist can meet the requirements most cost-effectively. This solution requires tuning only two hyperparameters, which can reduce the computation time and cost compared to tuning all possible hyperparameters. This solution also uses the appropriate objective metric for imbalanced classification, which can improve the model performance and capture more instances of returned items.
References:
* XGBoost Hyperparameters
* Automatic Model Tuning
* How to Configure XGBoost for Imbalanced Classification
* Imbalanced Data
A tourism company uses a machine learning (ML) model to make recommendations to customers. The company uses an Amazon SageMaker environment and has set the hyperparameter tuning completion criteria to MaxNumberOfTrainingJobs.
An ML specialist wants to change the hyperparameter tuning completion criteria. The ML specialist wants to stop tuning immediately after an internal algorithm determines that the tuning job is unlikely to improve more than 1% over the objective metric from the best training job.
Which completion criteria will meet this requirement?
Correct : C
In Amazon SageMaker, hyperparameter tuning jobs optimize model performance by adjusting hyperparameters. Amazon SageMaker's hyperparameter tuning supports completion criteria settings that enable efficient management of tuning resources. In this scenario, the ML specialist aims to set a completion criterion that will terminate the tuning job as soon as SageMaker detects that further improvements in the objective metric are unlikely to exceed 1%.
The CompleteOnConvergence setting is designed for this requirement. When it is enabled, the tuning job stops automatically once an internal algorithm determines that additional training jobs are unlikely to improve the objective metric by more than 1% over the best training job so far. The threshold is built into the convergence check, so the ML specialist does not need to supply it.
This matches the AWS documentation, which describes CompleteOnConvergence as a flag that stops tuning once further improvement is unlikely, avoiding training jobs that would add cost without improving the model.
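For illustration, the completion criterion might appear in a tuning job configuration as sketched below. The field names follow the CreateHyperParameterTuningJob request as I understand it; the resource limits are hypothetical values, and the dictionary is only constructed here, not sent to SageMaker.

```python
# Sketch of the HyperParameterTuningJobConfig portion of a tuning job request.
tuning_job_config = {
    "Strategy": "Bayesian",
    "ResourceLimits": {
        # Hypothetical upper bounds; convergence may stop the job earlier.
        "MaxNumberOfTrainingJobs": 50,
        "MaxParallelTrainingJobs": 2,
    },
    "TuningJobCompletionCriteria": {
        "ConvergenceDetected": {"CompleteOnConvergence": "Enabled"},
    },
}

criteria = tuning_job_config["TuningJobCompletionCriteria"]
print(criteria)
```

MaxNumberOfTrainingJobs can stay in place as a budget cap while the convergence flag ends the job sooner.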
A machine learning (ML) specialist uploads a dataset to an Amazon S3 bucket that is protected by server-side encryption with AWS KMS keys (SSE-KMS). The ML specialist needs to ensure that an Amazon SageMaker notebook instance can read the dataset that is in Amazon S3.
Which solution will meet these requirements?
Correct : C
When an Amazon SageMaker notebook instance needs to access encrypted data in Amazon S3, the ML specialist must ensure that both Amazon S3 access permissions and AWS Key Management Service (KMS) decryption permissions are properly configured. The dataset in this scenario is stored with server-side encryption using an AWS KMS key (SSE-KMS), so the following steps are necessary:
* S3 read permissions: Attach an IAM role to the SageMaker notebook instance with permissions that allow the s3:GetObject action for the specific S3 bucket that stores the data. This allows the notebook instance to read data from Amazon S3.
* KMS key policy permissions: Grant permissions in the KMS key policy to the IAM role that is assigned to the SageMaker notebook instance. This allows the role to use the KMS key to decrypt the data in the S3 bucket.
These steps ensure that the SageMaker notebook instance can access the encrypted data stored in S3. To read SSE-KMS encrypted data, the notebook's IAM role needs both S3 read permissions and permission in the KMS key policy to decrypt, which makes Option C the correct and secure approach.
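The two permissions described above can be sketched as policy documents. The role ARN, account ID, and bucket name below are hypothetical placeholders, and the statements show only the minimum actions named in the explanation; a real setup may need more.

```python
import json

# Hypothetical identifiers used only for illustration.
ROLE_ARN = "arn:aws:iam::111122223333:role/ExampleNotebookRole"
BUCKET = "example-dataset-bucket"

# Identity policy attached to the notebook instance's IAM role.
s3_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}

# Statement added to the KMS key policy so the role can decrypt objects.
kms_key_policy_statement = {
    "Sid": "AllowNotebookRoleToDecrypt",
    "Effect": "Allow",
    "Principal": {"AWS": ROLE_ARN},
    "Action": "kms:Decrypt",
    "Resource": "*",
}

print(json.dumps(s3_read_policy, indent=2))
print(json.dumps(kms_key_policy_statement, indent=2))
```

In a key policy, a Resource of "*" refers to the key that the policy is attached to, so the statement does not grant access to other keys.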
A data scientist is designing a repository that will contain many images of vehicles. The repository must scale automatically in size to store new images every day. The repository must support versioning of the images. The data scientist must implement a solution that maintains multiple immediately accessible copies of the data in different AWS Regions.
Which solution will meet these requirements?
Correct : A
For a repository containing a large and dynamically scaling collection of images, Amazon S3 is ideal due to its scalability and versioning capabilities. Amazon S3 natively supports automatic scaling to accommodate increasing storage needs and allows versioning, which enables tracking and managing different versions of objects.
To meet the requirement of maintaining multiple, immediately accessible copies of data across AWS Regions, S3 Cross-Region Replication (CRR) can be enabled. CRR automatically replicates new or updated objects to a specified destination bucket in another AWS Region, which provides low-latency access in that Region and supports disaster recovery. CRR requires versioning to be enabled on both the source and destination buckets, so it fits the versioning requirement directly. With versioning and CRR in place, the data scientist has a multi-Region, scalable, and version-controlled repository in Amazon S3.
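A replication setup along these lines could be sketched as follows. The bucket and role ARNs are hypothetical, and the helper only builds the dictionaries that would be passed to the boto3 S3 client calls put_bucket_versioning and put_bucket_replication; it does not contact AWS.

```python
# Versioning must be enabled on both buckets before replication is configured.
versioning_config = {"Status": "Enabled"}


def build_replication_config(role_arn, destination_bucket_arn):
    """Build a replication configuration that copies every object."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-all-images",   # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},                   # empty filter matches all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


replication_config = build_replication_config(
    role_arn="arn:aws:iam::111122223333:role/ExampleReplicationRole",
    destination_bucket_arn="arn:aws:s3:::example-vehicle-images-replica",
)
print(replication_config["Rules"][0]["Destination"])
```

The destination bucket lives in a different Region from the source, and additional rules with other destinations can be added if more than one replica is needed.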
Total 307 questions