The AWS Certified Machine Learning - Specialty (MLS-C01) validates the ability to design, build, train, tune, and deploy machine learning models on AWS. It requires both understanding of ML concepts — supervised learning, deep learning, feature engineering, model evaluation — and the AWS services that implement them. Neither AWS knowledge nor ML knowledge alone is sufficient; you need both.
This guide covers all exam domains with emphasis on SageMaker architecture, model training, deployment patterns, and the supporting data engineering services.
Exam Overview
The MLS-C01 exam contains 65 questions (50 scored, 15 unscored) with a 180-minute time limit. The passing score is 750 out of 1000.
Domain Weights
| Domain | Weight |
|---|---|
| Domain 1: Data Engineering | 20% |
| Domain 2: Exploratory Data Analysis | 24% |
| Domain 3: Modeling | 36% |
| Domain 4: Machine Learning Implementation and Operations | 20% |
Domain 3 (Modeling) is by far the most heavily tested area. You must know ML algorithms, their assumptions, their hyperparameters, and when one is preferred over another.
Domain 1: Data Engineering (20%)
Data Ingestion and Storage
ML pipelines require data at scale. The exam tests which AWS service to use at each stage.
| Stage | AWS Service |
|---|---|
| Batch data ingestion | AWS Glue, AWS Data Pipeline, S3 batch operations |
| Streaming ingestion | Amazon Kinesis Data Streams, Kinesis Data Firehose |
| Feature storage | Amazon SageMaker Feature Store |
| Data lake | Amazon S3 + AWS Glue Data Catalog |
| Data warehouse | Amazon Redshift |
Kinesis Data Streams vs. Kinesis Data Firehose for ML:
Kinesis Data Streams enables custom consumers (Lambda, custom applications) to process each record with low latency. Use it when you need to invoke a SageMaker endpoint for real-time inference on each event.
Kinesis Data Firehose delivers data to destinations (S3, Redshift, OpenSearch) with optional transformation via Lambda. Use it when the goal is landing streaming data in S3 for batch training.
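The Data Streams pattern can be sketched as a Lambda consumer that decodes each Kinesis record and forwards it to a real-time endpoint. This is a minimal illustration, not a production handler: the endpoint name `example-endpoint` and the JSON payload format are assumptions, and `boto3` is imported lazily so the decoding logic can be exercised without AWS credentials.

```python
import base64
import json


def invoke_sagemaker(payload: bytes, endpoint_name: str) -> dict:
    """Send one payload to a SageMaker real-time endpoint and parse the JSON reply."""
    import boto3  # imported lazily; assumed to be available in the Lambda runtime

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumption: the model accepts JSON
        Body=payload,
    )
    return json.loads(response["Body"].read())


def handler(event, context, predict=invoke_sagemaker, endpoint_name="example-endpoint"):
    """Lambda entry point for a Kinesis Data Streams trigger."""
    results = []
    for record in event["Records"]:
        # Kinesis delivers each record's data base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        results.append(predict(payload, endpoint_name))
    return results
```

Passing `predict` as a parameter is a design choice for testability; Lambda itself only supplies `event` and `context`.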
AWS Glue for Data Preparation
AWS Glue provides a serverless ETL service:
Glue Crawlers: Discover schema automatically from S3, RDS, and other sources; populate the Glue Data Catalog
Glue Jobs: PySpark or Python shell scripts for transformation; run on a managed Spark cluster
Glue DataBrew: Visual data preparation without code; profile data, detect anomalies, apply transformations
Data Catalog: Central metadata repository. Athena, Redshift Spectrum, and EMR can query data registered in the Data Catalog without moving it.
SageMaker Feature Store
Feature Store provides a centralized repository for ML features:
Online store: Low-latency retrieval for real-time inference (milliseconds)
Offline store: Historical feature values stored in S3 for training
Feature Store keeps training features (offline store) consistent with serving features (online store), which prevents training-serving skew.
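As a rough sketch of the online-store path, the snippet below reads one record through the Feature Store runtime `GetRecord` API and flattens it into a dictionary. The feature group name and record identifier passed in are hypothetical, and `boto3` is imported lazily so the parsing helper runs without AWS access.

```python
def record_to_features(response: dict) -> dict:
    """Flatten a GetRecord-style response into {feature_name: value_as_string}."""
    return {f["FeatureName"]: f["ValueAsString"] for f in response.get("Record", [])}


def fetch_online_features(feature_group: str, record_id: str) -> dict:
    """Low-latency lookup of the latest feature values for one entity."""
    import boto3  # imported lazily; requires AWS credentials at call time

    client = boto3.client("sagemaker-featurestore-runtime")
    response = client.get_record(
        FeatureGroupName=feature_group,
        RecordIdentifierValueAsString=record_id,
    )
    return record_to_features(response)
```

Values come back as strings, so the caller is responsible for casting them to the types the model expects.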
Domain 2: Exploratory Data Analysis (24%)
Data Analysis Tools
Amazon Athena: Serverless SQL queries directly on S3 data. Use for exploring raw datasets before building pipelines. Pay per query (per TB scanned).
Amazon SageMaker Data Wrangler: Visual interface within SageMaker Studio for:
Importing data from S3, Athena, Redshift, Feature Store
Profiling data (distributions, missing values, correlations)
Applying 300+ built-in transformations
Generating a feature engineering pipeline exportable as code
Feature Engineering Concepts
The exam tests feature engineering heavily because it is often the most impactful step in improving model quality.
Common transformations:
| Transformation | When to Apply |
|---|---|
| Normalization (min-max scaling) | When features have different ranges; important for distance-based algorithms (KNN, SVM) |
| Standardization (z-score) | When features need zero mean and unit variance; for gradient-based algorithms |
| Log transform | When a feature has a right-skewed distribution |
| One-hot encoding | For nominal categorical variables (no ordinal relationship) |
| Ordinal encoding | For ordinal categorical variables (e.g., small/medium/large) |
| Binning | Convert continuous values to discrete categories |
| Imputation | Fill missing values with mean, median, or a model-based estimate |
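Several of the transformations in the table can be written in a few lines of plain Python. The sketch below is for intuition only and assumes small in-memory lists with at least two distinct values; in practice these steps would usually run in Data Wrangler, Glue, or a library such as scikit-learn.

```python
import math
from statistics import mean, median, pstdev


def min_max_scale(values):
    """Rescale to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale to zero mean and unit variance (z-score)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def log_transform(values):
    """Compress a right-skewed feature; log1p keeps zero mapped to zero."""
    return [math.log1p(v) for v in values]


def one_hot(values):
    """Encode nominal categories as indicator columns, in sorted category order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]
```

For example, `min_max_scale([10, 20, 30])` yields `[0.0, 0.5, 1.0]`, and `one_hot(["red", "blue", "red"])` yields one column per category.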
Handling imbalanced datasets:
Class imbalance (e.g., 1% fraud cases, 99% non-fraud) biases models toward predicting the majority class, because always predicting "non-fraud" already yields 99% accuracy. Solutions:
Oversampling: Duplicate or synthesize minority class samples (SMOTE)
Undersampling: Remove majority class samples
Class weights: Assign higher weight to minority class during training
Evaluation metric: Use F1, AUC-ROC, or precision-recall curve rather than accuracy
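The class-weight remedy can be made concrete with a short calculation. The sketch below computes inverse-frequency weights (the "balanced" heuristic used by scikit-learn) and the negative-to-positive ratio commonly passed to XGBoost as `scale_pos_weight`; the label encoding (1 = fraud) is an assumption for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def scale_pos_weight(labels, positive=1):
    """Ratio of negative to positive examples for binary classification."""
    pos = sum(1 for y in labels if y == positive)
    neg = len(labels) - pos
    return neg / pos


labels = [0] * 99 + [1]  # 1% fraud, 99% non-fraud
weights = balanced_class_weights(labels)
ratio = scale_pos_weight(labels)
```

With this 99:1 split, the minority class receives a weight of 50 and the ratio is 99, so each fraud example counts roughly as much as the whole majority class it is outnumbered by.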
Domain 3: Modeling (36%)
AWS Built-in Algorithms
SageMaker includes built-in algorithms optimized for distributed training. These are among the most tested topics.
| Algorithm | Type | Use Case |
|---|---|---|
| XGBoost | Supervised: classification, regression | Tabular data; frequently top performer |
| Linear Learner | Supervised: classification, regression | Linear relationships; fast training |
| K-Nearest Neighbors (KNN) | Supervised: classification, regression | Simple; expensive at inference time |
| Factorization Machines | Supervised: classification, regression | Sparse data, recommendation systems |
| DeepAR | Supervised: time series forecasting | Forecast multiple related time series |
| Object2Vec | Unsupervised / supervised | Embedding pairs (e.g., sentence similarity) |
| BlazingText | NLP: word embeddings, text classification | Fast word2vec, sentence classification |
| Seq2Seq | NLP: sequence to sequence | Translation, text summarization |
| LDA (Latent Dirichlet Allocation) | Unsupervised: topic modeling | Discover topics in documents |
| k-means | Unsupervised: clustering | Group similar items |
| PCA | Unsupervised: dimensionality reduction | Reduce feature count before training |
| IP Insights | Unsupervised: anomaly detection | Detect unusual IP address behavior |
| Random Cut Forest | Unsupervised: anomaly detection | Time series anomaly detection |
| Object Detection | Computer vision | Identify and locate objects in images |
| Image Classification | Computer vision | Classify images into categories |
| Semantic Segmentation | Computer vision | Pixel-level image classification |
Hyperparameter Tuning
SageMaker Automatic Model Tuning (AMT):
AMT searches the hyperparameter space to find the best combination:
Bayesian optimization: Uses probabilistic model of the objective function to select promising hyperparameter sets; efficient for expensive experiments
Grid search: Exhaustive search over defined parameter values; not practical for large spaces
Random search: Random sampling; simple but less efficient than Bayesian
Specify the objective metric, hyperparameter ranges, and maximum number of training jobs. AMT runs jobs in parallel (respecting concurrency limits) and focuses searches based on prior results.
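To make the search loop concrete, here is a minimal random-search sketch in plain Python. It is not AMT's implementation: `toy_objective` is a hypothetical stand-in for a training job's validation metric, and the ranges for the XGBoost hyperparameters `eta` and `max_depth` are illustrative.

```python
import random


def toy_objective(params):
    """Hypothetical validation score (higher is better), peaking at eta=0.1, max_depth=6."""
    return -((params["eta"] - 0.1) ** 2) - 0.001 * (params["max_depth"] - 6) ** 2


def random_search(objective, ranges, max_jobs, seed=0):
    """Sample max_jobs configurations at random and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(max_jobs):
        params = {}
        for name, (low, high) in ranges.items():
            # Integer bounds give an integer range; otherwise sample a float.
            if isinstance(low, int) and isinstance(high, int):
                params[name] = rng.randint(low, high)
            else:
                params[name] = rng.uniform(low, high)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


ranges = {"eta": (0.01, 0.3), "max_depth": (3, 10)}
best_params, best_score = random_search(toy_objective, ranges, max_jobs=50)
```

Bayesian optimization differs in that each new configuration is chosen using the scores of earlier jobs rather than sampled independently, which is why it typically needs fewer jobs to reach a comparable result.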
Model Evaluation Metrics
Classification:
| Metric | Formula | When to Use |
|---|---|---|
| Accuracy | (TP+TN)/(TP+TN+FP+FN) | Balanced classes only |
| Precision | TP/(TP+FP) | When false positives are costly |
| Recall (Sensitivity) | TP/(TP+FN) | When false negatives are costly |
| F1 Score | 2*(Precision*Recall)/(Precision+Recall) | Balanced precision-recall trade-off |
| AUC-ROC | Area under ROC curve | Ranking quality; threshold-independent |
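The formulas in the table can be checked against a small confusion matrix. The counts below are made up for illustration and describe an imbalanced fraud dataset with 10 positives in 1,000 examples.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts: 8 frauds caught, 2 missed, 12 false alarms.
metrics = classification_metrics(tp=8, fp=12, tn=978, fn=2)
```

Here accuracy is 0.986 while precision is only 0.4 and F1 is about 0.53, which shows why accuracy alone is misleading when classes are imbalanced.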
Regression:
MSE (Mean Squared Error): Penalizes large errors heavily
MAE (Mean Absolute Error): More robust to outliers
RMSE: Square root of MSE; same units as target variable
R-squared: Proportion of variance explained; 1.0 is perfect
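A short worked example helps separate the four regression metrics. The target and prediction values below are invented for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, RMSE, and R-squared for paired targets and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse), "r2": 1 - ss_res / ss_tot}


scores = regression_metrics(y_true=[3, 5, 7], y_pred=[2, 5, 9])
```

The errors are -1, 0, and 2, so MAE is 1.0 while MSE is about 1.67: the single error of 2 contributes four of the five squared-error units, which is the "penalizes large errors heavily" behavior in action.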
Overfitting and Regularization
Indicators of overfitting: Low training loss, high validation loss (large gap between them).
Remedies:
Reduce model complexity (fewer layers, fewer trees)
L1 regularization (Lasso): Shrinks some coefficients to zero; feature selection
L2 regularization (Ridge): Shrinks all coefficients; reduces magnitude
Dropout: Randomly disable neurons during training (neural networks)
Early stopping: Stop training when validation loss stops improving
More training data or data augmentation
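Early stopping is the easiest of these remedies to express in code. The sketch below is framework-agnostic and assumes a list of per-epoch validation losses; the `patience` and `min_delta` names follow common convention but are not tied to any specific library here.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Validation loss bottoms out at epoch 2, then rises as the model overfits.
stop_epoch, best_epoch = early_stopping_epoch([0.9, 0.7, 0.6, 0.62, 0.65, 0.7])
```

In this run training stops at epoch 4 and the checkpoint from epoch 2 would be the one to keep.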
Domain 4: ML Implementation and Operations (20%)
SageMaker Training and Deployment
Training job configuration:
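from sagemaker.estimator import Estimator

# container_uri, role, and s3_output_path are assumed to be defined earlier
# (training image URI, IAM execution role ARN, and S3 output prefix).
estimator = Estimator(
    image_uri=container_uri,
    role=role,
    instance_count=2,               # number of training instances
    instance_type='ml.p3.8xlarge',  # GPU instance type
    volume_size=50,                 # EBS volume per instance, in GB
    max_run=3600,                   # maximum training time, in seconds
    output_path=s3_output_path,     # S3 location for model artifacts
)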
Distributed training strategies:
Data parallelism: Split training data across instances; each instance has a copy of the model; gradients are aggregated. Use with the SageMaker distributed data parallel (SMDDP) library
Model parallelism: Split a model too large to fit on one GPU across multiple devices. Use with SageMaker Distributed Model Parallel library
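The data-parallel idea can be simulated on one machine. The toy below fits `y = w * x` with mean squared error, splits the data into equal shards, computes a gradient per "worker", and averages them; it is a conceptual sketch, not the SageMaker library, and it assumes the dataset divides evenly across workers.

```python
def gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, data, num_workers, lr=0.01):
    """One SGD step where each worker holds a full model copy and a data shard."""
    shard_size = len(data) // num_workers
    shards = [data[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    grads = [gradient(w, shard) for shard in shards]  # computed independently per worker
    avg_grad = sum(grads) / num_workers               # the aggregation (all-reduce) step
    return w - lr * avg_grad


data = [(1, 2), (2, 4), (3, 6), (4, 8)]  # generated by y = 2x
w_parallel = data_parallel_step(0.0, data, num_workers=2)
w_single = 0.0 - 0.01 * gradient(0.0, data)
```

With equal-sized shards the averaged gradient equals the full-batch gradient, so `w_parallel` and `w_single` match; the benefit of data parallelism is throughput, not a different update.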
SageMaker Inference Options
| Option | Latency | Use Case |
|---|---|---|
| Real-time endpoint | Milliseconds | Interactive applications, low-latency requirements |
| Serverless inference | Variable (cold start) | Infrequent, variable traffic |
| Batch transform | Minutes to hours | Offline batch scoring |
| Asynchronous inference | Seconds to minutes | Large payloads, long inference time |
Multi-model endpoint: Host multiple models on a single endpoint. SageMaker loads models into memory on demand and caches them. Reduces endpoint costs when hosting many low-traffic models.
Inference pipelines: Chain preprocessing, ML model, and post-processing containers into a single endpoint invocation. Ensures the same transformations are applied at inference time as during training.
SageMaker Pipelines
SageMaker Pipelines provides a CI/CD-like workflow for ML:
| Step Type | Purpose |
|---|---|
| Processing step | Feature engineering, data validation |
| Training step | Model training |
| Evaluation (run as a Processing step) | Compute model metrics |
| Condition step | Branch based on metric thresholds |
| Register step | Register model in Model Registry if metrics pass |
| Transform step | Batch inference |
Model Registry: Central catalog of trained models with version tracking and approval workflow. Approved models can be deployed to endpoints automatically via pipeline.
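The train, evaluate, condition, register flow can be sketched without the SageMaker SDK. The function below is a plain-Python stand-in for the pipeline's control flow; the callables, the `auc` metric name, and the 0.85 threshold are all hypothetical.

```python
def run_pipeline(train, evaluate, register, metric_name, threshold):
    """Train, evaluate, and register the model only if the metric clears the threshold."""
    model = train()
    metrics = evaluate(model)
    if metrics[metric_name] >= threshold:  # the condition step
        return register(model, metrics)    # the register step
    return None                            # pipeline ends without registering


def register_pending(model, metrics):
    """Stand-in for registration; new versions wait for a manual approval decision."""
    return {"model": model, "metrics": metrics, "ModelApprovalStatus": "PendingManualApproval"}


good = run_pipeline(lambda: "model-v2", lambda m: {"auc": 0.91}, register_pending, "auc", 0.85)
bad = run_pipeline(lambda: "model-v3", lambda m: {"auc": 0.80}, register_pending, "auc", 0.85)
```

Only the first run produces a registry entry; the second is stopped at the condition step.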
MLOps: Monitoring and Drift Detection
SageMaker Model Monitor:
Data quality monitoring: Compare feature distributions in production against the training baseline
Model quality monitoring: Compare predictions against ground truth labels (requires actuals to be captured)
Bias drift monitoring: Detect changes in model fairness metrics over time
Feature attribution drift: Track changes in which features contribute most to predictions (using SHAP values)
Model Monitor reports violations when drift exceeds the configured thresholds and emits metrics to CloudWatch, where alarms can notify operators or trigger retraining.
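To give a feel for what "drift exceeds a threshold" means, the sketch below compares a baseline and a production sample of one feature using the Population Stability Index (PSI). PSI is a swapped-in, commonly used drift statistic chosen for illustration; it is not claimed to be the calculation Model Monitor performs, and the 0.2 alert threshold is a rule-of-thumb assumption.

```python
import math


def psi(baseline, production, bins=4, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins  # equal-width bins defined on the baseline range

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        # eps avoids log(0) when a bin is empty
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(baseline), proportions(production)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


baseline = list(range(100))
shifted = [v + 50 for v in baseline]  # production values drifted upward
drift_detected = psi(baseline, shifted) > 0.2
```

An identical sample gives a PSI of 0, while the shifted sample scores far above the threshold and would count as a violation under this assumed rule.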
SageMaker Clarify:
Clarify provides:
Bias detection: Pre-training (data bias) and post-training (model bias) analysis
Explainability: SHAP values for global and per-prediction feature importance
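Two of the pre-training bias metrics that Clarify reports, class imbalance (CI) and difference in proportions of labels (DPL), are simple enough to compute by hand. The sketch below follows their standard definitions; the facet sizes and label counts are invented for illustration.

```python
def class_imbalance(n_a, n_d):
    """CI = (n_a - n_d) / (n_a + n_d), where n_a and n_d are the sizes of the two facets."""
    return (n_a - n_d) / (n_a + n_d)


def difference_in_proportions_of_labels(pos_a, n_a, pos_d, n_d):
    """DPL = share of positive labels in facet a minus the share in facet d."""
    return pos_a / n_a - pos_d / n_d


# Illustrative dataset: 800 rows in facet a (400 positive), 200 rows in facet d (50 positive).
ci = class_imbalance(n_a=800, n_d=200)
dpl = difference_in_proportions_of_labels(pos_a=400, n_a=800, pos_d=50, n_d=200)
```

Both metrics are 0 for a perfectly balanced dataset; here CI is 0.6 and DPL is 0.25, indicating that facet d is both under-represented and less often labeled positive.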
"Feature engineering remains the highest-leverage activity in applied machine learning. A well-engineered feature set with a simple model almost always outperforms a poorly engineered feature set with a complex one. The MLS-C01 exam reflects this reality — it tests feature engineering concepts more heavily than algorithm tuning." — Aurélien Géron, author of Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (O'Reilly, 3rd edition, 2022)
Study Timeline
Recommended: 10-14 weeks. Requires basic ML knowledge and Python familiarity.
| Week | Focus |
|---|---|
| 1-2 | Data engineering: Glue, Kinesis, Feature Store, S3 data lake |
| 3-4 | EDA: Data Wrangler, feature engineering techniques, imbalanced data |
| 5-6 | SageMaker built-in algorithms: supervised, unsupervised, NLP, CV |
| 7-8 | Training configuration, distributed training, hyperparameter tuning |
| 9-10 | Model evaluation metrics, overfitting, regularization |
| 11-12 | Inference options, SageMaker Pipelines, Model Registry |
| 13-14 | MLOps, Model Monitor, Clarify, practice exams |
MLS-C01 career trajectory and compensation
Machine learning roles command premium compensation relative to adjacent engineering roles. The AWS ML Specialty is priced at $300 with a 180-minute duration and 65 questions.
| Role | Seniority | US salary range (2024-2025) [1] | MLS-C01 impact |
|---|---|---|---|
| ML Engineer | Mid | $130,000-$180,000 | $10,000-$20,000 uplift |
| Senior ML Engineer | Senior | $170,000-$240,000 | $15,000-$30,000 uplift |
| ML Research Engineer | Senior | $180,000-$270,000 | Complementary signal |
| Applied Scientist | Senior | $185,000-$275,000 | Complementary signal |
| FAANG L5 ML Engineer | Senior | $275,000-$450,000 TC | Baseline credential |
| ML Solutions Architect | Senior | $175,000-$240,000 | Near-required credential |
| Principal ML Engineer | Staff | $230,000-$340,000 | Baseline expectation |
Adjacent certifications
| Certification | Exam code | Fee | Overlap with MLS-C01 |
|---|---|---|---|
| AWS Certified Solutions Architect - Associate | SAA-C03 | $150 | Prerequisite-level AWS knowledge |
| AWS Certified Data Analytics - Specialty | DAS-C01 (retired April 2024) | $300 | Data engineering overlap |
| Google Cloud Professional Machine Learning Engineer | PMLE | $200 | Vendor-alternative ML credential |
| Microsoft Certified: Azure AI Engineer Associate | AI-102 | $165 | Vendor-alternative ML credential |
Of the three major cloud vendors' ML certifications, the MLS-C01 is often regarded as the most demanding: it pairs ML theory with deep coverage of SageMaker and MLOps patterns.
"Machine learning on AWS in 2024-2025 is dominated by operational concerns rather than model selection. SageMaker Pipelines, Model Monitor, feature store hygiene, and multi-region inference scaling are the topics that distinguish production-ready ML engineers from candidates who merely know how to train a model in a notebook. The MLS-C01 reflects this evolution." - Aurélien Géron, author of Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, O'Reilly 3rd edition, 2022 [2].
SageMaker built-in algorithm selection
The exam tests the candidate's ability to match ML use cases to specific SageMaker built-in algorithms.
| Use case | SageMaker built-in | Rationale |
|---|---|---|
| Binary/multi-class classification | XGBoost | Gradient boosting; strong tabular performance |
| Linear regression with regularization | Linear Learner | SGD-based; supports both regression and classification |
| Sparse, high-dimensional data (e.g., click prediction) | Factorization Machines | Captures pairwise feature interactions in sparse data |
| Clustering | K-Means | Classic unsupervised clustering |
| Anomaly detection | Random Cut Forest | Tree-based; assigns an anomaly score to each data point |
| Recommendation | Factorization Machines | Sparse user-item interactions; personalization use cases |
| Image classification | Image Classification (MXNet) | Pre-configured CNN |
| Object detection | Object Detection (SSD) | Bounding box output |
| Semantic segmentation | Semantic Segmentation | Pixel-level labeling |
| Text classification | BlazingText | Fast word embeddings + classification |
| Machine translation / summarization | Sequence to Sequence | Encoder-decoder |
| Topic modeling | LDA or NTM | Unsupervised topic discovery |
| Time-series forecasting | DeepAR | RNN-based; probabilistic forecasts |
For questions that describe the use case without naming the algorithm, candidates must recognize the pattern and select the canonical SageMaker built-in. Questions asking about custom algorithms typically indicate that Bring Your Own Container (BYOC) is the correct answer.
References
[1] Robert Half. "2024 Technology Salary Guide." 2024. https://www.roberthalf.com/us/en/insights/salary-guide/technology
[2] Géron, Aurélien. "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow." O'Reilly Media, 3rd edition, 2022.
[3] AWS. "AWS Certified Machine Learning - Specialty Exam Guide (MLS-C01)." https://d1.awsstatic.com/training-and-certification/docs-ml/AWS-Certified-Machine-Learning-Specialty_Exam-Guide.pdf
[4] AWS. "Amazon SageMaker Developer Guide." https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html
[5] AWS. "Use Amazon SageMaker Built-in Algorithms." https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html
[6] AWS. "Amazon SageMaker Pipelines." https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html
[7] Faye Ellis. "AWS Certified Machine Learning Specialty MLS-C01." Udemy, 2023.
[8] AWS. "Amazon SageMaker Model Monitor." https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor.html
[9] AWS. "AWS Machine Learning Blog." https://aws.amazon.com/blogs/machine-learning/
[10] Payscale. "AWS Certified Machine Learning Specialty Salary Data." 2024. https://www.payscale.com/research/US/Certification=AWS_Certified_Machine_Learning_-_Specialty
[11] Goodfellow, I., Bengio, Y., & Courville, A. "Deep Learning." MIT Press, 2016. ISBN: 978-0262035613. Foundational reference for deep learning theory tested conceptually on MLS-C01, particularly the regularization, optimization, and architecture-selection questions.
Frequently Asked Questions
What ML background is needed before studying for MLS-C01?
You should understand supervised and unsupervised learning concepts, common algorithms (regression, classification, clustering), model evaluation metrics, and overfitting/regularization before studying AWS-specific services. The exam tests both ML theory and AWS implementation.
Which SageMaker algorithm is best for tabular data classification tasks?
XGBoost is the most widely used built-in algorithm for tabular classification and regression tasks. It is frequently the top performer on structured data and supports distributed training on multiple instances.
What is the difference between data parallelism and model parallelism in SageMaker?
Data parallelism splits the training dataset across multiple instances, each holding a full copy of the model, then aggregates gradients. Model parallelism splits the model itself across devices when it is too large to fit in a single GPU's memory.
What is training-serving skew and how does SageMaker Feature Store prevent it?
Training-serving skew occurs when features used during training differ from features used at inference time. Feature Store prevents this by providing the same feature definitions to both the offline store (training) and the online store (real-time inference).
What does SageMaker Model Monitor detect?
Model Monitor detects data quality drift (distribution changes in incoming features), model quality drift (degradation in prediction accuracy), bias drift (changes in fairness metrics), and feature attribution drift (changes in which features most influence predictions).
