If you're preparing for the AWS Certified AI Practitioner (AIF-C01) exam, focus on these major domains.
Table of Contents
- Fundamentals of AI and ML
- Generative AI Concepts
- Foundation Models
- Prompt Engineering
- Responsible AI
- AWS AI Services
- AWS Generative AI Services
- Security and Compliance
- AI Use Cases
- Exam Tips
1. Fundamentals of AI and ML
Artificial Intelligence (AI)
Machines performing tasks that normally require human intelligence.
Machine Learning (ML)
Subset of AI where systems learn patterns from data.
Deep Learning
Uses neural networks with multiple layers.
Generative AI
Creates new content such as:
- Text
- Images
- Audio
- Video
- Code
Training vs Inference
| Term | Meaning |
|---|---|
| Training | Model learns from data |
| Inference | Model makes predictions using learned knowledge |
2. Generative AI Concepts
Large Language Model (LLM)
Examples:
- OpenAI GPT models
- Anthropic Claude
- Meta Llama
Tokens
Text is broken into small units called tokens.
Example:
I love AWS
May become:
[I] [love] [AWS]
Hallucination
Model generates incorrect information while sounding confident.
Context Window
Amount of information an LLM can consider at once.
3. Foundation Models
Foundation Model (FM)
Large pretrained model that can be adapted for many tasks.
Examples:
- Text generation
- Summarization
- Classification
- Translation
- Chatbots
Multi Modality
Fine Tuning
Retraining model with domain-specific data.
Retrieval Augmented Generation (RAG)
Instead of retraining:
- Retrieve documents
- Send documents to LLM
- Generate response
Benefits:
- Lower cost
- More current data
- Reduced hallucinations
RAG Pipeline: Model Types Used at Each Step
| RAG Step | Purpose | Model Type | Example Models |
|---|---|---|---|
| 1. Document Ingestion | Read PDFs, DOCX, HTML, Images | OCR / Document AI | Tesseract, LayoutLM, Donut |
| 2. Chunking | Split documents into passages | Rule-based / NLP | Sentence Splitter, Recursive Text Splitter |
| 3. Text → Embeddings | Convert chunks into vectors | Embedding Model (Encoder-only Transformer) | BERT, Sentence-BERT, E5, BGE |
| 4. Vector Storage | Store embeddings | Vector Database | FAISS, Milvus, Weaviate, Pinecone |
| 5. Query → Embedding | Convert user query to vector | Same Embedding Model | BGE, E5, SBERT |
| 6. Retrieval | Find nearest chunks | ANN Search Algorithm | HNSW, IVF, Flat Search |
| 7. Re-ranking (Optional) | Improve retrieved results | Cross Encoder | MonoBERT, Cohere Rerank, BGE Reranker |
| 8. Context Construction | Build prompt with retrieved chunks | Prompt Builder | Template Engine |
| 9. Answer Generation | Generate final answer | Decoder-only LLM | GPT-4o, Claude Sonnet, Llama 3, Mistral |
| 10. Citation Generation (Optional) | Show sources | LLM / Metadata Layer | GPT-4o, Claude |
Transformer Architecture Used at Each Step
| Step | Transformer Type |
|---|---|
| Embedding Generation | Encoder-only |
| Re-ranking | Encoder-only (Cross Encoder) |
| Answer Generation | Decoder-only |
| Translation (optional) | Encoder-Decoder |
| Summarization (optional) | Encoder-Decoder |
| OCR Understanding | Encoder or Encoder-Decoder |
| Multimodal RAG | Vision Encoder + LLM Decoder |
Common Models by Transformer Family
| Transformer Family | Example Models | Used For |
|---|---|---|
| Encoder-only | BERT, RoBERTa, SBERT, E5, BGE | Embeddings, Retrieval |
| Decoder-only | GPT, Llama, Claude, Mistral, Qwen | Generation |
| Encoder-Decoder | T5, FLAN-T5, BART | Summarization, Translation |
| Vision Encoder | ViT, CLIP Vision Encoder | Image Embeddings |
| Vision-Language | LLaVA, Qwen-VL, GPT-4o | Multimodal RAG |
Typical Modern RAG Stack
| Layer | Common Choice |
|---|---|
| Chunking | LangChain Recursive Splitter |
| Embeddings | BGE-large, E5-large |
| Vector DB | FAISS, Milvus |
| Retrieval | HNSW |
| Re-ranker | BGE-Reranker |
| Generator | GPT-4o, Claude, Llama 3 |
| Orchestration | LangChain, LlamaIndex |
Mental Model
Documents
↓
Chunking
↓
Encoder Model
(BERT / E5 / BGE)
↓
Embeddings
↓
Vector DB
↓
End
User Query
↓
Encoder Model
(BERT / E5 / BGE)
↓
Similarity Search
↓
Top K Chunks
↓
Cross Encoder Re-ranker
(Optional)
↓
Prompt Construction
↓
Decoder LLM
(GPT / Claude / Llama)
↓
Final Answer
| Task | Model Type |
|---|---|
| Create Embeddings | Encoder-only |
| Retrieve Documents | Vector Search |
| Re-rank Results | Cross Encoder |
| Generate Answer | Decoder-only LLM |
| Summarize Documents | Encoder-Decoder |
| Multimodal Retrieval | CLIP / Vision Encoder |
| Multimodal Generation | GPT-4o / Gemini / Qwen-VL |
4. Prompt Engineering
Zero-Shot Prompting
Translate this sentence to French.
One-Shot Prompting
Provide one example.
Few-Shot Prompting
Provide multiple examples.
Chain of Thought
Ask model to reason step-by-step.
Prompt Components
- Role
- Context
- Instructions
- Examples
- Constraints
5. Responsible AI
Fairness
Avoid bias.
Explainability
Understand why model produced output.
Privacy
Protect user data.
Robustness
Model behaves reliably.
Transparency
Users know AI is involved.
Evaluation
Here’s the table reorganized the way you asked:
Model Group
Evaluation
Description of the Evaluation
Agentic Models
AgentBench
Evaluates autonomous task execution, planning, tool usage, and multi-step reasoning.
Agentic Models
GAIA
Tests real-world assistant capabilities like searching, tool calling, and reasoning.
Agentic Models
SWE-bench
Measures ability to solve real GitHub issues by editing codebases.
Bi-Encoder
BEIR
Evaluates embedding-based retrieval across multiple datasets and domains.
Bi-Encoder
MTEB
Measures embedding quality across retrieval, clustering, classification, and reranking tasks.
Bi-Encoder
MS MARCO
Evaluates dense retrieval and passage ranking performance.
Cross-Encoder
BEIR
Measures pairwise query-document relevance scoring quality.
Cross-Encoder
MS MARCO
Evaluates reranking precision for query-passage relevance.
Cross-Encoder
TREC Deep Learning Track
Measures ranking quality for search relevance tasks.
Decoder-only
GSM8K
Evaluates arithmetic and multi-step mathematical reasoning.
Decoder-only
HellaSwag
Measures commonsense reasoning and next-sentence prediction.
Decoder-only
HumanEval
Evaluates code generation correctness using executable unit tests.
Decoder-only
MMLU
Measures broad knowledge and reasoning across many academic domains.
Decoder-only
MT-Bench
Tests instruction following and conversational quality.
Decoder-only
Needle-in-a-Haystack
Measures ability to retrieve specific information from long contexts.
Decoder-only
TruthfulQA
Tests factual consistency and resistance to hallucinations.
Encoder-decoder
BLEU
Measures overlap between generated text and reference text, mainly for translation.
Encoder-decoder
ROUGE
Evaluates summarization quality based on n-gram overlap.
Encoder-decoder
SQuAD
Measures extractive question-answering accuracy.
Encoder-only
GLUE
Evaluates language understanding tasks like sentiment, entailment, and similarity.
Encoder-only
MTEB
Measures embedding performance across multiple NLP tasks.
Encoder-only
STS-B
Measures how well embeddings capture sentence similarity.
Encoder-only
SuperGLUE
Harder version of GLUE for advanced reasoning tasks.
Long-context Models
InfiniteBench
Evaluates memory retention and reasoning over very long contexts.
Long-context Models
LongBench
Tests summarization, retrieval, and reasoning on long documents.
Long-context Models
Needle-in-a-Haystack
Measures retrieval accuracy from large contexts.
Multimodal Models
MMBench
Evaluates image understanding and multimodal reasoning.
Multimodal Models
MMMU
Measures multimodal reasoning across academic and professional domains.
Multimodal Models
MMVet
Tests advanced visual reasoning and perception.
RAG Systems
CRUD-RAG
Measures retrieval robustness and update handling in RAG pipelines.
RAG Systems
RAGAS
Evaluates faithfulness, context precision, context recall, and answer relevance in RAG.
Reward Models
RewardBench
Evaluates preference model quality and alignment performance.
Tool-use Models
ToolBench
Measures correctness in tool selection, API usage, and tool chaining.
Here’s the table reorganized the way you asked:
Model Group Evaluation Description of the Evaluation
Agentic Models AgentBench Evaluates autonomous task execution, planning, tool usage, and multi-step reasoning.
Agentic Models GAIA Tests real-world assistant capabilities like searching, tool calling, and reasoning.
Agentic Models SWE-bench Measures ability to solve real GitHub issues by editing codebases.
Bi-Encoder BEIR Evaluates embedding-based retrieval across multiple datasets and domains.
Bi-Encoder MTEB Measures embedding quality across retrieval, clustering, classification, and reranking tasks.
Bi-Encoder MS MARCO Evaluates dense retrieval and passage ranking performance.
Cross-Encoder BEIR Measures pairwise query-document relevance scoring quality.
Cross-Encoder MS MARCO Evaluates reranking precision for query-passage relevance.
Cross-Encoder TREC Deep Learning Track Measures ranking quality for search relevance tasks.
Decoder-only GSM8K Evaluates arithmetic and multi-step mathematical reasoning.
Decoder-only HellaSwag Measures commonsense reasoning and next-sentence prediction.
Decoder-only HumanEval Evaluates code generation correctness using executable unit tests.
Decoder-only MMLU Measures broad knowledge and reasoning across many academic domains.
Decoder-only MT-Bench Tests instruction following and conversational quality.
Decoder-only Needle-in-a-Haystack Measures ability to retrieve specific information from long contexts.
Decoder-only TruthfulQA Tests factual consistency and resistance to hallucinations.
Encoder-decoder BLEU Measures overlap between generated text and reference text, mainly for translation.
Encoder-decoder ROUGE Evaluates summarization quality based on n-gram overlap.
Encoder-decoder SQuAD Measures extractive question-answering accuracy.
Encoder-only GLUE Evaluates language understanding tasks like sentiment, entailment, and similarity.
Encoder-only MTEB Measures embedding performance across multiple NLP tasks.
Encoder-only STS-B Measures how well embeddings capture sentence similarity.
Encoder-only SuperGLUE Harder version of GLUE for advanced reasoning tasks.
Long-context Models InfiniteBench Evaluates memory retention and reasoning over very long contexts.
Long-context Models LongBench Tests summarization, retrieval, and reasoning on long documents.
Long-context Models Needle-in-a-Haystack Measures retrieval accuracy from large contexts.
Multimodal Models MMBench Evaluates image understanding and multimodal reasoning.
Multimodal Models MMMU Measures multimodal reasoning across academic and professional domains.
Multimodal Models MMVet Tests advanced visual reasoning and perception.
RAG Systems CRUD-RAG Measures retrieval robustness and update handling in RAG pipelines.
RAG Systems RAGAS Evaluates faithfulness, context precision, context recall, and answer relevance in RAG.
Reward Models RewardBench Evaluates preference model quality and alignment performance.
Tool-use Models ToolBench Measures correctness in tool selection, API usage, and tool chaining.
6. AWS AI Services
Amazon Rekognition
- Image analysis
- Face detection
- Object detection
Amazon Comprehend
- Sentiment analysis
- Entity extraction
- Language detection
Amazon Transcribe
- Speech to text
Amazon Polly
- Text to speech
Amazon Textract
- Extract text from documents
Amazon Translate
- Language translation
7. AWS Generative AI Services
Amazon Bedrock
Most important service for the exam.
Provides access to foundation models from:
- Anthropic Claude
- Meta Llama
- Amazon Nova
- Stability AI
Features:
- RAG
- Agents
- Knowledge Bases
- Guardrails
- Fine-tuning
Amazon Q Business
Enterprise chatbot over company data.
Amazon Q Developer
Developer coding assistant.
Amazon SageMaker AI
Build, train and deploy ML models.
8. Security and Compliance
Shared Responsibility Model
AWS secures:
- Infrastructure
- Hardware
- Network
Customer secures:
- Data
- Access control
- Configuration
IAM
Identity and access management.
Encryption
Data:
- At rest
- In transit
9. Common Use Cases
Classification
Spam or Not Spam
Sentiment Analysis
Positive / Negative
Summarization
Long article -> Short summary
Chatbots
Customer support
Code Generation
Generate Java/Python code
Document Processing
Invoice extraction
10. Frequently Tested Comparisons
| Service | Purpose |
|---|---|
| Bedrock | Generative AI |
| SageMaker AI | Build/train ML models |
| Comprehend | NLP analysis |
| Rekognition | Image analysis |
| Textract | Document extraction |
| Transcribe | Speech to text |
| Polly | Text to speech |
| Translate | Translation |
Last-Minute Exam Memorization
| Topic | Remember |
|---|---|
| Generative AI | Creates new content |
| Foundation Model | Large pretrained model |
| Hallucination | Confident wrong answer |
| RAG | Retrieve + Generate |
| Bedrock | Managed GenAI platform |
| SageMaker | ML lifecycle |
| Guardrails | Safety controls |
| Fine-Tuning | Retrain model |
| Inference | Model prediction |
| Token | Small text unit |
For exam success, spend extra time on:
- Amazon Bedrock
- RAG vs Fine-Tuning
- Foundation Models
- Responsible AI
- Prompt Engineering
- AWS AI service selection scenarios
These areas account for a large percentage of the AWS AI Practitioner questions.
No comments:
Post a Comment