Thursday, June 18, 2026

AWS AI Practioner Prep

 If you're preparing for the AWS Certified AI Practitioner (AIF-C01) exam, focus on these major domains.

Table of Contents

  1. Fundamentals of AI and ML
  2. Generative AI Concepts
  3. Foundation Models
  4. Prompt Engineering
  5. Responsible AI
  6. AWS AI Services
  7. AWS Generative AI Services
  8. Security and Compliance
  9. AI Use Cases
  10. Exam Tips

1. Fundamentals of AI and ML

Artificial Intelligence (AI)

Machines performing tasks that normally require human intelligence.

Machine Learning (ML)

Subset of AI where systems learn patterns from data.

Deep Learning

Uses neural networks with multiple layers.

Generative AI

Creates new content such as:

  • Text
  • Images
  • Audio
  • Video
  • Code

Training vs Inference

TermMeaning
TrainingModel learns from data
InferenceModel makes predictions using learned knowledge

2. Generative AI Concepts

Large Language Model (LLM)

Examples:

  • OpenAI GPT models
  • Anthropic Claude
  • Meta Llama

Tokens

Text is broken into small units called tokens.

Example:

I love AWS

May become:

[I] [love] [AWS]

Hallucination

Model generates incorrect information while sounding confident.

Context Window

Amount of information an LLM can consider at once.

3. Foundation Models

Foundation Model (FM)

Large pretrained model that can be adapted for many tasks.

Examples:

  • Text generation
  • Summarization
  • Classification
  • Translation
  • Chatbots

Multi Modality

 
ModalitiesExample Models
Text → TextGPT-4oClaude SonnetLlama 3Mistral
Image → TextLLaVAQwen2.5-VLBLIP-2InstructBLIP
Image + Text → TextGPT-4oGemini 2.5Claude SonnetKosmos-2
Image ↔ Text (Similarity/Retrieval)CLIPSigLIPALIGNFlorence
Text → ImageStable DiffusionDALL-E 3ImagenFLUX.1
Image → ImageStable Diffusion XLControlNetInstructPix2Pix
Text → Audio (Speech)Tacotron 2VALL-EBark
Audio → Text (ASR)Whisperwav2vec 2.0Conformer
Text + Audio → TextGPT-4oGemini 2.5
Audio ↔ TextCLAPAudioCLIP
Text → VideoSoraVeoGen-3Pika
Image → VideoRunway Gen-3PikaLuma Dream Machine
Video → TextVideo-LLaVAVideoChatGPTGemini 2.5
Video + Text → TextGPT-4oGemini 2.5Qwen2.5-VL
Text + Image + Audio → TextGPT-4oGemini 2.5
Text + Image + Audio + Video → TextGPT-4oGemini 2.5
Text + Image + Audio + Video → Text + AudioGPT-4o RealtimeGemini Live


Fine Tuning

Retraining model with domain-specific data.

Retrieval Augmented Generation (RAG)

Instead of retraining:

  1. Retrieve documents
  2. Send documents to LLM
  3. Generate response

Benefits:

  • Lower cost
  • More current data
  • Reduced hallucinations

RAG Pipeline: Model Types Used at Each Step

RAG StepPurposeModel TypeExample Models
1. Document IngestionRead PDFs, DOCX, HTML, ImagesOCR / Document AITesseractLayoutLMDonut
2. ChunkingSplit documents into passagesRule-based / NLPSentence Splitter, Recursive Text Splitter
3. Text → EmbeddingsConvert chunks into vectorsEmbedding Model (Encoder-only Transformer)BERTSentence-BERTE5BGE
4. Vector StorageStore embeddingsVector DatabaseFAISSMilvusWeaviatePinecone
5. Query → EmbeddingConvert user query to vectorSame Embedding ModelBGE, E5, SBERT
6. RetrievalFind nearest chunksANN Search AlgorithmHNSW, IVF, Flat Search
7. Re-ranking (Optional)Improve retrieved resultsCross EncoderMonoBERTCohere RerankBGE Reranker
8. Context ConstructionBuild prompt with retrieved chunksPrompt BuilderTemplate Engine
9. Answer GenerationGenerate final answerDecoder-only LLMGPT-4oClaude SonnetLlama 3Mistral
10. Citation Generation (Optional)Show sourcesLLM / Metadata LayerGPT-4o, Claude

Transformer Architecture Used at Each Step

StepTransformer Type
Embedding GenerationEncoder-only
Re-rankingEncoder-only (Cross Encoder)
Answer GenerationDecoder-only
Translation (optional)Encoder-Decoder
Summarization (optional)Encoder-Decoder
OCR UnderstandingEncoder or Encoder-Decoder
Multimodal RAGVision Encoder + LLM Decoder

Common Models by Transformer Family

Transformer FamilyExample ModelsUsed For
Encoder-onlyBERT, RoBERTa, SBERT, E5, BGEEmbeddings, Retrieval
Decoder-onlyGPT, Llama, Claude, Mistral, QwenGeneration
Encoder-DecoderT5, FLAN-T5, BARTSummarization, Translation
Vision EncoderViT, CLIP Vision EncoderImage Embeddings
Vision-LanguageLLaVA, Qwen-VL, GPT-4oMultimodal RAG

Typical Modern RAG Stack

LayerCommon Choice
ChunkingLangChain Recursive Splitter
EmbeddingsBGE-large, E5-large
Vector DBFAISS, Milvus
RetrievalHNSW
Re-rankerBGE-Reranker
GeneratorGPT-4o, Claude, Llama 3
OrchestrationLangChain, LlamaIndex

Mental Model

Documents

Chunking

Encoder Model
(BERT / E5 / BGE)

Embeddings

Vector DB

End

User Query

Encoder Model
(BERT / E5 / BGE)

Similarity Search

Top K Chunks

Cross Encoder Re-ranker
(Optional)

Prompt Construction

Decoder LLM
(GPT / Claude / Llama)

Final Answer


TaskModel Type
Create EmbeddingsEncoder-only
Retrieve DocumentsVector Search
Re-rank ResultsCross Encoder
Generate AnswerDecoder-only LLM
Summarize DocumentsEncoder-Decoder
Multimodal RetrievalCLIP / Vision Encoder
Multimodal GenerationGPT-4o / Gemini / Qwen-VL

4. Prompt Engineering

Zero-Shot Prompting

Translate this sentence to French.

One-Shot Prompting

Provide one example.

Few-Shot Prompting

Provide multiple examples.

Chain of Thought

Ask model to reason step-by-step.

Prompt Components

  • Role
  • Context
  • Instructions
  • Examples
  • Constraints

5. Responsible AI

Fairness

Avoid bias.

Explainability

Understand why model produced output.

Privacy

Protect user data.

Robustness

Model behaves reliably.

Transparency

Users know AI is involved.


Evaluation 


Here’s the table reorganized the way you asked:

Model Group

Evaluation

Description of the Evaluation

Agentic Models

AgentBench

Evaluates autonomous task execution, planning, tool usage, and multi-step reasoning.

Agentic Models

GAIA

Tests real-world assistant capabilities like searching, tool calling, and reasoning.

Agentic Models

SWE-bench

Measures ability to solve real GitHub issues by editing codebases.

Bi-Encoder

BEIR

Evaluates embedding-based retrieval across multiple datasets and domains.

Bi-Encoder

MTEB

Measures embedding quality across retrieval, clustering, classification, and reranking tasks.

Bi-Encoder

MS MARCO

Evaluates dense retrieval and passage ranking performance.

Cross-Encoder

BEIR

Measures pairwise query-document relevance scoring quality.

Cross-Encoder

MS MARCO

Evaluates reranking precision for query-passage relevance.

Cross-Encoder

TREC Deep Learning Track

Measures ranking quality for search relevance tasks.

Decoder-only

GSM8K

Evaluates arithmetic and multi-step mathematical reasoning.

Decoder-only

HellaSwag

Measures commonsense reasoning and next-sentence prediction.

Decoder-only

HumanEval

Evaluates code generation correctness using executable unit tests.

Decoder-only

MMLU

Measures broad knowledge and reasoning across many academic domains.

Decoder-only

MT-Bench

Tests instruction following and conversational quality.

Decoder-only

Needle-in-a-Haystack

Measures ability to retrieve specific information from long contexts.

Decoder-only

TruthfulQA

Tests factual consistency and resistance to hallucinations.

Encoder-decoder

BLEU

Measures overlap between generated text and reference text, mainly for translation.

Encoder-decoder

ROUGE

Evaluates summarization quality based on n-gram overlap.

Encoder-decoder

SQuAD

Measures extractive question-answering accuracy.

Encoder-only

GLUE

Evaluates language understanding tasks like sentiment, entailment, and similarity.

Encoder-only

MTEB

Measures embedding performance across multiple NLP tasks.

Encoder-only

STS-B

Measures how well embeddings capture sentence similarity.

Encoder-only

SuperGLUE

Harder version of GLUE for advanced reasoning tasks.

Long-context Models

InfiniteBench

Evaluates memory retention and reasoning over very long contexts.

Long-context Models

LongBench

Tests summarization, retrieval, and reasoning on long documents.

Long-context Models

Needle-in-a-Haystack

Measures retrieval accuracy from large contexts.

Multimodal Models

MMBench

Evaluates image understanding and multimodal reasoning.

Multimodal Models

MMMU

Measures multimodal reasoning across academic and professional domains.

Multimodal Models

MMVet

Tests advanced visual reasoning and perception.

RAG Systems

CRUD-RAG

Measures retrieval robustness and update handling in RAG pipelines.

RAG Systems

RAGAS

Evaluates faithfulness, context precision, context recall, and answer relevance in RAG.

Reward Models

RewardBench

Evaluates preference model quality and alignment performance.

Tool-use Models

ToolBench

Measures correctness in tool selection, API usage, and tool chaining.



Here’s the table reorganized the way you asked:


Model Group Evaluation Description of the Evaluation


Agentic Models AgentBench Evaluates autonomous task execution, planning, tool usage, and multi-step reasoning.

Agentic Models GAIA Tests real-world assistant capabilities like searching, tool calling, and reasoning.

Agentic Models SWE-bench Measures ability to solve real GitHub issues by editing codebases.

Bi-Encoder BEIR Evaluates embedding-based retrieval across multiple datasets and domains.

Bi-Encoder MTEB Measures embedding quality across retrieval, clustering, classification, and reranking tasks.

Bi-Encoder MS MARCO Evaluates dense retrieval and passage ranking performance.

Cross-Encoder BEIR Measures pairwise query-document relevance scoring quality.

Cross-Encoder MS MARCO Evaluates reranking precision for query-passage relevance.

Cross-Encoder TREC Deep Learning Track Measures ranking quality for search relevance tasks.

Decoder-only GSM8K Evaluates arithmetic and multi-step mathematical reasoning.

Decoder-only HellaSwag Measures commonsense reasoning and next-sentence prediction.

Decoder-only HumanEval Evaluates code generation correctness using executable unit tests.

Decoder-only MMLU Measures broad knowledge and reasoning across many academic domains.

Decoder-only MT-Bench Tests instruction following and conversational quality.

Decoder-only Needle-in-a-Haystack Measures ability to retrieve specific information from long contexts.

Decoder-only TruthfulQA Tests factual consistency and resistance to hallucinations.

Encoder-decoder BLEU Measures overlap between generated text and reference text, mainly for translation.

Encoder-decoder ROUGE Evaluates summarization quality based on n-gram overlap.

Encoder-decoder SQuAD Measures extractive question-answering accuracy.

Encoder-only GLUE Evaluates language understanding tasks like sentiment, entailment, and similarity.

Encoder-only MTEB Measures embedding performance across multiple NLP tasks.

Encoder-only STS-B Measures how well embeddings capture sentence similarity.

Encoder-only SuperGLUE Harder version of GLUE for advanced reasoning tasks.

Long-context Models InfiniteBench Evaluates memory retention and reasoning over very long contexts.

Long-context Models LongBench Tests summarization, retrieval, and reasoning on long documents.

Long-context Models Needle-in-a-Haystack Measures retrieval accuracy from large contexts.

Multimodal Models MMBench Evaluates image understanding and multimodal reasoning.

Multimodal Models MMMU Measures multimodal reasoning across academic and professional domains.

Multimodal Models MMVet Tests advanced visual reasoning and perception.

RAG Systems CRUD-RAG Measures retrieval robustness and update handling in RAG pipelines.

RAG Systems RAGAS Evaluates faithfulness, context precision, context recall, and answer relevance in RAG.

Reward Models RewardBench Evaluates preference model quality and alignment performance.

Tool-use Models ToolBench Measures correctness in tool selection, API usage, and tool chaining.


6. AWS AI Services

Amazon Rekognition

  • Image analysis
  • Face detection
  • Object detection

Amazon Comprehend

  • Sentiment analysis
  • Entity extraction
  • Language detection

Amazon Transcribe

  • Speech to text

Amazon Polly

  • Text to speech

Amazon Textract

  • Extract text from documents

Amazon Translate

  • Language translation
Deep Racer 

  • AWS DeepRacer is a cloud-based autonomous racing car platform used to learn, train, and evaluate reinforcement learning (RL) models through simulated and real-world racing.
AWS DeepLens
  • Run computer vision and deep learning models on an AI-enabled camera.

AWS DeepComposer
  • Learn generative AI and machine learning through music composition.

7. AWS Generative AI Services

Amazon Bedrock

Most important service for the exam.

Provides access to foundation models from:

  • Anthropic Claude
  • Meta Llama
  • Amazon Nova
  • Stability AI

Features:

  • RAG
  • Agents
  • Knowledge Bases
  • Guardrails
  • Fine-tuning

Amazon Q Business

Enterprise chatbot over company data.

Amazon Q Developer

Developer coding assistant.

Amazon SageMaker AI

Build, train and deploy ML models.

8. Security and Compliance

Shared Responsibility Model

AWS secures:

  • Infrastructure
  • Hardware
  • Network

Customer secures:

  • Data
  • Access control
  • Configuration

IAM

Identity and access management.

Encryption

Data:

  • At rest
  • In transit

9. Common Use Cases

Classification

Spam or Not Spam

Sentiment Analysis

Positive / Negative

Summarization

Long article -> Short summary

Chatbots

Customer support

Code Generation

Generate Java/Python code

Document Processing

Invoice extraction

10. Frequently Tested Comparisons

ServicePurpose
BedrockGenerative AI
SageMaker AIBuild/train ML models
ComprehendNLP analysis
RekognitionImage analysis
TextractDocument extraction
TranscribeSpeech to text
PollyText to speech
TranslateTranslation

Last-Minute Exam Memorization

TopicRemember
Generative AICreates new content
Foundation ModelLarge pretrained model
HallucinationConfident wrong answer
RAGRetrieve + Generate
BedrockManaged GenAI platform
SageMakerML lifecycle
GuardrailsSafety controls
Fine-TuningRetrain model
InferenceModel prediction
TokenSmall text unit

For exam success, spend extra time on:

  1. Amazon Bedrock
  2. RAG vs Fine-Tuning
  3. Foundation Models
  4. Responsible AI
  5. Prompt Engineering
  6. AWS AI service selection scenarios

These areas account for a large percentage of the AWS AI Practitioner questions.

No comments:

Post a Comment

Build Lakehouse using Iceberg

 Flow Diagram of Data Lakehouse While Data Lake is excels for Machine Learning , Data warehouse is used for Business Intelligence , Data Lak...