If you're preparing for the NVIDIA Agentic AI and LLMs Certification, expect questions around LLM fundamentals, RAG, agents, vector databases, orchestration, tool calling, evaluation, deployment, and NVIDIA's AI stack.
LLM Fundamentals
Q1. What is the difference between pre-training and fine-tuning?
A: Pre-training learns general language patterns from large corpora; fine-tuning adapts the model to a specific task using labeled data.
Q2. What is a token?
A: A token is the basic unit processed by an LLM, representing words, subwords, or characters.
Q3. What causes hallucinations in LLMs?
A: Missing knowledge, ambiguous prompts, outdated training data, and probabilistic text generation.
Q4. What is the context window?
A: The maximum number of tokens an LLM can process in a single request.
---
RAG (Retrieval-Augmented Generation)
Q5. Why use RAG instead of fine-tuning?
A: RAG injects up-to-date knowledge without retraining the model.
Q6. What are the main components of a RAG pipeline?
A: Ingestion, chunking, embedding, vector store, retrieval, reranking, and generation.
Q7. Why is chunking important?
A: It improves retrieval accuracy by breaking documents into semantically meaningful sections.
Q8. What is embedding?
A: A numerical vector representation capturing semantic meaning of text.
Q9. How does semantic search differ from keyword search?
A: Semantic search retrieves based on meaning, while keyword search matches exact terms.
Q10. What metrics are used to evaluate retrieval quality?
A: Recall@K, Precision@K, MRR, and NDCG.
---
Vector Databases
Q11. Why use a vector database?
A: To efficiently store and search embeddings using nearest-neighbor algorithms.
Q12. What is ANN search?
A: Approximate Nearest Neighbor search trades slight accuracy for faster retrieval.
Q13. Why not store embeddings in a traditional database like MongoDB?
A: MongoDB is optimized for key-value/document retrieval, not high-dimensional similarity search.
Q14. What is cosine similarity?
A: A measure of similarity based on the angle between two vectors.
---
Agentic AI
Q15. What is an AI Agent?
A: An autonomous system that reasons, plans, uses tools, and executes actions to achieve goals.
Q16. How is Agentic AI different from a chatbot?
A: Agents can perform actions and interact with external systems; chatbots mainly generate responses.
Q17. What are the key components of an agent?
A: LLM, memory, planning, tools, and execution loop.
Q18. What is tool calling?
A: Allowing an LLM to invoke external APIs, databases, or functions.
Q19. What is agent memory?
A: Mechanisms for storing conversation history or long-term knowledge.
Q20. What is a planner agent?
A: An agent that decomposes complex tasks into executable subtasks.
---
Multi-Agent Systems
Q21. What is a multi-agent architecture?
A: Multiple specialized agents collaborating to solve a task.
Q22. How do agents communicate?
A: Through messages, shared memory, event buses, or orchestration frameworks.
Q23. When should you use multiple agents instead of one?
A: When tasks require specialized expertise or parallel execution.
Q24. What is a supervisor agent?
A: An agent that routes tasks and coordinates worker agents.
Q25. What are common multi-agent patterns?
A: Supervisor-worker, hierarchical, peer-to-peer, blackboard, and swarm.
---
Prompt Engineering
Q26. What is chain-of-thought prompting?
A: Prompting the model to reason through intermediate steps.
Q27. What is few-shot prompting?
A: Providing examples to guide model behavior.
Q28. What is prompt injection?
A: Malicious instructions intended to manipulate agent behavior.
Q29. How can prompt injection attacks be mitigated?
A: Input validation, instruction hierarchy, and tool access controls.
---
Evaluation
Q30. How do you evaluate an LLM application?
A: Measure answer quality, groundedness, latency, cost, and retrieval effectiveness.
Q31. What is groundedness?
A: The extent to which responses are supported by retrieved evidence.
Q32. Name hallucination benchmarks.
A: HaluEval, HaluBench, and RAGTruth.
---
NVIDIA-Specific Questions
Q33. What is NVIDIA NIM?
A: A containerized inference microservice for deploying AI models.
Q34. What is NVIDIA NeMo?
A: NVIDIA's framework for training, customizing, and deploying generative AI models.
Q35. What is NVIDIA TensorRT-LLM?
A: An inference optimization framework for accelerating LLMs on NVIDIA GPUs.
Q36. What is quantization?
A: Reducing numerical precision (FP16 → INT8/INT4) to improve inference efficiency.
Q37. Why use TensorRT-LLM?
A: Lower latency, higher throughput, and optimized GPU utilization.
Q38. What is KV Cache?
A: Cached attention states reused during generation to speed inference.
Q39. What is speculative decoding?
A: Using a smaller model to generate candidate tokens that a larger model verifies.
Q40. What is model parallelism?
A: Splitting a model across multiple GPUs to handle large parameter sizes.
---
Scenario-Based Questions
Q41. Your RAG system retrieves irrelevant chunks. What would you improve?
A: Chunking strategy, embeddings, metadata filtering, and reranking.
Q42. An agent repeatedly calls the same tool. How would you fix it?
A: Add memory, loop detection, and tool usage constraints.
Q43. Latency is too high in production. What optimizations can you apply?
A: Quantization, batching, KV caching, TensorRT-LLM, and smaller models.
Q44. When would you choose fine-tuning over RAG?
A: When changing model behavior or domain-specific reasoning rather than adding knowledge.
Q45. Design an agentic system for QE automation.
A: Supervisor agent → Requirement Analysis Agent → Test Case Generator → Automation Script Generator → Review Agent → Execution Agent → Reporting Agent.
These 45 questions cover roughly 80–90% of the concepts typically tested in Agentic AI, RAG, LLMs, and NVIDIA deployment-focused certifications.
No comments:
Post a Comment