programmers vocabulary

Thursday, June 18, 2026

NVIDIA Agentic AI prep

If you're preparing for the NVIDIA Agentic AI and LLMs Certification, expect questions around LLM fundamentals, RAG, agents, vector databases, orchestration, tool calling, evaluation, deployment, and NVIDIA's AI stack.

LLM Fundamentals

Q1. What is the difference between pre-training and fine-tuning?

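A: Pre-training learns general language patterns from large, mostly unlabeled corpora, typically by predicting the next token; fine-tuning adapts the pre-trained model to a specific task or domain using a much smaller, task-specific dataset, often labeled.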

Q2. What is a token?

A: A token is the basic unit processed by an LLM, representing words, subwords, or characters.

Q3. What causes hallucinations in LLMs?

A: Missing knowledge, ambiguous prompts, outdated training data, and probabilistic text generation.

Q4. What is the context window?

A: The maximum number of tokens an LLM can process in a single request, counting both the input prompt and the generated output.

---

RAG (Retrieval-Augmented Generation)

Q5. Why use RAG instead of fine-tuning?

A: RAG injects up-to-date knowledge without retraining the model.

Q6. What are the main components of a RAG pipeline?

A: Ingestion, chunking, embedding, vector store, retrieval, reranking, and generation.

Q7. Why is chunking important?

A: It improves retrieval accuracy by breaking documents into semantically meaningful sections.
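
A minimal sketch of one common chunking approach, fixed-size word windows with overlap. This is a simplification: it splits on word counts rather than on semantic boundaries, and the function name `chunk_words` and its default sizes are assumptions chosen for illustration.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Production pipelines often split on headings, paragraphs, or sentences first and only fall back to fixed windows for oversized sections.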

Q8. What is embedding?

A: A numerical vector representation capturing semantic meaning of text.

Q9. How does semantic search differ from keyword search?

A: Semantic search retrieves based on meaning, while keyword search matches exact terms.

Q10. What metrics are used to evaluate retrieval quality?

A: Recall@K, Precision@K, MRR, and NDCG.
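
To make three of these metrics concrete, here is a small sketch for a single query. The document IDs are made up, and MRR is simply the mean of the reciprocal rank over many queries. NDCG is left out to keep the example short.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top k results that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    retrieved = ["d3", "d1", "d7", "d2"]   # ranked results (hypothetical IDs)
    relevant = {"d1", "d2"}                # ground-truth relevant set
    print(recall_at_k(retrieved, relevant, 3))
    print(precision_at_k(retrieved, relevant, 3))
    print(reciprocal_rank(retrieved, relevant))
```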

---

Vector Databases

Q11. Why use a vector database?

A: To efficiently store and search embeddings using nearest-neighbor algorithms.

Q12. What is ANN search?

A: Approximate Nearest Neighbor search trades slight accuracy for faster retrieval.

Q13. Why not store embeddings in a traditional database like MongoDB?

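A: Traditional databases such as MongoDB are optimized for key-value/document retrieval, not high-dimensional similarity search, which needs specialized indexes. Note that some general-purpose databases now offer vector indexes as an add-on (for example, MongoDB Atlas Vector Search), so the practical question is whether the system provides an ANN index, not which product family it belongs to.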

Q14. What is cosine similarity?

A: A measure of similarity based on the angle between two vectors.
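
As a quick illustration, cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal standard-library sketch (the function name is an assumption; real systems typically use a vectorized library):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (range -1.0 to 1.0)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # perpendicular vectors
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # same direction
```

Because only the angle matters, two vectors pointing in the same direction score close to 1 even when their magnitudes differ.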

---

Agentic AI

Q15. What is an AI Agent?

A: An autonomous system that reasons, plans, uses tools, and executes actions to achieve goals.

Q16. How is Agentic AI different from a chatbot?

A: Agents can perform actions and interact with external systems; chatbots mainly generate responses.

Q17. What are the key components of an agent?

A: LLM, memory, planning, tools, and execution loop.

Q18. What is tool calling?

A: Allowing an LLM to invoke external APIs, databases, or functions.
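
A rough sketch of the application side of tool calling, assuming the model emits a JSON object with `name` and `arguments` fields. That shape, the `TOOLS` registry, and both example tools are hypothetical; actual formats vary by provider.

```python
import json


def get_weather(city):
    """Hypothetical tool: a real one would call a weather API."""
    return f"Sunny in {city}"


def add(a, b):
    """Hypothetical tool: adds two numbers."""
    return a + b


# Registry mapping tool names the model may request to Python callables.
TOOLS = {"get_weather": get_weather, "add": add}


def dispatch(tool_call_json):
    """Parse a model-emitted tool call and run the matching function."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        # Never execute something the application did not register.
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


if __name__ == "__main__":
    print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
    print(dispatch('{"name": "get_weather", "arguments": {"city": "Pune"}}'))
```

The result is normally fed back to the model as a new message so it can continue reasoning with the tool output.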

Q19. What is agent memory?

A: Mechanisms for storing conversation history or long-term knowledge.

Q20. What is a planner agent?

A: An agent that decomposes complex tasks into executable subtasks.

---

Multi-Agent Systems

Q21. What is a multi-agent architecture?

A: Multiple specialized agents collaborating to solve a task.

Q22. How do agents communicate?

A: Through messages, shared memory, event buses, or orchestration frameworks.

Q23. When should you use multiple agents instead of one?

A: When tasks require specialized expertise or parallel execution.

Q24. What is a supervisor agent?

A: An agent that routes tasks and coordinates worker agents.

Q25. What are common multi-agent patterns?

A: Supervisor-worker, hierarchical, peer-to-peer, blackboard, and swarm.

---

Prompt Engineering

Q26. What is chain-of-thought prompting?

A: Prompting the model to reason through intermediate steps.

Q27. What is few-shot prompting?

A: Providing examples to guide model behavior.

Q28. What is prompt injection?

A: Malicious instructions intended to manipulate agent behavior.

Q29. How can prompt injection attacks be mitigated?

A: Input validation, instruction hierarchy, and tool access controls.

---

Evaluation

Q30. How do you evaluate an LLM application?

A: Measure answer quality, groundedness, latency, cost, and retrieval effectiveness.

Q31. What is groundedness?

A: The extent to which responses are supported by retrieved evidence.

Q32. Name hallucination benchmarks.

A: HaluEval, HaluBench, and RAGTruth.

---

NVIDIA-Specific Questions

Q33. What is NVIDIA NIM?

A: A containerized inference microservice for deploying AI models.

Q34. What is NVIDIA NeMo?

A: NVIDIA's framework for training, customizing, and deploying generative AI models.

Q35. What is NVIDIA TensorRT-LLM?

A: An inference optimization framework for accelerating LLMs on NVIDIA GPUs.

Q36. What is quantization?

A: Reducing numerical precision (FP16 → INT8/INT4) to improve inference efficiency.
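
To show the idea in miniature, here is a sketch of symmetric per-tensor INT8 quantization on a plain Python list. It is an illustration under simple assumptions (one scale for all values, round-to-nearest) and not how any particular NVIDIA library implements it.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.0, 0.5, -1.0, 1.0]
    q, scale = quantize_int8(weights)
    print(q, scale)
    print(dequantize(q, scale))  # close to the originals, not identical
```

Each value is stored in 8 bits instead of 16, at the cost of a rounding error of at most half the scale per value.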

Q37. Why use TensorRT-LLM?

A: Lower latency, higher throughput, and optimized GPU utilization.

Q38. What is KV Cache?

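A: Cached attention states (the key and value tensors of tokens already processed) that are reused during generation, so each new token does not recompute them and inference is faster.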

Q39. What is speculative decoding?

A: Using a smaller model to generate candidate tokens that a larger model verifies.
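
A toy sketch of the draft-and-verify loop for greedy decoding. Both "models" here are stand-in functions that return the next token for a context, and the names are hypothetical. Real implementations verify all draft tokens in one batched forward pass of the large model and use a probabilistic acceptance rule when sampling.

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """Return the tokens accepted in one speculative decoding step.

    draft_next / target_next: functions mapping a token list to the next token.
    """
    # 1. The small draft model proposes k tokens autoregressively.
    proposed = []
    context = list(prefix)
    for _ in range(k):
        token = draft_next(context)
        proposed.append(token)
        context.append(token)

    # 2. The large target model checks each proposal in order.
    accepted = []
    context = list(prefix)
    for token in proposed:
        expected = target_next(context)
        if token == expected:
            accepted.append(token)
            context.append(token)
        else:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            break
    return accepted


if __name__ == "__main__":
    target = lambda ctx: len(ctx) % 5                       # stand-in large model
    draft = lambda ctx: 9 if len(ctx) == 3 else len(ctx) % 5  # wrong once
    print(speculative_step([0], draft, target, k=4))
```

The output always matches what the target model alone would have produced; the gain comes from accepting several tokens per expensive verification.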

Q40. What is model parallelism?

A: Splitting a model across multiple GPUs to handle large parameter sizes.

---

Scenario-Based Questions

Q41. Your RAG system retrieves irrelevant chunks. What would you improve?

A: Chunking strategy, embeddings, metadata filtering, and reranking.

Q42. An agent repeatedly calls the same tool. How would you fix it?

A: Add memory, loop detection, and tool usage constraints.
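
One possible shape for the loop-detection part, assuming the agent runtime can ask a guard before each tool call. The class name `ToolCallGuard` and the repeat limit are illustrative assumptions.

```python
import json
from collections import Counter


class ToolCallGuard:
    """Blocks a tool call once the same name and arguments repeat too often."""

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.counts = Counter()

    def allow(self, tool_name, arguments):
        # Serialize arguments with sorted keys so equal dicts compare equal.
        key = (tool_name, json.dumps(arguments, sort_keys=True))
        self.counts[key] += 1
        return self.counts[key] <= self.max_repeats


if __name__ == "__main__":
    guard = ToolCallGuard(max_repeats=2)
    for attempt in range(3):
        print(attempt, guard.allow("search", {"q": "gpu pricing"}))
```

When `allow` returns False, the runtime can return an error message to the model, force a different plan, or stop the run.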

Q43. Latency is too high in production. What optimizations can you apply?

A: Quantization, batching, KV caching, TensorRT-LLM, and smaller models.

Q44. When would you choose fine-tuning over RAG?

A: When the goal is to change model behavior, style, or domain-specific reasoning rather than to add knowledge.

Q45. Design an agentic system for QE automation.

A: Supervisor agent → Requirement Analysis Agent → Test Case Generator → Automation Script Generator → Review Agent → Execution Agent → Reporting Agent.

These 45 questions cover many of the core concepts typically tested in Agentic AI, RAG, LLM, and NVIDIA deployment-focused certifications; check the official exam guide for the exact topic weighting.

Data Modeling with Databricks

Data modeling is the process of creating a visual blueprint of your business data to structure how it is collected, stored, and related. It translates real-world business rules into organized technical schemas, ensuring consistency, scalability, and efficiency in databases and data warehouses. [1, 2]

The 3 Levels of Data Modeling

Data models progress from abstract business ideas to concrete technical blueprints.

• Conceptual Data Model: The highest level. It defines what data is needed (e.g., customers, products, orders) and general business rules. It acts as a shared language between technical teams and business stakeholders.

• Logical Data Model: The middle layer. It outlines detailed data structures, attributes, and exact relationships. It is independent of any specific database management system.

• Physical Data Model: The technical implementation layer. It details how data will be physically stored in a specific system (e.g., SQL Server, Oracle, data lakehouse), including data types, indexes, and partitions. [1, 2]

Core Modeling Components

Regardless of the model, these are the fundamental building blocks:

• Entities: The "things" or concepts you want to track (e.g., Customer, Employee, Product). These typically become tables in a database.

• Attributes: The specific characteristics of an entity. For example, a Customer entity might have attributes like Name, Email, and Phone Number.

• Relationships: How entities interact with each other. For example, a Customer "places" an Order.

• Cardinality: Defines the numerical relationship between entities (e.g., One-to-One, One-to-Many, or Many-to-Many).

• Primary & Foreign Keys: The attributes that identify and link records. A Primary Key uniquely identifies a specific record (like a Customer ID), while a Foreign Key is an attribute that points to the primary key of another table, establishing a relationship. [1, 11, 12, 13, 14]
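
The building blocks above can be sketched with Python's built-in `sqlite3` module. SQLite is used here only because it runs anywhere; the table and column names are illustrative assumptions, and the same idea carries over to other SQL systems with their own syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Entity: customer, with attributes and a primary key.
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " email TEXT)"
)
# Entity: customer_order. The foreign key models the one-to-many
# relationship "a customer places orders".
conn.execute(
    "CREATE TABLE customer_order ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(customer_id),"
    " total REAL NOT NULL)"
)

conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'ada@example.com')")
conn.execute("INSERT INTO customer_order VALUES (100, 1, 25.0)")
conn.execute("INSERT INTO customer_order VALUES (101, 1, 40.0)")


def insert_orphan_order(connection):
    """Try to insert an order for a customer that does not exist."""
    try:
        connection.execute("INSERT INTO customer_order VALUES (102, 999, 5.0)")
        return True
    except sqlite3.IntegrityError:
        return False


orphan_inserted = insert_orphan_order(conn)
order_count = conn.execute(
    "SELECT COUNT(*) FROM customer_order WHERE customer_id = 1"
).fetchone()[0]
print(orphan_inserted, order_count)
```

The rejected insert shows what the foreign key buys you: the database itself refuses an order that points at a missing customer.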

Key Methodologies

Depending on whether you are building a transactional application or an analytical dashboard, you'll use different modeling styles:

• Entity-Relationship (ER) Modeling: Used primarily for Operational/Transactional systems (OLTP). It focuses on reducing data redundancy through a process called normalization, so that each fact is stored in one place and an update cannot leave conflicting copies behind.

• Dimensional Modeling: Used for Data Warehouses and Analytics (OLAP). It organizes data into Facts (quantitative events like sales transactions) and Dimensions (descriptive contexts like store locations or dates). [2]
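
As a small illustration of the dimensional style, here is a star schema with one fact table and two dimensions, again using `sqlite3` so it runs without setup. All table names, keys, and sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimensions: descriptive context.
    CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);

    -- Fact: one row per sales event, holding measures and dimension keys.
    CREATE TABLE fact_sales (
        store_key INTEGER REFERENCES dim_store(store_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        amount REAL
    );

    INSERT INTO dim_store VALUES (1, 'Austin'), (2, 'Boston');
    INSERT INTO dim_date VALUES (20260601, '2026-06'), (20260701, '2026-07');
    INSERT INTO fact_sales VALUES
        (1, 20260601, 100.0),
        (1, 20260701, 50.0),
        (2, 20260601, 70.0);
    """
)

# Typical analytical query: aggregate the fact, slice by dimension attributes.
sales_by_city_month = conn.execute(
    """
    SELECT s.city, d.month, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_key = f.store_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY s.city, d.month
    ORDER BY s.city, d.month
    """
).fetchall()

for row in sales_by_city_month:
    print(row)
```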

Best Practices

• Understand the Business Purpose: Technical design must always serve business needs; knowing exactly what metrics the business wants to track dictates the model's structure.

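• Avoid Fact-to-Fact Joins: In dimensional modeling, joining two fact tables directly often indicates an error in the model. Fact tables usually have different grains, so a direct join can duplicate rows and inflate totals; aggregate each fact table separately and combine the results through shared dimensions instead.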

• Use Surrogate Keys: When building data warehouses, professionals on Reddit generally agree that using artificial, integer-based keys (surrogate keys) simplifies joining tables and managing historical data. [19, 20, 21]

[1] https://www.databricks.com/blog/what-is-data-modeling

[2] https://www.sap.com/resources/what-is-data-modeling

[3] https://www.mongodb.com/resources/basics/databases/data-modeling

[4] https://www.geeksforgeeks.org/data-analysis/data-modeling-a-comprehensive-guide-for-analysts/

[5] https://www.scribd.com/document/610970256/DATA-MODELLING

[6] https://learning.sap.com/courses/becoming-an-sap-data-architect/transforming-business-concepts-with-data-modeling

[7] https://community.sap.com/t5/technology-q-a/conceptual-logical-physical-modeling/qaq-p/11584240

[8] https://agiledata.org/essays/datamodeling101.html

[9] https://atlan.com/what-is/data-modeling-concepts/

[10] https://www.quest.com/learn/conceptual.aspx

[11] https://medium.com/business-architected/conceptual-data-modelling-start-with-business-use-cases-10b3f2670d47

[12] https://www.datamation.com/big-data/types-of-data-modeling/

[13] https://www.workday.com/en-us/perspectives/ai/intro-to-data-modeling.html

[14] https://jcsites.juniata.edu/faculty/rhodes/dbms/ermodel.htm

[15] https://www.packtpub.com/en-us/learning/how-to-tutorials/implementing-data-modeling-techniques-in-qlik-sense-tutorial

[16] https://www.sciencedirect.com/topics/computer-science/normalized-model

[17] https://atlan.com/what-is-data-modeling/

[18] https://www.red-gate.com/blog/database-design-patterns/

[19] https://www.reddit.com/r/dataengineering/comments/1onxcfo/data_modeling_what_is_the_most_important_concept/

[20] https://www.reddit.com/r/dataengineering/comments/1onxcfo/data_modeling_what_is_the_most_important_concept/

[21] https://www.reddit.com/r/dataengineering/comments/1onxcfo/data_modeling_what_is_the_most_important_concept/

Wednesday, June 17, 2026

Learning Classical Machine Learning

You should learn these five classical machine learning topics in the following order: Linear Regression $\rightarrow$ Logistic Regression $\rightarrow$ Naive Bayes $\rightarrow$ Support Vector Machines (SVM) $\rightarrow$ Matrix Factorization. [1, 2]

This specific sequence builds a smooth mathematical and conceptual path, moving from basic lines to probabilities, optimization boundaries, and finally unsupervised matrix decompositions.

------------------------------

## 1. Linear Regression (Start Here)

* Why first: It is the simplest parametric model, and the ideas it introduces (weights, a loss function, iterative optimization) recur in most of the models that follow.

* Core Concepts to Learn: You will master Loss Functions (Mean Squared Error), Gradient Descent (how weights update), and Regularization (L1/L2 or Lasso/Ridge).

* Math required: Basic algebra and simple derivatives. [3, 4, 5, 6, 7]
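
A minimal sketch of those concepts for one feature: the model is $wx + b$, the loss is Mean Squared Error, and Gradient Descent repeatedly moves `w` and `b` against the gradient. The learning rate, epoch count, and data are arbitrary choices for illustration, and regularization is omitted.

```python
def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model on every sample.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2).
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1
    w, b = fit_linear(xs, ys)
    print(round(w, 4), round(b, 4))
```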

## 2. Logistic Regression

* Why second: It uses the same core linear combination ($wx + b$) as Linear Regression but passes the result through a Sigmoid function, $\sigma(z) = 1 / (1 + e^{-z})$, to turn outputs into probabilities between 0 and 1.

* Core Concepts to Learn: You will learn about Classification, Log Loss (Binary Cross-Entropy), and decision boundaries.

* Math required: Logarithms and exponent math. [3, 4, 8, 9, 10]
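
A short sketch of the two new pieces, the sigmoid and the log loss for a single example. Function names and the clipping constant `eps` are assumptions for the illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into (0, 1), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_proba(x, w, b):
    """Probability of class 1: the linear combination passed through sigmoid."""
    return sigmoid(w * x + b)


def log_loss(y_true, p, eps=1e-12):
    """Binary cross-entropy for one example with label y_true in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


if __name__ == "__main__":
    p = predict_proba(x=2.0, w=1.5, b=-1.0)
    print(p)
    print(log_loss(1, p), log_loss(0, p))  # a confident wrong answer costs more
```

The decision boundary is where the probability equals 0.5, which is exactly where $wx + b = 0$.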

## 3. Naive Bayes

* Why third: This shifts your perspective from optimization (finding the best line) to pure probabilistic classification.

* Core Concepts to Learn: You will learn Bayes' Theorem, conditional probability, and text classification (like spam filtering). Learning this right after Logistic Regression allows you to easily compare Discriminative models (Logistic) with Generative models (Naive Bayes).

* Math required: Basic probability and conditional probability rules. [3, 4, 11, 12, 13]
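
To make the spam-filtering example concrete, here is a tiny multinomial Naive Bayes sketch with add-one (Laplace) smoothing. The training messages are invented, words never seen in training are ignored, and the function names are assumptions.

```python
import math
from collections import Counter


def train_nb(docs):
    """docs: list of (label, text). Returns counts needed for prediction."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for label, text in docs:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Pick the label with the highest log P(label) + sum log P(word | label)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    training = [
        ("spam", "win money now"),
        ("spam", "free money offer"),
        ("ham", "meeting at noon"),
        ("ham", "lunch at noon tomorrow"),
    ]
    model = train_nb(training)
    print(predict_nb(model, "free money"))
    print(predict_nb(model, "noon meeting"))
```

The "naive" part is the assumption that words are independent given the label, which is why the per-word log probabilities can simply be added.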

## 4. Support Vector Machines (SVM)

* Why fourth: SVMs handle classification like Logistic Regression but rest on a more geometric idea. Instead of accepting any line that separates the data, an SVM finds the separating line (hyperplane) with the maximum margin, meaning the largest distance to the nearest training points of either class. [11, 14, 15, 16, 17]

* Core Concepts to Learn: You will learn about Hyperplanes, Margin Maximization, and the Kernel Trick (which lets the model find non-linear separations by computing inner products as if the data had been mapped into a higher-dimensional space, without ever building that mapping explicitly). [18, 19, 20]

* Math required: Vector geometry and optimization theory.
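
The Kernel Trick is easier to believe once you check it numerically. In this sketch, the degree-2 polynomial kernel $(x \cdot y)^2$ on 2-D inputs gives the same number as an ordinary dot product after mapping each point to the 3-D features $(x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The function names are illustrative.

```python
import math


def dot(a, b):
    """Ordinary dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def phi(v):
    """Explicit feature map for the degree-2 polynomial kernel on 2-D input."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, y):
    """Kernel value computed in the original 2-D space, no mapping needed."""
    return dot(x, y) ** 2


if __name__ == "__main__":
    x = (1.0, 2.0)
    y = (3.0, -1.0)
    print(poly_kernel(x, y))       # computed in 2-D
    print(dot(phi(x), phi(y)))     # computed in 3-D, same value up to rounding
```

For kernels such as the RBF kernel the implicit feature space is infinite-dimensional, so the shortcut is the only practical route.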

## 5. Matrix Factorization (End Here)

* Why last: This is a distinct shift into Unsupervised Learning and recommendation systems. It breaks a single large matrix down into smaller component matrices to find hidden relationships. [21, 22, 23, 24]

* Core Concepts to Learn: You will learn about Latent Factors, Collaborative Filtering (how Netflix or Spotify recommend content), and Singular Value Decomposition (SVD). [21, 25, 26, 27]

* Math required: Advanced Linear Algebra (matrix multiplication, dimensions, and rank). [28, 29]
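
A small sketch of latent-factor collaborative filtering. Note that it does not compute an SVD: it learns the two factor matrices by stochastic gradient descent on the observed ratings only, an approach commonly used for recommendation data with missing entries. The ratings, hyperparameters, and function names are assumptions for illustration.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=3000, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ rating.

    ratings: list of (user_index, item_index, rating) for observed cells only.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of the user's and item's latent factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root mean squared error over the observed ratings."""
    se = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5


if __name__ == "__main__":
    observed = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
    P, Q = factorize(observed, n_users=3, n_items=3)
    print(rmse(observed, P, Q))
    print(predict(P, Q, 0, 2))  # estimate for a cell that was never rated
```

The last line is the point of the exercise: once the factors are learned, any user-item pair gets a score, including pairs with no rating.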

------------------------------

[1] [https://www.ncbi.nlm.nih.gov](https://www.ncbi.nlm.nih.gov/books/NBK597496/)

[2] [https://dokumen.pub](https://dokumen.pub/linear-algebra-and-optimization-for-machine-learning-a-textbook-1nbsped-3030403432-9783030403430.html)

[3] [https://www.linkedin.com](https://www.linkedin.com/posts/amit-shekhar-iitbhu_ai-machinelearning-activity-7415244847399460864-5c0g)

[4] [https://www.youtube.com](https://www.youtube.com/watch?v=E0Hmnixke2g&t=141)

[5] [https://cs-114.org](https://cs-114.org/wp-content/uploads/2025/01/LogisticRegression-1.pdf)

[6] [https://www.linkedin.com](https://www.linkedin.com/pulse/supervised-machine-learning-python-regression-simple-linear-maharaj-fwmjc)

[7] [https://www.craw.in](https://www.craw.in/machine-learning-interview-questions-and-answers-in-india)

[8] [https://www.youtube.com](https://www.youtube.com/watch?v=63Kr3HFECHM&t=122)

[9] [https://medium.com](https://medium.com/analytics-vidhya/math-behind-logistic-regression-that-will-make-you-a-data-scientist-2bce20ea53fd)

[10] [https://medium.com](https://medium.com/@prajun_t/linear-classifiers-7e46869844cc)

[11] [https://mrcet.com](https://mrcet.com/downloads/digital_notes/CSE/IV%20Year/MACHINE%20LEARNING%28R17A0534%29.pdf)

[12] [https://raman-singh-13-09.medium.com](https://raman-singh-13-09.medium.com/introduction-to-linear-regression-c98aca3a08f1)

[13] [https://www.cognixia.com](https://www.cognixia.com/blog/everything-you-need-to-know-about-the-naive-bayes-algorithm/)

[14] [https://link.springer.com](https://link.springer.com/protocol/10.1007/978-1-0716-3195-9_2)

[15] [https://www.geeksforgeeks.org](https://www.geeksforgeeks.org/machine-learning/machine-learning-algorithms/)

[16] [https://www.upgrad.com](https://www.upgrad.com/tutorials/ai-ml/machine-learning-tutorial/)

[17] [https://methods.sagepub.com](https://methods.sagepub.com/foundations/machine-learning)

[18] [https://www.upgrad.com](https://www.upgrad.com/blog/support-vector-machines/)

[19] [https://python.plainenglish.io](https://python.plainenglish.io/deep-dive-into-support-vector-machines-svms-for-efficient-data-classification-by-hand-8d3afce90d4a)

[20] [https://webmobtech.com](https://webmobtech.com/blog/understanding-ai-algorithms/)

[21] [https://www.sciencedirect.com](https://www.sciencedirect.com/topics/computer-science/machine-learning)

[22] [https://www.shaped.ai](https://www.shaped.ai/blog/matrix-factorization-the-bedrock-of-collaborative-filtering-recommendations)

[23] [https://saturncloud.io](https://saturncloud.io/glossary/matrix-factorization/)

[24] [https://www.lexalytics.com](https://www.lexalytics.com/blog/machine-learning-natural-language-processing/)

[25] [https://medium.com](https://medium.com/the-andela-way/foundations-of-machine-learning-singular-value-decomposition-svd-162ac796c27d)

[26] [https://www.simplilearn.com](https://www.simplilearn.com/tutorials/pyspark-tutorial/pyspark-mllib-for-ml)

[27] [https://bostoninstituteofanalytics.org](https://bostoninstituteofanalytics.org/blog/how-machine-learning-powers-recommendation-systems-netflix-amazon-spotify/)

[28] [https://wikidocs.net](https://wikidocs.net/216015)

[29] [https://vinuni.edu.vn](https://vinuni.edu.vn/data-science-skills/)