Module 3B — Agentic AI & GenAI¶
Parent: Phase 3 — Artificial Intelligence · Track B
Build applications on top of large language models — agents, RAG, tool use, GenAI products.
Prerequisites: Module 1 (Neural Networks), Module 2 (Frameworks — understand transformers and PyTorch).
Role targets: Agentic AI Engineer · GenAI Engineer · AI Engineer
Lecture Series¶
12 lectures + 3 hands-on labs covering everything from LLM fundamentals to production multi-agent systems. → Start here
| # | Lecture | # | Lecture |
|---|---|---|---|
| 01 | LLM Fundamentals | 07 | Claude Agent SDK |
| 02 | Prompt Engineering | 08 | Multi-Agent Systems |
| 03 | Tool Use | 09 | RAG — Ingestion |
| 04 | Agent Architecture | 10 | RAG — Retrieval |
| 05 | Memory Systems | 11 | Evaluation |
| 06 | LangGraph | 12 | Production |
Labs: Lab 01 — Research Agent · Lab 02 — Multi-Agent Pipeline · Lab 03 — Production RAG
Why This Matters for AI Hardware¶
Agentic AI creates the inference demand that drives chip design:

- Long-context attention (128K+ tokens) → L5: HBM bandwidth, memory hierarchy
- Multi-turn tool calling → L3: low-latency kernel launch, stream scheduling
- Batch inference serving → L2: TensorRT-LLM, in-flight batching optimization
- RAG vector search → L1: cuVS/FAISS acceleration on GPU
Understanding these workloads helps L2 (compiler) and L5 (architecture) engineers design for real usage patterns.
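The bandwidth claim above can be checked with back-of-envelope arithmetic: decode is memory-bound, so single-stream throughput is roughly HBM bandwidth divided by the bytes of weights read per token. The model size and bandwidth figures below are illustrative assumptions (a 70B-parameter model on H100-class HBM3), not measurements.

```python
# Back-of-envelope decode throughput: every decode step streams all weights
# from HBM, so per-token latency ≈ weight bytes / memory bandwidth.
# All numbers below are illustrative assumptions.

params = 70e9            # assumed 70B-parameter model
bytes_per_param = 2      # fp16/bf16 weights
hbm_bandwidth = 3.35e12  # assumed ~3.35 TB/s (H100-class HBM3)

weight_bytes = params * bytes_per_param
step_latency_s = weight_bytes / hbm_bandwidth
tokens_per_s = 1 / step_latency_s

print(f"~{step_latency_s * 1e3:.1f} ms/token, ~{tokens_per_s:.0f} tokens/s (single stream)")
```

This is why batching matters: one stream leaves the GPU's compute idle, and serving stacks amortize the weight reads across many concurrent requests.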
1. LLM Fundamentals for Engineers¶
- Transformer architecture: attention mechanism, KV-cache, positional encoding
- Tokenization: BPE, SentencePiece, vocabulary size impact on embedding layer
- Inference mechanics: prefill (compute-bound) vs decode (memory-bound), autoregressive generation
- Scaling laws: parameter count vs dataset size vs compute budget
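The prefill/decode split and the KV-cache can be seen in a toy single-head attention loop. This is a minimal NumPy sketch with made-up shapes and random weights, not a real transformer: prefill processes the whole prompt at once, then decode appends one key/value row to the cache per generated token instead of recomputing them.

```python
import numpy as np

# Toy single-head attention with a KV-cache. Shapes and the random "model"
# are assumptions for illustration only.
d = 8                                    # head dimension
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    scores = q @ K.T / np.sqrt(d)        # (1, T) attention scores
    w = np.exp(scores - scores.max())
    w /= w.sum()                         # softmax over cached positions
    return w @ V                         # (1, d) context vector

# Prefill: embed the whole prompt in one pass (compute-bound in practice)
prompt = rng.standard_normal((5, d))     # 5 prompt "tokens"
K_cache, V_cache = prompt @ Wk, prompt @ Wv

# Decode: one token at a time, growing the cache (memory-bound in practice)
x = rng.standard_normal((1, d))
for _ in range(3):
    q = x @ Wq
    K_cache = np.vstack([K_cache, x @ Wk])   # cache grows one row per token
    V_cache = np.vstack([V_cache, x @ Wv])
    x = attend(q, K_cache, V_cache)

print(K_cache.shape)  # → (8, 8): 5 prompt rows + 3 decoded rows
```

The growing cache is exactly what ties long-context inference to HBM capacity and bandwidth in the hardware note above.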
2. Agentic AI¶
- What agents are: LLM + tools + memory + planning loop
- Agent frameworks: LangChain, LangGraph, CrewAI, AutoGen, Claude Agent SDK
- Tool use: function calling, API integration, code execution
- Memory: conversation history, vector store retrieval, working memory
- Planning: chain-of-thought, ReAct, tree-of-thought, self-reflection
- Multi-agent systems: task decomposition, agent collaboration, orchestration
Projects:

1. Build an agent that uses tools (web search, calculator, code execution) to answer complex questions.
2. Build a multi-step research agent: given a topic, search → synthesize → write a report.
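The "LLM + tools + planning loop" definition above can be sketched in a few lines. This is a minimal ReAct-style loop where the `llm` function is a hard-coded stand-in (not a real model or API): it decides between calling a tool and answering, the loop executes the tool, and the observation is fed back as a message.

```python
# Minimal sketch of an agent loop: plan → act (tool call) → observe → answer.
# The "llm" below is a hard-coded stand-in for a real model.

def calculator(expression: str) -> str:
    # Demo only: never eval untrusted input in a real agent.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def llm(messages):
    """Stand-in policy: ask for the calculator once, then answer."""
    last = messages[-1]["content"]
    if last.startswith("observation:"):
        return {"type": "answer", "content": f"The result is {last.split(':', 1)[1]}"}
    return {"type": "tool_call", "tool": "calculator", "args": "17 * 23"}

def run_agent(question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        action = llm(messages)                            # plan
        if action["type"] == "answer":
            return action["content"]
        result = TOOLS[action["tool"]](action["args"])    # act
        messages.append({"role": "tool", "content": f"observation:{result}"})  # observe
    return "step budget exhausted"

print(run_agent("What is 17 * 23?"))  # → The result is 391
```

Frameworks like LangGraph and the Claude Agent SDK implement this same loop with real models, typed tool schemas, and persistent state.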
3. RAG (Retrieval-Augmented Generation)¶
- Architecture: document ingestion → chunking → embedding → vector store → retrieval → generation
- Embedding models: sentence-transformers, OpenAI embeddings, Cohere
- Vector stores: FAISS, Chroma, Pinecone, Weaviate, Milvus
- Chunking strategies: fixed-size, recursive, semantic, document-structure-aware
- Retrieval: similarity search, hybrid (dense + sparse), re-ranking
- Evaluation: faithfulness, relevance, hallucination detection
Projects:

1. Build a RAG pipeline over a technical documentation corpus. Evaluate retrieval quality.
2. Compare FAISS (CPU) vs FAISS (GPU) vs cuVS for vector search latency at 1M documents.
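The ingestion → embedding → vector store → retrieval path above fits in a short sketch. Here a bag-of-words vector stands in for a real embedding model (e.g. sentence-transformers) and a NumPy matrix stands in for the vector store; both are assumptions for illustration, and a production system would swap in FAISS or one of the stores listed above.

```python
import re
import numpy as np

# Minimal RAG retrieval sketch: embed chunks, store rows, rank by cosine similarity.
# Bag-of-words embeddings are a stand-in for a real embedding model.

docs = [
    "The KV-cache stores attention keys and values during decoding.",
    "FAISS supports approximate nearest-neighbor search on GPUs.",
    "Chunking splits documents before embedding them.",
]

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z-]+", text.lower())

vocab = sorted({w for d in docs for w in tokens(d)})

def embed(text: str) -> np.ndarray:
    v = np.zeros(len(vocab))
    for w in tokens(text):
        if w in vocab:
            v[vocab.index(w)] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v            # unit-norm so dot product = cosine

index = np.stack([embed(d) for d in docs])   # "vector store": one row per chunk

def retrieve(query: str, k: int = 1) -> list[str]:
    sims = index @ embed(query)              # cosine similarity against all chunks
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

print(retrieve("How does nearest-neighbor search work in FAISS?"))
```

The retrieved chunk would then be pasted into the generation prompt; hybrid retrieval and re-ranking refine exactly this ranking step.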
4. GenAI Product Development¶
- Prompt engineering: system prompts, few-shot, chain-of-thought, output formatting
- Fine-tuning: LoRA, QLoRA, full fine-tuning on domain data
- Evaluation: automated metrics (ROUGE, BLEU), LLM-as-judge, human evaluation
- Guardrails: output filtering, content safety, hallucination mitigation
- Production deployment: API design, streaming, rate limiting, cost management
Projects:

1. Fine-tune a 7B model with QLoRA on a domain-specific dataset. Measure improvement vs base model.
2. Deploy a GenAI application with streaming, safety guardrails, and cost tracking.
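Several prompt-engineering ideas above (system prompt, few-shot examples, output formatting) come together when assembling the message list most chat APIs accept. The ticket-classifier task, labels, and examples below are invented for illustration; no particular model or provider is assumed.

```python
import json

# Sketch: system prompt + few-shot examples + JSON output contract,
# assembled into a provider-agnostic chat message list.

SYSTEM = (
    "You are a support-ticket classifier. "
    'Respond with JSON only: {"category": "...", "urgency": "low|medium|high"}.'
)

FEW_SHOT = [
    ("The app crashes every time I upload a file.",
     {"category": "bug", "urgency": "high"}),
    ("How do I change my billing address?",
     {"category": "billing", "urgency": "low"}),
]

def build_messages(ticket: str) -> list[dict]:
    messages = [{"role": "system", "content": SYSTEM}]
    for text, label in FEW_SHOT:     # few-shot pairs demonstrate the contract
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": json.dumps(label)})
    messages.append({"role": "user", "content": ticket})
    return messages

msgs = build_messages("I was charged twice this month!")
print(len(msgs), msgs[0]["role"])  # → 6 system
```

Pinning the output to JSON is also the hook for guardrails: the response can be parsed and validated before anything downstream sees it.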
Resources¶
| Resource | What it covers |
|---|---|
| LangChain Documentation | Agent and RAG framework |
| Claude Agent SDK | Building agents with Claude |
| Anthropic Cookbook | Practical Claude API examples |
| LlamaIndex Documentation | RAG best practices |
| Build a Large Language Model (From Scratch) (Raschka) | LLM internals |