Module 3B — Agentic AI & GenAI¶
Parent: Phase 3 — Artificial Intelligence · Track B
Build applications on top of large language models — agents, RAG, tool use, GenAI products.
Prerequisites: Module 1 (Neural Networks), Module 2 (Frameworks — understand transformers and PyTorch).
Role targets: Agentic AI Engineer · GenAI Engineer · AI Engineer
Lecture Series¶
12 lectures + 3 hands-on labs covering everything from LLM fundamentals to production multi-agent systems. → Start here
| # | Lecture | # | Lecture |
|---|---|---|---|
| 01 | LLM Fundamentals | 07 | Claude Agent SDK |
| 02 | Prompt Engineering | 08 | Multi-Agent Systems |
| 03 | Tool Use | 09 | RAG — Ingestion |
| 04 | Agent Architecture | 10 | RAG — Retrieval |
| 05 | Memory Systems | 11 | Evaluation |
| 06 | LangGraph | 12 | Production |
Labs: Lab 01 — Research Agent · Lab 02 — Multi-Agent Pipeline · Lab 03 — Production RAG
Why This Matters for AI Hardware¶
Agentic AI creates the inference demand that drives chip design:

- Long-context attention (128K+ tokens) → L5: HBM bandwidth, memory hierarchy
- Multi-turn tool calling → L3: low-latency kernel launch, stream scheduling
- Batch inference serving → L2: TensorRT-LLM, in-flight batching optimization
- RAG vector search → L1: cuVS/FAISS acceleration on GPU
Understanding these workloads helps L2 (compiler) and L5 (architecture) engineers design for real usage patterns.
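The bandwidth claim above can be checked with back-of-envelope arithmetic: decode is memory-bound, so single-stream throughput is roughly HBM bandwidth divided by the bytes of weights read per token. The model size and bandwidth figures below are illustrative assumptions (a 70B-parameter model on H100-class HBM3), not measurements.

```python
# Back-of-envelope decode throughput: every decode step streams all weights
# from HBM, so per-token latency ≈ weight bytes / memory bandwidth.
# All numbers below are illustrative assumptions.

params = 70e9            # assumed 70B-parameter model
bytes_per_param = 2      # fp16/bf16 weights
hbm_bandwidth = 3.35e12  # assumed ~3.35 TB/s (H100-class HBM3)

weight_bytes = params * bytes_per_param
step_latency_s = weight_bytes / hbm_bandwidth
tokens_per_s = 1 / step_latency_s

print(f"~{step_latency_s * 1e3:.1f} ms/token, ~{tokens_per_s:.0f} tokens/s (single stream)")
```

This is why batching matters: one stream leaves the GPU's compute idle, and serving stacks amortize the weight reads across many concurrent requests.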
1. LLM Fundamentals for Engineers¶
- Transformer architecture: attention mechanism, KV-cache, positional encoding
- Tokenization: BPE, SentencePiece, vocabulary size impact on embedding layer
- Inference mechanics: prefill (compute-bound) vs decode (memory-bound), autoregressive generation
- Scaling laws: parameter count vs dataset size vs compute budget
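The prefill/decode split and the KV-cache can be seen in a toy single-head attention loop. This is a minimal NumPy sketch with made-up shapes and random weights, not a real transformer: prefill processes the whole prompt at once, then decode appends one key/value row to the cache per generated token instead of recomputing them.

```python
import numpy as np

# Toy single-head attention with a KV-cache. Shapes and the random "model"
# are assumptions for illustration only.
d = 8                                    # head dimension
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    scores = q @ K.T / np.sqrt(d)        # (1, T) attention scores
    w = np.exp(scores - scores.max())
    w /= w.sum()                         # softmax over cached positions
    return w @ V                         # (1, d) context vector

# Prefill: embed the whole prompt in one pass (compute-bound in practice)
prompt = rng.standard_normal((5, d))     # 5 prompt "tokens"
K_cache, V_cache = prompt @ Wk, prompt @ Wv

# Decode: one token at a time, growing the cache (memory-bound in practice)
x = rng.standard_normal((1, d))
for _ in range(3):
    q = x @ Wq
    K_cache = np.vstack([K_cache, x @ Wk])   # cache grows one row per token
    V_cache = np.vstack([V_cache, x @ Wv])
    x = attend(q, K_cache, V_cache)

print(K_cache.shape)  # → (8, 8): 5 prompt rows + 3 decoded rows
```

The growing cache is exactly what ties long-context inference to HBM capacity and bandwidth in the hardware note above.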
2. Agentic AI¶
- What agents are: LLM + tools + memory + planning loop
- Agent frameworks: LangChain, LangGraph, CrewAI, AutoGen, Claude Agent SDK
- Tool use: function calling, API integration, code execution
- Memory: conversation history, vector store retrieval, working memory
- Planning: chain-of-thought, ReAct, tree-of-thought, self-reflection
- Multi-agent systems: task decomposition, agent collaboration, orchestration
Projects:

1. Build an agent that uses tools (web search, calculator, code execution) to answer complex questions.
2. Build a multi-step research agent: given a topic, search → synthesize → write a report.
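The "LLM + tools + planning loop" definition above can be sketched in a few lines. This is a minimal ReAct-style loop where the `llm` function is a hard-coded stand-in (not a real model or API): it decides between calling a tool and answering, the loop executes the tool, and the observation is fed back as a message.

```python
# Minimal sketch of an agent loop: plan → act (tool call) → observe → answer.
# The "llm" below is a hard-coded stand-in for a real model.

def calculator(expression: str) -> str:
    # Demo only: never eval untrusted input in a real agent.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def llm(messages):
    """Stand-in policy: ask for the calculator once, then answer."""
    last = messages[-1]["content"]
    if last.startswith("observation:"):
        return {"type": "answer", "content": f"The result is {last.split(':', 1)[1]}"}
    return {"type": "tool_call", "tool": "calculator", "args": "17 * 23"}

def run_agent(question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        action = llm(messages)                            # plan
        if action["type"] == "answer":
            return action["content"]
        result = TOOLS[action["tool"]](action["args"])    # act
        messages.append({"role": "tool", "content": f"observation:{result}"})  # observe
    return "step budget exhausted"

print(run_agent("What is 17 * 23?"))  # → The result is 391
```

Frameworks like LangGraph and the Claude Agent SDK implement this same loop with real models, typed tool schemas, and persistent state.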
3. RAG (Retrieval-Augmented Generation)¶
- Architecture: document ingestion → chunking → embedding → vector store → retrieval → generation
- Embedding models: sentence-transformers, OpenAI embeddings, Cohere
- Vector stores: FAISS, Chroma, Pinecone, Weaviate, Milvus
- Chunking strategies: fixed-size, recursive, semantic, document-structure-aware
- Retrieval: similarity search, hybrid (dense + sparse), re-ranking
- Evaluation: faithfulness, relevance, hallucination detection
Projects:

1. Build a RAG pipeline over a technical documentation corpus. Evaluate retrieval quality.
2. Compare FAISS (CPU) vs FAISS (GPU) vs cuVS for vector search latency at 1M documents.
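The ingestion → embedding → vector store → retrieval path above fits in a short sketch. Here a bag-of-words vector stands in for a real embedding model (e.g. sentence-transformers) and a NumPy matrix stands in for the vector store; both are assumptions for illustration, and a production system would swap in FAISS or one of the stores listed above.

```python
import re
import numpy as np

# Minimal RAG retrieval sketch: embed chunks, store rows, rank by cosine similarity.
# Bag-of-words embeddings are a stand-in for a real embedding model.

docs = [
    "The KV-cache stores attention keys and values during decoding.",
    "FAISS supports approximate nearest-neighbor search on GPUs.",
    "Chunking splits documents before embedding them.",
]

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z-]+", text.lower())

vocab = sorted({w for d in docs for w in tokens(d)})

def embed(text: str) -> np.ndarray:
    v = np.zeros(len(vocab))
    for w in tokens(text):
        if w in vocab:
            v[vocab.index(w)] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v            # unit-norm so dot product = cosine

index = np.stack([embed(d) for d in docs])   # "vector store": one row per chunk

def retrieve(query: str, k: int = 1) -> list[str]:
    sims = index @ embed(query)              # cosine similarity against all chunks
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

print(retrieve("How does nearest-neighbor search work in FAISS?"))
```

The retrieved chunk would then be pasted into the generation prompt; hybrid retrieval and re-ranking refine exactly this ranking step.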
4. GenAI Product Development¶
- Prompt engineering: system prompts, few-shot, chain-of-thought, output formatting
- Fine-tuning: LoRA, QLoRA, full fine-tuning on domain data
- Evaluation: automated metrics (ROUGE, BLEU), LLM-as-judge, human evaluation
- Guardrails: output filtering, content safety, hallucination mitigation
- Production deployment: API design, streaming, rate limiting, cost management
Projects:

1. Fine-tune a 7B model with QLoRA on a domain-specific dataset. Measure improvement vs base model.
2. Deploy a GenAI application with streaming, safety guardrails, and cost tracking.
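Several prompt-engineering ideas above (system prompt, few-shot examples, output formatting) come together when assembling the message list most chat APIs accept. The ticket-classifier task, labels, and examples below are invented for illustration; no particular model or provider is assumed.

```python
import json

# Sketch: system prompt + few-shot examples + JSON output contract,
# assembled into a provider-agnostic chat message list.

SYSTEM = (
    "You are a support-ticket classifier. "
    'Respond with JSON only: {"category": "...", "urgency": "low|medium|high"}.'
)

FEW_SHOT = [
    ("The app crashes every time I upload a file.",
     {"category": "bug", "urgency": "high"}),
    ("How do I change my billing address?",
     {"category": "billing", "urgency": "low"}),
]

def build_messages(ticket: str) -> list[dict]:
    messages = [{"role": "system", "content": SYSTEM}]
    for text, label in FEW_SHOT:     # few-shot pairs demonstrate the contract
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": json.dumps(label)})
    messages.append({"role": "user", "content": ticket})
    return messages

msgs = build_messages("I was charged twice this month!")
print(len(msgs), msgs[0]["role"])  # → 6 system
```

Pinning the output to JSON is also the hook for guardrails: the response can be parsed and validated before anything downstream sees it.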
Resources¶
| Resource | What it covers |
|---|---|
| LangChain Documentation | Agent and RAG framework |
| Claude Agent SDK | Building agents with Claude |
| Anthropic Cookbook | Practical Claude API examples |
| LlamaIndex Documentation | RAG best practices |
| Build a Large Language Model (From Scratch) (Raschka) | LLM internals |