Module 3B — Agentic AI & GenAI

Parent: Phase 3 — Artificial Intelligence · Track B

Build applications on top of large language models — agents, RAG, tool use, GenAI products.

Prerequisites: Module 1 (Neural Networks), Module 2 (Frameworks — understand transformers and PyTorch).

Role targets: Agentic AI Engineer · GenAI Engineer · AI Engineer


Lecture Series

12 lectures + 3 hands-on labs covering everything from LLM fundamentals to production multi-agent systems. → Start here

#  Lecture
01 LLM Fundamentals
02 Prompt Engineering
03 Tool Use
04 Agent Architecture
05 Memory Systems
06 LangGraph
07 Claude Agent SDK
08 Multi-Agent Systems
09 RAG — Ingestion
10 RAG — Retrieval
11 Evaluation
12 Production

Labs: Lab 01 — Research Agent · Lab 02 — Multi-Agent Pipeline · Lab 03 — Production RAG


Why This Matters for AI Hardware

Agentic AI creates the inference demand that drives chip design:

  • Long-context attention (128K+ tokens) → L5: HBM bandwidth, memory hierarchy
  • Multi-turn tool calling → L3: low-latency kernel launch, stream scheduling
  • Batch inference serving → L2: TensorRT-LLM, in-flight batching optimization
  • RAG vector search → L1: cuVS/FAISS acceleration on GPU

Understanding these workloads helps L2 (compiler) and L5 (architecture) engineers design for real usage patterns.
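To see why long contexts stress HBM, it helps to size the KV cache directly. The config below is a hypothetical, roughly 7B-class model (layer count, head count, and head dimension are illustrative assumptions, not any specific model's published numbers):

```python
# Back-of-envelope KV-cache size for one long-context request.
# Assumed, illustrative config (roughly 7B-class):
n_layers = 32
n_kv_heads = 32
head_dim = 128
seq_len = 128_000        # 128K-token context
bytes_per_elem = 2       # fp16/bf16

# Per layer, K and V each hold (seq_len, n_kv_heads, head_dim) elements.
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
print(f"KV cache per request: {kv_bytes / 1e9:.1f} GB")  # ~67 GB
```

At this scale a single request's cache exceeds one GPU's HBM, which is why decode is memory-bound and why techniques like grouped-query attention and KV-cache quantization matter.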


1. LLM Fundamentals for Engineers

  • Transformer architecture: attention mechanism, KV-cache, positional encoding
  • Tokenization: BPE, SentencePiece, vocabulary size impact on embedding layer
  • Inference mechanics: prefill (compute-bound) vs decode (memory-bound), autoregressive generation
  • Scaling laws: parameter count vs dataset size vs compute budget
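The scaling-laws bullet can be made concrete with the widely used C ≈ 6·N·D approximation for training compute (N = parameters, D = training tokens); the 7B/20-tokens-per-parameter numbers below are illustrative:

```python
# Rule-of-thumb training compute: C ≈ 6 * N * D FLOPs.
def training_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

# Chinchilla-style heuristic: ~20 training tokens per parameter.
N = 7e9
D = 20 * N
print(f"{training_flops(N, D):.2e} FLOPs")  # ~5.88e+21
```

Inverting the same formula is how teams trade model size against dataset size for a fixed compute budget.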

2. Agentic AI

  • What agents are: LLM + tools + memory + planning loop
  • Agent frameworks: LangChain, LangGraph, CrewAI, AutoGen, Claude Agent SDK
  • Tool use: function calling, API integration, code execution
  • Memory: conversation history, vector store retrieval, working memory
  • Planning: chain-of-thought, ReAct, tree-of-thought, self-reflection
  • Multi-agent systems: task decomposition, agent collaboration, orchestration
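The "LLM + tools + memory + planning loop" definition above can be sketched in a few lines. Here `llm` is a scripted stub standing in for a real model call, so the control flow runs end to end; the message format and tool-dispatch convention are illustrative assumptions, not any framework's actual API:

```python
def calculator(expression: str) -> str:
    # Toy tool: evaluate an arithmetic expression (trusted input only).
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def llm(messages):
    # Stub model: request the calculator once, then answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "calculator", "input": "17 * 42"}
    return {"answer": f"The result is {messages[-1]['content']}."}

def run_agent(question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]  # conversation memory
    for _ in range(max_steps):                          # planning loop
        action = llm(messages)
        if "answer" in action:                          # model decides to stop
            return action["answer"]
        result = TOOLS[action["tool"]](action["input"])  # tool dispatch
        messages.append({"role": "tool", "content": result})
    return "Gave up after max_steps."

print(run_agent("What is 17 * 42?"))
```

Real frameworks (LangGraph, Claude Agent SDK) add structured tool schemas, state persistence, and error handling around this same loop.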

Projects:
  1. Build an agent that uses tools (web search, calculator, code execution) to answer complex questions.
  2. Build a multi-step research agent: given a topic, search → synthesize → write a report.


3. RAG (Retrieval-Augmented Generation)

  • Architecture: document ingestion → chunking → embedding → vector store → retrieval → generation
  • Embedding models: sentence-transformers, OpenAI embeddings, Cohere
  • Vector stores: FAISS, Chroma, Pinecone, Weaviate, Milvus
  • Chunking strategies: fixed-size, recursive, semantic, document-structure-aware
  • Retrieval: similarity search, hybrid (dense + sparse), re-ranking
  • Evaluation: faithfulness, relevance, hallucination detection
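The embed → index → retrieve pipeline above can be demonstrated with a toy bag-of-words "embedding" standing in for a real sentence-transformer model (the chunks and similarity function are illustrative):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "The KV cache stores attention keys and values during decoding.",
    "LoRA fine-tunes a model by training low-rank adapter matrices.",
    "FAISS provides fast approximate nearest-neighbor search.",
]
index = [(c, embed(c)) for c in chunks]  # ingestion: chunk + embed

def retrieve(query: str, k: int = 1):
    q = embed(query)
    ranked = sorted(index, key=lambda ce: cosine(q, ce[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

print(retrieve("what does the kv cache store?"))
```

A production pipeline swaps in a real embedding model and a vector store (FAISS, Chroma), but the ranking step is the same dense similarity search.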

Projects:
  1. Build a RAG pipeline over a technical documentation corpus. Evaluate retrieval quality.
  2. Compare FAISS (CPU) vs FAISS (GPU) vs cuVS for vector search latency at 1M documents.


4. GenAI Product Development

  • Prompt engineering: system prompts, few-shot, chain-of-thought, output formatting
  • Fine-tuning: LoRA, QLoRA, full fine-tuning on domain data
  • Evaluation: automated metrics (ROUGE, BLEU), LLM-as-judge, human evaluation
  • Guardrails: output filtering, content safety, hallucination mitigation
  • Production deployment: API design, streaming, rate limiting, cost management
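Rate limiting, the deployment concern above, is commonly implemented as a token bucket at the API gateway. This is a minimal sketch; the class name, limits, and per-request cost are illustrative assumptions, not any provider's API:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec`."""
    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate_per_sec=2, capacity=5)   # 2 req/s, burst of 5
results = [bucket.allow() for _ in range(7)]
print(results)  # burst of 5 allowed, remaining calls rejected
```

For LLM APIs, `cost` is often set to the request's estimated token count rather than 1, which turns the same bucket into a per-tenant token budget.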

Projects:
  1. Fine-tune a 7B model with QLoRA on a domain-specific dataset. Measure improvement vs base model.
  2. Deploy a GenAI application with streaming, safety guardrails, and cost tracking.


Resources

  • LangChain Documentation: agent and RAG framework
  • Claude Agent SDK: building agents with Claude
  • Anthropic Cookbook: practical Claude API examples
  • LlamaIndex documentation: RAG best practices
  • Build a Large Language Model (From Scratch) (Raschka): LLM internals

Next

Module 4B — ML Engineering & MLOps