Skip to content

Module 3B — Agentic AI & GenAI

Parent: Phase 3 — Artificial Intelligence · Track B

Build applications on top of large language models — agents, RAG, tool use, GenAI products.

Prerequisites: Module 1 (Neural Networks), Module 2 (Frameworks — understand transformers and PyTorch).

Role targets: Agentic AI Engineer · GenAI Engineer · AI Engineer


Lecture Series

43 numbered lectures, one supplemental Lecture 24b, and 5 hands-on labs covering everything from LLM fundamentals to production multi-agent systems. → Start here

Course currency note: exact model names, context windows, SDK features, and pricing change quickly. This module teaches stable interfaces and system patterns: model APIs, tool protocols, runtime loops, workflow graphs, gateways, telemetry, and policy boundaries. Always check provider documentation before copying model IDs or prices into a real deployment.

# Lecture # Lecture
01 LLM Fundamentals 07 Agent SDKs
02 Prompt Engineering 08 Multi-Agent Systems
03 Tool Use 09 RAG — Ingestion
04 Agent Architecture 10 RAG — Retrieval
05 Memory Systems 11 Evaluation
06 LangGraph 12 Production
13 Runtime Discipline
14 Deterministic Startup
15 OpenClaw Gateway
16 Routing and Sessions
17 Isolation and Memory
18 Operations and Security
19 Agent Loop
20 Cron and Scheduled Agent Runs
21 System Prompt Architecture
22 App SDK and Typed RPCs
23 Gateway RPC Protocol
24 Agent Harness
24b Event-Sourced Agent State
25 OpenCoven Workspace
26 OpenKnots Interfaces
27 Agent Security Engineer
28 Pi Minimal Agent
29 Agent Skills
30 Agentic SDLC
31 Runtime Strategy
32 LLM From Scratch
33 Structured Tools
34 Multimodal Sub-Agents
35 GPU Kernel Translation
36 FP8 KV-Cache
37 TraceLens
38 AutoSP
39 Agent Skills Eval
40 ZAYA1-8B
41 OpenClaw Threat Model
42 OpenAI Agents SDK
43 MLSys Kernel Contest

Labs: Lab 01 — Research Agent · Lab 02 — Multi-Agent Pipeline · Lab 03 — Production RAG · Lab 04 — TokenJuice Output Compaction · Lab 05 — OpenMeow App SDK Dogfood


Why This Matters for AI Hardware

Agentic AI creates the inference demand that drives chip design: - Long-context attention (128K+ tokens) → L5: HBM bandwidth, memory hierarchy - Multi-turn tool calling → L3: low-latency kernel launch, stream scheduling - Batch inference serving → L2: TensorRT-LLM, in-flight batching optimization - RAG vector search → L1: cuVS/FAISS acceleration on GPU

Understanding these workloads helps L2 (compiler) and L5 (architecture) engineers design for real usage patterns.


The strongest application trend is that AI is moving from chat interfaces to persistent, tool-using systems that act inside real workflows.

Two useful reference patterns are: - agentic coding systems such as Claude Code - local-first personal assistant systems such as OpenClaw

These are worth studying because they show what modern AI applications actually look like in production-like usage, not just in demos.

1. Coding Agents Are Replacing Single-Step “Code Completion”

This is the clearest shift.

Tools like Claude Code are no longer limited to autocomplete in an editor. They act more like software workers: - read and navigate a repository - plan changes across multiple files - run tests and verification loops - create commits and pull requests - connect to external tools through MCP - load reusable skills, hooks, and plugins

Anthropic’s official docs describe this directly: Claude Code can automate routine engineering work, work with git, connect tools through MCP, spawn multiple agents, and run in CI/CD workflows. Its public repository and plugin system make it a good reference for how coding agents are becoming a real application category rather than a novelty.

Why this matters for hardware: - coding agents create long-running, tool-rich inference sessions instead of short chat turns - they increase demand for low-latency iteration loops, larger context windows, and higher background inference volume - they push AI products into developer infrastructure, where reliability, permissioning, and automation matter as much as model quality

2. Personal AI Is Moving Toward Local-First Control Planes

OpenClaw is a useful reference for a different trend: AI assistants that are not “one web page with one chat box,” but a control plane that stays running and connects many surfaces.

From the official OpenClaw repo and docs: - one long-lived local Gateway owns channels, sessions, tools, and events - the assistant can operate across WhatsApp, Telegram, Slack, Discord, WebChat, and other channels - it supports voice, mobile nodes, live canvas UI, and multi-agent routing - it treats inbound messages as untrusted input and documents a concrete security model

This shows where application design is going: - persistent assistants instead of one-off prompts - multi-channel delivery instead of one frontend - local or operator-controlled infrastructure instead of only cloud-hosted chat - agents as routed services with isolated workspaces and memory

Why this matters for hardware: - always-on assistants create steady inference demand, not just bursty usage - multimodal assistants increase pressure on device memory, streaming, and local inference paths - local-first designs make edge hardware, Jetson-class devices, mobile nodes, and hybrid cloud/edge deployment more relevant

3. MCP, Plugins, and Hooks Are Becoming the Real Application Surface

Another major trend is that the model alone is no longer the full product.

Modern AI systems are increasingly defined by: - MCP connectors to external systems - plugins for reusable workflows - hooks and automations around model actions - skills and custom agents for domain-specific behavior

Claude Code’s docs explicitly position MCP, plugins, skills, hooks, monitors, and custom agents as first-class extension surfaces. OpenClaw similarly treats tools, plugins, channels, nodes, and gateway protocols as part of the product architecture.

The implication is important: the application layer is becoming a tool-and-protocol ecosystem, not just a prompt template.

4. Agent SDKs Are Becoming Runtime Layers, Not Just API Wrappers

Modern agent SDKs now sit above raw model calls. They increasingly manage: - agent loops - tool dispatch - handoffs between specialists - sessions and state - guardrails and human review - tracing and evaluation - MCP server integration

This matters because production agent systems need more than a messages.create() call. They need a repeatable runtime contract: what tools are available, which identity executes them, which actions require approval, where state is stored, and what gets logged.

The practical design rule for this course is:

Provider API details belong in adapters.
Product behavior belongs in your runtime contract.
Security decisions belong outside the LLM.

This is why Lecture 07 teaches SDKs and runtime APIs as a general layer instead of treating one vendor SDK as the architecture.

5. Multi-Agent Structure Is Becoming Practical, Not Theoretical

The industry has moved beyond “one model, one prompt, one answer.”

Current systems increasingly use: - a lead agent plus worker agents - isolated workspaces per task or user - explicit routing rules - background monitors and event-driven triggers

Claude Code exposes multiple-agent workflows and custom agents. OpenClaw exposes multi-agent routing with isolated workspaces, session stores, and bindings from inbound channels to specific agents.

This is a practical trend because it maps well to real products: - support workflows - developer workflows - personal assistant workflows - project automation and governance workflows

6. Security and Permission Boundaries Are Now Core Product Features

This is one of the biggest changes from early LLM apps.

Current AI applications increasingly ship with: - pairing and allowlists - tool permission boundaries - gateway auth and signed connections - sandboxing - safe output policies - moderation and prompt-injection defenses

OpenClaw’s docs emphasize pairing approval, DM safety, sandbox modes, and gateway auth. Claude Code’s plugin system and tool model emphasize explicit structure, scoped extensions, and operational safety. GitHub’s agentic workflow material also frames security, permissions, and isolated execution as central rather than optional.

This means “application architecture” now includes trust boundaries, not just prompts and UI.

7. What To Learn From These Examples

Do not study OpenClaw and Claude Code just because they are popular. Study them because they represent two high-signal application patterns:

  • Claude Code pattern: AI embedded into developer workflows, repositories, CI, tools, and review loops
  • OpenClaw pattern: AI embedded into messaging, voice, mobile nodes, control planes, and personal automation

Together they show that the current application frontier is: - agentic - tool-connected - persistent - permissioned - multi-surface - operationally observable

For this roadmap, the important takeaway is that Track B should teach the workloads and architectures that real AI products are converging toward, because those products are what downstream systems and hardware will ultimately serve.


1. LLM Fundamentals for Engineers

  • Transformer architecture: attention mechanism, KV-cache, positional encoding
  • Tokenization: BPE, SentencePiece, vocabulary size impact on embedding layer
  • Inference mechanics: prefill (compute-bound) vs decode (memory-bound), autoregressive generation
  • Scaling laws: parameter count vs dataset size vs compute budget

2. Agentic AI

  • What agents are: LLM + tools + memory + planning loop
  • Agent runtimes and frameworks: raw provider APIs, OpenAI Agents SDK, LangGraph, MCP servers, OpenClaw-style gateways, CrewAI/AutoGen for experiments
  • Tool use: function calling, API integration, code execution
  • Security boundaries: prompt injection, tool abuse, least-privilege credentials, human approval for risky actions
  • Memory: conversation history, vector store retrieval, working memory
  • Planning: chain-of-thought, ReAct, tree-of-thought, self-reflection
  • Multi-agent systems: task decomposition, agent collaboration, orchestration

Projects: 1. Build an agent that uses tools (web search, calculator, code execution) to answer complex questions. 2. Build a multi-step research agent: given a topic, search → synthesize → write a report.


3. RAG (Retrieval-Augmented Generation)

  • Architecture: document ingestion → chunking → embedding → vector store → retrieval → generation
  • Embedding models: sentence-transformers, OpenAI embeddings, Cohere
  • Vector stores: FAISS, Chroma, Pinecone, Weaviate, Milvus
  • Chunking strategies: fixed-size, recursive, semantic, document-structure-aware
  • Retrieval: similarity search, hybrid (dense + sparse), re-ranking
  • RAG security: treat documents as untrusted input, screen uploads and retrieved chunks, defend against indirect prompt injection
  • Evaluation: faithfulness, relevance, hallucination detection

Projects: 1. Build a RAG pipeline over a technical documentation corpus. Evaluate retrieval quality. 2. Compare FAISS (CPU) vs FAISS (GPU) vs cuVS for vector search latency at 1M documents.


4. GenAI Product Development

  • Prompt engineering: system prompts, few-shot, chain-of-thought, output formatting
  • Fine-tuning: LoRA, QLoRA, full fine-tuning on domain data
  • Evaluation: automated metrics (ROUGE, BLEU), LLM-as-judge, human evaluation
  • Guardrails: input moderation, output filtering, content safety, prompt-injection defense, hallucination mitigation
  • Production deployment: API design, streaming, rate limiting, cost management
  • Runtime discipline: live telemetry, tool-call policy gates, least-privilege agent identities, audit trails, and runtime incident response
  • Deterministic startup: startup contracts, readiness checks, prompt/tool/policy versioning, memory hydration, and reproducible agent boot
  • Agent control planes: gateways, sessions, routing, and multi-surface delivery
  • Scheduled agent execution: cron jobs, isolated run sessions, delivery fallback, failure routing, retries, run logs, and retention
  • OpenClaw-style agent development: multi-agent isolation, workspace design, pairing, and operations for persistent local-first assistants

Projects: 1. Fine-tune a 7B model with QLoRA on a domain-specific dataset. Measure improvement vs base model. 2. Deploy a GenAI application with streaming, safety guardrails, and cost tracking. 3. Build a secure agent or RAG pipeline with prompt-attack detection, tool constraints, and output validation.


Resources

Resource What it covers
LangChain Documentation Agent and RAG framework
LangGraph Documentation Durable, stateful agent workflows, human-in-the-loop, memory, and tracing
OpenAI Agents SDK Agent loops, tools, handoffs, guardrails, sessions, tracing, and MCP integration
OpenAI API Agents Guide Current OpenAI guidance for code-first agent apps, tools, orchestration, and observability
Model Context Protocol Specification Standard protocol for tools, resources, prompts, hosts, clients, servers, and safety considerations
Claude Code Overview Agentic coding workflows, MCP, multi-agent use, and CI patterns
Claude Code Plugins Skills, agents, hooks, MCP servers, plugin structure, and distribution
Claude Code Repository Public implementation surface, examples, plugins, and project layout
Anthropic Cookbook Practical Claude API examples
OpenClaw Repository Local-first assistant architecture, channels, gateway model, and security defaults
OpenClaw Gateway Architecture Long-lived gateway, WS protocol, nodes, pairing, and remote access model
OpenClaw Features Multi-agent routing, media, channels, tools, apps, and provider support
GitHub Agentic Workflows Official GitHub framing for agentic CI/CD, permissions, and safe outputs
OWASP Top 10 for LLM Applications Prompt injection, insecure output handling, plugin/tool risk, excessive agency, and LLM app security
NIST AI RMF Generative AI Profile Governance and risk-management framing for generative AI systems
RAG best practices LlamaIndex documentation
Build a Large Language Model (From Scratch) (Raschka) LLM internals

Next

Module 4B — ML Engineering & MLOps