Module 3B — Agentic AI & GenAI

Parent: Phase 3 — Artificial Intelligence · Track B

Build applications on top of large language models — agents, RAG, tool use, GenAI products.

Prerequisites: Module 1 (Neural Networks), Module 2 (Frameworks — understand transformers and PyTorch).

Role targets: Agentic AI Engineer · GenAI Engineer · AI Engineer

Lecture Series

43 numbered lectures, one supplemental Lecture 24b, and 5 hands-on labs covering everything from LLM fundamentals to production multi-agent systems.

Course currency note: exact model names, context windows, SDK features, and pricing change quickly. This module teaches stable interfaces and system patterns: model APIs, tool protocols, runtime loops, workflow graphs, gateways, telemetry, and policy boundaries. Always check provider documentation before copying model IDs or prices into a real deployment.

#	Lecture
01	LLM Fundamentals
02	Prompt Engineering
03	Tool Use
04	Agent Architecture
05	Memory Systems
06	LangGraph
07	Agent SDKs
08	Multi-Agent Systems
09	RAG — Ingestion
10	RAG — Retrieval
11	Evaluation
12	Production
13	Runtime Discipline
14	Deterministic Startup
15	OpenClaw Gateway
16	Routing and Sessions
17	Isolation and Memory
18	Operations and Security
19	Agent Loop
20	Cron and Scheduled Agent Runs
21	System Prompt Architecture
22	App SDK and Typed RPCs
23	Gateway RPC Protocol
24	Agent Harness
24b	Event-Sourced Agent State
25	OpenCoven Workspace
26	OpenKnots Interfaces
27	Agent Security Engineer
28	Pi Minimal Agent
29	Agent Skills
30	Agentic SDLC
31	Runtime Strategy
32	LLM From Scratch
33	Structured Tools
34	Multimodal Sub-Agents
35	GPU Kernel Translation
36	FP8 KV-Cache
37	TraceLens
38	AutoSP
39	Agent Skills Eval
40	ZAYA1-8B
41	OpenClaw Threat Model
42	OpenAI Agents SDK
43	MLSys Kernel Contest

Labs: Lab 01 — Research Agent · Lab 02 — Multi-Agent Pipeline · Lab 03 — Production RAG · Lab 04 — TokenJuice Output Compaction · Lab 05 — OpenMeow App SDK Dogfood

Why This Matters for AI Hardware

Agentic AI creates the inference demand that drives chip design:
- Long-context attention (128K+ tokens) → L5: HBM bandwidth, memory hierarchy
- Multi-turn tool calling → L3: low-latency kernel launch, stream scheduling
- Batch inference serving → L2: TensorRT-LLM, in-flight batching optimization
- RAG vector search → L1: cuVS/FAISS acceleration on GPU

Understanding these workloads helps L2 (compiler) and L5 (architecture) engineers design for real usage patterns.
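A back-of-envelope calculation shows why long context turns into an HBM problem. The sketch below assumes a standard attention KV-cache layout, with one key vector and one value vector per layer, per KV head, and per token. The model shape used (32 layers, 8 KV heads, head dimension 128, FP16) is a hypothetical example and does not describe any particular product.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    bytes_per_elem: int = 2,  # FP16/BF16
    batch_size: int = 1,
) -> int:
    """Bytes needed to cache keys and values for every token in the context."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch_size


if __name__ == "__main__":
    # Hypothetical model shape with grouped-query attention (8 KV heads).
    size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=128 * 1024)
    print(f"{size / 2**30:.1f} GiB per 128K-token sequence")  # 16.0 GiB per 128K-token sequence
```

The size grows linearly with both context length and batch size. As a result, many concurrent long-context agent sessions can exhaust accelerator memory through the KV-cache alone, before model weights become the constraint.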

Current AI Application Trends (2026)

The strongest application trend is that AI is moving from chat interfaces to persistent, tool-using systems that act inside real workflows.

Two useful reference patterns are:
- agentic coding systems such as Claude Code
- local-first personal assistant systems such as OpenClaw

These are worth studying because they show what modern AI applications actually look like in production-like usage, not just in demos.

1. Coding Agents Are Replacing Single-Step “Code Completion”

This is the clearest shift.

Tools like Claude Code are no longer limited to autocomplete in an editor. They act more like software workers:
- read and navigate a repository
- plan changes across multiple files
- run tests and verification loops
- create commits and pull requests
- connect to external tools through MCP
- load reusable skills, hooks, and plugins

Anthropic’s official docs describe this directly: Claude Code can automate routine engineering work, work with git, connect tools through MCP, spawn multiple agents, and run in CI/CD workflows. Its public repository and plugin system make it a good reference for how coding agents are becoming a real application category rather than a novelty.

Why this matters for hardware:
- coding agents create long-running, tool-rich inference sessions instead of short chat turns
- they increase demand for low-latency iteration loops, larger context windows, and higher background inference volume
- they push AI products into developer infrastructure, where reliability, permissioning, and automation matter as much as model quality

2. Personal AI Is Moving Toward Local-First Control Planes

OpenClaw is a useful reference for a different trend: AI assistants that are not “one web page with one chat box,” but a control plane that stays running and connects many surfaces.

From the official OpenClaw repo and docs:
- one long-lived local Gateway owns channels, sessions, tools, and events
- the assistant can operate across WhatsApp, Telegram, Slack, Discord, WebChat, and other channels
- it supports voice, mobile nodes, live canvas UI, and multi-agent routing
- it treats inbound messages as untrusted input and documents a concrete security model

This shows where application design is going:
- persistent assistants instead of one-off prompts
- multi-channel delivery instead of one frontend
- local or operator-controlled infrastructure instead of only cloud-hosted chat
- agents as routed services with isolated workspaces and memory

Why this matters for hardware:
- always-on assistants create steady inference demand, not just bursty usage
- multimodal assistants increase pressure on device memory, streaming, and local inference paths
- local-first designs make edge hardware, Jetson-class devices, mobile nodes, and hybrid cloud/edge deployment more relevant

3. MCP, Plugins, and Hooks Are Becoming the Real Application Surface

Another major trend is that the model alone is no longer the full product.

Modern AI systems are increasingly defined by:
- MCP connectors to external systems
- plugins for reusable workflows
- hooks and automations around model actions
- skills and custom agents for domain-specific behavior

Claude Code’s docs explicitly position MCP, plugins, skills, hooks, monitors, and custom agents as first-class extension surfaces. OpenClaw similarly treats tools, plugins, channels, nodes, and gateway protocols as part of the product architecture.

The implication is important: the application layer is becoming a tool-and-protocol ecosystem, not just a prompt template.
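MCP messages are JSON-RPC 2.0. A server advertises its tools through `tools/list`, where each tool has a `name`, a `description`, and a JSON Schema `inputSchema`, and it executes them through `tools/call`.

The sketch below mimics only those two methods, in-process, with a hypothetical `word_count` tool. It deliberately skips the initialization handshake, the transports (stdio/HTTP), and the capability negotiation that a real MCP server, such as one built with an official SDK, would handle.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry: name -> (description, JSON Schema for arguments, handler).
TOOLS: dict[str, tuple[str, dict[str, Any], Callable[..., str]]] = {
    "word_count": (
        "Count the words in a piece of text.",
        {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
        lambda text: str(len(text.split())),
    ),
}


def _error(req_id: Any, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}})


def handle_request(raw: str) -> str:
    """Answer one JSON-RPC 2.0 request shaped like MCP's tools/list or tools/call."""
    req = json.loads(raw)
    method, req_id = req.get("method"), req.get("id")
    if method == "tools/list":
        result: dict[str, Any] = {
            "tools": [
                {"name": name, "description": desc, "inputSchema": schema}
                for name, (desc, schema, _) in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        name = req["params"]["name"]
        args = req["params"].get("arguments", {})
        if name not in TOOLS:
            return _error(req_id, -32602, f"Unknown tool: {name}")  # invalid params
        handler = TOOLS[name][2]
        result = {"content": [{"type": "text", "text": handler(**args)}], "isError": False}
    else:
        return _error(req_id, -32601, "Method not found")
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})


if __name__ == "__main__":
    call = {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
            "params": {"name": "word_count", "arguments": {"text": "agents call tools"}}}
    print(handle_request(json.dumps(call)))  # result.content[0].text is "3"
```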

4. Agent SDKs Are Becoming Runtime Layers, Not Just API Wrappers

Modern agent SDKs now sit above raw model calls. They increasingly manage:
- agent loops
- tool dispatch
- handoffs between specialists
- sessions and state
- guardrails and human review
- tracing and evaluation
- MCP server integration

This matters because production agent systems need more than a messages.create() call. They need a repeatable runtime contract: what tools are available, which identity executes them, which actions require approval, where state is stored, and what gets logged.
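One way to make the runtime contract concrete is to write it down as data and check every proposed tool call against it before execution. In the minimal sketch below, `RuntimeContract`, `gate_tool_call`, and the tool names are hypothetical.

The point of the sketch is that allow, deny, and approve decisions come from plain code and are recorded in an audit log. A model that has been prompt-injected therefore cannot argue its way past the gate.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


@dataclass
class RuntimeContract:
    """Hypothetical per-agent contract, loaded from config rather than from the prompt."""
    agent_id: str
    identity: str                      # least-privilege credential the tools execute as
    allowed_tools: frozenset[str]
    approval_required: frozenset[str]  # tools that need a human before running
    state_store: str                   # where session state is persisted
    audit_log: list[dict] = field(default_factory=list)


def gate_tool_call(contract: RuntimeContract, tool: str, args: dict) -> Decision:
    """Decide in ordinary code, not in the model, whether a proposed tool call may run."""
    if tool not in contract.allowed_tools:
        decision = Decision.DENY
    elif tool in contract.approval_required:
        decision = Decision.REQUIRE_APPROVAL
    else:
        decision = Decision.ALLOW
    # Every proposal is logged, including denied ones.
    contract.audit_log.append({"agent": contract.agent_id, "identity": contract.identity,
                               "tool": tool, "args": args, "decision": decision.value})
    return decision


if __name__ == "__main__":
    contract = RuntimeContract(
        agent_id="repo-helper",
        identity="svc-repo-readonly",
        allowed_tools=frozenset({"read_file", "run_tests", "open_pr"}),
        approval_required=frozenset({"open_pr"}),
        state_store="sqlite:///sessions.db",
    )
    for tool in ("read_file", "open_pr", "delete_branch"):
        print(tool, gate_tool_call(contract, tool, {}).value)
    # read_file allow / open_pr require_approval / delete_branch deny
```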

The practical design rule for this course is:

Provider API details belong in adapters.
Product behavior belongs in your runtime contract.
Security decisions belong outside the LLM.

This is why Lecture 07 teaches SDKs and runtime APIs as a general layer instead of treating one vendor SDK as the architecture.
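The first rule amounts to the adapter pattern. In the minimal sketch below, `ModelAdapter`, `EchoAdapter`, and `run_turn` are hypothetical names. The runtime depends only on a small provider-neutral interface, and each vendor SDK would be wrapped in its own adapter class, so swapping providers leaves product logic untouched.

A real adapter would also map `Message` objects to the vendor's request format and handle streaming, tool calls, and errors.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str


class ModelAdapter(Protocol):
    """Provider-neutral interface the runtime depends on (hypothetical)."""
    def complete(self, messages: list[Message]) -> str: ...


class EchoAdapter:
    """Offline stand-in; a real adapter would translate to one vendor SDK's call."""
    def complete(self, messages: list[Message]) -> str:
        return f"echo: {messages[-1].content}"


def run_turn(model: ModelAdapter, system_prompt: str, user_text: str) -> str:
    """Product behavior (here, prompt assembly) lives outside any provider adapter."""
    messages = [Message("system", system_prompt), Message("user", user_text)]
    return model.complete(messages)


if __name__ == "__main__":
    print(run_turn(EchoAdapter(), "You are a build assistant.", "summarize the failing test"))
    # echo: summarize the failing test
```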

5. Multi-Agent Structure Is Becoming Practical, Not Theoretical

The industry has moved beyond “one model, one prompt, one answer.”

Current systems increasingly use:
- a lead agent plus worker agents
- isolated workspaces per task or user
- explicit routing rules
- background monitors and event-driven triggers

Claude Code exposes multiple-agent workflows and custom agents. OpenClaw exposes multi-agent routing with isolated workspaces, session stores, and bindings from inbound channels to specific agents.

This is a practical trend because it maps well to real products:
- support workflows
- developer workflows
- personal assistant workflows
- project automation and governance workflows
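Bindings from inbound channels to agents can be modeled as a small routing table, plus a session key that keeps each agent's history separate. The sketch below is a hypothetical model of that idea and is not OpenClaw's actual configuration format or code.

Routing works in three tiers: the most specific binding (exact sender) wins, then a channel-wide binding applies, and any unmatched traffic falls back to a default agent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Binding:
    """Routes one channel (optionally one sender on it) to an agent."""
    channel: str
    agent_id: str
    peer: Optional[str] = None  # None = any sender on this channel


@dataclass(frozen=True)
class Inbound:
    channel: str
    peer: str
    text: str


def route(msg: Inbound, bindings: list[Binding], default_agent: str) -> str:
    """Most specific match wins: exact sender, then channel-wide, then the default agent."""
    for b in bindings:
        if b.channel == msg.channel and b.peer == msg.peer:
            return b.agent_id
    for b in bindings:
        if b.channel == msg.channel and b.peer is None:
            return b.agent_id
    return default_agent


def session_key(agent_id: str, msg: Inbound) -> str:
    """One session per (agent, channel, sender) so agents never share history."""
    return f"{agent_id}:{msg.channel}:{msg.peer}"


if __name__ == "__main__":
    bindings = [
        Binding(channel="slack", agent_id="support-agent"),
        Binding(channel="telegram", agent_id="personal-agent", peer="alice"),
    ]
    msg = Inbound(channel="telegram", peer="alice", text="remind me at 5")
    agent = route(msg, bindings, default_agent="triage-agent")
    print(agent, session_key(agent, msg))  # personal-agent personal-agent:telegram:alice
```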

6. Security and Permission Boundaries Are Now Core Product Features

This is one of the biggest changes from early LLM apps.

Current AI applications increasingly ship with:
- pairing and allowlists
- tool permission boundaries
- gateway auth and signed connections
- sandboxing
- safe output policies
- moderation and prompt-injection defenses

OpenClaw’s docs emphasize pairing approval, DM safety, sandbox modes, and gateway auth. Claude Code’s plugin system and tool model emphasize explicit structure, scoped extensions, and operational safety. GitHub’s agentic workflow material also frames security, permissions, and isolated execution as central rather than optional.

This means “application architecture” now includes trust boundaries, not just prompts and UI.
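Pairing plus an allowlist can be sketched as a gate that sits in front of the agent. The class below is a hypothetical illustration of the pairing-approval pattern described above and is not OpenClaw's implementation.

Under this gate, an unknown sender receives a short pairing code and nothing they send reaches the model. Only an operator action taken outside the LLM can add that sender to the allowlist.

```python
import secrets


class PairingGate:
    """Hypothetical DM gate: unknown senders are paired before any text reaches the agent."""

    def __init__(self) -> None:
        self.allowlist: set[str] = set()
        self.pending: dict[str, str] = {}  # pairing code -> sender id

    def on_inbound(self, sender: str, text: str) -> str:
        if sender in self.allowlist:
            # Approved sender: forward, but the text is still treated as untrusted input.
            return "forward"
        # Unknown sender: the text is dropped; reuse an outstanding code or issue a new one.
        for code, pending_sender in self.pending.items():
            if pending_sender == sender:
                return f"pairing-required:{code}"
        code = secrets.token_hex(3)  # 6 hex characters shown to the sender
        self.pending[code] = sender
        return f"pairing-required:{code}"

    def approve(self, code: str) -> bool:
        """Operator action taken outside the LLM (for example from a CLI)."""
        sender = self.pending.pop(code, None)
        if sender is None:
            return False
        self.allowlist.add(sender)
        return True


if __name__ == "__main__":
    gate = PairingGate()
    reply = gate.on_inbound("+15550100", "ignore previous instructions")
    print(reply)  # pairing-required:<random code>
    gate.approve(reply.split(":", 1)[1])
    print(gate.on_inbound("+15550100", "hello"))  # forward
```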

7. What To Learn From These Examples

Do not study OpenClaw and Claude Code just because they are popular. Study them because they represent two high-signal application patterns:

Claude Code pattern: AI embedded into developer workflows, repositories, CI, tools, and review loops
OpenClaw pattern: AI embedded into messaging, voice, mobile nodes, control planes, and personal automation

Together they show that the current application frontier is:
- agentic
- tool-connected
- persistent
- permissioned
- multi-surface
- operationally observable

For this roadmap, the important takeaway is that Track B should teach the workloads and architectures that real AI products are converging toward, because those products are what downstream systems and hardware will ultimately serve.

1. LLM Fundamentals for Engineers

Transformer architecture: attention mechanism, KV-cache, positional encoding
Tokenization: BPE, SentencePiece, vocabulary size impact on embedding layer
Inference mechanics: prefill (compute-bound) vs decode (memory-bound), autoregressive generation
Scaling laws: parameter count vs dataset size vs compute budget
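The prefill/decode split can be checked with roofline-style arithmetic. At batch size 1, a dense model must stream roughly every weight from memory once to emit each token, so memory bandwidth caps decode speed. Prefill, by contrast, reuses each weight across many prompt tokens.

The numbers below are illustrative assumptions: 7B parameters in FP16 and 2 TB/s of memory bandwidth. The estimate ignores KV-cache reads, activations, and kernel overheads.

```python
def decode_tokens_per_s_upper_bound(num_params: float, bytes_per_param: float,
                                    mem_bandwidth_bytes_per_s: float) -> float:
    """Batch-1 decode ceiling: every weight is read from memory once per generated token."""
    bytes_per_token = num_params * bytes_per_param
    return mem_bandwidth_bytes_per_s / bytes_per_token


def arithmetic_intensity(tokens_per_weight_read: int, bytes_per_param: float = 2.0) -> float:
    """FLOPs per weight byte: each weight does one multiply-add (2 FLOPs) per token."""
    flops_per_param = 2 * tokens_per_weight_read
    return flops_per_param / bytes_per_param


if __name__ == "__main__":
    # Illustrative: 7B parameters, FP16 weights, 2 TB/s memory bandwidth.
    print(round(decode_tokens_per_s_upper_bound(7e9, 2, 2e12)))  # 143
    # Decode reuses each weight for 1 token; a 2048-token prefill reuses it 2048 times.
    print(arithmetic_intensity(1), arithmetic_intensity(2048))
```

The low intensity of batch-1 decode is why in-flight batching matters for serving: it raises tokens per weight read.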

2. Agentic AI

What agents are: LLM + tools + memory + planning loop
Agent runtimes and frameworks: raw provider APIs, OpenAI Agents SDK, LangGraph, MCP servers, OpenClaw-style gateways, CrewAI/AutoGen for experiments
Tool use: function calling, API integration, code execution
Security boundaries: prompt injection, tool abuse, least-privilege credentials, human approval for risky actions
Memory: conversation history, vector store retrieval, working memory
Planning: chain-of-thought, ReAct, tree-of-thought, self-reflection
Multi-agent systems: task decomposition, agent collaboration, orchestration
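The "LLM + tools + memory + planning loop" definition looks like this in its ReAct form: model steps (thoughts or actions) alternate with tool observations until the model emits a final answer. So that the sketch runs offline, the LLM is replaced by a hypothetical `scripted_model`.

The `Action: tool[input]` text format is an assumption made for illustration. Production systems typically use a provider's structured function-calling instead of parsing text.

```python
import ast
import operator
from typing import Callable

# Safe arithmetic for the calculator tool (+ - * / on numbers only, no eval()).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expr: str) -> str:
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expr, mode="eval").body))


TOOLS: dict[str, Callable[[str], str]] = {"calculator": calculator}


def scripted_model(history: list[str]) -> str:
    """Stand-in for an LLM: first requests a tool, then answers from the observation."""
    observations = [h for h in history if h.startswith("Observation:")]
    if not observations:
        return "Action: calculator[17 * 23]"
    return f"Final Answer: {observations[-1].removeprefix('Observation: ')}"


def react_loop(question: str, model: Callable[[list[str]], str], max_steps: int = 5) -> str:
    history = [f"Question: {question}"]  # working memory for this task
    for _ in range(max_steps):
        step = model(history)
        history.append(step)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action:"):
            # Parse "Action: tool[input]"; the runtime, not the model, executes the tool.
            name, _, arg = step.removeprefix("Action:").strip().partition("[")
            tool = TOOLS.get(name.strip())
            result = tool(arg.rstrip("]")) if tool else f"unknown tool {name!r}"
            history.append(f"Observation: {result}")
    return "stopped: step budget exhausted"


if __name__ == "__main__":
    print(react_loop("What is 17 * 23?", scripted_model))  # 391
```

The `max_steps` budget is the simplest guard against a model that never converges.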

Projects:
1. Build an agent that uses tools (web search, calculator, code execution) to answer complex questions.
2. Build a multi-step research agent: given a topic, search → synthesize → write a report.

3. RAG (Retrieval-Augmented Generation)

Architecture: document ingestion → chunking → embedding → vector store → retrieval → generation
Embedding models: sentence-transformers, OpenAI embeddings, Cohere
Vector stores: FAISS, Chroma, Pinecone, Weaviate, Milvus
Chunking strategies: fixed-size, recursive, semantic, document-structure-aware
Retrieval: similarity search, hybrid (dense + sparse), re-ranking
RAG security: treat documents as untrusted input, screen uploads and retrieved chunks, defend against indirect prompt injection
Evaluation: faithfulness, relevance, hallucination detection
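The chunking → embedding → retrieval path fits in a few functions. In this sketch, `embed` is a toy term-frequency vector that stands in for a real embedding model, and the linear scan stands in for a vector store such as FAISS. The chunk size and overlap are arbitrary illustrative defaults.

In a real pipeline, chunks are embedded once at ingestion and indexed. Retrieved chunks would still go through untrusted-input screening before they enter a prompt.

```python
import math
from collections import Counter


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Fixed-size chunking by words; overlap keeps facts that straddle a boundary intact."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter[str]:
    """Toy term-frequency vector standing in for a real embedding model."""
    return Counter(w.strip(".,:;!?").lower() for w in text.split())


def cosine(a: Counter[str], b: Counter[str]) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and keep the top k for the prompt."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


if __name__ == "__main__":
    docs = [
        "The KV-cache stores keys and values for past tokens.",
        "FAISS builds vector indexes for similarity search.",
        "LoRA fine-tunes low-rank adapter matrices.",
    ]
    print(retrieve("which library does vector similarity search", docs, k=1))
```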

Projects:
1. Build a RAG pipeline over a technical documentation corpus. Evaluate retrieval quality.
2. Compare FAISS (CPU) vs FAISS (GPU) vs cuVS for vector search latency at 1M documents.

4. GenAI Product Development

Prompt engineering: system prompts, few-shot, chain-of-thought, output formatting
Fine-tuning: LoRA, QLoRA, full fine-tuning on domain data
Evaluation: automated metrics (ROUGE, BLEU), LLM-as-judge, human evaluation
Guardrails: input moderation, output filtering, content safety, prompt-injection defense, hallucination mitigation
Production deployment: API design, streaming, rate limiting, cost management
Runtime discipline: live telemetry, tool-call policy gates, least-privilege agent identities, audit trails, and runtime incident response
Deterministic startup: startup contracts, readiness checks, prompt/tool/policy versioning, memory hydration, and reproducible agent boot
Agent control planes: gateways, sessions, routing, and multi-surface delivery
Scheduled agent execution: cron jobs, isolated run sessions, delivery fallback, failure routing, retries, run logs, and retention
OpenClaw-style agent development: multi-agent isolation, workspace design, pairing, and operations for persistent local-first assistants
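A deterministic boot can be approximated in two steps. First, fingerprint every input that shapes agent behavior: the system prompt, the tool schemas, and the policy. Second, refuse to serve until the readiness checks pass.

The names `fingerprint` and `boot` and the manifest layout are hypothetical. The underlying technique is canonical JSON serialization plus SHA-256, so two processes with identical configuration log the same fingerprint regardless of the order in which tools were declared.

```python
import hashlib
import json
from typing import Callable


def fingerprint(system_prompt: str, tool_schemas: list[dict], policy: dict) -> str:
    """Stable hash of everything that defines agent behavior at boot."""
    canonical = json.dumps(
        {"prompt": system_prompt,
         "tools": sorted(tool_schemas, key=lambda t: t["name"]),  # order-independent
         "policy": policy},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def boot(system_prompt: str, tool_schemas: list[dict], policy: dict,
         checks: dict[str, Callable[[], bool]]) -> dict:
    """Run readiness checks in a fixed (sorted) order; refuse to start if any fails."""
    failed = [name for name, check in sorted(checks.items()) if not check()]
    if failed:
        raise RuntimeError(f"not ready: {failed}")
    return {"fingerprint": fingerprint(system_prompt, tool_schemas, policy), "ready": True}


if __name__ == "__main__":
    tools = [{"name": "search_docs"}, {"name": "read_file"}]
    print(boot("You are a support agent.", tools, {"max_tool_calls": 8},
               checks={"state_store": lambda: True, "model_reachable": lambda: True}))
```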

Projects:
1. Fine-tune a 7B model with QLoRA on a domain-specific dataset. Measure improvement vs base model.
2. Deploy a GenAI application with streaming, safety guardrails, and cost tracking.
3. Build a secure agent or RAG pipeline with prompt-attack detection, tool constraints, and output validation.
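Output validation means the application, not the model, decides whether a response is acceptable. The sketch below assumes the model was asked to emit a JSON ticket using a hypothetical three-field schema.

The validator parses strictly, rejects unknown or missing keys, checks types and allowed values, and fails closed. A schema library such as Pydantic or `jsonschema` would typically replace the hand-written checks.

```python
import json
from typing import Any

# Hypothetical contract for a "file a ticket" action proposed by the model.
EXPECTED: dict[str, type] = {"title": str, "severity": str, "component": str}
ALLOWED_SEVERITY = {"low", "medium", "high"}


def validate_ticket(raw_output: str) -> dict[str, Any]:
    """Parse and strictly validate model output; raise instead of guessing or repairing."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(EXPECTED):
        # Extra keys are rejected too: they may smuggle in unrequested actions.
        raise ValueError(f"unexpected or missing keys: {sorted(set(data) ^ set(EXPECTED))}")
    for key, typ in EXPECTED.items():
        if not isinstance(data[key], typ):
            raise ValueError(f"{key} must be {typ.__name__}")
    if data["severity"] not in ALLOWED_SEVERITY:
        raise ValueError(f"severity must be one of {sorted(ALLOWED_SEVERITY)}")
    if len(data["title"]) > 120:
        raise ValueError("title too long")
    return data


if __name__ == "__main__":
    print(validate_ticket('{"title": "Crash on boot", "severity": "high", "component": "gateway"}'))
```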

Resources

Resource	What it covers
LangChain Documentation	Agent and RAG framework
LangGraph Documentation	Durable, stateful agent workflows, human-in-the-loop, memory, and tracing
OpenAI Agents SDK	Agent loops, tools, handoffs, guardrails, sessions, tracing, and MCP integration
OpenAI API Agents Guide	Current OpenAI guidance for code-first agent apps, tools, orchestration, and observability
Model Context Protocol Specification	Standard protocol for tools, resources, prompts, hosts, clients, servers, and safety considerations
Claude Code Overview	Agentic coding workflows, MCP, multi-agent use, and CI patterns
Claude Code Plugins	Skills, agents, hooks, MCP servers, plugin structure, and distribution
Claude Code Repository	Public implementation surface, examples, plugins, and project layout
Anthropic Cookbook	Practical Claude API examples
OpenClaw Repository	Local-first assistant architecture, channels, gateway model, and security defaults
OpenClaw Gateway Architecture	Long-lived gateway, WS protocol, nodes, pairing, and remote access model
OpenClaw Features	Multi-agent routing, media, channels, tools, apps, and provider support
GitHub Agentic Workflows	Official GitHub framing for agentic CI/CD, permissions, and safe outputs
OWASP Top 10 for LLM Applications	Prompt injection, insecure output handling, plugin/tool risk, excessive agency, and LLM app security
NIST AI RMF Generative AI Profile	Governance and risk-management framing for generative AI systems
LlamaIndex Documentation	RAG best practices
Build a Large Language Model (From Scratch) (Raschka)	LLM internals

Next

→ Module 4B — ML Engineering & MLOps