Computer Architecture Course - Complete Overview¶
This track is Phase 1, section 2 (Computer Architecture and Hardware). For the section hub (including the hardware-platforms survey), see Guide.md.
This comprehensive computer architecture course is designed for engineers building AI accelerators, embedded systems, and custom hardware. It provides deep technical understanding of how modern CPUs work, from instruction set design through microarchitecture implementation.
Course Structure¶
The course is divided into three comprehensive guides that work together as a complete curriculum:
1. Architecture_Guide.md — Core Theory & Principles¶
The foundational material covering: - Instruction Set Architecture (ISA) fundamentals — design philosophy, encoding, instruction formats - CPU Design & Microarchitecture — from single-cycle to pipelined execution - Pipelining — stages, hazards (data, control, structural), forwarding, stalling - Superscalar Architecture — executing multiple instructions per cycle, ILP extraction - Memory Hierarchy & Caching — cache organization, policies, coherence (MESI protocol) - Branch Prediction & Speculation — prediction mechanisms, misprediction recovery - Out-of-Order Execution — instruction windows, reservation stations, renaming - Multi-Core & Cache Coherence — scaling challenges, coherency protocols - Real-World ISA Case Studies — ARM64, x86-64, RISC-V comparison - Advanced Topics — speculative execution security, SIMD, hardware security
Best for: Understanding the "why" behind CPU design decisions, theoretical foundations.
2. Labs_and_ProjectsGuide.md — Hands-On Implementation¶
Detailed laboratory exercises to validate concepts: - Lab 1: Single-Cycle CPU in Verilog — Build a minimal CPU; understand datapath - Lab 2: 5-Stage Pipeline with Forwarding — Add pipelining; resolve hazards - Lab 3: Branch Prediction Simulator — Implement & test predictors on real benchmarks - Lab 4: Cache Performance Analysis — Measure & optimize cache behavior - Lab 5: Multi-Core Coherence Simulator — Simulate MESI protocol - Lab 6: ISA Comparison Project — Compile code for x86/ARM/RISC-V; analyze - Lab 7: CPU Microarchitecture Reverse Engineering — Deduce real CPU design via benchmarking - Capstone: Out-of-Order CPU Simulator — Full OoO simulation; IPC measurement
Best for: Building intuition through building (Verilog, C++, Python simulations).
3. Advanced_Topics_and_CaseStudies.md — Real-World Details¶
Deep product analysis and advanced topics: - ARM64 Deep Dive — Registers, instruction encoding, calling convention, ARMv8 vs ARMv9 - x86-64 Deep Dive — Variable-length encoding, instruction set, System V ABI - RISC-V Deep Dive — Modular ISA design, extensions, calling convention - Case Study 1: Apple M4 — Unified memory, P/E core hybrid, performance specs - Case Study 2: AMD Ryzen 9 9950X — Zen 5 cores, cache hierarchy, multi-core scaling - Case Study 3: Qualcomm Snapdragon X — ARM-based PC, NPU integration, Windows ARM ecosystem - Speculative Execution Security — Spectre/Meltdown, mitigations (LFENCE, IBPB), side-channel attacks - Virtual Memory & TLB — Translation lookaside buffers, TLB misses, large pages - Memory Bandwidth Optimization — Spatial/temporal locality, prefetching, tiling - Advanced Tools & Profiling — Roofline model, Amdahl's law, bottleneck analysis
Best for: Connecting theory to commercial products, security concerns, performance optimization.
Recommended Learning Path¶
For Complete Beginners (12-16 weeks)¶
Weeks 1-2: Fundamentals - Read Architecture_Guide § 1-2 (ISA, CPU Design) - Complete Lab 1 (Single-cycle CPU)
Weeks 3-4: Pipelining - Read Architecture_Guide § 3 (Pipelining) - Complete Lab 2 (5-stage pipeline with forwarding)
Weeks 5-6: Branch Prediction - Read Architecture_Guide § 7 (Branch Prediction) - Complete Lab 3 (Branch predictor simulation)
Weeks 7-8: Caching - Read Architecture_Guide § 5 (Memory Hierarchy) - Complete Lab 4 (Cache performance analysis)
Weeks 9-10: Multi-core & Advanced Execution - Read Architecture_Guide § 6, 8 (OoO, multi-core) - Complete Lab 5 (Cache coherence simulator)
Weeks 11-12: Real Hardware - Read Advanced_Topics_and_CaseStudies (ISA deep dives, case studies) - Complete Labs 6-7 (ISA comparison, reverse engineering)
Weeks 13+: Capstone - Complete Lab 8 (OoO CPU simulator) in parallel with above - Begin Phase 2 (Embedded Systems)
For Experienced Programmers (6-8 weeks)¶
Week 1: Accelerated Fundamentals - Skim Architecture_Guide § 1-3 (ISA, pipeline concepts) - Start Lab 1 (single-cycle CPU) in parallel
Weeks 2-3: Deep Dive - Read Architecture_Guide § 4-8 (branch prediction, caching, OoO, multi-core) - Complete Labs 2-4 (pipeline, prediction, cache)
Weeks 4-5: Real Hardware - Read Advanced_Topics_and_CaseStudies (ISA dives, case studies, security) - Complete Labs 6-7 (ISA analysis, reverse engineering)
Weeks 6+: Capstone - Lab 8 (OoO simulator) + begin Phase 2
For Hardware Design Background (4-6 weeks)¶
Week 1: ISA & Design Philosophy - Read Architecture_Guide § 1, 9 - Read Advanced_Topics_and_CaseStudies (ISA deep dives)
Weeks 2-4: Advanced Topics - Read Architecture_Guide § 4-8 (speculative exec, OoO, coherence) - Skim through Labs 2-4 for context - Focus on Lab 8 (OoO CPU simulator)
Weeks 5+: Specialization - Use course as reference for Phase 2/3 (Embedded Systems, FPGA design)
Key Concepts Checklist¶
By the end of this course, you should understand:
ISA & Hardware Interface¶
- Difference between ISA and microarchitecture
- Why x86 is CISC, ARM is RISC, and what the trade-offs are
- How instructions are encoded (fixed vs. variable length)
- Register usage and calling conventions
- Memory addressing modes
CPU Pipeline & Hazards¶
- How pipelining increases throughput but not latency
- Three types of hazards: data, control, structural
- How forwarding resolves most data hazards
- Cost of branch misprediction
- Speculative execution and why it's needed
Performance Metrics¶
- IPC (Instructions Per Cycle) and what limits it
- How to use the Roofline model to identify compute vs. memory bottlenecks
- Amdahl's Law and parallelization limits
- Cache hit/miss rates and how to optimize for locality
Advanced Execution¶
- How out-of-order execution masks memory stalls
- Register renaming and physical vs. architectural registers
- Instruction window size and its impact on ILP
- Difference between speculative and architectural state
Memory Hierarchy¶
- Cache organization: direct-mapped, set-associative, fully-associative
- Cache replacement policies (LRU, random)
- MESI cache coherence protocol
- TLB and virtual memory
- DRAM latency vs. row buffer locality
Multi-Core Scaling¶
- Cache coherence challenges and solutions
- NUMA latency and proper thread/memory binding
- Scaling limits (Amdahl's Law, synchronization overhead)
- Unified memory (Apple Silicon) vs. discrete memory (x86/ARM servers)
Real Hardware¶
- ARM64 ISA and microarchitecture (Cortex-A, Apple Silicon)
- x86-64 System V ABI and performance characteristics
- RISC-V modular ISA design
- How to reverse-engineer CPU microarchitecture via benchmarking
Tools & Resources¶
Essential Tools¶
Simulation & Design: - Verilog/VHDL: ModelSim, Icarus Verilog, Vivado (free for learning) - CPU Simulators: Gem5 (full-system), SimpleScalar (simpler), Cachegrind (cache analysis) - Python: NumPy, Matplotlib (for analysis)
Profiling & Analysis:
- perf (Linux CPU profiling, branch tracing)
- valgrind --tool=cachegrind (cache miss analysis)
- taskset (CPU affinity for reproducible results)
- PAPI (Performance API for low-level counters)
Compilers & Cross-Compilation:
- GCC/Clang with ARM/RISC-V support: arm-linux-gnueabihf-gcc, riscv64-unknown-linux-gnu-gcc
- LLVM/Clang for IR analysis
- Objdump for disassembly analysis
Recommended Textbooks¶
- "Computer Architecture: A Quantitative Approach" (Hennessy & Patterson) — Gold standard; covers everything at depth
- "Computer Organization and Design" (Patterson & Hennessy, ARM Edition) — More accessible version
- "Modern Processor Design" (Shen & Lipasti) — Superscalar & OoO in detail
- "Structured Computer Organization" (Tanenbaum) — Layered approach, good for big picture
Online References¶
- ARM Architecture Reference Manual (official ISA spec)
- RISC-V ISA Specification (open-source ISA)
- x86-64 System V ABI (calling conventions)
- Wikichip / Wikichip Fuse (community CPU database)
- AnandTech (CPU reviews with architectural analysis)
- Chips & Cheese (microarchitecture reverse-engineering)
- ServeTheHome (server hardware analysis)
Success Criteria¶
You've mastered this course when you can:
- Explain the CPU pipeline to someone with only digital logic background; describe pipelining tradeoffs
- Identify bottlenecks in code: determine if memory-bound or compute-bound using Roofline model
- Write processor simulators in high-level languages (Python/C++) that execute real code correctly
- Reverse-engineer microarchitecture of real CPUs via benchmarking (cache sizes, branch predictor type, TLB behavior)
- Optimize algorithms for cache locality: design tiled matrix multiply, understand prefetching
- Analyze real ISAs (ARM64, x86-64, RISC-V): read specs, compile code, interpret assembly
- Design simple CPUs in Verilog: single-cycle, pipelined, with forwarding
- Understand security implications of speculative execution: explain Spectre, describe mitigations
Next Steps After Course¶
Once you complete this course, you're ready for:
Phase 2: Embedded Systems & Operating Systems¶
- Understand ARM Cortex-M real-time processors
- Learn embedded OS concepts (FreeRTOS, Zephyr)
- Implement device drivers and hardware abstraction layers
- Build IoT applications on real embedded hardware
Phase 3: FPGA Design (Xilinx)¶
- Implement custom CPU cores on FPGAs
- Understand hardware design workflows (Verilog → synthesis → P&R)
- Design AI accelerators (systolic arrays, tensor engines)
- Learn about memory controllers and interconnects
Phase 4: GPU & AI Accelerator Design¶
- GPU architecture (NVIDIA, AMD): warps, blocks, memory hierarchy
- Tensor operations and systolic arrays
- Compiler optimizations for accelerators
- ML model optimization on edge hardware
Course Maintenance & Tips¶
Study Tips¶
-
Theory Before Labs: Read the guide sections before starting corresponding labs. Theory provides context; labs validate understanding.
-
Hands-On First for Labs: Don't read lab solutions; work through the problem. Struggling builds intuition.
-
Use Benchmarking Tools: Valgrind,
perfare powerful. Spend time understanding their output. -
Real Hardware > Simulation: When possible, profile real code on real CPUs (your laptop counts!).
-
Connect to AI Hardware: Every concept (pipelining, caching, multi-core) applies directly to GPU design (Phase 4).
Extending the Course¶
- Add newer ISAs: RISC-V V (vector), ARM SVE, custom architectures (Google Tensor, Apple Neural Engine)
- Security focus: Add Spectre/Meltdown labs, side-channel exploits, CET, TrustZone
- AI accelerators: Add systolic array design, dataflow optimization, memory hierarchies for AI
- Formal verification: Use formal tools to verify correct pipeline behavior
Credits & Attribution¶
This course synthesizes: - Classic computer architecture textbooks (Hennessy & Patterson, Tanenbaum) - Modern processor analysis (WikiChip, Chips & Cheese, AnandTech) - AI accelerator design principles (from Phase 4 scope) - Real benchmark data from CES 2026 announcements - Labs inspired by EECS courses (UC Berkeley, MIT, Stanford)
Contact & Questions¶
For questions on course content: - Refer to official ISA specifications (ARM, x86, RISC-V) - Check Wikichip for microarchitecture details - Run your own benchmarks to validate theory - Examine source code of simulators (Gem5, SimpleScalar) for implementation details