Lecture 1: Modern OS Architecture & the Linux Kernel
Overview
Every piece of AI hardware — a Jetson inference node, a camera pipeline server, or an autonomous vehicle compute platform — runs on top of an operating system that determines how code reaches hardware. The core challenge this lecture addresses is: how does the OS safely mediate between many competing software components and the physical hardware beneath them? The mental model to carry forward is that the OS is a trusted referee: it owns the hardware, enforces rules about who can touch what, and presents clean abstractions so application code never has to know register addresses. For an AI hardware engineer, understanding this architecture means you can trace a slow inference pipeline to a kernel subsystem, debug a crashing camera driver without corrupting the whole system, and make deliberate choices about which kernel version to target for a production deployment.
OS Definition: Three Roles
| Role | Meaning |
|---|---|
| Resource manager | Arbitrates CPU time, memory pages, I/O bandwidth, and network across competing processes |
| Abstraction layer | Presents uniform interfaces (files, sockets, virtual memory) over heterogeneous hardware |
| Protection boundary | Isolates processes from each other and from kernel data structures; enforced in hardware |
Without protection, a buggy camera driver corrupts kernel memory. Without abstraction, each application must know specific hardware register layouts.
Key Insight: These three roles are inseparable. Abstraction without protection would let any process break another's view of hardware. Protection without abstraction would force every developer to write device-specific code. All three together are what make Linux viable as a platform for complex AI systems with multiple concurrent workloads.
Privilege Levels
When code runs on a CPU, the hardware enforces who is allowed to do what. This is the foundation of the OS protection boundary. Think of it as the CPU acting as a security guard: code in user space asks for services, the OS grants or denies them, and the hardware makes it impossible to bypass the check.
x86 Rings
| Ring | Mode | Access | Example occupants |
|---|---|---|---|
| Ring 0 | Kernel | All instructions, I/O ports, MSRs, CR0/CR3 | Linux kernel, device drivers |
| Ring 1–2 | Unused | — | Historical OS/2; unused by Linux |
| Ring 3 | User | Restricted; privileged instruction → GP Fault | Applications, runtimes, Python |
Mode switch: SYSCALL instruction → Ring 0; SYSRET → Ring 3.
ARM Exception Levels (AArch64)
| EL | Name | Purpose |
|---|---|---|
| EL0 | User | Applications, TensorRT, ONNX Runtime, ROS2 nodes |
| EL1 | Kernel | Linux kernel, exception handlers, MMU configuration |
| EL2 | Hypervisor | KVM, Xen; controls VM-to-VM memory partitioning |
| EL3 | Secure Monitor | ARM TrustZone, PSCI power management, BCT/BL31 signing |
The two-level EL0/EL1 split is sufficient for most deployments. EL2 adds virtualization overhead; EL3 handles the secure world and is present on virtually all shipping Cortex-A platforms. On Jetson Orin (Cortex-A78AE), the Linux kernel runs at EL1 and NVIDIA MB1/SC7 firmware occupies EL3.
AArch64 Exception Level Stack
┌─────────────────────────────────────┐
│ EL3 — Secure Monitor (TrustZone) │ ← NVIDIA MB1/SC7 firmware
│ ARM TrustZone, PSCI, OTP key mgmt │
├─────────────────────────────────────┤
│ EL2 — Hypervisor (optional) │ ← KVM, Xen
│ VM-to-VM memory partitioning │
├─────────────────────────────────────┤
│ EL1 — Kernel │ ← Linux kernel
│ MMU, exception handlers, drivers │
├─────────────────────────────────────┤
│ EL0 — User │ ← TensorRT, ROS2, Python
│ Restricted; must SVC to reach EL1 │
└─────────────────────────────────────┘
Key Insight: On a Jetson, your TensorRT inference code runs at EL0 and cannot directly touch the NVDLA hardware registers. It must pass through EL1 (the Linux kernel) via system calls and ioctls. This indirection is what makes it safe to run multiple inference workloads concurrently — the kernel serializes access and prevents one process from corrupting another's DMA buffers.
Linux Kernel Architecture
Linux is a monolithic kernel with loadable modules: all core subsystems share a single address space at Ring 0 / EL1 with no IPC overhead between them. Drivers and filesystems compile as .ko modules inserted at runtime without rebuilding the kernel.
The monolithic design delivers fast in-kernel calls at the cost of shared fate — a crashing driver can panic the whole system. Contrast with microkernels (QNX, seL4) where drivers are separate processes; slower IPC, but crash isolation.
Linux Kernel Internal Architecture (Monolithic)
┌──────────────────────────────────────────────────────────┐
│ Ring 0 / EL1 (Kernel Space) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐ │
│ │Scheduler │ │ Memory │ │ VFS │ │ Net │ │
│ │ (CFS/ │ │ Manager │ │ (ext4, │ │ (TCP/IP,│ │
│ │ EEVDF) │ │ (mm/) │ │ procfs) │ │ XDP) │ │
│ └──────────┘ └──────────┘ └──────────┘ └─────────┘ │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Device Drivers (drivers/) ~60% of kernel LOC │ │
│ │ GPU / DRM │ V4L2 Camera │ NVMe │ PCIe │ GPIO │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ arch/ (arm64, x86): MMU, entry.S, IRQ setup │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
↑ syscall / interrupt boundary ↑
┌──────────────────────────────────────────────────────────┐
│ Ring 3 / EL0 (User Space) │
│ TensorRT │ camerad │ modeld │ Python │ ROS2 │
└──────────────────────────────────────────────────────────┘
Major Subsystems

| Directory | Subsystem | Responsibility |
|---|---|---|
| `kernel/` | Scheduler, signals, timers | CFS/EEVDF, workqueues, kthreads |
| `mm/` | Memory manager | Page allocator, slab/slub, OOM killer, mmap |
| `drivers/gpu/drm/` | DRM / GPU | Display, render, GEM/PRIME buffer management |
| `drivers/media/` | V4L2 / Media | Camera ISP, video capture pipeline |
| `drivers/char/`, `drivers/block/` | Char / Block | TTY, GPIO, NVMe, MMC |
| `fs/` | VFS | ext4, tmpfs, overlayfs, procfs, sysfs |
| `net/` | Networking | TCP/IP, netfilter, XDP, RDMA |
| `arch/` | CPU-specific | x86, arm64 — exception tables, syscall entry, MMU |
| `include/` | Headers | Shared kernel-wide type definitions |
| `Documentation/` | Docs | ABI contracts, Device Tree bindings, admin guides |
Common Pitfall: When a camera driver crashes on Jetson, it can kernel-panic the entire system because all drivers share Ring 0 space. The fix is to test drivers with IOMMU protection enabled (`iommu=on` in the kernel command line) so that a misbehaving DMA operation hits a fault rather than scribbling over kernel memory.
Now that we understand how the kernel is organized as a monolithic system, let's look at how the kernel manages versioning — and why AI platforms are conservative about which version they run.
Kernel Versioning
Format: `major.minor.patch` — e.g., 6.1.57
| Track | Lifespan | Purpose | Example |
|---|---|---|---|
| Mainline | ~9 weeks per release | New features merge here | 6.13, 6.14 |
| Stable | ~3 months | Fixes only after release | 6.13.y |
| LTS | 2–6 years | Security and critical fixes only | 5.10, 5.15, 6.1, 6.6 |
Why AI Platforms Pin to LTS
Board Support Packages couple tightly to a specific kernel ABI. Upgrading to mainline breaks downstream out-of-tree modules and vendor driver patches.
| Platform | Kernel base | Notable additions |
|---|---|---|
| Jetson L4T 35.x (JetPack 5) | 5.10 LTS | NVDLA, Argus ISP/VI/CSI, NvMedia, Tegra PCIe |
| Jetson L4T 36.x (JetPack 6) | 6.1 LTS | Orin NvDLA v2, Tegra ISP v5, PCIe Gen4 |
| openpilot Agnos (comma 3X) | Ubuntu LTS-based | Snapdragon camera ISPs, CAN-over-SPI, openpilot services |
| Yocto Kirkstone embedded AI | 5.15 LTS | Stripped BSP; FPGA/NPU out-of-tree modules |
| Yocto Scarthgap embedded AI | 6.6 LTS | EEVDF scheduler; latest nftables, XDP |
Key Insight: The reason NVIDIA ships L4T 36.x based on Linux 6.1 rather than a newer kernel is that the NVDLA v2 driver, Tegra ISP v5, and PCIe Gen4 support all live in out-of-tree patches. Rebasing to a newer mainline kernel would require re-porting and re-validating hundreds of thousands of lines of vendor driver code. LTS stability is a practical engineering constraint, not laziness.
With the kernel version context established, let's look at the two most important runtime inspection interfaces the kernel exposes: /proc and /sys.
/proc Virtual Filesystem
/proc is a kernel interface that looks like a filesystem. Files have no disk representation; reads invoke kernel functions that format data on demand. Think of it as the kernel's "dashboard" — you read from it the same way you read a file, but what you get back is a live snapshot of kernel state.
| Path | Content |
|---|---|
| `/proc/cpuinfo` | Per-CPU: model, frequency, flags (avx512, neon, crypto) |
| `/proc/meminfo` | MemTotal, MemFree, Buffers, Cached, HugePages |
| `/proc/interrupts` | Per-CPU interrupt counts per IRQ line and name |
| `/proc/cmdline` | Kernel boot parameters passed by bootloader |
| `/proc/[pid]/maps` | VMA layout: address range, permissions, backing file |
| `/proc/[pid]/status` | State, VmRSS, threads, capability sets |
| `/proc/[pid]/fd/` | Symlinks to all open file descriptors |
| `/proc/[pid]/wchan` | Kernel function where process is currently sleeping |
Common Pitfall: `/proc/[pid]/maps` shows virtual address ranges, not physical ones. Two processes can both show a mapping at `0x7fff0000`, and those are completely independent — different pages in physical RAM. Confusing virtual and physical addresses is a frequent source of debugging errors in DMA and GPU buffer management.
/sys (sysfs)
sysfs exports kernel objects — devices, drivers, buses — as a directory tree. Structure mirrors the kernel object model rather than process hierarchy. While /proc is about processes and kernel state, /sys is about devices and the hardware model.
| Path | Purpose |
|---|---|
| `/sys/class/net/eth0/` | Interface attributes: speed, mtu, carrier state |
| `/sys/bus/pci/devices/` | PCI devices; vendor/device IDs, resource (BAR) files |
| `/sys/class/thermal/thermal_zone*/temp` | Zone temperature in millidegrees Celsius |
| `/sys/class/gpio/` | GPIO pin export, direction, and value control |
| `/sys/fs/cgroup/` | cgroup v2 hierarchy; resource controllers |
| `/sys/kernel/debug/` | Debugfs: DVFS state, tracing, GPU activity monitors |
| `/sys/firmware/devicetree/base/` | Live Device Tree from running kernel |
On Jetson, /sys/class/thermal/ exposes CPU, GPU, and SoC thermal zones. Polling during inference detects throttling before it causes latency spikes.
Key Insight: `/sys/class/thermal/thermal_zoneN/temp` is not just a monitoring curiosity — it is a direct signal of whether your inference workload is being throttled by the kernel's thermal governor. If a GPU zone exceeds its trip point, the kernel reduces clock frequency without warning your application. An inference benchmark that ignores thermal readings is not measuring real-world performance.
Kernel Modules
Modules run at Ring 0 and have full kernel access. They provide the standard integration path for device drivers without rebuilding the kernel.
insmod my_driver.ko # load from file; no dependency resolution
rmmod my_driver # unload by name
modprobe nvidia # load with dependency resolution (reads modules.dep)
lsmod # list loaded modules and usage count
modinfo nvme # show parameters, license, firmware requirements
Module Lifecycle in C
#include <linux/module.h>
#include <linux/of.h>

/* Match table referenced by MODULE_DEVICE_TABLE below;
 * "vendor,my-accel" is an example compatible string. */
static const struct of_device_id my_of_match[] = {
    { .compatible = "vendor,my-accel" },
    { /* sentinel */ }
};

static int __init my_init(void) {
    pr_info("my_driver: loaded\n");
    return 0; /* non-zero = load failure; kernel will not insert the module */
}

static void __exit my_exit(void) {
    pr_info("my_driver: unloaded\n");
    /* release all resources here: IRQs, DMA buffers, device nodes */
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");
MODULE_DEVICE_TABLE(of, my_of_match); /* enables udev auto-load on DT match */
__init lets the kernel discard initialization code after boot, reclaiming memory. MODULE_DEVICE_TABLE generates a modules.alias entry so modprobe can auto-load when the Device Tree node appears.
The module lifecycle follows a strict sequence when a device is discovered:
- DT match: kernel finds a Device Tree node whose `compatible` string matches `my_of_match`.
- udev triggers: `modules.alias` maps the `compatible` string to the module name; udev calls `modprobe`.
- Module loads: kernel verifies signature (if `CONFIG_MODULE_SIG_FORCE`), maps the `.ko` into Ring 0 address space.
- `probe()` runs: the driver's probe function claims hardware resources, registers device nodes.
- Device ready: application code can now open `/dev/my_accel0` and issue ioctls.
DKMS (Dynamic Kernel Module Support) rebuilds out-of-tree modules when the kernel is updated — used by the NVIDIA GPU driver on development hosts and custom FPGA PCIe drivers.
Common Pitfall: Loading a module compiled against a different kernel version fails with "version magic mismatch". Always compile modules against the exact kernel headers of the running kernel (`uname -r`). DKMS automates this rebuild, but manual module builds break silently after an `apt upgrade` that bumps the kernel version.
Kernel Source Layout

| Directory | Contents |
|---|---|
| `kernel/` | Core scheduler, signals, timers, printk, kprobes |
| `mm/` | Buddy allocator, slab/slub, vmalloc, OOM killer |
| `drivers/` | All device drivers — approximately 60% of kernel source lines |
| `arch/arm64/`, `arch/x86/` | Platform entry (head.S, entry.S), IRQ setup, NUMA |
| `fs/` | Filesystems: ext4, xfs, tmpfs, procfs, overlayfs |
| `include/` | Kernel headers; `include/linux/` for cross-arch types |
| `net/` | TCP, UDP, netfilter, socket layer, XDP |
| `Documentation/` | `Documentation/ABI/` defines stable sysfs interfaces |
Linux on AI Platforms
Understanding kernel architecture pays off when you navigate vendor-specific kernel trees for real hardware. Each platform below has custom subsystems that only exist because of the module and subsystem architecture described above.
Jetson L4T
NVIDIA's Linux for Tegra is a downstream LTS fork with patches for NVDLA, VIC (Video Image Compositor), ISP (Image Signal Processor), NvMedia, and Tegra PCIe IOMMU. L4T 35.x is based on 5.10; L4T 36.x rebased to 6.1. These patches are absent from mainline; they live in drivers/gpu/, drivers/media/, and arch/arm64/ of the L4T tree.
openpilot Agnos
Comma's AGNOS is a custom-modified Linux distribution that runs on the comma 3X and comma four (Snapdragon-based). It is built for one practical use case: running openpilot on the road. The kernel (agnos-kernel-sdm845) is forked from mainline Linux and carries patches for the road-facing and driver-monitoring camera ISPs, CAN-over-SPI, and power management for always-on vehicle operation. The OS image is produced by agnos-builder (boot chain, device tree, userspace). modeld, camerad, and controlsd depend on stable V4L2 and SocketCAN ABIs; comma pins the kernel so upstream changes do not break camera ISP register maps or driver interfaces. To see how each OS lecture topic maps to what was changed in the AGNOS fork for openpilot on the road, see the AGNOS Guide — OS Lectures ↔ Development Changes in Phase 5 (Autonomous Driving specialization).
Yocto for Custom Boards
Production inference nodes use Yocto to build minimal, reproducible kernel + rootfs images. Only required drivers are compiled; attack surface and boot time are minimized for safety-critical deployment.
Summary

| Component | Location in kernel tree | Purpose |
|---|---|---|
| Process scheduler | `kernel/sched/` | CPU time allocation across tasks |
| Memory manager | `mm/` | Virtual memory, page and slab allocators |
| VFS | `fs/` | Uniform file API over all filesystem types |
| Device drivers | `drivers/` | Hardware abstraction and resource management |
| Architecture code | `arch/arm64/`, `arch/x86/` | Platform entry, MMU, exception tables |
| Network stack | `net/` | Protocols, socket layer, XDP |
| IPC | `kernel/` (futex, signal), `ipc/` | Inter-process communication primitives |
Conceptual Review
- Why is Linux monolithic rather than microkernel? In-kernel calls between subsystems (scheduler → memory manager → driver) have zero IPC cost. A monolithic design delivers throughput required for high-bandwidth GPU and camera workloads; the tradeoff is that a driver bug can crash the whole system.
- Why do AI platforms pin to LTS kernels? Board Support Packages include hundreds of out-of-tree driver patches (NVDLA, ISP, PCIe IOMMU). These patches cannot be instantly rebased to a newer mainline kernel. LTS gives 2–6 years of security fixes without requiring a full BSP re-port.
- What is `/proc` and why does it look like a filesystem? `/proc` is a virtual filesystem backed entirely by kernel functions, not disk storage. Reads trigger kernel code that formats live data. The filesystem metaphor means existing tools (`cat`, `grep`, shell scripts) can query kernel state without special APIs.
- What is a kernel module, and why would a driver be one? A module is a `.ko` file that loads into Ring 0 at runtime without rebuilding the kernel. Device drivers are modules so new hardware can be supported without a full kernel rebuild and reboot.
- What does `MODULE_DEVICE_TABLE` do? It generates an entry in `modules.alias` that maps a Device Tree `compatible` string (or PCI vendor/device ID) to a module name, enabling udev to automatically `modprobe` the correct driver when hardware is detected.
- What is sysfs (`/sys`) and how does it differ from `/proc`? sysfs organizes kernel objects (devices, buses, drivers) in a hierarchy mirroring the kernel object model; `/proc` is process-oriented and kernel-state-oriented. Both are virtual filesystems with no disk backing.
AI Hardware Connection
- L4T downstream patches add NVDLA, VIC, and ISP drivers absent from mainline; production Jetson deployments rely on these for hardware-accelerated inference and camera pipelines — understanding the kernel source layout locates them immediately when debugging initialization failures.
- `/sys/class/thermal/thermal_zoneN/temp` is the primary interface for detecting GPU and CPU throttling on Jetson; inference benchmarks should poll this to correlate latency spikes with temperature events.
- Custom FPGA PCIe accelerators require out-of-tree `.ko` modules compiled against the target LTS kernel; DKMS manages rebuilds across point releases automatically.
- openpilot Agnos pins its kernel to maintain camera ISP register map compatibility with comma 3X hardware; upstream kernel changes would break the V4L2 subdevice interface.
- `/proc/interrupts` is the first diagnostic for interrupt storm conditions in high-framerate camera pipelines — per-CPU counts per IRQ line reveal unbalanced distribution before it becomes a throughput problem.
- The monolithic architecture means an FPGA DMA driver crash kernel-panics the entire system; production AI hardware deployments invest in IOMMU protection and driver fault injection testing to contain faults before deployment.