Lecture 1: Modern OS Architecture & the Linux Kernel
Overview
Every piece of AI hardware — a Jetson inference node, a camera pipeline server, or an autonomous vehicle compute platform — runs on top of an operating system that determines how code reaches hardware. The core challenge this lecture addresses is: how does the OS safely mediate between many competing software components and the physical hardware beneath them? The mental model to carry forward is that the OS is a trusted referee: it owns the hardware, enforces rules about who can touch what, and presents clean abstractions so application code never has to know register addresses. For an AI hardware engineer, understanding this architecture means you can trace a slow inference pipeline to a kernel subsystem, debug a crashing camera driver without corrupting the whole system, and make deliberate choices about which kernel version to target for a production deployment.
OS Definition: Three Roles
| Role | Meaning |
|---|---|
| Resource manager | Arbitrates CPU time, memory pages, I/O bandwidth, and network across competing processes |
| Abstraction layer | Presents uniform interfaces (files, sockets, virtual memory) over heterogeneous hardware |
| Protection boundary | Isolates processes from each other and from kernel data structures; enforced in hardware |
Without protection, a buggy camera driver corrupts kernel memory. Without abstraction, each application must know specific hardware register layouts.
Key Insight: These three roles are inseparable. Abstraction without protection would let any process break another's view of hardware. Protection without abstraction would force every developer to write device-specific code. All three together are what make Linux viable as a platform for complex AI systems with multiple concurrent workloads.
Privilege Levels
When code runs on a CPU, the hardware enforces who is allowed to do what. This is the foundation of the OS protection boundary. Think of it as the CPU acting as a security guard: code in user space asks for services, the OS grants or denies them, and the hardware makes it impossible to bypass the check.
x86 Rings
| Ring | Mode | Access | Example occupants |
|---|---|---|---|
| Ring 0 | Kernel | All instructions, I/O ports, MSRs, CR0/CR3 | Linux kernel, device drivers |
| Ring 1–2 | Unused | — | Historical OS/2; unused by Linux |
| Ring 3 | User | Restricted; privileged instruction → GP Fault | Applications, runtimes, Python |
Mode switch: SYSCALL instruction → Ring 0; SYSRET → Ring 3.
ARM Exception Levels (AArch64)
| EL | Name | Purpose |
|---|---|---|
| EL0 | User | Applications, TensorRT, ONNX Runtime, ROS2 nodes |
| EL1 | Kernel | Linux kernel, exception handlers, MMU configuration |
| EL2 | Hypervisor | KVM, Xen; controls VM-to-VM memory partitioning |
| EL3 | Secure Monitor | ARM TrustZone, PSCI power management, BCT/BL31 signing |
The two-level EL0/EL1 split is sufficient for most deployments. EL2 adds virtualization overhead; EL3 handles the secure world and is present on virtually all shipping Cortex-A platforms. On Jetson Orin (Cortex-A78AE), the Linux kernel runs at EL1 and NVIDIA MB1/SC7 firmware occupies EL3.
AArch64 Exception Level Stack
┌─────────────────────────────────────┐
│ EL3 — Secure Monitor (TrustZone) │ ← NVIDIA MB1/SC7 firmware
│ ARM TrustZone, PSCI, OTP key mgmt │
├─────────────────────────────────────┤
│ EL2 — Hypervisor (optional) │ ← KVM, Xen
│ VM-to-VM memory partitioning │
├─────────────────────────────────────┤
│ EL1 — Kernel │ ← Linux kernel
│ MMU, exception handlers, drivers │
├─────────────────────────────────────┤
│ EL0 — User │ ← TensorRT, ROS2, Python
│ Restricted; must SVC to reach EL1 │
└─────────────────────────────────────┘
Key Insight: On a Jetson, your TensorRT inference code runs at EL0 and cannot directly touch the NVDLA hardware registers. It must pass through EL1 (the Linux kernel) via system calls and ioctls. This indirection is what makes it safe to run multiple inference workloads concurrently — the kernel serializes access and prevents one process from corrupting another's DMA buffers.
Linux Kernel Architecture
Linux is a monolithic kernel with loadable modules: all core subsystems share a single address space at Ring 0 / EL1 with no IPC overhead between them. Drivers and filesystems compile as .ko modules inserted at runtime without rebuilding the kernel.
The monolithic design delivers fast in-kernel calls at the cost of shared fate — a crashing driver can panic the whole system. Contrast with microkernels (QNX, seL4) where drivers are separate processes; slower IPC, but crash isolation.
Linux Kernel Internal Architecture (Monolithic)
┌──────────────────────────────────────────────────────────┐
│ Ring 0 / EL1 (Kernel Space) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐ │
│ │Scheduler │ │ Memory │ │ VFS │ │ Net │ │
│ │ (CFS/ │ │ Manager │ │ (ext4, │ │ (TCP/IP,│ │
│ │ EEVDF) │ │ (mm/) │ │ procfs) │ │ XDP) │ │
│ └──────────┘ └──────────┘ └──────────┘ └─────────┘ │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Device Drivers (drivers/) ~60% of kernel LOC │ │
│ │ GPU / DRM │ V4L2 Camera │ NVMe │ PCIe │ GPIO │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ arch/ (arm64, x86): MMU, entry.S, IRQ setup │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
↑ syscall / interrupt boundary ↑
┌──────────────────────────────────────────────────────────┐
│ Ring 3 / EL0 (User Space) │
│ TensorRT │ camerad │ modeld │ Python │ ROS2 │
└──────────────────────────────────────────────────────────┘
Major Subsystems

| Directory | Subsystem | Responsibility |
|---|---|---|
| `kernel/` | Scheduler, signals, timers | CFS/EEVDF, workqueues, kthreads |
| `mm/` | Memory manager | Page allocator, slab/slub, OOM killer, mmap |
| `drivers/gpu/drm/` | DRM / GPU | Display, render, GEM/PRIME buffer management |
| `drivers/media/` | V4L2 / Media | Camera ISP, video capture pipeline |
| `drivers/char/`, `drivers/block/` | Char / Block | TTY, GPIO, NVMe, MMC |
| `fs/` | VFS | ext4, tmpfs, overlayfs, procfs, sysfs |
| `net/` | Networking | TCP/IP, netfilter, XDP, RDMA |
| `arch/` | CPU-specific | x86, arm64 — exception tables, syscall entry, MMU |
| `include/` | Headers | Shared kernel-wide type definitions |
| `Documentation/` | Docs | ABI contracts, Device Tree bindings, admin guides |
Common Pitfall: When a camera driver crashes on Jetson, it can kernel-panic the entire system because all drivers share Ring 0 space. The fix is to test drivers with IOMMU protection enabled (`iommu=on` in the kernel command line) so that a misbehaving DMA operation hits a fault rather than scribbling over kernel memory.
Now that we understand how the kernel is organized as a monolithic system, let's look at how the kernel manages versioning — and why AI platforms are conservative about which version they run.
Kernel Versioning
Format: `major.minor.patch` — e.g., 6.1.57
| Track | Lifespan | Purpose | Example |
|---|---|---|---|
| Mainline | ~9 weeks per release | New features merge here | 6.13, 6.14 |
| Stable | ~3 months | Fixes only after release | 6.13.y |
| LTS | 2–6 years | Security and critical fixes only | 5.10, 5.15, 6.1, 6.6 |
Why AI Platforms Pin to LTS
Board Support Packages couple tightly to a specific kernel ABI. Upgrading to mainline breaks downstream out-of-tree modules and vendor driver patches.
| Platform | Kernel base | Notable additions |
|---|---|---|
| Jetson L4T 35.x (JetPack 5) | 5.10 LTS | NVDLA, Argus ISP/VI/CSI, NvMedia, Tegra PCIe |
| Jetson L4T 36.x (JetPack 6) | 6.1 LTS | Orin NvDLA v2, Tegra ISP v5, PCIe Gen4 |
| openpilot Agnos (comma 3X) | Ubuntu LTS-based | Snapdragon camera ISPs, CAN-over-SPI, openpilot services |
| Yocto Kirkstone embedded AI | 5.15 LTS | Stripped BSP; FPGA/NPU out-of-tree modules |
| Yocto Scarthgap embedded AI | 6.6 LTS | EEVDF scheduler; latest nftables, XDP |
Key Insight: The reason NVIDIA ships L4T 36.x based on Linux 6.1 rather than a newer kernel is that the NVDLA v2 driver, Tegra ISP v5, and PCIe Gen4 support all live in out-of-tree patches. Rebasing to a newer mainline kernel would require re-porting and re-validating hundreds of thousands of lines of vendor driver code. LTS stability is a practical engineering constraint, not laziness.
With the kernel version context established, let's look at the two most important runtime inspection interfaces the kernel exposes: /proc and /sys.
/proc Virtual Filesystem
/proc is a kernel interface that looks like a filesystem. Files have no disk representation; reads invoke kernel functions that format data on demand. Think of it as the kernel's "dashboard" — you read from it the same way you read a file, but what you get back is a live snapshot of kernel state.
| Path | Content |
|---|---|
| `/proc/cpuinfo` | Per-CPU: model, frequency, flags (avx512, neon, crypto) |
| `/proc/meminfo` | MemTotal, MemFree, Buffers, Cached, HugePages |
| `/proc/interrupts` | Per-CPU interrupt counts per IRQ line and name |
| `/proc/cmdline` | Kernel boot parameters passed by bootloader |
| `/proc/[pid]/maps` | VMA layout: address range, permissions, backing file |
| `/proc/[pid]/status` | State, VmRSS, threads, capability sets |
| `/proc/[pid]/fd/` | Symlinks to all open file descriptors |
| `/proc/[pid]/wchan` | Kernel function where process is currently sleeping |
Common Pitfall: `/proc/[pid]/maps` shows virtual address ranges, not physical ones. Two processes can both show a mapping at `0x7fff0000`, and those are completely independent — different pages in physical RAM. Confusing virtual and physical addresses is a frequent source of debugging errors in DMA and GPU buffer management.
/sys (sysfs)
sysfs exports kernel objects — devices, drivers, buses — as a directory tree. Structure mirrors the kernel object model rather than process hierarchy. While /proc is about processes and kernel state, /sys is about devices and the hardware model.
| Path | Purpose |
|---|---|
| `/sys/class/net/eth0/` | Interface attributes: speed, mtu, carrier state |
| `/sys/bus/pci/devices/` | PCI devices; vendor/device IDs, resource (BAR) files |
| `/sys/class/thermal/thermal_zone*/temp` | Zone temperature in millidegrees Celsius |
| `/sys/class/gpio/` | GPIO pin export, direction, and value control |
| `/sys/fs/cgroup/` | cgroup v2 hierarchy; resource controllers |
| `/sys/kernel/debug/` | Debugfs: DVFS state, tracing, GPU activity monitors |
| `/sys/firmware/devicetree/base/` | Live Device Tree from running kernel |
On Jetson, /sys/class/thermal/ exposes CPU, GPU, and SoC thermal zones. Polling during inference detects throttling before it causes latency spikes.
Key Insight: `/sys/class/thermal/thermal_zoneN/temp` is not just a monitoring curiosity — it is a direct signal of whether your inference workload is being throttled by the kernel's thermal governor. If a GPU zone exceeds its trip point, the kernel reduces clock frequency without warning your application. An inference benchmark that ignores thermal readings is not measuring real-world performance.
Kernel Modules
Modules run at Ring 0 and have full kernel access. They provide the standard integration path for device drivers without rebuilding the kernel.
insmod my_driver.ko # load from file; no dependency resolution
rmmod my_driver # unload by name
modprobe nvidia # load with dependency resolution (reads modules.dep)
lsmod # list loaded modules and usage count
modinfo nvme # show parameters, license, firmware requirements
Module Lifecycle in C
#include <linux/module.h>
#include <linux/of.h>

/* Match table referenced by MODULE_DEVICE_TABLE below;
 * "vendor,my-accel" is an example compatible string. */
static const struct of_device_id my_of_match[] = {
    { .compatible = "vendor,my-accel" },
    { /* sentinel */ }
};

static int __init my_init(void) {
    pr_info("my_driver: loaded\n");
    return 0; /* non-zero = load failure; kernel will not insert the module */
}

static void __exit my_exit(void) {
    pr_info("my_driver: unloaded\n");
    /* release all resources here: IRQs, DMA buffers, device nodes */
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");
MODULE_DEVICE_TABLE(of, my_of_match); /* enables udev auto-load on DT match */
__init lets the kernel discard initialization code after boot, reclaiming memory. MODULE_DEVICE_TABLE generates a modules.alias entry so modprobe can auto-load when the Device Tree node appears.
The module lifecycle follows a strict sequence when a device is discovered:
- DT match: kernel finds a Device Tree node whose `compatible` string matches `my_of_match`.
- udev triggers: `modules.alias` maps the `compatible` string to the module name; udev calls `modprobe`.
- Module loads: kernel verifies signature (if `CONFIG_MODULE_SIG_FORCE`), maps the `.ko` into Ring 0 address space.
- `probe()` runs: the driver's probe function claims hardware resources, registers device nodes.
- Device ready: application code can now open `/dev/my_accel0` and issue ioctls.
DKMS (Dynamic Kernel Module Support) rebuilds out-of-tree modules when the kernel is updated — used by the NVIDIA GPU driver on development hosts and custom FPGA PCIe drivers.
Common Pitfall: Loading a module compiled against a different kernel version fails with "version magic mismatch". Always compile modules against the exact kernel headers of the running kernel (`uname -r`). DKMS automates this rebuild, but manual module builds break silently after an `apt upgrade` that bumps the kernel version.
Kernel Source Layout

| Directory | Contents |
|---|---|
| `kernel/` | Core scheduler, signals, timers, printk, kprobes |
| `mm/` | Buddy allocator, slab/slub, vmalloc, OOM killer |
| `drivers/` | All device drivers — approximately 60% of kernel source lines |
| `arch/arm64/`, `arch/x86/` | Platform entry (head.S, entry.S), IRQ setup, NUMA |
| `fs/` | Filesystems: ext4, xfs, tmpfs, procfs, overlayfs |
| `include/` | Kernel headers; `include/linux/` for cross-arch types |
| `net/` | TCP, UDP, netfilter, socket layer, XDP |
| `Documentation/` | `Documentation/ABI/` defines stable sysfs interfaces |
Linux on AI Platforms
Understanding kernel architecture pays off when you navigate vendor-specific kernel trees for real hardware. Each platform below has custom subsystems that only exist because of the module and subsystem architecture described above.
Jetson L4T
NVIDIA's Linux for Tegra is a downstream LTS fork with patches for NVDLA, VIC (Video Image Compositor), ISP (Image Signal Processor), NvMedia, and Tegra PCIe IOMMU. L4T 35.x is based on 5.10; L4T 36.x rebased to 6.1. These patches are absent from mainline; they live in drivers/gpu/, drivers/media/, and arch/arm64/ of the L4T tree.
openpilot Agnos
Comma's AGNOS is a custom-modified Linux distribution that runs on the comma 3X and comma four (Snapdragon-based). It is built for one practical use case: running openpilot on the road. The kernel (agnos-kernel-sdm845) is forked from mainline Linux and carries patches for the road-facing and driver-monitoring camera ISPs, CAN-over-SPI, and power management for always-on vehicle operation. The OS image is produced by agnos-builder (boot chain, device tree, userspace). modeld, camerad, and controlsd depend on stable V4L2 and SocketCAN ABIs; comma pins the kernel so upstream changes do not break camera ISP register maps or driver interfaces. To see how each OS lecture topic maps to what was changed in the AGNOS fork for openpilot on the road, see the AGNOS Guide — OS Lectures ↔ Development Changes in Phase 5 (Autonomous Driving specialization).
Yocto for Custom Boards
Production inference nodes use Yocto to build minimal, reproducible kernel + rootfs images. Only required drivers are compiled; attack surface and boot time are minimized for safety-critical deployment.
Summary

| Component | Location in kernel tree | Purpose |
|---|---|---|
| Process scheduler | `kernel/sched/` | CPU time allocation across tasks |
| Memory manager | `mm/` | Virtual memory, page and slab allocators |
| VFS | `fs/` | Uniform file API over all filesystem types |
| Device drivers | `drivers/` | Hardware abstraction and resource management |
| Architecture code | `arch/arm64/`, `arch/x86/` | Platform entry, MMU, exception tables |
| Network stack | `net/` | Protocols, socket layer, XDP |
| IPC | `kernel/` (futex, signal), `ipc/` | Inter-process communication primitives |
Conceptual Review
- Why is Linux monolithic rather than microkernel? In-kernel calls between subsystems (scheduler → memory manager → driver) have zero IPC cost. A monolithic design delivers throughput required for high-bandwidth GPU and camera workloads; the tradeoff is that a driver bug can crash the whole system.
- Why do AI platforms pin to LTS kernels? Board Support Packages include hundreds of out-of-tree driver patches (NVDLA, ISP, PCIe IOMMU). These patches cannot be instantly rebased to a newer mainline kernel. LTS gives 2–6 years of security fixes without requiring a full BSP re-port.
- What is `/proc` and why does it look like a filesystem? `/proc` is a virtual filesystem backed entirely by kernel functions, not disk storage. Reads trigger kernel code that formats live data. The filesystem metaphor means existing tools (`cat`, `grep`, shell scripts) can query kernel state without special APIs.
- What is a kernel module, and why would a driver be one? A module is a `.ko` file that loads into Ring 0 at runtime without rebuilding the kernel. Device drivers are modules so new hardware can be supported without a full kernel rebuild and reboot.
- What does `MODULE_DEVICE_TABLE` do? It generates an entry in `modules.alias` that maps a Device Tree `compatible` string (or PCI vendor/device ID) to a module name, enabling udev to automatically `modprobe` the correct driver when hardware is detected.
- What is sysfs (`/sys`) and how does it differ from `/proc`? sysfs organizes kernel objects (devices, buses, drivers) in a hierarchy mirroring the kernel object model; `/proc` is process-oriented and kernel-state-oriented. Both are virtual filesystems with no disk backing.
AI Hardware Connection
- L4T downstream patches add NVDLA, VIC, and ISP drivers absent from mainline; production Jetson deployments rely on these for hardware-accelerated inference and camera pipelines — understanding the kernel source layout locates them immediately when debugging initialization failures.
- `/sys/class/thermal/thermal_zoneN/temp` is the primary interface for detecting GPU and CPU throttling on Jetson; inference benchmarks should poll this to correlate latency spikes with temperature events.
- Custom FPGA PCIe accelerators require out-of-tree `.ko` modules compiled against the target LTS kernel; DKMS manages rebuilds across point releases automatically.
- openpilot Agnos pins its kernel to maintain camera ISP register map compatibility with comma 3X hardware; upstream kernel changes would break the V4L2 subdevice interface.
- `/proc/interrupts` is the first diagnostic for interrupt storm conditions in high-framerate camera pipelines — per-CPU counts per IRQ line reveal unbalanced distribution before it becomes a throughput problem.
- The monolithic architecture means an FPGA DMA driver crash kernel-panics the entire system; production AI hardware deployments invest in IOMMU protection and driver fault injection testing to contain faults before deployment.