Orin Nano 8GB — Root File System & A/B Redundancy¶
Scope: Production-level understanding of Jetson rootfs architecture, the three rootfs flavors, A/B slot redundancy, OTA update strategy, boot validation, partition layout, and how rootfs connects to the memory and boot chain architecture.
Prerequisites: Familiarity with the Orin Nano boot chain and memory architecture.
Table of Contents¶
- L4T Root File System Overview
- Rootfs Generation and BSP Structure
- The Three Rootfs Flavors
- Rootfs Customization for Production
- Partition Layout on Orin Nano
- A/B Rootfs Redundancy
- Boot Flow With A/B Enabled
- Slot States and nvbootctrl
- Boot Validation Services
- Partition Size With A/B Enabled
- UUID-Based Partition Mounting
- OTA Update Strategy With A/B
- Flash XML Layout Files
- Rootfs and Memory Architecture Connection
- Designing Safe Field Updates for AI Edge Devices
- Debugging Bootloops in A/B Systems
- Production Hardening Checklist
- References
1. L4T Root File System Overview¶
On Jetson, the root file system (rootfs) is the entire Linux filesystem tree:
/
├── bin ← core binaries
├── lib ← shared libraries (including NVIDIA drivers)
├── etc ← system configuration
├── usr ← user programs, CUDA toolkit, TensorRT
├── opt ← NVIDIA-specific tools
├── home ← user data
└── var ← logs, runtime data
Jetson Linux (L4T — Linux for Tegra) is:
- Based on Ubuntu (20.04 focal for JetPack 5.x, 22.04 jammy for JetPack 6.x)
- Customized with NVIDIA BSP (Board Support Package)
- Includes CUDA, cuDNN, TensorRT, multimedia stack, and Jetson-specific drivers
- Kernel is NVIDIA-patched (not mainline)
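The rootfs is what gets written to the APP partition on the boot storage device. On Orin Nano this is typically an NVMe SSD (or a microSD card on the developer kit), since Orin Nano modules have no onboard eMMC.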
2. Rootfs Generation and BSP Structure¶
BSP Directory Layout¶
After extracting the L4T BSP, the directory structure is:
Linux_for_Tegra/
├── bootloader/ ← MB1, MB2, UEFI, firmware blobs
├── kernel/ ← Image, DTB, modules
├── rootfs/ ← Ubuntu root filesystem (initially empty)
├── tools/
│ └── samplefs/
│ └── nv_build_samplefs.sh ← rootfs generator
├── nv_tegra/ ← NVIDIA drivers, configs
├── flash.sh ← flashing script
└── apply_binaries.sh ← applies NVIDIA binaries to rootfs
Building a Rootfs¶
# Step 1: Generate base Ubuntu rootfs
cd Linux_for_Tegra/tools/samplefs/
sudo ./nv_build_samplefs.sh --abi aarch64 --distro ubuntu --version focal
# Step 2: Copy to rootfs directory
sudo cp <generated>.tbz2 ../../rootfs/
cd ../../rootfs/
sudo tar xpf <generated>.tbz2
# Step 3: Apply NVIDIA binaries (drivers, CUDA, etc.)
cd ..
sudo ./apply_binaries.sh
# Step 4: Flash
sudo ./flash.sh jetson-orin-nano-devkit internal
apply_binaries.sh installs:
- nvgpu kernel module
- CUDA runtime libraries
- Multimedia libraries (nvbufsurface, nvargus, etc.)
- Jetson-specific systemd services
- Device tree overlays
3. The Three Rootfs Flavors¶
NVIDIA provides three rootfs configurations. Choosing the right one is a critical production decision.
Desktop¶
- Full Ubuntu GUI (GNOME/GDM3)
- OEM setup wizard on first boot
- Includes: desktop apps, browser, text editor, file manager
- Size: ~4–6GB
- Use case: development boards, prototyping, demos
Minimal¶
- No GUI — SSH / UART access only
- Core system utilities
- Smaller footprint (~1.5–2GB)
- Use case: robotics, industrial vision, production edge systems
Basic¶
- Smallest footprint (~800MB–1.2GB)
- Includes Docker/container runtime dependencies
- Designed for container-based deployment (all apps run in containers)
- Use case: cloud-native edge AI, fleet-managed devices
Which to Use in Production¶
| Scenario | Recommended | Why |
|---|---|---|
| Development / prototyping | Desktop | Easy setup, GUI debugging tools |
| Single-purpose AI device | Minimal | Small, predictable, low attack surface |
| Fleet of managed devices | Basic + Docker | Containerized updates, reproducible |
| Autonomous robot/vehicle | Minimal | Tight control, custom services only |
Production AI edge systems almost always use minimal or basic. A desktop rootfs wastes storage and memory and enlarges the attack surface.
4. Rootfs Customization for Production¶
Removing Unnecessary Packages¶
# After generating rootfs, chroot into it
sudo mount --bind /dev rootfs/dev
sudo mount --bind /proc rootfs/proc
sudo mount --bind /sys rootfs/sys
sudo chroot rootfs
# Remove packages
# Quote the glob so the shell passes it to apt unexpanded
apt remove --purge snapd thunderbird 'libreoffice-*'
apt autoremove
# Exit chroot
exit
sudo umount rootfs/{dev,proc,sys}
Adding Production Packages¶
Common additions for edge AI systems:
# Inside chroot
# chrony: time sync (critical for sensor fusion)
# watchdog: hardware watchdog daemon
# logrotate: prevents log storage exhaustion
apt install -y \
openssh-server \
python3-pip \
docker.io \
chrony \
watchdog \
logrotate
Read-Only Rootfs¶
For maximum reliability, make rootfs read-only with an overlay for writable data:
/dev/nvme0n1p1 (APP) → mount read-only at /
tmpfs → overlay for /tmp, /var/run
/dev/nvme0n1p2 (data) → mount read-write at /data
Benefits:
- Survives unexpected power loss without filesystem corruption
- Prevents log accumulation from filling storage
- Forces clean separation of system vs. application data
5. Partition Layout on Orin Nano¶
Default Layout (No A/B)¶
QSPI NOR Flash:
mb1, mb2, uefi, firmware blobs, BCT, boot config
NVMe / eMMC:
APP → rootfs (ext4)
[optional] → data partition
Layout With A/B Enabled¶
QSPI NOR Flash:
mb1 mb1_b
mb2 mb2_b
uefi uefi_b
BCT BCT_b
(each firmware blob duplicated)
NVMe / eMMC:
APP → rootfs A (ext4)
APP_b → rootfs B (ext4)
kernel → kernel A
kernel_b → kernel B
kernel-dtb → DTB A
kernel-dtb_b → DTB B
Every component is duplicated — bootloader, kernel, DTB, and rootfs. This ensures a complete fallback if any single component is corrupted.
Viewing Current Partition Layout¶
# On a running Jetson
lsblk
sudo fdisk -l /dev/nvme0n1
# Or check flash layout XML before flashing
cat Linux_for_Tegra/bootloader/generic/cfg/flash_t234_qspi.xml
6. A/B Rootfs Redundancy¶
Why It Exists¶
Imagine deploying hundreds of AI edge devices in the field — traffic cameras, industrial inspection systems, autonomous robots. If an OTA update breaks the system and there is no redundancy:
- Device is bricked
- Physical access required to reflash
- Downtime, truck rolls, customer impact
With A/B redundancy:
- Update is written to the inactive slot
- System reboots into the updated slot
- If boot succeeds, the slot is marked good
- If boot fails, automatic rollback to the previous slot
This is the same approach used in automotive ECUs (ISO 26262), Android devices, and industrial PLCs.
How Slots Are Paired¶
Each slot is a complete bootable system:
Slot A:                          Slot B:
Bootloader A (MB1, MB2, UEFI)    Bootloader B
Kernel A                         Kernel B
DTB A                            DTB B
Rootfs A (APP)                   Rootfs B (APP_b)
The pairing is critical — you cannot mix Slot A bootloader with Slot B rootfs. This would cause driver mismatches, module load failures, and potentially a non-booting system.
Enabling A/B¶
Set ROOTFS_AB=1 in the environment of the flash command, for example: sudo ROOTFS_AB=1 ./flash.sh jetson-orin-nano-devkit internal
This cannot be enabled after flashing: the partition table must be created with dual slots from the start.
7. Boot Flow With A/B Enabled¶
Normal Boot (Slot A Active)¶
Power On
↓
BootROM (reads fuses, selects boot media)
↓
MB1 (Slot A — DRAM training, power rails)
↓
MB2 (Slot A — loads UEFI A)
↓
UEFI (Slot A — checks slot state, loads kernel A)
↓
Kernel A (from kernel partition A)
↓
Mounts rootfs A (APP partition)
↓
systemd → validation services → mark slot A successful
Boot Failure Recovery¶
If Slot A fails to boot (kernel panic, hang, validation failure):
Boot attempt 1 → Slot A → FAIL
Boot attempt 2 → Slot A → FAIL
Boot attempt 3 → Slot A → FAIL (retry count exhausted)
↓
cpu-bootloader marks Slot A invalid
↓
Switch to Slot B
↓
MB1 B → MB2 B → UEFI B → Kernel B → Rootfs B
↓
System boots from Slot B (rollback complete)
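The retry-and-fallback sequence above can be modeled with a small simulation. This is only an illustrative sketch, not the bootloader's actual logic: the function name simulate_boot, its arguments, and the "ok"/"fail" health values are hypothetical, and the retry budget of 3 is taken from the diagram.

```shell
#!/usr/bin/env bash
# Toy model of A/B fallback; NOT the real UEFI / cpu-bootloader implementation.
# simulate_boot RETRIES HEALTH_A HEALTH_B
#   RETRIES  : boot attempts allowed per slot (assumed value, e.g. 3)
#   HEALTH_* : "ok" if that slot boots and validates, anything else means failure
simulate_boot() {
    local retries=$1 health_a=$2 health_b=$3
    local slot health attempt
    for slot in A B; do
        if [ "$slot" = "A" ]; then health=$health_a; else health=$health_b; fi
        attempt=1
        while [ "$attempt" -le "$retries" ]; do
            # Trace each attempt on stderr so stdout carries only the result.
            echo "attempt $attempt on slot $slot: $health" >&2
            if [ "$health" = "ok" ]; then
                echo "booted:$slot"
                return 0
            fi
            attempt=$((attempt + 1))
        done
        # Retry budget exhausted: treat this slot as invalid and try the next.
    done
    echo "recovery"
    return 1
}
```

Calling simulate_boot 3 fail ok walks through three failed attempts on Slot A before reporting a boot from Slot B, which mirrors the rollback path in the diagram.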
If Both Slots Fail¶
If both Slot A and Slot B fail to boot:
Both slots exhausted
↓
Recovery kernel (if configured)
↓
Minimal recovery environment
↓
Requires manual intervention (reflash via USB)
This is why production systems must have robust validation — you want to catch failures in Slot A and roll back to a known-good Slot B, not exhaust both slots.
8. Slot States and nvbootctrl¶
Slot Attributes¶
Each slot has four attributes:
| Attribute | Meaning |
|---|---|
| Active | The slot UEFI will boot on next reboot |
| Current | The slot currently running |
| Bootable | The slot contains a valid OS image |
| Retry count | Remaining boot attempts before marking failed |
nvbootctrl Commands¶
# Dump current slot information
sudo nvbootctrl -t rootfs dump-slots-info
# Example output:
# Current slot: A
# Slot A:
# Priority: 15
# Suffix: _a
# Retry count: 7
# Boot successful: 1
# Slot B:
# Priority: 14
# Suffix: _b
# Retry count: 7
# Boot successful: 1
# Check which slot is currently running
sudo nvbootctrl -t rootfs get-current-slot
# Set Slot B as the next boot target
sudo nvbootctrl -t rootfs set-active-boot-slot 1
# Mark current slot as successfully booted
sudo nvbootctrl -t rootfs mark-boot-successful
The -t rootfs flag targets the rootfs slots. Without it, nvbootctrl operates on the bootloader slots.
9. Boot Validation Services¶
Two critical systemd services run after boot to validate the slot. Understanding these prevents a very common production mistake.
nv-l4tbootloader-config.service¶
- Runs early in the boot process
- Calls nvbootctrl verify to validate bootloader integrity
- Marks the slot as successfully booted if verification passes
l4t-rootfs-validation-config.service¶
- Custom validation hook — you add your own health checks here
- Default: minimal checks (filesystem mounted, basic services running)
- Production: add application-specific checks (inference engine starts, camera opens, network connects)
The Common Production Mistake¶
If you reboot before validation services complete:
- System boots into Slot A
- Validation service starts but hasn't finished
- You reboot (manual or watchdog)
- Retry count decrements
- Once the retry count is exhausted (3 attempts in the flow shown earlier), the system marks Slot A as failed
- Switches to Slot B (unintended rollback)
Prevention:
- Ensure validation services complete before any reboot
- Set retry count high enough for your boot time
- Add explicit health checks before calling mark-boot-successful
Custom Validation Example¶
Create a production validation script:
#!/bin/bash
# /opt/nvidia/validation/validate.sh
# Check CUDA is functional
nvidia-smi > /dev/null 2>&1 || exit 1
# Check camera opens
v4l2-ctl --device=/dev/video0 --all > /dev/null 2>&1 || exit 1
# Check inference engine
python3 -c "import tensorrt" > /dev/null 2>&1 || exit 1
# Check network connectivity
ping -c 1 -W 5 8.8.8.8 > /dev/null 2>&1 || exit 1
# All checks passed — mark boot successful
nvbootctrl -t rootfs mark-boot-successful
Wire this into l4t-rootfs-validation-config.service so the slot is only marked good when your application stack is confirmed working.
10. Partition Size With A/B Enabled¶
When A/B is enabled, the space set by ROOTFSSIZE is split in half between APP and APP_b. For example:
| ROOTFSSIZE | APP (Slot A) | APP_b (Slot B) |
|---|---|---|
| 28 GiB | 14 GiB | 14 GiB |
| 56 GiB | 28 GiB | 28 GiB |
| 128 GiB | 64 GiB | 64 GiB |
Storage Planning Mistakes¶
If you forget that A/B halves your rootfs:
- Rootfs fills up during operation
- Docker images fail to pull
- OTA updates fail (no space for new image)
- Log files exhaust remaining space
- System becomes unstable
Production Storage Budget (14 GiB Slot Example)¶
| Component | Size |
|---|---|
| Minimal rootfs (L4T + NVIDIA) | ~2 GB |
| CUDA + TensorRT libraries | ~2 GB |
| Application code | ~500 MB |
| TensorRT engines | ~500 MB |
| Docker images (if used) | ~2–4 GB |
| Logs + temp (with rotation) | ~1 GB |
| Free space buffer | ~4–6 GB |
Use log rotation, tmpfs for /tmp, and minimal rootfs to maximize available space.
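As a rough check of the budget above, the arithmetic can be scripted. This is a minimal sketch: the helper name remaining_mib is hypothetical, and it treats the table's approximate GB and MB figures as MiB, so the result is an estimate rather than an exact free-space value.

```shell
#!/usr/bin/env bash
# remaining_mib SLOT_GIB COMPONENT_MIB...
# Prints the MiB left in one slot after subtracting each component size.
remaining_mib() {
    local slot_gib=$1
    shift
    local remaining=$(( slot_gib * 1024 ))
    local component
    for component in "$@"; do
        remaining=$(( remaining - component ))
    done
    echo "$remaining"
}

# Upper-end figures from the table (Docker at ~4 GB), in approximate MiB:
# rootfs 2000, CUDA/TensorRT 2000, app 500, engines 500, Docker 4000, logs 1000
remaining_mib 14 2000 2000 500 500 4000 1000
```

With Docker at the lower figure of about 2 GB, the same call leaves roughly 2000 MiB more, which is where the table's 4 to 6 GB buffer range comes from.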
11. UUID-Based Partition Mounting¶
Why UUID Is Mandatory With A/B¶
When A/B is enabled, device names such as /dev/nvme0n1p1 are not a reliable way to identify a slot's rootfs.
The kernel command line and fstab must instead identify the correct rootfs by UUID, in the form root=UUID=<uuid>.
Where UUIDs Are Stored¶
Jetson stores partition UUIDs in the bootloader directory:
Linux_for_Tegra/bootloader/l4t-rootfs-uuid.txt ← Slot A UUID
Linux_for_Tegra/bootloader/l4t-rootfs-uuid.txt_b ← Slot B UUID
UEFI reads these during boot to pass the correct root=UUID= parameter to the kernel.
Checking Current UUID¶
On a running device, sudo blkid lists the UUID of each partition, and cat /proc/cmdline shows the root= value the kernel was booted with.
What Happens If UUID Is Wrong¶
- Kernel fails to mount rootfs
- Boot drops to initramfs emergency shell
- If retry count exhausts, system switches to other slot
- If both UUIDs wrong, system is unbootable
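To confirm that a device was booted with a UUID-style root parameter, the kernel command line can be inspected. The sketch below assumes bash; the helper names root_arg and uses_uuid are hypothetical, and the sample command line in the usage note is made up for illustration.

```shell
#!/usr/bin/env bash
# root_arg [CMDLINE]
# Prints the value of the root= parameter. With no argument, reads the
# running kernel's /proc/cmdline.
root_arg() {
    local cmdline=${1:-$(cat /proc/cmdline)}
    local -a words
    local word
    read -r -a words <<< "$cmdline"
    for word in "${words[@]}"; do
        case $word in
            root=*) echo "${word#root=}"; return 0 ;;
        esac
    done
    return 1
}

# uses_uuid [CMDLINE]
# Succeeds if root is given as UUID=... or PARTUUID=... instead of a device node.
uses_uuid() {
    case $(root_arg "$1") in
        UUID=*|PARTUUID=*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, uses_uuid 'root=/dev/nvme0n1p1 rw' fails, while uses_uuid 'root=UUID=1234-abcd rw' succeeds. A check like this could be one of the health checks run before marking a slot good.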
12. OTA Update Strategy With A/B¶
Correct OTA Flow¶
1. Device running on Slot A (current, verified)
2. Download new system image
3. Write new image to Slot B (inactive)
4. Update Slot B bootloader, kernel, DTB
5. Set Slot B as active: nvbootctrl -t rootfs set-active-boot-slot 1
6. Reboot
7. System boots into Slot B
8. Validation services run health checks
9. Mark Slot B successful: nvbootctrl -t rootfs mark-boot-successful
10. Slot B is now the running, verified slot
If boot into Slot B fails:
Retry count exhausts → automatic rollback to Slot A
Device continues running on previous known-good image
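The write, switch, and reboot steps of the flow above can be sketched as a dry-run script. Everything here is an assumption for illustration: the names run, ota_update and DRY_RUN are hypothetical, the image path and target device must be looked up on the real system, and writing the image with dd is only one possible method. The bootloader, kernel, and DTB update (step 4) is deliberately left out. Only the nvbootctrl invocation is taken from the text.

```shell
#!/usr/bin/env bash
# Prints each command by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# ota_update CURRENT_SLOT IMAGE TARGET_DEV
#   CURRENT_SLOT : 0 if Slot A is running, 1 if Slot B is running
#   IMAGE        : rootfs image file (hypothetical path)
#   TARGET_DEV   : block device of the inactive rootfs (hypothetical; never hard-code)
ota_update() {
    local current=$1 image=$2 target_dev=$3
    local inactive=$(( 1 - current ))
    # Write the new image to the inactive slot only.
    run dd if="$image" of="$target_dev" bs=4M conv=fsync
    # Make the freshly written slot the next boot target.
    run nvbootctrl -t rootfs set-active-boot-slot "$inactive"
    run reboot
}
```

Running ota_update 0 new.img /dev/example in dry-run mode prints the three commands without touching any device. After the reboot, the validation services decide whether mark-boot-successful is called.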
What NOT to Do¶
apt upgrade is NOT supported with A/B.
Debian package upgrades modify the running rootfs in place. This:
- Only updates the active slot
- Leaves the inactive slot stale
- Breaks the A/B invariant (both slots should be complete, independent images)
- Can leave the system in an inconsistent state
Use image-based OTA — write a complete rootfs image to the inactive slot.
OTA Methods¶
| Method | Pros | Cons |
|---|---|---|
| NVIDIA OTA tools | Official, tested | NVIDIA ecosystem lock-in |
| Custom image-based | Full control | More engineering effort |
| Mender / SWUpdate / RAUC | Open-source, fleet management | Integration effort with L4T |
| Container-based updates | Fast, no rootfs change needed | Base OS updates still needed |
Container-Based OTA (Recommended for Fleet)¶
For fleet-managed devices, a hybrid approach works well:
- Base rootfs: Minimal or Basic, rarely updated (quarterly)
- Application: Runs in Docker containers, updated frequently
- A/B: Protects the base rootfs during rare OS updates
- Container registry: Pulls new inference containers on schedule
This minimizes A/B partition switches while keeping application updates fast and safe.
13. Flash XML Layout Files¶
The partition layout is defined in XML files that flash.sh reads during flashing.
Key Layout Files¶
Linux_for_Tegra/bootloader/generic/cfg/
├── flash_t234_qspi.xml ← QSPI NOR (bootloader partitions)
└── flash_t234_nvme.xml ← NVMe (rootfs, kernel, DTB)
Example Partition Entry (Simplified)¶
<partition name="APP" type="data">
<allocation_policy> sequential </allocation_policy>
<filesystem_type> basic </filesystem_type>
<size> 30064771072 </size> <!-- 28 GiB in bytes -->
<file_system_attribute> 0 </file_system_attribute>
<allocation_attribute> 0x8 </allocation_attribute>
<filename> system.img </filename>
</partition>
With ROOTFS_AB=1, the layout file is modified to include:
<partition name="APP" ...>
<size> 15032385536 </size> <!-- 14 GiB -->
<filename> system.img </filename>
</partition>
<partition name="APP_b" ...>
<size> 15032385536 </size> <!-- 14 GiB -->
<filename> system.img </filename>
</partition>
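The byte values in these size entries are plain GiB-to-byte conversions, halved when A/B is enabled. A minimal sketch of that arithmetic follows; the helper names gib_to_bytes and ab_slot_bytes are hypothetical, and 64-bit shell arithmetic (as in bash) is assumed.

```shell
#!/usr/bin/env bash
# gib_to_bytes GIB: prints the size in bytes (1 GiB = 1024^3 bytes)
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# ab_slot_bytes ROOTFSSIZE_GIB: prints the bytes available to each of APP and APP_b
ab_slot_bytes() {
    echo $(( $(gib_to_bytes "$1") / 2 ))
}

gib_to_bytes 28     # 30064771072, the single-slot APP size
ab_slot_bytes 28    # 15032385536, the size of each slot with A/B enabled
```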
Customizing Partition Layout¶
To add a persistent data partition:
<partition name="UDA" type="data">
<allocation_policy> sequential </allocation_policy>
<filesystem_type> basic </filesystem_type>
<size> 10737418240 </size> <!-- 10 GiB -->
<filename> uda.img </filename>
</partition>
This creates a data partition that survives A/B slot switches and OTA updates — use it for models, configuration, and application data.
Applying Custom Layout¶
# Flash with custom layout
sudo ROOTFS_AB=1 ./flash.sh -c my_custom_layout.xml jetson-orin-nano-devkit internal
14. Rootfs and Memory Architecture Connection¶
Rootfs and A/B slots interact with memory architecture in important ways.
Each Slot Has Its Own Kernel and DTB¶
With A/B enabled, each slot loads:
- Its own kernel (Image from kernel or kernel_b partition)
- Its own device tree (kernel-dtb or kernel-dtb_b)
- Its own kernel modules (from rootfs /lib/modules/)
If these are mismatched between slots, you get:
| Mismatch | Symptom |
|---|---|
| Kernel version != module version | modprobe: FATAL: Module not found |
| DTB != kernel | Wrong memory carveouts, boot crash |
| Old NVIDIA drivers with new kernel | GPU driver load failure, no CUDA |
| Mismatched CMA config | Camera buffer allocation failures |
Always update kernel + DTB + rootfs + modules as a complete atomic unit per slot.
Memory Carveouts Depend on DTB¶
Memory carveouts (BPMP, SPE, RCE, OP-TEE) are defined in the DTB. If Slot A and Slot B have different DTBs with different carveout configurations:
- Switching slots changes the memory map
- Firmware processors may malfunction if carveouts are wrong
- CMA size may differ, affecting camera buffer allocation
Production systems should use identical DTBs across both slots unless intentionally migrating to a new memory configuration.
Kernel Modules and Driver Stack¶
The rootfs contains NVIDIA kernel modules under /lib/modules/<version>/.
These must match the running kernel exactly. A rootfs with modules from JetPack 5.1 will not work with a kernel from JetPack 6.0.
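A quick way to catch a kernel and module mismatch is to check that a module directory exists for the kernel release in question. The sketch below is an assumption-laden example: modules_match is a hypothetical helper, and it only checks that the directory exists, not that every module inside it loads.

```shell
#!/usr/bin/env bash
# modules_match KERNEL_RELEASE [ROOT]
# Succeeds if ROOT/lib/modules/KERNEL_RELEASE is a directory.
# ROOT defaults to the running rootfs; pass a mount point to inspect another slot.
modules_match() {
    local release=$1 root=${2:-}
    [ -d "$root/lib/modules/$release" ]
}

# Check the running system.
if modules_match "$(uname -r)"; then
    echo "modules present for $(uname -r)"
else
    echo "no modules found for $(uname -r)"
fi
```

To check an inactive slot before switching to it, mount its rootfs somewhere and pass that mount point as ROOT, together with the release string of the kernel written to that slot (uname -r only describes the running kernel).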
15. Designing Safe Field Updates for AI Edge Devices¶
Update Architecture¶
Cloud / Update Server
↓ (image + manifest + signature)
Device Agent (runs on Jetson)
↓ (verify signature)
Write to inactive slot
↓
Set inactive slot as active
↓
Reboot
↓
Validation services
↓ (pass)                    ↓ (fail)
Mark successful             Automatic rollback
↓                           ↓
Report success to server    Report failure to server
Design Principles¶
- Atomic updates — entire slot (bootloader + kernel + DTB + rootfs) is updated together
- Signature verification — every image must be cryptographically signed; device verifies before writing
- Bandwidth efficiency — use delta updates (binary diff) when possible; full images for major versions
- Rollback is automatic — never require manual intervention for failed updates
- Health checks are application-specific — default validation is insufficient; add camera, CUDA, inference, and network checks
- Data partition survives updates — models, configuration, and logs live on a separate partition (UDA)
- Staged rollout — update 1% of fleet first, monitor, then expand
Update Failure Modes and Mitigations¶
| Failure Mode | Mitigation |
|---|---|
| Power loss during write | Only the inactive slot is being written, so the active slot stays intact |
| Corrupted download | Checksum verification before writing |
| New image doesn't boot | A/B rollback (automatic) |
| New image boots but app fails | Custom validation rejects slot |
| Both slots corrupted | Recovery kernel + USB reflash capability |
| Storage full | Pre-check free space before download |
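Two of the mitigations in the table, checksum verification and the free-space pre-check, are easy to sketch in shell. The helper names verify_sha256 and has_free_space are hypothetical. A SHA-256 comparison only detects corruption; it is not a substitute for the cryptographic signature check described in the design principles.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED_HEX
# Succeeds only if the file's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
    local file=$1 expected=$2 actual
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# has_free_space DIR NEEDED_KIB
# Succeeds if the filesystem holding DIR has at least NEEDED_KIB KiB available.
has_free_space() {
    local dir=$1 needed_kib=$2 avail_kib
    # -P forces the portable one-line-per-filesystem format, -k reports KiB.
    avail_kib=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kib" -ge "$needed_kib" ]
}
```

A device agent might call has_free_space before starting the download and verify_sha256 after it, writing to the inactive slot only if both succeed.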
16. Debugging Bootloops in A/B Systems¶
Symptoms¶
- Device reboots repeatedly
- Alternates between Slot A and Slot B
- Eventually stops booting (both slots exhausted)
Step 1 — Capture Serial Console¶
Connect UART (115200 baud) to the Jetson debug header. Serial output shows:
[UEFI] Slot A: retry_count=2, boot_successful=0
[UEFI] Booting Slot A...
[kernel] ... panic ...
[UEFI] Slot A: retry_count=1
Serial console is essential — without it, you're debugging blind.
Step 2 — Check Slot State¶
If you can get to a shell (recovery or brief boot window), dump the slot state with sudo nvbootctrl -t rootfs dump-slots-info.
Look for:
- retry_count: 0 (the slot has exhausted its retries)
- boot_successful: 0 (the slot was never validated)
Step 3 — Common Causes¶
| Cause | Diagnosis |
|---|---|
| Kernel panic | Serial log shows panic trace |
| GPU driver fails to load | dmesg \| grep nvgpu shows errors |
| Rootfs corrupted (fsck needed) | initramfs drops to emergency shell |
| Validation service marks slot failed | Service logs: journalctl -u nv-l4t* |
| Watchdog reboot before validation | Increase watchdog timeout or defer start |
| DTB mismatch | Wrong carveouts, devices not probed |
Step 4 — Recovery¶
If both slots are exhausted:
- Enter recovery mode (hold recovery button during power-on)
- Connect USB-C to host machine
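- Reflash using flash.sh, for example: sudo ./flash.sh jetson-orin-nano-devkit internal (prefix the command with ROOTFS_AB=1 to recreate the A/B layout)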
Prevention¶
- Always test OTA images in a staging environment before fleet deployment
- Set retry count to a reasonable value (3–7, not 1)
- Ensure validation services complete before any watchdog-triggered reboot
- Keep serial console accessible on production hardware (debug header or test points)
17. Production Hardening Checklist¶
Before deploying Jetson Orin Nano in the field:
Storage¶
- A/B enabled with correct ROOTFSSIZE
- Separate UDA data partition for persistent storage
- Log rotation configured (prevent storage exhaustion)
- Read-only rootfs with overlay (if applicable)
Boot and Recovery¶
- Custom validation service with application-specific health checks
- Retry count set appropriately (3–7)
- Recovery kernel configured
- Serial console accessible for debugging
OTA¶
- Image-based OTA (not apt upgrade)
- Cryptographic signature verification on images
- Staged rollout process defined
- Rollback tested and verified
Security¶
- Secure boot enabled (fuses burned)
- SSH keys deployed (password auth disabled)
- Unnecessary services removed
- Firewall configured
Memory¶
- CMA sized for workload (see Memory Architecture Guide)
- Memory budget calculated (cameras + model + runtime + OS)
- OOM behavior tested under load
Monitoring¶
- tegrastats or equivalent running
- CMA/buddyinfo logged periodically
- Thermal monitoring with alerts
- Remote health reporting to fleet management
18. References¶
- NVIDIA Jetson Linux Developer Guide — Bootloader — boot flow and A/B documentation
- NVIDIA Jetson Linux Developer Guide — Root File System — rootfs generation and customization
- NVIDIA Jetson Linux Developer Guide — OTA — OTA update mechanisms
- Mender for Jetson — open-source OTA for Jetson
- SWUpdate — software update framework for embedded Linux
- Main guide: Nvidia Jetson Platform Guide
- Memory deep dive: Orin Nano Memory Architecture