Security and OTA¶
Phase 4 — Track B — Nvidia Jetson · Module 6 of 7
Focus: Harden a Jetson Orin Nano 8GB product for field deployment: secure boot (PKC/SBK fuse programming), OP-TEE trusted execution, disk encryption, A/B rootfs redundancy, and a complete OTA update pipeline with rollback, monitoring, and reliability testing.
Primary hardware: Jetson Orin Nano 8GB on custom or dev-kit carrier
Previous: 5. Application Development · Next: 7. Compliance and Manufacturing
Table of Contents¶
- Why Security and OTA Are a Single Module
- Threat Model for Deployed Jetson Products
- Secure Boot — PKC and SBK
- Fuse Programming Workflow
- OP-TEE and Trusted Execution
- Disk Encryption (LUKS + OP-TEE Key Storage)
- Runtime Hardening
- A/B Rootfs Redundancy Architecture
- OTA Pipeline Design
- OTA Frameworks on Jetson (SWUpdate, Mender, RAUC)
- Bootloader and Firmware OTA
- Container-Layer OTA
- Delta Updates and Bandwidth Engineering
- Rollback and Power-Fail Safety
- OTA Signing and Verification Chain
- Fleet Monitoring and Telemetry
- Reliability Testing (HALT/HASS, Soak, Power-Cycle)
- Projects
- Resources
1. Why security and OTA are a single module¶
Security and OTA are tightly coupled in production Jetson products:
- Secure boot ensures only authorized firmware and kernel run on the device
- OTA channels must be signed so an attacker cannot push malicious updates
- Disk encryption protects data at rest if the device is physically stolen
- A/B redundancy makes OTA safe — a failed update rolls back to the last known-good slot
- Fuse programming is irreversible and must be planned before production flashing
This module brings together material that was previously split across deep-dive sub-guides in Module 1 and organizes it as a production decision flow: what to do, in what order, and why.
Deep-dive companions (Module 1): The sub-guides below contain implementation-level detail that this module references but does not duplicate: - Orin Nano Security — threat model, attack surfaces, detailed PKC/SBK walkthrough - Orin Nano OTA Deep Dive — NVIDIA
nv_update_engine, slot management internals - Orin Nano Rootfs and A/B Redundancy — partition layout,nvbootctrl, slot switching
2. Threat model for deployed Jetson products¶
Before hardening, define what you're protecting against:
| Threat | Example | Mitigation |
|---|---|---|
| Firmware tampering | Attacker replaces bootloader via physical access | Secure boot (PKC) — hardware root of trust |
| Data theft | Device stolen, NVMe removed and read | Disk encryption (LUKS + OP-TEE key) |
| Unauthorized updates | Malicious OTA pushed to device | Signed update images + TLS channel |
| Network intrusion | Attacker exploits open port on device | Firewall, minimal services, SSH key-only |
| Privilege escalation | Application container escapes to host | AppArmor/SELinux, minimal rootfs, no root SSH |
| Supply chain | Counterfeit module or tampered firmware image | Fuse-based device identity, signed factory images |
3. Secure boot — PKC and SBK¶
Secure boot chain¶
Jetson Orin Nano (T234) supports a hardware root of trust using fuses:
BootROM (silicon)
│ Verifies MB1 signature using PKC hash in fuses
│
MB1 (QSPI)
│ Verifies MB2 using PKC chain
│
MB2 (QSPI)
│ Verifies UEFI, OP-TEE, kernel
│
UEFI → Linux kernel → userspace
PKC (Public Key Cryptography)¶
- Generate an RSA-2048 or RSA-3072 key pair
- Burn the public key hash into OTP fuses
- All firmware images are signed with the private key
- BootROM verifies the signature chain at every stage
# Generate PKC key pair
openssl genrsa -out pkc.pem 3072
# Extract public key
openssl rsa -in pkc.pem -pubout -out pkc_pub.pem
SBK (Symmetric Binding Key)¶
- 128-bit AES key burned into OTP fuses
- Encrypts firmware images in addition to signing them
- Prevents reading firmware even with physical access to QSPI
- Combined with PKC: images are encrypted AND signed
4. Fuse programming workflow¶
Fuse programming is irreversible. Once fuses are burned, the device permanently requires signed/encrypted firmware.
Lab workflow (test keys)¶
- Generate test PKC and SBK keys (clearly labeled, stored separately from production keys)
- Flash and validate the device works with signed images
- Burn fuses on a test unit only
- Verify the fused device boots with signed images and rejects unsigned images
- Do not fuse production units until the entire boot/OTA chain is validated
Production workflow¶
- Generate production PKC and SBK keys
- Store private key in an HSM (Hardware Security Module) or at minimum an encrypted, access-controlled vault
- Integrate key signing into the CI/CD build pipeline
- Program fuses as part of factory flashing (see Module 8 — Compliance and Manufacturing)
Fuse burning command¶
Verifying fuse state¶
5. OP-TEE and trusted execution¶
OP-TEE provides a Trusted Execution Environment (TEE) on the Cortex-A78AE cores:
- Secure world runs OP-TEE OS alongside normal world Linux
- Trusted Applications (TAs) execute in the secure world
- Use cases: key storage, attestation, DRM, secure sensor processing
Jetson OP-TEE integration¶
NVIDIA ships OP-TEE as part of the L4T BSP. It is loaded during the boot chain (MB2 loads OP-TEE before UEFI).
| Component | Location |
|---|---|
| OP-TEE OS image | tos-optee_t234.img in flash partition |
| Trusted Applications | Built into the OP-TEE image or loaded at runtime |
| TEE client library | libteec.so (in rootfs) |
| TEE supplicant | tee-supplicant daemon |
Key storage with OP-TEE¶
Store disk encryption keys in OP-TEE secure storage rather than in plaintext on the filesystem:
- OP-TEE generates or receives the LUKS key
- Key is stored in OP-TEE secure storage (RPMB or encrypted file)
- At boot, OP-TEE retrieves the key and passes it to the kernel for LUKS unlock
- The key never appears in normal-world memory
6. Disk encryption (LUKS + OP-TEE key storage)¶
Why encrypt¶
Devices deployed in the field can be physically accessed. Without disk encryption, an attacker can:
- Remove the NVMe SSD and read all data on a different machine
- Extract ML models, credentials, customer data
- Clone the device by copying the filesystem
LUKS setup on Jetson¶
# Create encrypted partition
sudo cryptsetup luksFormat /dev/nvme0n1p1
# Open (unlock) the partition
sudo cryptsetup open /dev/nvme0n1p1 crypt_root
# Format and mount
sudo mkfs.ext4 /dev/mapper/crypt_root
sudo mount /dev/mapper/crypt_root /mnt
Auto-unlock with OP-TEE¶
For headless devices, manual passphrase entry is not practical. Use OP-TEE to auto-unlock at boot:
- Store the LUKS key in OP-TEE secure storage during factory provisioning
- At boot, an initrd script calls the OP-TEE client to retrieve the key
- The key unlocks the LUKS partition
- The key is not accessible from normal-world after boot
Performance impact¶
Disk encryption adds CPU overhead for I/O:
| Workload | Without encryption | With LUKS (AES-256-XTS) | Overhead |
|---|---|---|---|
| Sequential read | ~3.2 GB/s | ~2.4 GB/s | ~25% |
| Sequential write | ~2.8 GB/s | ~2.1 GB/s | ~25% |
| Random 4K read | ~180K IOPS | ~140K IOPS | ~22% |
The Cortex-A78AE has ARMv8 crypto extensions, so AES is hardware-accelerated. The overhead is acceptable for most edge AI workloads where the bottleneck is GPU inference, not storage I/O.
7. Runtime hardening¶
Service minimization¶
# List all enabled services
systemctl list-unit-files --state=enabled
# Disable unnecessary services
sudo systemctl disable bluetooth
sudo systemctl disable avahi-daemon
sudo systemctl disable cups
sudo systemctl mask <service> # prevent re-enabling
SSH hardening¶
# /etc/ssh/sshd_config
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers deploy
MaxAuthTries 3
Firewall¶
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw allow 443/tcp # OTA update channel
sudo ufw enable
AppArmor¶
JetPack ships with AppArmor support. Create profiles for your application containers and services to restrict file access, network access, and capabilities.
Debug UART in production¶
Disable or restrict the debug UART on production devices. An exposed UART gives root shell access. Options:
- Remove the debug UART header from the production carrier PCB (recommended)
- Disable UART console in the kernel command line (
console=set tottyS0with no getty) - Protect with password (less secure, still exposes boot log)
8. A/B rootfs redundancy architecture¶
A/B rootfs allows safe updates: write the new image to the inactive slot, validate, then switch.
Partition layout¶
QSPI NOR (boot):
├─ mb1_a / mb1_b (A/B bootloader stage 1)
├─ mb2_a / mb2_b (A/B bootloader stage 2)
├─ uefi_a / uefi_b (A/B UEFI)
├─ tos_a / tos_b (A/B OP-TEE)
└─ bpmp_a / bpmp_b (A/B BPMP firmware)
NVMe:
├─ APP_a (rootfs slot A)
├─ APP_b (rootfs slot B)
└─ DATA (persistent user data, not duplicated)
Slot management¶
# Check active slot
sudo nvbootctrl get-current-slot
# Set next boot slot
sudo nvbootctrl set-active-boot-slot <0|1>
# Mark current slot as successful (after validation)
sudo nvbootctrl set-slot-as-successful <0|1>
Sizing trade-offs¶
A/B doubles the rootfs storage requirement. For a 4 GB rootfs:
- Single slot: 4 GB
- A/B: 8 GB + ~1 GB overhead = ~9 GB
- With persistent data partition: 9 GB + DATA size
Plan NVMe capacity accordingly. A 128 GB NVMe gives ample room; a 32 GB eMMC may be tight.
Deep dive: Orin Nano Rootfs and A/B Redundancy
9. OTA pipeline design¶
End-to-end architecture¶
Build server (CI/CD)
│ Build rootfs image, sign with PKC/OTA key
│
Staging server
│ Host signed images, manage release channels
│ (stable, beta, canary)
│
├─ TLS ─────────────────────────────────────────┐
│ │
Device agent │
│ Poll for updates or receive push notification │
│ Download image, verify signature │
│ Write to inactive slot │
│ Reboot into new slot │
│ Run validation tests │
│ Mark slot as successful (or rollback) │
│ Report status back to server │
└────────────────────────────────────────────────┘
Release channel strategy¶
| Channel | Purpose | Rollout |
|---|---|---|
| canary | Internal testing, nightly builds | 1–5 devices |
| beta | Early adopter or staging fleet | 5–10% of fleet |
| stable | Production | 100% of fleet (phased rollout) |
10. OTA frameworks on Jetson (SWUpdate, Mender, RAUC)¶
| Framework | Strengths | Jetson integration |
|---|---|---|
| SWUpdate | Flexible, scriptable, double-copy and single-copy modes, signed images, Lua handlers | Good — integrates with nvbootctrl for slot management |
| Mender | Managed cloud service, device dashboard, fleet management | Good — Mender Hub has Jetson examples |
| RAUC | D-Bus API, slot status tracking, bundle signing | Good — RAUC bundles can be configured for Jetson A/B |
| NVIDIA nv_update_engine | Native Jetson tool, partition-level updates | Built-in but limited fleet management |
Recommended approach¶
Use SWUpdate or Mender as the device-side agent, with NVIDIA's partition tools (nvbootctrl, nv_update_engine) for bootloader and slot operations. This gives you both ecosystem-standard OTA features and Jetson-specific boot integration.
11. Bootloader and firmware OTA¶
Updating QSPI-resident firmware (MB1, MB2, UEFI, OP-TEE, BPMP, SPE) in the field:
- Use A/B slot support in QSPI — write new firmware to the inactive slot
- NVIDIA's
nv_update_enginehandles partition-level writes - Risk: A failed bootloader update can brick the device if not properly A/B managed
- Mitigation: Always update bootloader before rootfs; validate bootloader boots before switching rootfs slot
12. Container-layer OTA¶
For applications running in Docker containers (as set up in Module 2 — L4T Customization):
- Update container images independently of the rootfs
- Use a private container registry (Harbor, AWS ECR, or local)
- Pull new images, stop old container, start new container
- Rollback: Keep the previous image tag;
docker rollbackor restart with old tag
Layer caching¶
Docker layer caching reduces download size — only changed layers are pulled. Structure your Dockerfile so that:
- Base L4T image (large, changes rarely) is the bottom layer
- System dependencies (changes occasionally) in middle layers
- Application code (changes frequently) in the top layer
13. Delta updates and bandwidth engineering¶
For devices on cellular or bandwidth-constrained networks:
| Technique | Savings | Complexity |
|---|---|---|
| Full image | 0% (baseline) | Low |
| bsdiff / zstd delta | 60–90% | Medium — requires generating deltas per version pair |
| casync (content-addressable chunks) | 70–95% | Medium — chunk store on server, only new chunks downloaded |
| Container layer diff | 80–95% (for app updates) | Low — built into Docker |
Practical recommendation¶
- Use full rootfs images for the initial fleet (<100 devices) — simpler, less infrastructure
- Add delta updates when bandwidth costs become significant
- Use container-layer updates for application changes (most common update type)
14. Rollback and power-fail safety¶
Power-fail during OTA¶
If power fails during an update:
| Phase | Power-fail result | Recovery |
|---|---|---|
| Downloading image | Incomplete download | Resume or re-download on next boot |
| Writing to inactive slot | Partially written slot | Inactive slot is corrupt but active slot is untouched — device boots normally |
| Rebooting into new slot | Boot interrupted | Watchdog or retry counter triggers rollback to previous slot |
| Validation running | Validation incomplete | Slot not marked as successful — next reboot falls back |
Watchdog timer¶
Configure a hardware watchdog to trigger reboot if the system hangs during post-update validation:
# Enable watchdog
echo 1 > /dev/watchdog
# Application must pet the watchdog periodically
# If it fails to pet within timeout → hardware reset → rollback
Retry counter¶
Use a boot counter in nvbootctrl or a custom persistent flag:
- Before switching to new slot, set retry counter = 3
- Each boot attempt decrements the counter
- If the counter reaches 0 without the slot being marked successful → revert to old slot
15. OTA signing and verification chain¶
Build server:
├─ Build rootfs image
├─ Compute SHA-256 hash of image
├─ Sign hash with OTA signing key (RSA-3072 or Ed25519)
└─ Bundle: image + signature + metadata (version, channel, timestamp)
Device:
├─ Download bundle
├─ Verify signature using embedded OTA public key
├─ Verify hash matches image
├─ Check version > current (prevent downgrade)
└─ Apply update
Key management¶
- OTA signing key is separate from the secure boot PKC key
- Store OTA private key in CI/CD secrets or HSM
- Embed OTA public key in the rootfs (or in a read-only partition)
- Rotate OTA keys by shipping the new public key in a signed update before switching
16. Fleet monitoring and telemetry¶
| Metric | Source | Purpose |
|---|---|---|
| Boot count | Persistent counter | Detect reboot loops |
| Update status | OTA agent | Track rollout success/failure rate |
| Slot status | nvbootctrl |
Detect devices stuck on old slot |
| Temperature | tegrastats |
Detect thermal issues in the field |
| Disk usage | df |
Prevent devices from running out of space |
| Uptime | /proc/uptime |
Detect unexpected reboots |
| Application health | Health check endpoint | Detect application-level failures |
Telemetry stack¶
- Device side: Lightweight agent (custom, or Telegraf/Fluent Bit) reports metrics via MQTT or HTTPS
- Server side: Time-series database (InfluxDB, Prometheus) + dashboard (Grafana)
- Alerts: Set thresholds for reboot loops, update failures, temperature anomalies
17. Reliability testing (HALT/HASS, soak, power-cycle)¶
Before shipping, validate that the device survives real-world conditions:
Power-cycle endurance¶
#!/bin/bash
# Automated power-cycle test (requires controllable power supply or relay)
for i in $(seq 1 1000); do
power_off
sleep 5
power_on
wait_for_boot # timeout → fail
run_smoke_test # basic peripheral check
log_result $i
done
Target: 1000 power cycles with zero boot failures.
Thermal cycling¶
| Parameter | Consumer | Industrial |
|---|---|---|
| Temperature range | 0 to +45 C | -20 to +70 C |
| Ramp rate | 5 C/min | 10 C/min |
| Soak time at extreme | 15 min | 30 min |
| Cycles | 100 | 500 |
Soak testing¶
- Run the production workload (AI inference + OTA checks + telemetry) continuously for 7–14 days
- Monitor for memory leaks, file descriptor leaks, disk space growth, temperature drift
- Log all metrics and compare day 1 vs day 14
HALT (Highly Accelerated Life Test)¶
- Push beyond spec limits (temperature, vibration, voltage) to find design margins
- Not a pass/fail test — it reveals where the design breaks
- Performed on 5–10 units early in the design cycle
18. Projects¶
- Secure boot lab: Generate test PKC/SBK keys, sign a full JetPack image, flash a dev kit with secure boot enabled, then verify that unsigned images are rejected.
- End-to-end OTA pipeline: Set up SWUpdate on a Jetson Orin Nano with A/B rootfs, a signing server, and a staging server. Push an update, verify rollback on intentional failure.
- Power-cycle soak: Build a test jig with a relay-controlled power supply, run 1000 power-cycle boot tests, log results, and analyze any failures.
- Fleet dashboard: Deploy 3+ Jetson devices, set up MQTT telemetry + Grafana dashboard, monitor boot count, temperature, and OTA status across the fleet.
19. Resources¶
| Resource | Description |
|---|---|
| NVIDIA Jetson Security Guide | Official secure boot, fuse programming, encryption documentation |
| Orin Nano Security deep dive | Detailed PKC/SBK walkthrough, threat model (Module 1 companion) |
| Orin Nano OTA deep dive | nv_update_engine, slot management internals (Module 1 companion) |
| Orin Nano Rootfs and A/B Redundancy | Partition layout, nvbootctrl (Module 1 companion) |
| SWUpdate (sbabic.github.io/swupdate) | Open-source OTA framework |
| Mender (mender.io) | Managed OTA platform with Jetson support |
| RAUC (rauc.io) | Robust Auto-Update Controller |
| OP-TEE (optee.org) | Open-source Trusted Execution Environment |