
AI Hardware Engineering — Roles and Market Analysis

Job titles, salary ranges, work arrangement data, and hiring priorities — organized by sub-layer so you can target a specific niche, not just a broad layer.

Data basis: US market, 2025–2026. Ranges reflect total compensation (base salary plus equity and bonus) at FAANG-adjacent companies and well-funded startups. Adjust down 20–30% for non-coastal or smaller companies.


Layer Map — 8 Layers, 23 Sub-Layers

| Layer | Sub-Layer | Focus | Typical Roles |
|---|---|---|---|
| L1 | L1a Inference Optimization | Model optimization, profiling, quantization | ML Inference Engineer, AI Performance Engineer |
| L1 | L1b Edge AI Deployment | On-device pipelines, Jetson/MCU, power-constrained | Edge AI Engineer, Embedded AI Engineer |
| L1 | L1c AI Application | Vision, robotics, solutions engineering | CV Engineer, Robotics AI Engineer |
| L1 | L1d Agentic AI & GenAI | LLM agents, RAG, tool use, GenAI applications | Agentic AI Engineer, GenAI Engineer, AI Engineer |
| L1 | L1e ML Engineering & MLOps | Training pipelines, model lifecycle, deployment infra | ML Engineer, MLOps Engineer, AI/ML Engineer |
| L2 | L2a Graph & IR Optimization | Fusion, memory planning, graph passes | DL Graph Optimization Engineer |
| L2 | L2b Compiler Backend | LLVM, MLIR, code generation, custom targets | AI Compiler Engineer, GPU Compiler Engineer |
| L2 | L2c Kernel Engineering | Triton, CUTLASS, Flash-Attention, hand-tuned kernels | Kernel Optimization Engineer, MTS Kernels |
| L3 | L3a GPU/Accelerator Runtime | CUDA runtime, TensorRT execution, DLA scheduling | GPU Runtime Engineer, Inference Platform Engineer |
| L3 | L3b Linux Kernel & Drivers | nvgpu, amdgpu, PCIe, DMA, IOMMU, device tree | Linux Kernel Engineer, Device Driver Engineer |
| L3 | L3c HPC Infrastructure | NCCL, Slurm, K8s+GPU, GPUDirect, multi-node | Distributed Runtime Engineer, Resource Scheduler |
| L4 | L4a Embedded Software | MCU, FreeRTOS, bare-metal, SPI/I2C/CAN | Embedded Software Engineer, RTOS Engineer |
| L4 | L4b Embedded Linux & BSP | Yocto, L4T, kernel modules, OTA, rootfs | Embedded Linux Engineer, BSP Engineer |
| L4 | L4c Automotive & IoT | ADAS firmware, ISO 26262, BLE/LoRa, cloud IoT | Automotive Embedded Engineer, IoT Engineer |
| L5 | L5a Accelerator Architecture | Systolic arrays, dataflow, tensor core design | AI Accelerator Architect, GPU Architect |
| L5 | L5b System & SoC Architecture | NoC, memory hierarchy, power domains, chiplet partitioning | SoC Architect, Memory Systems Architect |
| L6 | L6a RTL Design | SystemVerilog datapaths, FSMs, IP implementation | RTL Design Engineer, ASIC Design Engineer |
| L6 | L6b Design Verification | UVM, formal, constrained random, emulation | DV Engineer, Emulation Engineer |
| L6 | L6c FPGA & HLS | Vivado, HLS, FPGA prototyping, timing closure | FPGA Engineer, HLS Engineer |
| L7 | L7a Physical Design | P&R, floorplan, CTS, timing closure, signoff | Physical Design Engineer, STA Engineer |
| L7 | L7b DFT & CAD | Scan insertion, ATPG, tool flow automation | DFT Engineer, CAD Engineer |
| L8 | L8a Packaging & Process | CoWoS, chiplets, foundry interface, yield | Packaging Engineer, Process Engineer |
| L8 | L8b Silicon Validation | Post-silicon bring-up, characterization, ATE | Validation Engineer, Test Engineer |

L1a — Inference Optimization

Focus: Make ML models run faster — graph optimization, quantization, profiling, TensorRT/vLLM tuning.

| Title | What they do |
|---|---|
| ML Inference Optimization Engineer | Graph-level optimization, TensorRT engine builds, INT8/FP8 quantization |
| AI Performance Engineer | Nsight profiling, roofline analysis, bottleneck identification |
| Applied ML Engineer (Inference) | Take research models to production with latency/throughput SLAs |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $120K–$160K | 15% | 25% | 60% |
| Mid | $170K–$230K | 10% | 25% | 65% |
| Senior | $250K–$350K+ | 10% | 20% | 70% |

Trending: LLM inference optimization (TensorRT-LLM, vLLM) is pushing these roles toward L2 salary levels.
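The INT8 quantization these roles center on reduces to a scale-and-round mapping. A minimal sketch of symmetric per-tensor INT8 quantization, for illustration only; production flows such as TensorRT add calibration, per-channel scales, and fused dequantization:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8: map floats into [-127, 127] with one scale."""
    scale = max(float(np.max(np.abs(x))) / 127.0, 1e-12)  # guard all-zero input
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.random.default_rng(0).standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(x)
max_err = float(np.max(np.abs(dequantize(q, scale) - x)))  # bounded by scale / 2
```

The round-trip error is bounded by half the scale, which is why quantization error grows with a tensor's dynamic range.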


L1b — Edge AI Deployment

Focus: Deploy inference on resource-constrained hardware — Jetson, Snapdragon, MCU, TinyML.

| Title | What they do |
|---|---|
| Edge AI Deployment Engineer | Jetson/Snapdragon on-device inference, power/latency targets |
| Edge AI Engineer | Full pipeline: sensor → preprocess → inference → actuation |
| Embedded AI Engineer | TFLite Micro, TinyML on Cortex-M, ultra-low-power inference |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $110K–$140K | 10% | 20% | 70% |
| Mid | $150K–$200K | 10% | 20% | 70% |
| Senior | $220K–$300K+ | 10% | 15% | 75% |

Note: Hardware access (cameras, sensors, dev kits) limits remote work.


L1c — AI Application (CV / Robotics / Solutions)

Focus: Domain-specific AI deployment — vision, robotics, customer-facing solutions.

| Title | What they do |
|---|---|
| Computer Vision Engineer (Edge AI) | Detection, segmentation, tracking on constrained devices |
| Robotics AI Engineer | Perception + planning on robot hardware (ROS 2, Jetson) |
| AI Application Engineer | SDK integration, demo systems, customer deployment |
| AI Solutions Engineer | Pre/post-sales technical work, benchmarking customer workloads |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $110K–$145K | 15% | 25% | 60% |
| Mid | $160K–$220K | 10% | 25% | 65% |
| Senior | $240K–$320K+ | 10% | 20% | 70% |

L1d — Agentic AI & GenAI

Focus: Build applications on top of large language models — agents, RAG pipelines, tool-use orchestration, GenAI products.

| Title | What they do |
|---|---|
| Agentic AI Engineer | Design multi-step AI agents: tool use, planning, memory, chain-of-thought orchestration |
| GenAI Engineer | Build GenAI-powered products: chatbots, code assistants, content generation, multimodal apps |
| AI Engineer | Full-stack AI application development: prompt engineering, fine-tuning, API integration, evaluation |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $120K–$160K | 30% | 35% | 35% |
| Mid | $175K–$250K | 25% | 35% | 40% |
| Senior | $270K–$400K+ | 20% | 30% | 50% |

Market notes:
- Highest remote flexibility in the entire stack — most work is API/cloud-based, no hardware needed
- Fastest-growing category by absolute job count — LLM/GenAI application demand exploded 2023–2026
- Salary premium for engineers who understand inference optimization (L1a) + agent orchestration
- Overlaps with software engineering more than hardware — included here because these roles generate the workloads your AI chip must run
- Companies hiring: OpenAI, Anthropic, Google DeepMind, Meta, every enterprise SaaS company, startups

Connection to AI hardware: Agentic AI creates the inference demand (long-context, tool-calling, multi-turn) that drives hardware requirements. Understanding these workloads helps L2 (compiler) and L5 (architecture) engineers design for real usage patterns.
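The workload shape matters here: an agent is essentially a loop that picks a tool, executes it, and feeds the result back into its memory. A toy sketch under stated assumptions — the tool registry and the fixed plan are hypothetical, and a real agent lets an LLM choose each step:

```python
# Hypothetical tool registry; in a real agent the LLM picks the tool and args.
TOOLS = {
    "search": lambda q: f"results for {q!r}",
    "add": lambda a, b: a + b,
}

def run_agent(plan):
    """Execute a multi-step plan, keeping a memory of (tool, result) pairs."""
    memory = []
    for tool_name, args in plan:
        result = TOOLS[tool_name](*args)  # one "tool call" per step
        memory.append((tool_name, result))
    return memory

trace = run_agent([("search", ("GPU prices",)), ("add", (2, 3))])
```

Each step is a round trip (model call, tool call, result tokenized back in), which is exactly the long-context, multi-turn inference pattern the hardware layers below must serve.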


L1e — ML Engineering & MLOps

Focus: Build and operate ML training/inference pipelines — model lifecycle, data pipelines, serving infrastructure, monitoring.

| Title | What they do |
|---|---|
| Machine Learning Engineer | Training pipelines, model architecture, feature engineering, evaluation |
| MLOps Engineer | CI/CD for models, experiment tracking (MLflow, W&B), model registry, serving infra |
| AI/ML Engineer | End-to-end: data → training → optimization → deployment → monitoring |
| ML Platform Engineer | Build internal ML platforms: compute scheduling, data versioning, model serving |
| Data/ML Infrastructure Engineer | GPU cluster management for training, distributed data pipelines |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $115K–$155K | 25% | 30% | 45% |
| Mid | $170K–$240K | 20% | 30% | 50% |
| Senior | $260K–$380K+ | 15% | 30% | 55% |

Market notes:
- Largest absolute job count across all L1 sub-layers — every company doing AI needs ML engineers
- Higher remote flexibility than hardware roles but less than pure GenAI (some roles need GPU cluster access)
- MLOps is maturing: Kubernetes + GPU, model serving (Triton, vLLM), experiment tracking are standard skills
- Strong overlap with L3c (HPC Infrastructure) for training-focused roles
- Companies hiring: every tech company, banks, healthcare, defense, manufacturing — ML is horizontal

Connection to AI hardware: ML engineers define the training workloads (distributed training, large batch, mixed precision) and serving requirements (latency SLAs, throughput targets) that hardware must support. Understanding MLOps helps L1a (inference optimization) and L3c (HPC) engineers.
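The serving requirements mentioned above usually reduce to percentile latency SLAs. A minimal sketch of checking a p99 target against measured request latencies — the distribution and the 250 ms threshold are made up for illustration:

```python
import numpy as np

def meets_latency_sla(samples_ms, percentile: float, target_ms: float) -> bool:
    """True if the given latency percentile is under the SLA target."""
    return float(np.percentile(samples_ms, percentile)) < target_ms

# Simulated request latencies; a gamma-like right tail is typical of serving.
rng = np.random.default_rng(42)
samples = rng.gamma(shape=2.0, scale=20.0, size=10_000)  # milliseconds
p50 = float(np.percentile(samples, 50))
p99 = float(np.percentile(samples, 99))
ok = meets_latency_sla(samples, 99, target_ms=250.0)
```

Tail percentiles, not averages, drive capacity planning: a p99 SLA forces you to provision for the slowest 1% of requests.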


L2a — Graph & IR Optimization

Focus: Compiler front-end — graph passes, operator fusion, memory planning, layout transforms.

| Title | What they do |
|---|---|
| DL Graph Optimization Engineer | Fusion passes, constant folding, memory planning, quantization insertion |
| AI Systems Compiler Engineer | Distributed graph partitioning, multi-device scheduling |
| Performance Compiler Engineer | Auto-tuning (beam search), roofline-guided pass selection |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $150K–$200K | 5% | 10% | 85% |
| Mid | $230K–$320K | 2% | 10% | 88% |
| Senior | $350K–$480K+ | 1% | 10% | 89% |
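The core of a fusion pass is simple to state: walk the graph and merge runs of fusable ops into one kernel launch. A toy sketch over a linear op list — real passes operate on a dataflow graph with shape, layout, and legality checks, and the op set here is illustrative:

```python
# Elementwise ops can be fused into a single kernel; others break the run.
ELEMENTWISE = {"mul", "add", "relu"}

def fuse_elementwise(graph):
    """graph: list of op-name strings in execution order; returns fused list."""
    fused, run = [], []
    for op in graph:
        if op in ELEMENTWISE:
            run.append(op)          # extend the current fusable run
        else:
            if run:                 # flush the run as one fused kernel
                fused.append("fused(" + "+".join(run) + ")")
                run = []
            fused.append(op)
    if run:
        fused.append("fused(" + "+".join(run) + ")")
    return fused

out = fuse_elementwise(["conv", "mul", "add", "relu", "matmul", "add"])
# out == ["conv", "fused(mul+add+relu)", "matmul", "fused(add)"]
```

Fewer kernel launches means fewer round trips through device memory, which is where most of the speedup comes from.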

L2b — Compiler Backend (LLVM / MLIR / TVM)

Focus: Compiler infrastructure — IR design, lowering passes, instruction selection, code generation for GPU/NPU/TPU.

| Title | What they do |
|---|---|
| AI Compiler Engineer | Full ML compiler: ONNX → IR → optimized target code |
| ML Compiler Backend Engineer | Target-specific codegen: NVPTX, AMDGPU, custom accelerator ISA |
| Compiler Engineer (LLVM/MLIR/TVM) | Framework-level: write passes, define dialects, implement lowering |
| GPU Compiler Engineer | GPU-specific: register allocation, occupancy, instruction scheduling |
| Code Generation Engineer | Custom NPU/TPU backend: ISA design, instruction selection, scheduling |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $160K–$210K | 5% | 10% | 85% |
| Mid | $250K–$350K | 2% | 10% | 88% |
| Senior | $400K–$550K+ | 1% | 10% | 89% |

Note: Highest-paid sub-layer in the stack. Extreme scarcity — every AI chip startup needs these engineers, and few exist.


L2c — Kernel Engineering

Focus: Hand-write or tune the actual GPU/accelerator kernels — Triton, CUTLASS, Flash-Attention, NCCL kernels.

| Title | What they do |
|---|---|
| Kernel Optimization Engineer | Triton/CUTLASS kernel authoring, tiling, memory optimization |
| MTS Kernels (Member of Technical Staff) | Production attention/GEMM kernels for LLM training and inference |
| HPC Compiler Engineer | Vectorization, parallelization for scientific computing |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $150K–$200K | 5% | 15% | 80% |
| Mid | $220K–$320K | 5% | 10% | 85% |
| Senior | $350K–$500K+ | 2% | 10% | 88% |

Trending: Flash-Attention, long-context kernels, FP8/FP4 — hottest kernel engineering area.
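Tiling is the central idea in this sub-layer: stage blocks of the operands so each block is reused from fast memory. A NumPy sketch of a blocked GEMM — the loop structure mirrors what a Triton or CUTLASS kernel does with shared memory and registers, though real kernels also handle layouts, vectorization, and pipelining:

```python
import numpy as np

def tiled_matmul(A: np.ndarray, B: np.ndarray, tile: int = 16) -> np.ndarray:
    """Blocked GEMM: accumulate C one (tile x tile) block product at a time."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):          # rows of C
        for j in range(0, N, tile):      # cols of C
            for k in range(0, K, tile):  # reduction dimension
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

A = np.random.default_rng(1).standard_normal((48, 64)).astype(np.float32)
B = np.random.default_rng(2).standard_normal((64, 32)).astype(np.float32)
```

On a GPU, the k-loop's operand tiles live in shared memory, so each global-memory load is reused `tile` times — the arithmetic-intensity boost that makes GEMM compute-bound.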


L3a — GPU/Accelerator Runtime

Focus: Execution layer — CUDA runtime, TensorRT engine execution, DLA scheduling, memory management.

| Title | What they do |
|---|---|
| GPU Runtime Engineer | CUDA runtime internals: streams, events, memory pools, context |
| Accelerator Runtime Engineer | Custom NPU/TPU runtime: command queue, buffer management |
| Inference Platform Engineer | TensorRT engine execution, Triton server, dynamic batching |
| CUDA Runtime Engineer | Driver API, module loading, JIT compilation, multi-context |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $140K–$180K | 5% | 15% | 80% |
| Mid | $200K–$270K | 5% | 15% | 80% |
| Senior | $280K–$380K+ | 5% | 10% | 85% |

L3b — Linux Kernel & Drivers

Focus: Kernel-space GPU/accelerator drivers, DMA, PCIe, IOMMU, interrupt handling.

| Title | What they do |
|---|---|
| Linux Kernel Engineer (GPU/Drivers) | nvgpu, amdgpu, DRM, IOMMU/SMMU, memory-mapped I/O |
| Device Driver Engineer | PCIe/CXL endpoint driver, DMA engine, scatter-gather |
| Embedded Linux BSP Engineer | Yocto kernel customization, device tree, L4T, rootfs |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $140K–$175K | 5% | 10% | 85% |
| Mid | $200K–$260K | 5% | 10% | 85% |
| Senior | $280K–$380K+ | 5% | 10% | 85% |

Note: GPU kernel driver engineers are extremely rare — compensation approaches L2 levels.


L3c — HPC Infrastructure

Focus: Multi-GPU/multi-node systems — NCCL, Slurm, Kubernetes+GPU, GPUDirect, InfiniBand.

| Title | What they do |
|---|---|
| Distributed Runtime Engineer | NCCL tuning, AllReduce overlap, multi-node communication |
| Resource Scheduler Engineer | Slurm, K8s GPU scheduling, MIG/MPS, multi-tenant GPU sharing |
| Parallel Computing Engineer | MPI+CUDA, GPUDirect RDMA, storage I/O optimization |
| Systems Software Engineer (AI) | Full-stack performance: CPU-GPU coordination, profiling, debugging |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $145K–$185K | 10% | 20% | 70% |
| Mid | $210K–$280K | 10% | 20% | 70% |
| Senior | $300K–$400K+ | 10% | 15% | 75% |

Note: Most remote-friendly sub-layer in L3 — infrastructure work is often SSH-based.
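The bread-and-butter collective here, ring all-reduce, can be simulated in a few lines: each of n ranks reduce-scatters per-rank chunks around a ring, then all-gathers the completed chunks. A pure-Python sketch of the communication schedule — NCCL implements the same pattern with CUDA kernels and GPUDirect, overlapped with compute:

```python
def ring_allreduce(bufs):
    """Simulate ring all-reduce: every rank ends with the elementwise sum.

    bufs: one list per rank, each of length n (one chunk per rank).
    """
    n = len(bufs)
    data = [list(b) for b in bufs]
    # Reduce-scatter: after n-1 steps, rank r owns the full sum of chunk (r+1) % n.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, data[r][(r - step) % n]) for r in range(n)]
        for r, idx, val in sends:
            data[(r + 1) % n][idx] += val   # neighbor accumulates the partial sum
    # All-gather: circulate each completed chunk once around the ring.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, data[r][(r + 1 - step) % n]) for r in range(n)]
        for r, idx, val in sends:
            data[(r + 1) % n][idx] = val    # neighbor copies the finished chunk
    return data

ranks = [[r * 10 + c for c in range(4)] for r in range(4)]
out = ring_allreduce(ranks)  # every rank: [60, 64, 68, 72]
```

Each rank sends 2(n-1)/n of its buffer in total, independent of n — the bandwidth-optimality that makes the ring the default for large tensors.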


L4a — Embedded Software (MCU / RTOS)

Focus: Bare-metal and RTOS firmware on microcontrollers — the hardware-closest software layer.

| Title | What they do |
|---|---|
| Embedded Software Engineer | ARM Cortex-M/R, FreeRTOS/Zephyr, SPI/I2C/UART/CAN drivers |
| Firmware Engineer (AI/Edge SoC) | Command processor firmware, DMA scheduling, power management |
| Real-Time Systems Engineer | Deterministic scheduling, deadline guarantees, PREEMPT_RT |
| Bootloader / UEFI Engineer | U-Boot, UEFI, secure boot chain, A/B partition management |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $100K–$130K | 10% | 15% | 75% |
| Mid | $140K–$185K | 10% | 20% | 70% |
| Senior | $195K–$250K+ | 10% | 20% | 70% |

L4b — Embedded Linux & BSP

Focus: Linux kernel customization, board support packages, Yocto, OTA, production images.

| Title | What they do |
|---|---|
| Embedded Linux Engineer | Yocto/Buildroot, kernel config, systemd, rootfs optimization |
| BSP Engineer | Board bring-up, device tree, driver integration, HAL |
| Jetson Platform Engineer | L4T customization, JetPack, carrier board bring-up, SPE firmware |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $105K–$135K | 10% | 20% | 70% |
| Mid | $150K–$200K | 10% | 20% | 70% |
| Senior | $210K–$270K+ | 10% | 20% | 70% |

L4c — Automotive & IoT

Focus: Domain-specific firmware — ADAS safety standards, connected IoT, fleet management.

| Title | What they do |
|---|---|
| Automotive Embedded Engineer (ADAS) | ISO 26262, AUTOSAR, ECU firmware, CAN/CAN-FD, functional safety |
| IoT Firmware Engineer | Low-power wireless (BLE, LoRa, Wi-Fi), OTA, cloud connectivity |
| Device Firmware Engineer | Storage controllers, NIC firmware, PCIe endpoint firmware |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $105K–$140K | 15% | 20% | 65% |
| Mid | $150K–$200K | 15% | 20% | 65% |
| Senior | $210K–$275K+ | 10% | 20% | 70% |

Note: Automotive ADAS pays a 15–25% premium due to ISO 26262 certification. IoT has the most remote flexibility in L4.


L5a — Accelerator Architecture

Focus: Define the compute engine — systolic arrays, dataflow, tensor core specs, PE design.

| Title | What they do |
|---|---|
| AI Accelerator Architect | Systolic array dimensions, dataflow strategy, precision support |
| GPU Architect | SM/CU microarchitecture, warp scheduler, tensor core design |
| ML Systems Architect | Workload analysis → architecture decisions, hardware-software co-design |
| Performance Architect | Roofline modeling, bottleneck analysis, workload characterization |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Mid (5+ yr) | $280K–$380K | 5% | 10% | 85% |
| Senior (8+ yr) | $400K–$550K+ | 2% | 10% | 88% |
| Principal/Fellow | $600K–$1M+ | 1% | 5% | 94% |

Note: No junior roles. Requires years of RTL + systems experience. Defines what gets built.
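Much of a performance architect's day starts from the roofline model: attainable throughput is the lesser of peak compute and memory bandwidth times arithmetic intensity. A worked sketch — the device numbers are illustrative, not any specific part:

```python
def attainable_gflops(peak_gflops: float, mem_bw_gbs: float,
                      ai_flops_per_byte: float) -> float:
    """Roofline: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, mem_bw_gbs * ai_flops_per_byte)

peak, bw = 100_000.0, 2_000.0   # 100 TFLOP/s compute, 2 TB/s HBM (illustrative)
ridge_point = peak / bw         # 50 FLOPs/byte: where compute takes over
memory_bound = attainable_gflops(peak, bw, 10.0)    # kernel at AI=10 -> BW-limited
compute_bound = attainable_gflops(peak, bw, 200.0)  # kernel at AI=200 -> peak-limited
```

A kernel below the ridge point (here 50 FLOPs/byte) is memory-bound no matter how much compute you add, which is why fusion and tiling (L2a/L2c) matter as much as raw FLOPs.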


L5b — System & SoC Architecture

Focus: Full-chip system design — NoC, memory hierarchy, I/O, power domains, chiplet partitioning.

| Title | What they do |
|---|---|
| SoC Platform Engineer | Zynq/Versal PS-PL co-design, AXI interconnect, IP integration |
| Silicon Architect | Die floorplan, chiplet partitioning, power/thermal budgets |
| Memory Systems Architect | HBM controller, cache hierarchy, scratchpad design |
| Heterogeneous Computing Architect | CPU+GPU+NPU+DSP integration, coherency, shared memory |
| Edge AI Systems Architect | Power-constrained accelerator design (< 5W TDP) |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Mid (5+ yr) | $260K–$360K | 5% | 10% | 85% |
| Senior (8+ yr) | $380K–$500K+ | 2% | 10% | 88% |

L6a — RTL Design

Focus: Implement the architecture in synthesizable HDL — datapaths, controllers, interfaces.

| Title | What they do |
|---|---|
| RTL Design Engineer | SystemVerilog implementation: PE arrays, FSMs, AXI interfaces |
| ASIC Design Engineer | Tape-out quality RTL, CDC handling, synthesis constraints |
| Logic Design Engineer | Combinational/sequential optimization, area/power trade-offs |
| SoC Integration Engineer | IP block integration, address maps, subsystem-level wiring |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $140K–$180K | 2% | 10% | 88% |
| Mid | $200K–$280K | 2% | 10% | 88% |
| Senior | $300K–$400K+ | 1% | 10% | 89% |

L6b — Design Verification

Focus: Prove the RTL is correct — UVM testbenches, formal, coverage, emulation.

| Title | What they do |
|---|---|
| Design Verification Engineer | UVM constrained random, coverage-driven verification, assertions |
| Formal Verification Engineer | Property checking, equivalence checking for critical blocks |
| Emulation Engineer | Palladium/Zebu, run firmware on RTL at MHz for pre-silicon validation |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $135K–$175K | 2% | 10% | 88% |
| Mid | $195K–$270K | 2% | 10% | 88% |
| Senior | $290K–$380K+ | 1% | 10% | 89% |

Note: Chronic shortage. DV engineers make up roughly 40% of chip design staff and are always in demand.


L6c — FPGA & HLS

Focus: FPGA prototyping, HLS-based accelerator design, Vivado/Quartus.

| Title | What they do |
|---|---|
| FPGA Design Engineer | Vivado/Quartus, timing closure, IP integration, ILA debug |
| HLS Engineer | C/C++ to RTL (Vitis HLS), pragma optimization, dataflow |
| Hardware Design Engineer | General digital design on FPGA: interfaces, controllers |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $125K–$160K | 5% | 15% | 80% |
| Mid | $175K–$240K | 5% | 10% | 85% |
| Senior | $250K–$340K+ | 2% | 10% | 88% |

Note: Pays 10–15% lower than ASIC roles at the same level, but offers more entry points for new grads.


L7a — Physical Design

Focus: Synthesis, place & route, timing closure, power integrity, signoff.

| Title | What they do |
|---|---|
| Physical Design Engineer | Floorplan, placement, CTS, routing, congestion management |
| STA Engineer | PrimeTime, setup/hold analysis, MCMM, OCV, timing ECOs |
| Power Integrity Engineer | IR drop, EM analysis, power grid design, DVFS planning |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $130K–$165K | 1% | 5% | 94% |
| Mid | $180K–$250K | 1% | 5% | 94% |
| Senior | $260K–$360K+ | 1% | 5% | 94% |

L7b — DFT & CAD

Focus: Test insertion, ATPG, tool flow automation, methodology.

| Title | What they do |
|---|---|
| DFT Engineer | Scan insertion, ATPG, BIST, fault coverage optimization |
| CAD/EDA Engineer | Tool flow scripts, methodology, custom EDA automation |
| Layout Engineer | Standard cell placement, DRC/LVS clean, metal optimization |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $120K–$150K | 1% | 5% | 94% |
| Mid | $165K–$230K | 1% | 5% | 94% |
| Senior | $240K–$330K+ | 1% | 5% | 94% |

L8a — Packaging & Process

Focus: Advanced packaging, foundry interface, yield, process selection.

| Title | What they do |
|---|---|
| Packaging Engineer | CoWoS, EMIB, Foveros, chiplet integration, substrate design |
| Process Integration Engineer | Foundry relationship, process node selection, yield optimization |
| Reliability Engineer | Burn-in, electromigration, thermal cycling, qualification |
| Supply Chain Engineer | Wafer allocation, lead time, foundry contracts |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $120K–$155K | 1% | 5% | 94% |
| Mid | $170K–$235K | 1% | 5% | 94% |
| Senior | $240K–$330K+ | 1% | 5% | 94% |

L8b — Silicon Validation

Focus: Post-silicon bring-up, characterization, ATE programming, production test.

| Title | What they do |
|---|---|
| Post-Silicon Validation Engineer | First silicon bring-up, debug, speed/power characterization |
| Test Engineer | ATE programming, production test development, yield analysis |

| Level | Total Comp (Top Tier) | Remote | Hybrid | Onsite |
|---|---|---|---|---|
| Junior | $115K–$150K | 1% | 5% | 94% |
| Mid | $165K–$225K | 1% | 5% | 94% |
| Senior | $235K–$310K+ | 1% | 5% | 94% |

Market Size & Industry Context

AI Chip Market

| Segment | 2025 Size | 2030 Projected | CAGR | Key Drivers |
|---|---|---|---|---|
| AI Chip (Total) | $71B | $227B | 26% | LLM training/inference, data center AI, edge AI |
| AI Training Chips | $38B | $105B | 23% | GPT-scale models, multi-GPU clusters |
| AI Inference Chips | $25B | $95B | 30% | On-device AI, LLM serving, autonomous vehicles |
| Edge AI Chips | $8B | $27B | 28% | IoT, ADAS, robotics, smart cameras |

Sources: Gartner, McKinsey Semiconductor Practice, SIA (Semiconductor Industry Association), company filings.

Semiconductor Industry

| Metric | 2025 | Notes |
|---|---|---|
| Global semiconductor revenue | $687B | SIA estimate |
| Semiconductor engineering workforce (US) | ~280,000 | BLS + SIA data |
| AI hardware engineering jobs (US) | ~45,000–55,000 | Subset of semiconductor + AI infrastructure |
| New chip design startups (2023–2025) | 150+ | Funded $10M+, most need L2/L5/L6 hires |

Adjacent Markets That Drive Hiring

| Market | 2025 Size | How It Drives AI Hardware Jobs |
|---|---|---|
| Data center AI infrastructure | $150B+ | GPU clusters → L1a, L2c, L3c demand |
| Autonomous vehicles (ADAS) | $45B | Edge inference → L1b, L4c demand |
| Robotics | $18B | On-device perception → L1b, L1c demand |
| AI cloud services (MLaaS) | $80B+ | Inference serving → L1a, L2a, L3a demand |
| EDA tools | $16B | Chip design tools → L7a, L7b ecosystem |

Job Posting Volume by Sub-Layer

Estimated monthly active US job postings (LinkedIn + Indeed + Greenhouse + company career pages, Q1 2026). These numbers represent unique open positions, not total applicants.

Full Table

| Sub-Layer | Monthly US Postings | YoY Change | Supply/Demand | Avg Time-to-Fill |
|---|---|---|---|---|
| L1a Inference Optimization | 1,200–1,500 | +35% | Balanced | 45–60 days |
| L1b Edge AI Deployment | 800–1,100 | +15% | Balanced | 40–55 days |
| L1c AI Application | 2,000–2,500 | +10% | Slight surplus | 30–45 days |
| L1d Agentic AI & GenAI | 5,000–7,000 | +120% | High demand | 25–40 days |
| L1e ML Engineering & MLOps | 8,000–10,000 | +25% | Balanced | 30–45 days |
| L2a Graph/IR Optimization | 200–350 | +60% | Severe shortage | 90–120 days |
| L2b Compiler Backend | 300–500 | +55% | Severe shortage | 90–150 days |
| L2c Kernel Engineering | 400–600 | +50% | Shortage | 75–100 days |
| L3a GPU/Accelerator Runtime | 500–700 | +25% | Shortage | 60–80 days |
| L3b Linux Kernel/Drivers | 600–800 | +15% | Shortage | 60–90 days |
| L3c HPC Infrastructure | 700–1,000 | +30% | Balanced | 45–60 days |
| L4a Embedded Software | 3,500–4,500 | +5% | Balanced | 30–45 days |
| L4b Embedded Linux/BSP | 1,500–2,000 | +10% | Balanced | 35–50 days |
| L4c Automotive/IoT | 2,000–2,800 | +20% | Slight shortage | 40–55 days |
| L5a Accelerator Architecture | 100–200 | +70% | Extreme shortage | 120–180 days |
| L5b System/SoC Architecture | 200–350 | +40% | Severe shortage | 90–150 days |
| L6a RTL Design | 1,500–2,000 | +25% | Shortage | 50–70 days |
| L6b Design Verification | 2,000–2,500 | +20% | Chronic shortage | 50–75 days |
| L6c FPGA/HLS | 1,200–1,600 | +10% | Balanced | 40–55 days |
| L7a Physical Design | 800–1,100 | +20% | Shortage | 55–75 days |
| L7b DFT/CAD | 400–600 | +10% | Balanced | 45–60 days |
| L8a Packaging/Process | 300–500 | +30% | Shortage | 60–80 days |
| L8b Silicon Validation | 400–600 | +15% | Balanced | 45–60 days |
| Total | ~33,700–41,800 | | | |

Job Volume Visualization (with Remote %)

L1–L6: Hands-on in this roadmap (practical projects and deep skill building)

Monthly US Postings by Sub-Layer (Q1 2026)          Postings  Remote %

L1e ML Eng / MLOps     ████████████████████████████████████████████████████████████ 9,000   20%  ░░░
L1d Agentic AI/GenAI   ████████████████████████████████████████████ 6,000                   25%  ░░░░
L4a Embedded SW        ██████████████████████████████ 4,000                                 10%  ░
L4c Automotive/IoT     ██████████████████ 2,400                                             15%  ░░
L6b Verification       █████████████████ 2,250                                               2%
L1c AI Application     █████████████████ 2,250                                              15%  ░░
L4b Embedded Linux     █████████████ 1,750                                                  10%  ░
L6a RTL Design         █████████████ 1,750                                                   2%
L6c FPGA/HLS           ██████████ 1,400                                                      5%
L1a Inference Opt      ██████████ 1,350                                                     15%  ░░
L1b Edge AI            ███████ 950                                                          10%  ░
L3c HPC Infra          ██████ 850                                                           10%  ░
L3b Kernel/Drivers     █████ 700                                                             5%
L3a GPU Runtime        ████ 600                                                              5%
L2c Kernel Eng         ████ 500                                                              5%
L2b Compiler Backend   ███ 400                                                               2%
L2a Graph/IR           ██ 275                                                                2%
L5b System/SoC Arch    ██ 275                                                                2%
L5a Accelerator Arch   █ 150                                                                 1%
                       └──────────────────────────────────────────────────────┘
                       0    1K    2K    3K    4K    5K    6K    7K    8K   9K

                       Remote: ░ = 10%+  ░░ = 15%+  ░░░ = 20%+  ░░░░ = 25%+

L7–L8: Theory and guided labs only in this roadmap (OpenROAD, TinyTapeout) — listed for market context

| Sub-Layer | Monthly Postings | Remote % | Note |
|---|---|---|---|
| L7a Physical Design | ~950 | 1% | Requires EDA licenses (Synopsys, Cadence) + foundry PDK access |
| L7b DFT/CAD | ~500 | 1% | Security-sensitive data, tool-locked |
| L8a Packaging/Process | ~400 | 1% | Cleanroom and foundry interaction |
| L8b Silicon Validation | ~500 | 1% | Lab equipment, ATE, first silicon |

Total: ~34K–42K/month (L1–L6: ~31K–39K practical roles · L7–L8: ~2,350 theory-context roles)

Remote-friendly sub-layers (10%+ remote postings):
- L1d Agentic AI/GenAI (25%) — API/cloud-based work, no hardware needed
- L1e ML Engineering/MLOps (20%) — cloud GPU training, MLflow/K8s tooling
- L1a Inference Optimization (15%) — profiling and optimization can be done with cloud GPU access
- L1c AI Application (15%) — SDK/solutions work, customer-facing but often remote-capable
- L4c Automotive/IoT (15%) — IoT firmware (not ADAS) has the most remote flexibility
- L1b Edge AI (10%) — some roles allow remote with dev kit shipping
- L3c HPC Infrastructure (10%) — cluster management is SSH-based
- L4a Embedded Software (10%) — growing remote with remote lab access tools
- L4b Embedded Linux/BSP (10%) — Yocto builds and kernel work can be remote

Almost zero remote (1–2%):
- L5a/L5b Architecture — daily whiteboard sessions with RTL and silicon teams
- L6a/L6b RTL/DV — EDA tools, lab access, emulation hardware
- L7a/L7b Physical Design/DFT — EDA licenses, foundry NDA data, security
- L8a/L8b Packaging/Validation — cleanroom, lab equipment, ATE access

Key Insights from Job Data

Highest volume (easiest to find openings):
- L1e ML Engineering / MLOps (~9,000/month) — every company doing AI needs ML engineers
- L1d Agentic AI / GenAI (~6,000/month) — LLM application demand exploding
- L4a Embedded Software (~4,000/month) — the bread and butter of hardware engineering
- L6b Design Verification (~2,250/month) — chronic shortage means constant openings

Lowest volume but highest pay (hardest to get, hardest to fill):
- L5a Accelerator Architecture (~150/month) — only ~150 open positions, but $400K–$1M+ comp
- L2a Graph/IR Optimization (~275/month) — every chip startup needs one, few candidates exist
- L2b Compiler Backend (~400/month) — MLIR/LLVM expertise is extremely rare

Most remote-friendly:
- L1d Agentic AI / GenAI (25% remote) — API/cloud-based, no hardware needed
- L1e ML Engineering / MLOps (20% remote) — training on cloud GPUs, MLflow/K8s
- L1a/L1c (15% remote) — inference optimization and applications

Best ROI for career investment:
- L2b/L2c (Compiler/Kernel) — low supply, high demand, highest pay, growing 50–60% YoY
- L5a (Architecture) — requires experience, but once you're there, extreme scarcity = leverage
- L6b (Verification) — chronic shortage means job security; moderate pay but never unemployed
- L1d (Agentic AI) — if you combine agent/GenAI skills with L1a inference optimization, you're uniquely valuable

Fastest growing (YoY posting increase):
- L1d Agentic AI / GenAI: +120% (LLM application wave)
- L5a Accelerator Architecture: +70% (AI chip startup wave)
- L2a Graph/IR: +60% (every new chip needs a compiler)
- L2b Compiler Backend: +55%
- L2c Kernel Engineering: +50%
- L1a Inference Optimization: +35% (LLM inference demand)


Cross-Layer Summary

Compensation + Volume Combined (Senior, Top-Tier Total Comp)

| Sub-Layer | Senior Comp | Monthly Postings | Scarcity | Demand Trend |
|---|---|---|---|---|
| L1a Inference Optimization | $250K–$350K+ | ~1,350 | Medium | Growing (LLM) |
| L1b Edge AI | $220K–$300K+ | ~950 | Medium | Stable |
| L1c AI Application | $240K–$320K+ | ~2,250 | Low-Medium | Stable |
| L1d Agentic AI / GenAI | $270K–$400K+ | ~6,000 | Medium | Surging (+120% YoY) |
| L1e ML Eng / MLOps | $260K–$380K+ | ~9,000 | Low-Medium | Growing (+25% YoY) |
| L2a Graph/IR | $350K–$480K+ | ~275 | High | Surging |
| L2b Compiler Backend | $400K–$550K+ | ~400 | Very High | Surging |
| L2c Kernel Engineering | $350K–$500K+ | ~500 | Very High | Surging |
| L3a GPU Runtime | $280K–$380K+ | ~600 | High | Growing |
| L3b Kernel/Drivers | $280K–$380K+ | ~700 | High | Growing |
| L3c HPC Infrastructure | $300K–$400K+ | ~850 | Medium-High | Growing |
| L4a Embedded Software | $195K–$250K+ | ~4,000 | Medium | Stable |
| L4b Embedded Linux/BSP | $210K–$270K+ | ~1,750 | Medium | Stable |
| L4c Automotive/IoT | $210K–$275K+ | ~2,400 | Medium | Growing |
| L5a Accelerator Arch | $400K–$550K+ | ~150 | Extreme | Surging |
| L5b System/SoC Arch | $380K–$500K+ | ~275 | Extreme | Surging |
| L6a RTL Design | $300K–$400K+ | ~1,750 | High | Growing |
| L6b Verification | $290K–$380K+ | ~2,250 | Chronic | Growing |
| L6c FPGA/HLS | $250K–$340K+ | ~1,400 | Medium | Stable |
| L7a Physical Design | $260K–$360K+ | ~950 | High | Growing |
| L7b DFT/CAD | $240K–$330K+ | ~500 | Medium | Stable |
| L8a Packaging/Process | $240K–$330K+ | ~400 | Medium-High | Growing |
| L8b Silicon Validation | $235K–$310K+ | ~500 | Medium | Stable |

Work Arrangement Summary

| Work Mode | Software-Heavy (L1–L3) | Hardware-Heavy (L4–L8) |
|---|---|---|
| Remote | 5–15% | 1–10% |
| Hybrid | 10–25% | 5–20% |
| Onsite | 60–85% | 70–94% |

Hiring Priority for an AI Chip Startup

| Hire # | Sub-Layer | Role | Why this order |
|---|---|---|---|
| 1 | L5a | AI Accelerator Architect | Defines the chip — everything else follows |
| 2 | L2b | AI Compiler Engineer | Software must co-design with hardware from day 1 |
| 3 | L6a | RTL Design Engineer (2–3x) | Implement the architect's design |
| 4 | L6b | DV Engineer (2–3x) | Verify correctness before tape-out |
| 5 | L2c | Kernel Optimization Engineer | Write reference kernels that prove the architecture works |
| 6 | L4a | Firmware Engineer | Command processor, bring-up software |
| 7 | L3a | Runtime Engineer | Host-side API and driver |
| 8 | L7a | Physical Design Engineer | Synthesis, P&R, timing closure |
| 9 | L1a | ML Inference Engineer | Benchmark against competition |
| 10 | L8a | Packaging Engineer | Engage foundry, plan packaging |
Estimated first-year cost (10 hires, mid/senior): $3M–$5M salary + equity


Where This Roadmap Takes You

| Roadmap Completion | Sub-Layers You Can Target | Expected Level |
|---|---|---|
| Phase 1–3 | L1a, L1b, L1c | Junior |
| Phase 1–3 + Phase 4B | L4a, L4b, L1b | Junior–Mid |
| Phase 1–3 + Phase 4C | L2a, L2c | Junior |
| Phase 1–3 + Phase 4A | L6c (FPGA) | Junior |
| Phase 1–4 (all tracks) | L1a–L4c (any software sub-layer) | Mid |
| Phase 1–4 + Phase 5F | L5a, L5b, L6a, L6b | Mid |
| Phase 1–5 (full roadmap) | Any sub-layer L1a–L6c | Mid–Senior |