
Lecture 30 - Agentic SDLC: Explore Fast, Ship Safely

Course: Agentic AI & GenAI | Previous: Lecture 29 | Next: Lecture 31


Lecture 29 focused on agent skills:

agents skip discipline
  -> encode senior-engineering workflows
  -> require checkpoints, evidence, tests, and scope control

This lecture starts from the opposite side:

code is cheaper now
  -> use implementation as exploration
  -> preserve what matters: tests, intent, specs, security, and taste

The tension is the point:

Explore fast.
Ship safely.

Strong agent systems support both.


Learning objectives

By the end of this lecture, you should be able to:

  1. Explain why cheap code changes software-process economics.
  2. Separate exploration code from shipping code.
  3. Treat tests and intent as persistent assets.
  4. Explain why end-to-end behavior tests matter more when agents can rewrite internals quickly.
  5. Keep specs synchronized with implementation instead of freezing them upfront.
  6. Identify which work should be automated and which work still requires human taste.
  7. Design a dual-mode agent workflow: explore mode and stabilize mode.
  8. Apply this workflow to OpenClaw, on-device AI, and hardware engineering.

1. The core shift

Traditional software economics assumed:

Old world | Practical effect
--- | ---
Writing code is expensive | plan carefully before coding
Rewriting is expensive | avoid large experiments
Tests feel like overhead | test after implementation, if schedule pressure allows
Specs are upfront artifacts | write once, then implement

Agentic coding shifts the cost structure:

Agentic world | Practical effect
--- | ---
Writing code is cheap | implement to learn
Rebuilding is cheaper | try parallel designs
Tests become the asset | behavior contracts let internals change
Specs are continuous | update intent as learning happens

The bottleneck moves from typing code to judging code:

Do we know what is worth building?
Can we tell when it is correct?
Can we maintain what we generated?
Can we keep it safe?

2. Code as exploration

"Implement to learn" is the key idea.

Sometimes you do not know the right design until you build a rough version.

This is especially true for:

  • UI workflows
  • agent loops
  • streaming event protocols
  • retrieval quality
  • latency paths
  • hardware bring-up scripts
  • deployment automation
  • developer experience

Prototype code becomes a probe:

implementation -> feedback -> updated intent

You build a slice to discover:

  • missing requirements
  • hidden state
  • bad abstractions
  • UX friction
  • testability problems
  • performance bottlenecks
  • security assumptions

Then you decide what to keep.


3. Cheap code still has expensive consequences

Cheap generation does not make software free.

It moves cost into:

  • review
  • verification
  • maintenance
  • support
  • security
  • incident response
  • user trust
  • documentation
  • operational ownership

The practical rule:

Treat exploratory code as disposable.
Treat tests and intent as assets.

This is why "vibe coding" without contracts breaks down quickly.

The agent can generate a feature in minutes.

The team still owns the bugs for months.


4. The synthesis with Agent Skills

Lecture 29 and this lecture fit together like this:

Concern | Agentic SDLC | Agent Skills
--- | --- | ---
Exploration | implement to learn, rebuild often | not the main focus
Discipline | maintenance is real | workflow checkpoints
Tests | persistent behavioral contracts | mandatory exit criteria
Specs | continuously synchronized | structured entry point
Safety | cheap code does not remove risk | anti-rationalization and policy gates
Human role | taste and experience become bottlenecks | review, scope, and verification discipline

Combined loop:

EXPLORE
  -> build cheap prototypes
  -> learn from behavior
  -> update intent

LOCK IN
  -> turn useful behavior into tests
  -> update specs
  -> define constraints

STABILIZE
  -> apply skills
  -> verify
  -> review diff

SHIP
  -> release with evidence
  -> monitor and maintain

This is the agentic SDLC.
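The four-phase loop can be sketched as a small state machine. This is a minimal illustration, not an OpenClaw API: the `Phase` names and the transition table are assumptions, chosen so that no path leads from EXPLORE to SHIP without passing through LOCK IN and STABILIZE.

```python
from enum import Enum


class Phase(Enum):
    EXPLORE = "explore"
    LOCK_IN = "lock_in"
    STABILIZE = "stabilize"
    SHIP = "ship"


# Hypothetical transition table: move forward one step, or fall back to
# EXPLORE when new learning invalidates the plan. Exploration may repeat.
ALLOWED = {
    Phase.EXPLORE: {Phase.EXPLORE, Phase.LOCK_IN},
    Phase.LOCK_IN: {Phase.EXPLORE, Phase.STABILIZE},
    Phase.STABILIZE: {Phase.EXPLORE, Phase.SHIP},
    Phase.SHIP: {Phase.EXPLORE},  # monitoring feeds the next round of exploration
}


def advance(current: Phase, target: Phase) -> Phase:
    """Return the new phase, or raise if the move skips a required step."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

For example, `advance(Phase.EXPLORE, Phase.SHIP)` raises, which is the "prototype shipped" failure caught at the process level.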


5. Explore mode vs stabilize mode

Do not use the same rules for every phase.

Explore mode

Property | Rule
--- | ---
Goal | learn quickly
Code quality | rough is acceptable
Scope | broader experiments allowed
Tests | lightweight probes or golden examples
Output | notes, screenshots, traces, candidate designs
Human review | frequent direction checks

Stabilize mode

Property | Rule
--- | ---
Goal | make selected behavior safe to ship
Code quality | maintainable and reviewable
Scope | narrow, explicit, approved
Tests | required behavior contracts
Output | small diff, evidence, risk note
Human review | final engineering review

Example:

"Try three ways to implement local voice activity detection."
  -> explore mode

"Make the selected VAD implementation production-ready."
  -> stabilize mode
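One way to make "different rules per phase" concrete is to encode each mode's policy as data and check a proposed change against it. This is a minimal sketch: `ModePolicy`, `Change`, `merge_blockers`, and the ten-file scope limit are hypothetical, not a standard or an existing tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModePolicy:
    tests_required: bool
    max_files_changed: Optional[int]  # None means no scope limit
    mergeable: bool


# Hypothetical values mirroring the explore and stabilize tables above.
POLICIES = {
    "explore": ModePolicy(tests_required=False, max_files_changed=None, mergeable=False),
    "stabilize": ModePolicy(tests_required=True, max_files_changed=10, mergeable=True),
}


@dataclass
class Change:
    mode: str
    files_changed: int
    tests_touched: int  # behavior tests added or updated
    risk_note: str = ""


def merge_blockers(change: Change) -> list[str]:
    """Return every reason this change cannot merge under its mode's policy."""
    policy = POLICIES[change.mode]
    blockers = []
    if not policy.mergeable:
        blockers.append(f"{change.mode} output is not mergeable; stabilize it first")
    if policy.tests_required and change.tests_touched == 0:
        blockers.append("no behavior tests added or updated")
    limit = policy.max_files_changed
    if limit is not None and change.files_changed > limit:
        blockers.append(f"diff touches {change.files_changed} files (limit {limit})")
    if policy.mergeable and not change.risk_note:
        blockers.append("missing risk note")
    return blockers
```

In the VAD example, the three exploratory variants would each report "not mergeable"; only the stabilized version with tests and a risk note comes back clean.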

6. Tests as the stability layer

When code is easy to rewrite, tests become more important.

Reason:

tests preserve behavior while agents rewrite implementation

Useful agentic tests are often behavior-level:

  • user journey tests
  • API contract tests
  • CLI smoke tests
  • event-stream contract tests
  • artifact-shape tests
  • model-independent harness tests
  • hardware observable-state tests

For OpenClaw-style systems:

Area | Useful contract
--- | ---
Gateway RPC | request/response schema and event ordering
App SDK | normalized event shapes and wait/cancel behavior
cron | invalid schedules rejected before job creation
node transport | node command must be declared and allowed
tool policy | denied tools fail closed
system prompt | expected sections present without leaking secrets

The test should answer:

What must remain true if the implementation changes?
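As an example, the "denied tools fail closed" contract can be written as pytest-style behavior tests. `ToolPolicy` and `is_tool_allowed` are hypothetical stand-ins, not OpenClaw's actual policy API; the point is that the tests assert only observable outcomes, so the function body can be rewritten freely.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    # Hypothetical policy shape: explicit allow and deny lists.
    allow: set[str] = field(default_factory=set)
    deny: set[str] = field(default_factory=set)


def is_tool_allowed(policy: ToolPolicy, tool: str) -> bool:
    """Fail closed: deny wins, and anything not explicitly allowed is refused."""
    if tool in policy.deny:
        return False
    return tool in policy.allow


# Behavior contracts: these should keep passing however is_tool_allowed is rewritten.
def test_denied_tool_fails_closed():
    policy = ToolPolicy(allow={"read_file", "exec"}, deny={"exec"})
    assert not is_tool_allowed(policy, "exec")


def test_unknown_tool_fails_closed():
    policy = ToolPolicy(allow={"read_file"})
    assert not is_tool_allowed(policy, "delete_repo")


def test_allowed_tool_passes():
    policy = ToolPolicy(allow={"read_file"})
    assert is_tool_allowed(policy, "read_file")
```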

7. Intent documentation

Tests say what works.

Code says how it works.

Specs say what the system should do.

Intent explains why.

Agents need intent because they do not have durable product judgment unless you write it down.

Good intent docs include:

  • why this design exists
  • alternatives rejected
  • tradeoffs accepted
  • what must not be optimized away
  • what future work is intentionally deferred

Example:

# Intent: Gateway RPC Event Normalization

We normalize raw Gateway frames in the App SDK because external apps need a
stable event contract. Apps should not parse internal runtime frames directly.

Rejected alternative:
- expose raw frames only

Reason:
- raw frames create fragile UI integrations and make runtime changes risky

Must preserve:
- unknown raw frames remain available for advanced users
- stable event envelope stays versioned

This is high-value context for future agents.


8. Specs must evolve

A static spec is often wrong after implementation begins.

Agentic development reveals:

  • API edge cases
  • missing permission states
  • testability constraints
  • model behavior issues
  • UI states not considered
  • hardware timing problems

Continuous spec rule:

Every meaningful implementation discovery should update:
- acceptance criteria
- non-goals
- constraints
- test plan
- open risks

This is not bureaucracy.

It preserves learning.
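Part of this rule can be checked automatically. The sketch assumes the spec is a markdown file whose headings use the five section names from the rule; `missing_sections` is a hypothetical helper that a hook or CI step could run.

```python
import re

# The five sections named in the continuous spec rule.
REQUIRED_SECTIONS = ["Acceptance criteria", "Non-goals", "Constraints", "Test plan", "Open risks"]


def missing_sections(spec_text: str) -> list[str]:
    """Return required section names that do not appear as markdown headings."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^#{1,6}[ \t]+(.+)$", spec_text, flags=re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]
```

This only checks that the structure exists, not that the content is current; keeping it current still needs the agent and reviewer.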


9. Human taste becomes the limiter

When code arrives faster than external feedback, judgment becomes the bottleneck.

Taste means knowing:

  • what good looks like
  • which complexity is not worth it
  • when a prototype is lying
  • when UX is awkward
  • when an abstraction is premature
  • when a test is too brittle
  • when security risk is being hand-waved

Agents amplify taste.

They do not replace it.

Better engineers get more from agents because they:

  • frame tasks precisely
  • constrain the search space
  • detect weak answers faster
  • recognize accidental complexity
  • identify missing verification

10. Automate the easy stuff

Good automation targets:

  • formatting
  • linting
  • test selection
  • smoke test execution
  • dependency checks
  • docs build checks
  • API schema generation
  • screenshot capture
  • event fixture replay
  • log summarization

Repeated lessons should become:

habit -> checklist -> skill -> hook -> CI gate

Example:

Agent repeatedly forgets to run mkdocs build.
  -> add docs-build skill
  -> add final-answer evidence check
  -> add CI gate
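The "final-answer evidence check" step might start as simply as this. The sketch assumes a hypothetical rule table mapping changed paths to the verification command whose output must appear in the agent's final answer.

```python
# Hypothetical rule table: if a changed path starts with the key, the final
# answer must show that the listed command was run.
EVIDENCE_RULES = {
    "docs/": "mkdocs build",
    "mkdocs.yml": "mkdocs build",
}


def missing_evidence(changed_files: list[str], final_answer: str) -> set[str]:
    """Return verification commands required by the diff but absent from the answer."""
    required = {
        command
        for path in changed_files
        for prefix, command in EVIDENCE_RULES.items()
        if path.startswith(prefix)
    }
    return {command for command in required if command not in final_answer}
```

A substring match is easy to satisfy without running anything, which is why the chain ends in a CI gate that actually runs `mkdocs build --strict`.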

11. Dual-mode agent design

A practical coding agent should support two explicit modes.

Explore mode

Purpose:

learn quickly, compare options, surface hidden constraints

Allowed behavior:

  • build throwaway prototypes
  • compare approaches
  • run quick probes
  • produce notes and tradeoff tables
  • ask for human direction before stabilizing

Required output:

what was tried
what was learned
which option is recommended
what evidence supports it
what should be discarded

Stabilize mode

Purpose:

turn selected behavior into reviewable, maintainable code

Required behavior:

  • update spec
  • add or update tests
  • keep diff scoped
  • run verification
  • document remaining risk
  • produce review-ready summary

Required output:

files changed
tests run
evidence captured
scope changes
known risks
next action
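The two required outputs can be expressed as typed report structures, so an unfilled field is detectable before review. The field names are assumptions that mirror the lists above; an explicit value such as `"none"` counts as filled, while an empty string or list does not.

```python
from dataclasses import dataclass, fields


@dataclass
class ExploreReport:
    tried: list[str]
    learned: list[str]
    recommendation: str
    evidence: list[str]
    discard: list[str]


@dataclass
class StabilizeReport:
    files_changed: list[str]
    tests_run: list[str]
    evidence: list[str]
    scope_changes: str  # write "none" explicitly rather than leaving it empty
    known_risks: list[str]
    next_action: str


def empty_fields(report) -> list[str]:
    """Names of required fields the agent left empty."""
    return [f.name for f in fields(report) if not getattr(report, f.name)]
```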

12. OpenClaw mapping

In an OpenClaw-style runtime:

SDLC concern | Runtime primitive
--- | ---
Explore mode | isolated session or sandbox workspace
Stabilize mode | main project session with stricter tools
Tests as contracts | tool execution plus captured run output
Intent docs | workspace bootstrap files or project docs
Spec sync | session memory and project markdown updates
Scope discipline | file policy, diff review, approval hook
Evidence | artifacts, logs, screenshots, run events
Human taste | approval UI, dashboard, review surfaces
Long-running work | cron, sessions, task ledger

Useful command vocabulary:

/explore "Try three possible implementations"
/choose "Select option B and explain why"
/stabilize "Make option B production-ready"
/verify "Run the contract checks"
/review "Inspect the diff and risks"

The runtime should record the active mode in each run's metadata.

Reviewers need to know whether they are looking at experiment output or ship-ready output.
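A minimal sketch of the command vocabulary with the mode recorded per run, assuming hypothetical semantics: `/explore` and `/stabilize` switch the session mode, while `/choose`, `/verify`, and `/review` run in whatever mode is active. `Session` and the run-record fields are illustrative, not OpenClaw's schema.

```python
import shlex
from dataclasses import dataclass, field

MODE_SWITCHES = {"/explore": "explore", "/stabilize": "stabilize"}
KNOWN_COMMANDS = set(MODE_SWITCHES) | {"/choose", "/verify", "/review"}


@dataclass
class Session:
    mode: str = "explore"
    runs: list[dict] = field(default_factory=list)

    def handle(self, line: str) -> dict:
        """Parse one command line and append a run record tagged with the mode."""
        command, *rest = shlex.split(line)
        if command not in KNOWN_COMMANDS:
            raise ValueError(f"unknown command: {command}")
        # /explore and /stabilize switch mode; other commands inherit the current one.
        self.mode = MODE_SWITCHES.get(command, self.mode)
        run = {"command": command, "prompt": " ".join(rest), "mode": self.mode}
        self.runs.append(run)
        return run
```

With this shape, a reviewer filtering `runs` by `mode == "stabilize"` sees only ship-candidate output.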


13. On-device AI example

Task:

Improve wake-word responsiveness without increasing false positives.

Explore mode:

1. Try three VAD/wake-word pipeline variants.
2. Measure latency on short sample clips.
3. Track CPU/GPU usage.
4. Record false-positive behavior on noisy clips.
5. Recommend one candidate.

Stabilize mode:

1. Update the selected pipeline only.
2. Add regression clips.
3. Add latency threshold test.
4. Add false-positive check.
5. Document runtime limits.
6. Run on target Jetson or representative device.
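Steps 3 and 4 could be combined into one regression gate. The sketch assumes latency samples in milliseconds, false positives counted over a known duration of noisy audio, and a baseline false-positive rate measured on the current pipeline; the 150 ms p95 limit is a placeholder to replace with numbers from the explore-mode measurements.

```python
# Placeholder threshold; take real values from the explore-mode runs.
MAX_P95_LATENCY_MS = 150.0


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile (integer ceil avoids float rounding)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def gate(latencies_ms: list[float], false_positives: int, noise_hours: float,
         baseline_fp_per_hour: float) -> list[str]:
    """Return failure messages; an empty list means the candidate passes."""
    failures = []
    latency = p95(latencies_ms)
    if latency > MAX_P95_LATENCY_MS:
        failures.append(f"p95 latency {latency:.0f} ms exceeds {MAX_P95_LATENCY_MS:.0f} ms")
    fp_rate = false_positives / noise_hours
    # "Without increasing false positives": compare against the current pipeline.
    if fp_rate > baseline_fp_per_hour:
        failures.append(f"{fp_rate:.2f} false positives/hour exceeds baseline {baseline_fp_per_hour:.2f}")
    return failures
```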

Exploration discovers behavior.

Tests turn discoveries into contracts.

Stabilization prevents prototype debt.


14. Hardware bring-up example

Task:

Get ESP32-C6 Zigbee NCP talking to Jetson over UART.

Explore mode:

1. Confirm serial device candidates.
2. Try baud rates and flow-control assumptions.
3. Capture logs for each attempt.
4. Compare host-side and firmware-side symptoms.
5. Stop before changing firmware and kernel settings together.
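Step 5 can be enforced mechanically by logging each attempt against a baseline configuration and rejecting attempts that change more than one variable. The config keys and values here are hypothetical examples, not a known-good ESP32-C6 setup.

```python
from dataclasses import dataclass, field


@dataclass
class BringUpLog:
    baseline: dict
    attempts: list[dict] = field(default_factory=list)

    def try_config(self, config: dict, outcome: str) -> None:
        """Record an attempt, refusing configs that change more than one variable."""
        keys = set(self.baseline) | set(config)
        changed = sorted(k for k in keys if config.get(k) != self.baseline.get(k))
        if len(changed) > 1:
            raise ValueError(f"changes {changed} at once; change one variable per attempt")
        self.attempts.append({"changed": changed, "config": dict(config), "outcome": outcome})
```

When an attempt works, its config can become the new baseline before the next variable is tested.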

Stabilize mode:

1. Document working wiring and serial config.
2. Add a bring-up checklist.
3. Add a smoke command.
4. Save known-good logs.
5. Add troubleshooting table for common failure states.

The agent skills from Lecture 29 prevent multi-variable chaos.

This SDLC lets you explore enough to learn.


15. Minimal artifact set

For serious projects, preserve:

SPEC.md
INTENT.md
TEST_PLAN.md
DECISIONS.md
RISKS.md
RUNBOOK.md

Minimal version:

SPEC.md      what should be true
INTENT.md    why decisions were made
TESTS        executable behavior contracts

If the agent can read only three things before changing code, give it:

current spec
relevant tests
current intent

16. Failure modes

Failure | What happened | Fix
--- | --- | ---
Prototype shipped | exploration code went to production | require stabilize mode before merge
Spec drift | implementation taught new facts, but docs stayed old | update the spec during work
Test theater | tests assert only implementation details | write behavior contracts
Infinite exploration | agent keeps trying ideas without converging | timebox and force a recommendation
Over-process | agent writes bureaucracy for tiny tasks | scale process to risk
Weak taste | agent optimizes local code but worsens the product | human review for UX, architecture, and security
Hidden maintenance | generated code carries a long-term support burden nobody has claimed | record owner, risks, and rollback path
Security blind spot | code is cheap, but exploit cleanup is not | enforce policy and review threat paths

Dangerous confusion:

fast generation != low total cost
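The "timebox and force a recommendation" fix for infinite exploration can be sketched as a budgeted loop. `explore`, `evaluate`, and the injectable `clock` are hypothetical names; injecting the clock keeps the function testable without real waiting.

```python
import time
from typing import Callable


def explore(candidates: list[str], evaluate: Callable[[str], float], budget_s: float,
            clock: Callable[[], float] = time.monotonic) -> dict:
    """Evaluate candidates until the time budget runs out, then force a recommendation."""
    deadline = clock() + budget_s
    scores: dict[str, float] = {}
    for candidate in candidates:
        if clock() >= deadline:
            break
        scores[candidate] = evaluate(candidate)
    if not scores:
        raise RuntimeError("budget expired before any candidate was evaluated")
    best = max(scores, key=scores.get)
    return {
        "recommendation": best,
        "scores": scores,
        "skipped": [c for c in candidates if c not in scores],
    }
```

A later candidate might have scored higher but never got evaluated; returning `skipped` keeps that tradeoff visible to the reviewer instead of hiding it.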

Mini-lab

Add two commands or skills to your agent workspace:

/explore
/stabilize

/explore output:

- options tried
- evidence gathered
- recommendation
- discarded ideas
- follow-up questions

/stabilize output:

- updated spec/intent
- tests added or updated
- verification command output
- scoped diff summary
- risks and rollback

Test with:

Explore three ways to improve OpenClaw App SDK event replay.
Then stabilize the best one.

Key takeaways

  • Cheap code changes software process, but it does not remove engineering cost.
  • Implementation can be an exploration tool.
  • Tests and intent are durable assets.
  • Specs should evolve as implementation reveals reality.
  • Human taste and domain experience become more important when code arrives faster.
  • Agent skills provide the stabilization discipline that exploration alone lacks.
  • The useful pattern is dual-mode: explore fast, then stabilize with evidence.

Next: Lecture 31 - Runtime Strategy for Agent Systems: Node, Bun, Rust, and Edge Packaging