Wiley (Wil) Marques - Software Engineer | AI Architect

In April 2026, Ryan Lopopolo from OpenAI gave a keynote at the AI Engineer Conference titled "Harness Engineering: How to Build Software When Humans Steer, Agents Execute." The talk and its accompanying blog post describe how his team built a real product with zero lines of manually written code: ~1M LOC and ~1,500 PRs, averaging 3.5 PRs per engineer per day.

This post distills the key patterns for building an agent that takes a development task and delivers a Pull Request.

The Core Philosophy

Humans steer. Agents execute.

The scarce resource is no longer code — it's human time, attention, and model context window. Every engineer is effectively a Staff Engineer leading an infinite team of agents. Your job is not to implement — it's to design systems, specify intent, and build feedback loops.

The Agent Loop: Task → PR

The pattern Lopopolo calls the "Ralph Wiggum Loop" is an iterative cycle where the agent:

Reads AGENTS.md to discover relevant context
Navigates to deeper docs and execution plans
Implements the solution
Runs linters and tests locally
Performs self-review
Opens a Pull Request
Requests reviews from specialized agent reviewers
Addresses feedback and iterates
Repeats until all reviewers approve

Individual runs were observed working for more than six hours on a single task, often while the engineers slept.
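The loop above can be sketched in a few lines. This is a minimal illustration, not the team's implementation: the three callables (implement, run_local_checks, request_reviews), the Review type, and the iteration budget are all hypothetical stand-ins for whatever your agent runtime provides.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Review:
    """Hypothetical result returned by one reviewer agent."""
    reviewer: str
    approved: bool
    comments: List[str] = field(default_factory=list)


def run_task_to_pr(
    implement: Callable[[List[str]], None],
    run_local_checks: Callable[[], List[str]],
    request_reviews: Callable[[], List[Review]],
    max_iterations: int = 20,
) -> int:
    """Iterate until local checks pass and every reviewer approves.

    Returns the number of iterations used; raises if the budget runs out.
    """
    feedback: List[str] = []
    for iteration in range(1, max_iterations + 1):
        implement(feedback)            # write or revise the change
        failures = run_local_checks()  # linters and tests
        if failures:
            feedback = failures        # fix locally before asking for review
            continue
        reviews = request_reviews()    # specialized agent reviewers
        if all(r.approved for r in reviews):
            return iteration
        # Carry unresolved review comments into the next implementation pass.
        feedback = [c for r in reviews if not r.approved for c in r.comments]
    raise RuntimeError(f"no approval after {max_iterations} iterations")
```

The iteration budget is a design choice worth keeping even for long-running tasks: it turns a stuck loop into an explicit failure that a human can look at.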

Pattern 1: Progressive Disclosure

The first instinct was to write one comprehensive AGENTS.md. It failed:

A giant file crowds out the task and relevant code from context
When everything is "important," nothing is: the agent ends up pattern-matching locally instead of navigating intentionally
It goes stale quickly and is hard to verify mechanically

The solution: Treat AGENTS.md as a table of contents (~100 lines) pointing to a structured docs/ directory:

AGENTS.md              ← ~100 lines, pointers only
ARCHITECTURE.md
docs/
├── design-docs/
│   ├── index.md
│   └── core-beliefs.md
├── exec-plans/
│   ├── active/
│   ├── completed/
│   └── tech-debt-tracker.md
├── product-specs/
│   └── index.md
├── references/
│   └── design-system-reference-llms.txt
├── QUALITY_SCORE.md
├── RELIABILITY.md
└── SECURITY.md

Agents start with a small, stable entry-point and are taught where to look next.
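One way to keep the entry point small and its pointers honest is to check both in CI. The sketch below is an assumption about how such a check might look, not something described in the talk: the 100-line budget mirrors the guideline above, and the regular expression is a deliberately naive way to spot .md and .txt paths.

```python
import re
from pathlib import Path
from typing import List

MAX_LINES = 100  # assumed budget, taken from the "~100 lines" guideline

# Naive pattern: any token that looks like a relative path ending in .md or .txt.
POINTER = re.compile(r"\b\w[\w./-]*\.(?:md|txt)\b")


def check_entry_point(repo: Path, entry: str = "AGENTS.md") -> List[str]:
    """Return a list of problems with the entry-point file; empty means OK."""
    problems: List[str] = []
    text = (repo / entry).read_text(encoding="utf-8")
    line_count = len(text.splitlines())
    if line_count > MAX_LINES:
        problems.append(
            f"{entry} has {line_count} lines; keep it under {MAX_LINES} "
            "and move detail into docs/."
        )
    # Every path mentioned in the entry point must resolve to a real file.
    for target in sorted(set(POINTER.findall(text))):
        if not (repo / target).is_file():
            problems.append(
                f"{entry} points to {target}, which does not exist. "
                "Fix the path or restore the file."
            )
    return problems
```

Run against the repository root, an empty result means the table of contents is within budget and every pointer resolves.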

Pattern 2: Mechanical Enforcement

Architecture rules must be enforced, not documented.

The team uses:

Custom linters (themselves generated by Codex) that validate dependency directions
Structural tests that check import boundaries between layers
CI jobs that block PRs violating architectural constraints

The key insight: lint error messages work as prompts. When the agent hits a lint failure, the error message becomes part of its context, so the message should say what is wrong and how to fix it:

// Bad: "Error: Invalid import"
// Good: "Error: Service layer cannot import from UI layer.
//        Move this logic to a Provider or restructure
//        the dependency. See docs/ARCHITECTURE.md#layers"

Agents replicate patterns that already exist, even suboptimal ones. Without mechanical enforcement, each bad pattern gets copied into new code and the problem compounds.
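A structural test of this kind can be small. The following is a sketch using Python's standard ast module, assuming a hypothetical layout in which each layer is a top-level package (service, ui) and a hypothetical rule table; the team's actual linters are not shown in the source.

```python
import ast
from typing import Dict, List, Set

# Hypothetical rule table: layer name -> top-level packages it must not import.
FORBIDDEN_IMPORTS: Dict[str, Set[str]] = {"service": {"ui"}}


def check_layer_imports(source: str, layer: str, filename: str = "<source>") -> List[str]:
    """Return one actionable error message per forbidden import in `source`."""
    banned = FORBIDDEN_IMPORTS.get(layer, set())
    errors: List[str] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        else:
            continue
        for module in modules:
            target = module.split(".")[0]  # top-level package decides the layer
            if target in banned:
                # The message states the rule, the fix, and where to read more,
                # because the agent will see it verbatim in its context.
                errors.append(
                    f"{filename}:{node.lineno}: Error: {layer} layer cannot import "
                    f"from {target} layer ({module}). Move this logic to a Provider "
                    "or restructure the dependency. See docs/ARCHITECTURE.md#layers"
                )
    return errors
```

Wired into CI, a non-empty result fails the build, and the message text doubles as the instruction the agent needs to fix the violation.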

Pattern 3: Repository as Single Source of Truth

From the agent's point of view, anything it can't access in-context while running effectively doesn't exist.

That Slack thread where the team aligned on an architectural pattern? If it's not in the repo, the agent doesn't know about it. Everything must be pushed into versioned, repository-local artifacts:

Design decisions → versioned markdown
Execution plans → checked in with progress logs
Product specs → indexed and navigable
Technical debt → tracked alongside code

Pattern 4: Feedback Loops Replace Human QA

As throughput increased, the bottleneck shifted from writing code to validating it. The solution: make the application directly legible to the agent.

The team wired the Chrome DevTools Protocol into the agent runtime so that the agent could:

Launch an isolated app instance per git worktree
Snapshot DOM state before and after interactions
Capture screenshots for visual regression
Query logs via LogQL and metrics via PromQL
Loop until clean: fix → restart → revalidate

The pattern works because it replaces subjective "does this look right?" with mechanical "does this pass?"
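To make "does this pass?" concrete, here is a sketch of a before/after snapshot check. The snapshot shape (a flat mapping from CSS selector to visible text) is an assumption chosen for brevity; real DOM snapshots are richer, and the function names are hypothetical.

```python
from typing import Dict, List

Snapshot = Dict[str, str]  # assumed shape: CSS selector -> visible text


def diff_snapshots(before: Snapshot, after: Snapshot) -> Dict[str, List[str]]:
    """Classify selectors as added, removed, or changed between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(
            k for k in before.keys() & after.keys() if before[k] != after[k]
        ),
    }


def validate_interaction(
    before: Snapshot, after: Snapshot, expected: Dict[str, List[str]]
) -> List[str]:
    """Return failure messages; an empty list means the interaction passed."""
    actual = diff_snapshots(before, after)
    failures: List[str] = []
    for kind in ("added", "removed", "changed"):
        want = sorted(expected.get(kind, []))
        if actual[kind] != want:
            failures.append(f"{kind}: expected {want}, got {actual[kind]}")
    return failures
```

The agent can loop on the returned failures: fix, restart the app, take fresh snapshots, and revalidate until the list is empty.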

Pattern 5: Closing the Loop

If the agent fails, the solution is NOT to fix it manually. It's to encode the correction into the repository to prevent future occurrences.

The cycle:

Observe: Agent makes repeated errors of the same class
Diagnose: What's the underlying pattern?
Codify: Write a lint rule, guardrail, or documentation
Enforce: CI blocks the undesired pattern
Result: That class of error is eliminated permanently
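The Observe step can itself be mechanical. As a rough sketch, assuming each failed run is tagged with a failure-class label (the labels and the threshold of three are hypothetical), a counter is enough to surface which classes have recurred often enough to deserve a lint rule or guardrail:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def recurring_failure_classes(
    failure_classes: Iterable[str], threshold: int = 3
) -> List[Tuple[str, int]]:
    """Return (class, count) pairs seen at least `threshold` times, most frequent first."""
    counts = Counter(failure_classes)
    return [(name, n) for name, n in counts.most_common() if n >= threshold]
```

Anything this returns is a candidate for the Codify step; one-off failures stay below the threshold and do not trigger new rules.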

This turns short-term velocity hits into long-term, compounding gains.

Skills: Teaching the Agent Your Codebase

Skills are executable instructions that teach the agent how to operate in your specific repository:

How to launch the app — boot, verify health
How to run tests — correct commands, environment setup
How to review — what to check before submitting a PR
How to use observability — connect to local metrics/logs

The key: Codex (or any coding agent) should be the entry-point to development. Tools are designed for the agent to invoke first, humans second.
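One possible way to make skills invocable is a small registry that maps a skill name to a command. This is an illustrative sketch only; the Skill type, the registry, and the example command are hypothetical and do not reflect how Codex defines skills.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Skill:
    """A named, documented command the agent (or a human) can invoke."""
    name: str
    description: str
    command: List[str]


def run_skill(
    skills: Dict[str, Skill], name: str, timeout: float = 60.0
) -> subprocess.CompletedProcess:
    """Run a registered skill and return its captured result."""
    if name not in skills:
        known = ", ".join(sorted(skills))
        # List what is available so the caller can self-correct.
        raise KeyError(f"Unknown skill '{name}'. Available skills: {known}")
    return subprocess.run(
        skills[name].command, capture_output=True, text=True, timeout=timeout
    )


# Stand-in registry: a real one would point at your app's boot and test commands.
EXAMPLE_SKILLS = {
    "health": Skill(
        name="health",
        description="Verify the app is healthy (placeholder command).",
        command=[sys.executable, "-c", "print('ok')"],
    ),
}
```

Calling run_skill(EXAMPLE_SKILLS, "health") returns the exit code and captured output, which the agent can read the same way it reads test results.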

Agent Reviewers

Instead of mandatory human review, use specialized reviewer agents:

Reviewer      Focus
Security      Vulnerabilities, exposed secrets, injection vectors
Architecture  Layer boundary conformance
Reliability   Retry policies, timeouts, circuit breakers
Product       Implementation matches spec

The developer agent opens a PR → requests agent reviews → iterates on feedback → merges when all reviewers approve. Humans may review, but their review is not required.

The Takeaway

The role of the engineer is shifting from writing code to building systems that enable agents to write code reliably. The investment is in:

Documentation that agents can consume
Guardrails that mechanically prevent drift
Feedback loops that allow autonomous validation
Skills that encode tribal knowledge into executable instructions

As Lopopolo puts it: "You can simply say 'do not produce slop, don't accept slop' and you won't get slop in your codebase. But to do that requires taking short-term velocity hits to figure out what the agents are struggling with, put guardrails in place, then step back and spend your time on higher-leverage activities."

Harness Engineering: Building a Dev Agent That Delivers Pull Requests