In April 2026, Ryan Lopopolo from OpenAI gave a keynote at the AI Engineer Conference titled "Harness Engineering: How to Build Software When Humans Steer, Agents Execute." The talk and its accompanying blog post describe how his team built a real product with zero lines of manually written code: ~1M LOC and ~1,500 PRs, averaging 3.5 PRs per engineer per day.
This post distills the key patterns for building an agent that reliably takes a development task and delivers a Pull Request.
The Core Philosophy
Humans steer. Agents execute.
The scarce resource is no longer code. It's human time, human attention, and the model's context window. Every engineer is effectively a Staff Engineer leading an infinite team of agents. Your job is not to implement but to design systems, specify intent, and build feedback loops.
The Agent Loop: Task → PR
The pattern Lopopolo calls the "Ralph Wiggum Loop" is an iterative cycle where the agent:
- Reads AGENTS.md to discover relevant context
- Navigates to deeper docs and execution plans
- Implements the solution
- Runs linters and tests locally
- Performs self-review
- Opens a Pull Request
- Requests reviews from specialized agent reviewers
- Addresses feedback and iterates
- Repeats until all reviewers approve
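The loop above can be sketched in a few lines. This is a minimal illustration, not the harness OpenAI built. `run_task`, its three callables, and the `max_rounds` cap are hypothetical names introduced here. The cap is an added assumption so the sketch always terminates.

```python
from typing import Callable

def run_task(
    implement: Callable[[list[str]], None],   # hypothetical: writes/revises code given open feedback
    run_checks: Callable[[], list[str]],      # hypothetical: local linters + tests, returns failures
    request_reviews: Callable[[], list[str]], # hypothetical: opens/updates the PR, returns blocking comments
    max_rounds: int = 10,                     # assumption: a cap so the loop cannot spin forever
) -> bool:
    """Iterate implement -> check -> review until every reviewer approves."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        implement(feedback)
        feedback = run_checks()
        if feedback:
            continue                  # fix local failures before asking anyone to review
        feedback = request_reviews()
        if not feedback:
            return True               # no blocking comments left: all reviewers approved
    return False
```

Local lint and test failures are routed back into `implement` before any review is requested. This means reviewers only see changes that already pass locally.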
Individual runs were observed working on one task for 6+ hours, often while the engineers slept.
Pattern 1: Progressive Disclosure
The first instinct was to write one comprehensive AGENTS.md. It failed:
- A giant file crowds the task and the relevant code out of the context window
- When everything is "important," nothing is, so the agent falls back on local pattern-matching
- It rots instantly and is hard to verify mechanically
The solution: Treat AGENTS.md as a table of contents (~100 lines) pointing to a structured docs/ directory:
```
AGENTS.md              ← ~100 lines, pointers only
ARCHITECTURE.md
docs/
├── design-docs/
│   ├── index.md
│   └── core-beliefs.md
├── exec-plans/
│   ├── active/
│   ├── completed/
│   └── tech-debt-tracker.md
├── product-specs/
│   └── index.md
├── references/
│   └── design-system-reference-llms.txt
├── QUALITY_SCORE.md
├── RELIABILITY.md
└── SECURITY.md
```
Agents start with a small, stable entry point and are taught where to look next.
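A table of contents is only useful if its pointers resolve, so a check like the following could run in CI. This is a sketch under assumptions. The 100-line budget comes from the text, but `check_agents_md`, the markdown-link regex, and the message wording are illustrative.

```python
import re
from pathlib import Path

MAX_LINES = 100  # the ~100-line budget for AGENTS.md
# Matches markdown links like [text](docs/a.md) or [text](docs/a.md#section); group 1 is the path.
LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+)(?:#[^)]*)?\)")

def check_agents_md(repo: Path) -> list[str]:
    """Return human- and agent-readable problems with the AGENTS.md entry point."""
    problems = []
    text = (repo / "AGENTS.md").read_text(encoding="utf-8")
    n = len(text.splitlines())
    if n > MAX_LINES:
        problems.append(
            f"AGENTS.md has {n} lines (budget {MAX_LINES}); move detail into docs/ and link to it."
        )
    for target in LINK.findall(text):
        if target.startswith(("http://", "https://")):
            continue  # external links are not checked in this sketch
        if not (repo / target).exists():
            problems.append(
                f"AGENTS.md links to missing file {target}; update the pointer or restore the doc."
            )
    return problems
```

The messages say how to fix each problem rather than only what failed, which matters when the reader is an agent.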
Pattern 2: Mechanical Enforcement
Architecture rules must be enforced, not documented.
The team uses:
- Custom linters (themselves generated by Codex) that validate dependency directions
- Structural tests that check import boundaries between layers
- CI jobs that block PRs violating architectural constraints
The key insight: lint error messages become prompt injections. When the agent hits a lint failure, the error message lands in its context, so a message that explains how to fix the violation steers the next attempt:
```
// Bad:  "Error: Invalid import"
// Good: "Error: Service layer cannot import from UI layer.
//        Move this logic to a Provider or restructure
//        the dependency. See docs/ARCHITECTURE.md#layers"
```
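A structural test for import boundaries might look like the sketch below. The layer names and the `app.<layer>` module convention are hypothetical. It uses Python's standard `ast` module rather than the team's Codex-generated linters. The part that matters is the error string, which tells the agent what to do next instead of just reporting an invalid import.

```python
import ast
from typing import Optional

# Hypothetical layers, lowest first. A module may import from its own layer
# or any layer below it, never from a layer above it.
LAYERS = ["types", "repo", "service", "ui"]

def layer_of(module: str) -> Optional[str]:
    """Assumes modules are named app.<layer>.<rest>; returns None for anything else."""
    parts = module.split(".")
    if len(parts) > 1 and parts[0] == "app" and parts[1] in LAYERS:
        return parts[1]
    return None

def check_imports(module: str, source: str) -> list[str]:
    """Return one remediation-style error per import that points up the layer stack."""
    own = layer_of(module)
    if own is None:
        return []
    errors = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            targets = [node.module]  # relative imports are skipped in this sketch
        else:
            continue
        for target in targets:
            dep = layer_of(target)
            if dep is not None and LAYERS.index(dep) > LAYERS.index(own):
                errors.append(
                    f"{module}:{node.lineno}: '{own}' layer cannot import from '{dep}' layer "
                    f"({target}). Move this logic to a Provider or restructure the "
                    f"dependency. See docs/ARCHITECTURE.md#layers"
                )
    return errors
```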
Agents replicate patterns that already exist, even suboptimal ones. Without mechanical enforcement, each copy of a bad pattern becomes a template for the next, so bad patterns compound with every change.
Pattern 3: Repository as Single Source of Truth
From the agent's point of view, anything it can't access in-context while running effectively doesn't exist.
That Slack thread where the team aligned on an architectural pattern? If it's not in the repo, the agent doesn't know about it. Everything must be pushed into versioned, repository-local artifacts:
- Design decisions → versioned markdown
- Execution plans → checked in with progress logs
- Product specs → indexed and navigable
- Technical debt → tracked alongside code
Pattern 4: Feedback Loops Replace Human QA
As throughput increased, the bottleneck shifted from writing code to validating it. The solution: make the application directly legible to the agent.
They wired the Chrome DevTools Protocol into the agent runtime so the agent could:
- Launch an isolated app instance per git worktree
- Snapshot DOM state before and after interactions
- Capture screenshots for visual regression
- Query logs via LogQL and metrics via PromQL
- Loop until clean: fix → restart → revalidate
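The fix → restart → revalidate loop reduces to a small driver over pluggable checks. In this sketch, `loop_until_clean`, `no_error_logs`, and the `query_logs` client are hypothetical. A real harness would back the checks with DOM snapshots, screenshot diffs, and LogQL/PromQL queries against the worktree's own app instance. The LogQL selector `{app="web"} |= "error"` assumes the logs carry a label named `app`.

```python
from typing import Callable

Check = Callable[[], list[str]]  # returns problems found; an empty list means clean

def no_error_logs(query_logs: Callable[[str], list[str]]) -> Check:
    """Build a check that fails if the app logged any line containing "error"."""
    def check() -> list[str]:
        # query_logs is a hypothetical client that runs a LogQL query against the
        # worktree's local log store and returns the matching log lines.
        return [f"error log: {line}" for line in query_logs('{app="web"} |= "error"')]
    return check

def loop_until_clean(
    restart: Callable[[], None],
    checks: list[Check],
    fix: Callable[[list[str]], None],
    max_attempts: int = 5,  # assumption: bound the loop
) -> bool:
    """Restart, revalidate, and hand problems to `fix` until every check passes."""
    for _ in range(max_attempts):
        restart()  # fresh app instance for this worktree
        problems = [p for check in checks for p in check()]
        if not problems:
            return True
        fix(problems)  # the agent edits code based on the findings
    return False
```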
The pattern works because it replaces the subjective "does this look right?" with the mechanical "does this pass?"
Pattern 5: Closing the Loop
If the agent fails, the solution is NOT to fix its output manually. It's to encode the correction into the repository so the same class of failure doesn't recur.
The cycle:
- Observe: Agent makes repeated errors of the same class
- Diagnose: What's the underlying pattern?
- Codify: Write a lint rule, guardrail, or documentation
- Enforce: CI blocks the undesired pattern
- Result: That class of error is eliminated permanently
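The Observe and Diagnose steps can start as simple bookkeeping: count failures by class and flag any class that recurs. In this sketch, the `class: detail` message convention, `classify`, and the default threshold of 3 are assumptions. Anything flagged is a candidate for a lint rule, a structural test, or a doc update.

```python
from collections import Counter

def classify(message: str) -> str:
    """Hypothetical classifier: the error class is the text before the first colon."""
    return message.split(":", 1)[0].strip()

def recurring_error_classes(failures: list[str], threshold: int = 3) -> list[tuple[str, int]]:
    """Return (class, count) for each class seen at least `threshold` times, most frequent first."""
    counts = Counter(classify(f) for f in failures)
    return [(cls, n) for cls, n in counts.most_common() if n >= threshold]
```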
This turns short-term velocity hits into long-term exponential gains.
Skills: Teaching the Agent Your Codebase
Skills are executable instructions that teach the agent how to operate in your specific repository:
- How to launch the app — boot, verify health
- How to run tests — correct commands, environment setup
- How to review — what to check before submitting a PR
- How to use observability — connect to local metrics/logs
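A skill might bundle a small script alongside its instructions. For the first skill, a script that waits for the app to report healthy after boot could look like the sketch below. The `/healthz` path, the port in the usage comment, and the function names are hypothetical. The clock and sleep functions are injectable so the polling logic can be tested without a running server.

```python
import time
import urllib.request
from typing import Callable

def http_probe(url: str) -> Callable[[], bool]:
    """Return a probe that reports whether `url` answers with HTTP 200."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                return resp.status == 200
        except OSError:
            # URLError, HTTPError, refused connections, and timeouts all subclass OSError.
            return False
    return probe

def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)

# Hypothetical usage after booting the app:
# wait_until_healthy(http_probe("http://localhost:3000/healthz"))
```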
The key: Codex (or any coding agent) should be the entry point to development. Tools are designed for the agent to invoke first and for humans second.
Agent Reviewers
Instead of mandatory human review, use specialized reviewer agents:
| Reviewer | Focus |
|---|---|
| Security | Vulnerabilities, exposed secrets, injection vectors |
| Architecture | Layer boundary conformance |
| Reliability | Retry policies, timeouts, circuit breakers |
| Product | Implementation matches spec |
The developer agent opens a PR → requests agent reviews → iterates on feedback → merges when all reviewers approve. Humans may review but aren't required to.
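The reviewers in the table could be aggregated into a single merge decision like this. It is a sketch: `Verdict`, `review_pr`, and the reviewer functions are hypothetical. In practice each reviewer would be an agent prompted with its focus area. The guard against an empty reviewer set is an added assumption, so a PR can't pass by default.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Verdict:
    reviewer: str
    approved: bool
    comments: tuple[str, ...] = ()

Reviewer = Callable[[str], Verdict]  # takes the PR diff, returns a verdict

def review_pr(diff: str, reviewers: dict[str, Reviewer]) -> tuple[bool, list[str]]:
    """Run every specialized reviewer; the PR may merge only if all approve."""
    if not reviewers:
        raise ValueError("at least one reviewer is required")
    verdicts = [review(diff) for review in reviewers.values()]
    blocking = [f"[{v.reviewer}] {c}" for v in verdicts if not v.approved for c in v.comments]
    return all(v.approved for v in verdicts), blocking
```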
The Takeaway
The role of the engineer is shifting from writing code to building systems that enable agents to write code reliably. The investment is in:
- Documentation that agents can consume
- Guardrails that mechanically prevent drift
- Feedback loops that allow autonomous validation
- Skills that encode tribal knowledge into executable instructions
As Lopopolo puts it: "You can simply say 'do not produce slop, don't accept slop' and you won't get slop in your codebase. But to do that requires taking short-term velocity hits to figure out what the agents are struggling with, put guardrails in place, then step back and spend your time on higher-leverage activities."