Inception Through Delivery: Designing the SDLC for the AI Era

Part 2 of three: Why the operating model matters more than the tooling, and how the cycle actually changes when you take AI seriously.

This is part of a larger argument about how engineering teams should operate in the AI era. The previous piece made the structural case that smaller teams—two to three engineers instead of six to eight—become viable when the SDLC is redesigned around the new constraints. This piece is about that redesign. What inception looks like when you can prototype in real time. Why standup goes away. What replaces it. Where AI belongs in the workflow, and where it doesn’t. How accountability and coaching get manufactured as a side effect of normal operations rather than as something senior engineers do when they have time, which they never do.

Most of the conversation about AI in the SDLC is happening at the wrong phase. The arguments are about whether vibe coding is safe at implementation, whether AI can write production code, whether engineers should be reviewing AI output or generating it themselves. Those questions matter, but they’re downstream. The more interesting question is what AI does to the front of the SDLC—to inception, to discovery, to the work of converging on what to build before anyone writes anything that ships. That’s where the traditional SDLC has been broken for decades and nobody noticed because we accepted calendar latency as the cost of doing business.

Once you redesign the front, the rest of the cycle has to follow. Standup stops making sense. Plan review takes its place. PR review changes shape. Stakeholders move from being audiences for finished work to participants in the work as it forms. The whole rhythm of how an engineering team operates shifts—not because AI made coding faster, but because AI made the slow parts of the cycle visible as the actual constraint they always were.

Where this argument sits in the current conversation

There is no shortage of writing right now about redesigning the SDLC for AI. McKinsey-style consulting decks. Vendor whitepapers. Methodology proposals from major cloud providers. Most of them converge on a similar thesis: let AI do more of the work autonomously, with humans moved upstream into supervision and orchestration roles. The language reaches for terms like intent engineering, autonomous reasoning systems, and self-managing delivery pipelines. The aspirational end state is a system where agents reason, predict, self-correct, and humans review the results.

This piece argues the opposite. Not because AI isn’t capable (it’s capable of more every quarter), but because the autonomy-first frame is solving the wrong problem and producing operating models that depend on capabilities AI doesn’t reliably have. There are three places where I think the dominant frame breaks down, and they’re worth naming up front.

The first is the diagnostic frame. AI’s failures aren’t random; they reflect what’s already in the codebase and the industry. AI pattern-matches against the public corpus of software, which is the work the industry has been shipping for decades. When AI confidently produces code that’s syntactically clean but semantically wrong, that’s not an AI problem; it’s the industry’s habits coming back at higher volume. Most teams are getting AI wrong because their domains were unclear before AI showed up. The autonomy-first pieces gesture at “context infrastructure” and “metadata” as solutions, but they don’t make the deeper point: AI is a mirror of how the work has been done. If your codebase doesn’t make the domain clear to a new engineer, it won’t make it clear to AI either, and no amount of orchestration overhead fixes that.

The second is the accountability boundary. The autonomy-first frameworks treat autonomous PR generation as the goal state: agents writing, reviewing, and merging code with humans in supervisory roles. I think that’s the wrong answer, not because AI isn’t capable of generating PRs, but because reviewability requires human ownership. Code that nobody has read, understood, and put their name on is not reviewable in any meaningful sense, regardless of what tooling sits around it. The accountability boundary isn’t a workflow speed bump; it’s the only mechanism that keeps AI’s pattern-matching from reaching production unchecked. The piece that argues for removing it is arguing for removing the only thing that made review work in the first place.

The third is the leadership and bench thesis. The autonomy-first writing mostly assumes the leadership pipeline is someone else’s problem: a separate program, a learning and development function, something HR figures out. The operating model I’m describing produces juniors-into-seniors and seniors-into-leaders as a structural output of how the work gets done. It’s the thing that keeps the model sustainable rather than just well-designed. Models that route humans into supervision-only roles quietly stop developing the next generation, because supervision isn’t how engineers grow into senior judgment or how seniors grow into leadership. The bench thins, the leaders carry more, and three years later the organization wonders why nobody internal can step into the senior role that just opened up.

The paper leader risk applied to AI strategy

There’s a related failure mode worth naming directly. I’ve written before about paper leaders—leaders who lead by concepts they read somewhere, replacing experience with citations and judgment with frameworks. The autonomy-first SDLC writing is exactly the kind of material that produces paper-leader decisions at scale. A consulting deck or vendor whitepaper lands on a leader’s desk. The framework looks coherent. The case studies sound impressive. The leader, without operating experience to evaluate whether the model survives contact with their actual codebase and team, adopts it. Six months later, the AI bill is enormous, the senior bench has thinned, the codebase has accumulated AI-generated debt that nobody fully owns, and the leader is genuinely surprised—because the framework didn’t cover the failure modes their operational reality was always going to surface.

This is the highest-stakes version of the paper leader pattern, because the technology is expensive, the variable costs are unpredictable, and the structural decisions (smaller teams, autonomous agents, eliminated review steps) are hard to reverse once they’ve been made. The defense isn’t to refuse AI; AI is real and useful. The defense is to engage with AI strategy as an operator, not as a reader. Understand where AI fails, in your codebase, with your team, against your domain. Build the operating model around those failure modes rather than around the marketing claims about how they’ll be solved next quarter. That’s the work the rest of this piece is about.

Inception, when you can prototype in real time

The traditional inception phase is mostly latency. Someone raises a problem. The room debates it abstractly. Someone hand-waves a solution. The team breaks to “go research” or “go design.” Calendars get involved. Two weeks pass. Half the room has lost context. The next meeting is partially re-establishing what was already discussed. A design doc surfaces, accumulates comments that talk past each other, and eventually a decision gets made—often by whoever had the most stamina rather than the best idea. The artifact at the end is a design that’s been validated in committee but not against reality.

Vibe coding at inception collapses that. Sit in the room, sketch the actual flow, generate a working prototype that exposes the real questions, the ones that don’t surface in abstract discussion, and iterate while the room is still together and still has context. The discussion stops being theoretical. “What if we do it this way” stops being a hand-wave and becomes something to poke at within minutes. The decisions made at the end of that session are grounded in something tangible, not in whoever was most persuasive.

This is a real shift in what the SDLC can do. It doesn’t eliminate inception or design; it makes them better. Design happens in real time with real artifacts instead of in calendar-mediated abstraction. The artifact of the session is a decision with a working sketch attached, rather than a design doc trying to convince absent reviewers.

What the inception artifact is, and what it isn’t

The prototype is a thinking tool. It is not an asset. It should be thrown away.

This is the trap teams fall into the moment they start using vibe coding seriously: the inception prototype works, so it becomes the implementation. “It already works” is the rationalization. It doesn’t. It works in the narrow conditions tested in the room. The minute the inception artifact is treated as production-bound, every problem of vibe coding at the wrong phase comes back—pattern-matched code without domain understanding, ignored boundaries, brittle foundations that look fine until the second feature lands on top of them.

The discipline is using AI to compress the thinking and then doing the actual implementation work—with rigor, review, and architectural intent—in a separate, deliberate phase. The prototype’s job is to surface the questions, ground the conversation, and produce a decision. Then it goes in the trash.

This also changes what design documents are for. They stop being the artifact that converges the team on a decision and become the artifact that captures why the decision was made—the constraints, the rejected alternatives, the tradeoffs, the boundaries respected. The convergence already happened in the room with the prototype. The doc is now memory, not negotiation.

The room matters more than the tooling

Vibe coding during inception with a room full of pattern-matchers produces confidently wrong prototypes very quickly. The compression amplifies whatever judgment is in the room. The prerequisite isn’t AI tooling; it’s having engineers and domain experts who can read the prototype critically as it forms.

This is also where bounded contexts start mattering structurally rather than philosophically. If the team can’t articulate the domain boundaries the new feature lives within, the prototype will quietly violate them and AI won’t catch the violation because AI doesn’t know the domain. The room has to know the domain. If the room doesn’t, vibe coding doesn’t solve the problem; it just makes the wrong decision faster.

The other operating-model change is in the cadence of inception itself. The traditional pattern is many short meetings separated by individual research time. The new pattern is fewer, longer working sessions with the right people present. Bolting vibe coding onto the existing meeting cadence produces neither the speed gain nor the quality gain. The cadence has to change with it.

From inception to plan: pair-planning with AI, then with peers

Inception produces a decision and a discarded prototype. The next phase is planning the actual implementation—and this is where AI earns its place differently than most teams use it. Pair-planning is deliberate, slow thinking, not coding. The engineer uses AI as a thought partner to pressure-test the approach, surface edge cases, walk through the implications of the design at the boundaries it touches.

Then the plan moves to peers. Not to a ticket queue, not to a design doc that sits in a folder, but to the engineers who understand the surrounding work—other engineers in the same domain, engineers in adjacent domains whose contracts the work might touch. They poke at the plan. They challenge assumptions. They surface risks the AI missed because the AI doesn’t know the operational history of the system.

By the time the plan reaches formal review, it has been stress-tested twice—once by AI as a thought partner, once by peers as domain experts. The review meeting isn’t evaluating an unrefined idea. It’s evaluating a plan that has already survived initial scrutiny.

Plan review replaces standup

Scrum standup was built for a specific bottleneck: coordinating execution across engineers when execution was slow, expensive, and hard to predict. That bottleneck no longer exists in the same form. When AI can produce a working implementation in minutes, the daily ritual of “what did you do, what will you do, where are you stuck” is asking the wrong question. It’s checking on the part of the work that no longer needs checking on. Meanwhile the parts that do need scrutiny (the plan, the intent, the architectural fit, the review) get squeezed into whatever time is left, which is usually none.

Plan review takes the place of standup as the central operating ritual. It is engineering-only. Run by engineering leaders. Focused on whether the work serves the north star, whether boundaries between domains are being respected, and whether new concepts are being introduced thoughtfully or accidentally.

The engineer doing the work presents. They come prepared to answer four things: what problem are we solving, how does this move the needle, what does success look like, and what is the plan to get there. The room asks questions. If the engineer can’t defend the plan, that’s the signal: either the plan isn’t ready or the engineer needs more pairing before it is. Either way, the issue surfaces before code matters.
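The four questions above can be treated as a small pre-review checklist. What follows is a minimal sketch, not a prescribed tool: the `PlanReviewBrief` type, its field names, and the sample plan are all hypothetical and exist only to illustrate the idea that a plan with a blank answer is not ready for the room.

```python
from dataclasses import dataclass, fields


@dataclass
class PlanReviewBrief:
    """Hypothetical record of what an engineer brings to plan review."""
    problem: str           # what problem are we solving
    impact: str            # how does this move the needle
    success_criteria: str  # what does success look like
    plan: str              # what is the plan to get there


def unanswered_questions(brief: PlanReviewBrief) -> list[str]:
    """Return the names of any of the four questions left blank."""
    return [f.name for f in fields(brief) if not getattr(brief, f.name).strip()]


# Illustrative data only; the feature described here is invented.
brief = PlanReviewBrief(
    problem="Checkout abandons when address validation times out",
    impact="Fewer abandoned carts at the address step",
    success_criteria="",
    plan="Move validation behind an async boundary with a cached fallback",
)
print(unanswered_questions(brief))  # ['success_criteria']
```

A check like this only catches the empty answer. Whether a filled-in answer holds up is still the job of the people in the room.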

The not-ARB ARB

This will sound like a return of the Architecture Review Board. It isn’t. ARBs got their reputation honestly. They became gatekeeping rituals, slow and political, where architects who hadn’t shipped code in years would block teams over abstract principles. Most modern organizations killed them deliberately, and for good reason.

Plan review is structurally different in ways that matter. It’s run by engineering leaders who are still close to the work, not by a separate architecture function that exists outside delivery. The reviewers have skin in the game. They’re leading teams that ship, so their feedback is grounded in operational reality, not theoretical purity. The cadence is tied to the actual flow of work. Every plan goes through it as a normal step, not as a special-case escalation that triggers a months-long process. The output is a green light to execute, not a document that goes into a shared drive and gets re-litigated.

The most important difference is who presents. Traditional ARBs took ownership away from engineers; senior architects evaluated work that wasn’t theirs against principles they didn’t have to ship against. Plan review uses the same review structure to deepen ownership. The engineer doing the work walks the plan, answers the questions, and owns the response. They leave the meeting with sharper thinking, not just an approval stamp. That’s coaching disguised as governance.

Cross-domain awareness as a byproduct

Most engineering organizations have a structural problem where team leads only see their own team’s work in detail and everyone else’s work as headlines. That’s how teams ship things that quietly violate other teams’ invariants, duplicate capabilities, or take dependencies on contracts that are about to change. By the time the misalignment surfaces, it’s a costly correction.

Plan review fixes this by making cross-team awareness a byproduct of normal operations. Leads leave the meeting knowing what’s coming from neighboring domains. Their own planning improves automatically. More importantly, when a lead hears another team’s plan and recognizes a boundary violation or a brittle assumption, they can flag it in the room before the work starts, when the cost of changing direction is near zero. Without plan review, that same intervention happens after PRs are flying, when the cost of reversal is already high and the conversation gets political. Plan review compresses the cost of cross-team correction from weeks to minutes.

This is one of the things scrum standups actively hide. Standups are intra-team. They never surface cross-team risk until it’s already a problem.

Execution: AI doesn’t issue PRs. The engineer does.

Once the plan is approved, execution begins. AI is deeply involved in this phase—pairing with the engineer, generating tests, surfacing edge cases, accelerating the volume work that used to consume disproportionate time. With the right tooling and access to other agents, AI also handles much of the supporting infrastructure: build, security checks, supportability concerns, the kind of work that traditionally got deferred or skipped under deadline pressure.

But AI does not issue pull requests. The engineer does.

This distinction is doing enormous work in the operating model, and it will be the most quoted and the most misunderstood point in this piece. Some readers will hear it as Luddism, as resistance to autonomous AI agents, as fear of the future. It isn’t. The point isn’t to slow AI down. The point is to preserve the only mechanism that makes code reviewable: a human who has read every line, understood it, taken ownership of it, and put their name on it.

That accountability boundary is what keeps human-in-the-loop from becoming a checkbox. AI can write code, propose changes, draft the PR description, suggest the commit message. But the engineer is the one whose name is on the commit. They have read it. They have understood it. They are accountable for it. That’s not slowing down the work. That’s the only thing standing between AI’s pattern-matching and production.

The reason this matters specifically with AI is that AI fails differently than humans do. AI doesn’t know the business concepts behind the code. It pattern-matches against what it has seen, which is the public corpus of software written by an industry that has been getting plenty of things wrong for decades. AI is happy to produce code that’s syntactically clean but semantically wrong—that conflates transactions and events, that puts business logic in the wrong layer, that treats authorization as an afterthought, that builds eventually-consistent systems with synchronous code on top. None of those are AI being stupid. Those are the modal patterns in production codebases. AI learned from what was shipped. The engineer’s accountability (reading every line, understanding it, owning it) is what catches the failure modes the pattern-match introduces.
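One way to make the accountability boundary concrete is as a merge gate. The sketch below is an assumption about how a team might encode it, not a description of any real code-hosting API: the `PullRequest` type, its fields, and `merge_blockers` are hypothetical names. The attestation flag stands in for the engineer having read, understood, and taken ownership of the change, and the reviewer check anticipates the paired review described in the next section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullRequest:
    """Hypothetical, simplified view of a pull request."""
    author: str
    author_is_human: bool
    author_attests_ownership: bool  # author has read and understood every line
    paired_reviewer: Optional[str] = None


def merge_blockers(pr: PullRequest) -> list[str]:
    """List the reasons this PR may not merge; an empty list means it can."""
    blockers = []
    if not pr.author_is_human:
        blockers.append("PR must be issued by an engineer, not an agent")
    if not pr.author_attests_ownership:
        blockers.append("author has not attested to reading and owning the change")
    if pr.paired_reviewer is None or pr.paired_reviewer == pr.author:
        blockers.append("paired review needs a second engineer")
    return blockers


# Hypothetical example: an agent-authored PR with no reviewer is blocked on all counts.
agent_pr = PullRequest(author="codegen-agent", author_is_human=False,
                       author_attests_ownership=False)
for reason in merge_blockers(agent_pr):
    print(reason)
```

The gate cannot verify that the engineer actually read the code. It only makes the ownership claim explicit, so there is a named person to hold to it.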

Paired code review: coaching at structural scale

Code review in the new model is paired. Not async, not in PR comments, not as a stamp-and-merge ritual. A senior engineer reviews with the engineer who did the work: together, in the same conversation, walking the code, asking the questions, surfacing the issues.

This is where the system becomes self-sustaining. Coaching happens here. The next generation of engineers learns judgment here. The senior engineer’s experience compounds into the team rather than being trapped in their head. None of this works asynchronously. Comments-on-a-PR theater can catch syntax issues, but it can’t teach why a particular boundary matters, why this approach scales and that one doesn’t, why the obvious solution is wrong in this codebase even if it’s right in general.

The structural payoff of paired review is that it turns career progression into a normal output of the operating model rather than a separate program someone has to remember to run. Junior engineers become senior engineers because they spend their working hours next to senior judgment—watching it form in real time, hearing the questions seniors ask, seeing what gets caught and why. Senior engineers become leaders because they’ve been carrying the responsibility of pressure-testing plans, coaching juniors, and being the second pair of eyes on real production code. By the time the org needs another lead, there are seniors who have already been doing the work; the promotion is recognition of capability already demonstrated, not a leap into something untested.

Most engineering organizations lose their leaders to tactical work because the system underneath them isn’t producing senior judgment fast enough. The leader becomes the only person who can make certain calls, so they make all of them, and stop doing the strategic work they were hired for. Paired review treats coaching and the manufacture of senior judgment as load-bearing structural elements of the SDLC rather than as something seniors do when they have time. The org produces senior judgment continuously as a side effect of normal operations. That’s how leaders stay strategic, and it’s how the bench keeps deepening over time rather than thinning out as people leave.

This also means the senior bench is not optional in the new model. The whole system breaks if seniors have been laid off and replaced with AI, because plan review and paired code review both require someone with real judgment to be the second pair of eyes. The organizations that gutted their senior bench to “save costs with AI” have just removed the exact people they need to make AI safe. That’s worth being explicit about, because it directly contradicts the cost-cutting narrative driving most of the current AI adoption.

Stakeholder close-out: shipped is not delivered

The cycle doesn’t end at deployment. Most teams running scrum end the ritual at “story done”: the work merged, the demo happened, the ticket closed. That’s the moment the team stops paying attention, and it’s exactly the moment when they should be paying the most attention.

Shipped is not the same as delivered. Stakeholder close-out is the loop that confirms whether the change actually moved the needle on the metric the team committed to at plan review. If it didn’t, the team adjusts. The discipline of defining success criteria at inception only matters if you measure against them at the end. Without the close-out loop, success criteria become theatrical.

This loop also forces the team to be outcome-based. With smaller teams operating on shorter cycles, there’s no slack to absorb wasted work. Every cycle has to ask “did this move the metric we said it would,” and if not, why not, and what does that tell us. That discipline is what makes the operating model sustainable. It’s also what gives the team the right kind of accountability—not to story points completed, but to outcomes confirmed.

The infrastructure that supports this loop deserves a brief mention because it’s not optional. Observability has to be in place. Events have to be capturable. The system has to be able to tell the team what actually happened after the change shipped, in something close to real time. That’s the architectural prerequisite that makes outcome-based operations possible at scale. A team running this operating model on top of a system that can’t answer “what did the change actually do” is going to spend its close-out cycles guessing instead of learning.
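The close-out question, “did this move the metric we said it would,” can be sketched as a comparison between the committed success criterion and what observability reports after the change ships. This is an illustration under stated assumptions: `SuccessCriterion`, `close_out`, the metric name, and the numbers are hypothetical, and the direction of improvement is inferred from whether the target sits above or below the baseline.

```python
from dataclasses import dataclass


@dataclass
class SuccessCriterion:
    """Hypothetical success criterion committed to before execution."""
    metric: str
    baseline: float  # value before the change shipped
    target: float    # value the team committed to reaching


def close_out(criterion: SuccessCriterion, observed: float) -> str:
    """Return 'delivered' if the observed value reached the target, else 'adjust'."""
    if criterion.target >= criterion.baseline:
        met = observed >= criterion.target  # higher is better
    else:
        met = observed <= criterion.target  # lower is better
    return "delivered" if met else "adjust"


# Illustrative numbers only.
criterion = SuccessCriterion("checkout_abandon_rate", baseline=0.18, target=0.12)
print(close_out(criterion, observed=0.15))  # adjust
print(close_out(criterion, observed=0.11))  # delivered
```

The sketch assumes the observed value is already available, which is exactly the observability prerequisite described above. Without it, there is nothing to pass in.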

How the cycle holds together

Each phase reinforces the others. Inception with stakeholders catches misalignment early, when changing direction is cheap. Pair-planning with AI surfaces the depth of thinking before review. Peer pressure-testing closes gaps that AI can’t see. Plan review aligns on intent, surfaces cross-domain risk, and earns the engineer’s ownership through the act of presenting. Execution with AI accelerates the volume work while the engineer holds the accountability boundary. Paired code review preserves senior judgment, manufactures it across the team, and builds redundancy for leads. Stakeholder close-out confirms outcomes and forces honest accounting of what worked.

None of these phases are optional. Take any one of them out and the model breaks. Skip inception, and the team executes confidently in the wrong direction. Skip pair-planning, and the plan is shallow. Skip plan review, and cross-domain risk doesn’t surface until it’s expensive. Skip the accountability boundary on PRs, and AI’s pattern-matching reaches production unchecked. Skip paired review, and senior judgment doesn’t propagate. Skip stakeholder close-out, and the team optimizes for shipping rather than for outcomes.

This is why the lazy “do more with less” version of AI adoption fails. Cutting a phase to move faster collapses the structural integrity of the whole cycle. The model produces speed, accountability, and quality together because it preserves all of the elements that make each one possible. Pull one out and the others go with it.

Done well, the cycle is faster than the one it replaces, not because any single phase was made faster, but because the slow parts of the old cycle (calendar latency at inception, async coordination overhead at standup, deferred review backlogs, missing outcome loops) have been removed or restructured. What remains is the actual work of producing software that does what the business needs, with accountability and judgment preserved, and senior capability manufactured as part of normal operations.

What this asks of leaders

This is not a tooling change. It’s an operating model change. Bolting AI onto scrum, keeping standup, treating vibe coding as an implementation accelerant, letting AI agents issue PRs autonomously—these are the moves that produce the AI failure modes most leaders are worried about. The operating model has to change with the technology, not just adopt the technology inside the existing operating model.

It also asks something specific from leaders themselves. Plan review only works if engineering leaders are still close enough to the work to pressure-test it credibly. Paired code review only works if the senior bench is intact and actively engaged in coaching. Stakeholder close-out only works if leaders defend the time it takes to validate outcomes against the pressure to start the next thing. The operating model produces speed, but only if leaders hold the structure that makes the speed possible.

Done well, this is also how the leadership pipeline keeps refilling itself. Juniors become seniors because they spend their working hours pair-programming and getting reviewed by people whose judgment they’re actively absorbing. Seniors become leaders because plan review and paired review are giving them reps at the work leaders actually do: pressure-testing intent, weighing tradeoffs across domains, coaching engineers through hard calls. The organization isn’t just shipping software. It’s producing the next layer of senior capability and the next generation of leaders as a normal output of the cycle. That’s the leadership work the new model demands. It’s not more meetings, not more dashboards, but the discipline to keep the structure intact when shortcuts look attractive, because the structure is what produces both the work and the people who will lead the work next.

The next piece in this series is about how to know any of this is actually working. Metrics, instrumentation, what survives from DORA, what gets retired, what gets added, and the discipline of reading patterns instead of chasing single numbers. Without that, this model is just another methodology. With it, it becomes accountable.