Understanding how AI
actually changes
product development

The universal shift

The comparison isn't to software. It's to electricity.

Erik Brynjolfsson calls AI "the most general of all general-purpose technologies." The comparison isn't to the last software cycle or the latest platform shift. It's to electricity. The claim is that AI is a technology that touches every role, every function, every industry, and that its real impact comes not from the technology itself but from the reorganization it demands.

Electricity didn't transform manufacturing when factories installed electric motors. It transformed manufacturing when factories were redesigned around the new capabilities that electricity made possible: individual workstations with their own motors, flexible production lines, buildings designed for workflow rather than proximity to a central power shaft. The technology was available for decades before the reorganization delivered the productivity gains.

The data on AI adoption looks similar. Eighty-eight percent of organizations use AI in at least one function. Seventy-nine percent use generative AI specifically. But only 6% qualify as high performers, organizations that have moved beyond experiments to transformation. The gap between using and transforming is enormous. Most organizations sit in what the World Economic Forum calls Phase 2: a thousand flowers blooming, disconnected experiments, not yet linked to strategy.

The skeptical case and the J-Curve

The dip before the gain

The skeptical view has real weight. Daron Acemoglu, MIT economist, Nobel laureate, estimates that only about 5% of tasks in the economy can be economically automated with AI today, with GDP impact around 0.07% over a decade. His argument: the headline projections of trillions in economic value conflate technical capability with economic viability. Not everything AI can do is worth doing with AI.

Brynjolfsson has an answer to this, and it matters for the paper's argument. His Productivity J-Curve framework predicts that general-purpose technologies cause a measurable productivity dip before the gains materialize, because the real value requires organizational restructuring, not just tool adoption. The investment in new processes, new roles, new workflows is real but largely intangible and poorly measured. Manufacturing data validates the pattern: AI adoption initially depressed productivity by 1.33 percentage points before producing stronger growth. The dip is the cost of reorganization.

This means the current state (tools adopted, productivity gains mixed, organizations struggling to capture value) isn't evidence that AI doesn't work. It's the expected early signature of a technology that requires the process to change, not just the tools.

Why product development first

The sector at the leading edge

Product development sits at the leading edge of this transformation. The World Economic Forum's analysis of 19,000 tasks across 867 occupations found software and platforms at 33% exposure to automation and augmentation, the highest of any sector. ICT sector adoption is approaching saturation in some countries. A survey of product teams found 100% using AI tools and 98% planning structural changes as a result.

The transformation is real, uneven, and early. What follows is an examination of how it's playing out in product development, what's actually changing, what isn't, and what the shift demands of the people building products.

How teams work today

Discover. Design. Build. Ship.
A process built for a slower world.

The most influential model for how product teams should work comes from Marty Cagan, whose books and frameworks have defined product management practice for two decades. The empowered product team model puts a trio at the center: a product manager who owns the problem space, a designer who owns the user experience, and engineers who own the technical solution. The team is empowered to solve problems, not just ship features dictated from above, and operates in dual tracks: discovery (figuring out what to build) and delivery (building it).

It's a good model. It emphasizes autonomy, cross-functional collaboration, and outcome orientation. And most organizations don't fully achieve it. Industry surveys consistently show that the majority of product teams operate closer to the feature team model (receiving requirements from stakeholders and executing them) than to the empowered team ideal. Cagan himself acknowledged this gap for years before beginning, in 2025, to articulate a more dramatic revision. His emerging vision: AI will "radically transform how we build products, drastically increasing efficiency and reducing team size." He began describing a near-future team of three people (a product manager, an engineer, and a designer) handling work that currently requires ten or more, with AI absorbing the execution workload.

But even Cagan's revision still starts from the same underlying assumption: the process flows in one direction. Discover what to build. Design how it should work and look. Build it. Ship it. Learn. Repeat. Each phase has a primary owner. Each transition involves a handoff.

The tools that encode the process

Figma, GitHub, Jira, each built for a separate workflow

The tools reflect this model. Figma provides a collaborative canvas where designers create visual representations of the interface: screens, flows, component libraries. These are mockups: they look like the product but aren't the product. They describe how things should appear and behave, but developers must interpret and implement them separately.

GitHub provides version control, code review, and deployment pipelines for the code that engineers write. The code IS the product, or becomes the product when built and deployed. But the code environment is separate from the design environment by default. Each has its own collaboration model, its own review processes, its own version history.

Design systems attempt to bridge the two by establishing a shared vocabulary of components (buttons, inputs, cards, navigation patterns) with defined visual properties and behavior. In practice, most teams maintain two versions: a component library in Figma for designers and a corresponding library in code for engineers. Keeping them synchronized is a persistent operational challenge. It requires token pipelines, handoff documentation, naming conventions, and human effort to ensure the button in Figma and the button in React actually match. Teams describe this as one of their most consistent pain points, and industry analysis confirms that most design system workflows have minimal automation.
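To make that synchronization work concrete, here is a minimal sketch of the kind of pipeline step teams automate. It is written in TypeScript against a hypothetical token export format (not Figma's actual file format or any specific plugin's output): flattening a design-token file into CSS custom properties so the values used in code are generated from the same source the design library references.

```typescript
// Minimal sketch of one design-token pipeline step.
// The token file shape below is a hypothetical example, not any tool's real export.

type TokenGroup = Record<string, string | Record<string, string>>;

// Hypothetical tokens.json produced by a design-tool export.
const tokens: Record<string, TokenGroup> = {
  color: { primary: "#2563eb", surface: "#ffffff", text: { body: "#111827" } },
  spacing: { sm: "8px", md: "16px", lg: "24px" },
};

// Flatten nested groups into CSS variable names like --color-text-body.
function toCssVariables(groups: Record<string, TokenGroup>): string {
  const lines: string[] = [];
  const walk = (prefix: string, node: TokenGroup | string): void => {
    if (typeof node === "string") {
      lines.push(`  --${prefix}: ${node};`);
      return;
    }
    for (const [key, value] of Object.entries(node)) {
      walk(`${prefix}-${key}`, value);
    }
  };
  for (const [group, node] of Object.entries(groups)) walk(group, node);
  return `:root {\n${lines.join("\n")}\n}\n`;
}

console.log(toCssVariables(tokens));
```

Run in CI, a step like this is the "token pipeline" referred to above: it doesn't remove the dual-library problem, it just keeps the two representations from drifting by hand.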

The project management layer (Jira, Linear, Asana, Notion) creates the task structures that organize work into sprints, epics, and tickets. Each ticket represents a unit of work that flows through the pipeline: defined, designed, built, reviewed, shipped. The workflow management tool is the nervous system of the linear process.

The structure made explicit

The four-stage model and the constraint it was optimized for

This system has a structure that's worth making explicit, because understanding it is essential to understanding what AI changes. The traditional production process flows like this: roles (human) define and decide what to build. Process (human) sequences the work into phases. Tools (human-operated) execute each phase. The result is a product, an artifact. It's linear. It's sequential. Each stage completes before the next begins in earnest, with some overlap but a clear directional flow from conception to delivery.

The model works when building is the hard part. When turning a design into code takes weeks. When exploring ten variations of a layout means ten rounds of detailed mockup work. When the bottleneck is execution capacity, the hours of skilled human effort required to translate ideas into working software.

For decades, that was the reality. Product development was constrained by how fast talented people could produce things. The entire process, the roles, the workflows, the tools, the team structures, was optimized for that constraint.

The traditional production process

How teams worked for decades, before AI shifted the constraint.

1. Roles: PM, Designer, Engineer

Each role owns a discipline. PM defines the problem. Designer shapes the experience. Engineer builds the solution. Execution requires specialized skill: a designer can't ship code, and an engineer can't produce production mockups.

2. Process: sequential handoffs

Linear pipeline: discovery, then design, then handoff, then development, then QA, then release. Each stage completes before the next begins. Handoffs carry intent from one discipline to the next. Optimized for a world where building is the hard part.

3. Tools: human-operated

Passive instruments that wait for human input. Figma for design, GitHub for code, Jira for coordination. The tool does exactly what the human directs, no more or less.

4. Product: shipped artifact

The pipeline produces working software through sequential handoffs. A button is a button. The model is optimized for turning ideas into shipped artifacts as efficiently as possible.

The capability trajectory

4.4% to 71.7% in twelve months.
And it's still accelerating.

In early 2024, the best AI coding agent could resolve 4.4% of real-world software engineering problems on the SWE-bench benchmark. By the end of that year, the leading model scored 71.7%. That's a 67-percentage-point improvement in twelve months, on a benchmark designed to test the kinds of tasks that working engineers actually encounter: bug fixes, feature implementations, codebase navigation.

By early 2026, the frontier had pushed to 79.2% on the original benchmark and 55–59% on its harder variant, SWE-bench Pro, which was created specifically because the original was becoming too easy. On GPQA, a benchmark of graduate-level science questions where PhD experts score 65%, the best models reached 92.4%. Graduate-level reasoning, effectively solved in under two years.

This isn't a smooth curve. It's an acceleration. METR, the AI evaluation organization tracking autonomous capability, found that the time horizon of tasks AI can complete reliably has been doubling every seven months, and that pace accelerated to roughly every four months in 2024–2025. Their assessment: AI is improving at approximately ten times per year, up from three times per year before 2024. Extrapolating conservatively, they project that within five years, AI systems will be capable of automating tasks that currently take humans a month.
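A rough back-of-the-envelope sketch shows what a fixed doubling time implies. The starting horizon and doubling period below are illustrative assumptions chosen to match the shape of METR's claim, not their exact published figures for any model.

```typescript
// Extrapolation sketch: task horizon under a fixed doubling time.
// startHours and doublingMonths are assumptions for illustration.

function horizonAfter(months: number, startHours: number, doublingMonths: number): number {
  return startHours * Math.pow(2, months / doublingMonths);
}

const startHours = 1;      // assume: ~1-hour tasks completed reliably today
const doublingMonths = 7;  // the slower of the two doubling rates cited above

for (const years of [1, 2, 3, 4, 5]) {
  const hours = horizonAfter(years * 12, startHours, doublingMonths);
  console.log(`${years} year(s): ~${hours.toFixed(0)}-hour tasks`);
}
// With these assumptions, the horizon crosses ~160 hours (roughly one working
// month) a little after four years, which is the shape of the five-year
// "month-long tasks" projection. A 4-month doubling time gets there far sooner.
```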

From autocomplete to autonomy

The shift from AI-as-assistant to AI-as-agent

The nature of what AI does is shifting as fast as how well it does it. The clearest window into this comes from Cursor, which publishes unusually detailed data about how its users work.

In March 2025, 2.5 times more Cursor users relied on tab completion, AI suggesting the next line as the developer types, than on agent mode, where AI executes multi-step tasks with more autonomy. Eleven months later, the ratio had flipped: twice as many users worked with agents as with tab completion. Agent usage grew fifteen-fold in a single year. Today, Cursor's CEO reports, most users never touch the tab key.

Agents become the default

How developers went from writing code to directing AI, in under a year.

Chart: agent requests per tab accept, based on Cursor usage data (Michael Truell, February 2026). The curve moves through three phases (usable models, agent rise, agents lead). Below 1x means tabs dominate; above 1x means agents dominate.

This is a qualitative shift, not just a quantitative one. Tab completion augments a developer's existing workflow, the human writes code and AI accelerates it. Agent mode changes the workflow itself, the human describes what needs to happen and AI executes it. The developer's role shifts from author to director.

Internally, Cursor has taken this further. Thirty-five percent of their own pull requests are now created by autonomous agents running in cloud virtual machines, with developers reviewing the output rather than guiding the process. Their CEO frames this as a transition between eras: the first era (tab completion) lasted roughly two years; the second era (synchronous agents, where developer and AI work in tandem) "may not last one year" before the third era arrives, autonomous agents where the developer's primary role is defining problems and reviewing artifacts.

A company building the leading AI coding tool is publicly stating that its own product's current interaction model is temporary. That's a signal worth taking seriously.

Cost collapse and the Open-Source floor

What was impossible becomes expensive, what was expensive becomes cheap

While capability climbs, cost collapses. Stanford's 2025 AI Index reports a 280-fold reduction in the cost of achieving GPT-3.5-level performance between November 2022 and October 2024. Hardware costs are declining roughly 30% per year. Energy efficiency is improving 40% annually. Training compute requirements halve approximately every eight months through algorithmic improvements alone.

The open-source ecosystem is compressing the gap further. The performance difference between the best proprietary and open-weight models shrank from 8% to 1.7% on the Chatbot Arena benchmark in a single year. Capability that was exclusive to frontier labs in 2024 is accessible to anyone with a GPU in 2025.

There's a caveat: the most capable reasoning models remain significantly more expensive. Frontier reasoning systems cost roughly six times more and run thirty times slower than their non-reasoning counterparts. More capability costs more. But the trajectory is clear: at any given capability level, cost drops dramatically year over year. What was expensive becomes cheap. What was impossible becomes expensive. The frontier extends while the floor rises.

The counterarguments

The case for deceleration, and why it deserves engagement

Not everyone believes this pace continues. There are serious counterarguments, and they deserve engagement rather than dismissal.

A 2026 analysis re-fitted a sigmoid curve to METR's autonomous capability data and found that the inflection point for overall capabilities may have already passed, estimated around mid-2025. The sigmoid model, which predicts growth that accelerates, peaks, and then decelerates, achieved a lower error than the exponential fit. If accurate, it suggests new breakthroughs would be required to sustain current improvement rates.

Acemoglu has been the most prominent skeptic. His analysis suggests only 5% of tasks are economically beneficial to automate with current AI, yielding a total GDP impact of roughly 0.07% over a decade, a fraction of the projections from PwC ($15.7 trillion), Goldman Sachs ($7 trillion), or McKinsey ($4.4 trillion).

The harder benchmarks support this caution. FrontierMath, a dataset of original research-level mathematics problems, sits at 2% solved. Humanity's Last Exam, a collaborative effort to create the hardest possible evaluation, peaks at 8.8%. The easy problems are saturated. The hard problems, the ones that require genuine reasoning, creativity, and multi-step problem-solving, remain largely unsolved.

And there is a gap between benchmark performance and production reliability that matters for practitioners. As one Cursor engineer put it, static benchmarks "lie": models score well in controlled conditions and struggle in the messy reality of production codebases. Ninety-five percent of enterprise AI pilots fail to deliver measurable impact, according to MIT research.

Why even the skeptical view supports the argument

The thesis doesn't require exponential growth

Here's why even the skeptical view supports the core argument. The debate between exponential and sigmoid growth is real, and this paper won't pretend to resolve it. But the paper's thesis doesn't require perpetual exponential improvement. It requires that AI capability continue to grow faster than human processes can absorb it. Even moderate, decelerating growth (a sigmoid past its inflection point) delivers that condition for years.

Consider what's already true, today, with no further improvement needed: AI can resolve the majority of standard software engineering tasks autonomously. Code generation costs have fallen by orders of magnitude. The leading AI coding tool has shifted its entire user base from augmentation to agentic workflows in under a year. A quarter of Y Combinator's most recent cohort has codebases that are 95% AI-generated.

The current capability is already sufficient to transform workflows. The question isn't whether AI keeps getting better at the same rate. The question is whether your processes are designed for the capability that already exists, and for the improvement that's coming, at whatever pace it arrives.

Whatever you build today for the current state of AI will be inadequate within twelve months. That's not a prediction about exponential growth. It's an observation about the gap between where AI is and where most organizations' processes are. Even a decelerating AI capability curve hits a static process like a wave hitting a seawall.

The current shift

Individual output is up 21%.
Organizational delivery is flat.

The adoption question is settled. In 2025, 84% of professional developers use or plan to use AI coding tools. Roughly half use them daily. GitHub Copilot serves 20 million users across 90% of Fortune 100 companies. Cursor, which barely existed two years ago, reached $500 million in annual revenue with 360,000 paying customers, growth driven almost entirely by word of mouth. JetBrains reports 85% regular usage, with 68% of developers expecting AI proficiency to become a job requirement.

The tools are here. They're mainstream. The interesting question isn't whether teams are using AI; it's what's happening when they do.

The individual productivity story

Context determines whether AI helps or hinders

The individual productivity story looks straightforward at first glance. In a controlled experiment by GitHub and Microsoft, developers using Copilot completed tasks 55.8% faster than those without it. A larger study across Microsoft, Accenture, and a Fortune 100 company, 4,867 developers in randomized controlled trials, found a 26% increase in pull requests per week. Google's internal experiment showed a 21% speed improvement. Across studies, less experienced developers generally see the largest gains.

But the story gets complicated fast. A rigorous field study by METR, the AI evaluation organization, tracked experienced open-source developers working on their own repositories: codebases they'd maintained for years, averaging over a million lines of code and 22,000 GitHub stars. These developers, using Cursor Pro with the best available models, were 19% slower with AI assistance than without it.

The telling detail: they predicted a 24% speedup going in. After the measured slowdown, they still believed they'd been 20% faster.

This isn't a contradiction so much as a clarification. AI accelerates unfamiliar, routine, and greenfield work: the kinds of tasks where pattern-matching and code generation provide genuine leverage. On deeply familiar, high-complexity, contextual work, AI introduces overhead: reviewing suggestions, correcting subtle errors, managing the cognitive load of a second contributor that doesn't understand the codebase the way a five-year maintainer does.

The takeaway isn't "AI doesn't help." It's that AI's value is profoundly context-dependent, and the context that matters most isn't the tool or the model. It's the nature of the work and the experience of the person doing it.

The organizational disconnect

Three studies, same finding: faster but worse

Here's where it gets genuinely interesting: the individual gains aren't translating to organizational outcomes, and the Faros AI study is the clearest evidence. Tracking over 10,000 developers across 1,255 teams over two years using telemetry data, not self-reported surveys, they found that individual developers using AI tools complete 98% more code changes and finish 21% more tasks. By any individual measure, AI is working.

But at the organizational level: zero measurable improvement in speed, throughput, or time-to-deliver on DORA metrics. Meanwhile, pull request review times increased 91%. Pull request sizes grew 154%. Bugs per developer rose 9%. Faros calls this the "AI Productivity Paradox": AI is everywhere except in the productivity statistics.

They're not alone in seeing this. The Cortex Engineering Benchmark, tracking 50 engineering organizations over a year, found pull requests per author up 20%. But incidents per pull request up 23.5% and change failure rates up 30%. They independently arrived at the same label: the Productivity Paradox.

Google's DORA reports tell the same story in evolution. The 2024 report found that a 25% increase in AI adoption correlated with a 1.5% decrease in delivery throughput and a 7.2% decrease in delivery stability. The 2025 report showed improvement: AI adoption now correlates positively with throughput. But stability remains negative. Speed improved. Reliability didn't.

The DORA team's framing is worth sitting with: "AI accelerates software development, but that acceleration can expose weaknesses downstream. Without robust control systems, strong automated testing, mature version control practices, and fast feedback loops, an increase in change volume leads to instability."

Three independent research programs. Same finding. More individual output. Same or worse organizational delivery. The bottleneck has moved.

The AI productivity paradox

Individual developers are faster, organizational delivery isn't.

Individual gains: code changes per developer +98%, tasks completed +21%.
Organizational delivery: speed flat, PR review time +91%, PR size +154%, bugs per developer +9%.

The bottleneck has moved

More volume, same review capacity

This is the mechanism: when delivery was constrained by how fast humans could write code, speeding up code production sped up delivery. AI removed that constraint. But it didn't remove the constraints downstream: code review, testing, integration, deployment, coordination across roles. It just flooded the pipeline with more volume.

A team that used to produce 100 pull requests a week now produces 139. Each pull request is 26.7% larger. Review capacity hasn't changed. Testing infrastructure hasn't scaled. The review queue grows. Corners get cut. Bugs slip through. The metrics that measure speed look great. The metrics that measure delivery don't.
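A toy calculation makes the flood visible. Only the pull request volume and size figures echo the paragraph above; the review-effort and capacity numbers are illustrative assumptions, not from any cited study.

```typescript
// Toy model: review load grows with both PR volume and PR size,
// while review capacity stays fixed. Capacity and per-PR effort are assumptions.

interface WeekOfWork {
  pullRequests: number;
  sizeFactor: number; // 1.0 = pre-AI average PR size
}

const hoursPerBaselinePr = 1;    // assume: one reviewer-hour per pre-AI-sized PR
const reviewCapacityHours = 100; // assume: fixed weekly review capacity

// Assume review effort scales roughly linearly with PR size.
const reviewLoad = (week: WeekOfWork): number =>
  week.pullRequests * week.sizeFactor * hoursPerBaselinePr;

const before: WeekOfWork = { pullRequests: 100, sizeFactor: 1.0 };
const after: WeekOfWork = { pullRequests: 139, sizeFactor: 1.267 };

console.log(`Before AI: ${reviewLoad(before)} reviewer-hours vs ${reviewCapacityHours} available`);
console.log(`After AI:  ${reviewLoad(after).toFixed(0)} reviewer-hours vs ${reviewCapacityHours} available`);
// 139 * 1.267 ≈ 176 reviewer-hours against the same 100-hour capacity:
// the queue grows, reviews get shallower, or both.
```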

This is a well-understood pattern. Erik Brynjolfsson's Productivity J-Curve describes the same dynamic with every general-purpose technology: measured productivity dips initially, then rises, but only after organizations redesign their processes around the new capability. Electrification didn't improve factory output until factories were redesigned for distributed power rather than centralized steam engines. That redesign took decades. AI won't take decades, but the principle holds: acceleration without adaptation creates turbulence.

The trust paradox

Trust falls as usage rises

There's a trust dimension that makes this more complex. Stack Overflow's 2025 developer survey, 49,000 respondents across 177 countries, found that 46% of developers actively distrust AI tool accuracy, up from 31% the previous year. Only 3% report high trust. Yet 84% use the tools anyway.

Trust is decreasing as usage increases. The naive expectation would be the opposite: familiarity breeds confidence. Instead, experienced users are discovering the failure modes that casual users miss. Studies bear this out: AI-generated code carries measurably more issues. One analysis of 470 pull requests found AI-generated code contains roughly 1.7 times more issues overall, with critical issues up 40%, logic errors more than doubled, and I/O performance problems nearly eight times more frequent. A separate analysis of 211 million lines of code over four years found duplicate code blocks increasing eightfold, refactored code falling 44%, and code churn (newly written code revised within two weeks) rising from 5.5% to 7.9%.

Developers know this. They don't trust the output. They use the tools anyway. And the volume of output is overwhelming the review processes designed to catch exactly these problems. This is the governance paradox that Section 7 will address directly, but it starts here, in the gap between what AI produces and what organizations can absorb.

The structural gradient

Greenfield gains, brownfield risk

The shift isn't uniform across organizations. It varies dramatically by who you are and what kind of work you do.

The Stanford software engineering productivity research team analyzed data from nearly 100,000 engineers across more than 600 companies, examining tens of millions of commits. Their finding cuts to the structural explanation: AI productivity gains depend fundamentally on project type and complexity.

For greenfield projects with low complexity, developers see 30–40% productivity gains. For brownfield projects with high complexity (extending existing systems, navigating legacy code, maintaining architectural integrity), gains drop to 0–10% and sometimes go negative. AI introduces hard-to-detect errors in precisely the environments where errors are most expensive.

AI productivity gains by project type

AI is helping some teams more than others, and the reason is structural.

Low complexity: +35% average gain

AI's sweet spot: clear requirements with no legacy or architecture drag. This is typical startup and agency work. Y Combinator reports 25% of its Winter 2025 batch has codebases that are 95% AI-generated.

High complexity: +12% average gain

From-scratch work hits limits when architecture gets complex. Novel design and cascading trade-offs need experienced judgment. AI can generate components, but struggles to reason across the entire system.

Greenfield means new products built from scratch, common in startups and many agency projects. AI gains are strongest here because there is little legacy context or integration overhead. Stanford's software engineering productivity research analyzed ~100,000 engineers across 600+ companies using commit-level telemetry.

This structural difference explains why different types of organizations experience the shift at different speeds.

Startups and small teams operate primarily in greenfield territory. They're building new things with new code. Y Combinator reports that 25% of startups in their Winter 2025 batch, roughly 60 companies, have codebases that are 95% AI-generated. ICONIQ Capital's survey of 300 software executives found an average of 33% of code being written by AI across their portfolio, with leading adopters reaching 90%. AI-native startups like Midjourney ($200 million in revenue with roughly 40 employees), Cursor (over $100 million with a team starting at 12), and Bolt ($40 million with about 30 people) demonstrate revenue-per-employee ratios that would have been inconceivable five years ago. For these teams, the human-to-AI execution ratio is roughly 40–50% human, 50–60% AI, and the leading edge has pushed well beyond that.

Agencies and studios sit in a similar structural position. Their work is predominantly zero-to-one: new products, MVPs, prototypes for clients. A survey of 547 U.S.-based agencies found 91% actively using AI, with reported efficiency gains of three to four times in some production pipelines. A design consultancy documented building three complete MVP variations in under 48 hours using AI tools, with client feedback loops replacing traditional internal review processes. Agencies are operating at roughly 50–60% human, 40–50% AI on project delivery, approaching startup ratios for greenfield work.

Enterprise organizations face a fundamentally different landscape. Their codebases are brownfield. Their processes carry compliance requirements, architecture review boards, and security audits. McKinsey's 2025 survey shows 88% of large organizations use AI in at least one function, but only 38% have begun scaling it enterprise-wide. The barriers are structural: 46% cite talent gaps, roughly 60% cite legacy system integration, and only 25% of AI initiatives deliver expected ROI. The enterprise ratio sits closer to 75–80% human, 20–25% AI, not because enterprise teams adopt more slowly (they don't), but because the work itself is harder for AI to do well.

The Stanford data makes this structural, not cultural. It's not that enterprise teams are less innovative. It's that brownfield, high-complexity work, the work that defines most enterprise engineering, is precisely where AI provides the least value and introduces the most risk. Startups and agencies get more from AI because greenfield work is what AI is best at.

The diagnosis

The process, not the tools, is the constraint

The diagnosis, then, is not that AI tools don't work. They do. Individuals are genuinely faster. Code generation is genuinely cheaper. The gap between "idea" and "working prototype" has genuinely collapsed.

The diagnosis is that organizations built for a linear process (discover, then design, then build, then ship) cannot absorb the speed that AI introduces without breaking downstream. Review processes, quality gates, coordination mechanisms, governance structures, all designed for a world where code production was the bottleneck, are now the bottleneck themselves.

The process needs to become something different. Not just faster. Adaptive.

Role convergence

Execution spreads.
Judgment concentrates.

Here's a scene that's playing out in product teams right now, described by practitioners at Adaline Labs: A product manager uses Claude Code to build a working internal dashboard that had been sitting in an engineering backlog for months. A designer ships a prototype that already includes the awkward edge-case states users hit in production. An engineer writes three copy variants while adjusting the UI component they're building.

None of them asked permission to cross into someone else's territory. The tools made it possible. The speed of AI made it practical. And the result, working artifacts instead of specifications describing artifacts, made it obvious.

The constraint, as Adaline Labs puts it, has shifted. It used to be who can make something: each discipline had exclusive production capabilities that defined its role. Now that AI gives everyone the ability to produce credible artifacts, the constraint becomes who decides what should exist. Execution spreads. Judgment concentrates.

How each role is expanding

Designers ship code, PMs build prototypes, engineers make product decisions

The evidence shows this happening across all three core product roles, though unevenly. Designers are gaining production capabilities fastest. A survey of roughly 400 designers by Foundation Capital and Designer Fund found that 90% say AI has improved their process, particularly during ideation and prototyping. Some teams are skipping static mockups entirely, prompting functional prototypes with real code.

At Figma's 2025 Config conference, the company launched tools that generate designs from prompts, publish designs directly to the web, and convert design layers into React code, explicitly acknowledging that the boundary between design and production "has always been artificial, created by tools, not by the requirements of the creative process."

What's striking is how this adoption happened: 96% of designers report being self-taught with AI tools. They discovered them through peers and curiosity, not through organizational training or top-down mandates. The tools spread through the same informal channels that Figma itself once did.

Product managers are gaining building capabilities. Aakash Gupta, writing for Atlassian, calls AI prototyping "the biggest change that has happened to the product management job in the last few years." The old pattern: PMs included working prototypes in their product requirements documents about 5% of the time. The emerging pattern: PMs attach functional prototypes built with v0, Bolt, or Lovable directly to their PRDs. Andrew Ng described one of his teams proposing a ratio of one PM to half an engineer, twice as many product managers as engineers, because AI has made engineers ten times faster at prototyping while product decisions haven't gotten faster at all. Product management work (user feedback, feature prioritization, problem definition) is becoming the bottleneck, not engineering capacity.

Engineers are gaining product and design sensibilities. The "product engineer" title, someone who combines engineering skill with product thinking and user empathy, has become the fastest-growing role archetype in startup hiring. Recruiters report that employers are pivoting toward engineers skilled in "innovative thinking, design, problem solving, delivery, and implementation," not just coding ability. At Vercel, design engineers "blend aesthetic sensibility with technical skills," working with designers not through handoff documents but through direct iteration in code. Cursor's third-era framing makes this explicit: the developer's role is shifting from writing code to "breaking down problems, reviewing artifacts, giving feedback." The engineering skill becomes the ability to direct AI toward good outcomes, not the ability to type the code yourself.

An honest caveat: when LinkedIn analyzed millions of job postings for their 2026 Jobs on the Rise report, hybrid titles like "product engineer" or "design engineer" didn't appear among the 25 fastest-growing roles. The list was dominated by AI-specific titles and non-tech roles. This suggests the convergence is real in practice (recruiters and practitioners are describing it consistently) but hasn't yet registered in formal job classification at scale. The trend is structural, showing up in changed responsibilities within existing roles, rather than in new job titles.

Organizations are restructuring

Smaller teams, fewer layers, more AI, and the limits of compression

Organizations are restructuring around this shift, not just talking about it. Shopify's CEO issued an internal memo in April 2025 with a directive that became widely cited across the industry: teams must "demonstrate why they cannot get what they want done using AI" before requesting additional headcount. AI usage was added to performance and peer review processes. Prototyping, the memo stated, "should be dominated by AI exploration." The company's workforce had already declined through two rounds of cuts, 14% in 2022 and 20% in 2023, and continued to shrink.

Klarna's trajectory is the most documented case, and the most instructive precisely because it didn't go as planned. The fintech company's workforce dropped from roughly 7,000 employees in 2022 to 3,000 by 2025, with the CEO attributing the reduction to AI and natural attrition. AI was cited as doing the work of 700 customer service agents. The company expected further reduction to under 2,000 by 2030, with remaining roles focused on "human connection." Then the reality check: quality degraded. The company began rehiring. The lesson isn't that AI-driven team reduction doesn't work; it's that pure headcount reduction without process redesign creates quality problems that eventually force reversal.

Google Cloud cut over 100 design and UX positions in late 2025, including quantitative UX research and platform experience teams, reducing the size of certain design teams by half. Employees were "urged to integrate more AI into their daily tasks." Amazon cut roughly 14,000 managerial positions through consolidation. McKinsey's March 2025 survey found that 55% of top-performing organizations had restructured processes radically, three times the rate of others, and that organizations were less likely than in previous surveys to report hiring design and visualization specialists.

The pattern is clear: organizations are getting smaller, flatter, and more cross-functional. Deloitte's 2026 Tech Trends survey found that only 1% of IT leaders report no major operating model changes underway, and cross-functional teams are 30% more likely to report significant AI gains. The traditional hierarchy of specialized roles organized by discipline is giving way to smaller, AI-augmented teams organized around outcomes.

This connects to a structural dynamic that accelerates the shift: as AI makes teams smaller, smaller teams adopt AI more effectively, which makes them smaller still.

The mechanism is straightforward. Brooks's Law, from the foundational engineering text The Mythical Man-Month, establishes that communication pathways in a team scale as n(n-1)/2. A team of five has 10 communication paths. A team of fifteen has 105, a tenfold increase for a threefold increase in headcount. Every additional person adds overhead that partially offsets their contribution.

AI breaks the arithmetic. It adds productive capacity without adding communication paths. A team that shrinks from fifteen to five while maintaining output through AI tools doesn't just cut headcount; it eliminates 95 communication paths. The coordination overhead that consumed a significant share of everyone's time evaporates. And with less overhead, the remaining team members have more attention available for deeper AI integration, which further increases what the smaller team can accomplish.
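The arithmetic is worth seeing directly. This short calculation reproduces the figures above; nothing in it is an assumption beyond Brooks's formula itself.

```typescript
// Brooks's Law: communication paths grow quadratically with team size.
const communicationPaths = (teamSize: number): number => (teamSize * (teamSize - 1)) / 2;

for (const size of [5, 10, 15]) {
  console.log(`${size} people -> ${communicationPaths(size)} communication paths`);
}
// 5 -> 10, 10 -> 45, 15 -> 105.
// Shrinking from fifteen to five cuts headcount by two thirds but removes
// 105 - 10 = 95 paths: coordination overhead falls much faster than the team does.
```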

Amazon's famous two-pizza team model, teams of fewer than ten people with single-threaded ownership, is being compressed further. Dan Shipper, co-founder of Every, describes "two-slice teams": with AI agents handling most code generation, single-engineer teams now run entire products that previously required three to four people. ICONIQ Capital's data shows portfolio companies redirecting headcount budgets toward AI productivity investments, doubling internal AI budgets across all startup revenue tiers.

The flywheel is visible in the data. The AI-native startups documented in Section 4, teams of 12 to 40 people generating hundreds of millions in revenue, aren't companies that got small by cutting. They were born small because AI meant they never needed to be big.

But the flywheel has limits. Klarna's reversal is the clearest evidence: aggressive team compression without maintaining governance quality leads to degradation that eventually forces partial reversal. Gartner projects that 50% of enterprises will abandon aggressive AI-driven downsizing plans after misjudging AI complexity. The reinforcing loop works best when the work is greenfield, the team is already small, and the governance requirements are lightweight. It breaks down when quality oversight can't keep pace with output volume, which is, again, the governance challenge at the center of this paper.

What doesn't converge

The judgment each discipline owns matters more, not less

When execution is cheap and universal, when anyone can prompt a prototype into existence, the ability to evaluate what was produced becomes the scarce resource. Katie Dill, Head of Design at Stripe, describes AI reorienting the designer's role "toward direction, curation, and the application of uniquely human judgment." Olivier Chatel, a design practitioner, puts it more sharply: "The power of AI is speed. What it loses is intention." Dylan Field, Figma's CEO, frames it as competitive advantage: "In a world where AI makes it easier than ever to build software, design will become more essential and powerful. It's craft, quality, and point of view that make a product stand out."

The distinct value of each role concentrates rather than disappears. Designers own visual and experiential quality: the judgment that determines whether something is good, not just functional. Engineers own technical integrity: the architectural thinking, performance standards, and security requirements that determine whether something works reliably at scale. Product managers own the "why": the business alignment and problem-solution fit that determines whether something should exist at all.

These aren't the execution tasks that AI is absorbing. They're the judgment tasks that become more important precisely because execution has become trivial. A world where anyone can build a prototype in an afternoon is a world where the ability to distinguish a good prototype from a bad one is the most valuable skill on the team.

The roles converge in what they can do. They diverge in what they judge. And as AI handles more of the doing, the judging becomes the work.

The tool landscape

Every tool excels in its domain.
None of them were built for this.

The tools product teams use today were designed for the process described in Section 2. Figma provides a collaborative canvas for visual mockups. GitHub provides version control and CI/CD for code. Storybook renders components in isolation for developer reference. Design systems are maintained in parallel: one representation in Figma for designers, another in code for engineers, with handoff processes, token pipelines, and bridge tools connecting them.

The very existence of an entire category of tooling dedicated to design-code handoff tells you something about the underlying architecture. Code Connect, Dev Mode, "Ready for Dev" statuses, and design token sync pipelines exist because the tools were built for separate workflows practiced by separate roles. Practitioners describe the friction directly: design tools "define how things should look, but don't ensure how they're built." When developers receive a Figma file, they "interpret the design rather than implementing it directly." Storybook, for its part, is "solid, but its sandboxed environment makes it somewhat limited when trying to understand how the design is being captured as intended."

A survey of roughly 300 design system professionals confirms the pattern quantitatively. Over half have minimal or no automation in design system workflows. Only 10% are using AI for documentation and idea generation. The most common automation is CI/CD for design tokens, the simplest bridge between design and code. Teams aspire to automate token syncing, documentation, accessibility checking, and ticket creation. The infrastructure for doing so barely exists.

This is the tooling landscape that AI is disrupting. And it's disrupting it from multiple directions simultaneously.

Figma's strategic response

Bridges that send value outward

Figma, the dominant design tool with over $900 million in annual recurring revenue and 13 million monthly active users, is responding aggressively. At their 2025 Config conference, they launched four new products: Figma Make (prompt-to-code generation), Figma Sites (publish designs directly to the web), Figma Draw (vector illustration), and Figma Buzz (marketing content). Code Layers converts design components to React code. Figma's CEO framed the ambition directly: "The line between design and production has always been artificial, a boundary created by tools, not by the requirements of the creative process itself."

The execution is real. Figma's MCP server, launched in beta in mid-2025, integrates with Cursor, VS Code, Windsurf, and Claude Code, providing structured design context to AI coding tools. This is a significant architectural move: Figma is building connective tissue between its design environment and the code-native tools where an increasing share of production happens.

But there's a structural tension. Every bridge Figma builds (Code Connect, MCP, Make, Code Layers) sends value outward from Figma toward code-based environments. The direction of flow reveals the paradigm constraint: Figma's canvas renders visual mockups, not running code. Making those mockups more useful to coding tools is valuable, but it positions Figma as a source of design intent rather than the environment where production happens. The community reception to Figma Make reflects this: one user described it as "fun to play around with" but lacking "actual application to my design workflow." Another noted that minor tweaks "required so much time for the code to re-write itself" that each iteration felt like a gamble.

Figma's financial trajectory adds context. Revenue grew 48% year-over-year, with gross margins at 88%, strong fundamentals by any measure. But the stock has declined 72% from its IPO peak, and the current market capitalization sits below what Adobe offered to acquire the company in 2022. The S-1 filing mentions AI over 150 times, positioning it as both opportunity and threat. AI infrastructure costs are increasing, $2.6 million in additional spending in Q1 2025 alone. The market is pricing in uncertainty about whether the mockup paradigm will remain central to product development workflows.

The AI-Native tools

New entrants built on a different assumption

The AI-native tools are growing at a pace that suggests the market is shifting, not just supplementing. Cursor, founded by MIT students in 2022, reached over $1 billion in annual recurring revenue within roughly two years. Its most recent valuation exceeded $29 billion. The product has evolved from an AI-enhanced code editor to a platform that manages autonomous coding agents, the trajectory from autocomplete to autonomous agent documented in Section 3, happening within the same product, in under three years.

Vercel's v0 attracted over 4 million users and 100 million messages since its general availability launch. Revenue growth accelerated from the first million in ten months to a million every 14 days. Vercel earmarked $250 million specifically for v0's development. Their rebuild in early 2026 targeted enterprise adoption explicitly, with the stated goal of moving "vibe coding from novelty to business critical."

Lovable reached $20 million in annual recurring revenue within two months of launch, reportedly the fastest European startup to that milestone. Bolt.new, a full-stack browser-based development agent, reached $40 million ARR with a team of roughly 30 people. These aren't incremental improvements on existing tools. They're new entrants built around a fundamentally different assumption: that code production is an AI task, and the human contribution centers on direction, evaluation, and refinement.

The connective layer matters as much as the individual tools. The Model Context Protocol, launched by Anthropic in late 2024, has become the de facto standard for connecting AI tools to external systems. Adopted by OpenAI, Google DeepMind, Replit, Sourcegraph, and dozens of others, with tens of thousands of MCP servers available, MCP creates network effects across the AI tool ecosystem. ThoughtWorks, in their Technology Radar, assessed that "it would be hard to provide a convincing snapshot of technology in 2025 without discussing its incredible rise." The implication for product teams: MCP-based integrations between design systems, component libraries, and AI coding tools are emerging rapidly. Figma-to-Cursor, Storybook-to-LLMs, Zeroheight-to-Claude, each connection reduces friction between tools that were designed as separate systems.
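To ground what these integrations actually carry, here is a conceptual sketch of the pattern MCP standardizes: a design-system source exposing structured "tools" an AI coding agent can call instead of guessing from screenshots. The interface below is a simplified hypothetical written for illustration, not the actual Model Context Protocol SDK or any vendor's server.

```typescript
// Conceptual sketch of an MCP-style integration (hypothetical interface,
// not the real Model Context Protocol SDK API).

interface ToolResult {
  content: string; // structured context for the agent, e.g. JSON or markdown
}

interface DesignSystemServer {
  listComponents(): Promise<ToolResult>;               // names and variants in the library
  getComponentSpec(name: string): Promise<ToolResult>; // tokens, props, usage guidance
}

// An agent step that consults the design system before generating UI code.
async function draftButtonCode(server: DesignSystemServer): Promise<string> {
  const spec = await server.getComponentSpec("Button");
  // In a real agent, this spec is injected into the model's context so the
  // generated component uses the team's tokens and props rather than inventing them.
  return `/* generate against spec: */\n${spec.content}`;
}
```

The point of the sketch is the shared interface: once a design system, a component library, and a ticket tracker expose context this way, every AI agent in the workflow can draw on the same structured information, which is where the network effect described above comes from.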

Venture capital signals confirm the market direction. In 2024, 48% of early-stage developer tool investments were AI-related. Globally, AI venture capital reached $258.7 billion in 2025, representing 61% of all venture capital deployed that year.

The gap that remains

Fast individuals, fragmented teams

The current landscape, mapped against what product teams actually need, reveals a pattern. Each tool excels in its domain but leaves gaps that others don't fill.

Figma leads in collaborative design but works with mockups, not production code. UXPin Merge renders real code components on a canvas, and has for years, proving the concept works at scale (three designers supporting 60 products and over 1,000 developers), but lacks AI-native capabilities. Knapsack is building toward enterprise design system governance with an AI-era positioning, but is early. Builder.io's Fusion, launched in late 2025, indexes existing codebases and understands design system patterns; their CEO explicitly calls out the problem: "Most AI tools today make individual contributors faster but leave teams disconnected." Storybook renders components for developer reference but isn't a design collaboration environment. The code generation tools (Cursor, v0, Bolt, Lovable) produce code at extraordinary speed but focus on individual productivity, not team governance or collaborative refinement.

A survey of roughly 400 designers captures the practitioner experience of this gap: teams "juggle prototypes shared via ephemeral links and feedback scattered across disconnected tools like Figma, Notion, and Loom." The tools are fast. The collaboration is still fragmented.

The MCP integrations emerging between these tools are point-to-point bridges, connecting Figma to Cursor, Storybook to Claude, Zeroheight to AI assistants. Each bridge is useful. But bridges between separate paradigms are different from an environment designed natively around how the work actually happens now. As Section 8 will explore, the bridges themselves may be temporary, serving a transitional period while the tooling landscape resolves into something more integrated.

The market is moving fast. Multiple entrants are converging on the intersection of real code, visual collaboration, AI generation, and governance. Who gets there first, and in what form, remains genuinely open.

The governance question

The safety net is universal.
It's already breaking.

Every industry deploying AI has arrived at the same answer to the same question. The question: how do you ensure AI output meets quality standards? The answer: put a human in the loop. A radiologist reviews the AI's scan interpretation. A compliance officer challenges the algorithm's trading decision. A developer reviews the AI-generated pull request. A designer evaluates the AI's layout before it ships.

Human-in-the-loop oversight is the dominant governance model across every sector where AI has gained traction. McKinsey's 2024 global survey found that 27% of organizations review all generative AI outputs before use; most review at least a portion. The organizations that get the most value from AI, the high performers, are far more likely to have defined validation processes: 65% compared to 23% among the rest. The EU AI Act, enacted in 2024, legally mandates "effective" human oversight of high-risk AI systems. NIST's AI Risk Management Framework recommends human-in-the-loop for high-impact decisions alongside continuous automated monitoring.

This isn't controversial. It makes intuitive sense. AI is powerful but imperfect; humans provide the judgment layer that catches errors, maintains standards, and ensures accountability. The question isn't whether this model exists; it does, everywhere. The question is whether it works. And increasingly, the evidence suggests it doesn't, at least not the way it's currently implemented. Not because the principle is wrong, but because the mechanism breaks under the conditions AI itself creates.

The ironies of automation

A paradox identified forty years before modern AI

The theoretical foundation for this problem was established decades before modern AI existed. In 1983, Lisanne Bainbridge published "Ironies of Automation," now cited over 1,800 times, identifying a structural paradox: the more advanced an automated system becomes, the more crucial the human operator's contribution is, but the worse the human is at providing it. The designer removes the human from active operation because humans are "unreliable and inefficient." The remaining tasks, monitoring and emergency intervention, are precisely what humans perform worst at without regular practice.

Twelve years later, Mica Endsley and Esin Kiris documented the mechanism: when humans shift from actively operating a system to passively monitoring it, situational awareness degrades. Decision-making slows. The ability to intervene effectively when something goes wrong deteriorates, not because the human became less competent, but because passive monitoring is a fundamentally different cognitive mode than active engagement.

These aren't theoretical curiosities. The EU AI Act explicitly acknowledges automation bias as a threat to oversight effectiveness. Its own Article 14 requires humans to "remain aware of the possible tendency of automatically relying or over-relying on the output produced by a high-risk AI system." Legal analysis published in 2025 concluded that the framework's approach may be insufficient: empirical evidence demonstrates "significant limitations to human oversight's effectiveness, including due to humans' cognitive constraints and automation bias."

The regulator is mandating human oversight while simultaneously recognizing it may not work. This tension sits at the heart of the governance challenge.

What happens in practice

More defects, less trust, rising volume

The challenge gets worse when you look at what happens in practice. In software development, the domain this paper focuses on, the evidence paints a consistent picture.

AI-generated code carries more defects. As documented in Section 4, AI-authored code contains roughly 1.7 times more issues than human-written code, with logic errors more than doubled and performance issues nearly eight times more frequent. Code churn is rising, duplicate code is multiplying, and refactoring is declining. ThoughtWorks independently confirmed the pattern in their late-2025 Technology Radar assessment.

Meanwhile, the people responsible for catching these issues trust AI less with each passing year. As Section 4 documented, 46% of developers now actively distrust AI-generated code, up from 31% the year before. Only 3% report high trust. Trust is eroding with experience, not building.

And yet, usage keeps climbing. Developers use tools they don't trust to generate code they know needs more review, while the volume of that code overwhelms the review processes designed to catch problems. DORA's data confirms the downstream effect: for every 25% increase in AI adoption, delivery stability decreases 7.2%. The researchers tested whether teams could "fail fast, fix fast" to compensate. They found no positive results.

There is a striking finding from the medical domain that illuminates how dysfunctional this dynamic becomes. A 2025 study published in NEJM AI, 1,334 participants in a controlled experiment, found that when a radiologist uses AI assistance and then overrides an AI recommendation that turns out to be correct, mock jurors find the radiologist more liable than if no AI had been used at all. The governance layer is legally penalized for exercising the very judgment it exists to provide. The message to the human in the loop: you're responsible for catching the AI's mistakes, but you'll be punished more harshly for your own if the AI was right.

The bottleneck is forming

Review pipelines designed for human-speed production

The bottleneck is forming, not everywhere and not yet, but in high-velocity software development it's already visible. Developers spend only 24% of their time writing code; 76% goes to overhead including code review, meetings, context switching, and documentation. Code review was already a constraint before AI. Now AI has increased pull request volume by 20–39% while increasing pull request sizes by 26–154%. The review pipeline was designed for human-speed code production. It can't absorb machine-speed output without either degrading quality (which the data shows is happening) or becoming a bottleneck that negates the speed gains (which the data also shows is happening).

The direction this is heading is visible at the leading edge. As Section 3 documented, Cursor's internal development has already shifted to autonomous agents producing completed pull requests for human review, and their CEO states the current interaction model may not last another year. The human role is moving from co-pilot to reviewer to architect. Each transition compresses the governance window further.

An industry analysis from early 2026 mapped this trajectory explicitly: human oversight drops in discrete steps at each major capability milestone, not as a smooth gradient. Each leap in AI capability (Claude Code in early 2025, Gemini 3 and Opus 4.5 later that year, agentic frameworks across the ecosystem) triggers a rational reduction in the intensity of human review. Each step makes the system more capable and the human slightly less practiced at catching its failures. The fragility doesn't accumulate gradually. It accumulates in steps, each one a new equilibrium that's harder to reverse.

The case for caution

Speed without stability is destructive

The evidence for caution is real and shouldn't be dismissed. In healthcare, 75% of health system leaders still require human validation of AI outputs. Only 12.5% report that autonomous AI has delivered the most value; hybrid intelligence, where clinicians retain decision authority, remains the preferred model.

The FDA acknowledges that continuous monitoring infrastructure isn't yet integrated into the regulatory process. In financial services, compliance staff often lack the technical expertise to meaningfully challenge AI decisions, creating what one legal analysis calls "an insurmountable barrier to effective supervision."

The DORA research is unambiguous on one point: speed without stability is destructive. Removing human oversight prematurely, before the AI is reliable enough and the monitoring infrastructure is mature enough, creates worse outcomes than keeping humans in the loop despite its limitations.

Redesigning governance

Earned autonomy, not assumed autonomy

The resolution isn't to abandon human oversight. It's to redesign it. Several models for evolved governance have emerged, all converging on the same principle: graduated autonomy, where AI earns independence through demonstrated performance rather than receiving it by default.

Autonomous vehicles provide the clearest real-world model. A teleoperation architecture allows a single human operator to oversee dozens of vehicles simultaneously. Each human intervention generates training data, creating a flywheel: the more the AI drives, the better it gets, the fewer interventions it needs, the more vehicles each human can oversee. Human-centered design of the operator interface reduced errors by 67%. The human doesn't disappear, the human's role transforms from driver to fleet architect.

This is exactly the pattern visible in Cursor's three eras. In the first era (tab completion), the developer writes code and AI suggests the next line, the human is the driver. In the second era (synchronous agents), the developer and AI work in tandem, the human is a co-pilot. In the emerging third era (autonomous agents), the developer defines the problem and reviews the output, the human is the architect. Each era represents a governance phase transition. Cursor isn't theorizing about graduated autonomy; they're living it, with 35% of their own pull requests already in the third-era model.

The principle that connects these examples is earned autonomy. AI doesn't get independence because it's convenient or because human review is slow. It earns independence by demonstrating, measurably, that its output meets the quality criteria that humans define. The governance layer doesn't thin because we give up on oversight. It thins because the system earns trust through performance, the same way a junior engineer earns more autonomy as they demonstrate judgment.

Progressive autonomy

The governance model that replaces human-in-the-loop is already emerging at the leading edge. Oversight evolves in phases as AI earns trust. In the first phase:

Human role: reviews every AI output against human judgment. Retains all decision authority; nothing ships without explicit human approval. Defines what "good" means through decisions the AI learns from. The human drives.

AI role: produces recommendations that are logged, never executed. Observes human decisions to learn patterns and criteria. Like Cursor's first era, the AI suggests next lines; outputs are training signals, not actions.

AI runs in parallel with every human decision but does not execute independently. It logs recommendations against real outcomes so teams can measure alignment before delegating live work. Advancement gate: 90% alignment with human decisions across 100+ reviewed cases.
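A minimal sketch of how that advancement gate might be expressed in code, assuming a hypothetical ShadowRecord log of paired AI recommendations and human decisions; the names and thresholds mirror the phase described above but are illustrative, not a prescribed implementation.

```typescript
// Hypothetical shadow-mode log entry: what the AI recommended vs. what the
// human actually decided for the same case.
interface ShadowRecord {
  aiRecommendation: string;
  humanDecision: string;
}

const ALIGNMENT_THRESHOLD = 0.9; // 90% agreement with human decisions
const MINIMUM_CASES = 100;       // across at least 100 reviewed cases

// Returns true only when the AI has earned advancement to the next phase.
function readyForNextPhase(records: ShadowRecord[]): boolean {
  if (records.length < MINIMUM_CASES) return false;
  const aligned = records.filter(
    (r) => r.aiRecommendation === r.humanDecision
  ).length;
  return aligned / records.length >= ALIGNMENT_THRESHOLD;
}
```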

This is where the Shifting Ratio Model intersects with governance. The ratio of human-to-AI execution doesn't just shift because AI gets better at producing things. It shifts because governance models evolve to accommodate what AI has proven it can handle reliably.

For startups, the reinforcing loop is already compounding. Smaller teams mean less coordination overhead, which means deeper AI integration, which means the team can stay small. The governance challenge isn't slowing down, it's keeping quality gates in place as speed accelerates.

Agencies trail startups but lead enterprise, structurally advantaged by short feedback loops and greenfield project types. Client review replaces architecture review boards. The governance mechanism is lighter, so the ratio shifts faster.

Different speeds, same direction

Same trajectory, different pace

But the shift happens at different speeds, for structural reasons. For startups doing greenfield work with small teams, the governance surface area is small. Review cycles are short. The person who wrote the problem definition is often the person who reviews the output. Progressive autonomy advances quickly because feedback loops are tight and the consequence of an error is a bug fix, not a regulatory incident. These teams are already operating with significant AI autonomy.

For enterprise teams working on brownfield, regulated, high-complexity systems, the governance requirements are genuinely heavier. Code review has compliance implications. Architectural decisions carry multi-year consequences. The progressive autonomy model still applies, but the thresholds for advancement are higher and the phases take longer to traverse. This isn't organizational sluggishness, it's a rational response to higher stakes.

The trajectory is the same. The pace is different. And the governance model has to be designed for both, for the domain where 35% autonomous pull requests is today's reality, and for the domain where 75% of leaders still require human validation on every output.

The teams that navigate this well won't be the ones that rush to remove humans from the loop or the ones that cling to manual review as volume overwhelms them. They'll be the ones that design governance for earned autonomy, building systems where AI demonstrates its reliability, humans define the standards and monitor the metrics, and the ratio shifts as trust is justified by evidence, not assumed by default.

The shifting ratio: three tracks, same direction

A current snapshot of the execution split by track:

AI-native startups (greenfield, fewer than 20 people): AI 55%, human 45%
Agencies and studios (zero-to-one, client work): AI 45%, human 55%
Enterprise (brownfield, regulated, 500+ people): AI 22%, human 78%
Bridge features

Everything you build today is temporary.
Design for removal.

Software engineers have been designing temporary things on purpose for decades. They just don't always admit it.

Martin Fowler's Strangler Fig pattern, transform, co-exist, eliminate, is a migration strategy built around a deliberately temporary layer. The routing façade that directs traffic between the old and new system exists only until the migration is complete. Fowler's observation from 2004 remains sharp: "When designing a new application you should design it in such a way as to make it easier for it to be strangled in the future. Let's face it, all we are doing is writing tomorrow's legacy software today."
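A minimal sketch of that routing façade, with hypothetical legacyHandler and modernHandler functions and a migratedRoutes set standing in for the real systems; the façade is the deliberately temporary layer, and once every route has been migrated it can simply be deleted.

```typescript
// Strangler-fig façade sketch: route migrated paths to the new system,
// everything else to the legacy one. The façade itself is the bridge.
type Handler = (path: string) => Promise<string>;

const legacyHandler: Handler = async (path) => `legacy response for ${path}`;
const modernHandler: Handler = async (path) => `modern response for ${path}`;

// Grows as the migration proceeds; when it covers every route, delete the façade.
const migratedRoutes = new Set<string>(["/billing", "/invoices"]);

export const facade: Handler = (path) =>
  migratedRoutes.has(path) ? modernHandler(path) : legacyHandler(path);
```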

TOGAF formalizes the concept at enterprise scale as "transition architectures", explicit stepping stones between the current state and the target state, with gap analysis required at each boundary. Greg Young took it to its logical extreme: the time it takes to fully rewrite any component should not exceed one week. His entire design philosophy optimizes for one thing, the ability to delete code and start over.

And polyfills may be the purest bridge feature ever created. A polyfill checks whether the browser supports a feature. If it does, the polyfill does nothing. If it doesn't, the polyfill provides the feature. Its entire purpose is to become unnecessary. core-js has been downloaded over nine billion times and runs on more than half of the world's most visited websites. It is, by design, working toward its own irrelevance.
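A simplified sketch of the pattern, using String.prototype.padStart as the example feature; a production polyfill such as those in core-js handles many more edge cases, but the shape is the same: detect, and only provide when the feature is missing.

```typescript
// Polyfill sketch: if the runtime already provides padStart, do nothing;
// otherwise supply a (simplified) implementation. The code's goal is to
// become a no-op everywhere.
if (typeof String.prototype.padStart !== "function") {
  String.prototype.padStart = function (
    this: String,
    targetLength: number,
    padString: string = " "
  ): string {
    const str = String(this);
    if (str.length >= targetLength || padString.length === 0) return str;
    let pad = padString;
    while (pad.length < targetLength - str.length) pad += padString;
    return pad.slice(0, targetLength - str.length) + str;
  };
}
```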

These patterns share a property: they are deliberately temporary implementations designed with their own removal in mind. The concept isn't new. What's new is the speed at which things become temporary.

Prompt engineering: from hot job to historical artifact

Eighteen months from peak to obsolescence

Consider prompt engineering. In 2023, it was the tech industry's hottest new job. Job listings peaked in April 2023. Companies created dedicated prompt engineering roles. Training programs sprang up. LinkedIn profiles with the title proliferated.

By early 2025, job listings for "prompt engineer" had declined to near zero. LinkedIn profiles with the title dropped 40%. The skill didn't die, it dissolved. As models improved, they became less sensitive to prompt variation. Prompting best practices were absorbed into the platforms themselves. Anthropic released tools to "at least partially automate" prompt engineering, a meta-bridge that facilitates the obsolescence of the skill it supports. Gartner projected that 70% of enterprises would use AI-driven prompt automation by 2026, effectively eliminating dedicated prompt roles.

One AI training academy captured the trajectory perfectly: prompt engineering "was a transitional skill, the bridge to prompt architecture and agentic system design." The skill existed because early models needed careful human guidance. As models became more capable, the guidance became less necessary. The bridge served its purpose and was decommissioned, not through any deliberate decision, but because the gap it bridged closed.

This happened in roughly eighteen months. From hot job to historical artifact.

The contested case: RAG

When you can't tell a bridge from a permanent fixture

Not every bridge is so cleanly resolved. The debate around retrieval-augmented generation, RAG, illustrates how difficult it can be to distinguish a bridge from a permanent fixture.

The case for RAG as a bridge: context windows were small, models couldn't hold enough information to answer questions about large document sets, so engineers built retrieval pipelines to fetch relevant chunks and inject them into prompts. One practitioner wrote a "RAG obituary" in late 2025, calling it "a clever workaround for a context-poor era" and predicting that in hindsight, "RAG will look like training wheels. Useful, necessary, but temporary."

The case against: context windows have grown from thousands to millions of tokens, but accuracy still degrades with noise. Enterprise data requires permissions-aware retrieval that context windows can't replicate. Latency scales with prompt length. Practitioners point out that RAG solutions are embedded in ChatGPT, Cursor, and Claude Code; the most important players in AI still employ retrieval. One analyst observed that RAG "has been dying since the first large context window LLM," and each predicted death has proven premature.

The paper doesn't take a position on whether RAG is a bridge or a permanent pattern. That ambiguity is the point. Distinguishing between features that will be replaced and features that will persist is genuinely difficult, especially in a domain where capability changes quarterly. The ability to design for either outcome, to build RAG pipelines that can be removed cleanly if context windows make them unnecessary, or that can evolve if they prove enduringly valuable, is the architectural skill that matters. The pattern isn't "everything is temporary." The pattern is "design as if it might be."
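One way to hold that ambiguity in the architecture is sketched below, with an illustrative Retriever interface and a rough token budget: if the whole corpus fits in the model's context, the retrieval step skips itself, and if retrieval proves permanent, only the implementation behind the interface changes. The names and numbers are assumptions, not any specific framework's API.

```typescript
// Retrieval kept behind a narrow interface so it can be removed cleanly
// (or evolved) without touching the rest of the pipeline.
interface Retriever {
  retrieve(query: string, topK: number): Promise<string[]>;
}

const CONTEXT_BUDGET_TOKENS = 200_000; // illustrative model context budget

async function buildPrompt(
  query: string,
  corpus: string[],
  retriever: Retriever
): Promise<string> {
  // Rough token estimate: ~4 characters per token.
  const corpusTokens = corpus.join("\n").length / 4;

  // If the whole corpus fits, the bridge does nothing; otherwise, retrieve.
  const context =
    corpusTokens <= CONTEXT_BUDGET_TOKENS
      ? corpus
      : await retriever.retrieve(query, 8);

  return `Context:\n${context.join("\n---\n")}\n\nQuestion: ${query}`;
}
```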

Why bridge features matter now

Bridge lifespans measured in model generations

The reason this matters now, the reason bridge features deserve a named concept rather than remaining an implicit engineering practice, is that AI is compressing the lifespan of bridges from years to months.

Cursor's CEO provides the sharpest example. The three eras described in Section 3, tab completion, synchronous agents, autonomous agents, represent a product whose current interaction model is explicitly temporary. The first era lasted roughly two years. The second may not survive one.

A platform CEO publicly stating that his product's current interaction model is temporary. Not in a theoretical sense. Not as a hedge. As a timeline: under a year. And this is a product with hundreds of millions in annual revenue and hundreds of thousands of paying customers who chose it specifically for its current interaction model.

OpenAI's model deprecation tells a similar story. When GPT-4o was retired from ChatGPT in early 2026, only 0.1% of users were still choosing it daily. The characteristics users loved about GPT-4o, its warmth, its conversational style, were absorbed into its successors as configurable settings. The bridge feature's traits merged into the replacement. The feature disappeared; the value persisted.

The SWE-bench trajectory from Section 3 provides the macro view. AI coding capability went from resolving under 5% of real engineering problems to over 70% in twelve months. Any feature, workflow, or governance model designed for the low-capability world is obsolete in the high-capability world. And any design for today's world will be obsolete in the next. The bridge lifespan isn't measured in product cycles anymore. It's measured in model generations.

The Anti-Patterns

Nothing more permanent than a temporary fix

The anti-patterns are well documented and worth naming, because the most common failure mode for bridge features isn't that they don't work, it's that they work well enough to stick around forever.

Software practitioners have independently converged on the same observation: "nothing more permanent than a temporary fix." One describes it as code that "will survive three company re-orgs, two CTOs, and a migration to the cloud." Another: "That temporary fix becomes permanent... now it's load-bearing. Touch it and something breaks."

CSS vendor prefixes are the canonical cautionary tale. Browser vendors introduced prefixes (-webkit-, -moz-, -ms-) as bridge features, use them while the spec is in progress, drop them when the standard is finalized. What happened: developers used prefixed features on production websites despite their experimental status. The bridge became load-bearing. Removing the prefixes would break live sites. The result: a new bridge (Autoprefixer, a build tool that manages prefix insertion and removal automatically) was required to clean up the old bridge. A bridge to remove a bridge.

The core-js sustainability problem illustrates a different failure mode. The library has nine billion downloads and runs on most of the world's top websites. Its sole maintainer has publicly documented earning roughly $400 per month for maintaining infrastructure that half the internet depends on. Bridge features struggle to attract investment precisely because everyone understands they're temporary, but "temporary" can last a decade when no one invests in the removal.

Organizations that have learned this lesson enforce expiration actively. Google mandates auto-removal of feature flags after 30 days unless explicitly renewed. Netflix tracks "flag health" dashboards monitoring the age and ownership of every active flag. Amazon requires quarterly reviews of all feature flags to prevent accumulation. These are organizations that have learned, through accumulated scar tissue, that the default state of any temporary implementation is permanence unless you actively design against it.
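A minimal sketch of what enforcing expiration can look like in code, with a hypothetical FeatureFlag shape and a 30-day default in the spirit of the policies above; the point is that a flag nobody renews turns itself off and shows up on a removal list.

```typescript
// Hypothetical feature flag record with an owner and a creation date.
interface FeatureFlag {
  name: string;
  owner: string;
  createdAt: Date;
  renewedUntil?: Date; // explicit renewal extends the deadline
}

const DEFAULT_TTL_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function isExpired(flag: FeatureFlag, now: Date = new Date()): boolean {
  const deadline =
    flag.renewedUntil ??
    new Date(flag.createdAt.getTime() + DEFAULT_TTL_DAYS * MS_PER_DAY);
  return now.getTime() > deadline.getTime();
}

// Expired flags are treated as off and reported for deletion, so a temporary
// flag can't silently become permanent infrastructure.
function activeFlags(flags: FeatureFlag[], now: Date = new Date()): FeatureFlag[] {
  for (const flag of flags.filter((f) => isExpired(f, now))) {
    console.warn(`Flag "${flag.name}" (owner: ${flag.owner}) has expired; remove it.`);
  }
  return flags.filter((f) => !isExpired(f, now));
}
```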

The anti-patterns cluster around three failures: not enforcing expiration (lacking hard deadlines means temporary becomes permanent by default), not communicating disposable status (stakeholders build dependencies on code they don't know is meant to be replaced), and coupling temporary to permanent systems (deep integration makes removal expensive enough that it never happens).

Designing for removal

Reversibility, locality, safety of change

The design principle that emerges is straightforward: in an era where the ground shifts every six to twelve months, every architectural decision should include a removal strategy alongside its implementation plan.

Delete-driven design, as one practitioner framework describes it, optimizes for three properties: reversibility (can I undo this decision?), locality (does changing this touch only its own files?), and safety of change (can I delete this and know exactly what breaks?). These aren't novel concepts. They're the principles behind microservices, feature flags, and loose coupling, reframed for an environment where the rate of change has compressed dramatically.

Greg Young's one-week rule takes on new urgency. If any component in your system takes more than a week to rewrite from scratch, it's too large, too coupled, or too opaque. In a world where AI can rewrite most components in hours, the threshold should arguably be lower. The goal isn't to build disposable software. The goal is to build software that assumes its own replacement, and makes that replacement painless rather than traumatic.

This principle applies beyond code. The governance models described in the previous section, human-in-the-loop review processes, synchronous code review at scale, are themselves bridge features. They exist because current AI isn't reliable enough to operate without oversight. As AI earns autonomy through demonstrated performance, these governance models will evolve. Teams that design their review processes as permanent fixtures will find them calcified and resistant to change. Teams that design them as bridge features, with explicit criteria for when to lighten the review, measurable thresholds for autonomy, and architecture that supports graduated trust, will adapt as the capability landscape shifts.

Build for where AI will be tomorrow, not just where it is today. And build with the assumption that "tomorrow" arrives faster than you expect.

The adaptive process

The bottleneck has moved.
The process must move with it.

The linear model was elegant and functional for the world it was designed for. Roles defined what to build. Process sequenced the work. Tools executed each phase. The result was a product. Each stage completed before the next began. Handoffs carried intent from one discipline to the next. The bottleneck was execution, how fast skilled humans could turn ideas into working software.

AI hasn't just accelerated the execution phase. It has introduced feedback loops that compress the entire cycle. A product manager can now generate a working prototype while defining requirements, the "what to build" and "how it might work" happen simultaneously rather than sequentially. A designer can iterate on real code rather than static mockups, testing variations in production-quality components rather than approximations. An engineer can explore ten implementation approaches in the time it used to commit to one.

The exploration-to-execution gap that justified the linear model, the reason discovery had to precede delivery, the reason design preceded engineering, has collapsed. The production process looks fundamentally different too:

The adaptive production process

The same four stages from Figure 1, restructured for what's emerging now.

Define (problems and scope) · Review (quality gates) · Judge (trade-offs) · Ship (final authority)
Generate (code, components) · Build (integrate, test) · Iterate (optimize, fix) · Deploy (ship, monitor)

How each role transforms

Execution converges, judgment diverges

Roles are still human, but expanded. Each person's reach extends into adjacent disciplines because AI handles the execution that previously required specialized skill. The designer can produce code. The engineer can explore design variations. The PM can build a dashboard. Execution is converging. What diverges, what concentrates, is judgment. Visual judgment. Technical judgment. Business judgment. The human contribution narrows toward governance: defining what "good" means and evaluating whether AI's output meets that standard.

Process is no longer sequential. It's iterative, with feedback loops flowing backward from tools to process to roles. A prototype generated in minutes informs the problem definition. A code review reveals a design issue that triggers a redesign. The cycle time compresses from weeks to hours. The process becomes adaptive, not because discipline disappears, but because the barriers between phases become permeable.

Tools have agency. They're no longer passive instruments that wait for human input. AI-powered tools suggest, generate, and propose. They create artifacts that didn't exist before the human started the session. The tool is a collaborator, not a canvas. And the governance of what it produces, the review, the refinement, the quality gate, becomes the primary human task.

The Adaptive Production Roles

Execution converges, judgment diverges. In each role, the human governs Define, Review, Judge, and Ship; AI executes Generate, Build, Iterate, and Deploy.

DESIGN · AI-Native Design Engineer
Define: Visual direction, brand identity, UX patterns, interaction models
Generate: Component code, layouts, design tokens, variant exploration
Review: Visual polish, consistency across components, experiential judgment
Build: Assemble component systems, style libraries, responsive layouts
Judge: Brand exceptions, taste calls, edge-case UX decisions
Iterate: Refine based on judgment, optimise assets, adjust variants
Ship: Final visual authority and sign-off
Deploy: Documentation, changelog, accessibility checks

ENGINEERING · AI-Native Engineer
Define: Architecture, constraints, technical standards, security posture
Generate: Feature implementations, test suites, scaffolding, boilerplate
Review: Code quality, performance, security audit, edge cases
Build: Assemble integrations, pipelines, test infrastructure
Judge: Architectural trade-offs, scaling decisions, technical debt calls
Iterate: Refine implementations, fix defects, optimise performance
Ship: Final technical authority, release approval
Deploy: CI/CD execution, production deployment, monitoring setup

PRODUCT · AI-Native PM
Define: Problem framing, success criteria, business alignment, prioritisation
Generate: Specs, working prototypes, dashboards, competitive analysis
Review: Outcome alignment, user value validation, business fit
Build: Assemble documentation, analytics, release artifacts
Judge: Scope trade-offs, priority exceptions, go/no-go decisions
Iterate: Refine specs and criteria based on updated judgment
Ship: Final product authority, release decision
Deploy: Release notes, status updates, analytics activation

Illustrative examples, not a fixed role taxonomy. Other cuts (for example frontend, backend, or research) are equally valid. This snapshot focuses on the transitions most visible today across design, engineering, and product.

The governance layer

Judgment applied at the right moments

The product, the artifact, is unchanged. A button is a button regardless of whether a human or an AI wrote the code. What changes is everything upstream of the artifact: who decided it should exist, how it was explored and refined, what tools produced it, and how its quality was governed.

The human role in this model is governance. Not in the bureaucratic sense (not review boards and approval chains) but in the sense of judgment applied at the right moments. Defining what to build and why. Setting quality criteria. Evaluating output against standards. Making the calls that require context, taste, and accountability. This governance layer sits above the entire production chain, and it thins over time, not because humans become less important, but because AI earns autonomy through demonstrated performance. The thinning is progressive, measurable, and earned. Not automatic.

The Adaptive Production Tools

A landscape snapshot, not a prescriptive stack.

Generalist

Claude Chat / Cowork - Define · Review · Judge

Cross-product context and architectural thinking. Handles brainstorming, delegation across tasks, and maintains continuity across the adaptive process, from problem definition through governance review.

ChatGPT - Define · Generate

General-purpose reasoning and rapid ideation. Useful for drafting, exploratory analysis, and parallel idea generation where breadth of thinking matters more than deep domain specificity.

Deep Research

Perplexity - Define

Citation-backed source synthesis for evidence-driven decisions. Competitive analysis, market data, and structured research where traceability to original sources is critical for governance.

Gemini - Define

Deep research with large context windows. Trend analysis, market intelligence, and long-document synthesis, processing the volume of information that AI-native workflows generate.

Manus - Define · Generate

Autonomous multi-step research agent. Executes complex web tasks, synthesises across dozens of sources, and delivers structured findings, shifting research from human execution to human governance.

Ideation + Canvas

FigJam - Define

Collaborative whiteboarding for journey maps, workshops, and alignment sessions. The ideation layer where roles converge: designers, engineers, and PMs contributing to shared visual thinking.

Miro - Define

Diagramming and async ideation for distributed teams. User flows, system architecture, and strategic alignment, the spatial thinking layer that complements AI-generated artifacts.

Engineering

GitHub - Review · Ship · Deploy

Version control, CI/CD, code review, and Actions workflows. The governance infrastructure where human judgment meets AI-generated code, with pull requests as the primary quality gate.

GitHub Copilot - Generate

Inline code completion and PR summaries. The first wave of AI-assisted engineering, accelerating individual developers while creating the output volume that strains traditional review.

VS Code - Generate · Build

Extensible editor with growing MCP integrations and AI extension ecosystem. The bridge between traditional development workflows and AI-native toolchains.

Cursor - Generate · Build · Iterate

AI-native code editor with agent mode and autonomous PR generation. The shift from synchronous human-guided coding to asynchronous AI execution with human governance of outputs.

Claude Code - Generate · Build · Iterate

Terminal-based agentic coding operating directly in the developer's environment. AI as autonomous collaborator: define tasks, review results, govern quality.

OpenAI Codex - Generate · Build

Cloud-based coding agent with parallel task execution. Multiple workstreams running simultaneously, the throughput that demands new governance models beyond sequential code review.

Replit - Generate · Build · Deploy

Collaborative cloud IDE with AI agent and deploy-from-browser. Full development environment with instant deployment, engineering infrastructure for distributed teams.

Vercel - Deploy

Deployment, hosting, and edge infrastructure. Closes the loop from code generation to production, enabling the rapid iteration cycles that make the adaptive process possible.

Prototyping + Zero-to-One

v0 - Define · Generate

AI interface generation from natural language prompts. PMs attach working prototypes to PRDs. Designers explore variations in real code. The tool that collapses the exploration-to-execution gap the linear model was built around.

Lovable - Generate · Build · Deploy

Prompt-to-app full-stack generation. A working prototype in minutes rather than weeks of sequential handoffs, compressing the discover-design-build cycle into a single session.

Bolt - Generate · Build · Deploy

Browser-based full-stack agent with instant deployment. Zero-friction from idea to running code. Used across all three roles, the clearest evidence that execution capability is spreading while judgment remains role-specific.

Design System Tooling

Pencil - Generate · Build

IDE-native infinite canvas with bidirectional MCP integration. Design files stored as open JSON in Git. 100K users in five months, validating demand for code-aware design canvases, but single-player with no governance layer.

Paper - Generate · Build

Canvas where every element is real HTML and CSS, not vector approximations. Browser-based, accessible to designers and PMs. Approaching the "one source of truth" vision from the design side, with React component rendering on their roadmap.

MagicPatterns - Generate · Build

AI prototyping with multiplayer canvas and design system integration via Figma, Storybook, and GitHub. 100K+ users. The team collaboration layer that Pencil lacks, but generates new UI rather than managing existing component libraries.

Brand Design + Product Marketing

Flora - Generate

AI brand identity and visual systems with style-consistent asset generation. Maintains visual governance at scale, AI-generated assets meeting brand standards without bottlenecking on designer capacity.

Recraft - Generate

Vector illustrations, icons, and on-brand visual assets at scale. Production-quality output respecting design system constraints, AI execution with human governance of visual quality.

Adobe Firefly - Generate

Commercially licensed image generation integrated with Creative Cloud. Solves the IP governance problem other generative tools ignore, safe for production use with clear licensing.

Illustrative examples, not an exhaustive or recommended stack. New tools are emerging quickly, and many strong options are not shown here.

What the stack reveals, even at a glance, is natural specialisation. These tools are not positioning themselves into tidy categories by design. They are clustering around distinct phases of the production process because the work itself demands it: reasoning, research, ideation, engineering, prototyping, design systems, brand. The shape of the adaptive production model is already emerging in the market, whether or not any individual tool vendor intends it.

Where the landscape stands

A snapshot of early maturity, not a settled map

This is also a snapshot of early maturity, not a settled landscape. The maturity gradient across categories is telling. Engineering tooling is furthest along, already cycling through its second and third interaction paradigms in under two years. Prototyping tools grew explosively but remain in flux, their current form almost certainly transitional. Design system tooling is the youngest category by a visible margin: Pencil launched five months ago, Paper is in open alpha, MagicPatterns is still primarily a prototyping layer. The wave has reached this part of the stack last, and the tools here are only just beginning to address the structural challenge of managing component libraries as AI accelerates everything around them. Expect every section of this list to look materially different within twelve months, and expect the categories themselves to shift as capabilities converge and new interaction patterns emerge.

The Shifting Ratio Model captures the pace and direction of this change, but as Section 7 documented, the pace varies dramatically by context. AI-native startups doing greenfield work are already operating at majority-AI execution. Agencies and studios trail close behind, structurally advantaged by short feedback loops and zero-to-one project types. Enterprise teams move slower, not because they adopt slower, but because brownfield, regulated, high-complexity work is where AI provides the least leverage and the governance requirements are heaviest.

All three tracks are moving in the same direction. The endpoint isn't "AI replaces humans." It's "AI handles execution while humans govern." The distinction matters because it reframes the question from one about replacement to one about redesign. The challenge isn't holding onto the current process or surrendering to automation. The challenge is designing governance that scales with AI capability: earned autonomy, not blanket trust or blanket oversight.

What comes next

The ground is shifting.
The question is what you build on it.

The bridge features concept completes the picture. If the ratio is shifting, and the tools are evolving, and the governance model is transforming, then the systems you build today to manage this transition are temporary by design.

The current-generation code review process, designed for human-authored pull requests, is a bridge feature. As AI output volume overwhelms human review capacity, the process will evolve toward automated pre-screening with human review reserved for edge cases and architectural decisions.
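A hedged sketch of what that pre-screening might look like, with illustrative risk signals and thresholds that every team would need to tune for itself; the point is the routing, not the specific rules.

```typescript
// Illustrative signals about a pull request; real systems would derive these
// from diff analysis, ownership data, and CI results.
interface PullRequestSignals {
  linesChanged: number;
  touchesPublicApi: boolean;
  touchesAuthOrPayments: boolean;
  testsPass: boolean;
  staticAnalysisFindings: number;
}

type ReviewRoute = "automated-checks-only" | "human-review";

// Route low-risk changes through automated gates; reserve human attention for
// changes with architectural, security, or interface implications.
function routeReview(pr: PullRequestSignals): ReviewRoute {
  const highRisk =
    pr.touchesPublicApi ||
    pr.touchesAuthOrPayments ||
    !pr.testsPass ||
    pr.staticAnalysisFindings > 0 ||
    pr.linesChanged > 400;

  return highRisk ? "human-review" : "automated-checks-only";
}
```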

The synchronous interaction model with AI coding tools, where a developer guides the agent step by step, is a bridge feature. The leading-edge tools are already transitioning to asynchronous models where the developer defines the problem and reviews the output.

The parallel design system, one in Figma, one in code, is a bridge feature. As the source of truth consolidates around code and AI tools can work directly with production components, the mockup layer becomes an intermediary that adds overhead rather than value.

None of these will disappear overnight. Bridge features serve real needs during the transition. The mistake is designing them as permanent infrastructure. The CSS vendor prefix lesson applies: bridges that aren't designed for removal become load-bearing, and removing load-bearing bridges requires building new bridges on top of old ones.

The design principle: build for where AI will be in twelve months, not where it is today. Include a removal strategy with every implementation. Optimize for reversibility and locality. Expect to replace, and make replacement painless.

The ground is shifting

The advantage goes to the teams that redesign the process

Product development becomes an adaptive process not because any single tool or model changes it, but because the entire chain (roles, process, tools, governance) is shifting simultaneously. The transformation is already visible in the data: mainstream AI adoption paired with flat organizational delivery metrics, individual speed gains that don't translate to team outcomes, governance models strained by volume they weren't designed for, role boundaries blurring toward a common center of judgment and problem definition.

The teams that navigate this well won't be the ones that adopted AI tools first. The tools are commodity; 84% of developers are already using them. The advantage goes to the teams that redesign the process around what AI makes possible. That means governance designed for earned autonomy, not blanket oversight. Roles defined by judgment, not execution. Tools evaluated by how well they support the adaptive process, not how well they replicate the linear one. And architecture built with the expectation of its own replacement.

The ground is shifting. The question isn't whether to adapt. It's whether you design for the shift deliberately, or discover, as the data already suggests, that acceleration without adaptation creates new problems faster than it solves old ones.

Who we are

Two people + AI. Researching and
working as one unit.

Rob Surpateanu

Research, process, and product direction

Seventeen years in product design and development, across teams at InVision, Fresha, JustEat, and bpPulse, with consulting stints at Microsoft, Deliveroo, and Reckitt Benckiser. My academic path took me from a BA in Graphic Design to a Master's in design thinking at Central Saint Martins.

Two years ago I turned my attention to early-stage AI startups, offering product design consultancy to teams building with generative tools. What started as designing AI products became something deeper: understanding how the process itself breaks down when everyone on a team has access to AI. At LP5, I authored the white paper that grounds everything we build, and I operate the AI-native process it describes firsthand.

David Lazar

Systems, components, and critical AI practice

Front-end developer with over five years across React, Next.js, TypeScript, and Tailwind, specialising in the areas our research keeps surfacing as critical: design systems, component architecture, and the path from design tokens to production UI. Before LP5, I built products at early-stage AI startups. My academic path, from Electrical Engineering to an MFA in Computational Arts at Goldsmiths, shaped a relationship with AI that is both technical and deeply critical.

I've exhibited at Tate Modern through the Tate × Anthropic AI residency, and published writing on the ethics of creative AI. At LP5, that combination is the point: I know how to ship components, and I know how to question the systems those components operate within.

Lagrange Point 5

Build the right system and quality sustains itself. Not through constant correction, but through architecture that makes good outcomes the natural state.

In orbital mechanics, L5 is one of only two naturally stable equilibrium points. Objects that reach it stay there, self-organising without intervention. LP5 builds tools that work the same way.