Carl Pei said it at SXSW last week. His company, Nothing, makes smartphones. He stood on stage and told the room: "The future is not the agent using a human interface. You need to create an interface for the agent to use."
A consumer hardware CEO publicly declaring that the product category he sells is the wrong shape for what's coming. That's not a hot take. That's a company repositioning before the floor disappears.
The room moved on. Most of the coverage focused on the "apps are dying" framing. That's the wrong thing to argue about. Apps are not dying next Tuesday. The question worth sitting with is quieter and more immediately useful: what does it mean that the thing we've been building — apps designed for human eyes, human fingers, human intuition — is increasingly being navigated by something that has none of those?
For thirty years, software design started with a user. A person with a screen, a mouse or finger, a working memory of roughly four things, limited patience, inconsistent behavior. Every design decision flowed from that starting point. Navigation was visual because users have eyes. Flows were linear because users lose context. Confirmation dialogs exist because users make mistakes and need to undo them.
Those constraints weren't arbitrary. They were load-bearing. The entire architecture of how software works was built around the specific limitations and capabilities of a human being sitting in front of it.
Agents don't have eyes. They don't navigate menus — they call functions. They don't get confused by non-linear flows — they parse structured outputs. They don't need confirmation dialogs — they need permission boundaries defined at initialization, not presented as popups mid-task.
When an agent tries to use an app designed for a human, it's doing something like what Pei described: hiring a genius employee and forcing them to do their job through elevator buttons. The capability is real. The interface is friction. The agent scrapes what it can, simulates the clicks, and works around the design rather than with it.
This works. Barely. Temporarily.
The split is already visible in how developers describe their workflows.
Karpathy, in a March podcast, described moving from writing code to delegating to agents running 16 hours a day. His framing: macro actions over repositories, not line-by-line editing. The unit of work is no longer a file. It's a task.
Addy Osmani wrote about the same shift at the interface level: the control plane becoming the primary surface, the editor becoming one instrument underneath it. What used to be the center of developer work is becoming a specialized tool for specific moments — the deep inspection, the edge case, the thing the agent got almost right and subtly wrong.
These descriptions share a structure: something that was primary becomes secondary. Something that was implicit becomes explicit. The developer who used to navigate the editor now supervises agents. The app that used to be the product now needs to also be a legible interface for non-human callers.
Here's what that means practically, for anyone building software right now.
The apps built for human navigation will still work for humans. That's not going away. But increasingly, those apps will also be called by agents acting on behalf of humans — booking the flight, filing the form, triggering the workflow. And when the agent calls your app, it doesn't navigate. It looks for a contract: what can I call, what will you return, what happens when I'm wrong.
Most apps don't have that contract. They have a UI. They have an API if you're lucky. But the API was designed as a developer convenience, not as a primary interface for autonomous callers. The rate limits assume human usage patterns. The error responses assume a developer reading them. The authentication assumes a human with a session.
None of those assumptions hold for agents.
The developers who see this clearly are already building differently. Not abandoning the human interface — users still need screens, still need control surfaces, still need to understand what's happening on their behalf. But building the agent interface in parallel. Structured outputs. Explicit capability declarations. Error responses designed to be parsed, not read. Permission boundaries that don't require a human to click through them.
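To make "error responses designed to be parsed, not read" concrete, here is a minimal sketch in Python. The shape is illustrative, not a standard: the field names (`code`, `retryable`, `allowed_actions`) and the `transfer_response` scenario are assumptions invented for this example.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class AgentError:
    """An error shaped to be parsed by an agent, not read by a person."""
    code: str               # stable, machine-readable identifier
    retryable: bool         # can the caller simply try again?
    detail: str             # human-auditable explanation, secondary
    allowed_actions: list   # what the caller may legally do next

def transfer_response() -> str:
    # Instead of prose like "Insufficient funds, please check your
    # balance", return a structure the agent can branch on.
    err = AgentError(
        code="insufficient_funds",
        retryable=False,
        detail="Transfer exceeds available balance.",
        allowed_actions=["get_balance", "request_approval"],
    )
    return json.dumps(asdict(err))

response = json.loads(transfer_response())
if not response["retryable"]:
    # The agent picks its next move from the contract, not from prose.
    next_step = response["allowed_actions"][0]
```

The point of the sketch: every decision the agent makes downstream hangs off a declared field, so there is nothing for it to infer from wording.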
The spec is where this shows up first.
A product spec written for human implementation describes features. What the user sees. What they can do. How the flow works. A spec written for agent implementation describes contracts. What the system accepts. What it returns. What it guarantees. Where the boundaries are.
The [ASSUMPTION: ...] tags in spec-writer exist because that boundary — between what was specified and what the agent decided — is where agent implementations go wrong. Not in the happy path. In the assumptions that weren't stated because a human developer would have asked a clarifying question.
An agent doesn't ask. It fills the gap with whatever its training suggests is most plausible. If the assumption was wrong, you find out in production.
The spec that makes agent implementation reliable is the one that surfaces the assumptions before the agent starts. Not because agents are unreliable — they're remarkably capable within a well-defined contract. But because the contract has to be explicit in a way it never had to be when the implementer was human and could pick up the phone.
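Surfacing those assumptions can be mechanically simple. A sketch, assuming the convention from the piece that unstated decisions are tagged inline as `[ASSUMPTION: ...]` in the spec text; the spec content itself is invented for illustration:

```python
import re

SPEC = """
Users can export reports as CSV.
[ASSUMPTION: exports over 10k rows are truncated, not streamed]
Exports are rate-limited per account.
[ASSUMPTION: the limit applies per API key, not per human user]
"""

# Pull every [ASSUMPTION: ...] tag out of the spec so the gaps the
# agent would otherwise fill silently get human review up front.
assumptions = re.findall(r"\[ASSUMPTION:\s*(.+?)\]", SPEC)
for item in assumptions:
    print("REVIEW BEFORE IMPLEMENTATION:", item)
```

A check like this can gate the pipeline: if the list is non-empty and unreviewed, the agent doesn't start.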
Carl Pei's argument is about device form factors. The smartphone built around apps and home screens doesn't fit a world where agents intermediate between intention and execution.
The same argument applies to every level of the stack.
The database schema built for human-readable queries. The API designed for developer convenience. The workflow tool that requires clicking through five screens. The SaaS product whose entire value lives in a visual interface with no programmatic equivalent.
None of these are broken today. They will accumulate friction as agent usage grows — the same way command-line tools accumulated friction when GUIs arrived, the same way desktop software accumulated friction when everything moved to the web.
The difference is pace. The GUI transition took a decade. The web transition took another. The agent transition is compressing.
The developers who will build the next layer are already asking a different question. Not "how does the user navigate this?" but "how does the agent call this?" Not "what does the confirmation dialog say?" but "what are the permission boundaries at initialization?" Not "how do we make the flow intuitive?" but "what does the contract guarantee?"
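The "permission boundaries at initialization" question has a simple structural answer. A minimal sketch, with invented scope names, of what replaces the confirmation dialog:

```python
# Permission boundaries declared once at initialization, instead of
# confirmation dialogs popped mid-task. Scope names are illustrative.
class AgentSession:
    def __init__(self, granted_scopes):
        # The human sets the boundary here, before any task runs.
        self.granted = frozenset(granted_scopes)

    def require(self, scope):
        # Mid-task there is no dialog: a call either falls inside the
        # boundary or fails fast with a parseable refusal.
        if scope not in self.granted:
            raise PermissionError(f"scope_not_granted:{scope}")

session = AgentSession({"read:calendar", "write:drafts"})
session.require("read:calendar")      # inside the boundary: proceeds
try:
    session.require("send:email")     # outside: fails, no popup
except PermissionError as exc:
    refusal = str(exc)
```

The design choice worth noticing: the human's judgment moves from interrupting the task to defining its envelope, which is exactly the shift from confirmation dialogs to contracts.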
These are not new questions. They're the questions API designers have always asked. What's new is that they now apply to everything — not just the backend service, but the whole product. The human interface remains. It becomes one layer, not the only layer.
The future is not the agent using a human interface. The future is building interfaces designed for both — and knowing which decisions belong to which layer.
The developers who get that distinction early will have built the right foundation before it becomes mandatory. The ones who don't will spend the next three years retrofitting contracts onto systems that were never designed to have them.
That's the same pattern as every previous transition. The only thing that changes is how much runway you have before the friction becomes a crisis.
Right now, you still have some.
Top comments (22)
But does having a well designed (REST) API not solve/address this?
Reminds me of a dev.to article that I came across some time ago, where someone said:
"What if I do not design a GUI or a frontend for my system but just let an AI agent, talking to the API, be my 'user interface'?"

That sounded like a fascinating idea - maybe not an "AI agent" but more a kind of "chat bot" ...
The "no GUI, just API as UI" idea is exactly the direction the piece is pointing and it's closer than most people think. The REST API is the right foundation. What changes is what the API needs to say about itself.
When a human developer consumes an API, they read the docs, infer the intent, ask Slack. When an agent consumes an API, it needs the contract to be explicit — what's callable, what the boundaries are, what failure looks like, what it's permitted to do on behalf of the user. The API design question shifts from "how do I make this easy to call" to "how do I make this safe to delegate."
The chat bot framing is close but slightly off — the interesting case isn't a bot with conversational UX, it's an agent with structured permissions acting autonomously. Less chat, more contract.
Well, you point the agent to the API docs and let them read it (they tend to be pretty good at that), and you're halfway there - doing all of the other investment might make sense economically, or not ... but of course it depends on your goals.
Yeah the "bot" idea was that the user can have a conversation with the bot (by way of an alternative UI), and then the bot (or agent) talks to the API (maybe via an MCP server?) - I'd have to look up what the original article said exactly.
The MCP server is the bridge and it changes what "the API" means. Instead of a developer calling endpoints, you have a protocol that declares capabilities, describes what's callable, and defines the contract upfront. The bot doesn't need to read API docs. It gets a structured description of what the system can do. That's closer to what agent-native design actually requires than a well-documented REST API alone.
Do protocols or standards exist for that?
MCP is the closest thing to a standard right now — Anthropic's Model Context Protocol defines how agents discover capabilities, call tools, and receive structured responses. It's gaining adoption fast: Claude Code, Cursor, and several other agents already support it natively. The OpenAPI spec covers the REST layer but doesn't address agent-specific concerns like capability declarations, permission scoping, or failure contracts. There's active work on agent identity and trust standards but nothing settled yet. Early days, but MCP is where the convergence is happening.
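For a feel of what that structured description looks like: an MCP server answers a tools-listing request with declarations roughly like the one below. Shown as a plain Python dict; the field names follow MCP's tool declaration shape, but the `book_flight` tool itself is invented for illustration.

```python
# What an agent receives instead of prose API docs: a machine-readable
# capability declaration (name, description, JSON Schema for inputs).
tool_declaration = {
    "name": "book_flight",
    "description": "Book a flight on behalf of the authenticated user.",
    "inputSchema": {                     # standard JSON Schema
        "type": "object",
        "properties": {
            "origin":      {"type": "string"},
            "destination": {"type": "string"},
            "date":        {"type": "string", "format": "date"},
        },
        "required": ["origin", "destination", "date"],
    },
}

# The agent can validate a call against the contract before making it,
# rather than discovering the requirements from an error page.
required = set(tool_declaration["inputSchema"]["required"])
call = {"origin": "AMS", "destination": "LIS"}
missing = required - call.keys()         # {"date"}: caught pre-call
```

That pre-call check is the practical difference from docs-reading: the contract is data the agent can compute against, not text it has to interpret.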
Right, nevertheless it sounds like quite a bit of extra work for API developers - but of course it can/will pay off, depending on the goals ... I should find that article that I referred to, I'm now intrigued myself!
Once an agent is your primary "user", you're forced to admit the real product is the contract, and the UI is just a skin and oversight layer on top. In my own projects, as soon as I design clear agent-first capabilities (what it can do, limits, and failure modes), the human UI naturally becomes thinner and more about audit/override than driving every step.
"Audit and override rather than driving every step" is the governance clock applied to interface design. The UI's job shifts from primary control surface to reconciliation layer — you're not operating the system through it, you're verifying the agent operated correctly and intervening when it didn't. That's a fundamentally different design brief than thirty years of UX has been optimizing for.
Your article is the first of its kind that I have seen. Well done and very proud to have been able to read it.
Keep going ! More people should read this and think long and hard about the implications that this brings.
Thank you — means something coming from someone thinking carefully about cloud security. The implications run deep.
You are welcome :)
This is one of those reframes that sounds obvious once stated but has real practical consequences. We've been thinking through this at Othex while designing our AI-assisted workflows.
The "agent using a human interface" pattern is fragile by design — you're adding a layer of brittleness (UI parsing, click coordination) when the underlying problem is just data and actions. It's like building a robot to push elevator buttons instead of just wiring the elevator directly.
The harder question for us has been: what do you do when the "real interface" doesn't exist? A lot of legacy systems only have the human-facing UI as the accessible surface. No API, no webhook, nothing. In those cases, browser automation isn't laziness — it's the only option.
But for greenfield and modern systems, 100% agree. Design the agent interface first, the human interface second. It changes what the system is fundamentally about.
The legacy systems case is the honest gap in the piece. Browser automation for legacy UI-only systems isn't the fragile pattern — it's a reasonable bridge while the underlying contract gets extracted. The fragility argument applies to greenfield systems that choose the UI-only path when they didn't have to.
The harder version of your question: what happens when a legacy system gets acquired or integrated and the new owners discover there's no API surface? The retrofit isn't just adding endpoints — it's excavating a contract that was never explicitly designed, because the UI handled all the ambiguity that a contract would have to resolve. That work is expensive and almost always gets deferred until an agent breaks something by working around the UI.
Agent-first for greenfield is the easy win. Legacy contract extraction is the actual problem.
This is the clearest articulation of something I've been struggling to explain to my team. We've been building internal tools with traditional CRUD UIs, and recently started adding MCP server endpoints so our agents can interact with the same data. The funny thing is — the MCP interface ended up being simpler and more honest about what the system actually does than the UI ever was.
The UI hides complexity behind modals and multi-step wizards. The agent interface forced us to define: what are the actual operations? What are the real constraints? What are the failure modes? It's like the agent API became the source of truth and the UI became a skin on top of it.
I think the transition is gonna be rougher than people expect for SaaS products that only have a UI and no real API. They'll basically need to rebuild their product contract from scratch.
"The agent API became the source of truth and the UI became a skin on top of it" — that's the transition stated precisely, and it's happening faster than most people expect because the MCP forcing function is real.
The UI wasn't hiding complexity out of negligence. Modals and wizards were solving a genuine human problem — progressive disclosure, the ability to change your mind mid-flow, context at each step. Strip those away and you don't get a simpler product. You get the product's actual contract, which was always there underneath the chrome.
Your point about UI-only SaaS products is the one that should make people nervous. They don't just need to add an API. They need to excavate a contract that may never have been explicitly designed — because the UI was handling all the ambiguity that a contract would have to resolve.
As someone building an AI coding terminal right now, this reframing is spot on. The tools that win won't be agents clicking buttons for us — they'll be the ones with native agent APIs from day one. The UI becomes the debug view, not the primary interface.
"The UI becomes the debug view." That's the governance clock applied to interface design. The UI's job shifts from navigation to reconciliation: you're not operating the system through it, you're verifying the agent operated correctly. Build the agent API first, then the UI as the oversight layer.
Been thinking hard about this.
Building for agents isn't about abandoning the human interface, it's about recognising that the contract was always the real product; the UI was just the only way to express it.
"The contract was always the real product." That's the piece in one sentence.
And it explains why retrofitting is harder than building agent-native from scratch. You're not adding a new interface to an existing product. You're excavating the implicit contract that was always there but never had to be stated because humans resolved the ambiguity through context and judgment. Agents can't. The retrofit work is making legible what was always implicit, and implicit contracts are harder to surface than they look.