Interfaces are dead. Long live the interface.
Thinking from first principles about what screens, input, and human behaviour look like in an agent-driven world
I’ve been thinking a lot about what interfaces will actually look like ten years from now. If you set aside the noise in AI and just reason from how humans actually behave, what we find natural, what we find awkward, and where technology is headed — what does that picture look like?
Start with the most basic question: what is an interface actually for? At its core, it’s a translation layer. It takes human intent and converts it into machine action. Every button, form field, and dropdown menu exists to bridge that gap. The interesting question is: as AI agents get better at understanding intent directly, how much of that translation layer is still necessary?
That’s the thread I want to pull on.
From Operator to Overseer
For forty years, interface design has rested on a single assumption: that a human is driving. Every button, every form field, every dropdown menu was built for a human hand and a human decision. UX was essentially the science of making that loop as frictionless as possible.
But what happens when you introduce agents into that picture? AI systems that can perceive context, reason about it, take action, and adapt — systems that don’t inherently need a human to navigate a seven-step checkout flow. The “clicks” can increasingly be done on your behalf.
So the natural question becomes: if the agent is doing the operating, what does the interface actually become?
A lot of people are saying “agents are the new apps.” I’d push that further: agents are the new users. Apps aren’t going away — if anything, more of them will get built. But they’ll be built for workflows that agents need to execute, not for humans to navigate. The interface will be designed around what an agent needs to understand and act on, with the human sitting one layer above: setting the destination, not driving the route.
Two Futures for the Frontend
I think frontends will diverge into two distinct schools — both a radical departure from what we build today.
The Ambient, Immersive Frontend. If the agent is handling all the functional work, the interface stops being a control panel and becomes something closer to a window. You’re not driving, you’re observing, reviewing, occasionally steering. Think spatial computing: no mouse, no keyboard, you look at something to select it, gesture to confirm. The design philosophy moves closer to cinema or architecture than software UI. Rich visuals. Ambient status. Immersive context — not interactive menus.
The Minimal, Get-Out-of-the-Way Frontend. The second future goes the opposite direction. If the agent is doing everything, strip the screen back to the absolute minimum. A status feed. A confirmation prompt. An alert. Less dashboard, more notification layer — the interface as a quiet audit trail rather than a cockpit.
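To make that second school concrete, here is a rough sketch of what a notification-layer frontend might actually consume: a typed feed of agent events, rendered as a quiet log rather than a set of controls. The event shapes and field names below are illustrative assumptions, not any real API.

```typescript
// Hypothetical event feed for a get-out-of-the-way frontend.
// Nothing here is interactive except the confirmation request.
type AgentEvent =
  | { kind: "status"; at: Date; summary: string }                            // "Booked the flight"
  | { kind: "alert"; at: Date; summary: string; severity: "info" | "warning" }
  | { kind: "confirmation"; at: Date; summary: string; actionId: string };   // needs a human tap

// The entire "UI" is a render pass over the feed: a receipt, not a cockpit.
function renderFeed(feed: AgentEvent[]): string {
  return feed
    .map((e) =>
      e.kind === "confirmation"
        ? `[NEEDS APPROVAL] ${e.summary}`
        : `[${e.kind.toUpperCase()}] ${e.summary}`
    )
    .join("\n");
}
```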
Both schools share a common principle: the era of the interface as a control surface is ending. The interface becomes either a canvas or a receipt.
The Input Problem Nobody Wants to Solve
Here’s where it gets interesting, because this part is genuinely unsolved.
If agents are doing most of the work, how does the human communicate intent? How do you actually tell the agent what you want?
The obvious answer seems to be voice. And voice will be a big part of this. For extended instructions, nuanced requests, anything requiring explanation — voice is the most natural input modality we have. The LLM revolution has finally made it work properly. You can speak in incomplete, contextual, messy sentences and the system understands you. The era of “I’m sorry, I didn’t get that” is genuinely over.
But voice has a social problem that technology alone cannot fix.
People find it deeply awkward to talk to their devices in public. This isn’t a niche complaint — it’s consistent, documented human behaviour. WhisprFlow, Monologue, Superwhisper, and other dictation tools have passionate early-adopter communities, but mainstream adoption is still early. You’re not going to narrate instructions to an agent on the Tube, in a café, or in an open-plan office. The social friction is real.
Some people have resorted to specialised microphones they can whisper into, just so they don’t disturb coworkers or the people around them.
But here’s an interesting counter-thought: if the agent truly becomes the new user — as natural a presence in your life as a colleague, a friend, a family member — does that awkwardness actually dissolve? When you’re talking to your agent, you’re not talking to a device. You’re talking to someone. And nobody finds it strange to have a conversation in public. The social taboo was always about looking like you’re talking to yourself. If the agent becomes a genuine relational presence, that stigma may fade on its own.
There’s also a privacy dimension worth sitting with. Some conversations you’ll want the agent to hear. Others — sensitive, personal, confidential — you won’t. Voice as input will carry its own etiquette, the same way phone calls do today. You step outside for some; you take others at your desk. The norms will form naturally.
So voice handles private, extended, high-context input — with the social calculus shifting as agents become more embedded in daily life. But it still cannot be the only input layer.
The Omni-Button
Which brings me to what I think could be an underrated idea: radical input minimalism.
If agents handle execution and voice handles extended intent — what does the residual manual input surface look like?
I think it converges on something I’d call the Omni-Button. Not literally one physical button, but the idea that the entire active input surface collapses into a single, context-aware interaction point. At its simplest: one tap to approve, one to reject. Go or no-go.
But the Omni-Button is more interesting than that binary suggests. It’s not static — it evolves with context. Sometimes it’s a confirm prompt with a summary of what the agent is about to do. Sometimes it’s a slider that lets you dial the agent’s autonomy up or down for a given task. Sometimes it’s a simple status ring that glows to tell you something needs your attention. The form changes. The principle stays the same: one intentional human touchpoint, designed around the moment of decision, not the process of execution.
What the Omni-Button does is force an important design discipline — it makes the agent earn the human’s approval. Before every consequential action, there’s a moment of legibility: here’s what I’m doing, here’s why, here’s what happens next. You tap to proceed. That tap is the entire interaction. Everything else — the research, the comparison, the navigation, the execution — happened without you.
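One way to picture it: the Omni-Button as a single component whose form is a pure function of context. The variants below are assumptions made up for illustration, not a spec; the point is that each one still resolves to exactly one intentional touchpoint.

```typescript
// Hypothetical forms the Omni-Button might take, modelled as a discriminated union.
// Every variant collapses to a single human decision.
type OmniButton =
  | { mode: "confirm"; summary: string; onApprove: () => void; onReject: () => void }
  | { mode: "autonomy"; level: number; onChange: (level: number) => void }  // dial autonomy up or down
  | { mode: "status"; needsAttention: boolean };                            // glows when input is needed

// Context decides the form; the human only ever sees one touchpoint at a time.
function omniButtonFor(
  pendingAction: { summary: string; approve: () => void; reject: () => void } | null,
  needsAttention: boolean
): OmniButton {
  if (pendingAction) {
    return {
      mode: "confirm",
      summary: pendingAction.summary,
      onApprove: pendingAction.approve,
      onReject: pendingAction.reject,
    };
  }
  return { mode: "status", needsAttention };
}
```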
Users, in this model, do four things:
- Think — form an intent, a goal, an outcome
- Ask — communicate it (voice for complex, tap for simple)
- Audit — review what the agent proposes or has done
- Approve — the Omni-Button moment
Everything between asking and approving is the agent’s domain. The interface is the audit and approval surface. Nothing more.
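Written down, the whole loop is very small. Here is a minimal sketch under assumed names (the `Agent` interface and its methods are invented for illustration): the human supplies an intent, the agent proposes a plan, and the only human-facing calls are the audit and the approval.

```typescript
// Hypothetical agent contract: everything between "ask" and "approve" lives behind it.
interface Plan { id: string; summary: string; steps: string[] }

interface Agent {
  propose(intent: string): Promise<Plan>;     // plan, don't act
  execute(planId: string): Promise<string>;   // act only after approval
}

// The human loop: Think, Ask, Audit, Approve. The interface is only the last two.
async function runIntent(
  agent: Agent,
  intent: string,                                 // Think: the goal the human formed
  approve: (summary: string) => Promise<boolean>  // Audit + Approve: the Omni-Button moment
): Promise<string> {
  const plan = await agent.propose(intent);       // Ask
  const approved = await approve(plan.summary);   // the single intentional touchpoint
  return approved ? agent.execute(plan.id) : "rejected";
}
```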
Why Hallucination Isn’t the Dealbreaker Everyone Thinks It Is
The most common objection to this vision is reliability. Agents hallucinate. They use excessive tokens. They make costly mistakes. You can’t hand over execution to a system that confabulates.
Fair, for today. But it’s not a valid objection to the trajectory.
We are living through the worst version of these tools that will ever exist. Every model shipping next month will be more accurate, more efficient, and less prone to error than the one running today. That’s not optimism — it’s the observed rate of progress over the last three years, with no signs of slowing.
More importantly: the frontier isn’t just big cloud models anymore. Edge models — small, optimised, running on-device without a server round-trip — are now more capable than the largest frontier models were twelve months ago. The model running locally on your phone today would have been considered state-of-the-art in early 2024.
This matters for agents because:
- Token efficiency is improving fast — agents will do more with fewer compute cycles
- Latency collapses — on-device inference means near-instant response
- Reliability climbs — fine-tuned small models on specific tasks hallucinate far less than general-purpose large models
- Privacy improves — edge inference means your data never has to leave the device
The reliability objection is valid today. In eighteen months, it becomes a historical footnote.
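To show why this shapes agent architecture rather than just benchmark charts, here is a hedged sketch of a routing policy: prefer the small on-device model for latency and privacy, and reach for the cloud only when a task genuinely needs frontier-scale reasoning. The field names and threshold are placeholders, not recommendations.

```typescript
// Hypothetical edge-vs-cloud routing policy for an agent runtime.
type Route = "edge" | "cloud";

interface TaskProfile {
  sensitive: boolean;           // personal or confidential data involved
  estimatedComplexity: number;  // 0..1, rough proxy for reasoning depth required
}

function routeTask(task: TaskProfile, complexityThreshold = 0.7): Route {
  if (task.sensitive) return "edge";                    // data never leaves the device
  return task.estimatedComplexity > complexityThreshold
    ? "cloud"                                           // frontier model for the genuinely hard tasks
    : "edge";                                           // fast, cheap, local by default
}
```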
The Bigger Picture
The history of computing is the history of abstraction. Punchcards to command lines to graphical interfaces to touchscreens — at each step, a layer of complexity was hidden from the user. What changed wasn’t the underlying capability. What changed was how much the human had to manually operate to access it.
Agents are the next abstraction layer. They don’t just hide the complexity of the machine — they hide the complexity of the task. You don’t need to know how to navigate a fintech onboarding flow. You don’t need to manually compare options, fill fields, or confirm every micro-step. You state an outcome. The agent does the work. The interface shows you the result.
The button is the new fax machine.
What replaces it isn’t nothing — it’s something more honest. An interface that reflects what computing was always supposed to be: a tool that works for you, not one you work for.
Onward.