A Five-Agent Stack Is Not a Company
Five specialized agents can look like a startup. A company only appears when ownership, authority, handoffs, and receipts survive the work moving between roles.
This piece continues Future Shock’s agent coordination series. Earlier work looked at protocols, launch decisions, and crisis benchmarks. This one asks a smaller workplace question: what has to exist before five specialized agents behave less like a demo and more like an organization?
The demo has a product agent, an engineering agent, a QA agent, a marketing agent, and an operations agent. On paper, the stack looks like a startup.
Then the product agent hands a half-scoped feature to engineering, engineering hands a half-tested file to QA, QA flags a risk, marketing asks whether it can announce the launch, and operations wants to know who approved any of this. The shared document has all the context. Nobody owns the next move.
That is the moment the company disappears.
The job title costume
Agent demos keep borrowing the shape of a company because the shape is easy to understand. Product scopes the work, engineering builds it, QA tests it, marketing thinks about adoption, and operations tries to keep everyone from shipping a small disaster with nice gradients.
The roles are familiar enough that the demo feels halfway institutional before anything has been institutionalized. A viewer sees five agents around a task and fills in the rest: meetings, ownership, approval, accountability, the quiet social machinery that makes an organization more than a group chat with job titles.
Future Shock’s Building Is Not Shipping paper used a similar five-agent team: product, tech, design, growth, and QA/Ops. The agents built RunLens, a deliberately modest single-file HTML viewer for multi-agent run folders. The interesting result arrived after the artifact existed. Under one launch standard, the agents cast zero ship votes across fifteen ballots. Under another, they cast fifteen out of fifteen.
The file did not make the system company-like, and the ballot was not a neutral measurement of the file. The launch rule changed what “done” meant.
That distinction is where most agent-company talk gets slippery. A role label tells an agent what it is supposed to care about, but not what the agent owns, what it may approve, which evidence it must carry forward, or what happens when its output becomes someone else’s input.
A job title is not an ownership model.
Where the company actually appears
A handoff is a transfer of responsibility, not a chat message from one agent to another.
Human companies are covered in boring objects that make those transfers survivable: tickets, owners, status changes, approval records, incident reports, audit logs, escalation paths, signoffs. They are not glamorous, but they are the reason work can move across departments without dissolving into folklore.
The useful question is whether the receiving agent can see who owns the work now, what changed, what evidence supports the change, who can approve the next move, and where the receipt lives.
That is why the agent story keeps running into the same operational wall. In The AI Automation Problem Is Mostly Not AI, the missing pieces were intake, permissions, handoffs, receipts, escalation, and workflow ownership. Those are not decorative enterprise words. They are the things that keep a refund from being issued twice, a medical note from being updated by the wrong system, or a sales contract from being sent before legal has signed off.
The coordination layer is the larger version of the same point. Once a system includes roles, tools, memory, shared context, approval rules, handoff schemas, and audit records, the behavior belongs to the whole room. Better models help, but the room still decides what is possible, visible, reversible, and approved.
This is the premise behind The Boring Stack Playbook: most companies do not need an agent swarm first. They need intake, handoffs, receipts, escalation, and fewer humans copy-pasting between tabs.
The shared file is not a brain
The obvious fix is to give every agent the same context. Put the research, plans, constraints, decisions, doubts, logs, and drafts into one shared file, then let everyone read everything. Coordination solved, at least until the file starts behaving like the office junk drawer.
Shared context feels neutral because it is symmetrical: every agent sees the same material. In practice, it becomes a treatment applied to the system, changing what agents notice, what they trust, and what they inherit.
Startup Build made this visible in a constrained setting. The agents did more than build a file and vote on it; their context mode and launch frame shaped the path from artifact to authorization. The broader lesson should be drawn carefully, but it matters: context is part of the interface.
A raw shared file mixes canonical state with abandoned options, half-true claims, stale objections, temporary task notes, and guesses that sounded confident at the time. Downstream agents can inherit all of it without knowing which parts have been validated. A weak claim from one role becomes “what we know” three steps later because it appeared in the shared record and nobody attached a receipt saying otherwise.
That failure mode is not just hallucination. It turns weak claims into shared truth, blurs ownership, mutates state without a receipt, and poisons memory with a friendly filename.
A real organization separates the state people act on from the record people can inspect. Agents need the same split. Canonical state should be small enough to act on: current owner, next owner, objective, constraints, evidence, risks, approval status. Raw logs should remain available for audit, but they should not become the thing every future agent treats as operational truth.
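One way to picture that split is a record that keeps the two layers side by side but hands agents only the small one. The sketch below is illustrative, not a reference design: the class names, the `context_for_agent` and `audit_view` methods, and the sample values are all assumptions. Only the field list on the canonical state comes from the paragraph above.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class CanonicalState:
    """Small, validated state that the next agent acts on."""
    current_owner: str
    next_owner: str
    objective: str
    constraints: list[str]
    evidence: list[str]       # references to sources, not pasted transcripts
    risks: list[str]
    approval_status: str      # hypothetical values: "pending", "approved", "rejected"


@dataclass
class RunRecord:
    """Pairs actionable state with a raw log kept only for inspection."""
    state: CanonicalState
    raw_log: list[str] = field(default_factory=list)

    def context_for_agent(self) -> dict:
        # Agents receive the canonical state and nothing else by default.
        return asdict(self.state)

    def audit_view(self) -> list[str]:
        # Reviewers can still read everything that was said along the way.
        return list(self.raw_log)


record = RunRecord(
    state=CanonicalState(
        current_owner="engineering",
        next_owner="qa",
        objective="Single-file HTML viewer for run folders",
        constraints=["single file"],
        evidence=["test-report-1"],
        risks=["untested on large folders"],
        approval_status="pending",
    ),
    raw_log=["product: maybe add export?", "engineering: dropped export idea"],
)

context = record.context_for_agent()
```

The abandoned export idea stays in the raw log for audit, but it never reaches the QA agent as something to act on.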
Pass state, not story.
Tool access does not grant authority
The protocol layer matters. Model Context Protocol standardizes one way for assistants to connect to tools and data sources. Google’s emerging Agent2Agent Protocol points in the same broad direction for agent-to-agent interoperability. APIs, command-line tools, agent-readable interfaces, and stable schemas all make the world more reachable by software workers.
Reachability does not grant authority.
An agent can call Stripe without being allowed to issue a refund. It can open GitHub without being allowed to merge to main. It can read a customer record without being allowed to update it, summarize it, or pass it to another system.
Connection is not coordination.
As of this writing, Y Combinator’s requests for startups include categories like Software for Agents, Company Brain, AI-Native Service Companies, and The AI Operating System for Companies. As market signals, they are useful. The industry is circling the agent infrastructure layer.
The missing piece is responsibility. Tool access answers whether an agent can act. A company has to answer whether this agent should act next, with which state, under whose authority, and with what record left behind.
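The gap between "can" and "should" can be made concrete with a check that treats reachability and authority as separate questions. This is a minimal sketch under stated assumptions: `Grant`, `may_act`, the agent names, and action labels such as `"stripe.refund"` are hypothetical strings for illustration, not calls into any real Stripe or GitHub API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    agent: str
    action: str         # a label such as "stripe.refund", not a real API call
    approved_by: str    # the authority the grant was issued under


def may_act(agent: str, action: str, reachable: set[str],
            grants: list[Grant]) -> tuple[bool, str]:
    """Answer 'can it reach the tool?' and 'is it allowed?' separately."""
    tool = action.split(".")[0]
    if tool not in reachable:
        return False, f"{agent} cannot reach {tool}"
    for grant in grants:
        if grant.agent == agent and grant.action == action:
            return True, f"allowed under authority of {grant.approved_by}"
    return False, f"{agent} can reach {tool} but holds no grant for {action}"


reachable_tools = {"stripe", "github"}
grants = [Grant(agent="ops", action="stripe.read", approved_by="finance-lead")]

read_ok, read_reason = may_act("ops", "stripe.read", reachable_tools, grants)
refund_ok, refund_reason = may_act("ops", "stripe.refund", reachable_tools, grants)
```

Here the operations agent is connected to the payment tool in both cases, yet only the read is permitted, and the reason string records whose authority covered it.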
The minimum viable receipt
The basic unit of an agent organization is the work item.
A work item gives responsibility somewhere to live. It has an owner, a next owner, an objective, a definition of done, a status, and a boundary around what the agent is allowed to do. Without that object, work passes through the system as inference: a summary here, a request there, a comment that sounded like approval but might have only meant “looks good to me.”
The handoff packet is the thing that moves between roles. It should not be the full transcript. It should be decision-ready state: what changed, what is needed next, which constraints apply, which evidence supports the claim, which questions remain open, and which risks the next agent is inheriting.
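As a rough illustration, the work item and the handoff packet can be modeled as two small objects, with ownership changing only when a packet is accepted. The field names follow the two paragraphs above; the `hand_off` method, the status strings, and the sample values are assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffPacket:
    """Decision-ready state that moves between roles, not the transcript."""
    what_changed: str
    needed_next: str
    constraints: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    inherited_risks: list[str] = field(default_factory=list)


@dataclass
class WorkItem:
    """Gives responsibility somewhere to live."""
    objective: str
    definition_of_done: str
    owner: str
    next_owner: str
    status: str = "in_progress"
    allowed_actions: set[str] = field(default_factory=set)
    history: list[HandoffPacket] = field(default_factory=list)

    def hand_off(self, packet: HandoffPacket, then_to: str) -> None:
        # Refuse a transfer that does not say what moved or what comes next.
        if not packet.what_changed or not packet.needed_next:
            raise ValueError("packet must state what changed and what is needed next")
        self.history.append(packet)
        # Responsibility moves: the next owner becomes the owner.
        self.owner, self.next_owner = self.next_owner, then_to
        self.status = "handed_off"


item = WorkItem(
    objective="Add folder filter to viewer",
    definition_of_done="Filter works and QA has signed off",
    owner="product",
    next_owner="engineering",
    allowed_actions={"edit_spec"},
)
item.hand_off(
    HandoffPacket(
        what_changed="Scope fixed to a single filter control",
        needed_next="Implement and attach test evidence",
        open_questions=["Should the filter persist across sessions?"],
    ),
    then_to="qa",
)
```

After the call, engineering owns the item and QA is queued next; a packet with an empty `what_changed` would have raised instead of silently moving the work.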
Evidence has to travel separately from confidence. If a research agent claims a customer segment is worth targeting, the marketing agent should not inherit that as shared truth merely because it appears in a polished paragraph. The claim needs a source, a confidence level, and enough provenance that another agent or human reviewer can inspect it without rerunning the entire prior conversation.
Decision records make authority visible. Accept, reject, escalate, defer, approve, delay: each of those words needs operational meaning. Startup Build showed how much changes when “ship” means full verification versus deadline-constrained demo release. In a real workflow, that ambiguity is not academic. It decides whether something reaches a customer.
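To see how a launch standard changes the meaning of "ship", consider a decision function that takes the standard as an explicit input and writes it into the record. This is a hypothetical sketch: the two standard names loosely echo the contrast described above, while the check names, the `decide_ship` function, and the rule that only one check blocks a demo release are invented for illustration and are not the rules used in the Startup Build experiment.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class DecisionRecord:
    decided_by: str
    standard: str       # the launch rule the decision was made under
    decision: Decision
    reason: str


# Hypothetical rule: only this check blocks a deadline-constrained demo.
DEMO_BLOCKING_CHECKS = {"loads_without_error"}


def decide_ship(decided_by: str, standard: str,
                checks: dict[str, bool]) -> DecisionRecord:
    failed = sorted(name for name, passed in checks.items() if not passed)
    if standard == "full_verification":
        blocking = failed
    elif standard == "deadline_demo":
        blocking = [name for name in failed if name in DEMO_BLOCKING_CHECKS]
    else:
        # An unknown standard is a question of authority, not a vote.
        return DecisionRecord(decided_by, standard, Decision.ESCALATE,
                              "unknown launch standard")
    if blocking:
        return DecisionRecord(decided_by, standard, Decision.REJECT,
                              "failed: " + ", ".join(blocking))
    return DecisionRecord(decided_by, standard, Decision.APPROVE,
                          "no blocking failures")


checks = {"loads_without_error": True, "tests_pass": False}
strict = decide_ship("qa", "full_verification", checks)
demo = decide_ship("qa", "deadline_demo", checks)
```

The same artifact with the same check results is rejected under one standard and approved under the other, and each record says which rule produced the outcome.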
Receipts make failure reconstructable. If a task breaks three handoffs later, the system should be able to show what moved, when, by whom, what changed, and what action followed. Otherwise the postmortem becomes forensics over a transcript.
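A receipt trail of this kind can be approximated with an append-only ledger that is queried per work item. The sketch assumes a simple in-memory list; the `Receipt` fields mirror the questions in the paragraph above, and the `Ledger` class, its method names, and the sample entries are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Receipt:
    work_item: str
    moved_from: str     # who handed the work over
    moved_to: str       # who received it
    what_changed: str
    action_taken: str   # what followed the handoff
    at: datetime


class Ledger:
    """Append-only: receipts are added and read, never edited."""

    def __init__(self) -> None:
        self._receipts: list[Receipt] = []

    def record(self, work_item: str, moved_from: str, moved_to: str,
               what_changed: str, action_taken: str) -> Receipt:
        receipt = Receipt(work_item, moved_from, moved_to, what_changed,
                          action_taken, at=datetime.now(timezone.utc))
        self._receipts.append(receipt)
        return receipt

    def trail(self, work_item: str) -> list[Receipt]:
        # Receipts come back in the order they were recorded.
        return [r for r in self._receipts if r.work_item == work_item]


ledger = Ledger()
ledger.record("WI-1", "product", "engineering", "scope fixed", "build started")
ledger.record("WI-2", "ops", "qa", "config updated", "retest queued")
ledger.record("WI-1", "engineering", "qa", "build attached", "tests queued")

trail = ledger.trail("WI-1")
path = [(r.moved_from, r.moved_to) for r in trail]
```

Reconstructing what happened to one task becomes a filter over structured records rather than a reread of the conversation.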
Memory needs the same discipline. Temporary task state should not become durable organizational memory just because it was present at the end of a run. Some facts should expire, some should be reviewed, and some should never be remembered at all.
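The three outcomes named above (expire, review, never remember) can be expressed as a policy attached to each fact at the end of a run. This is a minimal sketch under assumptions: the policy labels, the `MemoryEntry` and `promote` names, and the choice to send unrecognized policies to review are all illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class MemoryEntry:
    fact: str
    policy: str                       # "expire", "review", or "never_store"
    created_at: datetime
    ttl: Optional[timedelta] = None   # only meaningful for "expire"


def promote(entries: list[MemoryEntry],
            now: datetime) -> tuple[list[MemoryEntry], list[MemoryEntry]]:
    """Split end-of-run facts into durable memory and a review queue."""
    durable: list[MemoryEntry] = []
    needs_review: list[MemoryEntry] = []
    for entry in entries:
        if entry.policy == "never_store":
            continue                  # dropped, never remembered
        if entry.policy == "expire":
            if entry.ttl is not None and now - entry.created_at < entry.ttl:
                durable.append(entry)  # kept only while still fresh
            continue
        # "review" and any unrecognized policy wait for a reviewer.
        needs_review.append(entry)
    return durable, needs_review


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
week = timedelta(days=7)
entries = [
    MemoryEntry("staging URL for this run", "expire", now - timedelta(days=1), week),
    MemoryEntry("old staging URL", "expire", now - timedelta(days=40), week),
    MemoryEntry("segment X looks promising", "review", now),
    MemoryEntry("customer email address", "never_store", now),
]
durable, needs_review = promote(entries, now)
```

Nothing becomes durable memory merely because it was present when the run ended; each fact has to pass through a policy first.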
A real handoff answers five questions: who owns this, what changed, what evidence supports it, who can approve the next move, and where is the receipt?
The objection that should worry us
Better agents will reduce some of this friction. A stronger model can infer missing context, ask better clarification questions, repair a sloppy handoff, and notice when the packet it received does not match the evidence. Some coordination bugs really are capability bugs hiding inside process failures.
That should temper the claim, but it does not dissolve it. Human employees are much better than today’s agents at tacit context, and companies still use tickets, approvals, contracts, incident reports, audit logs, and signatures. Capability does not remove the need for authority; it raises the cost of authority becoming invisible.
The other objection cuts the opposite way. Too much structure can become bureaucracy cosplay. A typed handoff can overcompress nuance, hide uncertainty, and make the wrong thing look official. Anyone who has watched a bad ticketing system flatten real work into dropdown menus has seen this failure in human form.
The answer is layered state. The next agent gets a canonical handoff packet for action, evidence references for grounding, raw logs for audit, and an escalation path when the packet is not enough. The goal is to keep responsibility from evaporating when the work moves, not to make every workflow bureaucratic.
Accountability machines
Companies are systems for assigning responsibility when work moves and when things go wrong, not just collections of specialists.
A five-agent stack can still be useful as a workflow, a toolchain, a simulation, or an internal service, and it may be the right shape for a surprising amount of work. The danger is mistaking the role list for the institution around it.
The next useful layer is boring: work objects, state transitions, policy gates, escalation, audit trails, memory review. Less “the agents founded a company,” more “the agents left enough receipts that someone can tell what happened when the task moved.”
Until the system can show who owned the task, what state moved, who approved the next step, and where the receipt lives, it is not a company.
It is a meeting with better autocomplete.