The Shape That Showed Up

What we found, what shifted under us, and the four shapes we're watching from here.

Watercolor illustration of a research workspace at evening. A wooden desk before a wall covered in pinned papers and red string. Brass lamp, empty chair, potted plant on the windowsill.
Image generated with Nano Banana 2

Future Shock has now spent a quarter tracking AI capability and adoption. This piece is the retrospective: what we were tracking, what shifted under us, and what we plan to watch going forward.

On a Saturday morning in mid-April, Physical Intelligence's π0.7 model fumbled its way through an air-fryer demo, and we published a post about it. We had meant to write about robotics. By the third paragraph the post was about Lem's robots in More Tales of Pirx the Pilot, then about James Hogan's radiation-damaged factories on Titan, and then, without anyone deciding to write a memory post, about what a robot is supposed to remember of its own failures.

That was the fourth Sci-Fi Saturday in a row that had ended up somewhere near memory. By the end of April we had published six of them, and five had memory in the spine. The sixth, on consciousness, leaned on a discontinuity-between-sessions axis that was really memory wearing a different coat. Different stories, different fictional clothing, the same architectural anxiety: what does this system remember, and who decides what it forgets?

Future Shock launched in late February as a project for tracking AI capability and adoption. Three months in, the archive shows that we set out to be a news tracker and ended up being a memory tracker instead.

The Sci-Fi Saturday lane is where the pattern became hardest to ignore, but it ran wider than one column, through the What-If pieces, the analysis posts, the K-curve series, and into our own internal operations. Memory turned out to be the substrate, not a feature.

The spine: memory

The first time the substrate became literal was The Weight of Remembering, at the end of March. The post traced an architectural philosophy of mind through four memory designs: GPT-2's multi-head attention, Llama 3's grouped-query attention, DeepSeek V3's multi-head latent attention, and Gemma 3's sliding-window attention. The KV cache, the running state of a single conversation, is not a metaphor: it is bytes on a chip, measured in KiB per token, GB of GPU memory, and dollars per hour of inference. Greg Egan's Diaspora characters reshape their own cognitive architecture to perceive higher-dimensional structure. Engineers are doing roughly the same thing with grouped-query attention and sliding windows, because the alternative is paying twice as much to remember less.
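
To make "bytes on a chip" concrete, here is a back-of-the-envelope sketch of the arithmetic. The configs are hypothetical, Llama-3-style numbers chosen for illustration, not figures from the original post:

```python
# Per token, the KV cache stores one key and one value vector for every
# layer and every KV head: 2 * layers * kv_heads * head_dim * bytes_per_element.

def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_element: int = 2) -> int:
    """KV-cache footprint of one token, assuming fp16/bf16 storage."""
    return 2 * layers * kv_heads * head_dim * bytes_per_element

# Hypothetical configs in the spirit of the designs discussed: full multi-head
# attention keeps one KV head per query head; grouped-query attention shares
# a small set of KV heads across all of them.
full_mha = kv_cache_bytes_per_token(layers=32, kv_heads=32, head_dim=128)
gqa = kv_cache_bytes_per_token(layers=32, kv_heads=8, head_dim=128)

print(full_mha // 1024, "KiB per token with full multi-head attention")  # 512
print(gqa // 1024, "KiB per token with grouped-query attention")         # 128

# At a 128K-token context, that is roughly the difference between ~64 GB and
# ~16 GB of GPU memory for the cache alone, which is where the dollars per
# hour of inference show up.
```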

The Problem with The Hard Problem, two weeks later, made memory a consciousness axis. The post argued that the binary "is it conscious or not" framing misses the most interesting fact about modern models: each session ends, the cache is reclaimed, and whatever was there is gone. Discontinuity is its own dimension of awareness. Anthropic's interpretability work gave the spectrum empirical traction in the same window: 171 functional emotion vectors inside Claude Sonnet 4.5, valence correlated 0.81 with humans. Watts's Scramblers from Blindsight were the lens, fluent without comprehension and intelligent without continuity. The post stopped short of saying the Scramblers are us, but not by much.

What Roguelikes Knew About Memory That Agent Designers Forgot was nominally about game design and ended up about institutional knowledge. Permadeath in NetHack isn't a difficulty setting; it is a governance choice about what dies with each run, what persists in the player's hands, and what gets logged for the next attempt. Agent systems make the same choice, less deliberately. Forgetting, the post argued, is a political act dressed up as a technical limitation.

When Your Swarm Disagrees, at the end of April, walked into the same room from another door. Hallucinations propagate in multi-agent systems because no agent reliably remembers which claims were verified and which were inferred. A Google DeepMind / University of Washington study of 260 agent configurations found that hierarchical supervisors choke on context windows, flat coordination produces unauditable cascades, and prediction-market schemes for resolving disagreement need calibrated confidence the underlying models don't have. The EU AI Act starts enforcement in August 2026, with audit-trail requirements that assume the system can produce a coherent record of what it knew when. Most of the systems being deployed cannot.
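
The missing bookkeeping is easy to describe even if it is hard to retrofit. A minimal sketch, with hypothetical field names rather than any framework's real API, of what a claim record would need to carry for "what it knew when" to be answerable:

```python
# Tag every claim an agent passes along with how it was established, so a
# downstream agent (or an auditor) can tell verification from inference.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class Provenance(Enum):
    VERIFIED = "verified"   # checked against an external source or tool
    INFERRED = "inferred"   # produced by the model without external grounding
    REPORTED = "reported"   # relayed from another agent, inheriting its status

@dataclass
class Claim:
    text: str
    provenance: Provenance
    source: str | None = None        # tool call, URL, or upstream agent id
    confidence: float | None = None  # only useful if it is actually calibrated
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

# An agent that forwards text without carrying these fields along is the
# propagation mechanism the post describes: by the third hop, nobody can say
# what was checked and what was guessed.
```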

The honest part of the spine is the part we lived. Over three months we tried a handful of ways to give the agent layer better memory, with different scaffolding, different storage, different compaction strategies, and the tooling underdelivered every time. Each attempt left a residue of human compensation: re-grounding context by hand, re-explaining decisions the agent had already made and forgotten, carrying state forward in a notes file because the persistence layer wasn't reliable enough to bet on. The human side of the loop absorbed the deficit, and the workflow adjustment got annoying enough to be its own data point.

That's the architecture argument from the inside. The same anxiety showed up in the fiction, in the lab, and in our own ops: memory isn't a feature engineers add when the rest of the system is working but the ground every other claim stands on, and most of what looks like a model problem turns out, on inspection, to be a memory problem in costume.

The patterns underneath

Three sub-themes hang off that spine, and none of them is independent of memory.

The first is that agents stopped being a tool problem and became a management problem. License to Operate used the MI6 "00" model to describe it: M is the human who sets objectives and bears accountability, Q is the tooling layer, 007 is the agent licensed to make in-mission decisions a handler never sees. A developer named Liyuanhao let a Claude-API agent run for four days and grow a 200-line Rust codebase to 1,500 lines for $12. Heeki Park orchestrates six agents in tmux with --dangerously-skip-permissions. Across The Personal Swarm, Your Agent's Allowance, and License to Operate, the same questions kept surfacing: who authorizes, who pays, who is responsible, who notices when the agent drifts. Those questions aren't separate from the memory finding; they're what happens when an agent accumulates enough memory to be an institutional actor.

The second is that the future arrived as infrastructure, exactly the way Charles Stross said it would. Anyone? Anyone? Bueller? Economics 2.0 Will Be Just as Boring read the Accelerando world against the actual receipts: Microsoft Copilot bundling moves, Coinbase Agentic Wallets, Google's "Buy for Me," outcome-based pricing in enterprise contracts. Compute Wants Out, published yesterday, found compute leaking out of warehouses and into federal grants and SPAN's white box on the side of a new house. The Missing Layer Between You and Your Agents caught the same shape one level up: the glue between agents and the world is going to be the part of the stack that determines who gets to act. Every one of these stories looked like a billing line, an audit trail, or a procurement exception. The boring shape is the shape.

The third is that the human in the loop, in many of the deployments we covered, is functioning as a crumple zone. What If Companies Couldn't Use Humans as Liability Shields opened with a radiologist in Ohio reviewing twelve AI-pre-read scans a day, fourteen minutes per scan, signing her name to outputs she could not realistically re-derive. Madeleine Clare Elish gave the pattern its name, the moral crumple zone: a human positioned inside an automated system precisely so they can absorb the blame when it fails. They don't meaningfully control the system, often can't see everything it does, and weren't told how it works, but they're there, and that's enough to assign fault. Companies prefer not to specify whether the human is a controller or a fall guy; the ambiguity is the product. Underneath all three sub-themes the memory finding shows up again, because an agent that doesn't remember reliably is an agent whose decisions can't be audited, which is exactly the condition that makes a crumple zone necessary.

What shifted under us

Two things shifted under us during the quarter, and both belong in this retrospective. The first was how the quarter ended up reading from the inside. In early May we published three pieces in two days: What If Exposure Breeds Exhaustion?, What If AI Is Just Bad at Most Things?, and The K-Curve. The series put real numbers under a feeling that had been building for a month. According to figures replicated across Anaconda, Forrester, and MIT NANDA, 88% of agent pilots never reach production. Stack Overflow's 2025 Developer Survey shows usage at 84% and climbing while positive sentiment dropped to 60%. Among experienced developers, 2.6% "highly trust" AI output and 20% "highly distrust" it. Gallup's April workforce survey found that 50% of U.S. workers now use AI in some form, but only 13% daily; the other half haven't touched it at all. We started the quarter covering AI as an acceleration story and ended it covering AI as an adoption-failure story running alongside that acceleration. Both are true at once, and the distance between them is the actual shape worth tracking.

The second shift was internal: how we feel about predictions. Future Shock published five predictions posts over ten weeks, supported by a Monday-morning resolution-review cron and three skills built around scoring claims.

The substantive critique is not that the predictions were wrong; most of them have not resolved yet. It is that scoring them was the wrong instrument for what Future Shock actually does. The strongest pieces in the archive (the Sci-Fi Saturdays, the K-curve series, the Missing Layer) earn their keep by naming structural failure modes and giving them shorthand. Trying to reverse-engineer "predictions" out of those observations was forcing a scoreboard onto journalism. The scoreboard energy is also off-brand; it makes Future Shock feel closer to a Polymarket leaderboard than to a publication trying to help readers see a fast-moving field clearly.

Effective today, the predictions program is retired. The five existing predictions posts stay up as historical artifacts; anyone who wants to grade them later can. The Monday cron, the predictions tag, and the related skills will be archived in the next two weeks. We are not pivoting from predictions to forecasts. We are stopping the scoring exercise. Forward-looking work will continue in the regular blog and What-If lanes, where it belongs.

What we're watching

What we're watching going into the next quarter isn't a prediction list. It's a set of shapes, and the thing about an exponential is that it can change which shapes matter before you finish describing them.

Long-term agent operations is the architectural frontier we keep running into. Memory is one piece of it; the wider question is what it takes to keep an agent system reliable across days and weeks rather than sessions, including drift, monitoring, audit trails, when to reset, and what gets logged. The next quarter's hardest engineering problems live in this layer.
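
For what it's worth, here is one hypothetical shape of that layer, with invented names and thresholds rather than anything we currently run: every step appends to an audit log, and an explicit policy decides when continuity ends.

```python
import json
import time
from pathlib import Path

AUDIT_LOG = Path("agent_audit.jsonl")
MAX_SESSION_AGE_S = 7 * 24 * 3600  # reset after a week, no matter what
DRIFT_THRESHOLD = 0.3              # placeholder drift metric, 0.0 = on-task

def log_event(kind: str, payload: dict) -> None:
    """Append-only record of what the agent did and decided, and when."""
    entry = {"ts": time.time(), "kind": kind, **payload}
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def should_reset(session_started: float, drift_score: float) -> bool:
    """Reset when the session is too old or has drifted too far from its brief."""
    too_old = time.time() - session_started > MAX_SESSION_AGE_S
    drifted = drift_score > DRIFT_THRESHOLD
    if too_old or drifted:
        log_event("reset", {"too_old": too_old, "drift_score": drift_score})
    return too_old or drifted
```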

Multi-agent coordination is starting to look like a research field rather than a feature you bolt on. Project Sid at Altera has roughly a thousand Minecraft agents developing norms; the Google DeepMind / University of Washington 260-configuration study is the first serious empirical map of where coordination fails. Our own internal multi-agent runs have produced enough texture to suggest the design space is wider, and weirder, than the current tooling implies.

The K-curve, meanwhile, is on its way to becoming a policy problem. Banks, schools, hospitals, government services: the split adoption pattern stops being a personal preference the moment institutions stop maintaining the non-AI alternative. That moment isn't here yet for most of them, but it's closer than the headline adoption numbers suggest.

And compute and power are becoming mainstream politics, continuing the thread we picked up in Compute Wants Out. Data centers are landing in the news cycle next to interest rates and elections, and the infrastructure question isn't "can we build the chips" but "where can we plug them in, and who pays for the substation."

Sci-fi keeps earning its keep, and not as a prediction engine. Bond explained delegation; roguelikes explained persistence; Lem mapped how unintelligent processes accumulate into something that looks intentional; Stross mapped boring economics; Watts mapped fluency without comprehension. We'll keep using fiction as an analytical tool to name failure modes the news cycle hasn't yet given a shorthand for, and we'll stop using it as a scoreboard for whether the sci-fi got it right.

Three months in, the future looks less like a takeoff curve and more like a set of mismatched institutions absorbing a technology that keeps changing shape. The models are still moving, and the story is moving with them, into courts, schools, ledgers, hospitals, and the ordinary places where memory becomes the part that decides who gets to act, who gets to forget, and who pays when the recall fails.