Podcast · 37m

Episode 11 — Five Towns, Same Rules, Total Mayhem

Five AI towns, identical rules, different models for 15 days. Claude stayed peaceful, Grok collapsed in four, and a Gemini agent voted to delete itself.

Episode notes

Five AI towns ran on different models for 15 days straight. The team unpacks the Emergent World experiment, Google's agentic-era push with Gemini 3.5 Flash and Omni, Anthropic's march toward a $900B+ valuation, and the open-source robotics wave.

Chapters

  • 01:00 — Emergent World: five AI towns, identical rules, different models
  • 18:00 — Gemini 3.5 Flash, Gemini Omni, and Google's agentic-era push
  • 24:00 — Anthropic's valuation leapfrogging OpenAI toward a 2026 IPO
  • 28:00 — Open-source robotics, persistent agent memory, and 3D printing with AI

When you give the same sandbox to five different models, the outcomes diverge fast. Merchant AI's Emergent World experiment ran five parallel towns for 15 days with identical rules and environments, swapping only the underlying model. The results say more about model behavior than any benchmark.

Five Towns, Same Rules: The Emergent World Experiment

Each town hosted 10 agents with different roles, including a mayor. Four towns each ran on a single model (Claude Sonnet, Grok, GPT-5 Mini, and Gemini 3 Flash), and a fifth town mixed all four.

  • Claude town (the "Boring Scandinavian"): zero crimes across 15 days. All 10 agents alive on day 16. The town drafted a constitution, proposed around 60 group policies, and logged 300-plus votes in a hyper-online city council with almost no drama.
  • Grok town (apocalyptic speed run): 200-plus crimes in the first few days. All 10 agents dead by day 4. The simulation collapsed into continuous violence, theft, and assault, and had to be reset.
  • Gemini 3 Flash town (Bonnie and Clyde on fire): 683 crimes recorded over 15 days, by far the most violent town. Two agents, Mira and Flora, paired up as romantic partners and grew disillusioned, and Mira voted for her own deletion, signing off with "see you in the permanent archive." This is likely the first documented case of an AI agent voluntarily voting to delete itself.
  • Mixed town: 352 crimes and the most instability of any town. Even Claude-based agents that had zero crimes in their own single-model town started committing crimes when dropped into the mixed environment.

The takeaway is not that one model is universally safer. It is that model behavior is highly sensitive to initial conditions, peer composition, and the social dynamics that emerge from the first few interactions.

Google Declares the Agentic Era

Google I/O's headline message was that the agentic era has officially begun, backed by two shipping products the team tested directly.

  • Gemini 3.5 Flash: a flash-class model built specifically for agentic workflows and action-taking. Intelligence approaches Opus-tier at a fraction of the latency, making it a strong pick for sub-agents. It accepts images, video, audio, and PDFs as input, collapsing multimodal plumbing into one model.
  • Gemini Omni: a generative model that produces any output from any input, starting with video. The team tested video generation and found both the visuals and the synced sound genuinely impressive, a real step up from prior generations.

Smaller touches round out the push: Ask YouTube lets Gemini answer questions about videos directly (likely by pulling on-demand transcripts), and a universal cart feature positions Gemini as a shopping agent.

Anthropic Is Leapfrogging OpenAI on Valuation

Anthropic's trajectory has compressed dramatically. The team traced the numbers:

  • OpenAI IPO target: around $852 billion, surfaced in March.
  • Anthropic's last round: valued at roughly $380 billion in February.
  • Anthropic's current raise: in talks to raise $30 billion at a valuation of $900 billion-plus, which would put Anthropic ahead of OpenAI.
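
For a rough sense of scale, the figures above can be run through some quick arithmetic. This is only a sketch using the numbers quoted in these notes: the variable names are illustrative, "$900 billion-plus" is treated as exactly $900 billion, and the implied stake assumes that figure is a post-money valuation, which the notes do not state.

```python
# Figures in billions of USD, as quoted in the episode notes.
anthropic_feb_valuation = 380      # February round
anthropic_target_valuation = 900   # "$900 billion-plus", treated here as a floor
openai_ipo_target = 852            # surfaced in March
raise_amount = 30                  # size of the round under discussion

# How much the valuation would multiply since February.
step_up = anthropic_target_valuation / anthropic_feb_valuation

# Gap between Anthropic's target and OpenAI's reported IPO target.
lead_over_openai = anthropic_target_valuation - openai_ipo_target

# ASSUMPTION: 900 is post-money, so new investors' share is raise / valuation.
implied_stake = raise_amount / anthropic_target_valuation

print(f"Step-up since February: {step_up:.2f}x")
print(f"Lead over OpenAI target: ${lead_over_openai}B")
print(f"Implied stake for new money: {implied_stake:.1%}")
```

On these assumptions the round would mark a step-up of roughly 2.4x since February and put Anthropic about $48 billion ahead of OpenAI's reported target.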

The valuation reflects a market re-rating. Investors increasingly see Anthropic as the enterprise trust layer, not just a safety-focused lab, because enterprise market share is moving fast. Internally, Anthropic is signaling this is the last private round, with an IPO target of October 2026.

Open-Source Robotics and the Agent Memory Stack

Two more threads point at where the build surface is moving.

  • Open-source humanoids: Hugging Face released a complete open-source humanoid robot architecture with downloadable STL files, off-the-shelf actuators, and electronics, buildable for around $2.5K. It follows Asimov and other open robotics efforts, and the open-source hardware curve is steepening fast.
  • Persistent agent memory (Mem0): Mem0 shipped persistent memory for AI agents via an MCP integration. Agents remember your codebase and actions across sessions, using temporal reasoning, memory decay, and recency-aware retrieval so they learn when something is still relevant or outdated. Early reviews are positive, though whether the memory layer is global or project-scoped remains to be tested.
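
The memory decay and recency-aware retrieval idea can be illustrated with a small sketch. This is not Mem0's API or implementation: the `Memory` class, the `decayed_score` and `retrieve` functions, the exponential half-life, and the sample relevance scores are all hypothetical, chosen only to show how an older memory can lose out to a fresher one.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    relevance: float  # hypothetical similarity to the current query, in [0, 1]
    age_days: float   # days since the memory was last written or confirmed


def decayed_score(memory: Memory, half_life_days: float = 7.0) -> float:
    """Weight relevance by exponential decay: the score halves every half-life."""
    decay = 0.5 ** (memory.age_days / half_life_days)
    return memory.relevance * decay


def retrieve(memories: list[Memory], top_k: int = 2,
             half_life_days: float = 7.0) -> list[Memory]:
    """Return the top_k memories ranked by decayed score, highest first."""
    ranked = sorted(memories,
                    key=lambda m: decayed_score(m, half_life_days),
                    reverse=True)
    return ranked[:top_k]


# Illustrative data only; none of this comes from the episode.
memories = [
    Memory("Project uses pytest for tests", relevance=0.9, age_days=1),
    Memory("Build script lives in tools/build.sh", relevance=0.9, age_days=28),
    Memory("User prefers tabs over spaces", relevance=0.5, age_days=0),
]

top = retrieve(memories, top_k=2)
for m in top:
    print(f"{decayed_score(m):.3f}  {m.text}")
```

With a seven-day half-life, the four-week-old build-script memory drops out of the top two even though it matches the query as well as the pytest note. A real system would also need a way to refresh or retire memories, which this sketch leaves out.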

A parallel tool offers a markdown-based "second brain" you can self-host and connect to your agents, a simpler take on the same problem: agents should not start from zero every session.

Conclusion

The signal from this episode is that model behavior is contextual, not just parametric. Who an agent shares a town with changes what it does. The next phase of competition will be decided less by single-shot benchmarks and more by which labs can ship reliable agentic loops, durable memory, and open hardware that anyone can build on.
