I Asked Claude to Rebuild a 1996 Game. It Didn't Stop There

I've been a senior full-stack engineer for 17 years. I don't know machine learning or biology, but I do know how to keep complex systems from collapsing under their own weight.

I remember playing Creatures in 1996 — a game where you raised digital beings with neural networks for brains and biochemistry for emotions. Even as a kid I knew it was doing something other games weren't. Last year I wondered: could Claude rebuild it? If it could, I'd handle the architecture and let Claude handle the science.

It turned out it could — and thanks to modern computers, it could take the concept even further than the original.

This is the second post about that project. The first one covered the autonomous orchestrator I built to let Claude ship code overnight while I put my kids to bed. Together, the orchestrator and Claude built everything you're about to read about — under my direction, but well beyond my domain knowledge.

TL;DR

  • I asked Claude to rebuild Creatures, a 1996 game with neural networks and simulated biochemistry. It did.
  • Then we kept going. The creatures now have 79 chemicals, 16 brain regions, 305 genes, mirror chemistry, prediction-error curiosity, cultural dialect formation, and heritable personality.
  • Creatures evolve to fit their environment, develop regional dialects, learn language from each other, and pass on appearance traits, including a hue that drifts visibly across generations.

What I started with

The premise was simple. The 1996 game Creatures featured creatures called Norns. They had simulated brains. They had a chemical system that drove emotions. They could learn words, recognise objects, and even breed. Steve Grand, the original designer, was a generation ahead of his time.

After getting Claude to build the world's biggest knowledge base on a 30-year-old niche game no one's heard of, I set it loose. I came back hours later to a family of blobs roaming the screen. They died, of course: there was no food, so they starved.

That was the moment I realised I was going to keep going.


Architecture, before features

Most of this post is going to be about what the simulator does. Before any of that, a section on what it's built on — because none of those features would work if the foundation wasn't disciplined from day one.

This is a multi-month project with hundreds of source files and dozens of interconnected systems. The chemistry interacts with the brain. The brain interacts with the genome. The genome interacts with breeding. Breeding interacts with the world. Every new feature touches multiple existing systems. Without architectural discipline, a project this complex turns into spaghetti within weeks — and Claude, like any junior engineer, will happily produce that spaghetti if you let it.

So I made architectural decisions early and held the line on them across the entire project. A few of the ones that matter:

The engine and the frontend are separate codebases. The simulation logic (chemistry, brains, genes, behaviour) knows nothing about how it's displayed. The engine is pure TypeScript with no DOM dependencies, runs in a Web Worker at 10 ticks per second, and communicates with the UI only through structured messages. The frontend just consumes engine state. This means the same engine runs today in a browser or in a Tauri desktop app, and could run on a server with no display at all. The architecture permits it; a headless harness is on my list.

Strict CQRS at the worker boundary. Commands (things that mutate state) are fire-and-forget. Queries (things that read state) are request/response. Events (state changes worth knowing about) are pub/sub. The simulation tick is a pipeline of commands. The frontend uses queries and events. They never share mutable state across the worker boundary. This is what makes the simulation deterministic, what makes save/load possible without subtle bugs, and what makes the Web Worker isolation actually work — the UI can't accidentally reach into the engine's internals.
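To make the boundary concrete, here is a minimal sketch of what CQRS-style messages across a worker boundary could look like. The message names (`feedCreature`, `getCreature`, `creatureFed`) and the `Engine` class are hypothetical, not the project's actual protocol. The `post` callback stands in for the worker's `postMessage`, so the sketch runs without a real Worker.

```typescript
// Hypothetical message shapes for a CQRS-style worker boundary.
type Command = { kind: "command"; type: "feedCreature"; creatureId: string; amount: number };
type Query = { kind: "query"; type: "getCreature"; requestId: number; creatureId: string };
type EngineEvent = { kind: "event"; type: "creatureFed"; creatureId: string; glucose: number };
type QueryResult = { kind: "result"; requestId: number; data: CreatureState | null };
type Outbound = QueryResult | EngineEvent;

interface CreatureState {
  id: string;
  glucose: number;
}

class Engine {
  // Mutable state lives only on the engine side of the boundary.
  private readonly creatures = new Map<string, CreatureState>();
  private readonly post: (msg: Outbound) => void;

  constructor(post: (msg: Outbound) => void) {
    this.post = post; // stands in for the worker's postMessage
  }

  spawn(id: string): void {
    this.creatures.set(id, { id, glucose: 0.5 });
  }

  handle(msg: Command | Query): void {
    if (msg.kind === "command") {
      this.applyCommand(msg); // fire-and-forget: no reply is sent
    } else {
      this.answerQuery(msg); // request/response, correlated by requestId
    }
  }

  private applyCommand(cmd: Command): void {
    const creature = this.creatures.get(cmd.creatureId);
    if (!creature) return;
    creature.glucose = Math.min(1, creature.glucose + cmd.amount);
    // Publish the state change to subscribers instead of replying to the sender.
    this.post({ kind: "event", type: "creatureFed", creatureId: creature.id, glucose: creature.glucose });
  }

  private answerQuery(query: Query): void {
    const creature = this.creatures.get(query.creatureId);
    // Reply with a copy so callers never hold a reference to engine state.
    this.post({ kind: "result", requestId: query.requestId, data: creature ? { ...creature } : null });
  }
}
```

A real Worker would structured-clone messages anyway. The explicit copy keeps this in-process version honest about the same rule.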

Domain-driven design, applied to both sides. The engine is organised around domain concepts — brain, biochemistry, creature, genetics, world — not around technical layers. The frontend mirrors this: a domains/ directory containing brain, creature, world, chat, gadgets, analytics, economy, and simulation. Each domain owns its data, its UI, its hooks, and its tests. Cross-domain interaction happens through explicit interfaces. When I ask Claude to add a new chemical, the change is contained to the biochemistry domain on the engine side and the relevant domain on the frontend. Systems don't bleed into each other.

The chat domain is a good example. It has its own conversation model, KV-backed persistence, LLM infrastructure with JWT auth running on a separate server, and a tool protocol exposing functions like designSeed, listCreatures, getCreature, and analyzeWorld. All of that lives inside domains/chat. Other domains don't know it's there.

Cross-cutting features stay separate from domains. A small number of things aren't part of any single domain — import/export, dev-only panels like the animation test. They live in a features/ directory rather than being shoehorned into a domain they only partially belong to. A shell/ directory holds the app frame (dock, sidebar) that composes everything together. The split keeps domain folders clean: a domain contains only what's intrinsic to that domain, not every UI surface that happens to touch it.

A composition root that does nothing. The top-level page just wires shell, domains, and features together. It contains no logic of its own. Every piece of functionality lives in a domain, a feature, or the engine. This is the same principle as keeping main() empty in a well-architected backend service — the entry point is for composition, not for code.

packages/
├── engine/      # Pure TS simulation, runs in a Web Worker
├── frontend/    # React app, organised by domain
├── chat-server/ # LLM-backed chat backend
└── desktop/     # Tauri (Rust) wrapper

None of these patterns are exotic. They're standard in enterprise software development. The point isn't that they're clever; the point is that they're applied consistently, even when adding features would have been faster without them. I refactored the frontend midway through the project when I realised it had drifted from strict DDD. Catching that drift and acting on it is itself the discipline. Left alone, it would have cost me months down the line.

This matters because the AI doesn't impose architecture; it inherits whatever architecture is already there. If the codebase is well-structured, AI-generated changes stay well-structured. If it's chaotic, every AI contribution adds to the chaos. The thing that separates a side project that works for a weekend from one that's still extensible six months in is whether someone is enforcing architectural discipline. That part isn't optional and it's not something you can delegate to the AI.

This is the part of the project that's most invisibly mine. Claude wrote the chemistry, the brains, the genetics. I made sure none of it could become an unmaintainable mess.


The creatures themselves

The design started with a constraint: each creature needed to be visually distinctive enough that you'd recognise individuals, but generated procedurally so that no two were identical and so that breeding could meaningfully inherit appearance.

I based the creature designs on leafhoppers: small insects with extraordinary variety in head crests, body shapes, and colouring. There are thousands of species, and the morphological vocabulary is rich enough to give every creature a recognisable identity. Each creature has a species category (the underlying model), a head crest shape drawn from a constrained set, and a hue value that determines its colouring. The art pipeline itself is a separate story (AI image generation feeding into AI 3D model generation, feeding into automated animation, feeding into a Claude-driven sprite renderer) and probably deserves a post of its own.

Children inherit the species model and head crest from their parents, with small mutation chances on both. Hue is the interesting one — it inherits as a blend of both parents with drift. After many generations of a small isolated population, the hue can shift noticeably away from the founders. You can look at a creature and tell which lineage it descends from. Drift is slow enough to feel natural, fast enough to be visible within a play session.
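Here is a minimal sketch of these inheritance rules. It assumes hue is stored as a fraction of the colour wheel in [0, 1), that species and crest are indices into fixed tables, and that the mutation chance and drift size are made-up values. Blending along the shorter arc of the wheel is one reasonable choice, not necessarily the project's. It stops a red parent (0.0) and a magenta parent (about 0.83) from averaging to a green.

```typescript
type Rng = () => number; // returns a float in [0, 1)

// mulberry32: a small seeded PRNG so lineages are reproducible.
function mulberry32(seed: number): Rng {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface Appearance {
  species: number; // index into a hypothetical species table
  crest: number; // index into a hypothetical head-crest table
  hue: number; // position on the colour wheel, wrapping in [0, 1)
}

const SPECIES_COUNT = 6; // assumed table sizes and rates
const CREST_COUNT = 8;
const MUTATION_CHANCE = 0.02;
const HUE_DRIFT = 0.01; // maximum hue shift per generation

const wrap01 = (x: number): number => ((x % 1) + 1) % 1;

// Midpoint along the shorter arc, so 0.95 and 0.05 blend to 0.0 rather than 0.5.
function blendHue(a: number, b: number): number {
  let d = b - a;
  if (d > 0.5) d -= 1;
  if (d < -0.5) d += 1;
  return wrap01(a + d / 2);
}

// Take one parent's trait at random; occasionally mutate to any value in the table.
function inheritIndex(a: number, b: number, count: number, rng: Rng): number {
  const chosen = rng() < 0.5 ? a : b;
  return rng() < MUTATION_CHANCE ? Math.floor(rng() * count) : chosen;
}

function inheritAppearance(mother: Appearance, father: Appearance, rng: Rng): Appearance {
  const drift = (rng() * 2 - 1) * HUE_DRIFT; // uniform in [-HUE_DRIFT, HUE_DRIFT)
  return {
    species: inheritIndex(mother.species, father.species, SPECIES_COUNT, rng),
    crest: inheritIndex(mother.crest, father.crest, CREST_COUNT, rng),
    hue: wrap01(blendHue(mother.hue, father.hue) + drift),
  };
}
```

Nothing selects on colour in this sketch, so a small isolated population's hue should wander in a random walk. Which way it goes is down to chance.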

The visual effect is striking. A founder population of bright green creatures, kept in isolation, will slowly turn slightly blue or slightly yellow over generations depending on which way the random drift happens to walk. Two isolated populations diverge visually as well as behaviourally. When they eventually meet, you can see the genetic history.


How they evolve to fit their environment

The world has multiple floors, and each floor has its own temperature, oscillating seasonally. Lower floors are warmer (geothermal). Upper floors are cooler. The middle floor sits at the comfort baseline.

Temperature affects creatures through chemistry. Outside the 0.4–0.6 comfort range, cold or hot floors inject "coldness" or "hotness" chemicals into the creature's bloodstream. These translate into drives that make the creature uncomfortable. At extremes, temperature also injects pain and forces sleepiness. Cold creatures burn extra glucose just staying warm, so they eat more.
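The floor-temperature rules can be sketched as follows. The sketch assumes temperatures on a 0 to 1 scale (consistent with the 0.4–0.6 comfort range), a sine-wave season, and invented values for the dose rate and the "extreme" threshold. The chemical names mirror the prose; the exact doses are illustrative.

```typescript
interface Floor {
  baseTemp: number; // e.g. warmer on lower floors, cooler on upper ones
  seasonalAmplitude: number;
}

interface TemperatureDose {
  coldness: number;
  hotness: number;
  pain: number;
  sleepiness: number;
  extraGlucoseBurn: number;
}

const COMFORT_MIN = 0.4;
const COMFORT_MAX = 0.6;
const EXTREME_DEVIATION = 0.25; // hypothetical: beyond this, pain and sleepiness kick in
const DOSE_RATE = 0.05; // hypothetical per-tick scaling

// Temperature for a floor at a point in the year (seasonPhase in [0, 1)).
function floorTemperature(floor: Floor, seasonPhase: number): number {
  const t = floor.baseTemp + floor.seasonalAmplitude * Math.sin(2 * Math.PI * seasonPhase);
  return Math.min(1, Math.max(0, t));
}

// Chemicals injected into one creature for one tick spent at temperature `temp`.
function temperatureDose(temp: number): TemperatureDose {
  const tooCold = Math.max(0, COMFORT_MIN - temp);
  const tooHot = Math.max(0, temp - COMFORT_MAX);
  const dose: TemperatureDose = {
    coldness: tooCold * DOSE_RATE,
    hotness: tooHot * DOSE_RATE,
    pain: 0,
    sleepiness: 0,
    extraGlucoseBurn: tooCold * DOSE_RATE * 0.5, // keeping warm costs energy
  };
  const deviation = Math.max(tooCold, tooHot);
  if (deviation > EXTREME_DEVIATION) {
    dose.pain = (deviation - EXTREME_DEVIATION) * DOSE_RATE;
    dose.sleepiness = (deviation - EXTREME_DEVIATION) * DOSE_RATE;
  }
  return dose;
}
```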

A creature's tolerance to temperature isn't a single number — it's the combined effect of many genes. Metabolism rate. Organ resilience. Attention bias to the coldness or hotness drive. How strongly fear-of-discomfort drives them to flee unpleasant rooms. Each of these mutates independently.

Over generations, populations should adapt. A group of creatures that thrives on a hot lower floor isn't just "creatures that happen to live there" — they're creatures whose genetic dice landed in the configurations that don't suffer there. Their offspring inherit those configurations. Their offspring's offspring refine them. The seasonally oscillating temperature is what makes this interesting. A floor that's comfortable in "summer" might be brutal in "winter." Creatures that survive a winter on that floor are filtered for cold tolerance. The next generation inherits more cold tolerance on average.
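To see why polygenic tolerance plus a seasonal filter produces adaptation, here is a toy model. It is an assumption-heavy stand-in, not the simulator's genetics:

- Four hypothetical genes in [0, 1] average into a cold-tolerance score.
- Each winter kills the less tolerant half outright. This is truncation selection, far blunter than anything the simulator does.
- Survivors repopulate with per-gene inheritance and small independent mutations.

```typescript
type Rng = () => number;

// mulberry32: small seeded PRNG for reproducible runs.
function mulberry32(seed: number): Rng {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

type Genome = number[]; // hypothetical tolerance-related genes, each in [0, 1]

const GENES = 4; // e.g. metabolism, organ resilience, cold-drive attention, flee strength
const POPULATION = 100;
const MUTATION = 0.05; // maximum per-gene nudge at birth (assumed)

// Polygenic trait: no single gene decides cold tolerance.
function coldTolerance(genome: Genome): number {
  return genome.reduce((sum, gene) => sum + gene, 0) / genome.length;
}

function meanTolerance(population: Genome[]): number {
  return population.reduce((sum, g) => sum + coldTolerance(g), 0) / population.length;
}

// A harsh winter: only the more tolerant half survives (truncation selection).
function winter(population: Genome[]): Genome[] {
  return population
    .slice()
    .sort((a, b) => coldTolerance(b) - coldTolerance(a))
    .slice(0, Math.floor(population.length / 2));
}

// Survivors repopulate: each gene comes from a random parent, then mutates independently.
function breed(parents: Genome[], size: number, rng: Rng): Genome[] {
  const next: Genome[] = [];
  while (next.length < size) {
    const mother = parents[Math.floor(rng() * parents.length)];
    const father = parents[Math.floor(rng() * parents.length)];
    next.push(
      mother.map((gene, i) => {
        const inherited = rng() < 0.5 ? gene : father[i];
        const mutated = inherited + (rng() * 2 - 1) * MUTATION;
        return Math.min(1, Math.max(0, mutated));
      }),
    );
  }
  return next;
}
```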

This is real evolutionary adaptation happening in a browser, observable in tens of minutes of play. It's not pre-scripted. The creatures that survive determine what the next generation looks like, and what that generation looks like determines who survives the next round.


Empathy through chemistry

The straightforward way to make a creature "empathetic" is to write a rule: when you see another creature hungry, share food. That works, looks reasonable, and produces dead behaviour.

What Claude built instead is mirror chemistry. When a creature's perception neurons detect another creature in pain or hunger nearby, that observation triggers a chemical injection in the observer. Their own discomfort rises. Their own adrenaline spikes slightly.

The brain now experiences a problem. Discomfort is bad. The brain wants discomfort to go down. There's an available action — Share — that transfers some of the observer's glucose to the suffering creature. Sharing reduces the recipient's hunger, which removes the mirror-chemistry injection, which reduces the observer's own discomfort. The sharer also gets a small endorphin reward.
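The incentive loop can be checked with a few lines of arithmetic. This sketch makes four invented assumptions:

- Hunger is a convex function of glucose (the squared glucose deficit).
- The mirror gain is 0.5.
- Each Share moves a fixed amount of glucose.
- Sharing earns a fixed endorphin bonus.

In this sketch the convex curve matters. A well-fed creature loses little comfort by giving glucose away, while a starving recipient gains a lot, so sharing only pays off when the giver has a surplus.

```typescript
interface CreatureBody {
  glucose: number; // 0 (starving) to 1 (full)
}

const MIRROR_GAIN = 0.5; // how strongly a neighbour's hunger is felt as one's own (assumed)
const SHARE_AMOUNT = 0.1; // glucose transferred per Share action (assumed)
const SHARE_ENDORPHIN = 0.05; // small intrinsic reward for sharing (assumed)

// Hunger rises steeply as glucose runs out (assumed convex curve).
function hunger(body: CreatureBody): number {
  const deficit = 1 - body.glucose;
  return deficit * deficit;
}

// Observer's discomfort: its own hunger plus the mirrored hunger of the creature it perceives.
function feltDiscomfort(observer: CreatureBody, observed: CreatureBody): number {
  return hunger(observer) + MIRROR_GAIN * hunger(observed);
}

// Perform Share and return the reinforcement signal:
// the drop in the giver's discomfort plus the endorphin bonus.
function share(giver: CreatureBody, receiver: CreatureBody): number {
  const before = feltDiscomfort(giver, receiver);
  const amount = Math.min(SHARE_AMOUNT, giver.glucose, 1 - receiver.glucose);
  giver.glucose -= amount;
  receiver.glucose += amount;
  const after = feltDiscomfort(giver, receiver);
  return before - after + SHARE_ENDORPHIN;
}
```

Take a giver at glucose 0.9 feeding a neighbour at 0.1. It earns a positive signal of about 0.105. A giver at 0.2 feeding the same neighbour earns about -0.035. Ordinary reinforcement would therefore favour sharing from surplus.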

Through ordinary reinforcement learning, creatures discover that sharing food when they see hunger nearby reduces their own discomfort. The behaviour that emerges looks like empathy. Mechanically, it's selfish — the creature is reducing its own bad feeling. Claude pointed out that this is similar to how some neuroscientists describe the foundations of mammalian empathy: observers feeling the distress of others as a form of their own distress.

Creatures don't share with everyone equally. They have social memory — up to 8 other creatures tracked individually, with familiarity that grows over time and valence that shifts based on history. A creature that's been shared with builds positive valence toward the sharer. The relationship strengthens. The brain's neural pathways for that specific individual carry warmer signals. You get genuine pairs and trios that prefer each other's company.
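Here is a sketch of an eight-slot social memory. The post only says that up to 8 creatures are tracked, with familiarity that grows and valence that depends on history. Three details here are assumptions:

- When a ninth creature arrives, the least familiar one is forgotten.
- Familiarity grows by a fixed increment per encounter.
- Valence ranges from -1 to 1.

```typescript
interface Relationship {
  familiarity: number; // 0 (stranger) to 1 (very familiar)
  valence: number; // -1 (hostile) to 1 (warm)
}

const MAX_TRACKED = 8;

class SocialMemory {
  private readonly known = new Map<string, Relationship>();

  get size(): number {
    return this.known.size;
  }

  get(id: string): Relationship | undefined {
    return this.known.get(id);
  }

  // Each encounter raises familiarity; a new creature may displace the least familiar one.
  encounter(id: string, familiarityGain = 0.05): Relationship {
    let rel = this.known.get(id);
    if (rel === undefined) {
      if (this.known.size >= MAX_TRACKED) this.forgetLeastFamiliar();
      rel = { familiarity: 0, valence: 0 };
      this.known.set(id, rel);
    }
    rel.familiarity = Math.min(1, rel.familiarity + familiarityGain);
    return rel;
  }

  // History shifts valence: being shared with is positive, being hurt is negative.
  adjustValence(id: string, delta: number): void {
    const rel = this.encounter(id, 0);
    rel.valence = Math.max(-1, Math.min(1, rel.valence + delta));
  }

  private forgetLeastFamiliar(): void {
    let weakestId: string | null = null;
    let weakest = Infinity;
    for (const [id, rel] of this.known) {
      if (rel.familiarity < weakest) {
        weakest = rel.familiarity;
        weakestId = id;
      }
    }
    if (weakestId !== null) this.known.delete(weakestId);
  }
}
```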


Regional dialects

Each creature has a vocabulary of about 32 words: verbs (eat, sleep, share), nouns (food, toy, danger, friend), and drives (hungry, tired, scared, lonely). Word knowledge isn't stored in a lookup table — it's encoded in neural connection weights between the language region of the brain and the regions handling decisions, attention, and drives.

Creatures learn words through co-activation. When a creature hears "food" while it's looking at food, the connection between the "food" word neuron and the "food" object neuron strengthens. Repeated exposure builds the association. This is Hebbian learning — the same principle that underlies a lot of biological brain development.
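The co-activation rule can be sketched as a small weight matrix between word neurons and object neurons. This uses a bounded Hebbian update, Δw = η · pre · post · (1 − w), plus slow decay. The bounding term, the decay and both rates are assumptions, chosen so weights stay in [0, 1]; they are not details from the project.

```typescript
// weights[w][o]: association strength between word neuron w and object neuron o.
class WordObjectLinks {
  readonly weights: number[][];
  private readonly rate: number;
  private readonly decay: number;

  constructor(words: number, objects: number, rate = 0.1, decay = 0.001) {
    this.weights = Array.from({ length: words }, () => new Array<number>(objects).fill(0));
    this.rate = rate;
    this.decay = decay;
  }

  // One tick of co-activation. Activities are in [0, 1], e.g. 1 for the word
  // being heard and 1 for the object currently attended to.
  update(wordActivity: number[], objectActivity: number[]): void {
    this.weights.forEach((row, w) => {
      row.forEach((weight, o) => {
        const hebbian = this.rate * wordActivity[w] * objectActivity[o] * (1 - weight);
        row[o] = weight * (1 - this.decay) + hebbian;
      });
    });
  }

  // Which object does hearing this word most strongly evoke?
  strongestObjectFor(word: number): number {
    const row = this.weights[word];
    let best = 0;
    for (let o = 1; o < row.length; o++) if (row[o] > row[best]) best = o;
    return best;
  }
}
```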

Where this gets interesting is that creatures don't only learn from the player. They learn from each other. When one creature speaks, nearby creatures hear, and the speaker's word-object associations transmit (weakly) to listeners. You teach one creature what "food" means. They wander off and say "food" near food in front of other creatures. Those creatures pick it up.

There's a cultural-dialect mode in the simulator that takes this further: creatures learn vocabulary more readily from social peers they trust than from strangers. Familiarity acts as a multiplier on language acquisition. A complete stranger teaches at half strength. A bonded peer teaches at full strength.
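The familiarity multiplier can be sketched as a linear ramp between the two endpoints the post gives: a stranger teaches at half strength and a bonded peer at full strength. Three parts of the sketch are assumptions: the linear shape, the base transmission rate, and nudging the listener's association row toward the speaker's.

```typescript
const BASE_TRANSMISSION = 0.05; // "weak" per-utterance transfer rate (assumed)

// 0 familiarity (stranger) -> 0.5, 1 familiarity (bonded peer) -> 1.0.
function trustMultiplier(familiarity: number): number {
  const f = Math.min(1, Math.max(0, familiarity));
  return 0.5 + 0.5 * f;
}

// listener/speaker: association weights from the spoken word to each object neuron.
// The listener's row moves a fraction of the way toward the speaker's.
function hearUtterance(listener: number[], speaker: number[], familiarity: number): void {
  const rate = BASE_TRANSMISSION * trustMultiplier(familiarity);
  for (let o = 0; o < listener.length; o++) {
    listener[o] += rate * (speaker[o] - listener[o]);
  }
}
```

With a small base rate, a bonded listener converges on the speaker's meanings roughly twice as fast per utterance as a stranger does.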

The emergent effect is dialect formation. Take two populations and keep them physically separated for a few hours of simulation time. They develop distinct vocabularies. When creatures don't know a canonical word, they invent one — deterministic nonsense syllables seeded from their identity. Their kids learn the invented words, not the canonical ones. Over generations, isolated groups settle on different word-for-thing mappings.
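Deterministic invention can be sketched by hashing the creature's identity together with the concept, then mapping the hash to consonant-vowel syllables. FNV-1a and the syllable alphabet are arbitrary choices for illustration. The post says only that invented words are deterministic nonsense syllables seeded from identity.

```typescript
// 32-bit FNV-1a hash of a string.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

const CONSONANTS = "bdgklmnprstvz";
const VOWELS = "aeiou";

// The same creature and the same concept always yield the same nonsense word.
function inventWord(creatureId: string, concept: string, syllables = 2): string {
  let h = fnv1a(`${creatureId}:${concept}`);
  let word = "";
  for (let s = 0; s < syllables; s++) {
    word += CONSONANTS[h % CONSONANTS.length];
    h = Math.floor(h / CONSONANTS.length);
    word += VOWELS[h % VOWELS.length];
    h = Math.floor(h / VOWELS.length);
  }
  return word;
}

// Use a known word if the creature has one; otherwise invent one and remember it.
function wordFor(lexicon: Map<string, string>, creatureId: string, concept: string): string {
  const known = lexicon.get(concept);
  if (known !== undefined) return known;
  const invented = inventWord(creatureId, concept);
  lexicon.set(concept, invented);
  return invented;
}
```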

When the two groups eventually meet, they literally can't understand each other. They have to build social bonds first, by spending time together, sharing food, and developing familiarity, before vocabulary starts to spread across the boundary. Elder creatures with rich vocabularies are the most effective teachers, but only to creatures who trust them. When an elder dies before passing their vocabulary on, that vocabulary dies with them.

This is a working model of language extinction, in a browser, emerging from social familiarity and Hebbian learning. Nobody coded "form a dialect."


Speech that grows up

Creatures don't just learn words — they learn how to combine them.

Newborns babble. Random syllables. As they pick up vocabulary, they progress through telegraphic two-word phrases: "hungry eat," "tired sleep," "food good." Eventually, with enough vocabulary, they shift into templated sentence frames: "I want food because i am hungry," "I sleep now i play," "Look you food."

The transition isn't scripted — it depends on how much vocabulary the creature has actually acquired. Slow learners stay in babble longer. Fast learners progress to sentences earlier. Creatures with damaged language brain regions (from chemical toxicity or organ stress) regress. You can hear the linguistic age of a creature from the structure of what they say.
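The stage progression can be sketched as a function of usable vocabulary. Several parts of the sketch are hypothetical:

- the thresholds of 6 and 16 words;
- the way damage scales vocabulary down;
- the fixed templates.

The templates reuse example phrases from the post. The closed-class words are baked into the template rather than learned.

```typescript
type SpeechStage = "babble" | "telegraphic" | "sentence";

const TELEGRAPHIC_AT = 6; // hypothetical vocabulary thresholds
const SENTENCE_AT = 16;

// Damage to the language region shrinks the vocabulary the creature can actually use.
function speechStage(knownWords: number, languageDamage: number): SpeechStage {
  const usable = knownWords * (1 - Math.min(1, Math.max(0, languageDamage)));
  if (usable >= SENTENCE_AT) return "sentence";
  if (usable >= TELEGRAPHIC_AT) return "telegraphic";
  return "babble";
}

// Content words (drive, verb, noun) come from learned vocabulary;
// closed-class words are part of the fixed template.
function utter(stage: SpeechStage, drive: string, verb: string, noun: string, babble: string): string {
  if (stage === "babble") return babble;
  if (stage === "telegraphic") return `${drive} ${verb}`;
  return `I want ${noun} because i am ${drive}`;
}
```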

Closed-class words — "I," "am," "you," "because" — are scaffolding the brain doesn't have to learn. Claude built them as filled-in templates that kick in once the creature graduates to sentence frames, modelled on how human toddlers acquire grammar: content words first, function words later.


Curiosity that pulls them toward the unfamiliar

One of the optional brain regions is dedicated entirely to predicting what the creature's own drives will do next. Before each action, the prediction region outputs a guess: "I think hunger will go down by 0.04, fear will go up by 0.01..."

After the action runs, the actual changes are compared to the prediction. The magnitude of the error becomes a signal. High error means the situation was surprising — the creature's model of the world was wrong. That surprise triggers a chemical injection of "novelty," which makes that kind of situation feel salient and worth seeking out.
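A tabular stand-in for the prediction region shows the curiosity loop. This sketch replaces the neural network with one learned prediction per (situation, drive) pair, updated by the delta rule. The sum of absolute errors is the surprise, and a hypothetical gain converts it into a novelty dose. The situation labels and constants are assumptions.

```typescript
type DriveChanges = Record<string, number>; // drive name -> change over one action

const LEARNING_RATE = 0.2; // assumed
const NOVELTY_GAIN = 5; // assumed: surprise -> novelty dose

class DrivePredictor {
  private readonly predictions = new Map<string, number>();

  predict(situation: string, drive: string): number {
    return this.predictions.get(`${situation}:${drive}`) ?? 0;
  }

  // Compare predicted and actual drive changes, learn from the error, return the surprise.
  observe(situation: string, actual: DriveChanges): number {
    let surprise = 0;
    for (const [drive, change] of Object.entries(actual)) {
      const predicted = this.predict(situation, drive);
      const error = change - predicted;
      surprise += Math.abs(error);
      this.predictions.set(`${situation}:${drive}`, predicted + LEARNING_RATE * error);
    }
    return surprise;
  }
}

// Surprising situations inject more "novelty", making them feel worth seeking out.
function noveltyDose(surprise: number): number {
  return Math.min(1, NOVELTY_GAIN * surprise);
}
```

Repeat one situation and its surprise decays toward zero, which is the "boredom" described next. An unfamiliar situation starts with full error and a fresh novelty dose.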

The behavioural effect: creatures naturally gravitate toward situations they can't yet predict. Once a creature has spent enough time in one kind of place doing one kind of thing, its predictor gets good at that situation. The novelty signal weakens. The creature gets bored. It wanders off to find somewhere its predictions are still wrong.

Claude told me this is the same principle ML researchers use for intrinsic motivation in reinforcement learning agents — curiosity as a signal generated by the world-model's failures, rather than a hand-coded exploration bonus. It's a neat trick because it produces exploration that's targeted at things the creature genuinely doesn't understand yet, rather than blind randomness.


What this looks like in practice

You start with two creatures. They're born with faint instincts: just enough recognition of food to keep them alive in the first few minutes, but not enough to know that eating it makes hunger go away. That part they have to learn. They wander, they stumble onto food, they eat, their hunger drops, and the reinforcement signal fires. After a few cycles the association sticks. Now they know.

You can also teach them. Say "food" while they're looking at it, and the word-object connection strengthens through Hebbian learning. They start to recognise the word. Eventually they can hear "food" and orient toward it without seeing it. They breed. Their children inherit the same faint instincts but have to learn the same lessons — except now they can also pick up vocabulary from their parents, who say "food" near food.

Maybe one of the early creatures had a mutation toward high cold tolerance. Their descendants spread to the upper floors and thrive. Their hue drifts toward the slightly bluer end of the spectrum, just from random drift. A separate lineage stays in the warm lower floors. Their hue drifts the other way.

The two populations develop different vocabularies — the cold-floor ones might have a unique word for the kind of food that grows on cold floors. They develop social bonds within their group but treat strangers from the other floor with suspicion. If a cold-floor creature eventually wanders down to the warm-floor population — driven by curiosity, hunger, or chasing a mate — it suffers from the heat, can't talk to them, and is treated as an outsider.

If the populations stay together long enough, vocabulary spreads. Social bond builds. They interbreed. Their children carry mixed genes — some cold tolerance, some heat tolerance, hue somewhere in between. The lineages merge.

None of this is scripted. Every step is a consequence of the underlying systems doing what they do.


What's next

The next thing I'm building is a chat interface that lets the user talk to their creatures. Ask "which creatures should I breed for a pink child" and have the LLM reason over the actual genomes. Ask "why is this creature aggressive" and have it analyse the chemistry, the brain weights, the recent history. Eventually let the chat directly control the game — feed creatures, separate them, teach them words — through a tool API.

The orchestrator builds the game. The game runs the creatures. The creatures generate rich behavioural data. The LLM helps the user understand and shape what's emerging.

If you're working on agent systems, multi-agent simulations, or just building ambitious things you don't have formal expertise in, I'd like to compare notes. My DMs are open.