Generated World

2025-09-25

I've been spoiled, in that I've been privileged to have access to education, not only on practically any subject I've been interested in, but more importantly, that it even exists. The dark side is a bit of a snooty attitude that there's a 'correct' way of doing things, and anything else is, well, lesser.

An example of software wizardry that has captivated many of those who have seen it, is the world of Elite, originally on the BBC Micro, but then ported to, well, everything capable of running it. Elite, if you haven't seen it, features a game world that is immense. On hardware with less ram than modern L1 cache.

The trick is of course PCG, Procedurally Generated Content, of which Elite's Goat Soup algorithm is a striking example.

It works by using a [pseudo-random number generator](Pseudorandom number generator), seeded with the starting point, to place stars in galaxies and generate system descriptions. Since the PRNG, and the generators that make up the game world, are deterministic, any or all of it can be discarded and re-generated at will.

Another game with a similarly ambitious game world is Uplink. I loved it, and still do, but it was something of an eye-opener to also buy the source code. It was not-good, in an (almost shockingly) pragmatic way. For example, there's code duplication all over the place, and even a few cases of imperfect duplication, but it's a working game. A fun and exciting one, at that. In some ways, this is a circle I'm still trying to square. Most disturbingly, the world is generated at game start, and then simply saved, as is, some hundreds times the size of the entire Elite game.

I got to thinking about the problem, and I think I have something worth pursuing.

The Problem

So... the problem is to be able to present a vast cast of characters, a conundrum of corporations and overall, an overwhelmingly wide world. Just generating it is relatively easy, but we'd like to be able to find instances with specific properties, quickly. Preferably, we'd like to be able to find instances with multiple properties at least within certain ranges.

We can generate infinite reams of plausible data, but we don't have a way to organise it sensibly. Generating the world, as a tree of seeds works well when attention is necessarily focused on one path from universe to galaxy to star system, but it won't cut it when the universe is available instantaneously, all at once, and embedded in a lattice of links.

Fake database

First, let's tighten the requirements on property generators to working with the (conceptual) range of zero to one, or, (more practically), the range of some integer type, e.g. unsigned 32 bits.

This first step gives us generators like name(x): N(Aaron Aaronson).. N(Zoë Zywar), and shoe(x): Sh(32)..Sh(54)

To mix things up, we can then use a block cipher, e.g. TEA, with a block shrunk to the appropriate size and a key for each property we want to generate.

Starting from one property, (hidden, or just the most commonly used one) we can 'encrypt' that seed to generate the seed for another property. Decryption also works, so if you're looking for the person with shoe size at exactly the 90th percentile, you can just 'decrypt' 0.90, and there you go. This works because block ciphers must, by necessity, be bijections. (That is, one-to-one mappings.)

  ---
config:
  theme: dark
  class:
    hideEmptyMembersBox: true
---
classDiagram
  class CoreSeed
  class NameSeed
  class ShoeSeed

  CoreSeed -- NameSeed
  CoreSeed -- ShoeSeed

Finding multi-property matches is a matter of ordering the requirements from narrow to wide, and trying to find intersections. Because the cipher will generate blocks (practically, at least,) indistinguishable from random, we can model this as drawing from uniform distributions.

Yes, multi-property matches rapidly become more painful to find, but still possible, if a match even exists. (Perhaps none of the 0.1% wealthiest have 0.01% smallest shoe size?)

Another weakness is that the seeds are completely independent. I think this can, to some extent be dealt with by choosing properties with this in mind. For example, height and weight could perhaps be size and shape, but that does make some searches more difficult instead, since height would then be a function of two parameters.

Modification

Since the original use case was a game where you can, among other things, hack the academic database and change grades, modifications can be reverted and hand-waved away in-game as having been discovered corrected. The easy way to handle modifications is to simply put a cache in front, that remembers some modifications. All of them? The $n$ latest? Within the last $t$ time? Weighted? Whatever you pick, it's going to be a lot smaller than the whole world.

Finally, draw the rest of the owl

It turns out that mapping 0..1 to things is possible, but doing it well requires some thought. "Places" means places on earth... on land, which requires a map. And that's still strange, because we will probably want to use a population density map, filtered for WEIRD people, because we're WEIRD like that, and we expect so see our reflection. Picking a point on that map is relatively straight-forward, just sum up all the values of the pixels, find the pixel where the count hits $x \in [0,1)$ of those. Again searching becomes painful, each pixel represents a tiny fraction of the full zero to unity range. Using a space-filling curve might result in being able to consolidate more neighboring fragment ranges, but is it worth it?

Is any of this worth it?

As an experiment, yes.

To make a game, no.