Acorn Arcade forums: Static game data

Previously, on Bob and Trev: Resurrection...

Initially, I was considering using DATA keywords to store the static data. BASIC II supports RESTORE by-line-number, so by storing data for each item on a separate line I could have fast access to it without requiring the entire DATA segment to be read or extracted and stored elsewhere.
Unfortunately some simple maths shows that DATA statements would take too much space. To store a one-digit number, two bytes would be required - one for the number, and one for the comma separating it from the next field. A raw CSV export of the spreadsheet I was designing everything in came to 14K (a bit of an unfair test, but indicative of how many items I'm hoping to include in the game). So, I set about designing a better solution...

This is a discussion about that 'better solution', a program which I've dubbed 'dungen', and which by the time you read this will hopefully be complete. As input it will take a TSV file, and as output it will produce a data file for use by the game. Dungen will be written in BASIC, so anyone able to run the game should be able to rebuild the data for it.

The data hierarchy

To understand how and why the static data is stored the way it is, you need to understand what the static data is. The static data started out as a spreadsheet, listing each type of item I wanted in the game, and some of their properties. I then classified each type into a class, producing the following class list:

Weapon
Ammo
Armour
Book
Magazine
Food/drink
Item
Monster
Cash
Dungeon
Software

I then worked out what static and active data each class would require - e.g. a monster would have a hitpoint value in its active data, whereas a book would not. A piece of food has a cooked and an uncooked nutritional value in its static data, and a flag in its active data to say whether that particular instance of food has been cooked or not. In an ideal world, I wouldn't have to worry much about storing redundant data for each type - e.g. a book could have a hitpoint value, even if it makes no sense since you can't fight it. But since I'm working in a limited amount of memory I had to strip out all the redundant data. The class of an item is therefore very important - it tells the game how to interpret the active data for an object (a 4-byte block of memory), and how to interpret the static data (which is anything from 11 to 29 bytes).

Dungen is the tool which removes all the redundant data from my spreadsheet and produces a nice, compact representation.

dundat

As stated, dundat is the file produced by dungen. It is split into four main sections:

Type data.
This has an index at the start, containing the offset of the data for each type. This index is required because not all the records are the same size. For each type, upwards of 11 bytes of data are stored - the first 10 bytes have a fixed interpretation (containing the class ID, the name of the type, its monetary value, mass, etc.), and the remaining bytes have differing uses depending on what class the type belongs to.
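A minimal sketch of the offset-index idea, in Python rather than BBC BASIC so it can be run anywhere. The 2-byte little-endian offsets and the helper names build_type_block and read_type_record are assumptions for illustration; the real dundat layout may well differ.

```python
import struct

def build_type_block(records):
    """Pack variable-length records behind an index of 2-byte offsets."""
    index_size = 2 * len(records)          # one 16-bit offset per record
    offsets, pos = [], index_size
    for rec in records:
        offsets.append(pos)                # offset is relative to block start
        pos += len(rec)
    index = struct.pack("<%dH" % len(records), *offsets)
    return index + b"".join(records)

def read_type_record(block, type_id, count):
    """Fetch one record using its index entry and the following one."""
    start = struct.unpack_from("<H", block, 2 * type_id)[0]
    if type_id + 1 < count:
        end = struct.unpack_from("<H", block, 2 * (type_id + 1))[0]
    else:
        end = len(block)                   # last record runs to the end
    return block[start:end]
```

Because every lookup goes through the index, an 11-byte record and a 29-byte record can sit side by side without any padding.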
Class data.
This is a small block of data; each record is only 2 bytes long. 1 byte is the ASCII code to use to represent items of that class, and the other byte is the string ID of the name of the class.
Tile data.
This is another small block, with fixed size records. Each record is 3 bytes, consisting of the string ID naming the tile, the ASCII code to use when displaying the tile on screen, and a set of bitflags. The bitflags indicate whether the tile can be walked on, can be seen through, opened/closed (for windows and doors), and locked/unlocked. If I get the chance to add colour support, I've also got a couple of bits reserved to store the colour of the tile.
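As a rough illustration of the 3-byte tile record, here is a Python sketch. The bit positions, the field order, and the names pack_tile and unpack_tile are all hypothetical; only the list of fields and flags comes from the description above.

```python
# Hypothetical bit assignments; the real dundat layout may differ.
WALKABLE = 0x01
SEE_THROUGH = 0x02
OPEN = 0x04
LOCKED = 0x08
COLOUR_SHIFT = 4
COLOUR_MASK = 0x30   # two bits set aside for a possible colour value

def pack_tile(string_id, ascii_code, flags, colour=0):
    """Build a 3-byte record: name string ID, display character, flag byte."""
    flag_byte = (flags & 0x0F) | ((colour << COLOUR_SHIFT) & COLOUR_MASK)
    return bytes([string_id, ascii_code, flag_byte])

def unpack_tile(record):
    """Split a 3-byte record back into named fields."""
    string_id, ascii_code, flag_byte = record
    return {
        "string_id": string_id,
        "char": chr(ascii_code),
        "walkable": bool(flag_byte & WALKABLE),
        "see_through": bool(flag_byte & SEE_THROUGH),
        "open": bool(flag_byte & OPEN),
        "locked": bool(flag_byte & LOCKED),
        "colour": (flag_byte & COLOUR_MASK) >> COLOUR_SHIFT,
    }
```

For example, an open door might be packed as pack_tile(5, ord("+"), WALKABLE | SEE_THROUGH | OPEN).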
Text strings.
This contains all the names for all the types, classes, and tiles. In the data above, strings are all referenced by a one-byte string ID; the text string block starts with an index, mapping those IDs to offsets in the dundat file. The strings are compressed using a simple method; any ASCII code above 127 represents another string ID, which should be inserted into the string as it is read out. Currently the text is compressed via the use of a manual 'dictionary', as implementing a proper compression scheme would take too much time (or increase the code size if more advanced decompression code is required).
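The expansion rule for compressed strings can be sketched in a few lines of Python. This is only an illustration: it keeps the strings in a dict keyed by ID instead of reading them through the offset index, the name expand_string is made up, and it assumes a dictionary entry may itself contain further IDs (the recursion is harmless if it never does).

```python
def expand_string(string_id, strings):
    """Return the text for string_id, splicing in any referenced strings.

    `strings` maps each ID to its stored bytes. Any byte above 127 is
    treated as the ID of another string to insert at that point.
    """
    out = []
    for byte in strings[string_id]:
        if byte > 127:
            out.append(expand_string(byte, strings))  # dictionary reference
        else:
            out.append(chr(byte))                     # plain ASCII character
    return "".join(out)
```

With strings = {0: b"Leather \x80", 128: b"armour"}, expand_string(0, strings) gives "Leather armour".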

dungen

So, we know roughly what the dundat file contains. But how does dungen generate that file from the human-readable TSV input?

The program runs through three stages:

It quickly scans the input file, to build up the list of uncompressed text and dictionary words in TEXT$(). Dictionary words are also flagged in ISDICT(). It then assigns text IDs to those strings, exiting with an error if there are fewer than 128 ordinary text IDs (text IDs must come before dictionary IDs, and dictionary IDs must be above 127).
It then scans the input file again, this time converting each row of data and storing it in temporary buffers.
Finally it performs text compression and writes all the data to the dundat file.
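The ID assignment in the first stage could look something like the Python sketch below. It assumes IDs are handed out sequentially, ordinary strings first and dictionary words straight after, which is one reading of the rule above; the 256-entry cap simply follows from IDs being one byte. The function name assign_text_ids is made up, and the two lists stand in for TEXT$() and ISDICT().

```python
def assign_text_ids(texts, is_dict):
    """Number ordinary strings first, then dictionary words."""
    ordinary = [t for t, d in zip(texts, is_dict) if not d]
    dictionary = [t for t, d in zip(texts, is_dict) if d]
    if len(ordinary) < 128:
        # Dictionary IDs would start below 128 and clash with plain ASCII.
        raise ValueError("need at least 128 ordinary strings")
    if len(ordinary) + len(dictionary) > 256:
        raise ValueError("string IDs must fit in one byte")
    return {text: i for i, text in enumerate(ordinary + dictionary)}
```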

The key to parsing the TSV file is a simple function that takes the input line and splits it up into an array (C$()), using BASIC's INSTR() and MID$() functions. A CSV parser would have been much more complex (partly because Fireworkz doesn't produce very nice CSV output!), but because I'm not using tabs in any text, TSV is very easy.
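For anyone who wants to see the shape of that splitter, here is a Python sketch that mirrors the search-and-slice approach: str.find stands in for INSTR() and slicing stands in for MID$(). The name split_tsv is made up, and in everyday Python you would just call line.split("\t").

```python
def split_tsv(line):
    """Split one TSV row into fields by repeatedly finding the next tab."""
    fields = []
    start = 0
    while True:
        tab = line.find("\t", start)       # plays the role of INSTR()
        if tab == -1:
            fields.append(line[start:])    # last field: the rest of the line
            break
        fields.append(line[start:tab])     # plays the role of MID$()
        start = tab + 1
    return fields
```

Empty columns come out as empty strings, so column positions stay aligned with the spreadsheet.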

From there it's just a case of knowing what to do with each column - e.g. searching for the right text ID when reading a text column, converting a number column to a numeric variable, and then translating it to the right format for storage. E.g. the monetary worth of an item is stored using a variant of scientific notation, allowing values larger than £100000 and smaller than £1 to be stored in only 2 bytes. The flags column requires further processing - INSTR() and MID$() are used again to separate the comma-separated list of flag names, and then each name is checked individually to work out what numerical value the flag has.
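To show how a scientific-notation style value can cover both pennies and six-figure sums in 2 bytes, here is a Python sketch. The format is entirely hypothetical (a 12-bit mantissa and a 4-bit power-of-ten exponent, counting in pence); the post does not say which variant dungen actually uses, and the names encode_worth and decode_worth are made up.

```python
def encode_worth(pence):
    """Pack a non-negative amount in pence as mantissa * 10**exponent."""
    mantissa, exponent = pence, 0
    while mantissa > 0x0FFF:       # mantissa must fit in 12 bits
        mantissa //= 10            # drop the lowest digit (loses precision)
        exponent += 1
    if exponent > 15:              # exponent must fit in 4 bits
        raise ValueError("value too large to encode")
    word = (exponent << 12) | mantissa
    return bytes([word & 0xFF, word >> 8])   # low byte first

def decode_worth(data):
    """Recover the (possibly rounded-down) amount in pence."""
    word = data[0] | (data[1] << 8)
    return (word & 0x0FFF) * 10 ** (word >> 12)
```

Small amounts survive exactly, while large ones keep only their leading digits: 12345 pence comes back as 12340.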

Text compression is also rather simple - for each dictionary word it scans through each source string, replacing the dictionary word with the string ID whenever it is found.
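A Python sketch of that substitution pass, with the loops in the same order as described: each dictionary word in turn, then each source string. The name compress_strings and the dict-based inputs are assumptions for illustration, and only the ordinary strings are passed in as sources.

```python
def compress_strings(strings, dictionary):
    """Replace dictionary words in each string with their one-byte IDs.

    `strings` maps string ID -> text; `dictionary` maps word -> ID,
    where every dictionary ID is above 127.
    """
    data = {sid: text.encode("ascii") for sid, text in strings.items()}
    for word, word_id in dictionary.items():
        needle = word.encode("ascii")
        token = bytes([word_id])            # single byte above 127
        for sid in data:
            data[sid] = data[sid].replace(needle, token)
    return data
```

Since the inserted bytes are all above 127, they can never be mistaken for part of a later dictionary word.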

ASCII? I hardly knew her

Just when I thought things were going well, I remembered that the BBC used different ASCII codes to modern machines. And to make matters worse, there are even discrepancies between Mode 7 and the other BBC modes. If I wanted my game to look the same no matter what machine it was running on (a 32K BBC in Mode 7, a 64K BBC in Mode 1, a modern RISC OS machine, or some other foreign machine using a port of BBC BASIC, most notably Brandy), then I'd have to make my selections wisely. Armed with my RISC OS 3 PRMs, I was able to determine the following discrepancies:

Hex code	Teletext appearance	BBC appearance	Latin-1 appearance
27	'	´	'
5B	Left arrow	[	[
5C	½	\	\
5D	Right arrow	]	]
5E	Up arrow	^	^
5F	–	_	_
60	£	£	`
7B	¼	{	{
7C	||	¦	|
7D	¾	}	}
7E	÷	~	~
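A quick way to check a candidate glyph against these discrepancies is a small lookup like the Python sketch below. It assumes the table covers every printable code that differs between the three character sets; the names DISCREPANT_CODES and is_portable are made up.

```python
# Codes whose appearance differs between Teletext (Mode 7), the other
# BBC modes, and Latin-1. 0x7B to 0x7E are the ASCII codes of { | } ~.
DISCREPANT_CODES = {
    0x27, 0x5B, 0x5C, 0x5D, 0x5E, 0x5F, 0x60,
    0x7B, 0x7C, 0x7D, 0x7E,
}

def is_portable(ch):
    """True if ch is printable ASCII and not one of the discrepant codes."""
    code = ord(ch)
    return 0x20 <= code <= 0x7E and code not in DISCREPANT_CODES
```

So '@' and '#' pass the check, while '[' and '`' do not.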

Obviously some of these are rather minor differences (e.g. ' and ´), while others are fairly major (e.g. your money turning into worthless boulders if you try to play on anything other than a BBC). And others could be downright confusing - would a Mode 7 player notice the difference between '-' and the slightly wider '–'?

But nevertheless, I managed to find a decent set of character codes to use, without straying too far from the typical roguelike character allocation. And the tile-identify feature should help any confused players discover what something is when viewed through their system's character set.