Initially, I was considering using DATA statements to store the static data. BASIC II supports RESTORE by line number, so by storing the data for each item on a separate line I could have fast access to it without requiring the entire DATA segment to be read or extracted and stored elsewhere.
Unfortunately some simple maths shows that DATA statements would take too much space. To store a one-digit number, two bytes would be required: one for the number, and one for the comma separating it from the next field. A raw CSV export of the spreadsheet I was designing everything in came to 14K (a bit of an unfair test, but indicative of how many items I'm hoping to include in the game). So, I set about designing a better solution...
This is a discussion about that 'better solution', a program which I've dubbed 'dungen', and which by the time you read this will hopefully be complete. As input it will take a TSV file, and as output it will produce a data file for use by the game. Dungen will be written in BASIC, so anyone able to run the game should be able to rebuild the data for it.
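As a rough check on the DATA space argument above, here is a small sketch. It is written in Python purely for illustration (it is not part of dungen), the sample values are made up, and it ignores the per-line overhead that a BASIC program line would add on top.

```python
# Hypothetical sample row: six small numeric fields for one item.
fields = [3, 7, 1, 0, 9, 4]

# As DATA text, each field costs its digits plus a separating comma
# (the last field on the line has no trailing comma).
data_text = ",".join(str(v) for v in fields)
text_bytes = len(data_text)

# In a binary data file, each value below 256 fits in a single byte.
packed = bytes(fields)
packed_bytes = len(packed)

print(data_text, text_bytes, packed_bytes)
```

For this row the text form needs 11 bytes against 6 for the packed form, and the gap widens as soon as fields need more than one digit.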
The data hierarchy
To understand how and why the static data is stored the way it is, you need to understand what the static data is. The static data started out as a spreadsheet, listing each type of item I wanted in the game, and some of their properties. I then assigned each type to a class, producing the following class list:
- Weapon
- Ammo
- Armour
- Book
- Magazine
- Food/drink
- Item
- Monster
- Cash
- Dungeon
- Software
Dungen is the tool which removes all the redundant data from my spreadsheet and produces a nice, compact representation.
dundat
As stated, dundat is the file produced by dungen. It is split into four main sections:
- Type data. This has an index at the start, containing the offset of the data for each type. This index is required because not all the records are the same size. For each type, upwards of 11 bytes of data are stored: the first 10 bytes have a fixed interpretation (containing the class ID, the name of the type, its monetary value, mass, etc.), and the remaining bytes have differing uses depending on what class the type belongs to.
- Class data. This is a small block of data; each record is only 2 bytes long. One byte is the ASCII code to use to represent items of that class, and the other byte is the string ID of the name of the class.
- Tile data. This is another small block, with fixed size records. Each record is 3 bytes, consisting of the string ID naming the tile, the ASCII code to use when displaying the tile on screen, and a set of bitflags. The bitflags indicate whether the tile can be walked on, can be seen through, opened/closed (for windows and doors), and locked/unlocked. If I get the chance to add colour support, I've also got a couple of bits reserved to store the colour of the tile.
- Text strings. This contains all the names for all the types, classes, and tiles. In the data above, strings are all referenced by a one-byte string ID; the text string block starts with an index, mapping those IDs to offsets in the dundat file. The strings are compressed using a simple method; any ASCII code above 127 represents another string ID, which should be inserted into the string as it is read out. Currently the text is compressed via the use of a manual 'dictionary', as implementing a proper compression scheme would take too much time (or increase the code size if more advanced decompression code is required).
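The string expansion rule is easy to get wrong on paper, so here is a minimal sketch of it. It is Python rather than the game's BASIC, and it models the string block as a simple ID-to-bytes table instead of the real index-plus-offsets layout; the function name `expand` and the sample strings are hypothetical.

```python
def expand(string_id, table):
    """Return the plain text for string_id, expanding dictionary references."""
    out = []
    for code in table[string_id]:
        if code > 127:
            # Codes above 127 name another string to splice in at this point.
            out.append(expand(code, table))
        else:
            out.append(chr(code))
    return "".join(out)


# Hypothetical table: ID 128 is a dictionary word shared by two item names.
table = {
    0: b"Short " + bytes([128]),
    1: b"Long " + bytes([128]),
    128: b"sword",
}

print(expand(0, table))
print(expand(1, table))
```

Because a referenced string is expanded with the same routine, a dictionary word could itself contain further references, although nothing in the format described above requires that.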
dungen
So, we know roughly what the dundat file contains. But how does dungen generate that file from the human-readable TSV input?
The program runs through three stages:
- It quickly scans the input file to build up the list of uncompressed text and dictionary words in TEXT$(). Dictionary words are also flagged in ISDICT(). It then assigns text IDs to those strings, exiting with an error if there are fewer than 128 ordinary text IDs (text IDs must come before dictionary IDs, and dictionary IDs must be above 127).
- It then scans the input file again, this time converting each row of data and storing it in temporary buffers.
- Finally it performs text compression and writes all the data to the dundat file.
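The ID rule in the first stage can be sketched as follows. This is an illustration in Python rather than dungen's BASIC; `assign_ids` and its two arguments are hypothetical names that loosely mirror TEXT$() and ISDICT(). It assumes IDs are handed out sequentially, ordinary strings first, which is why having fewer than 128 ordinary strings would leave the first dictionary word with an ID below 128.

```python
def assign_ids(strings, is_dict):
    """Give ordinary strings the low IDs and dictionary words the IDs after them."""
    ordinary = [s for s, d in zip(strings, is_dict) if not d]
    words = [s for s, d in zip(strings, is_dict) if d]
    if len(ordinary) < 128:
        # The first dictionary word would get an ID of 127 or lower.
        raise ValueError("need at least 128 ordinary strings")
    if len(ordinary) + len(words) > 256:
        # String IDs are stored in a single byte.
        raise ValueError("too many strings for one-byte IDs")
    return {s: i for i, s in enumerate(ordinary + words)}


# Hypothetical input: 128 ordinary names followed by one dictionary word.
strings = ["item%d" % i for i in range(128)] + ["sword"]
is_dict = [False] * 128 + [True]
ids = assign_ids(strings, is_dict)
print(ids["item0"], ids["sword"])
```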
From there it's just a case of knowing what to do with each column: searching for the right text ID when reading a text column, or converting a number column to a numeric variable and then translating it to the right format for storage. For example, the monetary worth of an item is stored using a variant of scientific notation, allowing values larger than £100000 and smaller than £1 to be stored in only 2 bytes. The flags column requires further processing: INSTR() and MID$() are used again to separate the comma-separated list of flag names, and then each name is checked individually to work out what numerical value the flag has.
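To make those two column conversions concrete, here is a sketch in Python (not dungen's BASIC). Everything specific in it is an assumption made for illustration: the flag names and bit values are invented, and the money layout (one mantissa byte, one exponent byte with a bias of 128) is just one plausible way of fitting a scientific-notation value into 2 bytes, not the format dungen actually uses.

```python
from decimal import Decimal

# Hypothetical flag names and bit values.
FLAG_BITS = {"FRAGILE": 1, "TWOHANDED": 2, "MAGIC": 4}

# Assumed bias so that negative exponents fit in an unsigned byte.
EXP_BIAS = 128


def parse_flags(column):
    """Turn a comma-separated list of flag names into a single number."""
    value = 0
    for name in column.split(","):
        name = name.strip()
        if name:
            value |= FLAG_BITS[name]
    return value


def encode_money(value):
    """Pack a non-negative amount as mantissa * 10**exponent in 2 bytes."""
    _sign, digits, exponent = Decimal(value).normalize().as_tuple()
    mantissa = int("".join(str(d) for d in digits))
    while mantissa > 255:
        # Drop the least significant digit until the mantissa fits in a byte.
        mantissa //= 10
        exponent += 1
    return bytes([mantissa, exponent + EXP_BIAS])


def decode_money(raw):
    """Reverse encode_money, returning a Decimal."""
    return Decimal(raw[0]).scaleb(raw[1] - EXP_BIAS)


print(parse_flags("FRAGILE,MAGIC"))
print(encode_money("150000"), encode_money("0.25"))
```

With this assumed layout, £150000 and £0.25 both round-trip exactly, while a value such as 123456 loses its low digits and comes back as 123000. That loss of precision is the price of squeezing such a wide range into 2 bytes.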
Text compression is also rather simple - for each dictionary word it scans through each source string, replacing the dictionary word with the string ID whenever it is found.
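A minimal sketch of that replacement loop, again in Python for illustration only. The `compress` function and the sample dictionary are hypothetical; the dictionary is modelled as a mapping from string ID (above 127) to the word it stands for.

```python
def compress(text, dictionary):
    """Replace each dictionary word in text with its one-byte string ID."""
    data = text.encode("ascii")
    for string_id, word in dictionary.items():
        data = data.replace(word, bytes([string_id]))
    return data


# Hypothetical dictionary with two words.
dictionary = {128: b"sword", 129: b"Leather "}

print(compress("Short sword", dictionary))
print(compress("Leather armour", dictionary))
```

In this sketch the words are tried in dictionary order, so if one word is a substring of another, the order of the dictionary affects the result.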
ASCII? I hardly knew her
Just when I thought things were going well, I remembered that the BBC's character set differs from that of modern machines. And to make matters worse, there are even discrepancies between Mode 7 and the other BBC modes. If I wanted my game to look the same no matter what machine it was running on (a 32K BBC in Mode 7, a 64K BBC in Mode 1, a modern RISC OS machine, or some other foreign machine using a port of BBC BASIC, most notably Brandy), then I'd have to make my selections wisely. Armed with my RISC OS 3 PRMs, I was able to determine the following discrepancies:
Hex code | Teletext appearance | BBC appearance | Latin-1 appearance |
---|---|---|---|
27 | ' | ´ | ' |
5B | Left arrow | [ | [ |
5C | ½ | \ | \ |
5D | Right arrow | ] | ] |
5E | Up arrow | ^ | ^ |
5F | – | _ | _ |
60 | £ | £ | ` |
7B | ¼ | { | { |
7C | ‖ | ¦ | \| |
7D | ¾ | } | } |
7E | ÷ | ~ | ~ |
Obviously some of these are rather minor differences (e.g. ' and ´), while others are fairly major (e.g. your money turning into worthless boulders if you try to play on anything other than a BBC). And others could be downright confusing - would a Mode 7 player notice the difference between '-' and the slightly wider '–'?
Nevertheless, I managed to find a decent set of character codes to use, without straying too far from the typical roguelike character allocation. And the tile-identify feature should help any confused players discover what something is when viewed through their system's character set.
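One way to sanity-check a glyph allocation is to test each chosen character against the codes where the three character sets disagree. The Python sketch below is only an illustration: the set contains the codes for ' [ \ ] ^ _ ` { | } ~ (the braces, bar and tilde sit at 7B to 7E in ASCII), and the sample glyph choices are hypothetical, not the game's actual allocation.

```python
# Codes whose appearance differs between teletext, the other BBC modes
# and Latin-1.
DIFFERING_CODES = {0x27, 0x5B, 0x5C, 0x5D, 0x5E, 0x5F, 0x60,
                   0x7B, 0x7C, 0x7D, 0x7E}


def looks_the_same(ch):
    """True if ch is printable ASCII and not one of the differing codes."""
    code = ord(ch)
    return 32 <= code <= 126 and code not in DIFFERING_CODES


# Hypothetical glyph choices for a few things on the map.
glyphs = {"player": "@", "floor": ".", "door": "+", "cash": "`"}
risky = sorted(name for name, ch in glyphs.items() if not looks_the_same(ch))
print(risky)
```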