What have we got?
RISC OS is documented in many different places these days.
Originally it was documented through the Programmer's Reference Manuals ('PRMs'), a physical set of manuals in multiple volumes covering the major areas of the system. Over the years these have been augmented by updates in the form of volume 5/5a, application notes, specifications for some of the 'new' components that were never integrated into centralised documentation, and information that was noted by people and submitted to the StrongHelp manuals.
The PRMs were released as part of the 'Techie CD', which included a hypertext application so that the manuals could be browsed on RISC OS. The content was the same as the physical books, with the same page numbers. Later, PDF and HTML versions 1 were produced by RISCOS Ltd, which were more accessible, but had their own problems 2.
The application notes and various specifications came out of Acorn, and included detailed documentation covering usage notes and APIs for 'newer' features. These 'newer' features include such things as the URI/URL fetcher stack, nested window manager, redraw manager, and miscellaneous updates that followed on.
The StrongHelp manuals for SWIs and the like were originally written by Guttorm Vik, and maintained by myself for a few years, before being passed on to other people - they were never official. However, they included many of the fixes and notes about oddities within the PRM and the different implementations within various RISC OS versions. The StrongHelp manuals were the primary place for developers to refer to APIs whilst writing code, with reference to the PRMs when more detail was required. The main 'SWIs' StrongHelp manual contains links to the pages of the original form of the PRMs to aid this process of escalating to more detailed information.
There is also a Wiki on the RISC OS Open site, which is very similar to the StrongHelp documentation in its detail and degree of authority.
So we have...
- Original PRMs, in printed, Techie CD, PDF and HTML forms.
- Random specifications produced officially and unofficially.
- Random user-sourced cheat-sheet documentation like StrongHelp (and later the RISC OS Open Wiki).
What did I want?
Back when I was doing RISC OS work in the dim past, after the RISCOS Ltd days, I felt that this situation was not ideal and needed to be addressed. So I tried to work out what the goal was, and what was problematic.
The goal...
- Produce documentation that's viewable on RISC OS and other systems.
- Manage the source as a living document that can be edited by developers, on RISC OS (or other) systems.
- Provide a documentation system that's easy to work with, so that people without experience can produce documentation.
The documentation as delivered by Pace was in FrameMaker format, as that's how Acorn's publications department had managed it. FrameMaker is an Adobe product which was only available for Windows. As supplied, the documentation was in the binary file format, which meant that at any time only one person could work on a chapter 20. In any case, we only had a single license for the product, so only one person could work on any part of the documentation at a time.
FrameMaker did have textual formats, but these were not easy to work with, and were not easy to bring into the automation system we were hoping to use. FrameMaker could output to PDF, HTML and PostScript, which would have been useful if we had wanted to produce physical manuals in the future. It's a good system if you happen to be producing manuals for large projects through a central publications group (as Acorn had).
However, we wanted to be able to work on the documentation ourselves, and to be able to farm out documentation to others outside the company if they had an interest (or as contract work). We hoped that this would encourage others who had complaints about the documentation to contribute.
The problems with this...
- FrameMaker had a steep learning curve for those who hadn't used it before.
- FrameMaker wasn't available on RISC OS, and it was important that what was being produced could be worked on within RISC OS.
- The inability for multiple people to work on it at once was quite limiting.
- The binary file format meant that changes couldn't easily be managed in the source control system in the same way as the rest of the sources.
- Expecting third parties to purchase the expensive product 18, just to update documentation, was not reasonable.
- The HTML that FrameMaker produced was not good in RISC OS browsers - hence the transformation work to make it usable.
- Using RISC OS native products like Ovation or Impression would have shown a preference for one product and been prejudicial to the others.
This is where I began to look at an XML format with transformations through XSLT.
DocBook had become a thing that people were using, so I had a look at it and ... it's quite generic, a little weird in places 6, and has a steep learning curve. But because it was generic, it didn't have any specialised layout for the style that I wanted for the PRM - that would have to be built on top. You'd essentially be writing DocBook elements styled to look a particular way.
There is documentation for DocBook, and you'd be able to produce something useful with it, but you'd still not be capturing the essence of the RISC OS documentation in a useful way. I'd become convinced that the benefit of using XML was not so much that I could use it to produce readable documentation, but that it captured structured information about how the APIs worked. That is, not only could you write the XML and get out a page of readable documentation at the end, but the XML that you wrote could be cross referenced, indexed and even converted to other formats easily.
I'd been tinkering with other things at the same time. Thoughts about how modules might become self-documenting were going around my head - if a module could report information about its SWI interfaces in a structured manner, then in-system lookup of APIs, and debug information for tools became possible.
Toolbox already had its own format for describing the toolbox SWI interfaces, which were then built into C/ObjAsm files that were used in toolboxlib. If those definitions were built into the module, we could always extract them from a running system. And things like the toolbox test applications (!ResTest) could use toolbox event information from the module to present messages about what was going on.
The BTS system 7 recorded every piece of information about modules when an abort happened. You could later show that information on another system. If the modules that were running reported their APIs' parameters, then it would be possible to report the meaning of those recorded registers within a diagnostic dump.
Creating PRM-in-XML
All of these things were pointing to having structured information about interfaces and data structures 8.
This produced the idea of using an XML representation to hold the structured information in the PRM: separate tags to describe SWIs, vectors, events and so on, with information about register usage structured within them. This seemed like a workable solution to me, and I set about trying to represent the content of a few pages of the PRMs in an XML format.
Initially it was a bit HTML-like, but some elements were culled - most of the elements which were related to formatting were replaced by semantic elements 21. The only formatting elements which remain are for stress within the prose, line breaks, and for super-script and sub-script, as these are often needed to explain more maths-heavy sections.
As I was working on it, I created the XSLT - the transformation that turns it into HTML - so that it could render the elements using tables. That meant that it worked in the browsers of the time. The transformation is important to make it look presentable, but the point of the XML format is that if you want it to look different, or you want to extract just certain information from the files, you can do so with whatever XML tools you have. And there are a lot of tools for doing that.
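To give a flavour of how that transformation works, here's a minimal sketch - not the released stylesheet, just an illustration using the entry and register-use elements that appear in the AMPlayer example later in this article - of templates that render the entry registers as an HTML table:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Each register description becomes one row of the table -->
  <xsl:template match="register-use">
    <tr>
      <td>R<xsl:value-of select="@number"/></td>
      <td><xsl:apply-templates/></td>
    </tr>
  </xsl:template>

  <!-- The registers on entry are gathered into a single table -->
  <xsl:template match="entry">
    <h3>On entry</h3>
    <table>
      <xsl:apply-templates select="register-use"/>
    </table>
  </xsl:template>

</xsl:stylesheet>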
There are actually a few transformations that were working, to varying degrees:
- There's the HTML transformation I've mentioned, which used tables and inline formatting.
- There's a simple extraction of *Command usage, so you can get simple information that might go into *Help.
- There's a C header file generation that creates constants from the APIs in the file.
- There's an Impression DDF generator, which really didn't work very well.
- There's a rudimentary StrongHelp format generator, which worked reasonably but wouldn't be suitable for the quick-reference use that you would have in StrongHelp.
- There's a specialised HTML transformation that uses frames to link content in a different way. It didn't work very well.
Although these are the transformations that exist, more can be created with a little effort.
What does it look like?
Here's the definition of a SWI for AMPlayer in the PRM-in-XML format:
<swi-definition name="AMPlayer_Pause"
number="52E02"
description="Pauses playback"
irqs="undefined"
fiqs="enabled"
processor-mode="SVC"
re-entrant="no">
<entry>
<register-use number="0">flags :
<bitfield-table>
<bit number="0" name="Resume">
Resumes playback.</bit>
<bit number="1-30">Reserved, must be 0.</bit>
<bit number="31">R8 contains the instance handle to which this call
should be directed.</bit>
</bitfield-table>
</register-use>
<register-use number="8">if bit 31 of R0 set:<br />
instance handle to direct at, or 0 for
the base</register-use>
</entry>
<use>
<p>This SWI is used to pause or resume playback. When paused, decoding to
the output buffer continues, but at a much reduced rate. There is no sound
output.</p>
<p>Pause mode may also be cancelled by stopping. If
<reference type="swi" name="AMPlayer_Stop" /> is used to
cut to the next file, or if a different file is started, pause mode will
continue to be in effect, freezing the new file at the start of the file.
This can be used to ensure that playback starts at the instant of calling
AMPlayer_Pause (as opposed to calling
<reference type="swi" name="AMPlayer_Play" />, which can have a delay while
opening the file etc).</p>
</use>
<related>
<reference type="command" name="AMPause" />
<reference type="swi" name="AMPlayer_Play" />
<reference type="swi" name="AMPlayer_Stop" />
</related>
</swi-definition>
- The definition declares, as attributes of the swi-definition element, what the SWI is and some of the common fields that are described in the PRMs. Because these are attributes with defined meaning, they can be extracted - the contents generation uses this information to create references to the SWIs (see the sketch after this list).
- Then there is a description of the registers on entry to the SWI call (entry). Each of these is described through a register-use, which describes one or more registers. The flags in R0 are described in a bitfield-table which explains what each bit means.
- There could be a description of the registers on exit from the SWI (exit), but this SWI does not have any, so it is omitted.
- There's a longer section of prose which describes what the SWI does and how it is used (use). This includes cross-references to other SWIs or sections as necessary.
- Finally, there's a section for related interfaces, which includes a collection of references to other sections within the documentation.
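To illustrate why having those attributes matters, here's another minimal sketch (again, not one of the released transformations) that extracts a one-line index entry - name and number - from every SWI definition in a file, ignoring the prose entirely:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- One line per SWI, for example: AMPlayer_Pause (SWI &52E02) -->
  <xsl:template match="swi-definition">
    <xsl:value-of select="@name"/>
    <xsl:text> (SWI &amp;</xsl:text>
    <xsl:value-of select="@number"/>
    <xsl:text>)&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress all other text content -->
  <xsl:template match="text()"/>
</xsl:stylesheet>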
What did converting the PRMs involve?
So having decided on a format that can represent the information contained within the PRM, the next step is to turn the chapters of the PRMs into that format. However, it's useful to prove that it will work first by starting to use it for real things. Using it for real APIs and structures meant that it got 'battle hardened': the error messages improved, and redundant parts were stripped or made more useful. Then real conversions began.
And that's a lot of work. It's not a process I completed, but it's one that's largely mechanical. Take the HTML PRMs, pass them through a Perl script which turns the patterns of usage into skeletons of structured elements, and then chip away at the many bits that it got wrong until you have a document in the right format. It's tedious and it takes time to do, but the results are XML files that describe the same content as the PRMs. It's important at this stage not to start introducing differences. Ideally you can visually skim two copies of the documents - old and new - and see that, apart from styling, the content is the same.
This was a long running process that I embarked on, and got some help from Andrew Hill in creating some of the documents. Take a chapter, convert it, review in comparison to the original and fix. Repeat. Then go and do something else because it's a bit mind numbing 10.
There are bound to be mistakes, but the point is to get it into a format that's easier to manage, and has greater flexibility. Each chapter of the PRMs can be handled in (mostly) isolation. There will always be cross references, but these can be given general targets and we can be reasonably sure that a link checker will be able to verify these later. Because new documentation is all being written in this format, you're not falling behind and having to convert things repeatedly.
When the original PRMs were written, my understanding is that the original developers of the OS had written notes about the components to explain the interfaces and interactions. I believe that there were also some interviews done by the documentation group to understand the intent of the parts of the system 9. So there was a point of divergence when the understanding of the components was transferred from the developers to the documentation department.
This transfer was necessary because the large, and quite skilled, job of writing documentation is different to the day-to-day work of developers. Generally, developers don't write good documentation, and they certainly don't have access to the tools and the knowledge of how to drive those tools well to create a full manual. Or at least they didn't at that time.
However, I was never going to get professionals skilled in documentation and publishing tools, so the PRM-in-XML project, as I named it, was going to have to be 'good enough' with developer-written documentation. That also meant that the tools needed to be pretty simple to use. So I wrote a RISC OS application that could take a given XML file and turn it into HTML - which is about as simple as it gets. It's just invoking xsltproc on the file in a TaskWindow and showing you the output.
Moving the goal posts
The goals have changed slightly, as the needs became more obvious. Instead of purely wanting something that was easy to work with, it should be something that is easy to work into the software development life cycle. That is, as you update a component like a module or tool, the documentation gets updated with it. And the easiest way to do that is to have the documentation sit alongside the component in its source directory.
If the multi-stage build process that builds an application (or the whole OS) includes a set of targets for exporting documentation, much as you export the resources for ResourceFS, then you can keep the documentation in the same system. That's where it is most useful and it can be used to construct a set of manuals from the application or system that you're building. Or even just a set of manuals for a section, if you only process the parts you need.
By keeping the documentation source with the component, you remove the need to transfer knowledge from one place to another at marked points. You can update the documentation as you change the code. Yes, that means that adding a new feature is more than just writing code, but if you believed it was just writing code in the first place you were woefully mistaken.
And it's not that hard to do. Before the PRM-in-XML began to be used in builds of components (and that was really late on), most components just had plain text files which described the functioning of the components. Usually these were structured as an introduction and description of the component and its interfaces, then a description of the SWIs and services that it provided. They were intentionally structured that way because it eased the transition to the PRM-in-XML format, or allowed them to be released directly as they were, since the structure was still a familiar and useful manner of introducing new features.
Retrospective
So, with an extra 15 years of experience, what do I think about it?
I dusted off the PRM-in-XML project last year and started to make it work. The xsltproc tool had become a bit more strict, so there were some operations that were broken which I had to fix. I cleaned up and exported the style sheets and tools as part of the release of resources with my Build service presentation 22 - because it's used within Pyromaniac's build, and in a number of the other builds that I do.
It's even more useful to me now because systems are so much faster - processing a chapter is a sub-second task. And it's still a structured format, so the pages that were converted years ago are still useful, and can be converted to other formats if I want.
Other technologies exist for managing documentation, so is it still the right way to do things? The goals these days, I think, have changed a little...
- Being viewable on other systems is more important now - most users will be using a second device (a tablet or phone, or Windows/Linux/macOS device), so being able to be viewed on such systems is vital.
- Editing in the single format by developers is more important - as the community is more open, it is important to allow anyone to submit documentation changes, and doing so in a similar way to the regular flow of development means that having the documentation alongside the main source is vital.
- Having a community of developers who aren't documentation specialists is a useful commodity for documentation. They can provide better insight into what is useful, and offer better suggestions. And if the format is simple enough that it's relatively obvious what they need to do to work with it, then that is all the better.
- A textual format that is diff-able is even more important for the review process in source control systems such as git, as exposed by GitLab, GitHub, etc. This was important in my original goals but essentially implicit; with the expectation that contributors will want to review the changes that happened and submit their own, it matters even more now.
- The documentation can be worked on by multiple people at once - because it's text, and therefore able to be processed by the standard tools, two people working on a documentation file is very easy to resolve. And because chapters are split between components, the likelihood of conflicts is low-to-non-existent.
- It's all free - there's no cost in using the XML processing tools, and therefore you don't have to buy Impression, Ovation, or FrameMaker to work on the documentation.
- Anyone can do it - the tools and docs are available to everyone, and anyone can process the content into HTML, or into other formats with a little more work.
What about PDF production? Change the HTML stylesheet to use classes 13 and then use CSS media queries to allow the pages to be formatted for printing. Or change the output format to directly output a presentation format like DocBook which is then able to be laid out for printing. This isn't so much a challenge as just another step.
Honestly, though, HTML is more flexible than a PDF most of the time, and PDF as a format to transition to a physical manual is a huge waste of time and money 14. I looked into it for a print version of my Rambles - it would have been about £80 a copy if I remember correctly 15. If the new PRMs were going to be sold in printed format they'd be much more expensive, for content that you could download yourself and browse on your machine, or online. So, what's the point? Some people might prefer that, but not enough to make a printed version of the manual viable. At least that's my opinion - nothing about the conversions precludes you doing it if that's what you want.
EPub is probably more useful, as it would allow for layout on tablets and other mobile devices. And EPub is just HTML 16 with some metadata around it, so that's easy.
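For illustration, that metadata wrapper is XML as well - an EPub package document is little more than a manifest and reading order around the HTML chapters. This is just a sketch with invented file names, not something the current tools generate:
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <!-- Identifier, title and language are the minimum metadata required -->
    <dc:identifier id="uid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Programmer's Reference Manual</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2024-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <!-- The chapters are the same HTML output, packaged as XHTML -->
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="amplayer" href="amplayer.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="amplayer"/>
  </spine>
</package>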
Conclusion
PRM-in-XML offers...
- Structured information for the documentation and APIs.
- The ability to transform into other formats easily - HTML, StrongHelp, PDF, etc.
- A format that requires only a text editor to work with.
- Easy to read and edit format.
- Easy to process format.
- Easy to get community involvement for creation and review.
- Easy to ensure it remains up to date (if it remains with the source).
- Simple to integrate with build systems.
- Interacts well with modern version control systems like git.
- Zero start up cost.
A few of the things that I have released include some documentation in the PRM-in-XML format, because for me it's a good solution.
Any new things that people write could always be updated to use the PRM-in-XML format if they wished. One way of doing that is to just start out with some basic documentation, and then, as people provide feedback on problems or on where things aren't right, it's easy to update.
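As a sketch of what that 'basic documentation' might look like - the swi-definition element and its children follow the AMPlayer example above, but the outer wrapper and the module itself are invented here purely for illustration; the released example files show the exact structure the tools expect:
<?xml version="1.0"?>
<!-- Illustrative skeleton only: the outer elements are placeholders,
     and MyModule_Enable is an invented SWI. -->
<chapter title="MyModule">
  <section title="Introduction">
    <p>MyModule provides a simple example service.</p>
  </section>
  <swi-definition name="MyModule_Enable"
                  number="5A000"
                  description="Enables the service"
                  re-entrant="no">
    <entry>
      <register-use number="0">flags (must be 0)</register-use>
    </entry>
    <use>
      <p>Call this SWI to enable the service. Expand this prose as
      feedback arrives about what isn't clear.</p>
    </use>
  </swi-definition>
</chapter>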
Of course, updating all the RISC OS documentation to this format takes time and effort. But it doesn't have to be done all at once. Taking on small parts of the effort of documenting components breaks it up into nice chunks that are relatively easy to do (if a little dull). That's where having an active community is pretty handy - anyone that's got an interest in improving things can do so.
I'll write more in the future about how the PRM-in-XML can be used, but for now I think this covers what I think should be done to create consistent and community managed documentation - whether it be to replace the PRMs, or for your own software.
Resources
I'm working on releasing these resources on GitHub, but the release from the original Build system talk is available right now and usable. The release contains documentation on how to use the tools and what the format looks like.
- PRM-in-XML stylesheet and tool (124K)
- PRM-in-XML example files (89K)
- Pyromaniac API documentation (48K) - documentation generated from PRM-in-XML sources, including the sources.
- RISC OS LibXML2 port and LibXML2+XMLLint binaries
- RISC OS LibXSLT port and LibXSLT+XSLTProc binaries
- The HTML manuals released by RISCOS Ltd were intended to make the manuals more accessible to people, and update them to a format that could be handled by the browsers of the time. This meant no special symbols, removing the class-based styling and fixing much of the poor output from FrameMaker, so that the manuals were appropriate for use by most RISC OS users. ↩
- At RISCOS Ltd, David Thomas originally did the HTML conversion from FrameMaker. Due to mistakes 17, these were released in an incomplete form with incorrectly formatted content. Recently he's been producing some patches to fix these issues so that they're in a state that he feels more comfortable with. You can find the fixes at https://github.com/dpt/PatchedPRMs. ↩
- 'HTML Sucks Completely' - a tool originating on the Amiga that had been ported well to RISC OS by Sergei Monesi. Its author had a dim view of the direction that HTML was going in at the time. It allowed new macro elements to be defined which could perform conditional transformations of the content, which was very powerful and made building sites very easy. This ease came with a learning curve and having to use a system which was 'novel' to say the least. ↩
- For many years HSC managed the creation of my personal website. Actually it's used in the building of the riscos.online website content - it works just as well on macOS as on Linux and RISC OS 5. ↩
- This is the reason why I wanted to be using standardised, non-platform specific tools - they keep working years later and don't have a running cost in upgrades, or an initial cost when you start using them (well, other than the usual learning curve). ↩
- A little weird is fine, sometimes. ↩
- The BTS system is what reports the full backtrace from the first Kernel call all the way to the thing that aborted, through the DiagnosticDump module. ↩
- In the very dim past of (I think) 1998, I had been playing with updating Zap to understand module workspace blocks through structured data definitions. That would allow you to look at a given module's workspace and have symbols and more readable information about the module's operation. This didn't come to anything, but the idea of having this information available to you to aid in debug wasn't new. ↩
- Actually, that's just a feeling I get from the manner in which some parts are documented, some of the drafts of the PRM that I've seen, and how I think it worked back then - maybe it was all from specifications and documents in the drawing office, but I doubt it. ↩
- Mind numbing but it gave a really good feeling when documents were finished, or at least partially finished, enough to slot them into the index. You could see a new, structured, and maintainable, set of manuals coming out. ↩
- StrongHelp is a great viewer, but with structured information about APIs, rather than formatted API information for presentation, you could produce a much more featureful browser. ↩
- All modern editors have means for integrating intelligence into what you are editing - suggestions for completions at the most basic, but also documentation hints for APIs. ↩
- This should be done anyhow; I've just not got around to it. It's not a small job, but it's also not that hard - it just takes time to get the XSLT conversion right and produce the right stylesheets. ↩
- And trees. ↩
- It was a long time ago so I can't remember exactly. I think I was looking at maybe 10% of the pages being colour, and the size of the content meant that it probably would have had to be in multiple volumes just because of the binding needs once you get above a certain number of pages. That is only worse in the case of the PRMs because, despite not having colour (although the style guide does have colour), there are a lot more pages. ↩
- Styled HTML, strictly. Using tables in EPub is problematic because the device's wrapping of tables horizontally and vertically within a variable sized display is going to produce a very compromised output. Tables can't be avoided, but styling them suitably means that the device is able to give a better approximation. ↩
- The original HTML conversion came from David Thomas, I packaged it up with other things that were going on the CD and then sent it off... I'm pretty sure that's where the mistakes crept in - I think I sent off the wrong collection of documents and not the final version. Mea culpa. Oh well. ↩
- Modern versions of FrameMaker are available for subscription at about £30/month for an individual user 19. Back in the 2000s we bought it as a non-subscription application. I don't remember how much it cost but I'm certain it was not cheap. ↩
- Although it's a monthly cost, you have to commit to 12 months, so a cost of £360 for a year. ↩
- If multiple people wanted to work on it, they would have their own copies and one of them would have to manually copy the changes from the other versions into a master document. This would be tedious and error prone. ↩
- The idea behind having semantic elements is that you describe what you're trying to express, rather than how you want it to look. Modern HTML has elements like <section> which indicate that a block is intended to group a section of information, where earlier versions would have merely been able to say that it was a divided block with div. Semantic markup is easier to read raw, and easier to extract information from automatically. If the content you produce uses semantic markup then changing the style that it's presented in is a lot easier. ↩
- https://pyromaniac.riscos.online/ is the main site, and the released files are in the Technologies section. ↩