Making Finding Aids for the WWW

...or Adventures in extracting HTML from Microsoft Access

2. Why use databases?

The techniques that I describe below use Microsoft Access databases to create HTML finding aids. But, of course, your finding aids or collection data might not currently be in a database. Perhaps your finding aids exist in word-processing format, or maybe just as typescript. What can you do? If you have typescript finding aids, then naturally the first step is to use a scanner and optical character recognition (OCR) software to get them into digital form. OCR software is improving all the time, and will now preserve much of your formatting as well as the text.

Once you have your finding aids in a word-processing format, there are two ways of getting them into HTML:

  1. Export the word-processing file to HTML directly (adding and editing markup as required)
  2. Export the word-processing file to a database and then generate HTML out of the database

Option 1 is probably faster - you could use Microsoft Word's Internet Assistant (built into Word 97), or a shareware program called RTFtoHTML, to convert your documents, although you will probably need to do a lot of manual editing to get something looking halfway decent. I'm focusing on Option 2, simply because I believe it is more flexible and more efficient in the long run. Yes, it will take more work to get all your listings into a database, but once they're in there, you will have much greater control over your data, and many more options for creating output. By defining appropriate tables and fields, you can give your data a meaningful structure. If constructing your own database seems too daunting a task, then you can use an archival management system like ASAP's ADS or Tabularium. Both ADS and Tabularium are Microsoft Access-based applications, so the techniques I describe here will work with them.
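To make "appropriate tables and fields" a little more concrete, here is a minimal sketch of what such a structure might look like. It is written in Python with the built-in sqlite3 module purely as a stand-in for an Access database. The table names, field names, and sample rows are all invented for illustration; they are not taken from ADS, Tabularium, or any real collection.

```python
import sqlite3

# Hypothetical schema: table and field names are illustrative only,
# not those of ADS, Tabularium, or any particular Access database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE series (
        series_id   INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        date_range  TEXT
    );
    CREATE TABLE item (
        item_id     INTEGER PRIMARY KEY,
        series_id   INTEGER NOT NULL REFERENCES series(series_id),
        item_no     TEXT NOT NULL,
        title       TEXT NOT NULL,
        date_range  TEXT
    );
""")

# Made-up sample data, entered out of order on purpose.
conn.execute("INSERT INTO series VALUES (1, 'Correspondence', '1920-1945')")
conn.executemany(
    "INSERT INTO item (series_id, item_no, title, date_range) VALUES (?, ?, ?, ?)",
    [
        (1, "1/2", "Letters received", "1931-1945"),
        (1, "1/1", "Letters sent", "1920-1930"),
    ],
)


def items_in_series(connection, series_id):
    """Return (item_no, title, date_range) rows for one series, sorted by item number."""
    cursor = connection.execute(
        "SELECT item_no, title, date_range FROM item "
        "WHERE series_id = ? ORDER BY item_no",
        (series_id,),
    )
    return cursor.fetchall()


rows = items_in_series(conn, 1)
for item_no, title, date_range in rows:
    print(item_no, title, date_range)
```

Because each piece of information sits in its own field, the database can sort, select, and regroup the listing on request, which a word-processing file cannot do.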

There are other means of imposing structure on your finding aids, such as marking them up in SGML. The Encoded Archival Description (EAD) is an SGML-based standard for the marking up of archival finding aids. So encoding your word-processing documents according to the EAD standard will give you some of the advantages of having them in a database. However, it's easier to go from a database to an EAD document than the other way around, so again I think the database wins in terms of flexibility. Whether you mark up your word-processing document as HTML or SGML, you face the same problem: you are mixing the data and the markup all together. If, in the future, you want to change the style, meet new standards, or take advantage of developments in HTML, you'll have to edit the whole thing again. By exporting into a database, you are keeping the data and the markup separate. This means that it will be easy to generate new versions as you require them, or even generate alternative versions of the one finding aid for different purposes (e.g. one for a standalone PC, another for WWW server access; one EAD-compliant, another in HTML). (I will be adding a section on EAD/XML soon!)
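The point about keeping data and markup separate can be illustrated with a short sketch. The Python below is only an illustration, not the Access method described later. It takes one set of made-up item records (standing in for the result of a database query) and produces two outputs: a plain HTML table and a simplified EAD-style fragment. The element names in the second function are borrowed from EAD, but the fragment is deliberately cut down and is not a valid EAD document.

```python
from html import escape
from xml.sax.saxutils import escape as xml_escape

# Made-up sample rows, standing in for the result of a database query.
ITEMS = [
    {"item_no": "1/1", "title": "Letters sent", "dates": "1920-1930"},
    {"item_no": "1/2", "title": "Minutes & agendas", "dates": "1931-1945"},
]


def render_html(items):
    """Render the items as a plain HTML table, one row per item."""
    lines = ["<table>"]
    for item in items:
        lines.append(
            "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
                escape(item["item_no"]),
                escape(item["title"]),
                escape(item["dates"]),
            )
        )
    lines.append("</table>")
    return "\n".join(lines)


def render_ead_like(items):
    """Render the same items as a simplified, EAD-style XML fragment."""
    lines = ["<dsc>"]
    for item in items:
        lines.append(
            '<c01 level="item"><did>'
            "<unitid>{}</unitid><unittitle>{}</unittitle><unitdate>{}</unitdate>"
            "</did></c01>".format(
                xml_escape(item["item_no"]),
                xml_escape(item["title"]),
                xml_escape(item["dates"]),
            )
        )
    lines.append("</dsc>")
    return "\n".join(lines)


# The data is untouched; only the rendering function changes.
html_page = render_html(ITEMS)
ead_fragment = render_ead_like(ITEMS)
print(html_page)
print(ead_fragment)
```

If the house style or the target standard changes, only the rendering function needs rewriting. The records themselves are never edited.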

Well, maybe I haven't convinced you, but I hope at least you're now thinking about using a database. In fact, you might be thinking, 'If I get my listings into a database, why do I have to generate HTML files at all? Why can't I just provide access to my database through the WWW?' The answer is 'You can!'. There are now a lot of software gateways to connect WWW servers and databases, and this may be something you want to explore. However, you will need to be aware that you may be limiting the ability of WWW search engines to 'find' your data, and if you use proprietary software, rather than an open standard like Z39.50, it may be difficult for your data to be integrated with that of other institutions - which sort of defeats the purpose of putting it on the WWW to begin with. (I'll also be adding a section on searching!)

In any case, direct WWW-database access will not be a practical solution for many institutions. It involves a much higher level of technical expertise and administration, special software, and probably a dedicated WWW server. The methods for generating HTML finding aids I describe here will allow any institution to create a WWW finding aid that can then be mounted on any WWW server, or even distributed on a floppy disc or CD-ROM for use on computers that are not connected to the Internet. All you need to view such finding aids is a standard WWW browser!
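What makes a set of generated pages portable in this way is that they are ordinary files that link to each other with relative addresses. The Python sketch below is an assumption-laden illustration of that idea, not the Access technique covered in later sections. The sample data, the file names, and the write_finding_aid function are all hypothetical.

```python
import tempfile
from html import escape
from pathlib import Path

# Made-up sample data: series titles mapped to their item titles.
SERIES = {
    "Correspondence": ["Letters sent", "Letters received"],
    "Photographs": ["Portraits"],
}


def write_finding_aid(series, out_dir):
    """Write one page per series plus a contents page; return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    links = []
    for number, (title, items) in enumerate(series.items(), start=1):
        filename = "series{}.html".format(number)
        rows = "\n".join("<li>{}</li>".format(escape(i)) for i in items)
        page = (
            "<html><head><title>{t}</title></head><body>\n"
            "<h1>{t}</h1>\n<ul>\n{rows}\n</ul>\n"
            '<p><a href="index.html">Contents</a></p>\n'
            "</body></html>\n"
        ).format(t=escape(title), rows=rows)
        path = out_dir / filename
        path.write_text(page, encoding="utf-8")
        written.append(path)
        # Relative link: no server name or absolute path, so the pages
        # work equally well from a WWW server, a floppy disc, or a CD-ROM.
        links.append('<li><a href="{}">{}</a></li>'.format(filename, escape(title)))
    index = (
        "<html><head><title>Contents</title></head><body>\n"
        "<h1>Contents</h1>\n<ul>\n{links}\n</ul>\n"
        "</body></html>\n"
    ).format(links="\n".join(links))
    index_path = out_dir / "index.html"
    index_path.write_text(index, encoding="utf-8")
    written.append(index_path)
    return written


output_dir = tempfile.mkdtemp()
paths = write_finding_aid(SERIES, output_dir)
print([p.name for p in paths])
```

Copying the whole output directory to another server or disc keeps every link intact, because no page refers to anything outside the directory.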

1. Introduction
2. Why use databases?
3. Exporting your files to databases
4. Producing HTML from databases
4.1 Export to rtf method
4.2 The module method
4.2.1 Contents page
4.2.2 Item listings
4.2.3 The results


Created by Tim Sherratt (Tim.Sherratt@asap.unimelb.edu.au)
Last modified: 16 March 1998