The UDC MRF Database Development and Design – a historical review
by P. D. Strachan and F. M. H. Oomes, UDC Consortium, The Hague
In March 1990 the Task Force on UDC System Development, which the UDC Management Board had established in October 1988, submitted its final report. Its scope had been defined as follows:
Given this rather broad and flexible definition, it may come as a surprise, and it demonstrates the pragmatic approach of the Task Force, that its first and primary recommendation read:
This database should be completed within two years and provide the individual publishers of UDC versions with the material for compiling their editions. It should also be the basis for revising the schedules and the starting point for "Extensions and Corrections to the UDC". To achieve this, a consortium of interested institutions should be set up. The Management Board accepted the recommendations of the Task Force and, as of 1 January 1992, FID transferred the intellectual ownership of the UDC, and the responsibility for its maintenance and development, to the UDC Consortium.
By that time, work had already started on the machine-readable version of the schedules, meanwhile named the Master Reference File (MRF). For practical reasons, the International Medium Edition (IME), published by BSI Standards, was selected as the basis for this database. Its text was already available in digital form and, allowing for the necessary updating, its size corresponded fairly well to the 60,000 notations recommended by the Task Force. This basis has been modified and supplemented by:
The database was compiled using UNESCO's Micro CDS/ISIS version 3.0. The database design and the formats ("worksheets") for printing, editing and display were developed by Gerhard Riesthuis, senior lecturer at the University of Amsterdam, and David Strachan. Drs. Riesthuis also wrote the programs for converting the various sources to CDS/ISIS files.
The design had to accommodate a highly complicated process, because the database had to be compiled from various sources: first, the conversion to CDS/ISIS of the files of the International Medium Edition; second, the material from Extensions and Corrections published before Series 13, which had to be keyed in and converted; and third, the conversion of the existing text files of Extensions and Corrections 13/14 up to EC14:3, published in October 1992.
In addition, there were separate lists of cancellations and modifications to the IME, including replacements for cancellations, and finally lists of the selections made from the more recent medium editions.
For practical reasons, the entire UDC was divided into ca. 30 subject sections, each of which was completed and edited separately. For almost every section, separate databases had to be built for the material from each applicable source. The diagram below is a very simplified representation of the compilation process; bear in mind that this process had to be repeated for each of the ca. 30 sections.
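The layering of sources within a section can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each source has been reduced to an ordered list of (notation, caption) changes, with a caption of None marking a cancellation, and that the sources are applied oldest first (for example IME, then the older Extensions and Corrections, then EC 13/14). The function names are illustrative only and do not describe the conversion programs actually used.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical change record: (notation, caption); caption None = cancellation.
Change = Tuple[str, Optional[str]]


def compile_section(sources: List[List[Change]]) -> Dict[str, str]:
    """Apply sources oldest-first: later captions override, None cancels."""
    schedule: Dict[str, str] = {}
    for source in sources:
        for notation, caption in source:
            if caption is None:
                schedule.pop(notation, None)  # notation cancelled
            else:
                schedule[notation] = caption  # new or modified entry
    return schedule


def compile_all(sections: Dict[str, List[List[Change]]]) -> Dict[str, Dict[str, str]]:
    """Repeat the same compilation separately for every subject section."""
    return {name: compile_section(srcs) for name, srcs in sections.items()}
```

Applying the sources in chronological order matters: a notation cancelled in one source and reinstated in a later one ends up present, which a simple "merge everything, then remove cancellations" approach would get wrong.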
The final editing stage included, among other things, checking references, translating entries for which only a German (or in some cases French) text was available, and expanding the IME in line with more recent medium editions.
The field structure of the CDS/ISIS database had to account for the various components of an entry in the UDC schedules, for the individual sources of the database content, and for the different types of intervention during editing. During compilation, some of the original fields turned out to be superfluous or impractical, so they were never used. Some fields were declared only because they allowed selections that supported the editorial work or produced printed output in a particular format. The design went through several versions; the final one is listed, almost in full, below. Some fields were divided into subfields that the CDS/ISIS software could access individually. For many fields, a table of codes was defined for filling them in.
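Since the field list is not reproduced here, the following Python sketch uses hypothetical field content. It shows how a field divided into subfields might be split, assuming the usual CDS/ISIS convention of a caret followed by a one-character subfield code (^a, ^b, ...), and how values could be checked against a code table. The subfield codes and the code table in the example are invented for illustration.

```python
from typing import Dict, Iterable, List, Set

DELIMITER = "^"  # CDS/ISIS subfield delimiter, followed by a one-character code


def split_subfields(field: str) -> Dict[str, List[str]]:
    """Split a field value into subfields keyed by their one-character code.

    Any text before the first delimiter is kept under the empty key "".
    Repeated subfield codes collect their values in order.
    """
    result: Dict[str, List[str]] = {}
    head, *rest = field.split(DELIMITER)
    if head:
        result.setdefault("", []).append(head)
    for part in rest:
        if part:  # skip a stray delimiter that has no code after it
            result.setdefault(part[0], []).append(part[1:])
    return result


def invalid_codes(values: Iterable[str], code_table: Set[str]) -> List[str]:
    """Return the values that do not appear in the given code table."""
    return [v for v in values if v not in code_table]
```

For example, a hypothetical field `531^aGeneral mechanics^sIME` splits into the untagged notation, a caption in subfield `a`, and a source code in subfield `s` that can then be validated.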
UNESCO's CDS/ISIS software proved to be a very reliable, if often rather rigid and unfriendly, tool for compiling, editing and managing the databases. Its main advantage turned out to be its flexibility in the output and display of the database content, and in converting a database from one format to another.
However, it would be very useful if it offered facilities for automated checking of references, which at present can only be done by hand using printed lists.
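A simple form of such an automated reference check could look like the following Python sketch. It assumes, hypothetically, that the references in each record have already been extracted into a list of target notations; it then reports every reference whose target does not exist in the database, the kind of error that currently has to be found in printed lists.

```python
from typing import Dict, List, Tuple


def dangling_references(records: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs whose target notation is not in the database.

    `records` maps each notation to the notations it refers to (hypothetical
    pre-extracted form; parsing references out of the captions is not shown).
    """
    existing = set(records)
    problems: List[Tuple[str, str]] = []
    for notation in sorted(records):
        for target in records[notation]:
            if target not in existing:
                problems.append((notation, target))
    return problems
```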
A minor but awkward problem is that Micro CDS/ISIS uses the apostrophe to delimit the search argument, so UDC auxiliaries containing an apostrophe interfere with searching. For the time being, this problem has been circumvented by replacing the apostrophe with an inverted comma; for printed output, this has to be reversed with a search-and-replace in the word-processing software.
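The workaround amounts to a reversible character substitution, sketched below in Python. The text does not say which character served as the inverted comma, so the sketch assumes the backquote purely for illustration; the substitution is only safe if that stand-in never occurs anywhere else in the data.

```python
# Assumption: the actual stand-in character is not stated in the text;
# the backquote is used here only for illustration.
STAND_IN = "`"
APOSTROPHE = "'"


def for_storage(text: str) -> str:
    """Replace apostrophes so they cannot cut short a CDS/ISIS search argument."""
    return text.replace(APOSTROPHE, STAND_IN)


def for_print(text: str) -> str:
    """Restore apostrophes before printing (the search-and-replace step)."""
    return text.replace(STAND_IN, APOSTROPHE)
```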
In the last stage of creating the Master Reference File, the separate databases for the various sections had to be merged into a single database file.
Before this could be done, a new database design had to be developed. Some fields in the former design were no longer functional, and others had to be added to record revisions and revision history. A copy of the original database files has, of course, been kept for future reference and to keep track of the sources.
The new design, which is still more or less experimental (as noted above, converting a database to a new format is relatively easy in Micro CDS/ISIS), has the following field structure:
In its last update, in 2007, the database contained a total of ca. 67,770 records, each corresponding to a UDC class number. The data file (.MST) occupies ca. 15,000 KB. The distribution of the records by section and subject field is as follows (updated July 2008):
As a database, the MRF will certainly be useful in automated cataloguing and information retrieval systems. The UDC Consortium has therefore decided to make it available as such to interested libraries and documentation institutes.
At this stage of its development, the MRF can be delivered as a Micro CDS/ISIS database, as a file in the ISO 2709 interchange format, and as a plain ASCII text file that can be loaded into a word processor. Other forms of distribution, including special user applications for accessing the MRF, will be developed if they meet explicit user needs and the necessary funding is available.
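For users of the ISO 2709 file, the record layout can be read with a few lines of code. The Python sketch below follows the general ISO 2709 structure: a 24-character leader whose positions 12-16 hold the base address of data and whose positions 20-22 give the widths of the directory-entry parts; a directory of tag/length/start entries; and data fields ended by the field terminator (hex 1E), with the record ended by hex 1D. It assumes single-byte ASCII text and no indicators, and includes a small builder so the parser can be exercised. An actual CDS/ISIS export may differ in details (for instance by splitting the file into fixed-length lines), and the tags in the example are hypothetical.

```python
from typing import List, Tuple

FIELD_TERMINATOR = "\x1e"   # ends the directory and every data field
RECORD_TERMINATOR = "\x1d"  # ends the record


def build_iso2709(fields: List[Tuple[str, str]]) -> str:
    """Build one ISO 2709 record (entry map 4500, no indicators, ASCII assumed)."""
    directory, data = "", ""
    for tag, value in fields:
        chunk = value + FIELD_TERMINATOR
        directory += f"{tag}{len(chunk):04d}{len(data):05d}"
        data += chunk
    directory += FIELD_TERMINATOR
    base = 24 + len(directory)            # base address of data
    total = base + len(data) + 1          # +1 for the record terminator
    leader = f"{total:05d}" + " " * 5 + "00" + f"{base:05d}" + " " * 3 + "4500"
    return leader + directory + data + RECORD_TERMINATOR


def parse_iso2709(record: str) -> List[Tuple[str, str]]:
    """Return the (tag, data) pairs of one ISO 2709 record."""
    leader = record[:24]
    base = int(leader[12:17])             # base address of data
    len_len = int(leader[20])             # digits in each field-length part
    start_len = int(leader[21])           # digits in each start-position part
    impl_len = int(leader[22])            # implementation-defined part
    entry_len = 3 + len_len + start_len + impl_len
    directory = record[24:base - 1]       # omit the directory's terminator
    fields: List[Tuple[str, str]] = []
    for i in range(0, len(directory), entry_len):
        entry = directory[i:i + entry_len]
        length = int(entry[3:3 + len_len])
        start = int(entry[3 + len_len:3 + len_len + start_len])
        value = record[base + start:base + start + length]
        fields.append((entry[:3], value.rstrip(FIELD_TERMINATOR)))
    return fields
```

The parser reads the entry widths from the leader instead of hard-coding them, which is the point of ISO 2709's "entry map": the same code then handles files whose directories use different field widths.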
In this regard, the UDC Consortium would be very grateful for suggestions from users.
The MRF database will be the core material for all editions of the UDC, in whatever language, size and form, and on whatever medium. It is also the starting point for all future revisions and enhancements of the UDC. The UDC Consortium intends to approach the revision process more systematically and to shorten the revision procedures.
The needs and wishes of UDC users will remain the most important source for revision, for the UDC should be their tool and not an end in itself. User clubs might serve as the channel for their comments and suggestions.
However, users should realize that maintaining and enhancing the UDC requires not only the involvement and enthusiasm of users, but also money to commission revision work and to pay for staffing and equipment.
It would be disappointing for all those involved in the creation of the MRF if this project could not be continued and further developed, if this first step were not followed by many others.