When does a cornucopia of digital information become too much of a good thing?
Consider the situation of a person who wants to buy a copy of Moby Dick.
If this person resorts to an online listing of available books, like Books in Print
or one of the online booksellers' databases, and searches for the novel, what will
they find? A title search on "Moby Dick" returns 128 items, each
with its own ISBN, listed as available in Books in Print (the online bookseller
databases are based on the data in Books in Print or its equivalent as assembled
by the large book wholesalers). Yet only 65 of these are editions
of the novel Moby Dick by Herman Melville. The other 63 items listed include
33 critical works, 15 study guides, 12 adaptations (including juvenile and dramatic),
2 art books, and 1 audio book.
The problem is that a title search in these systems draws in all sorts of
peripheral material, since it is based only on the presence of the words "Moby"
and "Dick" somewhere in the title or subtitle. If, on the other hand, the would-be
reader happens to know the ISBN of a particular edition of Moby Dick, that number
would screen out all the irrelevant material. But, since the ISBN represents only
one published manifestation of the novel, the ISBN search will miss the other 64
editions of Moby Dick, including other formats of the same edition by the
same publisher.
What the reader needs is a search method which will be accurate enough to
screen out the irrelevant material, and yet inclusive enough to draw in all the
different published manifestations of the work. What is needed, in effect, is a
"super-ISBN" which would identify a given literary work rather than just
one published version of the work. Such an identifier is currently under development
by the International Organization for Standardization (ISO), Technical Committee 46, Sub
Committee 9, Working Group 3, and has been dubbed the "International Standard Textual
Work Code," or ISTC. We believe the ISTC may prove to be of major importance to
authors, booksellers, publishers, and libraries once it is finalized and implemented.
As its name implies, the ISTC is an identifier for textual works, as distinct
from item identifiers like the ISBN. The ISTC is meant to identify the abstract
literary work as originally created by its author, independent of any particular
edition, format, or publisher. All versions of a work would bear the same ISTC,
allowing them to be easily linked to one another in databases.
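
The difference between the three kinds of lookup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalog records, the ISBN strings, and the work identifiers ("W-MOBY" and so on) are invented placeholders, not real ISBNs or ISTCs, and the field name work_id is hypothetical, since the format of the ISTC has not been finalized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    isbn: str     # placeholder value, not a real ISBN
    title: str
    work_id: str  # hypothetical work-level identifier standing in for an ISTC


# A tiny invented catalog: three editions of the novel plus two peripheral items.
CATALOG = [
    Item("isbn-0001", "Moby Dick", "W-MOBY"),
    Item("isbn-0002", "Moby Dick (paperback)", "W-MOBY"),
    Item("isbn-0003", "Moby-Dick; or, The Whale", "W-MOBY"),
    Item("isbn-0004", "A Study Guide to Moby Dick", "W-GUIDE"),
    Item("isbn-0005", "Critical Essays on Moby Dick", "W-ESSAYS"),
]


def title_search(words):
    """Keyword search: every word must appear somewhere in the title."""
    wanted = [w.lower() for w in words]
    return [i for i in CATALOG if all(w in i.title.lower() for w in wanted)]


def isbn_search(isbn):
    """Item-level search: matches exactly one published manifestation."""
    return [i for i in CATALOG if i.isbn == isbn]


def work_search(work_id):
    """Work-level search: matches every manifestation of one work."""
    return [i for i in CATALOG if i.work_id == work_id]
```

In this toy catalog the title search returns all five items, the ISBN search returns one, and the work-level search returns exactly the three editions of the novel, which is the behavior the reader in our opening scenario wants.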
Another important feature of the ISTC concept is the linking of related
works by their ISTCs. Not only will different manifestations of the same work
bear the same ISTC; works that are modifications or derivations of other works will
also carry, as part of their ISTC descriptive data, the ISTC of the work to which
they are related.
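
As a rough illustration of this kind of linking, the sketch below stores, for each work, the identifier of the work it derives from, and then collects a work together with its derivations. The identifiers, titles, and function names are all hypothetical; how the standard will actually record these relationships is still under discussion.

```python
# Each entry: work identifier -> (title, identifier of the source work or None).
# All identifiers and titles here are invented for illustration.
WORKS = {
    "W-MOBY": ("Moby Dick", None),
    "W-JUVENILE": ("Moby Dick, retold for young readers", "W-MOBY"),
    "W-PLAY": ("Moby Dick: a stage adaptation", "W-MOBY"),
    "W-OTHER": ("An unrelated novel", None),
}


def derived_from(source_id):
    """Identifiers of works that name source_id as their source, sorted."""
    return sorted(wid for wid, (_, src) in WORKS.items() if src == source_id)


def work_family(source_id):
    """The source work followed by its direct derivations."""
    return [source_id] + derived_from(source_id)
```

A database built this way could show a reader the novel alongside its adaptations while still keeping them distinct from editions of the novel itself.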
The ISO imprimatur will give this identifier the status of an approved
International Standard, much like the ISBN. The ISO process is a lengthy one,
measured in years, and involves the balloting and approval of a series of drafts
by the 22 countries which are participating members of ISO Technical Committee 46,
Sub Committee 9. The actual Working Group which is developing the standard includes
participants from diverse constituencies in many countries, including libraries,
publishers, retailers, the information industry, standards bodies, rights societies,
and author organizations (including the Authors Registry). The diversity of interests
represented by this group makes the formulation of this standard a special challenge.
In a sense, the developments that led to the current ISTC project started in
the mid-1990s with the introduction of the Common Information System by CISAC.
CISAC (Confédération Internationale des Sociétés d'Auteurs
et Compositeurs, or International Confederation of Authors' and Composers' Societies)
is an umbrella organization which counts as its members most of the world's rights
organizations, including such giants as ASCAP and BMI (the Authors Registry was
recently admitted as a member organization of CISAC). CISAC is an organization of
organizations, and its function is to facilitate the efforts of the many national
rights societies which maintain reciprocal agreements with one another. These rights
societies manage a large portion of the intellectual property licensing and payment
transactions that take place worldwide, particularly in the realm of music.
With the increasing importance of digital media and the Internet in intellectual
property distribution, CISAC saw the need for coordination of the information standards
and systems used by its member societies. This coordination aims at the development of
an international infrastructure for clearing rights and transferring payments, which has
been dubbed the Common Information System.
One of the most potentially influential components of the Common Information
System is a series of identifiers for works in various media. Two examples of these
work identifiers already in use are the International Standard Musical Work Code (ISWC) and
the International Standard Audiovisual Number (ISAN). The ISTC is another of these work
identifiers. The common thread here is the intention to apply an identifier to a
work, that is, to an abstraction rather than a physical manifestation.
From the standpoint of rights societies, such a number would be invaluable.
Once implemented in a rights database or publisher's royalty system, the ISTC would
enable all manifestations of a particular work to be linked to the appropriate
rightsholder(s). Versions of a work with varying titles would be collocated
within databases by the ISTC. The need for such an identifier has also been
keenly felt for some time by book retailers. The large databases assembled
by online booksellers are sorely in need of better organizing principles which
will allow readers to move beyond keyword searching, as in our opening scenario.
The ISTC is potentially a godsend to these retailers. For example, a reader at a
bookstore website who wants to find all currently-available versions of a literary
classic would be able to do so by performing a search using the ISTC.
Another aspect of the functionality planned for the ISTC is the accessibility
of the descriptive data attached to the ISTC. When an ISTC is assigned to a work, a
set of required descriptive data (or "metadata," the current jargon for data
describing an information object) will be submitted to the ISTC Agency by the
registrant of the work. This metadata will include essential information such
as title and title variants, names of author(s) and contributors, and relationship
to other works (for example, part/whole relationships or derivative
relationships). Ideally, this metadata will be easily available to anyone who has
the ISTC in hand. In fact, the hope is that the ISTC will be resolvable in the
Internet environment, meaning that it will link to a persistent source of its
metadata, enabling hypertext access from the ISTC to the metadata.
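
A minimal sketch of what such a record and its resolution might look like follows. The class name, field names, relationship labels, and identifiers are assumptions made for illustration only; the required metadata set is still being formulated, and a real resolver would consult the Agency's persistent source rather than an in-memory table.

```python
from dataclasses import dataclass, field


@dataclass
class WorkMetadata:
    """Hypothetical descriptive record attached to a work identifier."""
    title: str
    title_variants: list = field(default_factory=list)
    creators: list = field(default_factory=list)
    # Pairs of (relationship type, identifier of the related work).
    relations: list = field(default_factory=list)


# Stand-in for the Agency's registry; identifiers are invented placeholders.
REGISTRY = {
    "W-MOBY": WorkMetadata(
        title="Moby Dick",
        title_variants=["Moby-Dick; or, The Whale"],
        creators=["Herman Melville"],
    ),
    "W-CH1": WorkMetadata(
        title="Loomings",
        creators=["Herman Melville"],
        relations=[("part-of", "W-MOBY")],
    ),
}


def resolve(identifier):
    """Return the metadata record for an identifier, or None if unregistered."""
    return REGISTRY.get(identifier)
```

Here resolve("W-CH1") yields the record for a single chapter, whose part/whole relationship points back to the novel as a whole.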
The design of the number itself is relatively straightforward. Major
decisions include: "dumb" versus "intelligent" numbers, number and type of
digits (relating to the number of objects which must be accommodated in the
system), check digit system, and structure of the number, if any. Formulation
of the metadata required for each ISTC is more challenging. How much data
concerning the work and its creator must minimally be captured, what sources
of the data are acceptable, and how the data should be formulated are all questions
that generate a lot of discussion.
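
For readers unfamiliar with check digits, the sketch below shows how one works, using the ten-digit ISBN as a familiar example. It is not the ISTC scheme, which has yet to be decided. In the ISBN, the first nine digits are weighted 10 down to 2, and the final character is chosen so that the weighted total is divisible by 11, with "X" standing for the value 10. The function names are our own.

```python
def isbn10_check_digit(first_nine: str) -> str:
    """Compute the check character for the first nine digits of an ISBN-10."""
    if len(first_nine) != 9 or not first_nine.isdigit():
        raise ValueError("expected exactly nine digits")
    # Weights run 10, 9, ..., 2 across the nine digits.
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn: str) -> bool:
    """True if the string is a ten-character ISBN with a correct check digit."""
    chars = isbn.replace("-", "").replace(" ", "")
    if len(chars) != 10 or not chars[:9].isdigit():
        return False
    return isbn10_check_digit(chars[:9]) == chars[9].upper()
```

A check digit of this kind lets a system catch most keying errors, such as a single mistyped digit, before a faulty number is ever looked up. It says nothing about the "dumb" versus "intelligent" question, which concerns whether the remaining digits carry meaning of their own.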
Probably the most challenging task this group must perform will be the
design of an Agency system to implement the ISTC, and the selection of an entity
to perform the Agency function. To understand this it may be helpful to think of
the implementation of the ISBN. It was not enough for publishers to adopt
the ISBN and use it universally; an international ISBN Agency also had to
be created, accompanied by nationally-based ISBN registration agencies to dole out
ISBNs and collect the data attached to them. The result of all this
development work is twofold: on the one hand, books carry ISBNs,
which come to reside usefully in bibliographic databases; on the other,
the Books in Print database is assembled from the data collected at the
time of ISBN registration. The ISTC system is likely to have many similarities
with this ISBN system.
Another aspect of the ISTC which differentiates it from other existing
identifiers is the fact that it can theoretically be applied to any textual
work of any extent. That is, ISTCs can be assigned to magazine articles,
newsletters, chapters of books, individual poems, or anything which can
legitimately be considered a work of text. However, while the ISTC can
potentially be applied to every textual object in existence, it
will actually be applied only in those cases where it is worth the trouble
and expense to someone to do so. The requirement for the ISTC system is that
it can potentially accommodate any textual work, not that it represent
all textual works.
The obvious question "How will the system be paid for?" is very much under
discussion. The system must pay for itself, one way or another. One way is to
charge enough for the registration process to pay for the system. However, there
is concern that this might push the price of registration too high, and discourage
universal registration of works. The alternative approach would be for the system
to recover some of its costs, and whatever profit is needed, by using the data collected
to produce a value-added product, on the model of Books in Print produced from ISBN data.
The business model ultimately adopted will probably combine these approaches.
Finally, a word on the schedule we're working with. The first Working Draft of the
Standard is now in existence. Many sections and details still need work. The next
major task of the working group is to complete and publish a Request for Proposal (RFP)
addressed to organizations interested in taking on the role of International ISTC Agency.
This RFP is expected to be published by the beginning of June 2001. The completion
of the Standard and selection of the Agency are planned by early 2002, with
implementation of the Standard beginning in mid-2002.