ISO/IEC JTC 1 / SC 32 / WG 02
Mechanisms for organizing and searching centralized catalogs, directories, inventories, databases, and file systems have been developed over many decades. More recently, mechanisms for searching decentralized networks such as the World Wide Web have also been invented. Unfortunately, virtually all such mechanisms are single-purpose facilities without support for search interoperability. A bibliographic catalog of publications does not interoperate with an organizational directory, and neither interoperates with inventory systems or collections of Web pages.
From a searcher's perspective, finding relevant resources and services involves looking for particular characteristics. For news items, the date of publication is very important, while for maps the place characteristic is critical. In searching electronic mail, the searcher may need to specify to whom the message was sent, but this characteristic is not meaningful in a catalog of publications. Although there are a variety of characteristics employed for locating information, useful search interoperability is achieved in practice only where characteristics are in agreement.
Bibliographic elements such as Title, Author, and Subject are well known, and catalogers are in good agreement on what these characteristics mean. A parallel concept exists for data sets, where the equivalents of bibliographic elements are called "metadata" elements. For the purpose of search interoperability, a File Name metadata element may be semantically equivalent to a Title bibliographic element, and a Principal Investigator may be semantically equivalent to an Author. To enable search interoperability between metadata and bibliographic elements, one must connect each metadata element to its bibliographic equivalent based on their known semantics.
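As a minimal sketch of this idea, assuming a hypothetical in-memory crosswalk table, a search expressed against shared semantics can reach both a bibliographic record and a data-set record. The names `CROSSWALK`, `normalize`, and `search`, and the sample records, are illustrative only; the two data-set pairings are the ones named in the paragraph above.

```python
from typing import Dict, List

# Hypothetical crosswalk: local tag -> shared semantic element.
# Only the File Name/Title and Principal Investigator/Author pairings
# come from the text; the table layout itself is an assumption.
CROSSWALK: Dict[str, str] = {
    "Title": "Title",
    "Author": "Author",
    "File Name": "Title",
    "Principal Investigator": "Author",
}


def normalize(record: Dict[str, str]) -> Dict[str, List[str]]:
    """Re-key a record by shared semantics; tags with no known equivalent are skipped."""
    out: Dict[str, List[str]] = {}
    for tag, value in record.items():
        semantic = CROSSWALK.get(tag)
        if semantic is not None:
            out.setdefault(semantic, []).append(value)
    return out


def search(records: List[Dict[str, str]], element: str, term: str) -> List[Dict[str, str]]:
    """Return records whose values for the shared element contain the term (case-insensitive)."""
    needle = term.lower()
    hits = []
    for rec in records:
        values = normalize(rec).get(element, [])
        if any(needle in v.lower() for v in values):
            hits.append(rec)
    return hits


# Invented sample records: one bibliographic, one data set.
book = {"Title": "Ocean Temperature Atlas", "Author": "R. Lee"}
dataset = {"File Name": "ocean_temperature_1997.dat", "Principal Investigator": "R. Lee"}

by_author = search([book, dataset], "Author", "lee")
by_title = search([book, dataset], "Title", "atlas")
```

Here a single Author search matches both records even though the data set never uses the tag "Author", which is the point of tying tags to known semantics rather than to each other's spelling.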
Interoperability through reference to known semantics is a powerful approach, widely applied: in data dictionaries used with database management systems, in directories based on structures such as X.500, in cataloging rules used throughout libraries and publishing, and in message sets used with Electronic Data Interchange. Because in this approach the actual tag used for a characteristic is irrelevant, catalogers use whatever language and tag is preferred by the primary audience.
Libraries are already deploying catalog interoperability using an agreed semantic base represented in Machine Readable Cataloging (MARC). Many other communities are deploying metadata search interoperability using the Global Information Locator Service (GILS) Profile. GILS and similar profiles share the library semantic base and also provide gateway mechanisms for interoperability with X.500 Directories and with the Electronic Data Interchange community. Another community recently invented the Dublin Core set of information characteristics, which can also interoperate with GILS. Facilitating reference to common semantics is also key to the Resource Description Framework and XML-DATA proposals of the World Wide Web Consortium.
The current GILS Profile uses the ISO 23950 standard to specify, in application service definition and protocol terms, how electronic network searches are expressed and how results are returned. In the GILS Profile, a server is modeled as an interface to sets of "locator records". The search service is defined only at the server side of this client/server interface. It specifies support of Boolean queries using four structures (word, word list, date, URL) and six relations (less than, less than or equal, equal, greater than or equal, greater than, not equal). The GILS Profile is silent with respect to any other client or server behaviors. Clients may be automated processes, gateways, or direct access applications. Servers may employ any manner of data access, whether flat files, databases, network distributed search interfaces, or gateways to other protocols.
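The query model described above can be sketched as a small evaluator over locator records. This is an illustration under stated assumptions, not the ISO 23950 encoding: the class names, the relation keys, the choice of And/Or/AndNot as Boolean operators, and the dictionary form of a record are all hypothetical, and only the word and date structures are handled.

```python
import operator
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict, Union

# The comparison relations listed in the text, keyed by short names (assumed).
RELATIONS: Dict[str, Callable[[Any, Any], bool]] = {
    "lt": operator.lt, "le": operator.le, "eq": operator.eq,
    "ge": operator.ge, "gt": operator.gt, "ne": operator.ne,
}


@dataclass(frozen=True)
class Term:
    element: str   # search access point, e.g. "Title"
    relation: str  # one of the RELATIONS keys
    value: Any     # a word (str) or a date


@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class AndNot:
    left: "Query"
    right: "Query"


Query = Union[Term, And, Or, AndNot]


def term_matches(term: Term, record: Dict[str, Any]) -> bool:
    value = record.get(term.element)
    if value is None:
        return False  # element not populated in this record
    if isinstance(value, str):
        # Word structure: test the word against the element's words.
        found = str(term.value).lower() in value.lower().split()
        if term.relation == "eq":
            return found
        if term.relation == "ne":
            return not found
        raise ValueError("ordering relations are not applied to words in this sketch")
    # Date structure: compare directly with the chosen relation.
    return RELATIONS[term.relation](value, term.value)


def matches(query: Query, record: Dict[str, Any]) -> bool:
    if isinstance(query, Term):
        return term_matches(query, record)
    if isinstance(query, And):
        return matches(query.left, record) and matches(query.right, record)
    if isinstance(query, Or):
        return matches(query.left, record) or matches(query.right, record)
    if isinstance(query, AndNot):
        return matches(query.left, record) and not matches(query.right, record)
    raise TypeError(f"unsupported query node: {query!r}")


# Invented sample records.
records = [
    {"Title": "Water quality report", "Date of Last Modification": date(1997, 5, 1)},
    {"Title": "Air quality report", "Date of Last Modification": date(1995, 3, 9)},
]
query = And(
    Term("Title", "eq", "quality"),
    Term("Date of Last Modification", "ge", date(1996, 1, 1)),
)
hits = [r for r in records if matches(query, r)]
```

Because the evaluator sees only records and queries, the same interface could sit in front of flat files, a database, or a gateway, which mirrors the Profile's silence about how servers store their data.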
Queries on a GILS-compliant server are applied to a few well-known search access points common to bibliographic and other structured metadata: Title, Originator, Distributor, Subject Terms, Date of Last Modification, and Record Source, plus a server-unique identifier called Local Control Number. The search access attributes known as "Any" and "Anywhere" enable full-text search, and the GILS Profile also notes over 100 well-known record elements.
Although it models a set of locator records, the GILS Profile does not specify the contents of any particular set of records. A set of records may populate any or none of the available well-known elements, and may also populate any number of locally-defined elements. All well-known elements in GILS have direct one-to-one counterparts in the MARC set of semantics. This provides essential interoperability with accumulated knowledge in the collections of bibliographic resources maintained over the centuries in libraries, museums, and archives worldwide.
Several aspects of the GILS design provide for extensibility. The Geospatial Profile is a superset of the GILS Profile which adds a coordinate (latitude/longitude) structure, a few relations (e.g., overlaps, contained-within), and about 100 more elements. The Scientific and Technical Attribute Set provides many additional elements, as well, and is being used by Chemical Abstracts Service to support searching on chemical conformations. The biological community has used Z39.50 for genetic sequence searching. Also, additional query types can be defined for search situations where other than general purpose Boolean constructs are appropriate.
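To illustrate the kind of relation the Geospatial Profile adds, the following sketch evaluates overlaps and contained-within on latitude/longitude bounding boxes. The `BoundingBox` type, the function names, and the sample coordinates are assumptions made for illustration; the sketch also assumes boxes do not cross the antimeridian and treats touching edges as overlapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical coordinate structure, in decimal degrees."""
    west: float
    south: float
    east: float
    north: float


def overlaps(a: BoundingBox, b: BoundingBox) -> bool:
    """True if the two boxes share at least one point."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def contained_within(inner: BoundingBox, outer: BoundingBox) -> bool:
    """True if every point of `inner` lies inside `outer`."""
    return (outer.west <= inner.west and inner.east <= outer.east
            and outer.south <= inner.south and inner.north <= outer.north)


# Invented sample extents.
region = BoundingBox(west=-125.0, south=32.0, east=-114.0, north=42.0)
inside = BoundingBox(west=-120.0, south=35.0, east=-118.0, north=37.0)
straddling = BoundingBox(west=-116.0, south=40.0, east=-110.0, north=45.0)
elsewhere = BoundingBox(west=10.0, south=50.0, east=12.0, north=52.0)
```

A record whose extent is `straddling` would satisfy an overlaps query against `region` but not a contained-within query, which is why a profile needs both relations in addition to the ordering relations used for dates.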
While GILS already represents a powerful design for search interoperability, its further evolution depends on progress in some areas of standardization. As is evident from how readily GILS is applied to protocol gateways, GILS is really a service definition, more properly described in an interface language than in a protocol specification. Work is underway to specify the GILS and Geospatial profiles in OMG IDL (Object Management Group Interface Definition Language). The goal is not only to provide protocol wrappers for LDAP, SQL, and Z39.50, but also to support, in a protocol-independent way, the essential features of networked information discovery and retrieval.
Frustrating progress in many search standardization efforts is the lack of a process for handling multiple overlapping semantic domains. Such domains range from low-level programming constructs, through structured metadata elements, and into linguistic regimes such as thesauri and semantic networks. There have been many efforts over the years to address these problems, and efforts are ongoing still.
Although the current state of understanding and technology may not be adequate to address all of these problem areas, it should be possible to improve the situation within a sufficiently constrained problem space such as GILS. The GILS semantic problem is focused specifically on the search service interface at the server side of a client/server interface. The relevant semantic domains are only those that are commonly used in locating information resources, which is a tiny subset of the full array of possible resource characteristics.
Progress in semantic interoperability for the GILS problem space can begin with just a few widely used sets of semantics. For example, semantic cross-walks already exist among GILS, MARC, Geospatial, and Dublin Core elements. It would be immediately useful to forge consensus on a convention for expressing such element cross-walks in a common machine-processable way, such as the approach taken with Metadata Registries. Immediate progress on GILS at the semantic registry level would be useful for related standards initiatives such as the W3C Resource Description Framework, the Open GIS Consortium, and possibly OMG work on the Trader Service and Query Service.
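One possible shape for a machine-processable crosswalk is a flat list of rows that tie each scheme's element to a hub element, so that any two schemes can be translated through the hub. The sketch below assumes GILS as the hub; the row format and function names are hypothetical, and the three sample rows are illustrative placeholders rather than entries copied from any published crosswalk.

```python
from typing import Dict, List, Optional, Tuple

Element = Tuple[str, str]  # (scheme, element name)

# Each row links an element in some scheme to a hub (GILS) element.
# Illustrative rows only; a real registry would carry the agreed mappings.
CROSSWALK_ROWS: List[Tuple[Element, str]] = [
    (("MARC", "245"), "Title"),
    (("DublinCore", "Title"), "Title"),
    (("DublinCore", "Creator"), "Originator"),
]


def build_index(rows: List[Tuple[Element, str]]):
    """Build lookups in both directions: element -> hub, and hub -> {scheme: element}."""
    to_hub: Dict[Element, str] = {}
    from_hub: Dict[str, Dict[str, str]] = {}
    for (scheme, name), hub in rows:
        to_hub[(scheme, name)] = hub
        from_hub.setdefault(hub, {})[scheme] = name
    return to_hub, from_hub


def translate(element: Element, target_scheme: str, to_hub, from_hub) -> Optional[str]:
    """Translate an element into another scheme via the hub; None if no mapping is registered."""
    scheme, name = element
    hub = name if scheme == "GILS" else to_hub.get(element)
    if hub is None:
        return None
    if target_scheme == "GILS":
        return hub
    return from_hub.get(hub, {}).get(target_scheme)


TO_HUB, FROM_HUB = build_index(CROSSWALK_ROWS)
marc_for_dc_title = translate(("DublinCore", "Title"), "MARC", TO_HUB, FROM_HUB)
```

Routing every translation through one hub keeps the registry at one row per element instead of one row per pair of schemes, at the cost of losing any distinction the hub scheme cannot express.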
In the proposed standards activity, a technical report will examine information technology characteristics of a next generation search interface that would further enable the deployment of a Global Information Locator Service. It is expected that these characteristics will include an emphasis on object-oriented techniques, Metadata Registries, and Interface Definition Language, among other things.
In simplifying mechanisms by which people find information resources, this standards activity would benefit most those interests that exchange information intensively. It is especially pertinent to situations involving the discovery of information, as distinct from querying a database for specific content known to reside in the database. The interest groups likely to have immediate needs include governments, information services, directory services, and industries involved in electronic commerce.
The major factor that may hinder the successful establishment or general application of the standard is the perception that semantic issues are somewhat abstract in comparison to syntactic issues. However, this attitude is changing rapidly as more computer systems implementors face the deep problems of semantic interoperability that libraries have faced for hundreds of years.
Most of the technology bases expected to be used for this activity (e.g., object modeling, interface definition language, protocol specification, query languages) are reasonably stable. Some aspects, such as metadata registries, are still stabilizing but are expected to become stable over the next couple of years.
It is not likely that advances in technology will render the proposed standard outdated over the next few years. In fact, the proposed standard is likely to help engender the future positive development of metadata registries and other aspects of information discovery and retrieval.
Considering the immediate needs of the interests most likely to be affected, the work on this standards activity is fairly urgent. Also, this activity is designed to leverage the current state of readiness among several discrete technology bases. Such an opportunity for synergy is rather rare.
If no standard is established within a reasonable time, the likely effect on the major interests is a continuation of fragmentary approaches to information search and retrieval. The lost opportunity for interoperability would have profound effects on efficiency in the information industry and inestimable negative effects on pure research.
National, state, and international policies, regulations, and laws deal with the socio-political aspects of ensuring the free flow of information in societies. A number of these address GILS specifically. This relationship is likely to have a positive effect on the standards activity, especially with respect to its support by governments.