Description of Statistics Track
With the enormous popularity and usefulness of the Internet, users of statistical data expect fast, automated, and complete access to data and reports. Metadata, the descriptive data and documentation that users need to understand, locate, access, and manipulate statistical data, must be organized and available along with the data. The survey designs, specifications, procedures, and instructions developed within statistical offices are all metadata, so metadata systems also serve the internal, production-oriented purposes of the statistical office.
Metadata is either active or passive, and some metadata is used both ways. Passive metadata consists of human-readable descriptions, such as documents. Active metadata is used to drive systems, and these systems in turn create new metadata that drives other systems. The systems serve three purposes:
1) Automatic creation of metadata
2) Use of some metadata in active mode
3) Enhancement of survey processes through metadata re-use.
The first item is worth highlighting, because capturing metadata is difficult, especially when it has already been created in another form.
Metadata serves to make processes reproducible and data understandable. This is the goal of statistical metadata in the survey life-cycle, and both active and passive metadata are required to reach it. Generally speaking, passive metadata is created for management and for the users of data dissemination systems, while active metadata is created by subject-matter experts as they use tools that support the survey life-cycle.
In statistical agencies, metadata has been created and managed for years, but not in a uniform, organized way. Examples of some kinds of metadata from the survey world are:
1) Data element and domain descriptions
2) Questionnaire documentation, including question wording, skip patterns, response choices, etc.
3) Classification schemes, e.g., the North American Industry Classification System
4) Sampling scheme details
5) Editing specifications, i.e., constraints.
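As a rough illustration of how items 1 and 5 can serve as both passive and active metadata, consider the minimal Python sketch below. All names in it (DataElement, edit_record, employment_status, and the permitted values) are hypothetical and chosen only for illustration; they are not taken from ISO/IEC 11179 or from any statistical office's system. The definition text is the passive part, read by people, while the permitted values are the active part, read by an editing routine.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """Hypothetical data element description with a simple value domain."""

    name: str
    definition: str               # passive metadata: documentation for people
    permitted_values: frozenset   # active metadata: drives the edit check

    def describe(self) -> str:
        """Return the human-readable description (passive use)."""
        return f"{self.name}: {self.definition}"

    def is_valid(self, value) -> bool:
        """Check a reported value against the value domain (active use)."""
        return value in self.permitted_values


def edit_record(record: dict, elements: list[DataElement]) -> list[str]:
    """Return the names of the elements whose values fail the edit."""
    return [
        element.name
        for element in elements
        if not element.is_valid(record.get(element.name))
    ]


# Illustrative element only; the values are assumptions, not a real code list.
employment_status = DataElement(
    name="employment_status",
    definition="Labour force status of the respondent in the reference week.",
    permitted_values=frozenset({"employed", "unemployed", "not_in_labour_force"}),
)

if __name__ == "__main__":
    print(employment_status.describe())
    print(edit_record({"employment_status": "retired"}, [employment_status]))
```

Because the edit routine reads its constraints from the element description rather than from hard-coded rules, changing the value domain changes both the documentation and the edit in one place, which is the kind of metadata re-use described earlier.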
A metadata registry supplies the structure to manage and query metadata in uniform, meaningful ways across time, different surveys, and the survey life-cycle. Many statistical offices around the world are now implementing, or planning to implement, metadata registries. The extent and purposes of these systems vary, but an increasing number of offices are choosing ISO/IEC 11179 as the basis for their designs. The statistics track of the Open Forum will contain talks describing several of these registry efforts. There will be four main foci:
1) Registries, design and development
2) Classification databases, managing classifications, and statistical terminology
3) User requirements
4) Metadata-registry-driven systems.
Program
See the section on the Statistics Track for the detailed agenda.