Agenda as of January 12, 2003

DAY 3 - Wednesday, January 22, 2003

Statistics Track

8:30 - 10:00 AM

8:30 - 8:40 AM - Welcome
Dan Gillman
US Bureau of Labor Statistics

 

8:40 - 9:20 AM - Planning a Metadata Registry at the Bureau of Labor Statistics
Dan Gillman
US Bureau of Labor Statistics

This talk covers some of the plans and possibilities for a metadata registry at the Bureau of Labor Statistics. Experience at the Census Bureau shows that gaining acceptance for metadata management within an organization is made much easier by building applications that use the registry. Selling the concept of metadata to an organization and convincing management to require the use of a registry as a means of instituting it do not work very well. Some potential applications and plans for building a registry will be discussed.

9:20 - 10:00 AM - Metadata Registry Development in Slovenia
Joza Klep
Statistical Office of the Republic of Slovenia

In recent years, the Statistical Office of the Republic of Slovenia (SORS), supported by several technical assistance projects financed through Phare, has achieved substantial success in aligning Slovene statistics with the Acquis Communautaire. This applies in particular to the creation of new data management techniques and the implementation of substantial business and agricultural statistics. Furthermore, SORS introduced a modern, future-oriented data management system based on a statistical data warehouse. Within STAT2000, SORS continues to address the remaining weaknesses of its statistical system. A deficit in management-related activities has been identified. Therefore the sub-activities concerning organization and management have been bundled within "Basic and background activities" with the following content:

(a) Development of clear and binding metadata management and operational rules around "basic metadata entities", a set of basic building blocks (e.g. basic documentation and basic metadata) as defined by Sundgren (2002). This direction might lead towards integrated data/metadata management in the future production of official statistics. The major components of a metadata system have been identified, and the organizational responsibilities and rules will be built around them. The target data flow in SORS has been accepted by top management.

(b) SORS aims to move from a publication-oriented organization to a web-oriented organization using the PC-Axis family. This will allow users to access both the data and the publications with only a browser. A "permanent" web archive of publications (PDF and HTML formats) will be developed. Standard publications will be created with templates supported by PX-Publ. "Data shooting" functionality will be built on the basis of saved queries. Metadata, the descriptive data and documentation that users need to locate, access, understand, and manipulate statistical data, will be organized and made available along with the data. A quality (contents) declaration will be available. Data and publications will be released according to an "Advance release calendar". A new interface that facilitates loading data into the Statistical Data Bank will be developed.

Solutions will be deployed and tested on labor market statistics (wages). The STAT2000 project will be finalized by May 2003.

10:00 - 10:30 AM

Break

 

10:30 - Noon

10:30 - 11:15 AM - Implementation Architecture of the Integrated Metadata Database
Amie Lee
Statistics Canada

The Integrated Metadata Database (IMDB) provides a repository of information about all surveys conducted by Statistics Canada. This paper discusses the implementation architecture of the metadata repository at Statistics Canada from a historical perspective, from the initial design goal, architectural model, and implementation software selected in 1998 to our target implementation of an XML-based model in 2002.

11:15 - Noon - Description of Metadata Registry Efforts in Switzerland
Claude Macchi
Swiss Federal Statistical Office

This paper describes the conception and first implementation steps of integrated metadata management at the Swiss Federal Statistical Office, started in 1999 within the scope of the CODAM (Corporate Data Management) project. Centralized metadata management will replace the working method used until now, in which each statistical activity created, administered, and disseminated its own metadata independently of all other surveys. In the future, metadata supporting the whole statistical production process (from survey design to data dissemination) will be collected in a centralized system and placed at the disposal of every statistical producer and user.

The paper also emphasizes the changes that a partially new organization brings to a long-established workflow. Introducing new tools and a new philosophy of work generally meets with an acceptance problem among the people concerned. To avoid difficulties that could jeopardize the whole project, it is very important to involve the organizational units concerned as fully as possible from the beginning of the work and, at the same time, to implement standards and policies that clearly define the general conditions.

Noon - 2 PM

Lunch

2:00 - 3:30 PM

2:00 - 2:45 PM - The DAIS/NESSTAR Project - Providing a Virtual Metadata Registry for Evidence-based Knowledge on the Web
Bill Bradley
Health Canada

A description of the DAIS/NESSTAR Project.

2:45 - 3:30 PM - Harmonizing ISO 11179 and the DDI - Challenges and Recommendations
Jostein Ryssevik
Nesstar Ltd. and
Norwegian Social Science Data Services (NSD)

This talk describes the use of ISO/IEC 11179 to integrate metadata in XML documents described by the Data Documentation Initiative (DDI) DTD. The work is documented in a paper by the following authors (in alphabetical order): Pasqualino Assini, Bill Bradley, Gordon Colquhoun, Dan Gillman, Tim Glover, Jostein Ryssevik (senior author).

3:30 - 4:00 PM

Break

4:00 - 5:30 PM

4:00 - 4:45 PM - Metadata and Data Warehouses in the Australian Bureau of Statistics: Current Facilities and Future Possibilities
Don Bartley
Australian Bureau of Statistics

4:45 - 5:30 PM - Expanding the use of ISO/IEC 11179 Metadata Registries beyond Data Standardization and Harmonization
Gail Wright
Oracle Corporation

The speaker will present how Oracle Consulting has implemented the ISO/IEC 11179 Metadata Registries standard as a generic software component that can be integrated into a variety of software architectures and business applications to solve a variety of IT and business problems. Specifically, the speaker will show how an ISO/IEC 11179 Metadata Registry can be used:

  • As part of an overall Enterprise Data Architecture
  • In Data Warehousing and Legacy Migration
  • In Data Quality Analysis
  • In Statistical and Survey Business Processes
  • In providing rich content about data to websites and portals
  • And more

The speaker will show real business examples as well as discuss higher-level architectural concepts.

Note: The speaker will give a demonstration. The demonstration may extend past 5:30 PM. Participants from other tracks are invited to come to the room, even after 5:30 PM, if they wish to see the continuing demonstration.

DAY 4 - Thursday, January 23, 2003

Statistics Track

8:30 - 10:00 AM

8:30 - 9:15 AM - Managing and Using Code Sets in Database Systems
John L. McCarthy
Lawrence Berkeley National Laboratory

Code sets have been used for many years in a number of different ways in conjunction with database management systems. Metadata registries and advanced database systems present additional opportunities as well as challenges for using and managing code sets. Since code sets inevitably evolve over time, one major challenge is how to manage such evolution in ways that still let code sets provide stable foreign key constraints for databases as well as authoritative labels for reports and descriptions for documentation. This presentation will explore some of the issues, tradeoffs, and design choices involved in practical management and use of code sets. It will also suggest techniques that can help us manage and use code sets in changing environments.

 

9:15 - 10:00 AM - A Common Vocabulary for Statistical Metadata (SDMX project)
Marco Pellegrino
Eurostat, Statistical Office of the European Union
Statistical Information System Directorate

www.europa.eu.int/comm/eurostat

The Metadata Common Vocabulary project is conducted within the Statistical Data and Metadata Exchange initiative (SDMX) sponsored by BIS, ECB, Eurostat, IMF, OECD and UNSD. The focus of SDMX is the exchange of statistical information (data and metadata) in the field of socio-economic data by taking advantage of existing exchange protocols and dissemination formats, through emerging e-standards such as XML. This entails the development of a set of metadata content standard definitions that will enable the participating institutions to take maximum advantage of existing protocols and standards.

The aim of the Metadata Vocabulary project is to develop a common understanding of metadata items used by statisticians for describing statistical concepts and methodologies. The idea stems from the need to improve the consistency and interoperability of metadata repositories managed by national agencies and international organisations. Agreement on a common set of terms that can be used to describe the collection, processing and dissemination of data would still provide the flexibility for each organisation to manipulate these elements to derive a variety of metadata models and dissemination outputs according to its own needs. An agreed-to list of basic metadata elements would simply provide a common unambiguous vocabulary.

The electronic glossary developed in this context is a part of a more comprehensive glossary of statistical terms that contains a large number of specific definitions relating to concepts and variables commonly used in economic and social statistics.

10:00 - 10:30 AM

Break

10:30 - Noon

10:30 - 11:15 AM - Using an Ontology as Generalized Metadata Schema for Access to Distributed Heterogeneous Data Sources
Eduard Hovy
Information Sciences Institute
University of Southern California

This presentation describes the Energy Data Collection (EDC) project. We merged a large general-purpose ontology and a more focused domain model and embedded the result into a system for supporting user access to over 50,000 tables of information about gasoline price and production, obtained from the Energy Information Administration, the Bureau of Labor Statistics, the Census Bureau, and the California Energy Commission. The source data was provided in a variety of formats, including Microsoft Access spreadsheets, pdf and html pages, and raw text files. An important focus of the work was using the merged ontology / domain model as a generalized metadata schema.

11:15 - Noon - Creating and Using Reference Models
Reinhard Karge
Run Software AG

The paper will discuss the principles of, and practical experiences in, building reference models for statistical metadata based on a terminology model. It draws on practical experience in building reference models within the framework of the METANET work and discusses the practical use of reference models for managing and harmonising statistical metadata. The paper shows methods for mapping proprietary metadata models to a reference model. Methodological problems in building reference models and defining mappings between a reference model and a metadata model are discussed. Finally, it shows how the reference model can be used for translating metadata between different environments and how IT applications based on a reference model can be built.

 

Noon - 2 PM

Lunch

2:00 - 3:30 PM

2:00 - 2:45 PM - Terminology in Statistical Information Integration Tasks: What's the Problem?
Sheila Denn (presenter), Stephanie W. Haas (co-author)
University of North Carolina at Chapel Hill

End users face many challenges when they try to find and use statistical information on federal web sites. Many are caused or exacerbated by the words and terms used in press releases, explanations, and the statistical tables themselves. Encountering an unknown technical statistical term (e.g., "age adjustment"), or a familiar word that actually has a technical meaning (e.g., "full-time employment"), may result in overlooked or misinterpreted information. As the Govstat research project envisions the Statistical Knowledge Network, it is clear that users need help in understanding the terms and concepts they will encounter. In a current project, we are exploring how vocabulary can help or hinder tasks requiring integration of statistical information across agencies and sources. In this presentation, I will provide findings from the first round of data collection (conducted in October 2002) and discuss the implications for vocabulary support tools.

2:45 - 3:30 PM - Metadata and the Statistical Data Collection Process
Sarah Nusser (presenter), Deborah Reed-Margetan (co-author)
Center for Survey Statistics and Methodology
Iowa State University

Metadata can play a key role during many parts of the data collection process. We define metadata as information that provides context to a user about an object to enable more efficient and informed actions. A "user" may be a data analyst, a computer-assisted data collection program, or a mediator in a computing infrastructure. Objects described by metadata may be a question and its response options, a data record, or the field computing environment in a mobile computing setting. We will illustrate how metadata can facilitate the data collection process using a set of examples, including survey interviewing, data flow to the computer-assisted survey instrument, and accommodation of multiple computing devices in the field. For each of these applications, we will discuss the details of the setting with respect to the metadata and outline the information that would need to be included in a registry for managing metadata.

 

3:30 - 4:00 PM

Break

4:00 - 5:30 PM

4:00 - 4:45 PM - Designing Data Collection Instruments with Reusable Metadata for the 2002 Economic Census
William Samples
US Census Bureau

The Census Bureau will offer the option of reporting the 2002 Economic Census by paper questionnaire or by a Computerized Self-Administered Questionnaire (CSAQ) for all 3.5 million respondents. The Economic Census uses over 600 variations of the Economic Census questionnaire, depending on industry and classification. To accomplish this goal with existing resources, we constructed an Economic Metadata Repository (EMR) database of question content metadata to be used by a Generalized Instrument Design System (GIDS) to lay out both paper and electronic variations of these forms, using the same content for each of the sector-specific sets of questions. GIDS is the first user of the EMR, which is intended to make metadata reusable across the Economic Census forms design process, with significant benefits for the data capture and data dissemination processes. Also constructed was a Data Element Registry (DER) that houses data concepts, value domains, and other attributes for the response areas of a form.

The presentation shows the reasoning behind the development of the EMR and GIDS, the review and updating functions of the EMR with an intranet-based user interface (EMR UI), and two of the GIDS layout modules. The DER and its interaction with the EMR and GIDS will also be described.

4:45 - 5:30 PM - Metadata Used in Statistical Information Integration Tasks Considered from the Users’ Perspectives
Carol A. Hert
Syracuse University

Recently, Green and Kent (2002) wrote: "All these manifestations of metadata give a partial view of data at a particular point in the data life cycle. They offer a selection of information relevant for a certain goal, and present it in a format appropriate for meeting this goal in a specific context." In light of this reality, if we wish metadata to be useful to a particular set of users, we must understand the users’ goals and the information (i.e., metadata) necessary to support those goals. In the past several years, my colleagues and I have been investigating how "the person on the street" engages with statistical information and related metadata, with the intent of developing systems that can provide both in easy-to-use environments. In a current project, we are exploring how metadata supports tasks requiring integration of statistical information across agencies and sources. In this presentation, I will provide findings from the first round of data collection (conducted in October 2002) and discuss the implications for metadata system design.

 

DAY 5 - Friday, January 24, 2003

Statistics Track

8:30 - 10:00 AM

8:30 - 9:15 AM - Implementing Metadata: opportunities and barriers
Joanne Lamb
University of Edinburgh

In recent years there have been significant developments in the theoretical advancement of our understanding of statistical metadata, and there have been a number of examples of the building of statistical metadata systems, both in a research environment and in statistical organisations. However, there remain a number of barriers to the adoption of advanced metadata systems in statistical agencies. These are due to three main reasons: the complexity of the ideas and the systems they are describing; the need for long-term investment and commitment to a system with no short-term benefits; and the difficulty for users to see any benefit from their investment.

These barriers are due to a tension between two opposing tendencies. On the one hand, systems tailored to a single organisation tend to produce specific solutions that are costly to implement and maintain. On the other hand, it is not possible to find off-the-shelf solutions that will provide a quick and easy solution for an organisation without the skills or resources to develop its own system.

This presentation will explore these issues based on the experiences from a number of research projects funded by the European Union under the EPROS (European Programme for Research in Official Statistics) initiative.

9:15 - 10:00 AM - Documenting Data Elements at Statistics Canada: An application of ISO 11179
Paul Johanis
Statistics Canada

A description of a systematic, standardized method of documenting the data published by a national statistical office.

10:00 - 10:30 AM

Break

10:30 - Noon

Demonstrations session.

Participants are invited to come to see more extended demonstrations of the systems and capabilities described in previous sessions.

End of Open Forum 2003

 
