Data Policies and the National Spatial Data Infrastructure

Nancy Tosta
Staff Director
Federal Geographic Data Committee
U.S. Geological Survey
590 National Center
Reston, VA 22092
ntosta@usgs.gov


ABSTRACT

The National Spatial Data Infrastructure (NSDI) is conceived to be an umbrella of policies, practices, standards, organizations, and data that contribute to improved availability and use of high quality geospatial data and technologies. Although the effort to develop the NSDI is being led by the Federal Geographic Data Committee (FGDC), guided by existing Federal policies related to data dissemination, liability, and privacy, the NSDI is envisioned to encompass all data producers, managers, and users in the United States, regardless of organizational affiliation. Various programs begun under the NSDI include development of a national geospatial data clearinghouse, creation of a national framework digital geospatial data set, coordination of various themes of geospatial data such as soils, geology, and wetlands, and development and promulgation of standards for geospatial data collection and management. All of these efforts depend on various partnerships, agreements, and policies to share in the production, management, and use of geospatial data.

1. INTRODUCTION

The expanded base of users of geographic information system (GIS) technologies, coupled with ever more complex environmental, economic, and social questions, has led to increasingly distributed production of digital geospatial data (see endnote 2), as well as rapidly growing demands for accurate digital data. As recently as two decades ago, few organizations, public or private, had the ability to produce analog maps or map-based information. In the United States, a small number of Federal agencies and private sector firms were the primary producers and distributors of geospatial information. With the advent of increasingly less expensive computers and telecommunications, the transition from paper products or from doing nothing spatially, to producing and using digital geospatial data has occurred relatively quickly, often in a few years. Today, thousands of organizations at all levels of government and in private and academic sectors are producing various themes of digital geospatial data at different resolutions, levels of accuracy, and geographic extent. Concurrent with the spread of the technological tools that facilitate producing digital geospatial data has been recognition of the possibilities for and value of spatial analyses to study societal issues. Numerous organizations are realizing that many of their data relate to specific locations and can be integrated and understood by location. Agencies are moving from individual efforts to manipulate and overlay paper maps to entire enterprises built on developing, maintaining, sharing, and integrating digital geospatial data.

City and county governments are developing digital geospatial data to provide emergency response or "911" capability, to automate tax mapping processes, and to assess environmental impacts. At State government levels, digital geospatial data are commonly collected and used for natural resource planning and analysis, for environmental or historical site protection, and for economic analyses. Often State agencies rely initially on digital or analog paper maps produced by Federal agencies, which are subsequently reformatted, integrated, updated, and maintained by the State agency for future applications. At the Federal level, traditional map-production activities are being supplanted by digital geospatial data bases, both vector and raster based, and automated cartography tools that can be used to create traditional map products. Additionally, Federal agencies are developing digital geospatial data about natural resources, toxic waste sites and facilities, land ownership patterns, land cover, highways, street addresses, and off-shore resources.

Understanding the implications of the transition from analog to digital activities often happens after the fact, when the organization realizes that its data management approaches and policies are not adequate for the digital environment. Many organizations are slow to appreciate that digital geospatial data must be managed, maintained, accessed, and distributed quite differently than traditional paper products. Most agencies continue to use paper maps to display the results of analyses or for field data collection, but the reliance on paper maps as the primary means to manage spatial knowledge for decision making is shifting. Most recently, changes in networking capabilities, including who and what can be connected electronically, and the ease of moving information across the Internet, have begun to influence digital geospatial data managers and to raise additional questions about data distribution policies.

Many organizations are also recognizing that it is not possible to collect and maintain all of the digital geospatial data that are needed or useful for conducting their business. Most agencies depend on outside sources for at least part of their digital geospatial data. In some instances, these interdependencies involve data-sharing partnerships; in others, they include more traditional purchase, licensing, or contractual relationships. As organizations have moved to build these partnerships, issues of data ownership, fees, access, copyright, quality, privacy, and liability have become common. In the United States, the current digital geospatial data environment is a morass of conflicting policies.

2. POLICY BACKGROUND

The interaction of data producers and users in the United States is unlike interactions in most other parts of the world. Data are treated differently by the U.S. Federal Government than by almost any other institution anywhere. The Federal Government policies about data access and dissemination are outlined in the Office of Management and Budget (OMB) Circular A-130 (OMB, 1993). This circular states that "The free flow of information between the government and the public is essential to a democratic society"; "The nation can benefit from government information disseminated both by Federal agencies and by diverse nonfederal parties, including State and local government agencies, educational and other not-for-profit institutions, and for-profit organizations"; and "Because the public disclosure of government information is essential to the operation of a democracy, management of Federal information resources should protect the public's right of access to government information." The circular outlines specific policies for disseminating information, including the use of electronic means where "such techniques reduce the burden on the public, increase efficiency of government programs, reduce costs to the government and the public, and/or provide better service to the public." Charges for data shall be set at "a level sufficient to recover the cost of dissemination but no higher."

Another policy of the U.S. Federal Government is stated in the 1976 Copyright Act. Section 105 of this act specifies that copyright protection is not available for any work of the U.S. Government that is prepared by an employee or officer of the government as part of that person's specific duties (U.S. Government Works, 1976). This law does not apply to information or works of the Federal Government that are used within or supplied to other countries. At State and local levels of government in the United States and in most other parts of the world, access to geospatial data is controlled by copyrights, licensing, and cost recovery policies.

There are numerous debates between institutions and individuals at different levels of government in the United States and between the United States and other nations about the potential benefits and costs of providing unconstrained access to digital geospatial data. The issues and arguments are complex and seemingly justifiable from all points of view. In a world where the public sector's data collection resources are increasingly limited, one side argues simplistically that the only way to ensure the availability of accurate, current data is to charge the users for the costs of creating and maintaining the data. The other side often argues that governments collect data to carry out the business of government and usually only develop data sets as a byproduct of their management or assessment missions. (This argument is a bit more complex for public agencies whose primary mission is collecting or compiling data.) These data, which have already been "paid for" with taxpayer dollars, should be made accessible for purposes of understanding the decisions of government. When the data are made freely available, more organizations will benefit and ultimately create more data that can be used in all sectors. Issues become more complex when organizations that access government data wish to sell or redistribute those data for a profit. Some individuals argue that governments should be compensated, or that the practice of redistribution should be discouraged through licensing or fee structures.

The complexity of the interactions among institutions, existing policies, technologies, and the economy creates a situation of great uncertainty. Questions about the best approaches to ensuring data availability are essentially unanswerable. The U.S. Government has set policies for digital data access that are based on certain fundamental beliefs about the role of government and citizens' rights. Other institutions develop policies that are based on other factors, such as economics. Given limited resources, how can more data be made available? Should data be freely distributed or should fees be levied? Does controlling access and charging for data improve data quality and availability? Are resources too limited to give data away? As the global information infrastructure increases the availability of digital data, issues of ownership, intellectual property rights, and data fees will become even more complex.

3. THE FEDERAL GEOGRAPHIC DATA COMMITTEE

In many ways, Federal Government policies of freedom of access to data and unconstrained use are creating a more complex institutional environment with increased needs for coordination. Numerous data sets have been made available, copied, enhanced, and repackaged, but they differ in accuracy, completeness, currency, and cost. Users often have difficulty finding and sorting through data to determine what might best meet their needs. Data producers may duplicate each other's efforts, with taxpayers funding data development by different agencies for the same piece of geography. Better means of collecting, organizing, finding, and sharing data are required to maximize investments being made in geospatial data.

In October 1990, the OMB issued a policy document directing the Federal Government to change its approach to geospatial data activities and to provide leadership to the Nation in the coordination of geospatial data (OMB, 1990). Circular A-16: Coordination of Surveying, Mapping, and Related Spatial Data Activities assigns responsibility for certain themes of geospatial data to specific Federal agencies and charges them to "coordinate multi-agency interest, including, the facilitation of exchange of information and transfer of data; the establishment and implementation of standards for quality, content, and transferability; and the coordination of the collection of spatial data to minimize duplication of effort where practicable and economical." The circular also establishes the Federal Geographic Data Committee (FGDC) to provide guidance and oversight to the agencies. The primary objective of the circular is the "eventual development of a national digital spatial information resource, with the involvement of Federal, State, and local governments, and the private sector." This resource has come to be called the National Spatial Data Infrastructure (NSDI).

The FGDC is made up of a steering committee of senior policy representatives from 14 different Federal departments and independent agencies (fig. 1), and of more than three hundred individuals representing a variety of agencies on a series of subcommittees and working groups dealing with various aspects of geospatial data coordination. These include thematic subcommittees (such as those for vegetation, soils, and wetlands) and coordination, standards, framework, archives, and clearinghouse working groups.

Figure 1

The FGDC has begun many activities related to the development and evolution of the NSDI described in the next section. The FGDC has also grappled with the issue of policies that may affect the quality and availability of geospatial data. Obviously, the FGDC and its member agencies are governed by general Federal policies implemented by the OMB. However, the FGDC, in conjunction with the OMB, is also in a position to influence Federal, as well as national, perspectives on data sharing and access. In February 1993, the FGDC adopted a set of statements that described the Federal perspective on data sharing (fig. 2). These statements were modeled after existing Federal policies and were agreed to by all FGDC member agencies.

________________________________________________________________________

POLICY STATEMENTS FOR FEDERAL GEOGRAPHIC DATA SHARING

The overall purpose of these policy statements is to facilitate full and open access to Federal geographic data by Federal users and the general public. They were prepared in consonance with the goals of the Federal Geographic Data Committee, Office of Management and Budget Circular A-16, the Data Management for Global Change Research Policy Statements, and the proposed revision of Office of Management and Budget Circular A-130. As such, they represent the U.S. Government's position on access to Federal geographic data.

Geographic data that are created, collected, processed, disseminated, and stored by the Federal Government are a valuable national resource. The Federal Government serves as a steward of this resource, shall exercise information resource management with special emphasis on the information life cycle, and shall ensure the effective and economical development of the Nation's spatial data infrastructure.

________________________________________________________________________

Figure 2. Policy Statements for Federal Geographic Data Sharing.

4. OVERVIEW OF THE NATIONAL SPATIAL DATA INFRASTRUCTURE

The NSDI is conceived to be an umbrella of policies, procedures, standards, technologies, organizations, and data that contribute to more efficient use of geospatial data. The NSDI provides a forum or context for discussions to promote means to produce and enhance access to high(er) quality geospatial data at lower public cost. In September 1993, the Clinton-Gore Administration identified the development of the NSDI as one of the initiatives in the National Performance Review (Gore, 1993). This recognition that the NSDI could foster partnerships and alliances among different levels of government was reinforced by Executive Order (E.O.) #12906 - Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure in April 1994 (Clinton, 1994).

The E.O. represents a policy statement of the Clinton Administration about developing the NSDI. The E.O. stresses four major areas of NSDI activity. These include creating the National Geospatial Data Clearinghouse to disseminate and search for data, identifying and developing standards to facilitate data sharing and coordination in the evolution of the NSDI, developing a framework of basic digital geospatial data from which other data can be derived or to which other data might be registered, and building partnerships to accomplish these efforts. Agencies are directed to participate in these activities in a variety of ways, within designated timeframes. These activities are not mutually exclusive.

4.1 National Geospatial Data Clearinghouse

The clearinghouse is an effort to use the existing and rapidly expanding information infrastructure to facilitate access to geospatial data. The clearinghouse is distributed in nature, meaning there is not a centralized "warehouse" of geospatial data. Thousands, and potentially millions, of data holdings will be cataloged and accessed across electronic networks.

There are three elements in the operation of the clearinghouse. The first is the FGDC geospatial metadata standard. Data producers nationwide, at all levels of government and within the private sector, will be encouraged to document or describe, with the geospatial metadata standard, any data they produce (Federal agencies are required to document all new geospatial data by using the standard). This documentation will then be made electronically accessible on the Internet. The Internet is the second element. It has become the most pervasive electronic network in the world, with an estimated 30 million users, and growth rates of 15 percent per month. This network of networks will continue to evolve and change as the private sector plays a larger role in the information infrastructure. Federal agencies, as well as other spatial data producers, are being encouraged to establish Internet connections. The third element of the clearinghouse is the use of software tools for searching and querying data on the network. The FGDC has been testing Wide Area Information Servers (WAIS) software, which is built on a library standard (Z39.50) for searching and querying. WAIS was originally designed to conduct text searches, but has been enhanced to enable geographic searches that are based on specifying the coordinates that cover an area of interest. New tools are continually being developed and offered to the public over the Internet. Recently, Mosaic, a tool for use in the World Wide Web that provides the ability to build hypertext links between sources of data, has gained great popularity as a user interface for "surfing" the Internet. These three elements (metadata, the Internet, and searching tools) together provide the ability to document, serve, search for, browse, and access geospatial data that vary in scope, context, format, detail, and location.
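The combination of descriptive metadata and coordinate-based searching can be illustrated with a minimal Python sketch. It is not the WAIS or Z39.50 implementation tested by the FGDC, and it does not follow the actual layout of the FGDC metadata standard. The record fields, the catalog entries, and the function names are all hypothetical, chosen only to show how a text match and a bounding-box overlap test might be combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Geographic extent in decimal degrees (west < east, south < north).

    Simplification: extents that cross the 180-degree meridian are not handled.
    """
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (
            self.east < other.west or other.east < self.west
            or self.north < other.south or other.north < self.south
        )


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical, heavily simplified stand-in for one metadata entry."""
    title: str
    abstract: str
    extent: BoundingBox


def search(records, keyword, area):
    """Return titles whose text mentions `keyword` and whose extent overlaps `area`."""
    needle = keyword.lower()
    return [
        r.title
        for r in records
        if needle in (r.title + " " + r.abstract).lower()
        and r.extent.intersects(area)
    ]


# Invented catalog entries, for illustration only.
CATALOG = [
    MetadataRecord("County wetlands inventory", "Wetlands polygons for a sample county",
                   BoundingBox(-78.0, 38.5, -77.0, 39.5)),
    MetadataRecord("State soils survey", "Soils map units for a sample state",
                   BoundingBox(-124.5, 32.5, -114.0, 42.0)),
    MetadataRecord("Coastal wetlands", "Wetlands along a sample coastline",
                   BoundingBox(-123.0, 37.0, -121.5, 38.5)),
]

AREA_OF_INTEREST = BoundingBox(-77.5, 38.8, -77.2, 39.0)
print(search(CATALOG, "wetlands", AREA_OF_INTEREST))  # ['County wetlands inventory']
```

In a distributed clearinghouse the catalog would be spread across many servers rather than held in one list, but the query logic on each server could follow the same pattern: filter by text, then by geographic overlap.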

In 1994, training and workshops about using the metadata standard, serving data, and using search and query tools are being offered by the FGDC. The success of the clearinghouse depends on individual participation. Telephones would not be considered particularly useful or successful if few people used them. The clearinghouse depends on thousands of individuals modifying their behavior to more effectively manage their data, and doing so in a coordinated manner. Most of the data sharing policies, adopted by the FGDC and described in figure 2, are critical to the development of the clearinghouse. Public agencies must specifically acknowledge the broader public value of ensuring access to information, must train their employees in approaches to doing this, must develop policies that ensure fair and equitable access to all users of information, and must adopt a proactive approach to information management to meet changing demands of technology and society.

Under the NSDI, developing a clearinghouse that allows producers to "advertise" the availability of data, and users to search for, access, and use these data depends on policies that promote easy, equitable, and minimally constrained access and use of data. Policies that establish high costs or limit the use of data through licenses and restrictions are likely to impede the sharing and integration of information. High costs and use restrictions will often result in agencies deciding to re-create a data set on their own, possibly using lower quality source material. Although the overall cost to the agency may be similar to or in some cases even exceed the purchase price of a commercially available data set, agencies will be free to manage the data without use restrictions, including making the data accessible to others. The primary activity of the clearinghouse is to contribute to the ability to find and integrate data from diverse sources, over any geographic area, with the specific goals of minimizing redundancy in data collection and facilitating data integration through use of consistent data sets (see Framework discussion below). Data for sale can contribute to both of these goals; however, high-priced, restricted-use data do not encourage data sharing.

4.2 Geospatial Data Standards

There are multiple levels of standards under development within the FGDC. The Standards Working Group developed the "metadata" standard, which provides a consistent approach to describing and documenting digital geospatial data. The FGDC also endorsed the Spatial Data Transfer Standard (SDTS), which allows the transfer of digital data between unlike GIS software packages without loss of content. The various subcommittees of the FGDC are working on data content standards that will help classify, collect, and represent specific themes of data, such as elevation, soils, wetlands, cadastral, and transportation data, in ways that are most useful to the broadest community. These standards will not address every detail that might be useful in specific applications of the data, but they will relate to data characteristics that are universally required and commonly collected by many users. The goal is to facilitate the process of sharing data by ensuring that data collected by any entity will meet certain minimal criteria. These standards, in effect, become statements of policy specific to data collection and management activities.

4.3 Framework Data

The fact that many users have common needs for digital geospatial data sets is germane to the concept of framework data. As spatial analyses are conducted, numerous users depend on certain representations of geography, either to reference the collection or display of other data or as variables for analyses. Although many users may have interests in data sets that are relevant for specific applications, they often expend large amounts of time trying to find, collect, and organize basic framework information about roads or boundaries, for example. The FGDC established a Framework Working Group to begin to address the issues associated with developing these commonly required data sets. The working group defined framework data to include geodetic control and orthoimagery, transportation, hydrography, elevation (both onshore and offshore), governmental or administrative units, and cadastral data.

The U.S. Geological Survey (USGS) products, such as 1:24,000-scale quadrangle maps, have formed a foundation in the analog world for many geospatial data collection activities and analyses. Similarly, certain sets of digital data can form a framework for GIS analyses. If standardized digital geospatial data at 1:24,000 scale (roughly an accuracy of ±12 meters) were complete for the Nation, a framework would exist on which many agencies could register other data sets. However, local governments and utilities commonly require higher resolution data. Complete national data bases are only available at less accurate resolutions and are not current. Two examples are the Census Bureau's 1:100,000-scale topologically integrated geographic encoding and referencing (TIGER) data, with an accuracy of ±50 meters, and the USGS 1:100,000-scale digital line graph (DLG) data base. At the present rates of public investment for digitizing geospatial data, a national data base at a resolution and complexity comparable to the 1:24,000-scale quadrangles will not exist for decades.
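The accuracy figures quoted for the two map scales can be reproduced with a short calculation. The sketch below assumes, as the text does not state, that the figures reflect the U.S. National Map Accuracy Standards (NMAS) horizontal tolerance, which applies to 90 percent of well-defined test points: 1/30 inch at publication scale for maps larger than 1:20,000, and 1/50 inch for maps at 1:20,000 or smaller. The function name is illustrative.

```python
INCHES_TO_METERS = 0.0254


def nmas_horizontal_tolerance_m(scale_denominator: int) -> float:
    """Ground distance in meters matching the NMAS map tolerance at a given scale.

    Assumption: NMAS allows 1/30 inch at publication scale for maps larger
    than 1:20,000 and 1/50 inch for maps at 1:20,000 or smaller.
    """
    map_tolerance_in = 1 / 30 if scale_denominator < 20_000 else 1 / 50
    return scale_denominator * map_tolerance_in * INCHES_TO_METERS


for denominator in (24_000, 100_000):
    print(f"1:{denominator:,} -> about {nmas_horizontal_tolerance_m(denominator):.1f} m")
# 1:24,000 -> about 12.2 m
# 1:100,000 -> about 50.8 m
```

Under that assumption, 1/50 inch on a 1:24,000-scale map corresponds to 480 inches (40 feet) on the ground, and on a 1:100,000-scale map to 2,000 inches (about 167 feet), consistent with the rounded values of 12 and 50 meters.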

Today, however, given the lack of dependable, consistent data, or the lack of knowledge about the existence of data, many users (perhaps thousands) are independently digitizing or capturing data at various resolutions. This is being done in ad hoc, nonstandardized, one-shot approaches. Numerous disparate organizations are producing high-resolution data (accuracies better than those of 1:24,000-scale maps) for small geographic areas, and others are digitally encoding 1:24,000-scale maps using a variety of approaches and standards. Still other users are working with new technologies, such as Global Positioning Systems (GPS) and digital orthoimagery (both satellite and photography), in an effort to more currently and accurately represent geography. The concept of a national digital framework data set is based on the thought that a number of data production efforts are under way and that perhaps users do not require exactly the same data everywhere. If independent data collection efforts were to build on known and relatively consistent standards or guidelines that allowed data to be connected across data production regions, the resulting product would be a national set of digital data that might vary somewhat by region; however, it would be current, as accurate as was needed or could be funded for any piece of geography, and consistent with some minimal set of standards that would make the data maximally useful to other organizations. Assuming these data meet some minimal level of standards, if they could be identified, made accessible, and linked together, major steps might be taken toward developing a framework.

The FGDC Framework Working Group has discussed this concept for the past year and developed an initial approach that will be tested in a series of pilot studies over the next year. These discussions could lead to various agencies working to produce minimum versions of certain digital geospatial data sets as fast as possible. These data sets would be of a nature and resolution required by most people in a given geographic area. Basic spatial representations of these themes would form the framework, such as delineations of transportation "pathways." More details about these "pathways" - for example, whether a pathway is a dirt road or a 10-lane highway or a railroad - might be derived from other sources and referenced to the framework. The "best" spatial data available over a geographic area would be designated framework. Best is defined by positional accuracy and appropriate attributions. Other data at lower resolutions might be designated framework if they are derived, when possible, from the best data available. The goal is to create a standardized, "trusted" geospatial framework within a short time, to eliminate redundancy and duplication in data collection efforts, and to minimize the difficulties of data integration through some common spatial denominators.
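One way to read the rule that the "best" data are defined by positional accuracy and appropriate attribution is sketched below in Python. This is not a procedure defined by the Framework Working Group; the required attribute names, the candidate data sets, and the accuracy values are hypothetical and serve only to show how such a designation might be made for one theme over one geographic area.

```python
from dataclasses import dataclass

# Hypothetical minimal attribution a data set must carry to be considered.
REQUIRED_ATTRIBUTES = {"feature_id", "theme"}


@dataclass(frozen=True)
class Candidate:
    producer: str
    accuracy_m: float      # positional accuracy in meters; smaller is better
    attributes: frozenset  # attribute names carried by the data set


def pick_framework(candidates):
    """Return the most accurate candidate carrying the minimal attributes, or None."""
    eligible = [c for c in candidates if REQUIRED_ATTRIBUTES <= c.attributes]
    return min(eligible, key=lambda c: c.accuracy_m, default=None)


# Invented candidates for a transportation theme over one area.
ROADS = [
    Candidate("city survey", 1.0, frozenset({"feature_id"})),
    Candidate("county GIS", 5.0, frozenset({"feature_id", "theme", "surface"})),
    Candidate("national 1:100,000 series", 50.0, frozenset({"feature_id", "theme"})),
]

best = pick_framework(ROADS)
print(best.producer if best else "no framework candidate")  # county GIS
```

In this example the most accurate data set is passed over because it lacks the minimal attribution, so the next most accurate qualifying data set is designated framework. Extra detail such as the road surface type does not affect the choice, in keeping with the idea that such details can be referenced to the framework from other sources.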

The concept of establishing a framework digital geospatial data set derived from a multitude of sources depends on the ability to freely access and easily integrate data. Policies that in any way inhibit data availability may detract from the ability to use the most appropriate data for a given piece of geography. The goal is to provide a base that is then used by all data collectors within that geographic area to facilitate data integration. High-priced data sets with use restrictions, even though they may be highly accurate, will not encourage shared use or agency interaction. Federal, State, and local governments, as well as private companies producing data that meet framework criteria, are encouraged to contribute their basic spatial representations and minimal attribution to the framework for adoption by all others as a base for subsequent data collection. This effort may be difficult to justify in agencies that have full cost-recovery programs for data.

4.4 NSDI Partnerships

Underlying all of these components of the NSDI is the need for creative partnerships. The NSDI will build on existing and yet-to-be formed institutions and relationships to facilitate the ability of the geospatial data community to share information and to develop and maintain standard data sets. Progress and success of the NSDI will depend on participation from all levels of government, the private sector, and academia from all regions of the country. The distribution of technology, decreasing budgets, and increasingly interrelated responsibilities are creating interdependencies that lead to new forms of organizations defined by relationships rather than traditional institutional boundaries. Policies must be developed to minimize the constraints of contracts and memoranda-of-understanding and to facilitate agency and sector interactions. Many of these partnerships may be short term, addressing a specific need, and continually evolving. Many may be based on linkages on the Internet or in cyberspace rather than the more traditional person-in-your-office or delivery-of-goods models. Interagency agreement mechanisms must recognize and promote these types of interactions.

5. SUMMARY

The NSDI is conceived to be a means to assist in coordinating geospatial data activities among many levels of government and institutions within the United States. Many other nations are developing similar concepts, and it is only a matter of time before the Global Spatial Data Infrastructure (GSDI) is common terminology. Organizational, national, and global policies will have profound effects on the evolution of the NSDI and the GSDI. The FGDC concept of the NSDI assumes that data represent significant costs in developing and using geospatial technologies, that data produced by public agencies are a public resource and should be made accessible to all at minimal or no cost, and that this approach will facilitate the production of, access to, and use of quality digital geospatial data. Inherent in this view is the understanding that many of the benefits of data sharing may be secondary and difficult to identify. Not all of the institutions and organizations that are expected to participate in the NSDI share these views. The FGDC believes that these policies are critical to the success of the NSDI, but the truth is that the hypothesis is untested.

Will more data be made available if agencies operate on a cost-recovery basis? Who supplies data for environmental or social decisionmaking that serves the "public good"? Where are the lines between public and private sector responsibilities and incentives for data production? Who is liable for the quality of a geospatial data set that is maintained by multiple agencies and made freely available on the Internet? Does copyright protection encourage data production and sharing? If public agency budgets for data development are decreased, will private sector data provide the basis for public decisionmaking?

Although the NSDI does not provide answers to these questions, it does provide a forum for discussion. Only through continued debate and experimentation can the effects of specific policies on the availability and quality of data and on the subsequent quality of decisionmaking be tested. Current policies of the Federal Government have provided an environment within which to do this. Current Federal policies are based on Jeffersonian principles of democracy, and less rigidly on economic arguments. There are those who would argue, however, that low-cost data have great net societal benefit. Although the basis of future policies may change, the effectiveness of any policy is likely to be determined by a number of critical factors that transcend the specific issue of geospatial data. Does the policy take into account the nature of technological change? Does the policy encourage change and innovation? Does the policy encourage sustainability in the broadest sense? If the NSDI can contribute to the development of and be guided by policies that meet these criteria, it will at least have served part of its function.


6. REFERENCES

Clinton, W.J., 1994. Executive Order 12906. Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure. April 11.

Gore, A., 1993. From Red Tape to Results: Creating a Government that Works Better and Costs Less. Report of the National Performance Review, Washington, D.C., September 11.

Office of Management and Budget, 1993. Circular A-130: Management of Federal Information Resources. June 25.

Office of Management and Budget, 1990. Circular A-16: Coordination of Surveying, Mapping, and Related Spatial Data Activities. October 19.

U.S. Government Works, 1976. The House Report on the Copyright Act of 1976, Section 105. 122 Congressional Record. H 10727-8, September 21.

7. ENDNOTES

1 Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

2 "Geospatial" data are defined as data that identify the geographic location and characteristics of natural or constructed features and boundaries on the earth. This information may be derived from, among other things, remote sensing, mapping, and surveying technologies (Clinton, 1994).