This document was generated: November 24, 2024 at 8:39 PM

Synthetic Longitudinal Business Database

View Variables (28 variables)

Last update to metadata: 2016-11-11 11:47:26 (auto-generated)

Document Date: January 6th, 2014

Codebook prepared by: Lars Vilhuber

Data prepared by:

Principal Investigator(s): United States Department of Commerce. Bureau of the Census, Internal Revenue Service, and Cornell University. Labor Dynamics Institute.

Citation

Please cite this codebook as:
Comprehensive Extensible Data Documentation and Access Repository. Codebook for the Synthetic LBD Version 2.0 [Codebook file]. Cornell Institute for Social and Economic Research and Labor Dynamics Institute [distributor]. Cornell University, Ithaca, NY, 2013
Please cite this dataset as:
U.S. Census Bureau. Synthetic Longitudinal Business Database: Version 2.0 [Computer file]. Washington DC; Cornell University, Synthetic Data Server [distributor], Ithaca, NY, 2013

Abstract

In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach with the potential for overcoming these risks is to release synthetic data; that is, the released establishment data are simulated from statistical models designed to mimic the distributions of the underlying real microdata.

Datasets

synlbd1997c.dta (Stata)

Terms of Use

Access Levels

restricted

No description given

released

No description given

Access Restrictions (Default)

The data can only be used on the VirtualRDC Synthetic Data Server http://www.vrdc.cornell.edu/sds/ at Cornell University. While no SynLBD data downloads are permitted at this time, users do not have to operate behind the Census Bureau firewall to access this server.

Access Requirements

In order to access the Synthetic LBD, users should apply for a free account on the Synthetic Data Server (SDS) housed at the VirtualRDC at Cornell University. Application forms can be found at https://www.census.gov/ces/dataproducts/synlbd/accesslbd.html. Application decisions are based solely on feasibility, determined by evaluating whether the data necessary to conduct the analysis are included on the SynLBD Beta file. Decisions generally occur within 10 business days.
Additional information: https://www.census.gov/ces/dataproducts/synlbd/accesslbd.html

Access Permission Requirements

The SynLBD files have been cleared by the Census Bureau Disclosure Review Board and IRS for use by individuals without Census Bureau Special Sworn Status and outside of Census Bureau facilities. Establishments in the SynLBD are fully synthesized using statistical models, and the SynLBD contains no data from actual establishments. Comparison at the establishment level shows SynLBD data differ substantially from the actual data. Modeling preserves variable relationships while protecting establishment identity.

Citation Requirements

Please use the following language in published work that makes use of this dataset: "The creation of the Synthetic LBD was made possible through NSF Grant #0427889. Access to the Synthetic LBD was made possible through NSF Grant #1042181." Please also cite Kinney et al. (2011) and use the bibliographic citation for the dataset provided in this document.

Disclaimer

Establishments in the SynLBD are fully synthesized using statistical models, and the SynLBD contains no data from actual establishments. Comparison at the establishment level shows SynLBD data differ substantially from the actual data. Modeling preserves variable relationships while protecting establishment identity. Because the SynLBD has not been fully validated, relationships between SynLBD variables may not correspond to the relationships in the underlying confidential microdata. Unless validated, there is no guarantee results from the SynLBD reflect results from the underlying confidential data. Researchers are strongly encouraged to request result validation prior to publishing results based on the SynLBD. Validation occurs as part of an internal Census Bureau process to improve current beta data products, and is free, as resources permit. (See https://www.census.gov/ces/dataproducts/synlbd/validatingresults.html)

Contact

For questions regarding this data collection, please contact: ces.synthetic.data.use@census.gov

Additional Information

Methodology

Sampling from posterior predictive distribution
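
As a rough illustration of what sampling from a posterior predictive distribution means, the sketch below uses a deliberately simple toy model: one numeric variable, normally distributed with a known standard deviation and a conjugate normal prior on its mean. This is not the SynLBD synthesis model, which is described in Kinney et al. (2011); the function name `synthesize` and all parameter names and default values are assumptions made for this example only.

```python
import random


def synthesize(observed, sigma=1.0, prior_mean=0.0, prior_sd=10.0,
               n_records=None, seed=0):
    """Generate one synthetic dataset from a toy normal model.

    Hypothetical helper for illustration; not the SynLBD procedure.
    """
    rng = random.Random(seed)
    n = len(observed)
    n_records = n if n_records is None else n_records

    # Conjugate update for the mean of a normal with known sigma.
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_mean * prior_precision
                            + sum(observed) / sigma ** 2)

    # Step 1: draw the parameter from its posterior distribution.
    mu = rng.gauss(post_mean, post_var ** 0.5)

    # Step 2: simulate new records given the drawn parameter.
    return [rng.gauss(mu, sigma) for _ in range(n_records)]


if __name__ == "__main__":
    confidential = [5.0] * 200            # stand-in for confidential values
    synthetic = synthesize(confidential, n_records=2000)
    print(len(synthetic), sum(synthetic) / len(synthetic))
```

The released records are simulated values rather than copies of the inputs, while their overall distribution tracks that of the confidential data. Drawing the parameter once per synthetic dataset, as here, is one common convention; other schemes are possible.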

Related Material

  1. https://www.census.gov/ces/pdf/SynLBD_Codebook.pdf

Related Publications

  1. Kinney, Satkartar K., Jerome P. Reiter, Arnold P. Reznek, Javier Miranda, Ron S. Jarmin and John M. Abowd. 2011. CES WP-11-04.
     Abstract: In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach with the potential for overcoming these risks is to release synthetic data; that is, the released establishment data are simulated from statistical models designed to mimic the distributions of the underlying real microdata. In this article, we describe an application of this strategy to create a public use file for the Longitudinal Business Database, an annual economic census of establishments in the United States comprising more than 20 million records dating back to 1976. The U.S. Bureau of the Census and the Internal Revenue Service recently approved the release of these synthetic microdata for public use, making the synthetic Longitudinal Business Database the first-ever business microdata set publicly released in the United States. We describe how we created the synthetic data, evaluated analytical validity, and assessed disclosure risk.