Synthetic Longitudinal Business Database
(28 variables)
Last update to metadata:
2016-11-11 11:47:26 (auto-generated)
Document Date:
January 6th, 2014
Codebook prepared by:
Lars Vilhuber
Data prepared by:
United States Department of Commerce, Bureau of the Census; Duke University; and
Cornell University, Labor Dynamics Institute
Principal Investigator(s):
United States Department of Commerce. Bureau of the Census; Internal Revenue Service;
and Cornell University. Labor Dynamics Institute.
Please cite this codebook as:
Comprehensive Extensible Data Documentation and Access Repository. Codebook for the
Synthetic LBD Version 2.0 [Codebook file]. Cornell Institute for Social and Economic
Research and Labor Dynamics Institute [distributor]. Cornell University, Ithaca, NY,
2013
Please cite this dataset as:
U.S. Census Bureau. Synthetic Longitudinal Business Database: Version 2.0 [Computer
file]. Washington DC; Cornell University, Synthetic Data Server [distributor], Ithaca,
NY, 2013
In most countries, national statistical agencies do not release establishment-level
business microdata, because doing so represents too large a risk to establishments'
confidentiality. One approach with the potential for overcoming these risks is to
release synthetic data; that is, the released establishment data are simulated from
statistical models designed to mimic the distributions of the underlying real microdata.
The Synthetic Longitudinal Business Database (SynLBD) is the synthetic data version
of the Longitudinal Business Database (LBD), an annual economic census of establishments
in the United States comprising more than 20 million records dating back to 1976.
More information is available at https://www.census.gov/ces/dataproducts/synlbd/index.html.
In this codebook, variables are noted as "blanked" if they are available on the confidential
version but have been removed from the synthetic version, and as "synthetic" if the
confidential values have been synthesized and released on the synthetic version.
synlbd1997c.dta (Stata)
restricted
No description given
released
No description given
The data can only be used on the VirtualRDC Synthetic Data Server (http://www.vrdc.cornell.edu/sds/)
at Cornell University. While no SynLBD data downloads are permitted at this time,
users do not have to operate behind the Census Bureau firewall to access this server.
In order to access the Synthetic LBD, users should apply for a free account on the
Synthetic Data Server (SDS) housed at the VirtualRDC at Cornell University. Application
forms can be found at https://www.census.gov/ces/dataproducts/synlbd/accesslbd.html.
Application decisions are based solely on feasibility, determined by evaluating whether
the data necessary to conduct the analysis are included on the SynLBD Beta file. Decisions
generally occur within 10 business days.
Additional information: https://www.census.gov/ces/dataproducts/synlbd/accesslbd.html
The SynLBD files have been cleared by the Census Bureau Disclosure Review Board and
IRS for use by individuals without Census Bureau Special Sworn Status and outside
of Census Bureau facilities. Establishments in the SynLBD are fully synthesized using
statistical models, and the SynLBD contains no data from actual establishments. Comparison
at the establishment level shows SynLBD data differ substantially from the actual
data. Modeling preserves variable relationships while protecting establishment identity.
Please use the following language in published work that makes use of this dataset:
"The creation of the Synthetic LBD was made possible through NSF Grant #0427889. Access
to the Synthetic LBD was made possible through NSF Grant #1042181." Please also cite
Kinney et al. (2011) and use the bibliographic citation for the dataset provided in
this document.
Because the SynLBD
has not been fully validated, relationships between SynLBD variables may not correspond
to the relationships in the underlying confidential microdata. Unless validated, there
is no guarantee results from the SynLBD reflect results from the underlying confidential
data. Researchers are strongly encouraged to request result validation prior to publishing
results based on the SynLBD. Validation occurs as part of an internal Census Bureau
process to improve current beta data products, and is free, as resources permit. (See
https://www.census.gov/ces/dataproducts/synlbd/validatingresults.html)
For questions regarding this data collection, please contact:
ces.synthetic.data.use@census.gov
Sampling from posterior predictive distribution
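To illustrate what sampling from a posterior predictive distribution means in general, the following is a minimal sketch under strong simplifying assumptions. It is not the SynLBD synthesis model: it assumes a single continuous variable, a normal likelihood with known standard deviation, and a normal prior on the mean. The function names, the prior settings, and the stand-in "confidential" values are all hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def posterior_for_mean(observed, sigma, prior_mean=0.0, prior_sd=10.0):
    """Posterior (mean, sd) of mu for y ~ N(mu, sigma^2), mu ~ N(prior_mean, prior_sd^2)."""
    n = len(observed)
    precision = 1.0 / prior_sd**2 + n / sigma**2
    mean = (prior_mean / prior_sd**2 + sum(observed) / sigma**2) / precision
    return mean, math.sqrt(1.0 / precision)


def draw_synthetic(observed, sigma, n_synthetic, rng):
    """Draw synthetic values from the posterior predictive distribution."""
    post_mean, post_sd = posterior_for_mean(observed, sigma)
    synthetic = []
    for _ in range(n_synthetic):
        mu = rng.gauss(post_mean, post_sd)      # step 1: draw a parameter from the posterior
        synthetic.append(rng.gauss(mu, sigma))  # step 2: draw a value given that parameter
    return synthetic


rng = random.Random(0)
# Made-up stand-in for a confidential variable (hypothetical values, not real data).
confidential = [rng.gauss(3.0, 1.0) for _ in range(500)]
synthetic = draw_synthetic(confidential, sigma=1.0, n_synthetic=500, rng=rng)

print(round(statistics.mean(confidential), 2), round(statistics.mean(synthetic), 2))
```

In this toy setting the synthetic values are new random draws rather than copies of the input records, while their distribution tracks that of the input. Whether a real synthetic file preserves a given relationship is an empirical question, which is why the codebook recommends validation.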
- https://www.census.gov/ces/pdf/SynLBD_Codebook.pdf
- Kinney, Satkartar K., Jerome P. Reiter, Arnold P. Reznek, Javier Miranda, Ron S. Jarmin,
and John M. Abowd. 2011. CES WP-11-04.
In most countries, national statistical agencies do not release establishment-level
business microdata, because doing so represents too large a risk to establishments'
confidentiality.
One approach with the potential for overcoming these risks is to release synthetic
data; that is, the released establishment data are simulated from statistical models
designed to mimic the distributions of the underlying real microdata. In this article,
we describe an application of this strategy to create a public use file for the Longitudinal
Business Database, an annual economic census of establishments in the United States
comprising more than 20 million records dating back to 1976. The U.S. Bureau of the
Census and the Internal Revenue Service recently approved the release of these synthetic
microdata for public use, making the synthetic Longitudinal Business Database the
first-ever business microdata set publicly released in the United States. We describe
how we created the synthetic data, evaluated analytical validity, and assessed disclosure
risk.