SIPP Synthetic Beta v6
View Variables
(121 variables)
Last update to metadata:
2015-10-26 13:17:35 (upload date)
Document Date:
January 14, 2015
Codebook prepared by:
Cornell NSF-Census Research Network
Data prepared by:
United States Department of Commerce. Bureau of the Census.
Principal Investigator(s):
United States Department of Commerce. Bureau of the Census; Social Security Administration; Internal Revenue Service; and Cornell University. Labor Dynamics Institute.
Please cite this codebook as:
Comprehensive Extensible Data Documentation and Access Repository. Codebook for the
SIPP Synthetic Beta 6.0 [Codebook file]. Cornell Institute for Social and Economic
Research and Labor Dynamics Institute [distributor]. Cornell University, Ithaca, NY,
2015
Please cite this dataset as:
U.S. Census Bureau. SIPP Synthetic Beta: Version 6.0 [Computer file]. Washington DC;
Cornell University, Synthetic Data Server [distributor], Ithaca, NY, 2015
The SIPP Synthetic Beta (SSB) is a Census Bureau product that integrates person-level
micro-data from a household
survey with administrative tax and benefit data. These data link respondents from
the Survey of Income and Program Participation
(SIPP) to Social Security Administration (SSA)/Internal Revenue Service (IRS) Form
W-2 records and SSA records of retirement and
disability benefit receipt, and were produced by Census Bureau staff economists and
statisticians in collaboration with
researchers at Cornell University, the SSA and the IRS. The purpose of the SSB is
to provide access to linked data that
are usually not publicly available due to confidentiality concerns. To overcome these
concerns, Census has synthesized,
or modeled, all the variables in a way that changes the record of each individual
in a manner designed to preserve the
underlying covariate relationships between the variables.
The only variables that were not altered by the synthesis process and still contain
their original values are gender and a
link to the first reported marital partner in the survey. Seven SIPP panels (1990,
1991, 1992, 1993, 1996, 2001, 2004)
form the basis for the SSB, with a large subset of variables available across all
the panels selected for inclusion and
harmonization across the years. Administrative data were added and some editing was
done to correct for logical
inconsistencies in the IRS/SSA earnings and benefits data.
released
The data can only be used on the VirtualRDC Synthetic Data Server at Cornell University.
While no SSB data downloads are permitted at this time, users do not have to operate
behind the Census Bureau firewall to access this server.
restricted
Researchers interested in using the SSB can submit an application to the Census Bureau.
The application form and instructions can be downloaded from
http://www.census.gov/programs-surveys/sipp/methodology/sipp-synthetic-beta-data-product.html. Applications will be judged solely on the feasibility
of the proposed project (i.e., that the necessary variables are available on the SSB).
Once an application has been accepted, the new user will be given an account on a
server where the data can be accessed and analyzed.
Additional information: http://www.census.gov/programs-surveys/sipp/methodology/sipp-synthetic-beta-data-product.html
The SSB files have been cleared by the Census Bureau Disclosure Review Board, SSA,
and IRS
for use by individuals without Census Bureau Special Sworn Status and outside of Census
Bureau facilities.
We request that researchers who publish results from analyses done using these data
cite the SSB as their data source and acknowledge the use of the SDS server at Cornell
and the support of Census staff in running any validation programs. These citations
will help ensure continued funding for the SDS server and the creation of the Gold
Standard File and the SSB.
Suggested acknowledgement:
This analysis was first performed using the SIPP Synthetic Beta (SSB) on the Synthetic
Data Server housed at Cornell University which is funded by NSF Grant #SES-1042181.
These data are public use and may be accessed by researchers outside secure Census
facilities. For more information, visit http://www.census.gov/sipp/synth_data.html.
Final results for this paper were obtained from a validation analysis conducted by
Census Bureau staff using the SIPP Completed Gold Standard Files and the programs
written by this author and originally run on the SSB. The validation analysis does
not imply endorsement by the Census Bureau of any methods, results, opinions, or views
presented in this paper.
The data synthesis process employed by Census to protect the linked data from the
risk of disclosing the identity of individuals
is relatively new and substantially changes both the survey and administrative data.
The intent of the modeling done as part of the synthesis
is to preserve relationships among variables that are of interest to researchers while
ensuring that personally identifiable information is
not revealed to the data user. It has not been feasible to ensure accuracy by comparing
every relationship among SSB variables with the
corresponding relationship in the underlying confidential micro-data. Hence, we strongly
urge researchers not to publish results produced
from the SSB without first requesting that Census validate these results with confidential
data housed in a secure environment at the
Census Bureau. Census will perform this validation free of charge to researchers,
as resources permit and according to the protocol
established by the three agencies involved and outlined below. Without validation
of results, Census, SSA, and IRS make no guarantee
of the validity of the SSB for any research purpose. See
http://www.census.gov/programs-surveys/sipp/methodology/sipp-synthetic-beta-data-product.html for validation conditions.
For questions regarding this data collection, please contact:
sehsd.synthetic.data.use.list@census.gov
Using SSB:
The Gold Standard File (GSF) and Completed Data implicates contain personally identifiable
information protected by Titles 13, 26, and 42 and cannot be accessed without
Census Bureau Special Sworn Status or outside of Census Bureau facilities.
The SSB files, however, have been cleared by the Census Bureau Disclosure
Review Board, SSA, and IRS for use by individuals without Census Bureau Special
Sworn Status and outside of Census Bureau facilities.
Researchers interested in using the SSB can submit an application to the Census
Bureau. The application form and instructions can be downloaded from http://www.census.gov/programs-surveys/sipp/methodology/sipp-synthetic-beta-data-product.html.
Applications will be judged solely on the
feasibility of the proposed project (i.e., that the necessary variables are
available on the SSB). Once an application has been accepted, the new user will
be given an account on a server where the data can be accessed and analyzed.
While no SSB data downloads are permitted at this time, users do not have to
operate behind the Census Bureau firewall to access this server.
The SSB is designed to be analytically valid in the sense that point
estimates should be unbiased and estimated variances should lead to inferences
similar to those that would be drawn from an identical analysis on the Completed
Data implicates. Initial tests of analytic validity of the SSB have been
promising. All SSB users are invited to help further test the analytic validity
of the SSB by submitting programs used to analyze the SSB to be run on the
Completed Data and/or Gold Standard files. Users need only inform Census Bureau
staff of the location on the server of such programs and work with Census Bureau
staff to ensure that the programs run without error. Census Bureau staff will
run the programs on the confidential data and release to the user the resulting
output that is cleared for release by the Census Bureau Disclosure Review
Board. In order to evaluate the effects of the data synthesis separate from the
effect of imputing missing data, comparisons should be made between results from
the SSB and the Completed Data. To evaluate the effects of missing data
imputation, comparisons should be made between results from the Completed Data
and the Gold Standard.
When analyzing the SSB, users should account for the multiple imputation aspect
of the SSB by averaging statistics of interest across all sixteen implicates.
Variance measures should be created following the appropriate multiple
imputation formulae as described in the document Using the SIPP Synthetic
Beta for Analysis.
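As a rough illustration of the averaging step, the sketch below combines per-implicate estimates in Python. The function name combine_implicates and the numbers in the usage lines are hypothetical and do not come from the SSB documentation. The point estimate is the simple average across implicates, as described above. The variance line uses the textbook Rubin combining rule only as a stand-in: the SSB implicates combine missing-data imputation with synthesis, so the formula given in Using the SIPP Synthetic Beta for Analysis may differ and should be used for actual work.

```python
from statistics import mean, variance


def combine_implicates(estimates, variances):
    """Combine one statistic estimated separately on each implicate.

    estimates: point estimates, one per implicate.
    variances: the matching within-implicate variance estimates.
    Returns (combined point estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from at least two implicates")

    q_bar = mean(estimates)   # combined point estimate: average across implicates
    u_bar = mean(variances)   # average within-implicate variance
    b = variance(estimates)   # between-implicate variance (divisor m - 1)

    # ASSUMPTION: textbook Rubin total variance, shown for illustration only.
    # The SSB-specific formula is in "Using the SIPP Synthetic Beta for Analysis".
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total


# Hypothetical results for one coefficient from sixteen implicates.
example_estimates = [0.50 + 0.01 * i for i in range(16)]
example_variances = [0.004] * 16
point, total_var = combine_implicates(example_estimates, example_variances)
print(point, total_var)
```

In practice the per-implicate estimates and variances would come from running the same SAS or Stata program on each of the sixteen implicate files; only the combining step is sketched here.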
Protocol for Validation of Results:
Census will validate results obtained from the SSB on the internal, confidential version
of these data (Completed Gold Standard Files). Users who wish to obtain validated
results should follow the protocol outlined here.
The restricted access site will provide SAS and Stata analysis software and a computing
environment similar to the one used to analyze the confidential Completed Gold Standard
data on Census Bureau internal computers. Researchers should follow the Census Bureau
programming requirements described in SSB Validation Request Guidelines to ensure
that the programs will successfully transfer to internal Census computers for validation.
Researchers should plan to share their results and programs from the synthetic data
analysis with Census, ORES/SSA and SOI/IRS.
After programs have successfully run without error on the synthetic data, researchers
may request that Census run these programs on the Completed Gold Standard Files. Only
programs successfully run without error on the SDS will be eligible to be run on the
confidential data by Census staff. Any programs that produce errors on the Completed
Gold Standard Files will be returned to users for correction.
Once an analysis has been repeated on the Completed Gold Standard File, the results
will be reviewed by Census staff for disclosure concerns. Researchers should familiarize
themselves with standard Census disclosure rules for outside projects (see the RDC Researcher Handbook) and should fill out the appropriate memo documenting the requested output (see
RDC Disclosure Request Memo). Data products and output approved by Census staff will be released to the users,
ORES/SSA, and SOI/IRS.
The validation process can be accomplished in as little as one week for simple results
that are generated by clean code and have no disclosure issues. However, if the code
does not run properly, the sample sizes are too small, or the researcher does not
accurately fill out the disclosure memo, the process can take much longer. Census
makes no guarantee on the length of time between submission of programs and the release
of results from the confidential data.
For more information about the validation process, including advice on how to make
the process go smoothly and quickly, please see SSB Validation Request Guidelines.