selected households -- not a hospital population
  ◦ Face to face interviews
  ◦ 5,000 subjects with 3,000 variables per subject
  ◦ Environmental || Social || Psychological || Biological
• We give you data to use for your research.
• We collaborate on new research projects.
  ◦ Vitamin D
  ◦ Hair cortisol
  ◦ Sleep quality in children
2. Geographic subdivisions smaller than state
3. All dates
4. Telephone numbers
5. FAX number
6. Email address
7. Social Security number
8. Medical record number
9. Health plan beneficiary number
10. Account number
11. Certificate/license number
12. Vehicle identifiers and serial numbers, including license plate numbers
13. Device identifiers or serial numbers
14. Web URLs
15. IP address
16. Biometric identifiers, including finger or voice prints
17. Full-face photographic images and any comparable images
18. Any other unique identifying number, characteristic, or code
http://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html#standard
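A minimal sketch of how the identifier list above might be applied to one tabular record: drop every field that falls into a listed category and keep the rest. The field names (`county`, `visit_date`, `vitamin_d`, and so on) and the helper `strip_identifiers` are hypothetical, chosen only for illustration, and only a subset of the listed categories is covered. The HHS page linked above has the authoritative wording, so this is not a compliance tool.

```python
# Hypothetical column names mapped to the identifier categories on the slide.
# A real dataset would need its own mapping, reviewed against the HHS guidance.
IDENTIFIER_FIELDS = {
    "county", "city", "zip_code",        # geographic subdivisions smaller than state
    "birth_date", "visit_date",          # dates
    "telephone", "fax", "email",         # contact identifiers
    "ssn", "medical_record_number",      # record and account numbers
    "health_plan_number", "account_number",
    "license_number", "license_plate",
    "device_serial", "url", "ip_address",
}


def strip_identifiers(record: dict) -> dict:
    """Return a copy of the record without the fields listed above."""
    return {
        field: value
        for field, value in record.items()
        if field not in IDENTIFIER_FIELDS
    }


# Example record with made-up values.
example = {
    "state": "WI",
    "county": "Dane",
    "email": "someone@example.org",
    "visit_date": "2015-03-02",
    "vitamin_d": 31.5,
}
cleaned = strip_identifiers(example)
print(cleaned)
```

Dropping whole fields is the bluntest option. It ignores free-text fields and combinations of remaining variables that could still single out a subject.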
it change? | “immutable” | slowly | Depends: age, gender, illnesses, income …
Can we obfuscate it? | no | A little. But lose a lot of info. | A lot. Most info will be preserved.
Others give it away? | yes | yes | sometimes
“Why are genes different?” from Vitaly Shmatikov
Krishnan R, Padman R, Roehrig SF. Disclosure Limitation Methods and Information Loss for Tabular Data. In: Doyle P, et al., editors. Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies. Amsterdam: North Holland; 2001. pp. 135–166.
and colleagues doing research for hospital records and genes.
Li, Y., Jiang, X., Wang, S., Xiong, H., & Ohno-Machado, L. (2016). VERTIcal Grid lOgistic regression (VERTIGO). Journal of the American Medical Informatics Association, 23(3), 570–579. https://doi.org/10.1093/jamia/ocv146