Synthetic Data
Pioneer the future of healthcare with synthetic data
Use synthetic data to train your machine learning models faster. Accelerate your efforts to help patients and caregivers without compromising privacy.
SAMPLE DATASETS
Explore our datasets
Our full synthetic dataset contains 1,500,000 member records. The sample datasets below are limited to 100 synthetically derived individual records.
Member
One dataset providing member demographics and coverage details for the plan benefit year.

June 25, 2020 (21 KB)
Medical Claims
Three datasets covering inpatient, outpatient, and emergency department claims, with diagnosis codes, procedure codes, provider details, and cost information for the member's visits.

June 25, 2020 (426 KB)

June 25, 2020 (1413 KB)

June 25, 2020 (207 KB)

June 25, 2020 (602 KB)
Pharmacy
Two datasets detailing pharmaceutical claims, drug codes, and payment information for prescriptions.

June 25, 2020 (55 KB)

June 25, 2020 (204 KB)

June 25, 2020 (59 KB)
Lab
One dataset providing lab tests and corresponding results for the coverage year.

June 25, 2020 (252 KB)
How it works
Fill out your profile
When you log in for the first time, tell us which organization you're part of, your role, and how we can contact you.
Describe and submit your project
Give us an overview of your project and how you will be using synthetic data.
A Humana representative will review your project details. While you await approval, browse our sample datasets of 100 synthetic individual records.
Access approved!
Once your project has been approved, you will have access to all 1,500,000 records of synthetic data for 90 days.
Frequently Asked Questions
Have questions? We’ve got answers. If you can’t find what you’re looking for, feel free to get in touch.
Synthetic data, as the name suggests, is data that is artificially created rather than generated by actual events. It is produced by algorithms and used for a broad range of purposes, including as test data for new products and tools, for model validation, and for AI model training.
The Humana team used a machine learning engine to train algorithms on real patient data, preserving the relationships among health variables while excluding personal identifiers such as protected health information (PHI) and personally identifiable information (PII).
Humana’s synthetic data ultimately provides a realistic representation of individual data over time while maintaining personal privacy.
You can access the 100-record sample of the full synthetic dataset in the data dictionary.
Humana’s synthetic data includes 1,500,000 synthetic member records representative of the 2018 to 2020 calendar years.
Request full access
Like what you see in our sample datasets? Request access to Humana's Azure test environment to start modeling with the full synthetic datasets.