Wednesday, September 13, 2017

Why is “big data” now more useful for healthcare?

Se-Hee Jung, BSN
PhD student, College of Nursing, University of Utah  

§  What is big data?
§  Technology advances for big data
§  How big data is being used in health care
§  Benefits of big data and the future of health care

§  Challenges of big data


Figure 1. Data has revolutionized our world (Schadt, 2012).

1. Introduction
           
Big data is a hot topic in healthcare. Searches for the keyword “big data” began increasing in 2011 (Google Trends). The term “big data” can mean different things to different people. It is commonly described by three fundamental characteristics: the size of the information, the many kinds of complex information, and the speed at which data are produced (Brown et al., 2016). More specifically, the key aspects of big data are volume, variety, velocity, variability, and veracity (the 5 Vs). This definition was developed by data scientists who create complex algorithms to deal with big data. Nursing researchers, in contrast, focus on analyzing big data in order to support evidence-based nursing practice. As Wong et al. (2016) suggested, “sample equals to population” is a suitable definition of big data in nursing.

The adoption of electronic health records (EHRs), decreased DNA sequencing costs, and the development of sophisticated databases have enabled big data analysis for healthcare. This post discusses the potential impacts, an example, the challenges, and the future directions of using a big data approach to healthcare.


2. Improvement in technology has led to better EHRs

Prior to the EHR, a patient’s medical records consisted of handwritten notes, typed reports, and test results stored in a paper file system. Beginning in 1991, the Institute of Medicine (IOM) sponsored studies that led to the concepts we have today for electronic health records. The combination of several important studies set in motion economic and political forces that drove the transformation of our EHR systems.

1) Health Safety
In 1997, based on the results of two studies (in New York using 1984 data and in Colorado and Utah using 1992 data) of large samples (over 33.6 million) of hospital admissions, the IOM reported that 44,000 – 98,000 people die in hospitals each year as a result of medical errors. Forty-four percent of the errors were found to be preventable.

2) Health Costs
The IOM also reported in 1999, based on a study by the Center for Information Technology Leadership (CITL), that $44 billion could be saved annually by installing EHR systems.

3) Changing Society
The Internet, one of the strongest forces for social change in the past two decades, has affected healthcare. Millions of pieces of health-related information are readily available on the web. The increased use of smartphones and other mobile devices to track and store heart rate, respiration rate, blood pressure, and blood glucose levels provides self-generated patient health data.

Reference: Gartee, R. (2016); Kohn, L. T. (2009) 
Table 1.  Non-federal acute care hospital electronic health record adoption. (Office of the National Coordinator for Health Information Technology, 2015)

Since the passage of the American Recovery and Reinvestment Act of 2009 (ARRA), EHR adoption has increased rapidly and a strong emphasis has been placed on standards and certification, so researchers can share a greater volume of healthcare data. Under ARRA, approximately $27 billion will be spent on EHR incentives over the next several years. In 2015, over 80% of all non-federal acute care hospitals had adopted a Basic EHR with clinician notes. Almost all non-federal acute care hospitals possess an EHR certified by the U.S. Department of Health and Human Services (HHS).

Patient-related data consist of demographics, insurance status, residential zip code, medical conditions, smoking history, personal identifiers (e.g., for linkage to the National Death Index), etc. Hospital-generated data contain the reason for the visit, diagnosis, immunizations, laboratory and other diagnostic tests, procedures, treatments, medications, etc. (DeFrances, 2016).

3. Decreased DNA sequencing cost has allowed for more big data genetic analyses

Decreasing DNA sequencing costs (from about a hundred million dollars per genome in 2001 to about a thousand dollars in 2015) enabled the formation of both large National Cancer Institute (NCI) consortia and individual laboratories that focus on more specific cancer research questions. As the amount of DNA sequence data increases, the need for new paradigms for data storage and analysis becomes significantly more important.
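To put that drop in perspective, the two figures quoted above can be turned into a fold-change and an average yearly rate of decline. This is only a back-of-the-envelope sketch: the dollar amounts are the rounded values from the text, and the constant yearly rate is an assumption made for illustration (the real cost curve did not fall smoothly).

```python
# Rounded figures from the text; the published cost series is not this smooth.
COST_2001 = 100_000_000  # approximate dollars per genome in 2001
COST_2015 = 1_000        # approximate dollars per genome in 2015
YEARS = 2015 - 2001


def fold_reduction(start_cost, end_cost):
    """How many times cheaper the end cost is than the start cost."""
    return start_cost / end_cost


def average_yearly_decline(start_cost, end_cost, years):
    """Constant yearly fractional decline that would take start_cost to end_cost."""
    return 1 - (end_cost / start_cost) ** (1 / years)


fold = fold_reduction(COST_2001, COST_2015)
decline = average_yearly_decline(COST_2001, COST_2015, YEARS)

print(f"{fold:,.0f}-fold cheaper over {YEARS} years")
print(f"roughly {decline:.0%} cheaper each year, if the decline were constant")
```

Under these assumptions the cost fell by a factor of about one hundred thousand, which is why storage and analysis, rather than sequencing itself, have become the bottleneck.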

Figure 2. The cost of sequencing a human genome (National Human Genome Research Institute, 2016)

4. Why do we need more sophisticated databases?


We need a unified data system (database) that promotes sharing of genomic and clinical data between researchers. The NCI’s Genomic Data Commons (GDC) provides the cancer research community with a unified and standardized data repository, as illustrated in the accompanying image. This repository enables data sharing across cancer genomic research studies.

The GDC currently contains data from 2 large-scale programs spanning 38 different cancer types and 14,551 cancer cases, with more data being continually integrated. Certain high-level genomic data is open access, meaning no authentication or authorization is necessary for access. Other data is controlled access, requiring principal investigators (PIs) to obtain approval before downloading the data. The GDC also provides several tools for visualizing and analyzing genomics data within the web page.
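The difference between open and controlled access can be sketched as a simple rule. The class, function, and file names below are hypothetical and are not part of the GDC's actual software or API; the sketch only illustrates the idea that open data needs no approval while controlled data does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical description of a file in a shared repository."""
    name: str
    access: str  # "open" or "controlled"


def can_download(data_file, approved_investigator=False):
    """Open files need no approval; controlled files need an approved requester."""
    if data_file.access == "open":
        return True
    if data_file.access == "controlled":
        return approved_investigator
    raise ValueError(f"unknown access level: {data_file.access!r}")


# Invented example files, for illustration only.
files = [
    DataFile("summary_expression.tsv", "open"),
    DataFile("aligned_reads.bam", "controlled"),
]

for f in files:
    print(f.name, can_download(f), can_download(f, approved_investigator=True))
```

In practice the approval step involves a formal data access request, which this sketch reduces to a single true/false flag.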


5. Potential impact of a big data approach to healthcare
Impact of using clinical big data
- Greater volume of data: Data can be obtained for all inpatient, ambulatory, and outpatient visits, including those of self-pay patients, charity care patients, and prisoners. Rare conditions and new procedures are also captured.
- More clinical depth and richness: Collecting clinical information without the need for medical record abstraction*.
*Definition of medical record abstraction: “the process in which a human manual data entry effort where organizationally-defined, clinically relevant data elements that are not being electronically converted”.
- More secure and timely data: Data become available sooner and more securely, with less burden on the data provider.
- Linkage: Big data research makes it possible to link hospital data with other databases (National Death Index, Medicare and Medicaid claims, etc.).
- Cost-effectiveness: Funding is always tight for research, so using secondary data is a cost-effective way of generating information and answering research questions.

Reference: DeFrances, C. (2016, May 19)
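Linking hospital data to another database can be pictured as a join on a shared identifier. The sketch below uses made-up records and invented field names. Real linkage projects typically work with protected identifiers and often need probabilistic matching, neither of which is shown here.

```python
# Synthetic records; field names are invented for illustration.
hospital_visits = [
    {"patient_id": "A1", "diagnosis": "pneumonia", "discharge_year": 2014},
    {"patient_id": "B2", "diagnosis": "hip fracture", "discharge_year": 2015},
    {"patient_id": "C3", "diagnosis": "asthma", "discharge_year": 2015},
]
death_records = [
    {"patient_id": "B2", "death_year": 2016},
]


def link_records(visits, deaths, key="patient_id"):
    """Attach a death year to each visit when the identifier matches, else None."""
    deaths_by_id = {record[key]: record for record in deaths}
    linked = []
    for visit in visits:
        match = deaths_by_id.get(visit[key])
        death_year = match["death_year"] if match else None
        linked.append({**visit, "death_year": death_year})
    return linked


linked = link_records(hospital_visits, death_records)
for row in linked:
    print(row)
```

Every visit is kept, and only the matching patient gains a death year, so outcomes after discharge can be studied without new data collection.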

Impact of using genomic big data
- Cost- and time-effectiveness: Improvements in technology have reduced the cost of DNA sequencing and sample preparation. One hundred and ninety-two sequencing libraries can be produced in a single day with a cost of about $15 per sample (Rohland and Reich, 2012).


6. Big data in action:  a personal story
One of the leading health care chief information officers (CIOs) in the world shares a personal story that shows how data and analytics can benefit patients and catalyze positive outcomes in health care delivery. His wife, of Asian descent, was diagnosed with stage IIIA breast cancer in 2011. He and his team queried data from all the Harvard hospitals on treatment and outcomes for the last 10,000 Asian females with a tumor similar to his wife’s. The data revealed a medication that would be most effective for her specific case. The CIO credits his wife’s full recovery to the findings that he was able to draw from this analysis.



7. Challenges of a big data approach to healthcare
Challenges of using clinical big data
- Unstructured data: Experts from International Data Corporation estimated that more than 80% of health care data were unstructured (Martin-Sanchez and Verspoor, 2014). A large proportion of the unstructured data is text-based. Raw data from an EHR system or other databases often have not been set up or developed for research, so unstructured big data is of limited use for answering specific research questions. Datasets like these require a lot of data management, data manipulation, and storage.
- Incompatibility among the EHR systems provided by different vendors: There are more than 600 EHR vendors. Many of the vendors provide heavily customized versions of their EHR systems for each client. Therefore, most EHRs are not fully interoperable in the core functions (Gliklich, Dreyer, and Leavy, 2014).
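To illustrate why text-based data needs so much data management, here is a minimal sketch that pulls two vital signs out of a free-text note. The note and the patterns are invented for this example. Real clinical text is far messier, and production systems rely on dedicated natural language processing tools rather than two regular expressions.

```python
import re

# Patterns for one common way of writing vitals; many other spellings exist.
BP_PATTERN = re.compile(r"\bBP\s*:?\s*(\d{2,3})\s*/\s*(\d{2,3})\b", re.IGNORECASE)
HR_PATTERN = re.compile(r"\b(?:HR|heart rate)\s*:?\s*(\d{2,3})\b", re.IGNORECASE)


def extract_vitals(note):
    """Return structured vitals found in a free-text note (None when absent)."""
    vitals = {"systolic": None, "diastolic": None, "heart_rate": None}
    bp = BP_PATTERN.search(note)
    if bp:
        vitals["systolic"] = int(bp.group(1))
        vitals["diastolic"] = int(bp.group(2))
    hr = HR_PATTERN.search(note)
    if hr:
        vitals["heart_rate"] = int(hr.group(1))
    return vitals


# Invented notes, not real patient data.
note_with_vitals = "Pt seen in clinic. BP 132/84, HR: 76. Reports mild headache."
note_without_vitals = "Patient called to reschedule the follow-up appointment."

print(extract_vitals(note_with_vitals))
print(extract_vitals(note_without_vitals))
```

Every additional way of writing the same measurement needs another rule, which is one reason unstructured data is expensive to prepare for research.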

Challenges of using genomic big data
- Reproducibility of genomic research: Genomic data from NCI is already standardized, so we do not need to manipulate the raw data. However, there are some reproducibility issues in analyzing genomic big data.

References: Interviews with Dr. Anne C. Kirchhoff and Dr. Younghee Lee.


8. Future directions of a big data approach to healthcare
Future directions of using clinical big data
- Data standardization and normalization in clinical systems is based on the Health Level Seven International (HL7) implementation guide.
- Functional interoperability among different systems: The goal of functional interoperability is to create and adopt effective open standards. Health information technology (HIT) systems, including some EHRs, have already been used to set up registry databases for some time (Gliklich, Dreyer, and Leavy, 2014).
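What standardization buys can be sketched with two imaginary vendors that export the same patient facts in different layouts. The field names, date formats, and codes below are all hypothetical, and the sketch is not an implementation of any HL7 standard. It only shows the kind of mapping that a shared standard would make unnecessary.

```python
from datetime import datetime

# Hypothetical export layouts from two imaginary EHR vendors.
vendor_a_record = {"PatientID": "A1", "DOB": "03/15/1980", "Sex": "F"}
vendor_b_record = {"pid": "B2", "birth_date": "1975-11-02", "gender": "male"}

SEX_CODES = {"f": "female", "female": "female", "m": "male", "male": "male"}


def normalize_vendor_a(record):
    """Map vendor A's layout (MM/DD/YYYY dates, one-letter sex) to a common schema."""
    birth = datetime.strptime(record["DOB"], "%m/%d/%Y").date()
    return {
        "patient_id": record["PatientID"],
        "birth_date": birth.isoformat(),
        "sex": SEX_CODES[record["Sex"].lower()],
    }


def normalize_vendor_b(record):
    """Map vendor B's layout (ISO dates, spelled-out gender) to the same schema."""
    birth = datetime.strptime(record["birth_date"], "%Y-%m-%d").date()
    return {
        "patient_id": record["pid"],
        "birth_date": birth.isoformat(),
        "sex": SEX_CODES[record["gender"].lower()],
    }


common_a = normalize_vendor_a(vendor_a_record)
common_b = normalize_vendor_b(vendor_b_record)
print(common_a)
print(common_b)
```

With hundreds of vendors and heavy per-client customization, writing one such mapping per source does not scale, which is the practical argument for adopting open standards.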

Future directions of using genomic big data
- Constructing a regulatory (gene expression) map of epigenetic (environmental) and genetic factors.

References: Interviews with Dr. Anne C. Kirchhoff and Dr. Younghee Lee.


9. Summary  
The adoption of EHR systems and decreased DNA sequencing costs enabled a big data approach to health care. We need unified data systems (databases) that promote sharing of genomic and clinical data between researchers. Databases also make cost- and time-effective data analytics possible. Barriers to using a big data approach to healthcare are missing or unstructured raw data, incompatibility among different database systems, and reproducibility of already published research. Data standardization and functional interoperability among clinical data systems would help to solve the problems of using a big data approach to healthcare.      


10. References
Image and table references
National Human Genome Research Institute. (2016, July 6). The cost of sequencing a human genome. Retrieved August 07, 2017, from https://www.genome.gov/sequencingcosts/

Office of the National Coordinator for Health Information Technology. (2015). Non-federal acute care hospital electronic health record adoption. Retrieved August 03, 2017, from

Schadt, E. E. (2012, November 9). The changing privacy landscape in the era of big data [Digital image]. Retrieved August 9, 2017, from http://msb.embopress.org/content/8/1/612

The NCI Genomic Data Commons [Digital image]. (2016, October 17). Retrieved August 8, 2017, from https://www.cancer.gov/about-nci/organization/ccg/research/computational-genomics/gdc

Link references
Electronic health record. (2017, July 15). Retrieved July 18, 2017, from https://en.wikipedia.org/wiki/Electronic_health_record

Google Trends (n.d.). Retrieved July 18, 2017, from https://trends.google.com/trends/explore?date=all&q=big%20data

Health Level Seven International - Homepage. (n.d.). Retrieved July 18, 2017, from http://www.hl7.org/

National Cancer Institute. (n.d.). Retrieved July 18, 2017, from https://www.cancer.gov/

National Cancer Institute Genomic Data Commons. (n.d.). Retrieved August 8, 2017, from https://gdc.cancer.gov/

National Death Index. (2017, March 07). Retrieved July 18, 2017, from https://www.cdc.gov/nchs/ndi/index.htm

The Institute of Medicine (IOM). (n.d.). Retrieved August 8, 2017, from https://nam.edu/

The U.S. Department of Health and Human Services (HHS). (n.d.). Retrieved August 8, 2017, from https://www.hhs.gov/

Other references
Capella's Master of Science in Analytics program. (n.d.). 4 examples of data analytics in action. Retrieved July 18, 2017, from https://www.capella.edu/rio/it-hub/analytics-careers/big-data-and-analytics/

Brown, J. A., Chonghaile, T. N., Matchett, K. B., Lynam-Lennon, N., & Kiely, P. A. (2016). Big Data–Led Cancer Research, Application, and Insights. Cancer Research, 76(21), 6167-6170.

DeFrances, C. (2016, May 19). Electronic Health Records and “Big Data” for Health Care. Centers for Disease Control and Prevention (CDC). Retrieved July 18, 2017, from https://www.cdc.gov/nchs/data/bsc/bscpres_defrances_may_2016.pdf

Gartee, R. (2016). Electronic health records: understanding and using computerized medical records. Prentice Hall.

Gliklich, R. E., Dreyer, N. A., & Leavy, M. B. (2014). Interfacing registries with electronic health records.

Kohn, L. T. (2009). To err is human: building a safer health system. Washington, DC: National Academy Press. https://www.ncbi.nlm.nih.gov/books/NBK225187/#rrr00026

Martin-Sanchez, F., & Verspoor, K. (2014). Big Data in Medicine is Driving Big Changes. Yearbook of Medical Informatics, 9(1), 14–20. http://doi.org/10.15265/IY-2014-0020

Rohland, N., & Reich, D. (2012). Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome Research, 22(5), 939–946. http://doi.org/10.1101/gr.128124.111

Wong, H. T., Chiang, V. C. L., Choi, K. S., & Loke, A. Y. (2016). The need for a definition of big data for nursing science: a case study of disaster preparedness. International Journal of Environmental Research and Public Health, 13(10), 1015.
