Se-Hee Jung, BSN
PhD student, College of Nursing, University of
Utah
§ What is big data?
§ Technology advances for big data
§ How big data is being used in health care
§ Benefits of big data and the future of health care
§ Challenges of big data
1. Introduction
Big data is a hot topic in healthcare. Searches for the keyword “big data” began increasing in 2011 (Google Trends). The term “big data” can mean different things to different people. It is usually characterized by three fundamental properties: the size of the information, the many kinds of complex information, and the speed at which data are produced (Brown et al., 2016). In more detail, the key aspects of big data are volume, variety, velocity, variability, and veracity (the 5Vs). This definition was developed by data scientists who create complex algorithms to deal with big data.
Nursing researchers focus on analyzing big data to support evidence-based nursing practice. As Wong et al. (2016) suggested, “sample equals to population” is a suitable definition of big data in nursing.
The adoption of electronic health records (EHRs), decreased DNA sequencing costs, and the development of sophisticated databases have enabled big data analysis in healthcare. This post discusses the potential impacts, an example, the challenges, and the future directions of a big data approach to healthcare.
2. Improvement in technology has led to better EHRs
Prior to the EHR, a patient’s medical
records consisted of handwritten notes, typed reports, and test results stored
in a paper file system. Beginning in 1991, the
Institute of Medicine (IOM) sponsored studies that led to the concepts we
have today for electronic health records. The combination of several important
studies set in motion economic and political forces that drove the
transformation of our EHR systems.
1) Health Safety
In 1999, the IOM reported that 44,000 to 98,000 people die in hospitals each year as a result of medical errors. The estimate came from two studies of hospital admissions (one in New York using 1984 data, one in Colorado and Utah using 1992 data), extrapolated to the more than 33.6 million U.S. hospital admissions in 1997. Forty-four percent of the errors were found to be preventable.
2) Health Costs
The IOM also reported in 1999, based on a study by the Center for Information Technology Leadership (CITL), that $44 billion could be saved annually by installing EHR systems.
3) Changing Society
The Internet, one of the strongest forces for social change in the past two decades, has affected healthcare. Millions of pieces of health-related information are readily available on the web. The increased use of smartphones and other mobile devices to track and store heart rate, respiration rate, blood pressure, and blood glucose levels provides self-generated patient health data.
Reference: Gartee, R. (2016); Kohn, L. T. (2009)
Table 1. Non-federal acute care hospital electronic health record adoption (Office of the National Coordinator for Health Information Technology, 2015)
Since the passage of the American Recovery and Reinvestment Act of 2009 (ARRA), EHR adoption has increased rapidly, and a strong emphasis has been placed on standards and certification so that researchers can share a greater volume of healthcare data. Under ARRA, approximately $27 billion will be spent on EHR incentives over the next several years. In 2015, over 80% of all non-federal acute care hospitals had adopted a Basic EHR with clinician notes. Almost all non-federal acute care hospitals possess an EHR certified by the U.S. Department of Health and Human Services (HHS).
Patient-related data consist of demographics, insurance status, residential zip code, medical conditions, smoking history, personal identifiers (e.g., for linkage to the National Death Index), etc. Hospital-generated data include reason for visit, diagnosis, immunizations, laboratory and other diagnostic tests, procedures, treatments, medications, etc. (DeFrances, 2016).
3. Decreased DNA sequencing cost has allowed for more big data genetic analyses.
Decreasing DNA sequencing costs (from about $100 million per genome in 2001 to about $1,000 in 2015) enabled the formation of both large National Cancer Institute (NCI) consortia and individual laboratories that focus on more specific cancer research questions. As the amount of DNA sequence data increases, the need for new paradigms for data storage and analysis becomes significantly more important.
Figure 2. The cost of sequencing a human genome (National Human Genome Research Institute, 2016)
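To put the cost drop quoted above in perspective, here is a rough back-of-the-envelope calculation. It uses only the two rounded figures from this post and assumes a steady exponential decline between them, which is a simplification (the real curve in Figure 2 is not smooth).

```python
import math

# Rounded figures quoted in this post (NHGRI, 2016); not exact values.
cost_2001 = 100_000_000  # US dollars per genome, about 2001
cost_2015 = 1_000        # US dollars per genome, about 2015
years = 2015 - 2001

# How many times cheaper sequencing became over the period.
fold_reduction = cost_2001 / cost_2015

# Number of times the cost halved, and the average time per halving,
# ASSUMING a steady exponential decline between the two points.
halvings = math.log2(fold_reduction)
halving_time_years = years / halvings

print(f"Fold reduction: {fold_reduction:,.0f}x")
print(f"Average halving time: {halving_time_years:.2f} years")
```

Under that assumption the cost halved a little more often than once a year on average, which helps explain why data storage and analysis became a bottleneck so quickly.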
4. Why do we need more sophisticated databases?
Figure 3. The National Cancer Institute Genomic Data Commons (2016)
We need a unified data system (database) that promotes sharing of genomic and clinical data between researchers. The NCI’s Genomic Data Commons (GDC) provides the cancer research community with a unified and standardized data repository, as shown in Figure 3. This repository enables data sharing across cancer genomic research studies.
The GDC currently contains data from 2 large-scale programs spanning 38 different cancer types and 14,551 cancer cases, with more data being continually integrated. Certain high-level genomic data are open access, meaning no authentication or authorization is necessary for access. Other data are controlled access, requiring principal investigators (PIs) to obtain approval to download the data. The GDC also provides several tools for visualizing and analyzing genomic data on its website.
5. Potential impact of a big data approach to healthcare
Impact of using clinical big data
- Greater volume of data: All inpatient, ambulatory, and outpatient visits can be obtained, including those of self-pay patients, charity-care patients, and prisoners. Rare conditions and new procedures should also be included.
- More clinical depth and richness: Clinical information can be collected without the need for medical record abstraction*.
*Medical record abstraction: a manual data-entry process in which a person captures organizationally defined, clinically relevant data elements that are not converted electronically.
- More secure and timely data: Data become available sooner and more securely, with less burden on the data provider.
- Linkage: Big data research makes it possible to link hospital data with other databases (National Death Index, Medicare and Medicaid claims, etc.).
- Cost-effectiveness: Funding is always tight for research, so using secondary data is a cost-effective way of generating information and answering research questions.
Reference: DeFrances, C. (2016, May 19)
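The linkage between hospital data and other databases mentioned above can be pictured as a join on a shared identifier. The following is only a minimal sketch: the records, the `patient_id` field, and the `link_records` helper are all invented for illustration, and real linkage to sources such as the National Death Index typically relies on several identifiers and probabilistic matching rather than one exact key.

```python
# Hypothetical records; identifiers and field names are invented for illustration.
hospital_visits = [
    {"patient_id": "A001", "diagnosis": "pneumonia", "discharge_year": 2014},
    {"patient_id": "A002", "diagnosis": "heart failure", "discharge_year": 2014},
    {"patient_id": "A003", "diagnosis": "hip fracture", "discharge_year": 2015},
]
death_records = [
    {"patient_id": "A002", "death_year": 2015},
]


def link_records(visits, deaths, key="patient_id"):
    """Left-join visits to death records on one exact shared identifier."""
    deaths_by_key = {d[key]: d for d in deaths}
    linked = []
    for visit in visits:
        match = deaths_by_key.get(visit[key])
        # Keep every visit; death_year stays None when no death record matches.
        linked.append({**visit, "death_year": match["death_year"] if match else None})
    return linked


linked = link_records(hospital_visits, death_records)
for row in linked:
    print(row)
```

Every visit is kept, and only the patient with a matching death record gets a `death_year`, which is the kind of outcome variable a hospital dataset alone cannot supply.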
Impact of using genomic big data
- Cost- and time-effectiveness: Improvements in technology have reduced the cost of DNA sequencing and sample preparation. For example, 192 sequencing libraries can be produced in a single day at a cost of about $15 per sample (Rohland and Reich, 2012).
6. Big data in action: a personal story
One of the world’s leading health care chief information officers (CIOs) shares a personal story that shows how data and analytics can benefit patients and catalyze positive outcomes in health care delivery. His wife, who is of Asian descent, was diagnosed with stage IIIA breast cancer in 2011. He and his team queried data from all the Harvard hospitals on treatments and outcomes for the last 10,000 Asian women with a tumor similar to his wife’s. The data revealed the medication that would be most effective for her specific case. The CIO credits his wife’s full recovery to the findings he was able to draw from this analysis.
Reference: 4 examples of data analytics in action. (Capella's Master
of Science in Analytics program, n.d.).
7. Challenges of a big data approach to healthcare
Challenges of using clinical big data
- Unstructured data: Experts from International Data Corporation estimated that more than 80% of health care data are unstructured (Martin-Sanchez and Verspoor, 2014). A large proportion of the unstructured data is text-based. Raw data from an EHR system or other databases often have not been set up or developed for research, so unstructured big data are of limited use for answering specific questions. Datasets like these require a lot of data management, data manipulation, and storage.
- Incompatibility among the EHR systems provided by different vendors: There are more than 600 EHR vendors. Many of the vendors provide heavily customized versions of their EHR systems for each client. Therefore, most EHRs are not fully interoperable in their core functions (Gliklich, Dreyer, and Leavy, 2014).
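To illustrate why text-based, unstructured data need so much manipulation before analysis, here is a small sketch that pulls two vital signs out of a free-text note with regular expressions. The note, the patterns, and the `extract_vitals` helper are invented for this example. Real clinical notes vary far more in wording and layout, so patterns like these would miss many cases.

```python
import re

# Hypothetical free-text note; the wording and layout are invented.
note = "Pt is a 63 y/o female. BP 142/88 mmHg, HR 76 bpm. Former smoker."

# Patterns written for this one example only.
BP_PATTERN = re.compile(r"\bBP\s+(\d{2,3})/(\d{2,3})\b")
HR_PATTERN = re.compile(r"\bHR\s+(\d{2,3})\b")


def extract_vitals(text):
    """Return a dict of vitals found in the text; missing values stay None."""
    vitals = {"systolic": None, "diastolic": None, "heart_rate": None}
    bp = BP_PATTERN.search(text)
    if bp:
        vitals["systolic"] = int(bp.group(1))
        vitals["diastolic"] = int(bp.group(2))
    hr = HR_PATTERN.search(text)
    if hr:
        vitals["heart_rate"] = int(hr.group(1))
    return vitals


vitals = extract_vitals(note)
print(vitals)
```

Even this tiny case needs one hand-written pattern per data element, which hints at the data management effort that a full EHR export requires.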
Challenges of using genomic big data
- Reproducibility of genomic research: Genomic data
from NCI is already standardized, so we do not need to manipulate the raw data.
However, there are some reproducibility issues in analyzing genomic big data.
References:
Interviews with Dr. Anne C. Kirchhoff and Dr. Younghee Lee.
8. Future directions of a big data approach to healthcare
Future directions of using clinical big data
- Data standardization and normalization in clinical systems, based on the Health Level Seven International (HL7) implementation guides.
- Functional interoperability among different systems: The goal of functional interoperability is to create and adopt effective open standards. Health information technology (HIT) systems, including some EHRs, have already been used to set up registry databases for some time (Gliklich, Dreyer, and Leavy, 2014).
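As a rough picture of what standardization and normalization mean in practice, the sketch below maps records from two imaginary EHR vendors into one common layout. All field names, codes, and helper functions here are invented for illustration. They are not taken from HL7 or from any real EHR product, and an actual implementation would follow the relevant HL7 implementation guide instead.

```python
# Hypothetical vendor exports; field names, codes, and units are invented
# and do not come from HL7 or any real EHR product.
vendor_a_record = {"pt_id": "123", "sex": "F", "weight_lb": 150.0}
vendor_b_record = {"patientId": "456", "gender": "female", "weightKg": 70.0}

LB_TO_KG = 0.45359237  # one pound expressed in kilograms


def normalize_vendor_a(rec):
    """Map vendor A's layout (coded sex, weight in pounds) to the common layout."""
    return {
        "patient_id": rec["pt_id"],
        "sex": {"F": "female", "M": "male"}.get(rec["sex"], "unknown"),
        "weight_kg": round(rec["weight_lb"] * LB_TO_KG, 1),
    }


def normalize_vendor_b(rec):
    """Map vendor B's layout (spelled-out sex, weight in kilograms) to the common layout."""
    return {
        "patient_id": rec["patientId"],
        "sex": rec["gender"].lower(),
        "weight_kg": round(rec["weightKg"], 1),
    }


common = [normalize_vendor_a(vendor_a_record), normalize_vendor_b(vendor_b_record)]
for row in common:
    print(row)
```

Once both records share the same field names, codes, and units, they can be pooled and analyzed together, which is the practical payoff of an agreed open standard.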
Future directions of using genomic big data
- Constructing a regulatory (gene expression) map of epigenetic (environmental) and genetic factors.
References:
Interviews with Dr. Anne C. Kirchhoff and Dr. Younghee Lee.
9. Summary
The adoption of EHR
systems and decreased DNA sequencing costs enabled a big data approach to health care. We need unified data systems
(databases) that promote sharing of genomic and clinical data between
researchers. Databases also make cost- and time-effective data analytics
possible. Barriers to using a big data
approach to healthcare are missing or unstructured raw data, incompatibility
among different database systems, and reproducibility of already published
research. Data standardization and functional interoperability among clinical
data systems would help to solve the problems of using a big data approach to healthcare.
10. References
Image and table references
National Human
Genome Research Institute. (2016, July 6). The cost of sequencing a human
genome. Retrieved August 07, 2017, from https://www.genome.gov/sequencingcosts/
Office of the National Coordinator for Health
Information Technology. (2015). Non-federal acute care hospital electronic
health record adoption. Retrieved August 03, 2017, from
Schadt, E. E.
(2012, November 9). The changing privacy landscape in
the era of big data [Digital image]. Retrieved August 9, 2017, from http://msb.embopress.org/content/8/1/612
The NCI Genomic Data Commons [Digital image].
(2016, October 17). Retrieved August 8, 2017, from https://www.cancer.gov/about-nci/organization/ccg/research/computational-genomics/gdc
Link references
Electronic health record. (2017, July 15). Retrieved July
18, 2017, from https://en.wikipedia.org/wiki/Electronic_health_record
Google Trends (n.d.). Retrieved July 18, 2017, from https://trends.google.com/trends/explore?date=all&q=big%20data
Health Level Seven International - Homepage. (n.d.).
Retrieved July 18, 2017, from http://www.hl7.org/
National Cancer Institute Genomic Data Commons. (n.d.). Retrieved
August 8, 2017, from https://gdc.cancer.gov/
National Death Index. (2017, March 07). Retrieved July 18,
2017, from https://www.cdc.gov/nchs/ndi/index.htm
The Institute of Medicine (IOM). (n.d.). Retrieved August 8,
2017, from https://nam.edu/
The U.S. Department of Health and Human Services (HHS). (n.d.).
Retrieved August 8, 2017, from https://www.hhs.gov/
Other references
Capella's Master of Science in Analytics program. (n.d.). 4
examples of data analytics in action. Retrieved July 18, 2017, from https://www.capella.edu/rio/it-hub/analytics-careers/big-data-and-analytics/
Brown, J. A., Chonghaile, T. N., Matchett, K. B., Lynam-Lennon, N., & Kiely, P. A. (2016). Big Data–Led Cancer Research, Application, and Insights. Cancer Research, 76(21), 6167-6170.
DeFrances, C. (2016, May 19). Electronic Health Records and “Big Data” for Health Care. Centers for Disease Control and Prevention (CDC). Retrieved July 18, 2017, from https://www.cdc.gov/nchs/data/bsc/bscpres_defrances_may_2016.pdf
Gartee, R. (2016). Electronic
health records: understanding and using computerized medical records.
Prentice Hall.
Gliklich, R. E., Dreyer, N.
A., & Leavy, M. B. (2014). Interfacing registries with electronic health
records.
Kohn, L. T. (2009). To
err is human: building a safer health system. Washington, DC: National
Academy Press. https://www.ncbi.nlm.nih.gov/books/NBK225187/#rrr00026
Martin-Sanchez, F., &
Verspoor, K. (2014). Big Data in Medicine is Driving Big Changes. Yearbook
of Medical
Informatics, 9(1), 14–20.
http://doi.org/10.15265/IY-2014-0020
Rohland, N., & Reich, D. (2012). Cost-effective,
high-throughput DNA sequencing libraries for multiplexed target capture. Genome Research,
22(5), 939–946. http://doi.org/10.1101/gr.128124.111
Wong, H. T.,
Chiang, V. C. L., Choi, K. S., & Loke, A. Y. (2016). The need for a
definition of big data for nursing science: a case study of disaster preparedness. International
journal of environmental research and public health, 13(10), 1015.