How big data will save your life

Dr. Robert Walker, director of health innovation for the U.S. Army Surgeon General, has been more a frustrated data entry clerk in recent years than a physician, a frustration shared by thousands of his colleagues.

Instead of freeing him for more face-time with patients, the electronic health record (EHR) system he uses has become a third person in the exam room, drawing his attention away from patients. The issue isn't the EHR Walker uses, however; it's the shortcomings of technology in general.

"The electronic medical record has become an impediment versus something that was going to streamline your day," Walker explained in a recent interview. "It took the focus away from the patient and put it all on the computer. People are clicking boxes and turning their backs to patients. It's all about jamming data into this thing."

EHRs make it possible for every medical care facility to electronically capture a patient's family history, illnesses, treatments and current lifestyle. The promise of EHRs was that they would save the U.S. healthcare system up to $81 billion a year by streamlining workflows and creating massive clinical data warehouses that could be mined for information to improve preventive care and disease treatment.

That has not yet happened, and doctors are less enamored with EHRs as a result. Last month, the American College of Physicians and AmericanEHR Partners released a survey of 4,279 physicians that showed fully 39% of them would not recommend their EHR to a colleague. That's up from 24% who felt that way in 2010. And 34% said they are "very dissatisfied" with the ability of EHRs to decrease workload.

Under the auspices of the Health Information Technology for Economic and Clinical Health (HITECH) Act, the U.S. government is requiring healthcare providers -- hospitals, clinics and private practices alike -- to implement EHRs. Providers must also prove their meaningful use of those systems through a three-stage government process that is taking place over the next four years.

Despite what has so far been an uneven rollout of EHRs in the U.S., Walker and others are already, in effect, building a treasure trove of patient information that can be tapped to improve patient care, a repository that will revolutionize medicine for decades to come. That is, if everyone can figure out how to categorize it, sort it and access it easily.

The promise

Big data analytics engines such as Hadoop have the capability to mine the clinical data warehouses created by EHRs, warehouses filled with valuable unstructured data that can be used to help doctors make decisions about patient treatment.

Today, physicians and pharmaceutical companies still rely largely on textbooks and very small clinical studies that typically use otherwise healthy patients with only one disease. That pool of subjects hardly mimics most real-world patients, many of whom have more than one health problem.

About 25% of hospitals use some form of data analytics to mine traditional databases to learn more about past treatments and about how future treatments can be improved. But, what is contained in the columns and rows of databases represents an almost insignificant portion of the information about patients that's been collected; the most important information lies in unstructured data - the physicians' notes, radiological images and lifestyle information gathered from patients using mobile devices.

"That's the real renaissance that's going to happen in health care," Walker said. "With big data, what happens in a doctor's office is going to be vastly different from what we see today. The top five or 10 things that people die from in America are life-style induced. That's absurd. Maybe instead of vital signs, I'm just going to look at what you buy in a grocery store."

Today, data analytics in most hospitals is used to manage costs and increase the quality of care. The more promising use for big data, however, is the ability to discover treatment-and-outcome correlations using physician and nurse notes and data driven by genetic profiles.

By combining big data and genetics analytics, scientists today can determine how a patient will react to a medication and may someday even be able to predict who may become ill and -- if they do -- what customized medications can best treat diseases.

"When I look at the historical growth rate, [big data] is definitely a hot application in the marketplace," said James Gaston, senior director of clinical and business intelligence at the Healthcare Information and Management Systems Society (HIMSS).

Personalized medicine

Currently, one of the more promising areas of big data analytics involves drug therapies devised through the study of genomics, also known as personalized medicine.

Genetic diseases are akin to buggy code in software; the key to finding the cause of an illness is to uncover that error in the code, according to Alexis Borisy, co-founder of Foundation Medicine, a cancer diagnostics company.

"Cancer, for example, is a disease of the genome where something has gone wrong with the programming code and a mutation occurred. There are actual errors in the code and that's a core reason why cancer develops," Borisy said.

While sequencing the first human genome took eight years and cost about $1 billion, genetic sequencing costs have fallen dramatically in the last decade. It now costs from $5,000 to $10,000 per human genome, and companies are working hard to cut that cost to $1,000 in the next few years. Sequencing a genome is becoming so inexpensive that hospitals will soon be able to do it for most patients and add the data to an EHR, according to Nigam Shah, an assistant professor of medicine at Stanford University's School of Medicine.

Shah works in biomedical informatics, meaning he works toward making sense of the information in clinical data warehouses.

Sequencing of a human genome yields a massive amount of data, and storing one person's genetic code can require up to 1TB of data storage capacity, Shah said.

The human genome contains 3.2 billion lines of code, which means that finding a flaw in that code requires sophisticated computer algorithms and massive, clustered server farms. Adding to the complexity is that disease is often the result of multiple mutations, according to Shah.

While diseases such as Huntington's or Alzheimer's disease are caused by common genetic mutations, and are more easily spotted, most illnesses are caused by rare mutations. Diabetes, for example, is thought to be caused by a number of genetic mutations, which on their own confer a small amount of risk, but in combination can be more serious.

"If you genome type someone, and out of the 50 [mutations associated with diabetes] you have 10 of them, it's very hard to say what's going to happen to you," Shah said. "Part of the problem is that we just need to do more research and collect more data, and some of it we just need better methods."
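One common way to reason about many small-effect mutations is an additive score that sums a weight for each risk variant a person carries. The following is a minimal sketch of that idea, not the method Shah or his colleagues use. The variant names and the per-variant odds ratio of 1.1 are made up for illustration, and the sketch assumes effects combine independently, which is exactly the kind of simplification that makes real predictions hard.

```python
import math


def risk_score(carried, weights):
    """Sum the log-odds weights of the risk variants a person carries."""
    return sum(weights[variant] for variant in carried)


def relative_odds(score):
    """Convert a summed log-odds score back into a combined odds ratio."""
    return math.exp(score)


# Hypothetical data: 50 variants, each with an assumed odds ratio of 1.1.
weights = {f"variant_{i}": math.log(1.1) for i in range(50)}

# A person who carries 10 of the 50 variants.
carried = [f"variant_{i}" for i in range(10)]

score = risk_score(carried, weights)
combined = relative_odds(score)  # equals 1.1 ** 10 under the independence assumption
print(f"Combined odds ratio: {combined:.2f}")
```

Even under these generous assumptions the result is only a shift in odds, not a diagnosis, which is consistent with Shah's point that carrying 10 of 50 variants says little on its own.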

But tremendous progress has been made. Scientists now know the genetic causes of about 5,000 rare diseases.

One of the most promising areas of genetic research is pharmacogenomics, which uses a person's genetic makeup to determine how they'll respond to drugs, tailoring treatments to specific mutations -- even mutations found in cancer tumors.

For example, the drug Zelboraf was developed by New York University's Cancer Institute a couple of years ago through genetic tests to target melanoma skin cancer tumors that express a gene mutation called BRAF V600E. Researchers found patients taking Zelboraf were 64% less likely to die from the advanced form of skin cancer than patients who received only standard chemotherapy.

"Looking at your genome does help in saying, 'For you, we should give half the dose of this drug, but for this other person we'll give you a double dose of that drug,'" Shah said.

Linking EHRs with genomes

Currently, there are several projects underway to link EHRs and human genomic data. Among the most promising is the Electronic Medical Records and Genomics (eMERGE) Network.

Funded by the National Human Genome Research Institute, the eMERGE network joins researchers from nine healthcare research organizations and hospitals with a wide range of expertise in genomics, statistics, ethics, informatics and clinical medicine. Up to 10,000 patients will have sequencing performed on them in reference to 83 specific genes, with another 50,000 to 80,000 patients getting more general genotypes.

The resulting data will improve genetic risk assessment, disease prevention, diagnosis and treatment, and can be used to develop genomic-based medicines, according to Dr. Gail Jarvik, head of the division of Medical Genetics at the University of Washington.

The eMERGE network includes the University of Washington, the Mayo Clinic, Boston Children's Hospital and the Geisinger Health System. The network started out looking for genes for more common diseases, using computer algorithms with EHRs to find the diseases associated with a particular genotype.

"This year, the network moved into pharmacogenetics, and it is very interested in sequencing of genes related to treatment response or adverse response to medications," Jarvik said.

Jarvik, one of the network's principal investigators, said the network has been successful in finding disease genes, immunity genes, and eye and cardiac disorders.

The eMERGE project has developed a computer algorithm that extracts disease types from a number of different EHRs at various institutions. Researchers then input the data and look for genetic markers that point to mutations responsible for diseases.

"When you move to pharmacogenetics, there are problems you can have with drugs," Jarvik said. "A drug can be ineffective, or you may have an effective use of that drug but you may need a different dose than someone else. Or you might have a bad reaction. We want to work on all those problems."

Shah and other researchers caution that many variables affect a person's health, and genomics won't be a cure-all. But the use of big data analyses can help improve patient outcomes.

Notes, images and biometrics

Genomics is only "one tiny fraction" of the myriad efforts to improve healthcare, Shah said. "For the average Joe who has hypertension, diabetes [and] high cholesterol, genomics is completely useless."

One of the most valuable tools in diagnosing and tracking patients still involves medical notes, and new natural language processing software is allowing physicians' notes to be codified into database fields that most healthcare professionals don't have time to fill out themselves.

"Textual notes are how doctors communicate with other healthcare providers about what's going on with a patient, what's the plan for treatment and what are the concerns," said Dr. Isaac S. Kohane, a professor of pediatrics and health sciences technology at Harvard Medical School & Children's Hospital.

Kohane is frustrated that it's easier to find out more about shoppers' experiences with a digital camera purchase than to determine what adverse events patients had with a particular drug. So, along with several colleagues, Kohane developed free open source software called i2b2 informatics that can collect both physician notes and other unstructured data as well as codified medical data from a patient's bedside.

The informatics platform is used by more than 100 academic health centers around the world. It has been used to pinpoint genetic predictors for diseases such as rheumatoid arthritis and to identify harmful drugs.

For example, the informatics engine revealed that there was a higher risk of heart attack from the drug Avandia than from other drugs in the same class.

When the i2b2 software was deployed in hospital emergency rooms, it was able to predict, on average, two years in advance of the typical healthcare system whether a patient was suffering from domestic abuse by detecting physical traits, Kohane said.

"At the same time, this is almost like a back door. The data is being offloaded and analyzed [after the fact]. What about real-time care of patients across healthcare systems?" he said.

In chronic care, what matters most is that a doctor be able to access clinical data warehouses that contain information on thousands of similar patients.

"What matters is the ability for the doctor to say you have these four diseases and you're taking these four drugs, here are the results of treating these other similar patients," Shah said. "There is no clinical trial that has ever looked at these four diseases and the effect of these four drugs."

When data from EHRs can be exchanged seamlessly, a physician will be able to query what thousands of other doctors did in the same situation.

"Then I want to ask myself, 'What am I worried about with this person: Am I worried about blood clots or heart attack?'" Shah said. "Then I can query what happened to the 1,000 other people who suffered a blood clot and determine ... that outcome in those people very similar to you."

"It's sort of like doing a clinical trial in silicon," Shah continued. "I refer to this whole process as practice-based medicine."
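The kind of query Shah describes can be pictured as a filter over past patient records followed by a tally of outcomes. The sketch below is purely illustrative: the `PatientRecord` fields, the condition and drug names, and the three sample records are all hypothetical, and a real clinical data warehouse would need far more careful matching than simple set containment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical, heavily simplified view of one de-identified record."""
    conditions: frozenset
    drugs: frozenset
    had_blood_clot: bool


def similar_patients(records, conditions, drugs):
    """Return records that include every listed condition and every listed drug."""
    conditions, drugs = frozenset(conditions), frozenset(drugs)
    return [r for r in records if conditions <= r.conditions and drugs <= r.drugs]


def outcome_rate(cohort):
    """Fraction of the cohort with a blood clot, or None if the cohort is empty."""
    if not cohort:
        return None
    return sum(r.had_blood_clot for r in cohort) / len(cohort)


# Made-up sample data for illustration only.
records = [
    PatientRecord(frozenset({"diabetes", "hypertension"}), frozenset({"drug_a", "drug_b"}), True),
    PatientRecord(frozenset({"diabetes", "hypertension"}), frozenset({"drug_a", "drug_b"}), False),
    PatientRecord(frozenset({"diabetes"}), frozenset({"drug_a"}), False),
]

cohort = similar_patients(records, {"diabetes", "hypertension"}, {"drug_a", "drug_b"})
rate = outcome_rate(cohort)
print(f"{len(cohort)} similar patients, blood clot rate {rate:.0%}")
```

With thousands of records instead of three, the same two steps, select the look-alike cohort and then summarize what happened to it, are what would let a physician see outcomes for patients no clinical trial has studied.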

Historically, medicine has relied on published guidelines for treatment or the results of clinical trials for drug prescriptions, which always focus on one disease and most often use only younger, healthier patients as subjects for tests.

Data pigeon holes

For example, more than 60% of cancer patients are over the age of 65 and have anywhere from two to five other chronic illnesses, such as congestive heart failure or high blood pressure. Trials with younger patients would not involve the same mix of health problems.

"You get a younger adult, in the age range of 50..., that doesn't have any diseases other than cancer," said Robert Hauser, senior director of the American Society of Clinical Oncology's (ASCO) Quality Department. "So, once a drug is developed from a trial, it ends up being used on a population that wasn't evaluated on a large scale. Right now, we only learn from 3% of all adult oncology patients because only 3% of them participate in clinical trials for drug development."

And, once a clinical trial ends, patients are no longer tracked, Hauser added.

Also hindering advances in personalized medicine is the compartmentalization of healthcare data at hospitals, private practices and even clinical trials.

Additionally, EHRs use proprietary software, meaning they don't communicate with other systems. An EHR from Meditech, for example, doesn't natively share data with one from Cerner, McKesson or Epic Systems - the four largest EHR makers in the world.

"We realize the data standards wars and interoperability issues that go on amongst EHR vendors is not something that's going to be overcome in the near future," said Josh Mann, assistant director of Oncology Technology Solutions for the ASCO.

There is, however, an industry-wide effort under way to break the logjam.

For example, the non-profit Health Level Seven International this month released standards and guidelines that enable hospitals to exchange medical information, including radiological images.

Beginning in March 2010, $564 million in federal funds were allocated to states to develop health information exchanges, which allow for the sharing of health information electronically through data translation engines that allow EHRs to share information over secure Internet links.

The federal government has developed the Nationwide Health Information Network (NwHIN), which encompasses a set of standards, web services and policies that enable the secure exchange of health information over the Internet.

Currently, health information exchanges are being adopted at the regional, or at best, state-wide levels. Some of the most significant health information sharing networks are being deployed among healthcare providers themselves or by healthcare non-profits.

For example, the ASCO recently completed building a data analytics engine that pulls together information from more than 100,000 breast cancer patients from 27 oncology practices using disparate EHR systems. While still a prototype, the system does represent one of the largest breast cancer data sets in the U.S., according to Hauser.

Built mainly on open-source software, the ASCO's CancerLinQ project is a "learning health system" that will eventually analyze data from millions of cancer patients via their EHRs. The prototype system ingests de-identified patient data from two dozen oncology practices.

"We architected the system in such a way as to be able to accept any data in any format and then we used machine-learning algorithms to identify what was sent to us," Hauser said.

Once in the database, the data is mapped to a standardized medical vocabulary such as would be contained in the World Health Organization's International Classification of Diseases (ICD).
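Mapping incoming terms to a standard vocabulary can be pictured, in its simplest form, as a normalized lookup. The sketch below swaps in a plain dictionary for the machine-learning approach Hauser describes, so it is not how CancerLinQ works. The table holds only a handful of ICD-10 category codes for illustration, and the function names are assumptions.

```python
# Tiny illustrative lookup table. A real system would load the full ICD
# release and handle far more synonyms, misspellings and context.
TERM_TO_ICD10 = {
    "high blood pressure": "I10",  # essential (primary) hypertension
    "hypertension": "I10",
    "type 2 diabetes": "E11",      # type 2 diabetes mellitus
    "breast cancer": "C50",        # malignant neoplasm of breast
}


def normalize(term):
    """Lower-case the term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def map_to_icd10(term):
    """Return the ICD-10 category for a term, or None if it is not in the table."""
    return TERM_TO_ICD10.get(normalize(term))


for raw in ["  High  Blood Pressure ", "Breast cancer", "sore elbow"]:
    print(repr(raw), "->", map_to_icd10(raw))
```

Terms that fall outside the table come back as `None`, which is where a learned classifier or a human reviewer would have to take over.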

While the prototype was built just as a proof of concept, cancer doctors will eventually be able to consult the full-scale database like a Google search. That will allow doctors to see how patients with the same types of cancer were treated around the country, and how they fared.

While currently using a NoSQL, CouchDB database backend, the ASCO is considering using Cassandra with Hadoop for the full build. That database is expected to be completed in 12 to 18 months.

Beyond helping an individual patient, big data will allow the healthcare community as a whole to detect poor drug interactions quickly. "So this gives us the ability to look at that [common cancer] population and figure out the best dosages and cycles of treatment," Hauser said.

While the ASCO is among the largest cancer research organizations, it is by no means alone in its use of big data in determining best practices.

Cleveland Clinic - a 4,500-bed healthcare system - uses an EHR from Epic Systems and a SQL transactional database for retrospective data analysis of its EHRs to improve patient treatment.

"We think first about outcomes: what data can we collect and make available to clinicians so they know how well they're doing in treating their patient," said Dr. C. Martin Harris, CIO of Cleveland Clinic.

Cleveland Clinic is also starting to use Hadoop, but it's still a small part of the research because data is internally confined.

"It may appear if we only analyze Cleveland Clinic data that we're doing well with regard to a patient, but in fact if the patient went to someone else's emergency room 10 times, we didn't know that," Harris said.

Cleveland Clinic is working with other state health plans to collect a broader swath of patient data. Along with Ohio's other largest healthcare provider, University Hospitals, Cleveland Clinic is preparing to share data across Ohio's statewide electronic medical records exchange, CliniSync.

Once on the exchange, Cleveland Clinic will be electronically linked to 21 other hospitals already using the system.

One chronic disease targeted by the Clinic's data analytics engine is diabetes. The analytics engine searches EHRs for the results of A1C tests, which provide a long-term measurement of glucose in red blood cells. Knowing a person's average, long-term glucose level can predict their likelihood of suffering other diseases such as kidney failure or stroke.
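As a rough illustration of what such a search might compute, the sketch below converts an A1C percentage to an estimated average glucose using the widely published linear relationship eAG (mg/dL) = 28.7 x A1C - 46.7, and flags records at or above a cutoff. The patient identifiers, the sample values and the 7.0 cutoff are placeholders chosen for the example, not clinical guidance and not Cleveland Clinic's actual rules.

```python
def estimated_average_glucose(a1c_percent):
    """Estimated average glucose in mg/dL from an A1C percentage."""
    return 28.7 * a1c_percent - 46.7


def flag_high_a1c(results, threshold=7.0):
    """Return the IDs whose A1C is at or above the (placeholder) threshold."""
    return [patient_id for patient_id, a1c in results.items() if a1c >= threshold]


# Made-up A1C results keyed by hypothetical patient ID.
results = {"patient_1": 6.1, "patient_2": 8.4, "patient_3": 7.0}

flagged = flag_high_a1c(results)
for patient_id in flagged:
    eag = estimated_average_glucose(results[patient_id])
    print(f"{patient_id}: A1C {results[patient_id]}%, eAG about {eag:.0f} mg/dL")
```

A flagged list like this is the sort of output that could feed alerts to both physician and patient.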

Cleveland Clinic knows the problem is multi-faceted. Patients must follow treatment regimens and choose healthy lifestyles, and physicians must have long-term data to tailor treatment. But, as Harris notes, if the patient doesn't know how they're doing at a macro level, it's more difficult for them to change their behavior.

"...That information is used to not only send alerts to the physician but also [to] the patient," Harris said. "They can become stewards of [their] own healthcare at some level."

To more directly engage patients, Cleveland Clinic allows them to enter their own data from glucose readers, either manually or automatically via a mobile device, into a personal health record (PHR). Cleveland Clinic uses Microsoft's free HealthVault cloud service as its PHR. The HealthVault application can then transfer that data to the clinic's EHR for physician and data analytics use.

"The best way to correct glucose levels is to know what's happening with a patient when they're at home, not when they're in a doctor's office," Harris said.

Also being floated in the healthcare community are scalable, less expensive and more patient-centric community health record banks. Those banks are community organizations that put patients in charge of a comprehensive copy of all their personal, private health information, including both medical records and optional information added by the patient.

The patient explicitly controls who has access to which parts of the information in his or her individual account.

Voice recognition joins big data

But, before information can be shared, it needs to make it into EHRs. One way physicians and nurses can add their notes to EHRs is with voice recognition technology.

For example, the U.S. Army has an enterprise-wide license for Nuance's Dragon Medical 360 Network Edition voice recognition software for use with its AHLTA EHR and Essentris-Inpatient System. The U.S. Veterans Administration also has 12,000 Nuance Dragon licenses integrated with its VistA EHR system.

In many cases, a physician will use voice recognition to enter observations, prognosis and treatment into a patient's electronic record.

Dr. Walker, with the U.S. Army's Surgeon General's Office, uses voice recognition technology as he examines a patient to populate their record. A wide-screen monitor in his exam room allows the patient to view the data as it's being input so any errors can be corrected, he said.

Walker believes the real game changer in medicine will be an engaged patient, one who will enter his or her own data through the use of mobile devices. And that data can include not just medical information, but also lifestyle updates involving diet and exercise.

By having a full picture of a patient's lifestyle, doctors are better equipped to help patients avoid the onset of chronic illnesses. Then, once the data is in an EHR, big data analytics engines could offer physicians information about patients who may need to adjust their caloric intake, level of activity or the amount of sleep they get.

"The answer to the obesity problem is not the operating table, but the dinner table, and that's where we need to get to," Walker said. "In this country, we're putting billions of dollars into healthcare and our life expectancies are less than in countries that spend a fraction of what we do.

"We're really doing disease care and not healthcare today," he said.