It is a truism of machine learning and predictive analytics that 80% of an analyst’s time is consumed in cleaning and preparing the needed data. I saw an estimate by a Google engineer that 25% of that time is spent just looking for the right data. A big part of this process is human-driven feature engineering: distilling, transforming and curating the data to identify and extract variables that have predictive power. A recent paper in npj Digital Medicine (a Nature journal) by a team of researchers from Google and several universities suggests that deep learning can short-circuit that time-consuming process.
Scalable and accurate deep learning with electronic health records, by Rajkomar et al., published one year ago, considers data from two hospitals on over 200,000 in-patients. The data were organized as timelines of medications, doctor and nurse visits, medical tests, diagnoses, procedures, and provider notes, following the Fast Healthcare Interoperability Resources (FHIR) standard. Data could be numerical, defined text (e.g. medication names), or free-form doctor notes. Each patient’s data can thus be visualized as a matrix, with each row representing a category of information (e.g. medication) and each column a time period. The goal was to predict in-hospital death, readmission, length of stay, and discharge diagnosis (14,025 possible codes). Overall, the data amounted to 46 billion data points.
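To picture that layout, here is a minimal sketch, using pandas, of how one patient’s event timeline might be pivoted into the category-by-time matrix described above. The events, values and time buckets are invented for illustration and are not drawn from the paper.

```python
import pandas as pd

# Illustrative FHIR-style event timeline for a single (hypothetical) patient.
# Each record is (time bucket, category of information, recorded value).
events = pd.DataFrame([
    {"time": "day_1", "category": "medication", "value": "vancomycin"},
    {"time": "day_1", "category": "lab_test",   "value": "WBC 14.2"},
    {"time": "day_2", "category": "procedure",  "value": "central line"},
    {"time": "day_2", "category": "note",       "value": "pt febrile overnight"},
    {"time": "day_3", "category": "medication", "value": "vancomycin"},
])

# Pivot into the matrix described in the text: one row per category,
# one column per time period; multiple events falling in the same cell
# are simply concatenated.
matrix = events.pivot_table(
    index="category", columns="time", values="value",
    aggfunc=lambda vals: "; ".join(vals)
)

print(matrix)
```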
In the standard data mining paradigm, the analyst would spend time and creative energy exploring and curating the data. Inevitably this means identifying and focusing on what appear to be useful and usable predictor variables. As a result, most of the data in a dataset this size are left out of the models. Plus, to reduce complexity, separate models would be developed for each of the two hospitals.
By contrast, the Google-university team simply dumped all the data into a deep learning neural network. Deep learning networks are most noted for their success in voice and image recognition, where they “learn” higher-level features from individual sound-wave or pixel values that are, individually, quite uninformative. In the FHIR context, the granular-level data are not uninformative as with image and voice data; administration of the medication Vancomycin, for example, is quite informative. Rather, they are complex and messy: one doctor’s note-taking style may be quite different from another’s, and sometimes the trade name Vancocin may be used instead of Vancomycin. Just as success or failure in labeling images guides the iterative deep learning process of making sense of pixels, so the success or failure of predictions (diagnoses, health outcomes, etc.) guides the neural network iteratively in navigating the complexity of the FHIR data.
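To give a flavor of how such a network can consume raw event streams, here is a toy sketch: each tokenized event is embedded and the time-ordered sequence is fed to an LSTM that outputs a single outcome probability. This is not the paper’s actual architecture (the authors combine several sequence models), and every name and dimension below is an illustrative assumption. Because the embeddings are learned from the prediction task itself, tokens such as Vancomycin and Vancocin can end up with similar representations without anyone hand-coding that equivalence.

```python
import torch
import torch.nn as nn

class EHRSequenceModel(nn.Module):
    """Toy sequence model over tokenized EHR events (illustrative only)."""

    def __init__(self, vocab_size: int, embed_dim: int = 32, hidden_dim: int = 64):
        super().__init__()
        # Each discrete event token (a medication code, a lab-result bucket,
        # a word from a note, ...) gets a learned embedding.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # e.g. probability of in-hospital death

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)         # (batch, time, embed_dim)
        _, (h_last, _) = self.rnn(x)      # final hidden state summarizes the timeline
        return torch.sigmoid(self.head(h_last[-1])).squeeze(-1)

# Hypothetical batch: 2 patients, 5 time-ordered event tokens each (0 = padding).
model = EHRSequenceModel(vocab_size=1000)
tokens = torch.tensor([[12, 87, 87, 431, 0],
                       [55, 12, 999, 3, 7]])
print(model(tokens))  # one predicted probability per patient
```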
The result? The deep learning model performed significantly better on all prediction tasks than did the traditional curated predictive models, as measured by the area under the receiver operating characteristic (ROC) curve.
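For readers less familiar with the metric, the snippet below shows how area under the ROC curve is computed from predicted probabilities and observed outcomes; the labels and scores are made up, not taken from the study.

```python
from sklearn.metrics import roc_auc_score

# Made-up outcomes (1 = event occurred) and model-predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.10, 0.35, 0.62, 0.80, 0.20, 0.40, 0.45, 0.90]

# AUC is the probability that a randomly chosen positive case is ranked
# above a randomly chosen negative one: 0.5 is chance, 1.0 is perfect.
print(roc_auc_score(y_true, y_score))  # -> 0.9375 for these toy values
```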
The lesson? The significance of this application of deep learning is not simply, or even primarily, the ability to produce predictions that are superior to traditional models. It is the ability of the deep learning network to take over much of the time-consuming data prep work that is traditionally done by the statisticians and researchers.