Using NLP to Identify Subgroups of Patients with Rare Diseases

The Problem:

Our client was a pharmaceutical company specializing in the research and treatment of rare diseases and disorders. By definition, a rare disease affects only a small number of people, and the population with any given rare disease or disorder consists of various subgroups, some of which have treatment options available and others of which do not. Given the very large number of rare diseases and the sparsity of information about each group, it is extremely difficult to find the patient subgroups that lack research or treatment options.

 

The Data:

Many open textual sources describe rare diseases, research, clinical trials, and treatment options, including trial inclusion and exclusion criteria. These sources include OMIM, Orphanet, Clinical Trials, and the FDA.

 

The Solution:

Using KNIME, an open-source analytics platform, we collected and combined these open datasets with other text sources, applied natural language processing (NLP) to process and encode the data, and conducted an unsupervised analysis of the results. The workflows we created take a rare disease or disorder as user input, gather and analyze all related records, and return a characterization of patient subgroups that are not served by existing research or treatment options.
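The project itself was built as KNIME workflows, not as hand-written code, and the exact encoding and analysis steps are not described here. Purely as an illustration of the general idea, the following Python sketch encodes short text records as TF-IDF vectors and flags subgroup descriptions that have no close match among trial eligibility texts. The technique (TF-IDF with cosine similarity), the function names, the threshold, and the sample records are all assumptions made for this example; the records are invented placeholders, not data from the project.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Encode each document as a sparse {term: weight} TF-IDF vector."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def uncovered_subgroups(subgroups, trial_criteria, threshold=0.3):
    """Return subgroup descriptions whose best match among the trial
    criteria texts falls below the (hypothetical) similarity threshold."""
    vectors = tfidf_vectors(subgroups + trial_criteria)
    sub_vecs = vectors[: len(subgroups)]
    trial_vecs = vectors[len(subgroups):]
    uncovered = []
    for text, sub_vec in zip(subgroups, sub_vecs):
        best = max((cosine(sub_vec, tv) for tv in trial_vecs), default=0.0)
        if best < threshold:
            uncovered.append(text)
    return uncovered


# Invented placeholder records, for illustration only.
subgroups = [
    "pediatric patients with early onset and cardiac involvement",
    "adult patients with late onset and mild muscle weakness",
    "patients with renal impairment and prior transplant",
]
trial_criteria = [
    "inclusion: adult patients with late onset disease and muscle weakness",
    "inclusion: pediatric patients with early onset disease and cardiac involvement",
]

for description in uncovered_subgroups(subgroups, trial_criteria):
    print("No closely matching trial criteria:", description)
```

A similarity cutoff like this is only a rough first filter. In practice the threshold would need tuning against records reviewed by domain experts, and exclusion criteria would have to be handled separately from inclusion criteria.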

 

The Impact:

Using our workflows, deployed in a KNIME server environment, the client was able to find patient subgroups in need of research and treatment efforts and to estimate the size of those populations. This allowed them to refine and prioritize their R&D program.