The monitoring of subvisible particles (SVPs) in the development of injectable formulations is a critical component of ensuring patient safety and product quality, both at Regeneron specifically and in the biopharmaceutical industry at large.
Currently, there is a gap in routinely measuring and controlling subvisible particles smaller than 10μm in biotherapeutic products. Recent studies have indicated the potential of proteinaceous particles in the subvisible size range (0.1–10μm) to aggregate and lead to down-the-line process failure or patient immunogenicity reactions.
Regeneron’s BioPerceptron platform — a deep learning solution built with Dataiku for high-throughput biomedical image processing that leverages AI, end-to-end cloud orchestration, and advanced visualization capabilities — addresses this gap, transforming proprietary, unstructured data into proactive pharmaceutical process monitoring.
While the problem of determining possible unseen contaminants in drug formulations is not new (and neither is the use of deep learning methods for image analysis and classification), Regeneron’s approach is a timely application of deep learning that makes life-saving medicines more efficacious and less risky by extending the industry’s highest standards of quality biomanufacturing.
Current industry standards use light obscuration techniques (based on the ability of a particle to reduce measured light intensity when passing a light beam) to measure SVPs, but the method cannot distinguish different types of particles (synthetic vs. proteinaceous) in a test solution.
Regeneron’s IT teams partnered with formulation development scientists to develop a deep learning convolutional neural network (CNN) approach that assigns weights (importance) to various features of an image to be able to “learn” and differentiate characteristics of one image versus another.
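To illustrate the core idea of a convolution (this is a hand-made toy, not Regeneron's actual model): a small matrix of weights slides over the image, and the weighted sum at each position forms a feature map that responds to local structure. In a trained CNN these weights are learned rather than hand-set, as they are below.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution: slide a kernel of weights over the
    image and record the weighted sum at each position (a feature map)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical edge-detecting kernel; a CNN learns such weights from data.
edge_kernel = np.array([[-1., -1., -1.],
                        [-1.,  8., -1.],
                        [-1., -1., -1.]])

# Toy 8x8 "particle" image: a bright blob on a dark background.
image = np.zeros((8, 8))
image[3:5, 3:5] = 1.0

feature_map = conv2d(image, edge_kernel)
print(feature_map.shape)  # (6, 6)
```

Stacking many such learned filters, interleaved with nonlinearities and pooling, is what lets a CNN weight and combine image features to distinguish one particle type from another.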
This initial solution was then expanded into a full cloud-native platform.
Thanks to development on Dataiku combined with the use of GPUs to streamline image processing pipelines, each microscopic flow imaging (MFI) classification takes less than 15 minutes to complete. The result is a better than 94% positive prediction rate for silicone and protein SVPs across various sizes.
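Positive prediction rate (also called precision or positive predictive value) is the fraction of positive calls that are correct: TP / (TP + FP). A minimal per-class computation, using illustrative counts rather than Regeneron's actual confusion matrix:

```python
def precision(true_positives, false_positives):
    """Positive prediction rate: of everything flagged as a given
    class, what fraction actually belongs to that class."""
    return true_positives / (true_positives + false_positives)

# Hypothetical per-class counts for one classification run.
counts = {
    "silicone_oil": {"tp": 480, "fp": 20},   # 480/500 = 96.0%
    "protein":      {"tp": 470, "fp": 25},   # 470/495 ~ 94.9%
}

for label, c in counts.items():
    print(f"{label}: {precision(c['tp'], c['fp']):.1%}")
```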
In addition to potentially improved product quality and safety plus improved process development for more efficient manufacturing scale-up, the BioPerceptron platform is modular, which offers the potential of simplifying regulatory validation by only changing single components rather than an entire system.
For this use case, there were business and technical challenges as well as data and modeling challenges.
On the business side, unseen contaminant aggregates in pharmaceutical drug development present a multi-factorial challenge. They are very small (in the range of 1-25 μm) and they vary in type: silicone oil droplets, protein aggregates, fibers, glass particles, or air bubbles. Being able to classify type as well as size has several advantages; in particular, understanding what type of SVP exists in the pharmaceutical product aids in diagnosing the source of the contaminant.
In addition, as with any machine learning or AI exercise, there are some more specific data and modeling challenges associated with this particular use case:
The data in this case consists of very large, high-resolution microscopy images. The challenge was identifying the different particles and their real-world characteristics from those images quickly enough to apply industry quality limits in a timely manner.
To solve this challenge, IT worked closely with their formulations research partners to replicate development and manufacturing conditions and generate samples representative of real-world data. These MFI files are captured and tagged with appropriate metadata from the microscopy system. Then, the image datasets are pushed through the parallelized high-throughput classification pipeline.
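A parallelized, high-throughput classification pass over tagged image files can be sketched with Python's standard `concurrent.futures`; the `classify_image` stub, file names, and returned fields below are hypothetical stand-ins for Regeneron's real GPU-backed model (which would also favor process- or cluster-level parallelism over threads):

```python
from concurrent.futures import ThreadPoolExecutor

def classify_image(path):
    # Placeholder for real model inference on one tagged MFI image file;
    # the label and size returned here are fixed dummy values.
    return {"file": path, "label": "protein", "size_um": 4.2}

def run_pipeline(paths, workers=4):
    """Fan the tagged MFI files out across workers and gather one
    classification record per file, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classify_image, paths))

results = run_pipeline(["batch1/img001.mfi", "batch1/img002.mfi"])
print(len(results))  # 2
```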
Regeneron applied state-of-the-art data validation and unsupervised learning methods to understand underlying patterns within the image data and systematically capture data inconsistencies. Methods include neural network-based dimensionality reduction, hierarchical clustering, and multi-dimensional variance analysis.
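One of the named methods, hierarchical clustering, can be sketched with SciPy. The two-dimensional "embeddings" below are synthetic stand-ins for representations that would, in practice, come from a neural-network encoder:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical low-dimensional embeddings of particle images:
# two well-separated groups standing in for two particle types.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(20, 2))   # e.g. silicone-oil-like
group_b = rng.normal(5.0, 0.1, size=(20, 2))   # e.g. protein-like
embeddings = np.vstack([group_a, group_b])

# Ward-linkage hierarchical clustering, then cut the tree into 2 clusters.
tree = linkage(embeddings, method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")
print(sorted(set(labels)))  # [1, 2]
```

Inspecting which images land in which cluster (and which sit far from any cluster) is one systematic way to surface inconsistencies in the underlying data.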
To bring this use case to life, Regeneron needed to have training data that was adequately representative (in terms of both size and type) of real-world conditions. Their solution balanced the use of randomized and stratified sampling to accurately capture the distribution of particle types and sizes expected in production settings without compromising the model’s need for sufficient data samples.
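Stratified sampling draws a fixed number of examples from each stratum (here, particle type) so that rare classes are not drowned out by common ones. A minimal standard-library sketch, with a hypothetical imbalanced pool:

```python
import random
from collections import defaultdict

def stratified_sample(records, key, per_stratum, seed=0):
    """Group records by the given key, then draw up to per_stratum
    examples from each group so every class is represented."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[r[key]].append(r)
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample

# Hypothetical imbalanced pool: many silicone-oil images, few fibers.
pool = ([{"type": "silicone_oil"}] * 900
        + [{"type": "protein"}] * 80
        + [{"type": "fiber"}] * 20)
sample = stratified_sample(pool, "type", per_stratum=20)
print(len(sample))  # 60
```

A purely random draw from this pool would yield mostly silicone-oil images; stratifying guarantees the minority classes appear in the training set at all.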
Regeneron integrated existing cloud capabilities with their deep learning pipeline to seamlessly leverage elastic and parallelizable compute resources for time-consuming model learning cycles.
While no single component of Regeneron’s BioPerceptron platform is especially noteworthy in isolation, the innovation in Regeneron’s solution exists in the aggregate.
“We consider subvisible particle classification as a single use case in a larger, cloud-native framework for ingesting, parsing, and processing images with the intent of addressing pressing biological questions.”

Shah Nawaz, CTO/VP, Digital Transformation at Regeneron
There are several factors that enabled Regeneron’s innovation with the development of the BioPerceptron platform.
First, Regeneron has established robust data transfer, data validation, and data privacy frameworks to be able to scale pilot projects quickly once they prove feasibility. In addition, through an iterative process of human validation and statistical methods, Regeneron was able to establish the right datasets to test the validity of CNN models in trial.
Importantly, thanks to Dataiku, Regeneron has a well-established AI and machine learning test bed and scientific compute environment that allows teams to iterate through experiments relatively quickly. Because both data and non-data professionals can use and collaborate in Dataiku, Regeneron found success by bringing together the right mix of engaged wet-lab scientists, data scientists, and compute specialists with a collaborative spirit.