How, and where, is COVID-19 spreading? Who is most at risk of serious complications? When will it be safe to go back to work? How fast can we put an end to the pandemic?
The longer the pandemic lasts, the more questions like these seem to stack up. In the urgent search for answers, Johnson & Johnson has undertaken an enormous data science effort that’s helping guide everything from the company’s research into a potential vaccine to its return-to-workplace policies for more than 130,000 employees around the world.
“This is an unprecedented pandemic, and there are so many unknowns—the virus is moving quite fast, and it’s seemingly unpredictable,” says Najat Khan, Chief Data Science Officer and Global Head of R&D Strategy and Operations, Janssen Pharmaceutical Companies of Johnson & Johnson. “But it’s not actually unpredictable. By combining several types of data into an advanced analytical model using new techniques, we are driving insights and leading the way to a deeper understanding of this disease in order to help shape our clinical development program.”
Indeed, the enormity of the pandemic has inspired researchers worldwide to work with extraordinary speed and unprecedented cooperation. At Janssen, developing and testing an investigational vaccine candidate against the virus is one such critical effort. To have the best chance of success, and to reach the most at-risk populations and geographic areas, the clinical study design must be informed by a rigorous, agile view of critical pandemic data that can change by the day, even by the hour.
“The urgency of this disease means we cannot rely on only traditional methods to understand how the disease works and how it will spread—we must utilize all data and advanced tools at our disposal, both within Johnson & Johnson and broadly across universities and other companies,” says Jennings Xu, Director of Data Science for Portfolio Management and Lead for COVID-19 Janssen R&D Data Science. “A variety of data sources, a variety of techniques and a variety of cross-functional perspectives are helping propel our vaccine program forward. As a data scientist, it is an incredible honor to be able to contribute toward the fight against COVID-19 in this way.”
With data science experts like Khan and Xu as our guides, we’re taking a deep dive into how data science is helping teams across the company use data to reduce complexity and uncertainty, find answers and inform potential solutions during this historic pandemic.
1.
Using data to track the pandemic and forecast hotspots
Knowing where the next surge in coronavirus cases could happen is critical for everyone, and especially for the government officials and public health practitioners entrusted with decisions that affect the lives and safety of many people.
To help paint a real-time picture of how the virus is moving around the world, Janssen built a global surveillance dashboard that pulls in data at a country, state and even county level, which helps guide where the company should test its investigational COVID-19 vaccine candidate.
“It tracks how the disease is impacting certain areas on an hourly basis,” Khan adds. “This helps us have a deep and clear understanding of how the disease is traveling.”
Of course, surveillance is just a starting point for predicting where the disease might spread. And, as Khan notes, any predictive data science model is only as good as the data fed into it. The accuracy of the global surveillance dashboard depends on gathering information from a broad set of sources and on judging which of those sources are most reliable. This means working with in-country experts to determine whether official numbers are accurate, and where to find the most valid information on case numbers, hospitalizations, mortality and testing rates, social compliance and local policies.
“Because it is a challenge to find the right data, we have a team constantly monitoring and scouting reliable data sources across geographic territories,” says Xiaoying Wu, Senior Director, Janssen R&D Platform and Privacy, Janssen R&D Data Science. “These data then get curated, ingested and integrated into the surveillance dashboard automatically on a daily basis. This dashboard serves as the single source of truth for disease tracking and forecasting, enabling teams to make data-driven decisions.”
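Wu doesn’t detail how competing sources are reconciled. The Python sketch below is a hypothetical illustration only. It shows one simple way a daily ingestion step could reduce several reports for the same region into a single record. It assumes each vetted source has been given a reliability rank; the `SOURCE_RANK` table and the source names are invented. For each region and metric, it keeps the value from the most trusted source and ignores unvetted sources.

```python
from dataclasses import dataclass

# Hypothetical reliability ranking assigned by a data-scouting team:
# a lower number means a more trusted source. Names are illustrative only.
SOURCE_RANK = {"ministry_of_health": 0, "regional_dashboard": 1, "news_aggregator": 2}


@dataclass(frozen=True)
class Report:
    region: str
    metric: str   # e.g. "cases", "hospitalizations", "deaths", "tests"
    value: int
    source: str


def consolidate(reports: list[Report]) -> dict[tuple[str, str], Report]:
    """Keep one report per (region, metric): the one from the most trusted vetted source."""
    best: dict[tuple[str, str], Report] = {}
    for report in reports:
        if report.source not in SOURCE_RANK:
            continue  # ignore sources that have not been vetted yet
        key = (report.region, report.metric)
        current = best.get(key)
        if current is None or SOURCE_RANK[report.source] < SOURCE_RANK[current.source]:
            best[key] = report
    return best


if __name__ == "__main__":
    daily = [
        Report("Region A", "cases", 1250, "news_aggregator"),
        Report("Region A", "cases", 1180, "ministry_of_health"),
        Report("Region B", "cases", 430, "regional_dashboard"),
        Report("Region B", "cases", 999, "unvetted_blog"),
    ]
    for (region, metric), report in sorted(consolidate(daily).items()):
        print(region, metric, report.value, report.source)
```

A fixed ranking is the simplest possible policy. A real pipeline might instead weigh sources per metric or per country, as the in-country review described above suggests.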
To forecast where the virus will go next, Janssen partnered with Dr. Dimitris Bertsimas and his colleagues at the Massachusetts Institute of Technology to build machine learning-based prediction models that can show where COVID-19 is likely to spike. The approach combines machine learning with an epidemiological model of infectious disease known as SEIR, which divides a population into four compartments: susceptible, exposed, infectious and recovered.
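For readers new to SEIR, the sketch below steps the textbook equations forward one day at a time. New infections move people from S to E at rate `beta * S * I / N`. The end of incubation moves them from E to I at rate `sigma`, and recovery moves them from I to R at rate `gamma`. This is the classic model only, not the MIT and Janssen model, which combines it with machine learning. The parameter values are illustrative assumptions, not fitted estimates.

```python
def seir(population, initial_infectious, beta, sigma, gamma, days):
    """Simulate a basic SEIR model with one forward-Euler step per day.

    beta  -- transmission rate (effective contacts per infectious person per day)
    sigma -- 1 / mean incubation period in days (rate of moving from E to I)
    gamma -- 1 / mean infectious period in days (rate of moving from I to R)
    Returns a list of (S, E, I, R) tuples, one for each day including day 0.
    """
    s = float(population - initial_infectious)
    e = 0.0
    i = float(initial_infectious)
    r = 0.0
    history = [(s, e, i, r)]
    for _ in range(days):
        new_exposed = beta * s * i / population  # S -> E: new infections
        new_infectious = sigma * e               # E -> I: end of incubation
        new_recovered = gamma * i                # I -> R: recovery (or removal)
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative parameters only: R0 = beta / gamma = 3, 5-day incubation, 10-day infectious period.
history = seir(population=1_000_000, initial_infectious=10,
               beta=0.3, sigma=1 / 5, gamma=1 / 10, days=365)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
print(f"Infectious peak on day {peak_day}: {history[peak_day][2]:,.0f} people")
```

A daily Euler step keeps the sketch short. Production models typically use a proper ODE solver and fit `beta` and the other rates to observed data.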
In March 2020, a team of Janssen data scientists, epidemiologists and quantitative scientists began forecasting the spread of the pandemic. Janssen also enlisted the help of the Centers for Disease Control and Prevention, which contributed data from its ensemble forecasts, and of the Institute of Human Virology at the University of Maryland School of Medicine.
These predictive models integrated the global surveillance dashboard data with information about local policies and behaviors, such as how people are traveling or whether they’re complying with mask-wearing guidance. Khan and her team then combined information from all of these sources to give holistic guidance to the Janssen clinical teams in the vaccine therapeutic area as they planned and carried out studies of their vaccine candidate.
“As the pandemic has continued, sustained lockdowns have given way to more intermittent restrictions. Because countries, states and municipalities are turning these restrictions on and off as cases rise or fall, and social distancing compliance has been variable by county, the team has developed dynamic models that take these variables into account,” Khan explains.
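One common way to represent restrictions that switch on and off is to let the transmission rate change over time. The sketch below is a hypothetical extension of a basic SEIR model, not Janssen’s dynamic model. It assumes restrictions start when the share of people who are infectious rises above one threshold and lift when it falls below a lower one. It also assumes restrictions simply lower `beta`. In the models Khan describes, such effects vary by county and would be estimated from data rather than set by hand.

```python
def seir_with_restrictions(population, initial_infectious, beta_open, beta_restricted,
                           sigma, gamma, days, on_threshold, off_threshold):
    """SEIR model whose transmission rate switches as restrictions turn on and off.

    Restrictions turn ON when infectious prevalence (I / N) rises above on_threshold
    and OFF when it falls below off_threshold; using off < on (hysteresis) avoids
    flipping back and forth every day. Returns (history, restricted_flags), with one
    (S, E, I, R) tuple and one flag per simulated day.
    """
    if off_threshold >= on_threshold:
        raise ValueError("off_threshold must be below on_threshold")
    s, e = float(population - initial_infectious), 0.0
    i, r = float(initial_infectious), 0.0
    restricted = False
    history, flags = [], []
    for _ in range(days):
        prevalence = i / population
        if not restricted and prevalence > on_threshold:
            restricted = True    # hypothetical policy trigger: cases are rising
        elif restricted and prevalence < off_threshold:
            restricted = False   # hypothetical policy release: cases have fallen
        beta = beta_restricted if restricted else beta_open
        new_exposed = beta * s * i / population
        new_infectious = sigma * e
        new_recovered = gamma * i
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((s, e, i, r))
        flags.append(restricted)
    return history, flags


history, flags = seir_with_restrictions(1_000_000, 10, beta_open=0.3, beta_restricted=0.08,
                                        sigma=0.2, gamma=0.1, days=365,
                                        on_threshold=1e-3, off_threshold=2e-4)
switches = sum(1 for a, b in zip(flags, flags[1:]) if a != b)
print(f"Restrictions changed {switches} times; peak infectious: {max(h[2] for h in history):,.0f}")
```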
These forecasts have been especially critical as the company has recruited participants for clinical studies of its lead vaccine candidate. The trial is “event-based,” meaning its readout depends on the number of “events,” such as a participant becoming sick with COVID-19, that accumulate during the study. Only once a sufficient number of events has been collected can scientists tell whether the vaccine candidate is working. Because events accumulate faster where community spread is high, running the study in those areas makes the trial go more quickly, so researchers look at COVID-19 hot spots when picking clinical trial locations.
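The arithmetic behind choosing hot spots is simple: the faster events accrue, the sooner the target count is reached. The sketch below is a deliberately crude illustration with invented numbers. The 150-event target, the enrollment size and the attack rates are not the trial’s. It assumes a constant weekly attack rate and a hypothetical vaccine efficacy that lowers the number of events in the vaccine arm.

```python
import math


def weeks_to_target_events(target_events, participants, weekly_attack_rate,
                           vaccine_fraction=0.5, assumed_efficacy=0.0):
    """Rough weeks needed for an event-driven trial to reach its target event count.

    weekly_attack_rate -- assumed chance that an unprotected participant develops
                          COVID-19 in a given week at the trial sites
    vaccine_fraction   -- share of participants randomized to the vaccine arm
    assumed_efficacy   -- hypothetical efficacy used to discount events in that arm
    Ignores enrollment ramp-up, dropout and depletion of susceptible participants.
    """
    unprotected_share = (1 - vaccine_fraction) + vaccine_fraction * (1 - assumed_efficacy)
    events_per_week = participants * weekly_attack_rate * unprotected_share
    return math.ceil(target_events / events_per_week)


# Invented numbers: the same trial run where spread is low versus in a hot spot.
low_spread = weeks_to_target_events(150, 30_000, 0.0005, assumed_efficacy=0.6)
hot_spot = weeks_to_target_events(150, 30_000, 0.002, assumed_efficacy=0.6)
print(low_spread, hot_spot)  # 15 4
```

In this toy model, quadrupling the attack rate cuts accrual time roughly fourfold. Real designs also account for enrollment pace, follow-up windows and changing local transmission.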
“The goal is to recruit participants who are most likely to get exposed to the virus in areas with high disease transmission, which gives us more data to quickly demonstrate efficacy,” Khan explains. “Given the uncertainty of this pandemic, long-term forecasting models enabled us to cast a wide net when identifying such potential hot spots. Guided by these data, we considered these locations for potential vaccine trial sites, and have continued to reevaluate study sites as the study progresses.”
The predictions have proven remarkably prescient: The vast majority of clinical trial sites that were predicted to be hot spots for COVID-19 ultimately had extremely high numbers of cases.
2.
Harnessing data to learn more about who might be most at risk of getting sick
While COVID-19 can affect anyone, it doesn’t impact everyone in the same way. Due to biological risk factors (such as age), patient demographics (such as race) and other still-to-be-determined characteristics, the course of the disease can range from extremely mild to severe enough to be fatal.
Janssen is building models to better understand what might make someone more prone to severe illness from the virus, as well as how different treatment courses may affect patient outcomes.
“When we choose which populations to recruit for the trial, there are many layers to consider,” Xu says. “The first is the geographical layer: Where is the virus? The second is biological. For example, we want to include elderly people and diverse ethnic groups because we know they’re vulnerable. The third layer is environmental. That is, we also want people in specific occupations, like plant workers who have high occupational risk of exposure.”
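Xu’s three layers can be read as successive screens on a candidate participant. The sketch below is purely illustrative. The fields, the hot-spot threshold and the age cut-off are invented assumptions; actual eligibility and prioritization are set by the study protocol.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    site_forecast_incidence: float  # forecast weekly cases per 100,000 people at the candidate's site
    age: int
    priority_demographic: bool      # member of a group the study aims to include
    high_exposure_job: bool         # e.g. plant workers, as in Xu's example


# Hypothetical cut-offs; real criteria would come from the study protocol.
HOT_SPOT_THRESHOLD = 100.0
OLDER_ADULT_AGE = 60


def layers_met(c: Candidate) -> list[str]:
    """Return which of the three screening layers a candidate satisfies."""
    met = []
    if c.site_forecast_incidence >= HOT_SPOT_THRESHOLD:
        met.append("geographic")
    if c.age >= OLDER_ADULT_AGE or c.priority_demographic:
        met.append("biological")
    if c.high_exposure_job:
        met.append("environmental")
    return met


candidates = [
    Candidate("A", 40.0, 35, False, False),
    Candidate("B", 180.0, 67, False, True),
    Candidate("C", 150.0, 29, True, False),
]
# Rank candidates by how many layers they satisfy (the sort is stable, so ties keep input order).
for c in sorted(candidates, key=lambda c: len(layers_met(c)), reverse=True):
    print(c.name, layers_met(c))
```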
One way the company is studying these risk factors is by analyzing longitudinal data from a global COVID-19 registry covering areas where the disease is most prevalent, as well as data from hospitals in New York City. Because New York was hit hard and early in the pandemic, this analysis gives data scientists a broad view of the disease across a large population of patients.
“We’re digging into such large-scale patient registries, as well as real world databases and published studies, to understand what drives differences in outcomes across the population,” says Kristopher Standish, Principal Data Scientist, R&D Data Sciences, Janssen R&D. “The fact that a large portion of infections are asymptomatic and remain undetected in the world poses a tremendous challenge.”
All of this work serves one goal: making sure that the people most likely to be exposed to the virus, and most at risk for severe outcomes, are prioritized for enrollment in the company’s vaccine trial, because the investigational vaccine, if effective, could benefit them the most.
3.
Leveraging data insights to help inform decisions about returning to the workplace
Just as important as Janssen’s commitment to advancing a vaccine candidate for COVID-19 is its commitment to keeping its employees safe and healthy.
The Janssen R&D Return to Workplace Taskforce, which Khan is a member of, is using data science to make tactical decisions about which labs can remain open, how many people can be on-site at a time, and how different company facilities are configured and sanitized.
“We have employees all over the world, so we’re taking an individualized approach that’s data-driven and evidence-based,” says Oren Shur, Senior Director of R&D Strategy and Operations, Janssen R&D.
To reduce the number of people working at any given location at any given time, for example, the company is setting up a model that tracks site density in real time and gives feedback on how employees should stagger their work hours. Johnson & Johnson is also relying on data to help determine whether employees should be tested for the novel coronavirus, how many, and how frequently.
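The article doesn’t say how staggering is computed. As a minimal sketch under stated assumptions, the function below caps each shift at a hypothetical fraction of normal site capacity and spreads employees across as few shifts as that cap allows. The `max_density` parameter and the round-robin assignment are illustrative choices. The real-time density tracking itself is not modeled.

```python
import math


def stagger_shifts(employees: list[str], site_capacity: int, max_density: float) -> list[list[str]]:
    """Split employees into staggered shifts so no shift exceeds the density cap.

    max_density -- allowed fraction of normal site capacity (e.g. 0.25 for 25%)
    Employees are assigned round-robin, so shift sizes differ by at most one.
    """
    cap = math.floor(site_capacity * max_density)
    if cap < 1:
        raise ValueError("density cap leaves no room for anyone on site")
    n_shifts = max(1, math.ceil(len(employees) / cap))
    shifts: list[list[str]] = [[] for _ in range(n_shifts)]
    for index, name in enumerate(employees):
        shifts[index % n_shifts].append(name)
    return shifts


staff = [f"employee_{n}" for n in range(60)]
for number, shift in enumerate(stagger_shifts(staff, site_capacity=100, max_density=0.25), start=1):
    print(f"Shift {number}: {len(shift)} people")
```

Using the minimum number of shifts keeps schedules simple. A real tool might instead balance shifts by team or by lab equipment.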
“We’ve built an analytical model called COVID Lens that takes into account factors like COVID-19 test sensitivity and specificity, the prevalence of the virus in the community, and the number of people working on-site,” Shur says. “It then takes that input and provides a detailed picture of how much testing we need to perform in that area.”
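Shur lists COVID Lens’s inputs but not its internals. The hypothetical sketch below is not COVID Lens. It shows the standard Bayes calculation that links the same inputs (sensitivity, specificity, community prevalence and head count) to the expected results of one round of testing. That includes the positive predictive value (PPV), the probability that a positive result reflects a real infection.

```python
from dataclasses import dataclass


@dataclass
class ScreeningOutcome:
    true_positives: float
    false_positives: float
    missed_infections: float
    positive_predictive_value: float


def screening_outcome(people_on_site: int, prevalence: float,
                      sensitivity: float, specificity: float) -> ScreeningOutcome:
    """Expected results of testing everyone on site once (a textbook Bayes calculation)."""
    infected = people_on_site * prevalence
    uninfected = people_on_site - infected
    true_pos = infected * sensitivity           # infections the test catches
    missed = infected - true_pos                # false negatives
    false_pos = uninfected * (1 - specificity)  # uninfected people flagged as positive
    flagged = true_pos + false_pos
    ppv = true_pos / flagged if flagged > 0 else 0.0
    return ScreeningOutcome(true_pos, false_pos, missed, ppv)


# Invented inputs: 1,000 people on site, 1% community prevalence, an imperfect test.
result = screening_outcome(1_000, prevalence=0.01, sensitivity=0.85, specificity=0.98)
print(result)
```

With these invented inputs, only about 8.5 of roughly 28 expected positives are true infections, a PPV near 0.30. This is one reason prevalence and specificity matter so much when planning how much testing a site needs.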
Johnson & Johnson is also partnering with IBM to develop an app in which employees can enter any symptoms they may be experiencing. Shur says the app will play a critical role as the company moves into a later stage of its return-to-work plan, when more people will be working on-site. Through the app, employees can get approval from their managers to return to work, receiving a “green light” based on their answers to the symptom questions.
“Certainly we all know that we shouldn’t work if we’re sick,” Shur says. But, as he notes, there’s a difference between “sensing in the back of your mind that you don’t feel well,” and going through the active process of inputting your symptoms into an app each morning, which can help make the decision to stay home more clear.
“Our employees are leading the way to build better medicines, potential vaccines and products for patients and consumers—and many are scientists and supply chain colleagues on the front line,” Khan says. “It’s a privilege to be part of a healthcare company that is ensuring both science and data science are central to accelerating the development of a potential COVID-19 vaccine candidate for those in need around the world. And it’s absolutely vital that we keep everyone safe.”