This is how the CDC is trying to forecast coronavirus's spread – MIT Technology Review

Every year the US Centers for Disease Control and Prevention holds a competition to see who can accurately forecast the flu. Research teams around the country vie with different methods, and the best performers win funding and a partnership with the agency to improve the nation's preparation for the next season.

Now the agency is tapping several dozen teams to adapt their techniques to forecast the spread of the coronavirus in an effort to make more informed decisions. Among them is a group at Carnegie Mellon University that, over the last five years, has consistently achieved some of the best results. Last year, the group was designated one of two National Centers of Excellence for Influenza Forecasting and asked to lead the design of a community-wide forecasting process.

Roni Rosenfeld, head of the group and of CMU's machine-learning department, admits he was initially reluctant to take on the coronavirus predictions. To a layperson, it doesn't seem as if forecasting the two diseases should be so different, but doing so for the novel outbreak is significantly harder. Rosenfeld worried about whether his predictions would be accurate and, thus, whether they would even be useful. In the end, he was convinced to forge ahead anyway.

"People act on the basis of forecasting models, whether they are on paper or in their heads," he says. "You're better off quantifying these estimations so you can discuss them rationally as opposed to making them based on intuition."

The lab uses three methods to pinpoint the rise and fall of cases during flu season. The first is what's known as a "nowcast," a prediction of the current number of people infected. The lab gathers recent and historical data from the CDC and other partner organizations, including flu-related Google searches, Twitter activity, and web traffic on the CDC, medical sites, and Wikipedia. Those data streams are then fed into machine-learning algorithms to make predictions in real time.
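As a rough illustration of the nowcasting idea, the sketch below fits a straight line between one proxy signal and reported case counts, then uses the latest signal value to estimate the current count. It is not the CMU lab's model: the single-signal setup, the function names `fit_line` and `nowcast`, and every number are assumptions made up for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def nowcast(past_signal, past_cases, current_signal):
    """Estimate the current case count from the latest proxy signal value."""
    slope, intercept = fit_line(past_signal, past_cases)
    return slope * current_signal + intercept


# Hypothetical weekly data: a search-volume index and the case counts
# later reported for the same weeks (invented numbers, not real data).
search_volume = [10.0, 20.0, 30.0, 40.0]
reported_cases = [120.0, 220.0, 320.0, 420.0]

# This week's search volume is known, but the official count is not yet.
estimate = nowcast(search_volume, reported_cases, 50.0)
print(round(estimate))
```

A real system would combine many signals and would need retraining whenever the link between online activity and actual illness shifts.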

The second and third are both proper forecasts, predictions of what's to come. One is based on machine learning and the other on crowdsourced opinion. Predictions include trends expected up to four weeks ahead, as well as important milestones like when the season will peak and the maximum number of expected cases. Such information helps both the CDC and health-care providers ramp up capacity and prepare in advance.

The machine-learning forecast takes into account the nowcast as well as additional historical data from the CDC. There are 20 years of robust data on flu seasons in the US, providing ample fodder for the algorithms.

In contrast, the crowdsourcing method taps into a group of volunteers. Every week, experts and non-experts (who are found to do just as well with a little participation experience) are asked to log on to an online system and review a chart showing the trajectory of past and current flu seasons. They are then asked to complete the current season's curve, projecting how many more flu cases there will be over time. Though people don't make very good predictions individually, in aggregate they are often just as good as the machine-learning forecast.
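One simple way to see why an aggregate can beat individuals is to combine the volunteers' curves week by week. The article does not say how the lab aggregates submissions, so the point-wise median used here is an assumption, as are the function name `aggregate_curves` and the invented numbers.

```python
from statistics import median


def aggregate_curves(curves):
    """Combine volunteer-drawn curves by taking the median for each week."""
    if not curves:
        raise ValueError("need at least one curve")
    length = len(curves[0])
    if any(len(curve) != length for curve in curves):
        raise ValueError("all curves must cover the same weeks")
    # zip(*curves) groups every volunteer's guess for the same week.
    return [median(week_guesses) for week_guesses in zip(*curves)]


# Hypothetical projections of weekly flu cases from three volunteers.
volunteer_curves = [
    [100, 150, 200, 180],
    [110, 140, 260, 150],
    [90, 170, 210, 400],  # one volunteer badly overshoots the last week
]

consensus = aggregate_curves(volunteer_curves)
print(consensus)  # [100, 150, 210, 180]
```

The median keeps the single extreme guess for the final week from dragging the consensus upward, which a plain average would not.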

Over the years, Rosenfeld's team has fine-tuned each of its methods to predict the trajectory of the flu with near-perfect accuracy. At the end of each flu season, the CDC always retroactively updates final numbers, giving the CMU lab a chance to see how their projections stack up. The researchers are now adapting all the techniques for Covid-19, but each will pose distinct challenges.

For the machine-learning-based nowcast, many of the data sources will be the same, but the prediction model will be different. The algorithms will need to learn new correlations between the signals in the data and the ground truth. One reason: there's far greater panic around coronavirus, which causes a completely different pattern of online activity. People will look for coronavirus-related information at much higher rates, even if they feel fine, making it more difficult to tell who may already have symptoms.

In a pandemic situation, there is also very little historical data, which will affect both forecasts. The flu happens on a highly regular cycle each year, while pandemics are erratic and rare. The last pandemic, H1N1 in 2009, also had very different characteristics, primarily affecting younger rather than elderly populations. The Covid-19 outbreak has been precisely the opposite, with older patients facing the highest risk. On top of that, the surveillance systems for tracking cases weren't fully developed back then.

"That's the part that I think is going to be the most challenging," says Rosenfeld, "because machine-learning systems, in their nature, learn from examples." He's hopeful that the crowdsourcing method may be more resilient. On the one hand, little is known about how it will fare in pandemic forecasting. "On the other hand, people are actually quite good at adjusting to novel circumstances," he says.

Rosenfeld's team is now actively working on ways to make these predictions as good as possible. Flu-testing labs are already beginning to transition to Covid-19 testing and reporting results to the CDC. The CMU lab is also reaching out to other organizations to get as much rich and accurate data as possible (things like anonymized, aggregated statistics from electronic health records and purchasing patterns for anti-fever medication) to find sharper signals to train its algorithms.

To compensate for the lack of historical data from previous pandemics, the team is relying on older data from the current pandemic. It's looking to incorporate data from countries that were hit earlier and will update its machine-learning models as more accurate data is retroactively posted. At the end of every week, the lab will get a report from the CDC with the most up-to-date trajectory of cases in the US, including revisions on numbers from previous weeks. The lab will then revise its models to close the gaps between the original predictions and the rolling statistics.

Rosenfeld worries about the limitations of these forecasts. There is far more uncertainty than what he's usually comfortable with: for every prediction the lab provides to the CDC, it will include a range of possibilities. "We're not going to tell you what's going to happen," he says. "What we tell you is what are the things that can happen and how likely is each one of them."
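To make "a range of possibilities" concrete, the sketch below turns a set of simulated outcomes into a table of ranges and their probabilities. The binning approach, the function name `binned_forecast`, and the sample values are all illustrative assumptions and not the format the lab submits to the CDC.

```python
from collections import Counter


def binned_forecast(samples, bin_width):
    """Map each bin's lower edge to the share of samples falling in it."""
    counts = Counter((int(s) // bin_width) * bin_width for s in samples)
    total = len(samples)
    return {low: counts[low] / total for low in sorted(counts)}


# Hypothetical simulated outcomes for next week's case count.
samples = [120, 180, 230, 260, 270, 310, 340, 390, 450, 520]

distribution = binned_forecast(samples, bin_width=100)
for low, probability in distribution.items():
    print(f"{low}-{low + 99} cases: {probability:.0%}")
```

Read this way, the forecast says which ranges are plausible and how likely each one is, instead of committing to a single number.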

Even after the pandemic is over, the uncertainty won't go away. "It will be very difficult to tell how good our methods are," he says. "You could be accurate for the wrong reasons. You could be inaccurate for the wrong reasons. Because you have only one season to test it on, you can't really draw any strong, robust conclusions about your methodology."

But in spite of all these challenges, Rosenfeld believes the work will be worthwhile in informing the CDC and improving the agency's preparation. "I can do the best I can now," he says. "It's better than not having anything."