Danqi Wang

Development of an early alert model for COVID-19 pandemic situations

Danqi Wang presents her work on the development of an early alert model for COVID-19 pandemic situations.

Motivation

As the COVID-19 pandemic continues to spread across the world, many countries are implementing measures to slow down the spread of the virus. One of the key tools in this fight is the use of alert models to predict and track the spread of the virus. In this post, we'll take a look at how AI-powered alert models work and how they can be used to help fight COVID-19.

COVID-19 alert models

Alert models are used by public health officials to help guide decision-making around measures to slow down the spread of the virus. For example, if an alert model predicts that the number of infections is likely to increase rapidly, officials may decide to implement stricter social distancing measures or to increase testing and contact tracing efforts.

Social media

Over the last decade, several studies have investigated the use of digital data sources for predicting and tracking the spread of diseases such as the flu, dengue, Zika, MERS, and Ebola [1, 2, 3, 4, 5, 6, 7]. By analyzing digital data streams such as Google searches and Twitter microblogs, AI-powered alert models can generate insights that are not available from traditional data sources such as confirmed case records, deaths, and hospitalization rates. This can help public health officials make more informed decisions about how to allocate resources and target interventions.

© Fraunhofer SCAI
Figure 1

Generation of machine and deep learning models for forecasting COVID-19 activities in Germany

As COVID-19 has spread worldwide, machine and deep learning have come to be regarded as effective approaches for analyzing historical disease time series. A growing number of researchers use LSTM models to estimate future reported COVID-19 cases, and these models have delivered significant results in some countries, for instance the U.S., Canada, and European countries. In Germany, however, there is a lack of research on leveraging digital data streams (Google searches and Twitter) to provide alerts of COVID-19 outbreaks. The thesis work began with the generation of a symptom-related corpus. Through a literature review and data-driven tools, symptoms highly associated with COVID-19 were identified. Figure 1 displays the top 25 symptoms, ranked by the number of co-occurrences in the SCAIView knowledge discovery software. Common COVID-19 symptoms, including pneumonia, fever, cough, inflammation, and shortness of breath, appeared in our top list.
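The idea of ranking symptoms by co-occurrence can be illustrated with a minimal sketch. This is not how SCAIView works internally, which the post does not describe; it is a toy stand-in that counts, for each symptom term, the number of documents mentioning both the term and the disease name. The function name `rank_cooccurring_symptoms`, the naive substring matching, and the sample sentences are all assumptions made for illustration.

```python
from collections import Counter


def rank_cooccurring_symptoms(documents, disease, symptoms, top_n=25):
    """Rank symptom terms by the number of documents that mention both
    the symptom and the disease (hypothetical helper, naive matching)."""
    counts = Counter()
    disease_lower = disease.lower()
    for text in documents:
        lowered = text.lower()
        if disease_lower not in lowered:
            continue  # only documents that mention the disease count
        for symptom in symptoms:
            if symptom.lower() in lowered:
                counts[symptom] += 1
    return counts.most_common(top_n)


# Toy corpus, invented for illustration only.
documents = [
    "COVID-19 patients presented with fever and cough.",
    "Fever was the most common COVID-19 symptom.",
    "Pneumonia and fever in COVID-19 cases.",
    "Cough is common in influenza.",
    "COVID-19 pneumonia on CT.",
]
ranking = rank_cooccurring_symptoms(
    documents, "COVID-19", ["fever", "cough", "pneumonia"]
)
print(ranking)
```

A real pipeline would need proper term normalization and synonym handling rather than substring matching; the sketch only shows the counting and ranking step.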

After translation into German, these symptom terms were used as input to retrieve Google Trends and Twitter data, yielding two multidimensional, quantitative, longitudinal datasets (Figure 2).

Following the generation of the datasets, the correlation between social media data and traditional surveillance data (RKI confirmed cases, deaths, and hospitalizations) was explored using log-linear regression models. In our results, Google Trends correlated highly with RKI confirmed cases and RKI hospitalizations, with F1 scores of 0.99 and 0.98, respectively. In addition, Google Trends gave early signals of impending outbreaks 10 days and 18 days in advance. Since this evidence indicates that digital data streams correlate with surveillance reports, we investigated building machine and deep learning models, including Random Forest and LSTM, to forecast uptrends and downtrends in COVID-19 surveillance data. In the trend forecasting analysis, LSTM models built on selected Google Trends symptoms performed better at predicting uptrends in RKI confirmed cases and RKI hospitalizations (F1 scores of 0.78 and 0.81, respectively). Predictive symptoms such as 'Kurzatmigkeit' (shortness of breath), 'Bronchitis', 'Kopfschmerzen' (headache), 'Epistaxis' (nosebleed), 'Rachenschmerzen' (sore throat), and 'Myalgie' (muscle pain) contributed most to the final predictions.
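Two steps in the paragraph above lend themselves to a small sketch: finding how many days a search signal leads a surveillance signal, and scoring uptrend predictions with F1. The code below is an illustration under stated assumptions, not the thesis implementation. It uses the Pearson correlation of log-transformed series as a simple stand-in for a log-linear regression fit, it assumes strictly positive values (real Google Trends series contain zeros and would need an offset), and the series are synthetic. All function names are hypothetical, and no Random Forest or LSTM model is built here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_log_correlation(search, cases, lag):
    """Correlate log(search[t]) with log(cases[t + lag]).
    Assumes all values are strictly positive."""
    n = len(search) - lag
    log_search = [math.log(v) for v in search[:n]]
    log_cases = [math.log(v) for v in cases[lag:lag + n]]
    return pearson(log_search, log_cases)


def best_lead(search, cases, max_lag):
    """Lag in days (0..max_lag) at which the log-scale correlation peaks."""
    return max(
        range(max_lag + 1),
        key=lambda lag: lagged_log_correlation(search, cases, lag),
    )


def uptrend_labels(series, horizon):
    """Label day t as 1 (uptrend) if the value `horizon` days later is higher."""
    return [
        1 if series[t + horizon] > series[t] else 0
        for t in range(len(series) - horizon)
    ]


def f1_score(actual, predicted):
    """F1 score for the positive (uptrend) class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Synthetic data: case counts follow the search signal with a 3-day delay.
search = [10 + 50 * math.exp(-(((t - 20) / 5) ** 2)) for t in range(60)]
cases = [2 * search[0]] * 3 + [2 * v for v in search[:-3]]
lead = best_lead(search, cases, max_lag=10)
print("estimated lead in days:", lead)

# Toy evaluation of uptrend predictions against observed labels.
actual = uptrend_labels([1, 2, 3, 2, 1], horizon=1)
print("F1:", f1_score(actual, [1, 0, 1, 0]))
```

On the synthetic series the estimated lead recovers the 3-day delay that was built in, which is the same kind of signal as the lead times reported above, although those were obtained with the authors' own models and data.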

Our methods explored the potential of online real-time data for COVID-19 surveillance and demonstrated that the Google Trends data stream can be leveraged for robust alerts of the COVID-19 pandemic in Germany. We are now examining how well our models generalize to regional Google Trends data and preparing a paper for submission.

© Fraunhofer SCAI
Figure 2

References:
1. Househ, M. (2016). Communicating Ebola through social media and electronic news media outlets: A cross-sectional study. Health Informatics Journal, 22(3):470–478.

2. Lu, F. S., Hattab, M. W., Clemente, C. L., Biggerstaff, M., and Santillana, M. (2019). Improved state-level influenza nowcasting in the United States leveraging Internet-based data and network approaches. Nature Communications, 10(1):147.

3. Lu, F. S., Hou, S., Baltrusaitis, K., Shah, M., Leskovec, J., Sosic, R., Hawkins, J., Brownstein, J., Conidi, G., Gunn, J., Gray, J., Zink, A., and Santillana, M. (2018). Accurate Influenza Monitoring and Forecasting Using Novel Internet Data Streams: A Case Study in the Boston Metropolis. JMIR Public Health and Surveillance, 4(1):e4.

4. Marques-Toledo, C. d. A., Degener, C. M., Vinhal, L., Coelho, G., Meira, W., Codeço, C. T., and Teixeira, M. M. (2017). Dengue prediction by the web: Tweets are a useful tool for estimating and forecasting Dengue at country and city level. PLoS Neglected Tropical Diseases, 11(7):e0005729.

5. McGough, S. F., Brownstein, J. S., Hawkins, J. B., and Santillana, M. (2017). Forecasting Zika Incidence in the 2016 Latin America Outbreak Combining Traditional Disease Surveillance with Search, Social Media, and News Report Data. PLoS Neglected Tropical Diseases, 11(1):e0005295.

6. Odlum, M. and Yoon, S. (2015). What can we learn about the Ebola outbreak from tweets? American Journal of Infection Control, 43(6):563–571.

7. Shin, S.-Y., Seo, D.-W., An, J., Kwak, H., Kim, S.-H., Gwack, J., and Jo, M.-W. (2016). High correlation of Middle East respiratory syndrome spread with Google search and Twitter trends in Korea. Scientific Reports, 6(1):32920.