Making sense of the COVID-19 information maze with text-mining

The COVID-19 pandemic has brought with it an ‘infodemic’, flooding society with a myriad of conflicting ideas and opinions. To help cut through the noise, we applied some of our data tools to map recent developments and understand how technology is being used and discussed during the crisis.

Register to attend our webinar to discuss this research on Wednesday 3rd June 2020 at 5 PM CEST.

We want these insights to be as useful as possible and are keen to adapt and analyse the data in different ways to answer your burning questions. We invite you to join us in a webinar to discuss our methods and results, and exchange ideas about the most pressing tech challenges.

You can also view our full analysis at https://covid.delabapps.eu

Trending terms in news articles from our analysis.

As part of NGI Forward’s work to create data-driven insights on social and regulatory challenges related to emerging technologies, we have developed various data-science tools to analyse trends in the evolution of Internet technology. Our previous studies focused on areas such as the content crisis in social media, the regulation of tech giants and cybersecurity.

Now we have opened our toolbox and mapped recent developments in the fight against COVID-19, to bring some clarity to how the crisis is evolving. We concentrated on four major areas:

  • Online tech news
  • Open-source projects on GitHub
  • Discussions on Reddit
  • Scientific papers

Mixed feelings on COVID tech in the news

First, we examined trends in 11 respected online news sources, such as the Guardian, Reuters and Politico. Based on changes in the frequency of terms over time, we identified trending keywords related to COVID-19 and the world of technology. This enabled us to focus on key issues such as contact-tracing, unemployment and misinformation in the following sections of the analysis.
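To illustrate the idea of spotting trending terms from frequency changes, here is a minimal Python sketch. It is not the pipeline used in our analysis: the function names, the simple tokeniser and the smoothing constant are all assumptions made for the example. It compares the relative frequency of each word in an earlier and a later batch of articles and ranks words by how much their share has grown.

```python
import re
from collections import Counter


def term_frequencies(docs):
    """Relative frequency of each lowercase word token across a list of texts."""
    counts = Counter()
    for text in docs:
        # Tokens are runs of letters and hyphens, so "contact-tracing" stays whole.
        counts.update(re.findall(r"[a-z][a-z\-]+", text.lower()))
    total = sum(counts.values()) or 1
    return {term: n / total for term, n in counts.items()}


def trending_terms(earlier_docs, later_docs, top_n=5, smoothing=1e-4):
    """Rank terms by the ratio of their later to earlier relative frequency.

    The smoothing constant (an arbitrary illustrative value) avoids division
    by zero for terms that did not appear in the earlier period.
    """
    before = term_frequencies(earlier_docs)
    after = term_frequencies(later_docs)
    scores = {
        term: (freq + smoothing) / (before.get(term, 0.0) + smoothing)
        for term, freq in after.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    earlier = ["the market is open and the app is new"]
    later = ["the contact-tracing app is new", "contact-tracing and the app"]
    print(trending_terms(earlier, later, top_n=3))
```

A real analysis would also need to filter stop words and require a minimum number of occurrences, so that rare words do not dominate the ranking.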

Next, we analysed terms that are frequently used together, or co-occur (e.g. “contact-tracing” and “central server”), to see how technology was associated with different aspects of the crisis. We also measured the sentiment of the paragraphs containing these word pairs to understand whether coverage of COVID technology issues is positive, negative or neutral. As an example, we identified the key actors, initiatives and challenges related to contact-tracing, focusing on EU-wide projects such as PEPP-PT.
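The combination of co-occurrence and paragraph-level sentiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny word lists below stand in for a real sentiment model, and the function names and example paragraphs are hypothetical rather than taken from our data.

```python
import re
from collections import defaultdict

# Toy sentiment lexicon, purely illustrative (an assumption for this sketch).
POSITIVE = {"protect", "effective", "secure", "open"}
NEGATIVE = {"concern", "risk", "surveillance", "breach"}


def paragraph_sentiment(paragraph):
    """Score a paragraph as (positive words - negative words) / total words."""
    words = re.findall(r"[a-z]+", paragraph.lower())
    if not words:
        return 0.0
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return score / len(words)


def cooccurrence_sentiment(paragraphs, target, candidates):
    """For each candidate term found alongside the target term, return
    (term, mean sentiment, paragraph count), most positive first."""
    scores = defaultdict(list)
    for paragraph in paragraphs:
        text = paragraph.lower()
        if target not in text:
            continue
        sentiment = paragraph_sentiment(paragraph)
        for term in candidates:
            if term in text:
                scores[term].append(sentiment)
    ranked = [(term, sum(v) / len(v), len(v)) for term, v in scores.items()]
    return sorted(ranked, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    sample = [
        "DP-3T keeps contact-tracing data on the phone, a secure and open design.",
        "A central server for contact-tracing raises surveillance risk.",
        "Unrelated paragraph about the weather.",
    ]
    for row in cooccurrence_sentiment(sample, "contact-tracing", ["dp-3t", "central server"]):
        print(row)
```

Scoring the whole paragraph rather than the single sentence gives each word pair more context, at the cost of mixing in sentiment about unrelated topics mentioned nearby.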

The table below shows terms co-occurring with ‘contact-tracing’, ranked by sentiment score. DP-3T and TraceTogether are more often associated with positive sentiment, while discussion of privacy and mission creep shows that there are concerns about the implementation of these systems.

Mapping the COVID tech ecosystem

Alongside this specific analysis, we have also mapped articles based on their vocabulary and topic. You can explore the main areas of technology news with characteristic words in these interactive visualisations.
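As a rough illustration of how articles can be compared by vocabulary, the sketch below builds TF-IDF vectors and measures cosine similarity between them. Articles with similar scores could then be placed near each other on a map. This is a generic technique shown under our own assumptions (simple tokeniser, smoothed IDF formula, hypothetical function names), not a description of the exact method behind the visualisations.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenised = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency keeps every weight positive.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    articles = [
        "contact tracing app privacy",
        "privacy concerns over tracing app",
        "video calls and remote work",
    ]
    vectors = tfidf_vectors(articles)
    print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Feeding such similarities into a clustering or dimensionality-reduction step is one common way to turn them into a two-dimensional map.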

The map below shows the clusters of news articles covering specific technologies and tech companies.

Throughout the crisis, numerous programmers have devoted their time to developing open-source tools to support the fight against COVID-19. We collected COVID-19-focused projects from GitHub, the software platform where much of this development is taking place, to examine trends in their location, aim and technology. You can find an overview of the 50 most influential repositories on our analysis page. Perhaps you will be inspired to get involved!

The map below shows the number of GitHub projects related to COVID-19 in the week commencing 20th April 2020.

Tracking changes in social media

Turning next to social media, we examined activity on Reddit to see how discussions have shifted during the crisis. By analysing the text of posts and comments, we discovered a surge in discussions related to the job market, mental health and remote work. Our analysis also provides insight into the changing perception of lockdown measures and growing lockdown fatigue.

The graph below shows a sharp increase in Reddit discussions about unemployment in the latter half of March 2020.
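A trend like this can be measured by counting, week by week, what share of posts mention a keyword. The Python sketch below shows one simple way to do so. The input format (pairs of date and text), the function name and the sample posts are assumptions for the example, and collecting the posts from Reddit is left out.

```python
from collections import Counter
from datetime import date


def weekly_mentions(posts, keyword):
    """Share of posts per ISO week whose text contains the keyword.

    `posts` is an iterable of (date, text) pairs; the result maps
    (iso_year, iso_week) to a fraction between 0 and 1, in date order.
    """
    totals = Counter()
    hits = Counter()
    for day, text in posts:
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        totals[week] += 1
        if keyword in text.lower():
            hits[week] += 1
    return {week: hits[week] / totals[week] for week in sorted(totals)}


if __name__ == "__main__":
    sample_posts = [
        (date(2020, 3, 2), "Looking for a new laptop"),
        (date(2020, 3, 3), "Unemployment claims?"),
        (date(2020, 3, 23), "Filed for unemployment today"),
        (date(2020, 3, 24), "Unemployment office is swamped"),
    ]
    for week, share in weekly_mentions(sample_posts, "unemployment").items():
        print(week, share)
```

Using the share of posts rather than the raw count helps separate a real shift in topic from a general rise in activity on the platform.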

Social science counts the consequences

Finally, we examined trends in scientific journal articles related to COVID-19. Analysing articles from the social sciences gives us a broader picture than news articles, and we found increasing discussion of the immediate consequences of the pandemic and lockdown. The trending words range from health-related terms (pneumonia, infectious, epidemiology) to those more typical of the social sciences, such as economic recession, policy and GDP.

The word cloud below shows some of the most common terms in social science articles relating to COVID-19.