How language technology can help fight COVID-19
BLOG POST by Khalil Rouhana, Deputy Director-General of DG Connect at the European Commission.
30 November 2020
In the fight against
the coronavirus pandemic, language technology (such as Natural Language
Processing) might not seem an obvious partner for medical research – and yet it
is playing a vital role.
MLIA, the COVID-19 MultiLingual Information Access initiative
Since March 2020,
one of the top priorities has been to find approaches to fight the new disease
that has hijacked the planet for months and has changed the way we work and
live our lives. To do this, we need to analyse and understand how the virus
works and how we can stop it. As I write this, in mid-November 2020, the second
wave of COVID-19 is hitting. So we need to act fast.
The scientific
community is working relentlessly, and the European Union is supporting
research by mobilising millions of euros. The EU-funded Exscalate4Covid
project, using European supercomputing, identified a molecule in an already
existing drug, known as Raloxifene, as a promising means of treating
mild-to-moderate COVID-19 patients. In addition, it helps prevent the
disease from progressing towards severe and critical symptoms. A clinical trial
has been launched and the project will continue working to identify other
molecules.
However, the EU is
looking at every possible opportunity to advance research. Did you know that
more than 3,000 scientific articles are published in biomedical journals every
day? It is obviously impossible for researchers to go through them all in real
time and even more difficult for the public to access all the available
information. Back in March, the big questions were: what can the digital world
do to improve, accelerate, and simplify the work of thousands of researchers in
Europe and beyond? How can we bring together pieces of information from various
sources and in different languages? How do we share this information with
citizens?
To answer these and
other questions, the Commission joined the organisers of MLIA, the COVID-19
MultiLingual Information Access initiative. This project functions on a
voluntary basis, to support fast information exchange and accurate
communication in a multilingual environment, covering all EU official languages
and many more. A similar initiative exists in the US but only in English: the
COVID-19 Open Research Dataset (CORD-19), a language data competition to
analyse a large set of scientific papers on the virus. In Europe, we want to go
the extra mile and overcome language barriers, and the European Commission is
supporting this effort through its language resources initiative
(ELRC-Share).
The idea is to
create resources and tools for improved information access, enabling us to come
up with sustainable methods to tackle this and future crises from a
language-related perspective.
This includes
finding an algorithm able to crawl, aggregate and present data from various
sources. It will not only process structured data (such as the numbers of cases
and length of incubation period), but also unstructured and textual data
contained in reports, studies, articles and so on. The final objective is to
create resources and tools for improved information access based on a large
multilingual data collection on coronaviruses and COVID-19, regardless of the
language, level of linguistic knowledge and the social background of the
public.
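For technically minded readers, the aggregation step described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not the method used by MLIA participants: the `Record` fields, the `aggregate` and `present` helpers, the source names and all sample values are hypothetical placeholders. It groups items from several sources by topic, keeps textual content per language, and merges structured figures.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Record:
    """One collected item. All field names are hypothetical."""
    source: str                                   # provider of the item
    language: str                                 # language code, e.g. "en"
    topic: str                                    # key used to group related items
    text: str = ""                                # unstructured content, if any
    figures: dict = field(default_factory=dict)   # structured data, if any


def aggregate(records):
    """Group records by topic, keeping texts per language and merging figures."""
    merged = defaultdict(
        lambda: {"texts": defaultdict(list), "figures": {}, "sources": set()}
    )
    for rec in records:
        entry = merged[rec.topic]
        if rec.text:
            entry["texts"][rec.language].append(rec.text)
        # Later records overwrite earlier figures with the same name.
        entry["figures"].update(rec.figures)
        entry["sources"].add(rec.source)
    return merged


def present(merged):
    """Return one summary line per topic, followed by its figures."""
    lines = []
    for topic in sorted(merged):
        entry = merged[topic]
        languages = ", ".join(sorted(entry["texts"])) or "none"
        lines.append(
            f"{topic}: {len(entry['sources'])} source(s), languages: {languages}"
        )
        for name, value in sorted(entry["figures"].items()):
            lines.append(f"  {name} = {value}")
    return lines


# Made-up placeholder records, not real data.
records = [
    Record("source-a", "en", "incubation", text="Example report text.",
           figures={"incubation_days": 5}),
    Record("source-b", "it", "incubation", text="Testo di esempio."),
    Record("source-c", "en", "cases", figures={"cases_reported": 100}),
]

summary = present(aggregate(records))
for line in summary:
    print(line)
```

A real system would also need crawling, translation and a way to resolve conflicting figures; this sketch simply lets the most recent value win.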
Where does the data
come from? The MLIA initiative is based on sharing: European institutions,
universities, private companies, and several news providers in the EU have
agreed to let the developers use their databases and content to make the
challenge possible. So far, more than 40 participants have joined in, aiming to
bring their best technical skills to the challenge. Among the contestants, we
have universities and IT companies from Europe and all over the world –
including Australia, China, India, Jordan, Saudi Arabia and Botswana, to name a
few. The project will consist of three rounds: the first will end in January
2021 and the final one in May 2021. By then, it should be possible to aggregate
and summarise various sources of information into a single coherent synopsis or
narrative, complementing different pieces of data, resolving inconsistencies,
and preventing misinformation.
This year has shown
the importance of unity in times of crisis. More than ever, it is crucial to
join forces, share knowledge, tools and ideas, all across Europe and beyond.
The MLIA initiative is on the right path towards bringing research communities
together, shifting the focus from competition to collaboration and helping us
fight COVID-19 more effectively.
Last update: 30
November 2020
Aims and Scope
In the current
Covid-19 crisis, as in many other emergency situations, the general public, as
well as many other stakeholders, need to aggregate and summarize different
sources of information into a single coherent synopsis or narrative,
complementing different pieces of information, resolving possible
inconsistencies, and preventing misinformation. This should happen across
multiple languages, sources, and levels of linguistic knowledge that vary
depending on social, cultural or educational factors.
Covid-19 MLIA
Eval organizes a community evaluation effort aimed at accelerating the creation
of resources and tools for improved MultiLingual Information Access (MLIA) in
the current emergency situation, with reference to a general-public use case:
Sofia has heard that
a drug has been tested in different countries and she would like to have
a consolidated and trustworthy view of the main findings, whether the drug is
effective or not, and whether there are any adverse effects.
Distillation for the
general public also implies a degree of specialist-to-non-specialist
communication when the aggregated material includes both popular and
specialised sources. The general public therefore needs to be able to
understand medical terminology, either through its equivalents in "popular"
language or through language calibrated appropriately for the audience.
Community Evaluation Effort
Covid-19 MLIA
Eval is an evaluation effort promoted by several communities which are
working closely together.
See the links:
https://ec.europa.eu/digital-single-market/en/blogposts/how-language-technology-can-help-fight-covid-19
http://eval.covid19-mlia.eu/
https://ec.europa.eu/digital-single-market/en/glossary