Data Lab

Stage 1: Topic identification and synthesis

We started this analysis by casting a wide net, collecting large data sets from lots of different sources, including academic journals, journalism and social media. We wanted to build a fuller picture of the emerging social and technological trends and topics shaping the future of the internet.

The next step was to narrow down this large network and lists of topics (we ended up with several thousand) to a much smaller shortlist of keywords we consider vital to shaping the future of the internet.

We then extracted the most useful and important Social and Technological topics.

These are the Social and Technological topics that were judged most important

Working across these Social and Technological topics helps us to define the following Ten Challenges for the Internet

Stage 2 - Topic Synthesis

Based on our topic identification and analysis, these are the 10 key challenges we believe will significantly impact the health of the internet in years to come. They can be divided into three categories: topics that cover the key aspects contributing to the resilience of the internet, three that are key to making the internet more inclusive, and lastly aspects central to creating a net that is more democratic.

1 Sustainable and Fair Infrastructure
2 Cybersecurity and Resilience
3 Trustworthy online information infrastructures
4 Online identities and trust
5 Decentralising power
6 The Right to Opt out & Self-govern
7 Data sovereignty
8 Ethical AI and Machine Learning
9 A Diverse and Safe Internet
10 An Accessible and Open Internet

Sustainable and Fair Infrastructure

One key challenge for the internet moving forward is ensuring that the hardware and infrastructures underpinning it are sustainable and can meaningfully contribute to building a more circular and fair economy. The challenges around the internet’s environmental footprint are myriad: from the extraordinary amount of energy used by data centres and emerging technologies like blockchain, to the costly mining processes behind the materials that make our tech devices function. Estimates suggest that by 2020 the global number of connected ‘things’ will reach 21 billion, devices that are usually hard to recycle and deliberately designed not to have a long lifespan. Can the planet sustain such explosive growth in connectivity?

Our thirst for internet-enabled devices is not just harming the planet, but is also supported by exploitative and dangerous production processes across the supply chain, from the mining of resources, often reliant on child labour, to dangerous labour practices in the factories actually building our hardware.

Some of the questions we’re asking:
- How can we ensure the internet and its underlying infrastructures reduce humanity’s environmental footprint, rather than increase it?
- How can we bring more transparency to the supply chains underlying our internet consumption and the devices supporting it?
- How can we become less reliant on toxic and harmful mining processes, and move to a model where we can better reduce and recycle the resources we do use?
- Can we reduce the geopolitical risks associated with our over-reliance on resources that are often mined in politically volatile countries and are increasingly the subject of a global arms race for minerals?

Cybersecurity and Resilience

One key component of building a more democratic and inclusive Next Generation Internet is ensuring the infrastructures underpinning the internet itself are secure, safe and resilient. We live in a time of growing cyber threats for which we are ill-prepared: from rising cyber crime to ever more sophisticated cyber warfare capabilities, powered by such new technologies as autonomous weapons and quantum computing. Existing weaknesses and flaws in the internet’s physical infrastructure and protocols also require urgent mending. Governments, the private sector and citizens need access to the right tools and information to help them protect themselves against these kinds of threats, and larger systems changes are required to ensure our (critical) infrastructures are resilient in the face of emerging challenges.

Some of the questions we’re asking:
- How can we increase awareness among the general public and in the private sector about the risks associated with cybercrime and attacks?
- Similarly, can we promote more adoption of responsible practices in the development of tools and devices, as well as spark public demand for safer solutions?
- What kind of (cyber-)physical infrastructures do we need to ensure resilience not just in the short run, but also into the future?
- As the international community, what can we do to reduce the risk of escalating cyber conflict and internet-enabled hybrid warfare methods? How can we put the necessary treaties and processes (still very much lacking!) in place?

Trustworthy online information infrastructures

One of the biggest casualties of the unbridled growth of the internet is probably the health of our media ecosystems: profits for quality journalism have cratered, and the ad-driven business models supporting information provision online have empowered the sensational over the nuanced, clicks over truth. The proliferation of “fake news” and the weaponisation of information is a key challenge for the internet today, threatening the very building blocks of our democracies and societies. Ensuring access to trustworthy information, and preventing the deliberate manipulation of information flows without resorting to censorship or hampering freedom of speech, remains an unsolved challenge. We need not just to think more about specific, particularly media-friendly topics such as fake news, deepfakes, filter bubbles and Twitter bots, but also to take a wider view of how we can create healthy (social) media ecosystems and viable alternative business models for quality news and information sharing.

Some of the questions we’re asking:
- The million dollar question: how can we make online quality journalism profitable? Which business models actually work, and how can we support them?
- We heralded the internet as the great democratiser, giving everyone a stage, including those we might have preferred not to give one (extreme political views, trolls, harassment). How do we strike the right balance between freedom of expression and addressing threats to the resilience of our societies and individual safety?
- ‘Fake news’ has already proven very hard to counter; technological fixes such as AI detection certainly appear to be limited in their effectiveness. How will we deal with the next generation of manipulation tools, such as deepfake technologies?

Online identities and trust

We are currently seeing a big surge of interest in online identity systems, driven by both private sector and government initiatives. Building a trustworthy and secure system for managing online identities would be incredibly valuable: it might help empower refugees, often left stateless, and ease their access to support systems; it could help make transactions online safer; and it could improve the efficiency of our interactions with government services (see Estonia).

Effective identity management wouldn’t only increase trust on the internet (who am I really talking to? Can I trust the online service?) and so bolster the European digital economy, but would also help us build more personal online relationships. The currently dominant rate-and-review system places a lot of power in the hands of the reviewer (a single low score on a ride-sharing app can seriously damage a driver’s ability to attract new customers); alternative e-identity and reputation management systems could make these interactions more positive and equal.

Discussions about online identity management systems are only one element in a larger discussion about how we can move to an internet that is built on principles of trust (rather than the “trustlessness” much touted by some in the Blockchain community).

Some of the questions we’re asking:
- How can we promote more secure and trustworthy transactions and interactions online?
- How can we reduce the security and privacy risks associated with centralised identity management systems?
- Many emerging online identity systems (and associated reputation systems) are managed by the private sector and are part of walled gardens. How can we democratise control and promote portability across applications?
- Good e-ID systems would be very valuable, but could also easily be turned into a powerful weapon for oppression and surveillance. What role could and should governments play in restoring trust on the internet? Should a secure online identity be as much a citizen’s right as holding a passport, or are the risks too substantial?

Decentralising power

Most of the issues the internet faces today are a direct consequence of the increased monopolisation of power over the internet, and the business models that sustain this dynamic. Even issues that are only seemingly tangentially related to Big Tech’s and certain governments’ disproportionate control become harder to address, because the small number of actors who do have the power to make a change are often unwilling to do so.

We urgently require new business models that can provide an alternative to the reigning surveillance capitalist model, and can sustain a more pluralistic and healthy digital economy. Alternative models, such as platform cooperativism or commons-based approaches can help empower smaller initiatives, help level the playing field and offer better protections to consumers and digital workers. We need to support initiatives and SMEs that operate under these currently still less-sustainable models, through policy (protecting net neutrality, designing next generation competition and antitrust policy), funding, and procurement, as well as promoting adoption among the general public.

Some of the questions we’re asking:
- There is a lot of talk about Europe needing to create its own tech giants. But rather than trying to build the next Google, shouldn’t we focus on building the kinds of infrastructures that would prevent the next Google instead?
- What kind of alternative, more inclusive business models will actually work? Can these models realistically compete, and under which circumstances?
- Power over the internet today mostly comes in the form of access to data, which will only become more important in the age of AI. How can we democratise access (while protecting data subjects’ sovereignty and privacy) and ensure existing power structures don’t get further calcified?
- There is a lot of talk about regulating Big Tech: breaking them up, turning them into public utilities, making them pay for our data… But there are not enough concrete, realistic proposals for what exactly this regulation should entail, or assessments of their effectiveness. How can we best use policy levers to curtail monopoly power?

The Right to Opt out & Self-govern

With the internet becoming ever more pervasive in our lives and societies, shaping our jobs, our cities, our interactions with the government and so forth, it has become harder and harder for individuals to shape their relationships with, or opt out of, “the internet” altogether.

With the rise of the smart city, and the millions of connected IoT-devices that will underpin it tracking our every move, how do we ensure citizens can meaningfully consent to what happens with the data they generate, and retain their privacy? With everything from our smart vacuums to credit card companies collecting and selling our data to the highest bidder (through very opaque processes), we need new solutions that help citizens give informed consent, as well as the ability to completely opt out of being part of, for example, data sharing systems, while still being able to use key services.

These questions about ownership and consent should not only pertain to our own personal data, but also to, for example, our relationship with our devices and digital ownership rights. The design of newer devices tends to restrict our ability to “tinker” with them (it is a lot harder to build your own smartphone than it used to be to build your own computer), and the growing popularity of subscription services like Netflix and Spotify has turned us into content renters rather than owners.

Some of the questions we’re asking:
- Realistically, there is currently no such thing as informed consent when it comes to the tracking of our personal data off- and online. Can we move beyond the meaningless checkboxes and endless user agreements and rethink how users can give actual permission?
- We are usually given a binary choice when asked to give consent: either we agree with Facebook’s terms and conditions, or we do not get to use the service altogether. Can we design pathways to reduced consent that do not lead to complete exclusion from a service?
- Vulnerable groups are more likely to be wary of or unable to share their data. If more and more services have data-driven analysis underpinning them, from where we place streetlights to mental health services provisions, how can we ensure the needs of those who aren’t part of the data set are still represented?
- Can we invent new models for community consent, not just consent on the level of the individual? How can we let communities decide their own “rules of engagement” with, for example, smart city systems or sharing economy platforms?

Data sovereignty

One of the biggest problems the internet faces today, and the main cause of the unequal distribution of power in the digital economy, is the concentration of data in the hands of just a few key players. As the use of data-driven solutions is starting to permeate more and more aspects of our societies, this rat race to get access to our personal data is no longer just a threat to our privacy, but to our sovereignty as individuals.

We need to experiment with and build on new models such as personal data stores and data commons models, as well as think about how regulation can be effectively used to help democratise data ownership. Breaking the dominant surveillance capitalist business models will, however, not be easy, which is why we believe we should be looking at larger systems changes as well: redesigning the internet’s underlying infrastructure to be more decentralised by nature, and rethinking the dominant economic frameworks driving the internet today.

Whichever sets of interventions we ultimately go for, re-establishing data sovereignty will also require reclaiming technological sovereignty. Today, too few of the technologies and applications we rely on are designed in Europe. Regulation is ultimately rather reactive; to truly embed European values into the process, we also need to move production of both hardware and software back into Europe.

Some of the questions we’re asking:
- How can we give citizens back control over what happens to their personal data?
- We want to protect our sovereignty and privacy when it comes to our personal data, but at the same time need to recognise that having access to substantial amounts of data is also critical in the global AI arms race, and thus to retaining our technological sovereignty more broadly. How do we strike the right balance?
- Personal data stores, data commons, data trusts… We see a lot of alternative models mushrooming that could help make the relationships between data owners and data subjects more equal. Which (combination) of these do we need?

Ethical AI and Machine Learning

As discussions about the potential transformative impact of AI and Machine Learning have come to dominate public debate in recent years, so have concerns about the potential negative side-effects of allowing these kinds of technologies to play an ever-larger role in decision-making and the governing of our societies.

We need to embed ethics across the system’s value chain: the development of ethical AI and ML tools should only use responsibly sourced and managed data (make sure we have a representative sample, subjects were able to give informed consent, privacy is protected) and algorithms that don’t further existing societal biases (around gender and race, for example) and are accountable. The tools themselves should furthermore only be used for purposes we consider ethically just, or at least not malicious. Ensuring we have solutions that are fair and inclusive along the value chain (from the infrastructural level up to the impact of the decisions being made or tasks replaced) is the only way to build truly responsible AI.

Some of the questions we’re asking:
- How can we ensure AI systems are ethical, fair and accountable across all layers of the decision-making process?
- How can we democratise who gets to build AI systems? AI development is now dominated by the private sector and often not particularly benevolent governments. How can we support the creation of AI-for-good?
- When we talk about ethical AI, what “ethics” are we talking about? Which values should these systems embody, and how can we embed them in systems?
- How can we ensure AI systems are resilient to malicious attacks (such as gaming of deep learning algorithms) and conversely aren’t used for malicious purposes (think of emerging applications, such as deepfake technology and cyber and autonomous weapons)?

A Diverse and Safe Internet

The next billion internet users who will move online will look very different from the first billion internet users. Yet it is still mostly the latter group’s needs that get taken into account in the design of technological solutions and applications. Increasing diversity in who gets to build and use the internet is important if we want to ensure we don’t perpetuate existing inequalities in the digital economy, and it also helps stimulate innovation, as diverse teams tend to be more creative.

Addressing the current lack of diversity in the technology industry will require a combination of ambitious education initiatives that help make training in, for example, computer science more accessible to members of underrepresented groups, and efforts to effect culture change within the sector, for example by targeting bias in the hiring process.

Making the internet itself more diverse (whose voices do we allow to get heard and amplified?) relies not only on creating opportunities for those generally marginalised in online discourse, but also on making sure the internet provides a space for all. The rise of Twitter mobs, doxing campaigns and troll armies has driven many people who share different opinions or come from different backgrounds off the internet. To truly foster a diverse internet, we also need to ensure it is safe and free from targeted harassment.

Some of the questions we’re asking:
- Technology is not neutral, but reflects the needs and context of its creators. How can we make the teams leading on developing the next wave of technologies more diverse?
- The internet, in particular social media, has unfortunately made it much easier to target and harass people with different views and ideas than our own. How can we keep the internet a safe space for all?
- We also need to put thought into children’s experiences on the internet. Children are particularly vulnerable when it comes to harmful content, cyber bullying, targeted advertising and even more nefarious ends. How do we provide them with a safe internet without imposing too much control?
- Openness and diversity go hand in hand: how can we make sure the internet remains a place where fringe communities and different ideas can continue to prosper?

An Accessible and Open Internet

Before we can begin to talk about building a Next Generation Internet, we need to ensure that everyone, in Europe and beyond, can have access to the internet to begin with. This means addressing structural barriers to access, through for example investing in infrastructure and broadband access, and promoting multilinguality and accessibility (many websites are still too hard to use for disabled users) online, but also thinking about the more social barriers that prevent people from moving online. Think here about improving digital skills, as well as ensuring that the internet provides a safe environment for all (particularly as online harassment and hostility disproportionately affect more vulnerable groups).

Access is also only truly access if new users can actually enter the open internet. Think, for example, of the Facebook-sponsored Free Basics programme, which gives people in developing countries free access to a small set of Facebook-approved websites. Is that truly the internet? We see the emergence of walled gardens and other types of internet fragmentation all over the world, from the curtailing of net neutrality to censorship, link taxes and ultimately splinternets. It is of fundamental importance that we protect and champion the open internet, particularly at a time when it is increasingly under siege.

Some of the questions we’re asking:
- How can we move discussions about internet access beyond conversations about broadband deployment, and do more about the equally important social and economic dynamics preventing people from using the internet?
- How can we make the internet more accessible to users who are less tech-savvy, have disabilities, or don’t speak major languages like English?
- We see more and more governments and companies cordoning off bits of the internet. How can we counter the trend of the internet getting less open? What role should Europe play as one of the last “safeguards” of the open internet?

Keyword relationship

A deeper dive reveals the relationship between social issues and technology.

These keywords are frequently paired together.

Stage 3 - Issue mapping

Issue classification

Articles are classified along two dimensions: EU/US and social issue/technology.

EU axis: Articles from European sources or concerning Europe, residualized on the social issues axis.

Social issues axis: Articles containing words from a pre-defined list of social topics based on Latent Dirichlet Allocation, mapping trending words with article type based on number of occurrences.
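To make the two axes more concrete, here is a minimal sketch of how such a classification could be computed. It is not the pipeline used for this analysis: the word lists, the function names and the use of a simple least-squares fit for "residualizing" are all assumptions made for illustration, and the word list is hard-coded here rather than derived from a topic model.

```python
import re
from statistics import mean

# Hypothetical word lists; the actual lists used in the analysis are not published here.
SOCIAL_TERMS = {"privacy", "inequality", "democracy", "consent"}
EU_TERMS = {"eu", "europe", "european", "brussels"}


def count_terms(text, terms):
    """Count how many tokens of `text` appear in `terms`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for token in tokens if token in terms)


def residualize(y, x):
    """Residuals of a least-squares fit y ~ a + b*x (the part of y not explained by x)."""
    mean_x, mean_y = mean(x), mean(y)
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        slope = 0.0
    else:
        slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / var_x
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def classify(articles):
    """Return (social_score, eu_score) per article, with EU scores residualized on social scores."""
    social = [count_terms(a, SOCIAL_TERMS) for a in articles]
    eu_raw = [count_terms(a, EU_TERMS) for a in articles]
    return list(zip(social, residualize(eu_raw, social)))
```

Residualizing removes the part of the EU score that merely tracks the social-issue score, so the two axes can be read as roughly independent.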

Analysis shows a skew towards social issues in the EU...

...and technological issues in the US.

Sentiment Analysis

The sentiment analysis resulted in a compound score for every paragraph containing a given phrase. The score is calculated from the mean of the valence scores of each word in the paragraph apart from the analysed words themselves, which have been removed from the paragraph's text.
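As a rough illustration of the scoring described above, the sketch below averages per-word valence over a paragraph after removing the analysed phrase. The tiny lexicon, its values and the function name are hypothetical, and treating words missing from the lexicon as neutral (0.0) is an assumption; the actual analysis may have handled them differently.

```python
import re

# Hypothetical valence lexicon with made-up values, for illustration only.
VALENCE = {"good": 1.9, "great": 3.1, "bad": -2.5, "harmful": -2.0}


def paragraph_score(paragraph, phrase):
    """Mean valence of the paragraph's words, excluding the words of the analysed phrase."""
    phrase_tokens = set(re.findall(r"[a-z']+", phrase.lower()))
    tokens = [
        token
        for token in re.findall(r"[a-z']+", paragraph.lower())
        if token not in phrase_tokens
    ]
    if not tokens:
        return 0.0  # nothing left to score once the phrase is removed
    # Words not in the lexicon count as neutral (assumption).
    return sum(VALENCE.get(token, 0.0) for token in tokens) / len(tokens)
```

Removing the phrase itself keeps a term such as "fake news" from being scored on its own wording rather than on the context it appears in.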

Sentiment analysis shows the most positive and negative terms associated with our topic areas.

Stage 4 - Trend identification

Application to explore relevant keywords by source. These terms are trending now or were trending in the past.

Common terms: compare keywords trending in all sources

Wikipedia: browse normalized page views for multiple language versions

Trend analysis shows how interest in our themes has changed over the last few years.
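The normalization applied to the Wikipedia page views is not specified here. One plausible approach, sketched below purely as an assumption, is to scale each language version's series by its own peak so that large and small Wikipedias can be compared on the same 0 to 1 scale. The function name and input layout are hypothetical.

```python
def normalize_views(views_by_language):
    """Scale each language's page-view series by its own peak value (assumed method)."""
    normalized = {}
    for language, series in views_by_language.items():
        peak = max(series, default=0)
        # A series with no views at all stays at zero instead of dividing by zero.
        normalized[language] = [views / peak if peak else 0.0 for views in series]
    return normalized
```

For example, `normalize_views({"en": [50, 100, 25]})` maps the English series onto fractions of its busiest period.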

Stage 5 - Meetup analysis

Over time, meetups have spread across the EU and the rest of the continent.

Inside the EU, the largest communities are in the core tech hubs of London, Paris, Munich and Berlin.

Vibrant communities can also be found in Lisbon, Warsaw and Dublin.

Outside the EU, focal points include Istanbul, Kiev and Moscow.

Meetup Topics - The growth of AI

As Meetup was founded in the US, it was less widespread in Europe until 2014-2015. However, in 2018 the number of EU meetups in the chosen areas surpassed that of the US.

Groups around web development and AI are much more popular than Agile in the US...

...while in the EU the 3 groups are similar in number.

The share of AI groups is similarly increasing across both sides of the Atlantic: in 2011, AI was the smallest group, while in the last year it almost surpassed the most popular web group.

Meetup Topics - Across the EU

The figures compare EU member states regarding their shares of meetup groups located in the EU across the 3 different technological areas. The top 3 countries across all categories include Germany (DE), Great Britain (GB), and France (FR). Other countries with strong communities in all areas are: Spain (ES - especially in AI), The Netherlands (NL), and Poland (PL). Ireland (IE) appears in AI communities, Italy in AI and Web, Romania in Web, Sweden (SE) and Belgium (BE) in Agile.

A general trend that can be observed is the expansion of meetup groups to new locations between 2011 and 2014, and relatively constant differences between countries since then.

While the UK, Germany and France led the way, there are now strong communities across Europe.

Stage 6 - Exploring research hubs

Research hubs by country

Story about how research is concentrated

Research hubs by institution

The top 2 institutions overall are located in the EU: Karlsruhe Institute of Technology and University of Toulouse. Of the top 10, the sole US-based institution is Google Inc, while the remaining 7 are all located in China.

Overall, EU-based institutions are ranked 1 and 2, while for AI, China dominates.

Research clusters

These maps show the clusters of institutions whose affiliated researchers were likely to publish together

Showing clusters of researchers likely to publish together

We mostly still see European universities work together with other universities in their own country, and not so much abroad.

Research in countries over time

For a better assessment of the EU’s relative position in research, the aggregate result of the 28 EU member countries is compared to other countries.

The figures show that the overall EU28 research output is the largest across all categories.

The top 10 contributors to ACM are: EU28, US, China, Japan, India, Taiwan, Australia, South Korea, Canada and Brazil.

While the EU still produces more research than anywhere else, its lead has been eaten into by both China and the USA.

Stage 7 - Our trending keywords on Wikipedia

These maps show the networks of connected keywords.

Analysis of how our keywords are linked shows clear topic hubs...

DATA LAB

An interactive overview of our work on NGI Engineroom mapping the technological and social issues shaping the future of the internet.
