Reglab – center for strategy & regulation

Global Copyright Mapping: How Regulations Shape the AI Race

A comprehensive mapping by Reglab analyzed copyright laws across 50 countries, revealing how regulatory disparities may give developed nations a significant advantage in the global Artificial Intelligence (AI) race.

Developed Nations: Embracing Flexibility

In regions such as the United States, Japan, and the European Union, legal frameworks allow for the use of protected works in large-scale automated analysis—even for commercial purposes. This flexibility has accelerated AI breakthroughs and bolstered technological competitiveness.

Brazil and Latin America: Among the Most Restrictive

The study identifies Brazil and much of Latin America as having some of the world’s most restrictive systems. The lack of clear rules regarding Text and Data Mining (TDM) creates legal uncertainty, potentially stifling both private enterprises and research institutions.

The Risks of Rigid Legislation

Our findings highlight that permissive legal environments have been a primary driver of AI innovation in recent years. Conversely, countries with closed or overly rigid rules face the risk of technological isolation and a loss of global competitiveness.


Brazil’s Legislative Landscape: Bill 2338/2023

Currently under review in the Chamber of Deputies, Bill 2338/2023 proposes copyright exceptions limited strictly to institutional use. For commercial applications, even those tied to research, the bill imposes a complex and economically unfeasible remuneration model, which could stall national AI initiatives.

Update Note (October 2025): Pages 7 and 8 of the study have been updated to correct the classification of Australia. Its status was changed from “High Permissivity” to “Medium Permissivity,” in accordance with the study’s classification parameters. This correction does not alter the mapping’s overall conclusions.

Cite

RIBEIRO, N.; GARROTE, M. Mapeamento Global de Leis de Direitos Autorais e Exceções para Mineração de Dados. Contexto Reglab. n. 1. Reglab, 2025
Ribeiro, N., & Garrote, M. (2025). Mapeamento global de leis de direitos autorais e exceções para mineração de dados. Contexto Reglab, (1). Reglab.
Ribeiro, N., and M. Garrote. Mapeamento global de leis de direitos autorais e exceções para mineração de dados. Contexto Reglab, no. 1. Reglab, 2025.

Authors

  • Marina Garrote
  • Natália Ribeiro


Global Mapping of Copyright Laws and Exceptions for Data Mining

[IMAGE 1 — replace with the corresponding image from the PDF]

August 2025

About Reglab

Reglab is a think tank specializing in research and consulting that assists companies, associations, and policymakers in data-driven planning and impact analysis. We focus on responsible and strategic decision-making, unraveling the regulatory challenges of the media and technology sector.

About the Contexto Series

Reglab’s Contexto series presents concise summaries that synthesize specific topics or emerging trends. They are designed to present information in a clear and accessible manner, incorporating visual elements such as charts, tables, and infographics, combining analytical rigor with practicality.

publication details

[IMAGE 2 — replace with the corresponding image from the PDF]
Authors: Natália Góis Ribeiro and Marina Gonçalves Garrote

Researchers: Natália Góis Ribeiro
Cartographic Production: Amanda Silva Almeida
Final Layout: Eliza Natsuko Shiroma

Suggested citation: RIBEIRO, N.; GARROTE, M. Global Mapping of Copyright Laws and Exceptions for Data Mining. Contexto Reglab. n. 1. Reglab, 2025.

[IMAGE 3 — replace with the corresponding image from the PDF]

This study is part of the Copyright and Technology Observatory. The Observatory is an initiative of Reglab and partners, dedicated to promoting studies and reports on the subject, aiming to strengthen the debate on the topic with clarity and evidence.

  • Text and data mining (TDM), also known as “computational analysis,” is a technique used to extract patterns from large volumes of content, an important practice for AI model training and scientific research.
  • This study mapped copyright legislation in 50 countries, focusing on legal exceptions applicable to TDM, to identify how different jurisdictions address the relationship between data mining and copyright.
  • The international landscape is heterogeneous. There is no global standard regarding copyright exceptions applicable to TDM, with significant variations between countries.
  • Unequal access to TDM and the global nature of the internet may amplify global asymmetries in AI. High-income countries concentrate the most permissive laws, while other countries still maintain restrictive regimes.
  • Brazilian law is unclear on whether TDM on protected works is lawful, which may create legal uncertainty for technical, scientific, and commercial uses.
  • The adoption of express TDM exceptions in Brazil, far from meaning automatic alignment with external models, may constitute a strategy for strengthening technological sovereignty.

introduction

Amid the global race to drive the development of Artificial Intelligence (AI), countries have taken different positions on what can, and what cannot, be used to train these systems.

Large-scale data extraction and analysis techniques, such as text and data mining (TDM), are essential for the advancement of AI. But their legal framework is still uncertain, especially given copyright rules designed for another era.

While some countries have introduced exceptions in their copyright laws for computational uses, others maintain strict interpretations. Brazil began discussing this topic in Congress with Bill 2338/2023, which contains specific provisions on the subject in its current wording. To understand the risks and possibilities of AI regulation in Brazil regarding copyright, it is important to understand how different countries are dealing with the issue. This brief, as part of Reglab’s Contexto series, seeks to translate comparative evidence on legal exceptions in copyright regimes into visual and accessible language, offering a starting point for the Brazilian regulatory debate.

We recognize that the topic involves legitimate tensions, and it is with this in mind that we seek to bring new perspectives to the debate.

Regulation can both protect creators, reducing barriers to access to knowledge, and perpetuate economic asymmetries, making full use of AI a privilege of the few.

what is text & data mining?

Training AI models involves complex steps, from data collection and curation to parameter tuning and performance evaluation, using different techniques to analyze large volumes of information. Among these techniques, text and data mining (TDM) stands out for its focus on automated analysis of content at scale.

It is a technique that predates the generative AI boom, used in various fields to identify patterns, trends, and correlations in large volumes of data.

TDM is widely used by different sectors and for various functions. During the COVID-19 pandemic, TDM was essential for accelerating scientific discoveries, enabling the automated analysis of thousands of medical articles. Governments and public organizations also use TDM to detect disinformation and hate speech on social media, support the design of public policies, and monitor climate change. In the private sector, the technique is applied in search systems, automatic translation, voice recognition, and even investigative journalism, helping to cross-reference large volumes of documents to identify patterns and inconsistencies.

Steps involved in Data Mining

  • Data collection: data gathered from different sources
  • Pre-processing: preparation and transformation of data
  • Data organization: quick access and data storage for search and mining
  • Mining: algorithmic inference and information extraction
  • Analysis: user analysis and navigation
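
The five steps can be made concrete with a very small sketch. It is an illustration only, not the workflow of any real TDM system: the three-sentence corpus, the stopword list, and every function name below are invented for this example, and the "mining" step is reduced to counting how many documents each term appears in.

```python
import re
from collections import Counter, defaultdict

# Toy corpus standing in for collected documents (invented for illustration).
DOCUMENTS = {
    "doc1": "Vaccine trials report strong immune response.",
    "doc2": "Immune response varies across vaccine trials.",
    "doc3": "Climate data show rising sea levels.",
}

# Hypothetical stopword list; real pipelines use much larger ones.
STOPWORDS = {"the", "a", "across", "show"}


def collect(source):
    """Step 1, data collection: gather (id, text) pairs from a source."""
    return list(source.items())


def preprocess(text):
    """Step 2, pre-processing: lowercase, tokenize, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def organize(records):
    """Step 3, data organization: build an inverted index (term -> doc ids)."""
    index = defaultdict(set)
    for doc_id, text in records:
        for token in preprocess(text):
            index[token].add(doc_id)
    return index


def mine(index):
    """Step 4, mining: count how many documents each term appears in."""
    return Counter({term: len(ids) for term, ids in index.items()})


def analyze(frequencies, min_docs=2):
    """Step 5, analysis: list the terms that recur across documents."""
    return sorted(t for t, n in frequencies.items() if n >= min_docs)


index = organize(collect(DOCUMENTS))
recurring_terms = analyze(mine(index))
print(recurring_terms)
```

The point relevant to copyright is visible in step 3: the works have to be copied and stored in some form before any pattern can be extracted, even though the final output contains only aggregate information about them.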

and after all, is TDM compatible with copyright laws?

It depends. That is why, with this work, we seek to understand which laws already provide copyright exceptions for AI training.

We drew on the 2022 article “Legal Reform to Enhance Global Text and Data Mining Research” to identify whether and how countries provide, in their laws, exceptions that authorize the use of protected works for TDM, whether in scientific, institutional, or commercial contexts.

We know that not all TDM use is for AI training, and not all AI training depends solely on TDM. However, our research revealed that, in different countries, a broad legal definition of TDM has been adopted that may also include use for AI training.

Because of this, and in the absence of a Brazilian definition, we used the one from Directive (EU) 2019/790 as a methodological reference, which defines text and data mining as:

“any automated analysis technique intended for the analysis of texts and data in digital format, in order to produce information, such as patterns, trends and correlations, among others” (EU, 2019).

We then analyzed copyright laws available in the WIPO repository, supplementing with secondary data when necessary. Our sample included 50 countries, which were classified into three levels of permissiveness (high, medium, low):

High permissiveness: This category includes countries that, cumulatively: (i) expressly authorize TDM; (ii) do not restrict the user profile; and (iii) permit commercial use of the results. It also includes common law countries, such as the USA, where the fair use doctrine has been interpreted by courts as permissive to the practice in several cases analyzed to this date.

Medium permissiveness: This category includes countries with legislation that: (i) limits TDM exceptions to specific groups (e.g. universities, libraries), and/or (ii) restricts permissions to non-commercial uses only.

Low permissiveness: Jurisdictions whose legislation does not provide for express exceptions to the practice of TDM.
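
As a rough illustration, the three definitions above can be written as a decision rule. This is a minimal sketch of the criteria as stated, not the study's actual coding procedure: the `CopyrightRegime` fields and the `classify` function are hypothetical names, the example regimes are invented rather than real countries, and the published classification also reflects case-by-case caveats that a rule like this does not capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyrightRegime:
    """Hypothetical coding of one jurisdiction's copyright law."""
    express_tdm_exception: bool   # (i) the law expressly authorizes TDM
    open_to_all_users: bool       # (ii) no restriction on the user profile
    commercial_use_allowed: bool  # (iii) commercial use of results permitted
    permissive_fair_use: bool = False  # courts have read fair use as allowing TDM


def classify(regime: CopyrightRegime) -> str:
    """Return 'high', 'medium', or 'low' following the stated criteria."""
    meets_all_three = (
        regime.express_tdm_exception
        and regime.open_to_all_users
        and regime.commercial_use_allowed
    )
    # High: all three criteria cumulatively, or permissive fair use case law.
    if meets_all_three or regime.permissive_fair_use:
        return "high"
    # Medium: an exception exists but is limited by user group and/or purpose.
    if regime.express_tdm_exception:
        return "medium"
    # Low: no express exception for TDM.
    return "low"


# Invented example regimes, not real countries.
open_regime = CopyrightRegime(True, True, True)
research_only = CopyrightRegime(True, False, False)
no_exception = CopyrightRegime(False, False, False)
fair_use_regime = CopyrightRegime(False, True, True, permissive_fair_use=True)

for label, regime in [
    ("open regime", open_regime),
    ("research only", research_only),
    ("no exception", no_exception),
    ("fair use", fair_use_regime),
]:
    print(label, "->", classify(regime))
```

Because the high category requires the three criteria cumulatively, failing any one of them (for example, allowing all users but only non-commercial use) drops a regime to medium.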

High permissiveness: Austria, Belgium, Croatia, Czech Republic, Denmark, Estonia, Finland, France, Germany, Ireland, Japan, Luxembourg, Malta, Netherlands, Portugal, Singapore, Slovenia, Spain, Sweden, United States.

Medium permissiveness: Australia, Canada, China, Ecuador, Indonesia, Malaysia, Mexico, New Zealand, Nigeria, Philippines, South Korea, Switzerland, United Arab Emirates, United Kingdom.

Low permissiveness: Argentina, Bangladesh, Brazil, Chile, Egypt, India, Iran, Kazakhstan, Morocco, Pakistan, Russia, Saudi Arabia, South Africa, Thailand, Turkey, Vietnam.
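
For readers who want to work with the classification directly, the three lists can be tallied in a few lines. The country names and levels come from the lists above; the variable names and the `level_of` lookup are only illustrative choices made for this sketch.

```python
# Classification as published in the three lists above.
CLASSIFICATION = {
    "high": [
        "Germany", "Austria", "Belgium", "Croatia", "Denmark", "Slovenia",
        "Spain", "United States", "Estonia", "Finland", "France",
        "Netherlands", "Ireland", "Japan", "Luxembourg", "Malta", "Portugal",
        "Czech Republic", "Singapore", "Sweden",
    ],
    "medium": [
        "Australia", "Canada", "China", "South Korea", "United Arab Emirates",
        "Ecuador", "Philippines", "Indonesia", "Malaysia", "Mexico", "Nigeria",
        "New Zealand", "United Kingdom", "Switzerland",
    ],
    "low": [
        "South Africa", "Saudi Arabia", "Argentina", "Bangladesh", "Brazil",
        "Kazakhstan", "Chile", "Egypt", "India", "Iran", "Morocco", "Pakistan",
        "Russia", "Thailand", "Turkey", "Vietnam",
    ],
}

# Number of countries per level, and the overall sample size.
counts = {level: len(names) for level, names in CLASSIFICATION.items()}
total = sum(counts.values())

# Share of the sample at each level, in percent.
shares = {level: 100 * n / total for level, n in counts.items()}

# Reverse lookup: country name -> permissiveness level.
level_of = {
    name: level for level, names in CLASSIFICATION.items() for name in names
}

for level in ("high", "medium", "low"):
    print(f"{level}: {counts[level]} countries ({shares[level]:.0f}%)")
print("total:", total)
```

The tally gives 20 high, 14 medium, and 16 low jurisdictions, which adds up to the 50 countries in the sample.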

deep dive

In the United States, the fair use doctrine has so far permitted data mining practices. In cases decided before the generative AI boom, U.S. courts generally authorized TDM, including commercial use, when the uses were lawful, transformative, and did not harm the original market. Cases such as Authors Guild v. Google (2011), Authors Guild v. HathiTrust (2014), and Fox v. TVEyes (2014) illustrate this favorable interpretation. However, this understanding has been challenged in recent lawsuits, such as The New York Times v. OpenAI (2023) and Authors v. Anthropic (2025), which discuss the limits of TDM practice in training generative AI systems. The U.S. Copyright Office, in a report published in 2025, reinforces that there is still no definitive answer on how the doctrine applies in these cases, highlighting that the analysis must be done case by case, based on the four fair use factors.

Since 2018, Japanese law has permitted protected works to be used without prior authorization, including for commercial purposes, provided the use is aimed at information analysis and not at the creative reproduction of the work itself. This exception authorizes, in practice, TDM techniques, without the need for licensing or payment to rights holders. Japan’s Agency for Cultural Affairs recently reaffirmed its interpretation that the use of copyrighted works for AI training is, in principle, permitted without prior authorization from the rights holder.

The European Union has Directive (EU) 2019/790, which introduces specific exceptions for TDM, both for scientific research purposes and for general purposes, including commercial ones. Because it is a directive, it must be transposed into each member state’s national law, which in practice allows variation between countries in how it is implemented.

The EU also has an AI law: the AI Act. Although the text of the law does not directly address the issue of copyright, Recital 105, which provides interpretive guidance, reinforces that, with respect to data mining, the rules of the Directive must be respected, including the opt-out mechanism, which allows copyright holders to prohibit the use of their works for this purpose.

In Singapore, the copyright legislation explicitly authorizes the copying and use of legally accessed content for computational analysis purposes, including the identification of patterns through software. Commercial use is also permitted in the country.

Although there is no specific legal exception for the practice of TDM, South Korean legislation permits the use of protected works for educational purposes, research, citation, personal use, and in libraries, provided the use is within reasonable limits and does not cause harm to rights holders. In 2021, a bill was proposed creating a broad and specific exception for TDM, but the proposal is still under discussion.

The United Kingdom has recognized the practice of TDM as an exception to copyright protection since 2014, allowing individuals to make copies for computational analysis purposes, provided there is no commercial purpose. In June 2025, the country approved the Data (Use and Access) Act. In one of its sections, the law requires the government to prepare reports assessing the economic and legal impacts of different regulatory models, such as the feasibility of exception regimes with the possibility of opt-out by rights holders and the adoption of transparency mechanisms regarding data used in the training of models.

China demonstrates medium permissiveness for data mining practices. The Copyright Law, reformed in 2020, allows a series of exceptions for teaching purposes, non-profit scientific research, preservation by libraries, and accessibility, but without expressly mentioning TDM practices.

Ecuador is the only country in South America whose copyright legislation expressly mentions the practice of TDM, authorizing “text mining.” However, the scope of the exception is limited: it applies only to institutional use, such as in libraries and archives, and is conditioned on the good faith of users and on respect for “fair uses,” an assessment that weighs factors such as purpose, extent of use, and impact on the market for the work.

Nigerian legislation does not expressly mention data mining, but it includes the concept of fair dealing, which permits the use of protected works without prior authorization from the rights holder, provided certain conditions are met, such as reasonable, proportional use that does not cause unjustified harm to the market for the original work. The exceptions apply to both institutional users and the general public.

South Africa has more restrictive rules on the use of works protected by copyright. The legislation in force permits some uses called “fair dealing,” which authorize only the use of excerpts of works for personal study, teaching, or reporting, provided they do not harm the market for the work.

India adopts a fair dealing regime, with copyright exceptions restricted to non-commercial uses, such as study, research, and journalism, and does not provide a specific exception for TDM. The country seeks to develop a broader AI governance framework. In 2025, a Multi-Sectoral Group was created to formulate a regulatory framework, following the publication of a report with recommendations aimed at building a responsible ecosystem, submitted for public consultation. The document dedicates a section to copyright, raising normative questions and suggesting the need for clearer guidance on the use of protected data in the context of AI.

and in Brazil?

Brazil, like most Latin American countries, stands out for having more restrictive copyright legislation. The Copyright Law (Law No. 9,610/98) has not been updated to address TDM practices, nor does it contain broad exceptions. Although the scope of the rights provided in the law is the subject of legitimate debate, the result is legal uncertainty.

Bill 2338/2023, currently under discussion in the Chamber of Deputies, proposes a limited exception for the practice of text and data mining, allowing this activity only when carried out for the purpose of research and development of AI systems, and exclusively by scientific and educational institutions, museums, archives, and libraries.

[IMAGE 4 — replace with the corresponding image from the PDF]

In these cases, the use of protected content may be carried out without prior authorization from rights holders, provided there is no express restriction by copyright holders (opt-out). The text of the Bill also allows rights holders to retroactively claim against the use of their works, even after the system has been trained. Outside of these situations, the copyright holder could prohibit the use of their works for data mining purposes. The Bill also establishes that AI agents who use protected content in mining, training, and development processes must remunerate the holders of those works.

to learn more about the limits and consequences of the Bill 2338/2023 proposal, access our study on the subject!


analysis and commentary

the authors’ perspective

The map reveals a fragmented landscape: there is no regulatory convergence on copyright exceptions for data mining, which affects legal certainty for research and AI training. However, attention must be paid to the geographic distribution: in general, more permissive regimes are concentrated in developed countries, while restrictive legislation prevails in Global South economies.

The absence of exceptions may impact academic production and local innovation capacity, widening the gap between countries. Furthermore, the cross-border nature of the internet allows companies to shift AI training activities to more open jurisdictions, reducing the practical reach of restrictive national legislation.

Although some resistance to exceptions may come from sectors dependent on the exploitation of intellectual property rights (e.g. the publishing sector), the experience of the EU and the US shows that well-designed exceptions can coexist with sustainable business models — including opportunities for database licensing (the opt-out model).

Finally, concerns about technological sovereignty and the dominance of large technology companies are legitimate, but restricting TDM exceptions may produce the opposite of the intended effect. Well-structured exceptions can enable the local development of AI systems and the creation of competitive alternatives, including by small developers, universities, and research centers. That is why the discussion on copyright can move forward without setting aside other legitimate debates, such as data protection, content moderation, and competition policy.

directions for future studies

The Contexto series comprises summarized, non-exhaustive studies, with the objective of generating debate and encouraging future research, which may adopt different approaches to complement this work:

  • exhaustive mapping of countries, including analysis of bills currently under discussion;

  • assessment of social costs and benefits, grouping different stakeholders and measuring externalities;

  • economic analysis of the effects of different regulatory alternatives, including the real gains or losses of copyright holders in each scenario;

  • simulation of future scenarios, using techniques such as game theory, the Delphi method, or Shell scenario planning to estimate the social consequences of different regulatory decisions.

Reglab methodology annex

FORMAT: CONTEXTO STUDY

Title of the Study: Global Mapping of Copyright Laws and Exceptions for Data Mining
Research Question: How do copyright laws around the world currently address exceptions that enable the practice of text and data mining (TDM)?
Methodology Summary: This study is a synthesis of research with an exploratory, comparative, and non-exhaustive character. Data collection relied on documentary analysis techniques, based on Flynn et al. (2022), with data processing conducted through deductive matrix content analysis.
Data Collection: Documentary analysis of legislation from the World Intellectual Property Organization (WIPO) repository. The sample comprised 50 countries, selected as follows: (i) 30 countries with the largest number of internet users (Statista, 2025); and (ii) 20 countries selected through purposive sampling, based on legal diversity, leadership in technological innovation, and explicit references to TDM practices.
Data Analysis: Through content analysis, information was systematized in spreadsheets using standardized codes (e.g., type of copyright exception, legal basis, etc.). Based on these initial codes, countries were classified into three levels of permissiveness (high, medium, and low), defined deductively. This classification was visually represented in a world map, created in the geographic information system (GIS) software Quantum GIS (QGIS) 3.40.4.
Bias Reduction Procedures
  • Method triangulation:
    In addition to research in the WIPO database,
    exploratory searches were conducted using search engines
    and AI tools to find supplementary references,
    such as judicial decisions and academic literature,
    to confirm categorizations.
    Any caveats to classifications were noted in the
    deep dive section.
  • Double validation and cross-validation:
    At least two researchers reviewed the collected data
    and categorizations.
    We thank researcher Isabella Cristina Pereira,
    who kindly provided data from her ongoing research
    for cross-validation, ensuring verification through
    an alternative analytical method.
Other Methodological Limitations
  • Dependence on Open-Access Sources:
    The study relied on public databases, news portals,
    and open-access academic journals,
    which may limit the breadth of analysis.
  • Non-Exhaustive Nature:
    The Contexto series delivers concise,
    exploratory analyses, subject to methodological limitations
    regarding scope and duration.
  • Data Collection Period:
    The study considered only data available up to July 25, 2025.
Software used: MS Office Suite, ChatGPT, Perplexity, WIPO Lex, The Eckert IV.
Ethical Guidelines
  • Research Funding:
    This study is part of the Copyright and Technology Observatory,
    an initiative of Reglab supported by AWS, Google, Meta,
    Microsoft, and YouTube.
    Reglab retained full editorial control and autonomy
    in defining the scope, objectives, and methodology.
    The authors bear sole responsibility for the content
    and analyses presented in this work.
  • Use of Public and Documentary Data:
    The research relied exclusively on public and documentary sources,
    such as national legislation available in the WIPO repository
    and official country documents.
    No personal data was collected,
    nor were interviews with individuals conducted.