Privacy-Enhancing Technologies (PETs) in AI: Potential and Barriers in Brazil

September 30, 2025

Emerging Privacy-Enhancing Technologies (PETs) hold immense potential for protecting personal and sensitive data within Artificial Intelligence systems. However, according to a Reglab study, their implementation in Brazil still faces significant hurdles.

The Study: Privacy in Layers

The research titled “Privacy in Layers: The Role of Privacy-Enhancing Technologies (PETs) in AI Systems” explores how these solutions can mitigate privacy risks. The report highlights that despite the innovative nature of PETs, their adoption in Brazil remains low and fragmented.

Primary Barriers to Adoption in Brazil

Cybersecurity and data protection experts identify three main obstacles to the advancement of these technologies in the country:

High Implementation Costs: The financial burden of deploying advanced PETs.
Lack of Technical Standardization: Absence of uniform protocols and industry standards.
Low Digital and Privacy Literacy: A gap in the specialized knowledge required to manage these tools.

These factors prevent companies and institutions from incorporating solutions that could significantly elevate data protection levels in AI projects.

The Need for Holistic Governance

Reglab’s study concludes that the isolated use of PETs does not guarantee full data protection. To be truly effective, these technologies must be integrated into a robust governance strategy that combines:

Advanced technical resources.
Continuous professional training.
Digital literacy.
An organizational culture oriented toward privacy.

Without this holistic approach, even the most sophisticated solutions may leave security gaps that compromise sensitive data.


Cite

RAMOS, P. H.; NOMURA, D. N. S. Privacidade em Camadas: O Papel de Privacy Enhancing Technologies (PETs) em Sistemas de Inteligência Artificial. Policy Briefs RegLab, n. 3. São Paulo: RegLab, 2025.


Authors

Daniela Naomi Shimabukuro Nomura
Pedro Henrique Ramos

Executive Summary

The adoption of Generative Artificial Intelligence and systems based on personal data has brought new layers of complexity to the privacy debate. In this unprecedented study in Brazil, Reglab investigated how Privacy Enhancing Technologies (PETs) can be applied to mitigate risks in the training and operation of these systems, based on the vision of experts in AI, data protection and cybersecurity.

Qualitative interviews revealed consensus on the central role of PETs in reducing the risk of re-identification, abuse or undue exposure. Although they do not completely eliminate vulnerabilities, these technologies provide additional layers of protection that make data processing more reliable and give greater security to organizations that operate AI solutions.

Among the main findings, the following stand out:

Risk Mitigation: PETs improve the protection of personal data, but are currently unable to eliminate existing privacy risks;
Project feasibility: technologies such as federated learning and trusted execution environments make possible high social value initiatives, such as health research, that would previously have been unfeasible due to privacy risks;
Persistent limitations: experts emphasize that there is no “zero risk” in cybersecurity and that many solutions still lack extensive testing, which limits market confidence and delays investments;
Fragmentation in Brazil: the practical use of PETs in the country is still sporadic and poorly systematized, making gains in scale and integration with broader organizational strategies difficult.

Based on these elements, the study proposes the Layered Privacy Model, an unprecedented framework that organizes data protection into five interconnected levels:

Institutional environment (regulation and external pressures);
Literacy (privacy education and training);
Governance (internal strategic alignment processes);
Information security (robust cybersecurity infrastructure);
PETs (technical risk mitigation tools).

This model makes it possible to visualize organizations' different degrees of maturity and to guide both companies, in self-assessment and in setting investment priorities, and public policymakers, in building regulatory environments that encourage the responsible adoption of privacy-preserving technologies.

The main contribution of this study is to show that PETs only realize their potential when integrated into a broader governance, literacy and security strategy. In isolation, these technologies are not sufficient to guarantee full protection, but together they can consolidate a more robust regulatory and business ecosystem, aligned with the responsible use of data.

Introduction

The importance of artificial intelligence (AI) in society is no longer a subject of debate: it has become a consolidated fact. It is a transformative element in several dimensions: in the economy, with the potential to boost global GDP by up to 7 trillion dollars and increase productivity growth by 1.5% over 10 years (Goldman Sachs, 2023); and in organizational management, where it may reduce the time dedicated to people management by 10% (Edin et al., 2025), to name just a few examples.

The current challenge seems to lie not only in recognizing the benefits of AI, but in developing governance models that keep pace with its accelerated spread and the disruptive processes it triggers. One of the most complex topics is the relationship of this technology with privacy and the protection of personal data.

As different applications become integrated into everyday life and critical decisions in society, questions about how to protect personal data and comply with legislation on the subject have been appearing more and more frequently in headlines, court decisions and discussions in legal forums. On the other hand, there appears to be an equally relevant technological advance in the field of technical information security solutions. It is not uncommon to find a knowledge gulf between these two fields.

This study aims to be a bridge between the field of data protection and advances in privacy-enhancing technologies, also known by the acronym PETs. We want to explore how these technologies can redefine the contours of data governance in AI systems, especially in a scenario where regulatory frameworks seek to keep up with the speed of technological development.

What is AI and why does it matter?

As in any research work, it is necessary to clearly define our object of analysis. In this study, we adopted the following definitions:

AI Systems: software systems that process and analyze data through algorithms and mathematical models, using statistical and computational techniques to identify patterns, make predictions and generate specific results. For teaching purposes, they can be classified into two main types:
Analytical AI Systems: systems designed to solve specific problems from structured data sets, operating within pre-defined parameters. They function as “sophisticated calculators” that perform deterministic tasks such as classification, prediction or recognition of existing patterns (Barr, 2023). Examples include banking fraud detection systems and personalized recommendations from virtual assistants.

Generative AI Systems: systems that employ statistical and machine learning techniques to generate new content, such as texts, images and videos (Barr, 2023). They are based on large language models (LLMs) that transform training data into mathematical representations (vectors) that capture statistical patterns and correlations. Currently, the most popular applications are chatbot models (Stryker; Scapicchio, 2024).

Note: Generative AI systems can also combine Analytical AI features.

Personal data is information related to an identified or identifiable natural person. It can be processed in different ways in AI systems, either as input data entered directly by a user or as part of the data sets analyzed during training. We will explore this relationship in item 2.1 below.

The Methodological Proposal of this Research

Between 2016 and 2024, several countries developed or updated their personal data protection legislation. Most of these laws, such as Brazil's 2018 General Data Protection Law (LGPD), were established before the spread of generative AI applications and, consequently, do not always provide clear guidance for emerging challenges.

In this scenario, it is important to investigate practical knowledge about how the market itself, which moves with a different dynamism and speed than legislators, has developed technological solutions to protect rights in the field of AI, adding layers of security to reduce risks and foster confidence in its adoption.

This research examines the relationship between data protection and PETs in AI systems in Brazil. Our objective is to indicate paths that support the development of good practices and privacy by design guidelines, facilitating the incorporation of privacy from the initial stages of the development and use of AI systems by Brazilian companies.

The study is based on one of Reglab’s main methodological premises: the policy translation approach, still little explored in digital governance in Brazil. It is a methodology that emphasizes the active process of interpreting and adapting complex research findings into formats that are understandable, relevant, and applicable by public policymakers (Ingold; Monaghan, 2016).

When we use colorful charts, graphs, examples and anecdotes, we do so intentionally. We recognize the risk of possible technical inaccuracies, but we understand that translating complex evidence into applied knowledge requires making content clearer and more accessible. This is a necessary methodological choice — and a position that we take with complete transparency.

For our data collection methodology, we chose a different approach from conventional bibliographic reviews and documentary research: in-depth qualitative interviews. Inspired by reception studies methods, we sought to understand how professionals who face technical cybersecurity challenges daily understand the processing of personal data in AI systems, which PET tools are available, how they are effectively implemented in practice and how they contribute to better protecting individuals’ privacy.

Over the course of a month, we carried out 11 interviews with experts, focusing on senior-level professionals with practical experience in cybersecurity and compliance issues in personal data protection. The interviews followed pre-defined scripts and confidentiality protocols, and their transcripts and memos were analyzed in the Atlas.ti software using thematic analysis.

Man, business sector, machine learning engineer
Man, academic sector, cybersecurity professor
Man, academic sector, cybersecurity professor
Man, business sector, systems engineering consultant
Man, business sector, cybersecurity consultant
Man, business sector, cybersecurity researcher
Woman, business sector, privacy and security management consultant
Man, business sector, cybersecurity consultant
Woman, business sector, information security and privacy consultant
Woman, business sector, privacy consultant
Woman, academic sector, data scientist

The complete methodology, with details on the procedures adopted, can be found at the end of the study.

Main Results

How is personal data used in AI models?

FUNDAMENTAL CONCEPTS

In Analytical AI, personal data is generally processed in a structured way, in delimited databases maintained by organizations. This format increases protection obligations, but facilitates the application of the principles of minimization, purpose and legal basis, as the contexts of use are defined in advance.

In Generative AI, large language models (LLMs) are trained on massive volumes of data collected from the internet. This data is fragmented into units called tokens and converted into mathematical representations (vectors), which capture statistical relationships between words and phrases. As a rule, models do not directly store personal databases. However, as they work by statistical patterns, information that is frequently repeated in training can be “remembered” and reproduced in the results, since vectors function as knowledge representations. Models can also generate personal data that does not exist or was not seen in training. In these cases, the output is a statistical inference, not memory. This is what happens, for example, when the model generates combinations of numbers in the format of a valid CPF, even without having stored that data.

Interviews with experts helped to understand how personal data is used and how it is anonymized during processing. However, although knowledge appears uniform when interviewees talk about Analytical AI systems, we observed divergences and knowledge gaps as we delved deeper into the functioning of Generative AI systems and, particularly, LLMs, which received more prominence in the interviews.

Data usage – Analytical AI

In so-called Analytical AI, interviewees highlighted that personal data is treated in a way that is more structured, delimited and directly linked to the purposes of the project.

Structured Data: At some point during the treatment, the information is organized into standardized formats that allow classification or prediction, such as tables in databases or sets of numerical variables.

Delimited Database: When personal data is used, these records are associated with individuals explicitly (e.g. CPF, medical record, account number) or through techniques such as pseudonymization, which replaces direct identifiers with codes.

Defined purpose: The data is processed to solve specific problems, such as analyzing credit risk, detecting fraud in financial transactions or supporting assisted medical diagnoses.
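
The pseudonymization mentioned in the list above can be illustrated with a minimal sketch. It assumes a keyed hash (HMAC-SHA256) as the way of producing the replacement code; the key, the field names and the sample records are hypothetical and do not come from the study, which does not prescribe a specific method.

```python
import hashlib
import hmac

# Hypothetical secret key (assumption): in practice it would be stored
# separately from the data set, since whoever holds it can rebuild the link.
SECRET_KEY = b"example-key-kept-outside-the-dataset"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed, repeatable code."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened only to keep the example readable


# Made-up records; the CPF numbers are well-formed test values, not real people.
records = [
    {"cpf": "123.456.789-09", "credit_score": 712},
    {"cpf": "111.444.777-35", "credit_score": 640},
]

# The direct identifier is dropped and replaced by the code.
pseudonymized = [
    {"person_code": pseudonymize(r["cpf"]), "credit_score": r["credit_score"]}
    for r in records
]
```

Because the same input always yields the same code, records about one person can still be joined across tables, which is useful for analysis but also means the data remains personal data for anyone able to reverse or re-derive the mapping.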

“In practice, these models are mathematical functions that are trying to find patterns in the data. A classic example is if you have structured data in table format, which you use as input and will use a specific function to interpret that data” [Interviewee K]1

In general, respondents highlighted that Analytical AI offers smaller scale and greater precision: the data volume may be smaller, but the quality and relevance of individual information are usually more decisive for the model’s performance.

However, some interviewees highlighted that these models carry greater risks of re-identification and can be even more dangerous for privacy than LLMs [Interviewee B].

This happens because the data remains in structured databases and, even if pseudonymized, can be easily reconnected to individuals when cross-referenced with other sources. This operation is different from LLMs, which tend to dilute patterns in large-scale statistical representations, as we will see later.
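
A minimal sketch of that cross-referencing risk follows. It assumes that the pseudonymized table still carries quasi-identifiers (here birth year and postal code) and that an auxiliary list with names is available; both tables, the field names and the `link` helper are invented for illustration.

```python
# Pseudonymized records that still carry quasi-identifiers (all values made up).
pseudonymized = [
    {"code": "a91f", "birth_year": 1984, "zip": "01310", "diagnosis": "asthma"},
    {"code": "c07b", "birth_year": 1990, "zip": "04538", "diagnosis": "diabetes"},
]

# A second, hypothetical source that includes names.
auxiliary = [
    {"name": "Person X", "birth_year": 1984, "zip": "01310"},
    {"name": "Person Y", "birth_year": 1975, "zip": "22041"},
]


def link(records, other, keys=("birth_year", "zip")):
    """Return (name, record) pairs where the quasi-identifiers match exactly one person."""
    matches = []
    for rec in records:
        candidates = [o for o in other if all(o[k] == rec[k] for k in keys)]
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0]["name"], rec))
    return matches


reidentified = link(pseudonymized, auxiliary)
```

In this toy case one record is re-identified even though no direct identifier was ever shared, which is the scenario the interviewees describe for structured databases.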

Data Usage – Generative AI and the role of LLMs

Whether explicitly or implicitly, different interviewees explained the use of personal data in Generative AI models based on two phases:

Training: moment when large volumes of data, potentially containing personal information, are collected and transformed into tokens to generate statistical representations. The risk lies in the possible memorization of recurring passages, which may reappear in the answers.
Inference: stage in which the model generates outputs based on new user prompts. Here there is no direct access to the original database, but rather to statistical recombinations. Still, personal information can emerge both through memorization and through plausible inferences produced by the model’s pattern.

DIDACTIC EXPLANATION

Training: it’s like when someone studies for a test by reading dozens of books and “learns” patterns based on the knowledge acquired. If information appears repeatedly in this material (e.g. the periodic table), the model will end up “memorizing” and reproducing this information later.
Inference: it is like using this knowledge in a closed-book test: the model does not search the original books, but combines statistics to create the answer. Thus, it can either repeat something memorized or invent something new that seems true.

1 In order to preserve the anonymity and confidentiality of research participants, specific changes were made to the quotes presented in this study. In certain circumstances, specific linguistic adaptations were made where possible, respecting the established methodological principles.

In training, interviewees explained that LLMs operate through massive data analysis, converting texts into numerical representations. These systems function neither as banks of complete sentences nor as repositories of raw personal data: what they internalize are statistical language patterns, that is, the frequencies and relationships between words and expressions (Stryker, 2025).

In practice, this means that the model prioritizes statistical correlations over individual records. Just as someone who reads a thousand resumes doesn’t remember all the names, but notices career patterns (for example, that people with a degree in Administration tend to work in private companies), an LLM absorbs trends in word usage.
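
A deliberately simplified sketch can make the idea of "frequencies and relationships between words" concrete. It is not how an LLM is built: real models use subword tokenizers and learned vectors, whereas this toy example assumes whitespace tokens and plain next-word counts. The corpus and the helper names are invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus standing in for training text.
corpus = [
    "the patient has a fever",
    "the patient has a cough",
    "the patient has a fever",
    "maria silva has a fever",
]


def tokenize(text):
    """Whitespace tokenization (an assumption; real tokenizers split into subwords)."""
    return text.lower().split()


# For each token, count which tokens follow it.
follow_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = tokenize(sentence)
    for current, nxt in zip(tokens, tokens[1:]):
        follow_counts[current][nxt] += 1


def most_likely_next(token):
    """Return the continuation seen most often after `token`."""
    return follow_counts[token].most_common(1)[0][0]
```

What the structure keeps is counts, not sentences: `most_likely_next("a")` returns "fever" because that continuation is the most frequent, while a name that appears once contributes a single, weak count. Repetition is what makes a passage likely to resurface.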

However, there are situations in which specific information can be reproduced, especially when it is very frequent in the training material (Kandpal; Wallace; Raffel, 2022):

“The model only knows who Harry Potter is because there are tens of thousands of pages on the internet mentioning the character. This is related to what we call statistical relevance. Whether or not to train the model with the original Harry Potter books is practically irrelevant, as these passages are already replicated in thousands of books and other websites that were used during the training” [Interviewee D]

There are also points of attention at inference. During inference, personal data may be processed transiently, especially when users enter personal data into prompts or share documents. This information is temporarily processed by the model to generate a response but does not become a permanent part of the base model, although it may remain in short-term memory or be used to personalize the user’s own profile.

“People have this feeling that all AI is learning all the time, right? And that’s not true. In practice, it isn’t learning anything, especially in these more sophisticated, approved environments, where every interaction we have is practically disposable as far as anything beyond the user is concerned; it stays there within that user” [Interviewee H]

Another recurring point in the interviews was the ability of Generative AI to produce personal data in its output, even if these records did not appear literally in the training. Interviewees described this process as an “invention” of the model, a result of its statistical ability to combine patterns in a believable way. Examples include creating valid CPF sequences (following the 11-digit rule with check digits), or common name combinations, such as “Ana Maria”.
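
To show why a "valid" CPF can be produced without any stored record, the sketch below implements the publicly known modulo-11 check-digit rule. The function names are illustrative; passing the check only means a number is well-formed, not that it belongs to a real person.

```python
def cpf_check_digits(base9: str) -> str:
    """Compute the two check digits for the first nine digits of a CPF."""
    digits = [int(c) for c in base9]
    for _ in range(2):
        # Weights run from len(digits) + 1 down to 2 (10..2, then 11..2).
        weights = range(len(digits) + 1, 1, -1)
        total = sum(d * w for d, w in zip(digits, weights))
        remainder = (total * 10) % 11
        digits.append(0 if remainder == 10 else remainder)
    return "".join(str(d) for d in digits[-2:])


def is_well_formed_cpf(cpf: str) -> bool:
    """Check length and check digits; says nothing about whether the CPF was issued."""
    digits = "".join(c for c in cpf if c.isdigit())
    if len(digits) != 11 or len(set(digits)) == 1:  # reject e.g. 111.111.111-11
        return False
    return cpf_check_digits(digits[:9]) == digits[9:]
```

Any nine digits can be completed into a well-formed number this way, so a model that has absorbed the pattern can emit plausible CPFs that were never in its training data, and some of them may coincide with numbers held by real people.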

This point is relevant because it shows that, even without functioning as personal databases, generative models can produce identifiable information, which brings practical challenges to the interpretation and application of data protection laws.

Data Anonymization in Generative AI

The most debated issue among interviewees was the anonymization of data in Generative AI models. The main emerging point is the difficulty in guaranteeing the total loss of the identifiable nature of personal data, especially in the most advanced models.

There is a convergence among experts: the data is not stored as raw records, but transformed into statistical vectors. In this process, names, numbers and phrases are converted into tokens and mathematical weights that come to represent language patterns. This characteristic significantly changes the debate on data protection, shifting the focus from literal collection and storage to risks of re-identification and statistical use of information.

At this point, many of the experts, people with seniority and experience in their areas of expertise, were cautious in their responses and, in some cases, even admitted that they did not know how the process works. Their explanations did not show the same confidence they displayed when explaining other aspects of how the technology operates.

“It’s not that I transform it into numbers, it’s not a “pseudonymization”. It simply dissolves into tokens […] it doesn’t have a database, what it has is a piece of word” [Interviewee D]

“I read an article, I was reading an article about this, when I extract data from one AI model to another, I can re-identify people. And then the person takes that and throws it into another model, and another model, and then I can reach people. […] Yes, there is a risk of re-identification, even if I apply encryption, anonymization, pseudonymization, there is. If I start crossing there, I can do it.” [Interviewee I]

“I don’t believe it will lose the identification. You come back and it will identify you eventually. It’s really crazy because I can’t say. It says it doesn’t have a memory, but it does. I can’t explain it.” [Interviewee E]

In any case, the debate revealed a common conclusion: it is necessary to increase literacy and public knowledge about anonymization in Generative AI. The disagreements between experts indicate that this is not just a technical issue, but also one of interpretation and risk management.

What are PETs

FUNDAMENTAL CONCEPTS

Definition: PETs stands for Privacy-Enhancing Technologies. These are software solutions developed to reduce privacy risks and improve cybersecurity.
Little Known and Used Term: Although it is popular in the legal data protection community, the majority of respondents agree that the term “PETs” is little known or not widely used in Brazil. Standardization would help to consolidate the area and create clear references on which mechanisms are effective and in which scenarios.
Importance of Raising Awareness: Respondents consistently point to the lack of knowledge and low technical literacy about PETs as one of the main challenges for their adoption in Brazil. Without knowledge, companies are unlikely to invest in PETs, especially because the financial and computational costs of these technologies are high.

PETs is an umbrella term that brings together different technologies and techniques aimed at reducing privacy and cybersecurity risks. Its objective is to enable the responsible use of data, making possible projects that, without these tools, could be considered unfeasible or excessively risky from a regulatory and personal data protection point of view.

Despite the term being recognized internationally and among legal privacy experts, interviewees pointed out that it still has little penetration in Brazil. Many argued that the term should receive greater visibility in the national debate, harmonizing with the importance it already has in international forums – which reveals a disparity between the global discussion and the application of the concept in the Brazilian context.

WHY DO PETS APPEAR MORE IN THE LEGAL FIELD THAN AMONG CYBERSECURITY SPECIALISTS? A possible explanation is that the term Privacy Enhancing Technologies initially emerged from reports from personal data protection authorities in Canada and the Netherlands in the 1990s, and then from a famous report from the European Commission in 2007 (NicFab, 2023). From this, the acronym PETs was consolidated in guides and standards as an umbrella category to describe practices of anonymization, pseudonymization and privacy by design, where it fulfills the function of translating different techniques into a single normative language. In the technical field, it seems that professionals prefer to name specific methods, often considering umbrella labels as too vague, as they do not clarify which measures were applied, under what conditions and with what guarantees.

Despite the low recognition of the term PETs, interviewees demonstrated familiarity with specific technologies such as differential privacy, federated learning and homomorphic encryption when these were mentioned concretely. Several accounts indicated that professionals tend to be guided more by the name of the manufacturer or trademark of privacy solutions than by the technical term for the technology.

This seems to highlight how the debate about PETs in Brazil is closer to the market and specific products, and not based on a standardized or conceptual language.

Interviewees repeatedly pointed out that knowledge, uniformity and standardization around the term PETs are crucial for its adoption in Brazil.

They emphasized that low technical literacy and lack of awareness on the topic compromise both investment decisions and the consolidation of the area, and that the lack of standardization also prevents the creation of clear references on the effectiveness of each mechanism in different scenarios.

The high financial and computational cost of these technologies, combined with the lack of familiarity, makes it difficult to justify investments. This gap impacts both public and private financing: as several interviewees highlighted, Brazil lacks specific lines of support for PET research, which compromises the development and viability of these solutions.

“To be quite honest: I know the term, but I’ve never seen it used, whether in academia or even in industry (…) I think the term PETs may be appropriate because we need to start building this knowledge, there is a lot of work to be done in the area, and the officialization of terminology would be an interesting thing” [Interviewee B]

“People will talk more about a tool, a solution, or a technique, but they say that I still lack literacy in this sense” [Interviewee I]

In other words, to assess risks and potential in a more concrete way, it is necessary to go beyond the generic label and observe the specific technologies that make up this universe. It is these tools, with their limits and practical applications, that make it possible to actually measure the usefulness of PETs in AI projects.

The following technologies are organized according to the frequency of mention in interviews and the level of knowledge demonstrated by the interviewees about each of them.

Differential Privacy

Definition: Technique that introduces statistical noise into data or results, so that the presence or absence of an individual is practically indistinguishable, preserving aggregate utility.
Didactic Example: It’s like mixing a little noise into several voices in a crowd: you understand what the group is saying in general, but you can’t identify a single person.
Practical Example: Cell phone manufacturers apply differential privacy in operating systems to collect usage statistics without exposing individual users.
Relevance: Provides formal mathematical guarantees of privacy, making it possible to train models or analyze data without revealing raw records.
Challenge: Noise calibration can reduce model accuracy and make training more costly in terms of time and resources.

Differential privacy is a technique that protects against re-identification risks on large datasets (RTT, 2024). It works by adding controlled random noise to information, which significantly reduces the possibility of associating data with specific individuals (CIPL, 2025). This way, it is still possible to extract patterns and make useful inferences from the data, while maintaining the privacy of the people involved.
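
One common way to add such controlled noise is the Laplace mechanism applied to a counting query, sketched below. The study does not name a specific mechanism, so this choice is an assumption; `epsilon` (the privacy parameter), the sample ages and the function names are illustrative.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Noisy count of items satisfying `predicate`.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up ages; the query asks how many people are 40 or older.
ages = [34, 51, 29, 62, 45, 38, 70, 27]
rng = random.Random(0)
noisy_answer = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
```

A smaller `epsilon` means a larger noise scale and stronger protection, at the cost of a less accurate answer. This is the privacy and utility trade-off in its simplest form.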

In the interviews, differential privacy was characterized as a complex but well-established method of anonymization, capable of strengthening protection in data-intensive contexts. As Interviewee H summarized:

“in differential privacy, we do not lose the capacity of AI to learn from that data, but we make it difficult to re-identify that individual, because I am manipulating the source data without it losing its essence” [Interviewee H]

In this sense, differential privacy has gained prominence in sectors that rely heavily on data, such as healthcare (Feretzakis et al., 2024). In a concrete project shared by Interviewee I, the introduction of controlled noise in sensitive databases, such as medical records, allowed researchers and developers to train AI models capable of identifying relevant clinical patterns without exposing individual patient information.

Despite its high potential for anonymization, interviewees also warned about the trade-off between privacy protection and utility: the technique can reduce the accuracy of AI models and diminish the analytical value of data, a concern confirmed by recent research (CIPL, 2025).

Experts indicated that differential privacy works best in Analytical AI with tabular data (age, medical records), where controlled noise reduces re-identification risks without significantly compromising the usefulness of the data.

In Generative AI, the addition of noise can end up being interpreted as a statistical signal, causing the model to learn this “artificial pattern” instead of the original data, reducing the effectiveness of training and compromising its practical usefulness.

Trusted Execution Environment

Definition: Creates an isolated environment (enclave) within the hardware, where data is processed in a way that is protected against unauthorized access, including from the infrastructure provider itself.
Didactic Example: It is a kind of “digital safe” in which data can be used for calculations, but no one can spy on what happens inside.
Practical Example: Cloud services that allow processing of health data in TEEs, so that not even the cloud company’s engineers can access the raw information.
Relevance: It differs from techniques such as anonymization because it does not alter the data, but ensures that it is only processed in a controlled, high-security environment. It is critical for sectors such as healthcare or finance.
Challenge: Most TEE solutions are offered by large foreign technology companies, raising concerns about data sovereignty and technology dependency.

Trusted Execution Environments (TEE) are isolated areas within a computational system that allow processing data with a high degree of security (CIPL, 2025). Unlike other PETs that act directly on data, through techniques such as anonymization or encryption, TEEs protect the environment in which this data is processed. As Interviewee B explained: “the other PETs work at the data layer […]. The secure execution environment does not operate on the data itself, but on the environment in which it will be processed”.

This architecture is especially useful in applications involving sensitive data. By processing sensitive information within a TEE, it is ensured that it remains protected even in contexts where the rest of the system may be vulnerable.

Despite its advantages, interviewees also expressed concerns regarding data sovereignty. As the majority of TEEs available today belong to foreign companies, they stressed the importance of developing protected national data centers capable of offering reliable execution environments without the infrastructure provider itself having access to the processed information. Interviewee D explains that even if a cloud instance is located in Brazil, if the infrastructure belongs to a company in the United States, the CLOUD Act can allow data belonging to Brazilian data subjects to be accessed remotely from other jurisdictions (Teofilo, Rocillo, 2018).

Synthetic Data

Definition: Artificially generated data to simulate real information, used to expand samples, balance data sets and reduce the use of raw personal data in training. Didactic Example: The system “invents” fictitious customer records or creates fake photos, based on real information, so that the computer “learns” without needing to access information from real people. Practical Example: Create artificial faces to train a facial recognition system without using real photos of citizens. Relevance: They expand the training base, increase the robustness of models and allow you to replace columns of personal information with synthetic versions, preserving the structure of the data. Challenge: They can reinforce stereotypes or generate false information, producing less reliable models when used exclusively for training.

Synthetic data is artificial information created by algorithms that reproduce the statistical properties of real databases, without copying true records (Microsoft, 2025). This technique allows you to train and test AI models without exposing personal data, replacing them with fictional versions, in whole or in part. In addition to reducing re-identification risks and mitigating the impacts of breaches, synthetic data is especially valuable in industries with strict regulatory compliance and data scarcity, such as healthcare (IBM, 2023).
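As a rough illustration of the idea described above, the sketch below fits very simple statistics to a handful of records and then samples new, fictional records from them. Everything in it is an assumption made for illustration: the field names (`age`, `city`), the sample records and the function names are hypothetical, and real generators are far more sophisticated. Note that the generator is still fitted on real data, each column is sampled independently (so relationships between columns are lost), and the approach offers no formal privacy guarantee.

```python
import random
import statistics
from collections import Counter
from typing import Dict, List

Record = Dict[str, object]


def fit_simple_model(records: List[Record]) -> Dict[str, object]:
    """Summarize the real data: mean and spread of ages, frequency of each city."""
    ages = [r["age"] for r in records]
    city_counts = Counter(r["city"] for r in records)
    return {
        "age_mean": statistics.mean(ages),
        "age_stdev": statistics.stdev(ages),
        "cities": list(city_counts.keys()),
        "city_weights": list(city_counts.values()),
    }


def generate_synthetic(model: Dict[str, object], n: int, seed: int = 0) -> List[Record]:
    """Sample n fictional records from the fitted statistics (columns independent)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        # Draw an age from a normal distribution and clip it at zero.
        age = max(0, round(rng.gauss(model["age_mean"], model["age_stdev"])))
        # Draw a city in proportion to how often it appeared in the real data.
        city = rng.choices(model["cities"], weights=model["city_weights"], k=1)[0]
        synthetic.append({"age": age, "city": city})
    return synthetic


# Hypothetical "real" records, invented for this illustration.
real_records = [
    {"age": 34, "city": "São Paulo"},
    {"age": 45, "city": "Recife"},
    {"age": 29, "city": "São Paulo"},
    {"age": 52, "city": "Manaus"},
    {"age": 41, "city": "Recife"},
    {"age": 38, "city": "São Paulo"},
]

model = fit_simple_model(real_records)
synthetic_records = generate_synthetic(model, n=1000)
```

The synthetic set reproduces the average age and the city proportions of the original sample, while each generated row is sampled from the summary statistics rather than copied from a real record.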

In the interviews, Interviewee A highlighted the relevance of this technology for training image models, whether by creating variations of existing photos (data augmentation) or 3D models (digital twins) for simulations, increasing the robustness of the systems. On the other hand, Interviewee I warned about the drop in accuracy in models trained exclusively with synthetic data.

“And it’s no use just generating synthetic images with AI, because it has already been proven that a model trained with synthetic data generated by other models, its accuracy drops, absurdly. It is our diversity, for example, that will provide richness for a facial detection system, which uses computer vision” – [Interviewee I]

Despite its potential, it is important to recognize that synthetic data generation often relies on personal databases to train the algorithms that produce it. As Interviewee D noted, the direct use of real data, when protected by robust governance, anonymization and access control mechanisms, can offer greater reliability and reduce the risk of distortions or “hallucinations” in the models. This view aligns with recent academic analyses (Giuffrè; Shung, 2023).

Therefore, synthetic data should be seen as complementary instruments, not as complete substitutes in protection and innovation strategies.

Federated Learning

Definition: Technique in which data remains on users’ devices (cell phones, laptops), and only mathematical representations are sent to a central server to update the model. Didactic Example: Imagine a team in which each athlete practices in their own gym and sends only the results of their training. The coach uses these results to improve the team’s strategy, without ever seeing each athlete’s complete training sessions. Practical Example: A hospital can train a model in several different medical centers without needing to receive medical records from each of them. Each unit trains locally and shares only statistical results. Relevance: Reduces the need to centralize large volumes of personal data, increasing privacy and security. Challenge: The technique still has limited use outside of sectors that rely heavily on data (such as healthcare and technology) and demands high computational power on devices at each end.

Federated learning is a technique that allows you to train machine learning models collaboratively, keeping data in its original sources. Instead of transferring data from peripheral devices to a central server, each participant uses its own local information to train the model (Caballar; Stryker, 2025). After each training cycle, only the updated parameters, not the raw data, are sent to a central server, which consolidates them to improve the global model (Caballar; Stryker, 2025).
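The training cycle described above can be sketched in a few lines. This is a minimal illustration, not the method of any specific project mentioned in the study: it assumes a one-parameter model `y = w * x`, invented datasets for three hypothetical institutions, and a weighted average of parameters in the style of federated averaging (the source does not specify an aggregation rule). All function names are hypothetical.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the client


def local_update(w: float, data: Dataset, lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few gradient-descent steps for the model y = w * x on local data only."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_w: float, clients: List[Dataset]) -> float:
    """One cycle: clients train locally, the server averages their parameters."""
    total = sum(len(d) for d in clients)
    # Only (parameter, sample count) pairs reach the server, never the records.
    updates = [(local_update(global_w, d), len(d)) for d in clients]
    return sum(w * n for w, n in updates) / total


def train(clients: List[Dataset], rounds: int = 50) -> float:
    """Repeat the local-train / aggregate cycle for a fixed number of rounds."""
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, clients)
    return w


# Hypothetical data: three institutions whose records all follow y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]
final_w = train(clients)
```

After the rounds complete, `final_w` approaches 3, the value shared by all three datasets, even though the server only ever handled parameters. Sharing parameters alone is not a complete privacy guarantee, which is one reason the technique is often discussed alongside other PETs.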

The technology enables collaboration between organizations that could not share sensitive data for privacy and security reasons. Interviewee D mentioned a fraud detection project in the healthcare sector as an example. In this case, companies needed to train a joint model, but refused to share their databases and did not trust an intermediary to guarantee privacy. The solution was to process the data locally at each institution and, in the end, combine only the training parameters. This arrangement overcame trust barriers and demonstrated how federated learning can create joint solutions even in contexts of high sensitivity and little willingness to share information.

However, due to the need for frequent communication between devices and the central server to update model parameters, respondents highlighted that federated learning requires high processing power and connectivity (CIPL, 2025). For this reason, as confirmed by Interviewees A and B, its practical application is more viable for companies that have data at the core of their business, such as technology companies, and that already have the necessary infrastructure to maintain this continuous flow of information.

Homomorphic Encryption

Definition: Technique that allows you to perform calculations directly on encrypted data, without having to reveal the original content. Didactic Example: Imagine doing math with numbers inside locked safes: you never see the numbers, but you can add and multiply without opening the safe. Practical Example: A bank could analyze customer balances to predict credit risks without ever accessing the real values of the accounts. Relevance: It is considered one of the strongest protection solutions, as it keeps the data encrypted from the beginning to the end of the processing. Challenge: It is a solution still under scientific development. The supported operations are still basic, insufficient to train advanced models or perform complex calculations. Furthermore, the computational cost is still very high, making large-scale use impractical.

Homomorphic encryption is considered one of the most active and challenging areas in information security research. Its theoretical proposal dates back to the 1970s, and the first proof of concept was proposed in 2009 (Gentry, 2009). Since then, the scientific literature has advanced more efficient schemes, but the consensus is that the technology is still in the experimental phase, with practical applications limited to low-complexity scenarios or controlled prototypes.

Typically, data is encrypted during storage or transmission. However, for operations such as updates, searches, analysis or calculations, it is generally necessary to decrypt it first, which exposes it to possible unauthorized access. Homomorphic encryption offers a more secure solution: it allows you to perform computational operations directly on encrypted data, without the need to reveal it during processing (CIPL, 2025).

In other words, this technique allows systems to perform calculations or analyses with protected data, maintaining its confidentiality at all stages. Therefore, homomorphic encryption has been identified as a promising tool for applications in sectors that deal with sensitive data, such as healthcare, finance and electoral processes (Ruiz, 2022).
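To make the idea of computing on encrypted data concrete, the sketch below implements a toy version of the Paillier cryptosystem, chosen here only because it is short. It is additively homomorphic (encrypted values can be added), not a fully homomorphic scheme of the kind Gentry proposed. The primes are deliberately tiny and insecure, and the function names and account balances are hypothetical.

```python
import math
import secrets
from typing import Tuple


def generate_keys(p: int, q: int) -> Tuple[int, Tuple[int, int]]:
    """Build a toy Paillier key pair from two primes (tiny here, hence insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt m under public key n, using generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # fresh randomness for every ciphertext
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, private_key: Tuple[int, int], c: int) -> int:
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts yields an encryption of the sum of the plaintexts."""
    return (c1 * c2) % (n * n)


# Hypothetical example: a server adds two balances it cannot read.
n, private_key = generate_keys(1009, 1013)
encrypted_a = encrypt(n, 1200)
encrypted_b = encrypt(n, 345)
encrypted_total = add_encrypted(n, encrypted_a, encrypted_b)
total = decrypt(n, private_key, encrypted_total)  # only the key holder can do this
```

The party running `add_encrypted` never sees 1200 or 345; only the holder of the private key can decrypt the combined result. Even in this toy, every operation is a modular exponentiation over large integers, which hints at why interviewees pointed to computational cost as the main obstacle.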

Inference on already trained models emerges as the most viable use to date, allowing organizations to carry out queries or forecasts based on encrypted data, without risk of exposure and maintaining the accuracy of results. However, in interviews, experts highlighted that homomorphic encryption has not yet reached full technical feasibility and requires additional research to become applicable on a large scale. The main obstacle is the high computational cost required for its operation. Interviewee B even mentioned that “in homomorphic encryption, in particular, the cost can be up to a thousand percent higher”.

PETs and Privacy Risks

There was consensus among interviewees that PETs have a central role in mitigating risks related to the use of personal data. Experts pointed out that these technologies significantly reduce the chances of re-identification, abuse or undue exposure. It was highlighted that PETs offer additional layers of security, capable of making data processing more reliable and providing greater security to organizations that use AI solutions.

By indicating that these technologies significantly reduce the risks associated with the use of data, experts also suggest that their adoption can make processing more proportional and balanced. In AI projects, this means bringing practice closer to the ideal of collecting only what is necessary and employing safeguards that demonstrate commitment to the responsible use of information.

“[PETs] are very, very important because they mitigate more risks and bring more security to the company that has the AI solution with personal data there, especially because, as the company controls the data, it needs to observe these legal requirements”. [Interviewee J]

Furthermore, PETs make it possible to carry out projects that would otherwise be unfeasible due to high privacy risks. Examples include using federated learning to train AI models with health data from different institutions and the secure processing of personal information in TEEs, enabling collaboration without unduly exposing data.

It is important to highlight that experts recognized that PETs do not completely eliminate risks – “there is never 100% elimination of risks, this is a primitive of cybersecurity and information security”, stated Interviewee B. Still, the perception is that the potential of these technologies remains underutilized: although they offer relevant gains in data protection and security, many solutions still lack broad and consistent testing, which reduces market confidence and makes new investments difficult.

The Layered Privacy Model results from qualitative empirical research and thematic analysis, based on interviews with experts, and organizes a progressive data protection framework in a replicable way. It does not start from abstract normative assumptions, but emerges from the systematization of reported practices, structuring layers that integrate the institutional environment, literacy, governance, information security and PETs.

Analysis and Comments

The research revealed a consistent set of practices, perceptions and tensions among the experts interviewed. Although their perspectives varied on specific aspects, they converged on a common point: PETs play an important role in mitigating risks, but their practical use in Brazil is still fragmented and poorly systematized.

This perception reinforces the need to organize the different elements that support data protection in AI projects, from governance and information security to the adoption of PETs themselves, in order to build a clearer picture of how these technologies can be effectively applied in the Brazilian context.

This convergence allowed us to systematize the findings in the Layered Privacy Model, which organizes and gives structure to the patterns identified throughout the interviews. It is an exploratory model, which does not derive from normative assumptions established a priori, but emerges directly from the thematic analysis of the empirical material collected.

Its value lies in its ability to organize elements already present in professional practice, offering a structured representation that can guide both companies and public policy makers.

In academic literature, models are understood as simplified and functional representations of reality (Bhattacherjee, 2025; Martins, 2005). Unlike theories, models do not seek complete explanations of phenomena, but capture specific relationships between variables to guide analysis and predictions.

Public policy makers also frequently use analysis models to structure evaluation processes, and it is within this perspective that the proposal developed here is inserted: a practical tool to organize a field still characterized by conceptual and methodological dispersion.

The Layered Privacy Model is a conceptual framework aimed at guiding the adoption of PETs in AI projects. It organizes data protection into five interconnected levels – institutional environment, literacy, governance, information security and the PETs themselves. The logic is progressive: each layer creates the conditions for privacy technologies to be applied consistently, reducing risks and strengthening responsible practices.

The Layered Privacy Model results from this study’s own qualitative empirical research: its layers, which emerge from the interviews, are interdependent:

Institutional environment: comprises the regulatory context and external pressures that shape decisions about the use of personal data;

“When we analyze projects in organizations that operate in highly regulated sectors with greater concerns regarding privacy, it is more common to see the use of this type of protection technology.” [Interviewee E]

Literacy: covers the level of technical and organizational knowledge, frequently identified by interviewees as essential for the full adoption of advanced protection techniques;

“The biggest challenge is related to ethics, governance and education about the use of these technologies: to what extent are they really useful and to what extent is human discernment still necessary? To deal with this, it is not enough to have tools; it is necessary to train, educate and empower people.” [Interviewee I]

Governance: refers to internal strategic coordination and alignment processes, necessary to avoid fragmented or disjointed technological implementations;

“Governance involves ensuring that, if there is data such as CPF, it will be anonymized in a hash, and can only be decrypted after a request goes through a legal team, a compliance team, some type of governance.” [Interviewee A]

Information security: constitutes a transversal basis, since the protection of personal data is difficult to sustain without robust cybersecurity practices;

“There is concern about the infrastructure of these environments, linked to traditional information security, which remains relevant to prevent attacks by malicious actors.” [Interviewee E]

PETs: make up the technological core itself, offering concrete instruments for mitigating privacy risks. Without the previous foundations, the effectiveness of PETs is difficult to realize.

“More elaborate PETs mitigate more risk than more traditional technologies used in industry” [Interviewee B]

Although exploratory, the model allows the structuring of a practical framework for companies that develop and use AI solutions. Its structure in progressive layers makes it easier to visualize that organizations can be at different levels of maturity, with advances obtained through the articulated strengthening of these dimensions.

In practice, the model can be used to:

Organizational self-assessment: mapping at which layer the company is most solid (e.g., robust information security policies) and where there are weaknesses (e.g., lack of training in PETs).

Defining investment priorities: aligning resources to critical areas, such as creating privacy literacy programs before adopting advanced technologies.

Regulatory and institutional planning: assessing whether the regulatory environment and contracts allow for the safe use of PETs (e.g., data sharing clauses).

Progressive implementation of PETs: starting with more accessible solutions (e.g., anonymization and pseudonymization) and moving towards more sophisticated techniques (e.g., homomorphic encryption, federated learning).

Integration with corporate governance: incorporating privacy maturity metrics into compliance or risk audit reports.
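As an illustration of the more accessible end of that progression, the sketch below pseudonymizes a CPF number with a keyed hash (HMAC-SHA256) from the Python standard library. It is a minimal sketch under stated assumptions: the function name, the key and the CPF value are hypothetical, and in practice the key would be held under the governance controls discussed earlier. A keyed hash is one-way, so this is pseudonymization rather than anonymization, and any controlled re-identification would depend on a separately guarded lookup table, which is not shown.

```python
import hashlib
import hmac


def pseudonymize_cpf(cpf: str, secret_key: bytes) -> str:
    """Replace a CPF with a stable token derived from a secret key."""
    # Normalize formatting so "123.456.789-09" and "12345678909" match.
    digits = "".join(ch for ch in cpf if ch.isdigit())
    return hmac.new(secret_key, digits.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key and CPF, for illustration only.
key = b"replace-with-a-key-from-a-secrets-manager"
token = pseudonymize_cpf("123.456.789-09", key)
same_token = pseudonymize_cpf("12345678909", key)
other_token = pseudonymize_cpf("123.456.789-09", b"a-different-key")
```

Because the same CPF always maps to the same token under a given key, records can still be joined across tables without exposing the identifier, while anyone without the key cannot recompute tokens from known CPF numbers.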

Conclusion

The advancement of AI and the intensive use of data present new challenges for protecting the privacy of data subjects. In this scenario, PETs emerge as essential tools to mitigate risks, offering technical solutions that reduce the exposure of personal data at various stages of the information life cycle.

The findings of this study reveal, however, that these technologies are not sufficient when applied alone. The effective implementation of PETs must integrate a broader data governance perspective, which includes ongoing training, organizational awareness and privacy literacy. Without these complementary elements, even the most sophisticated solutions remain vulnerable.

It is important to highlight that this study did not conduct a technical or operational analysis of the individual effectiveness of PETs. Our objective was to bring attention to these technologies by mapping experts’ perspectives and highlighting their role as instruments for mitigating privacy risks in the Brazilian context. By introducing the topic into the public debate, we also seek to encourage greater engagement of companies, regulators and society, promoting investments and contributing to the maturity of these solutions in the country.

Finally, protecting privacy in AI systems requires more than innovative technologies – it demands a robust institutional environment based on effective governance, clear standards, good practices and investment in training. Only this combination will allow PETs to realize their potential to strengthen trust in the responsible use of data and consolidate a regulatory ecosystem prepared for the challenges of AI.

Suggestions for future studies

This study presented the topic of PETs and analyzed their potential application in data protection in AI systems. Still, several questions remain open. Below, we highlight gaps that can guide future research and contribute to the continuity of the regulatory debate, expanding knowledge about these technologies and their impacts in the Brazilian context.

Language standardization and effects on adoption: The lack of uniform terminology for PETs in Brazil was a recurring theme in the interviews. Future studies can investigate how regulatory entities, sectoral associations and standardization bodies can disseminate clear nomenclatures and encourage the adoption of these technologies.

Practical application of the model: There are gaps regarding the functioning and technical operation of PETs in real scenarios. Future research, developed with the support of STEM experts, can map use cases in Brazil, testing costs, technical barriers and regulatory impacts resulting from the application of these technologies.

Political economy of adoption: The high cost of PETs and the concentration of expertise in large international companies create asymmetries. New studies can explore incentives, financing mechanisms and the role of the State in democratizing their adoption.

Effects on the debate on legitimate interest: PETs can influence the legal interpretation of legitimate interest as a legal basis for data processing. Research can examine whether the use of these technologies strengthens arguments of proportionality and necessity in Brazilian regulatory contexts.

References

CABALLAR, Rina Diane; STRYKER, Cole. What is federated learning? 2025. Available at: https://www.ibm.com/br-pt/think/topics/federated-learning. Accessed on: 28 Jul. 2025.

CENTER FOR INFORMATION POLICY LEADERSHIP (CIPL). Privacy-Enhancing and Privacy-Preserving Technologies in AI: Enabling Data Use and Operationalizing Privacy by Design and Default. 2025. Available at: https://www.informationpolicycentre.com/uploads/5/7/1/0/57104281/cipl_pets_and_ppts_in_ai_mar25.pdf. Accessed on: 10 Sep. 2025.

EDIN, Per et al. Quantifying the GenAI opportunity: Lessons learned from benchmarking 17 million+ companies worldwide. 2025. Available at: https://kpmg.com/kpmg-us/content/dam/kpmg/pdf/2025/quantifying-genai-opportunity.pdf. Accessed on: 04 Sep. 2025.

FERETZAKIS, Georgios; PAPASPYRIDIS, Konstantinos; GKOULALAS-DIVANIS, Aris; VERYKIOS, Vassilios S. Privacy-Preserving Techniques in Generative AI and Large Language Models: a narrative review. Information, [S.L.], v. 15, no. 11, p. 697, 4 Nov. 2024. MDPI AG. http://dx.doi.org/10.3390/info15110697. Available at: https://www.mdpi.com/2078-2489/15/11/697. Accessed on: 19 Sep. 2025.

GENTRY, Craig. Fully homomorphic encryption using ideal lattices. Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, [S.L.], p. 169-178, 31 May 2009. ACM. http://dx.doi.org/10.1145/1536414.1536440. Available at: https://dl.acm.org/doi/10.1145/1536414.1536440. Accessed on: 17 Sep. 2025.

GIUFFRÈ, Mauro; SHUNG, Dennis L. Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. Npj Digital Medicine, [S.L.], v. 6, no. 1, p. 1-8, 9 Oct. 2023. Springer Science and Business Media LLC. http://dx.doi.org/10.1038/s41746-023-00927-3. Available at: https://www.nature.com/articles/s41746-023-00927-3#citeas. Accessed on: 11 Sep. 2025.

GOLDMAN SACHS. Generative AI could increase global GDP by 7%. 2023. Available at: https://www.goldmansachs.com/insights/articles/generative-ai-could-raise-global-gdp-by-7-percent. Accessed on: 04 Sep. 2025.

IBM. What is synthetic data? Available at: https://www.ibm.com/think/topics/synthetic-data. Accessed on: 23 Jul. 2025.

INGOLD, Jo; MONAGHAN, Mark. Evidence translation: an exploration of policy makers’ use of evidence. Policy and Politics, [S.L.], v. 44, no. 2, p. 171-190, Apr. 2016. Bristol University Press. http://dx.doi.org/10.1332/147084414X13988707323088. Available at: https://bristoluniversitypressdigital.com/view/journals/pp/44/2/article-p171.xml. Accessed on: 16 Sep. 2025.

KANDPAL, Nikhil; WALLACE, Eric; RAFFEL, Colin. Deduplicating Training Data Mitigates Privacy Risks in Language Models. Arxiv, [S.I.], p. 1-11, Dec. 2022. Available at: https://arxiv.org/abs/2202.06539. Accessed on: 19 Sep. 2025.

MARR, Bernard. The Difference Between Generative AI And Traditional AI: An Easy Explanation For Anyone. 2023. Available at: https://www.forbes.com/sites/bernardmarr/2023/07/24/the-difference-between-generative-ai-and-tradit… Accessed on: 19 Sep. 2025.

MICROSOFT LEARN. Generating synthetic data in the Azure AI Foundry portal. Available at: https://learn.microsoft.com/pt-br/azure/ai-foundry/concepts/concept-synthetic-data. Accessed on: 23 Jul. 2025.

NICFAB. Privacy Enhancing Technologies (PETs): an evergreen category – part 1. 2023. Available at: https://notes.nicfab.eu/en/posts/pet01/?utm_source=chatgpt.com. Accessed on: 19 Sep. 2025.

TECHNOLOGICAL TRENDS RADAR. Privacy Enhancing Technologies. 2024. Available at: https://radar.apps.bb.com.br/tendencia/Radar-2024/Temas/Techeuardian/Tecnologias-de-Aprimoramento-de-Privacidade/21001. Accessed on: 23 Jul. 2025.

RUIZ, Evandro Eduardo Seron. Homomorphic encryption: does this technique solve the problem of personal data security? 2022. Available at: https://www.migalhas.com.br/coluna/migalhas-de-protecao-de-dados/375269/homomorphic-encryption. Accessed on: 22 Jul. 2025.

STRYKER, Cole. What are LLMs? 2025. Available at: https://www.ibm.com/think/topics/large-language-models. Accessed on: 19 Sep. 2025.

STRYKER, Cole; SCAPICCHIO, Mark. What is generative AI? 2024. Available at: https://www.ibm.com/br-pt/think/topics/generative-ai. Accessed on: 19 Sep. 2025.

TEOFILO, David; ROCILLO, Paloma. CLOUD Act: a case of Human Rights and Jurisdiction. Available at: https://irisbh.com.br/cloud-act-um-caso-de-direitos-humanos-e-jurisdicao/. Accessed on: 11 Sep. 2025.

VAN AUDENHOVE, Leo; DONDERS, Karen. Talking to People III: Expert Interviews and Elite Interviews. In: BULCK, Hilde van Den; PUPPIS, Manuel; DONDERS, Karen; VAN AUDENHOVE, Leo (ed.). The Palgrave Handbook of Methods for Media Policy Research. [S.I.]: Palgrave Macmillan Cham, 2019. p. 179-197.

Reglab Methodology Annex

Title	Privacy in Layers: The Role of Privacy Enhancing Technologies (PETs) in Artificial Intelligence Systems
Research question	How do AI, privacy and cybersecurity experts evaluate the use of Privacy Enhancing Technologies (PETs) in artificial intelligence systems and their effectiveness in protecting personal data?
Methodology summary	This research adopts a qualitative and exploratory approach, combining primary data collection through semi-structured qualitative interviews with experts (expert interviews), complemented by secondary data collection (documents, literature and practical cases). Data analysis followed the reflective thematic analysis technique with the help of Atlas.ti software. Visual software tools were used to identify central patterns and themes, subsequently validated against the empirical corpus.
Data collection	The research used the expert interview methodology (Audenhove and Donders, 2019), carrying out semi-structured qualitative interviews of an exploratory nature. This method was chosen due to the technical nature of the topic and the lack of conceptual standardization on the subject, which makes the knowledge of experts working in the area in Brazil essential.<br>The sample was composed following criteria of diversity and representativeness, including: minimum participation of women; representatives from academia or research centers; professionals from Brazilian companies; and experts from large companies and technology consultants. Participant selection combined active search on LinkedIn as the main strategy, complemented by convenience sampling and snowballing techniques to expand the network of experts in AI, privacy and cybersecurity. Of the 37 people contacted, 12 agreed to participate in the research, while 7 reported unavailability and 18 did not respond to the invitation.<br>The interviews took place between July 29 and August 29, 2025, in an online format (via Teams), lasting between 45 and 60 minutes. All were conducted by at least one RegLab researcher, following a question guide attached to the study. One interview was carried out as a pilot to test the script and validate initial hypotheses. The remaining 11 interviews were considered sufficient for theoretical saturation, since in qualitative approaches with semi-structured and in-depth interviews, thematic recurrence and analytical density are generally consolidated with few participants (Guest et al., 2006). The interviews were recorded with the permission of the participants, transcribed in full and accompanied by memos from the interviewers. The material was stored and coded in the Atlas.ti software, with the names and institutions of the interviewees duly anonymized.
Data analysis	Data analysis followed the reflective thematic analysis technique (Braun; Clarke, 2006), suitable for exploratory qualitative studies in highly complex contexts. This approach prioritizes contextualized interpretation rather than exhaustive coding, allowing the use of different open analytical strategies.<br>All interviews were fully transcribed and processed in the Atlas.ti software, which performed a first round of intentional automated coding. This procedure generated 719 first-level codes and 23 second-level codes, which were later manually reviewed by the research team.<br>In the next step, different visual software tools (concept clouds, maps, correlation graphs, chatbots) were used to identify patterns and relationships between codes. This process resulted in the emergence of central themes, which were tested and validated against the empirical corpus, ensuring their adherence to the original data.<br>The analysis was conducted between August 29th and September 5th, 2025.

Bias reduction procedures	Consolidated theoretical-methodological references: the data collection and analysis techniques adopted in this study followed practices recognized in academic literature. The methodological approach was discussed internally before and after carrying out the preliminary interviews, allowing the incorporation of criticisms and suggestions into the final research design, before the analysis process began.<br>Complementary checking tool: considering that the initial coding was carried out using software, we used a second software (NotebookLM) to check the consistency of the coding produced in Atlas.ti and identify blind spots. This software was used by researchers who participated directly in the interviews, with the aim of capturing nuances that may have been overlooked in automated coding.<br>Triangulation of methods: empirical findings were contrasted with documentary analysis of secondary sources, with the aim of comparing, validating and reinforcing the consistency of interpretations constructed from the interviews. These references, when used, were expressly cited throughout the text.<br>Independent double analysis: two researchers cross-reviewed the set of codes and themes, reducing individual biases. The final definition of the themes was carried out in a collective discussion between the two authors, ensuring multiple perspectives and control of individual biases in the interpretation of the data.<br>Recording and methodological transparency: all stages of the analytical process were documented, including successive versions of the files and coding decisions. This practice allows traceability of the methodological path, in accordance with Reglab guidelines for transparency and replicability.
Other Methodological Limitations	Initial automated coding: although Atlas.ti is one of the most established software packages for qualitative analysis and its coding was validated using a second software, the use of automated coding may have generated noise in the initial stage, which could constitute blind spots in the subsequent analysis.<br>Dependence on external tools: part of the analytical process depended on the performance of proprietary software, which could limit replicability in different contexts.<br>Qualitative scope: findings reflect insights from a delimited set of interviews, with analytical depth, but without the intention of statistical generalization.<br>Convenience sampling: the selection may have reflected biases in availability and professional circles, despite the diversity criteria.<br>Technological evolution: the results reflect the state of the art of AI tools and practices at the time of the research. Rapid changes in this field may alter some of the conclusions.
Software use	MS Office Suite: Text editing, spreadsheets and graphs, interviews (Teams)<br>Adobe C Suite: Layout and finalization of graphics and illustrations<br>Atlas.ti: Organization, coding and analysis of qualitative data<br>Cockatoo: Audio transcription of interviews to text<br>ChatGPT 5th: Brainstorm, information systematization, grammar review (spelling, grammar, synonym search), adequacy of language, adaptation to the Reglab Writing Manual<br>Notion AI: Text editing and review (spelling and grammar, search for synonyms, language adequacy, translations), organization of research and structuring of schedule<br>Lex.page: Text review (brevity, clichés, readability, passive voice, statements without evidence, repetitions)<br>More UFSC: Generation of bibliographic references in the ABNT model

Bias reduction procedures

Consolidated theoretical-methodological references: the data collection and analysis techniques adopted in this study followed practices recognized in academic literature. The methodological approach was discussed internally before and after carrying out the preliminary interviews, allowing the incorporation of criticisms and suggestions into the final research design, before the analysis process began.
Complementary checking tool: because the initial coding was carried out using software, we used a second tool (NotebookLM) to check the consistency of the coding produced in Atlas.ti and to identify blind spots. This tool was operated by researchers who participated directly in the interviews, with the aim of capturing nuances that may have been overlooked in the automated coding.
Triangulation of methods: empirical findings were contrasted with documentary analysis of secondary sources, with the aim of comparing, validating and reinforcing the consistency of the interpretations constructed from the interviews. These references, when used, were expressly cited throughout the text.
Independent double analysis: two researchers cross-reviewed the set of codes and themes, reducing individual biases. The final definition of the themes was reached in a collective discussion between the two authors, ensuring multiple perspectives in the interpretation of the data.
Recording and methodological transparency: all stages of the analytical process were documented, including successive versions of the files and coding decisions. This practice allows traceability of the methodological path, in accordance with Reglab guidelines for transparency and replicability.
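
As an illustration only, the cross-check between the two coding passes could be sketched as a comparison of the codes each tool assigned to the same interview excerpt. The excerpt identifiers, code labels and the function name `find_blind_spots` are hypothetical; the study does not describe its checking procedure at this level of detail.

```python
from typing import Dict, Set


def find_blind_spots(
    primary: Dict[str, Set[str]],
    secondary: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per excerpt, the codes found only by the secondary pass."""
    blind_spots: Dict[str, Set[str]] = {}
    for excerpt_id, secondary_codes in secondary.items():
        # Codes the secondary pass assigned that the primary pass did not.
        missing = secondary_codes - primary.get(excerpt_id, set())
        if missing:
            blind_spots[excerpt_id] = missing
    return blind_spots


# Hypothetical example data, not taken from the study.
primary_coding = {
    "interview_01_excerpt_03": {"cost", "standardization"},
    "interview_02_excerpt_01": {"literacy"},
}
secondary_coding = {
    "interview_01_excerpt_03": {"cost", "standardization", "governance"},
    "interview_02_excerpt_01": {"literacy"},
    "interview_02_excerpt_07": {"re-identification"},
}

blind_spots = find_blind_spots(primary_coding, secondary_coding)
for excerpt_id, codes in sorted(blind_spots.items()):
    print(excerpt_id, sorted(codes))
```

A comparison of this kind only flags candidates for review; deciding whether a flagged code is a genuine blind spot remains a judgment for the researchers.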

Other Methodological Limitations

Initial automated coding: although Atlas.ti is one of the most established software packages for qualitative analysis and its coding was validated using a second tool, the use of automated coding may have generated noise in the initial stage, which could create blind spots in the subsequent analysis.
Dependence on external tools: part of the analytical process depended on the performance of proprietary software, which could limit replicability in different contexts.
Qualitative scope: the findings reflect insights from a delimited set of interviews, analyzed in depth but without the intention of statistical generalization.
Convenience sampling: the selection may have reflected biases of availability and professional circles, despite the diversity criteria.
Technological evolution: the results reflect the state of the art of AI tools and practices at the time of the research. Rapid changes in this field may alter some of the conclusions.

Software use

SOFTWARE USE IN RESEARCH
MS Office Suite: Text editing, spreadsheets and graphs, interviews (Teams)
Adobe C Suite: Layout and finalization of graphics and illustrations
Atlas.ti: Organization, coding and analysis of qualitative data
Cockatoo: Transcription of interview audio into text
ChatGPT 5: Brainstorming, information systematization, grammar review (spelling, grammar, synonym search), language adequacy, adaptation to the Reglab Writing Manual
Notion AI: Text editing and review (spelling and grammar, synonym search, language adequacy, translations), organization of research and structuring of the schedule
Lex.page: Text review (brevity, clichés, readability, passive voice, statements without evidence, repetitions)
MORE UFSC: Generation of bibliographic references in the ABNT model

Ethical Guidelines

Research funding: this publication is part of a series of studies sponsored by the companies Google, Meta and b/luz, with Reglab maintaining full editorial control. Unlike commissioned research, Reglab defined the scope, objectives and methodology of this study with complete autonomy. The authors have preserved total professional independence and assume full responsibility for the content and conclusions presented.
Processing of personal data: the research involved the processing of personal data only in the collection and analysis stages, in a limited manner and proportional to the objectives of the study, in accordance with Law No. 13,709/2018 (LGPD).
Legal basis: all participants formally authorized their participation by signing a consent form, with knowledge of the research objectives and the use of data.
Purpose and suitability: the data were used exclusively for the purposes of this research, in accordance with the consent obtained, and were not used for other purposes.
Minimization and anonymization: personally identifiable information that was not relevant to the purposes of the study was anonymized in the transcriptions and excluded from the active database.
Secrecy and confidentiality: when presenting the results, the data were kept confidential and citations were adjusted, when necessary, to preserve the confidentiality of the sources. Only a limited number of researchers directly involved in the project had access to personal data and original documents.
Registration and information security: the files were stored using password access control and in accordance with Reglab’s internal information security policies.
Retention and disposal: data will be stored for up to 12 months, exclusively for the purposes of methodological audit and possible replication, after which they will be deleted.
Responsible use of public data: although some analyzed data is public, its use was carried out in a responsible and ethical manner, for the sole purpose of independent research.
Methodological transparency: the research methodology was described in detail to ensure transparency and replicability, contributing to scientific integrity and enabling independent validation of results.
Non-Discrimination and Respect for Diversity: the research was conducted in order to respect diversity and avoid any form of discrimination.
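
As a minimal sketch of how the 12-month retention limit above could be tracked, the snippet below computes a disposal date from a storage date. It assumes the period is counted from the day a file was stored, which the guidelines do not specify; the function names `disposal_date` and `is_due_for_disposal` are hypothetical and do not describe Reglab's internal tooling.

```python
import calendar
from datetime import date

RETENTION_MONTHS = 12  # retention period stated in the guidelines


def disposal_date(stored_on: date, months: int = RETENTION_MONTHS) -> date:
    """Return the date `months` calendar months after `stored_on`.

    The day is clamped to the last day of the target month
    (29 February maps to 28 February in a non-leap year).
    """
    month_index = stored_on.month - 1 + months
    year = stored_on.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(stored_on.day, last_day))


def is_due_for_disposal(stored_on: date, today: date) -> bool:
    """True once the retention period has elapsed (assumed start: storage date)."""
    return today >= disposal_date(stored_on)
```

For example, under this assumption a file stored on 30 September 2025 would become due for disposal on 30 September 2026.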

ANNEX II – SEMI-STRUCTURED INTERVIEW SCHEDULE

1. To start, could you tell us a little about your experience with AI projects? And, more specifically, your experience with privacy or data protection issues in the context of AI?

2. In your view, how is personal data actually used in AI models? From a technical point of view, does it lose its identifiable character throughout the process?

3. What, in your experience, are the main techniques used today to reduce privacy risks and improve the cybersecurity of personal data in projects involving AI?

4. I will mention some technologies. Do you know or have you worked with any of them? Could you comment, if you wish, on the relevance they have in practice?

Differential privacy
Trusted Execution Environment (TEE)
Synthetic data
Federated learning
Homomorphic encryption

5. In your view, does the use of PETs in AI systems actually eliminate privacy risks, or are there still relevant concerns such as re-identification or leakage?

6. In your daily work, is the term “PETs” often used? Or is it more common to refer directly to specific technologies?

7. Have you seen cases in which the use of PETs made viable projects that would initially have been unfeasible or high-risk due to privacy issues?

8. Have you ever come across projects that involved sensitive data, such as health, racial origin or children’s data, where the use of PETs was considered (or adopted) to enable collection or reduce risks? How was this decision handled?

9. What do you consider to be the main challenges in increasing the dissemination and adoption of PETs in AI? Have you ever faced any of these obstacles on a project?

10. Would you like to highlight any further points that have not been covered, or leave a recommendation for future research in this area? Would you recommend someone else to participate in the interviews?
