The Price of Being Known: AI, Privacy, and Your Data

TL;DR:

Major AI companies use your conversations to train their models — under privacy policies nobody reads and opt-outs almost nobody activates.
Having an LLM that truly knows you offers real, concrete benefits. But the data cost is high and the corporate narrative obscures the actual risks.
The future escalation — comprehensive commercial profiles, government surveillance, automated scoring — already has documented precedents. This is not science fiction.

There's a question nobody asks when they open ChatGPT to look up a medical symptom, process grief, or ask for relationship advice: who else is going to read this?

The answer, buried in pages of terms of service that nobody reads, is uncomfortable. OpenAI, by default, uses your conversations to train its models. Meta admitted to using public Facebook and Instagram posts to train Llama, then temporarily paused after European regulatory pressure. Microsoft uses de-identified data from Bing, Copilot and advertising interactions. And Anthropic, the most transparent of the group, claims it only trains on your data with explicit permission.

This is the documented reality of 2026. And it's just the beginning of what's coming.

The Seduction of Being Known

Before talking about risks, honesty requires admitting that a language model that knows you offers real benefits. They aren't invented, and they aren't just corporate marketing: they're features that are already changing real lives.

An LLM with access to your medical history can flag drug interactions that a doctor might miss in a 15-minute appointment. An assistant that knows your writing style can draft an email on your behalf without the recipient noticing the difference. A tutor that knows your learning difficulties can explain the same concept five different ways until one of them finally clicks.

Personalization has tangible value. LLMs with rich context give markedly better answers than generic ones. If a model knows you have hypertension, three kids, work 12-hour days, and are going through a divorce, it can give you much more useful advice than any Google search. If an assistant remembers that you already tried a solution last week and it didn't work, it won't suggest it again. If it knows your communication preferences, it adapts its tone accordingly.

The persistent memory systems these platforms are deploying — still optional, still limited by regulations, still cautious after several launches that had to be walked back due to privacy alerts — are heading exactly in that direction. The promise is an assistant that truly knows you. That remembers. That learns. That improves over time.

In mental health support, where consistency of care genuinely matters, this has implications that go beyond mere convenience. In personalized education, in accessibility tools for people with disabilities, in support systems for elderly users — the use cases are real and the value is undeniable.

The question isn't whether these benefits exist. It's: at what price do they come?

What Companies Do With Your Data (And What They Don't Say)

The standard corporate narrative is reassuring: "we de-identify the data," "we anonymize it before use," "we only use it to improve our services." And in many cases that's partially true. The problem is in what gets left out of the story.

There are documented facts that complicate that tidy picture.

March 2023, the ChatGPT Redis incident. A bug in redis-py, the open-source Redis client library OpenAI used for caching, let some users see the titles of other active users' conversations (and, in some cases, the first message of a newly created conversation). It also exposed payment details of some ChatGPT Plus subscribers: names, email addresses, payment addresses, credit card expiration dates and the last four digits of card numbers. OpenAI itself confirmed the incident. It affected a limited number of users, but it proved something concrete: session isolation in shared AI systems is fragile. A bug in a single infrastructure dependency can turn private conversations into exposed data.
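To make the failure mode concrete, here is a deliberately simplified toy model in Python; it is not OpenAI's or redis-py's actual code. OpenAI's postmortem traced the bug to the client library: a request cancelled after it was sent but before its reply was read left that reply sitting on a shared connection, where the next, unrelated request picked it up. The sketch assumes a single pooled connection that returns replies strictly in order; `SharedConnection`, `fetch` and the sample data are hypothetical.

```python
from collections import deque


class SharedConnection:
    """Toy stand-in for one pooled connection reused by many users (hypothetical)."""

    def __init__(self, store: dict):
        self.store = store       # the "server" data, keyed per user
        self.replies = deque()   # replies sent by the server but not yet read

    def send(self, key: str) -> None:
        # The server answers every request it receives, in order.
        self.replies.append(self.store[key])

    def read(self):
        # The client assumes the oldest unread reply belongs to its own request.
        return self.replies.popleft()


def fetch(conn: SharedConnection, key: str, cancelled: bool = False):
    conn.send(key)
    if cancelled:
        # Abandoned after sending but before reading: the reply stays on the connection.
        return None
    return conn.read()


conn = SharedConnection({
    "alice:titles": ["Biopsy results, what now?"],
    "bob:titles": ["Weekend trip to Bariloche"],
})
fetch(conn, "alice:titles", cancelled=True)  # Alice's request is cancelled mid-flight
print(fetch(conn, "bob:titles"))             # Bob receives Alice's chat titles
# ['Biopsy results, what now?']
```

Nothing in `fetch` checks which request a reply belongs to. Isolation depends entirely on every request reading its own reply, and a single cancellation is enough to break that assumption.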

2024, Meta's forced pause. Meta announced it would use public data from Facebook and Instagram users (posts, photos, interactions) to train its generative AI models. After regulatory pressure in Europe, where the GDPR gives users the right to object to that use, Meta paused the plan there. In the rest of the world, without that legal protection, the practice continued. There is no global opt-out mechanism. If you're a Meta user living outside Europe, your public posts from years past have most likely already been used to train its models.

The third-party integration problem. LLM platforms allow plugins and external tools. Some of those integrations have access to your conversation content. They can capture, process, and store that information under their own privacy policies — distinct from the primary provider's. The exposure surface is considerably larger than the average user perceives.

And then there's a subtler technical problem: re-identification. LLMs can inadvertently memorize fragments of their training data, and researchers have shown that personal information can be pulled back out. In a 2021 study, Carlini and colleagues extracted names, phone numbers and email addresses verbatim from GPT-2 simply by querying the model and filtering its outputs. Even "anonymized" data can, with enough context, be correlated with other sources to identify an individual. Anonymization is not an absolute shield; it's a mitigation with real limits.
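A minimal sketch of that correlation step, usually called a linkage attack: join an "anonymized" dataset with a public one on quasi-identifiers, fields that are harmless alone but distinctive together. Latanya Sweeney estimated that ZIP code, full date of birth and sex are enough to single out roughly 87% of the US population. All records, field names and the `link` helper below are hypothetical, and the sketch assumes those details can be recovered from the chats or their metadata.

```python
# Hypothetical toy data: no real people or datasets.
anonymized_chats = [  # names removed, but quasi-identifiers left in
    {"zip": "1425", "birth_year": 1984, "gender": "F", "topic": "divorce lawyer fees"},
    {"zip": "1425", "birth_year": 1991, "gender": "M", "topic": "hypertension dosage"},
    {"zip": "5000", "birth_year": 1984, "gender": "F", "topic": "job interview anxiety"},
]
public_profiles = [  # e.g. scraped from social networks or a leaked customer list
    {"name": "Maria G.", "zip": "1425", "birth_year": 1984, "gender": "F"},
    {"name": "Juan P.", "zip": "1425", "birth_year": 1991, "gender": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def quasi_key(record: dict) -> tuple:
    """The combination of fields that is distinctive even without a name."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anon_rows: list, public_rows: list) -> dict:
    """Join both datasets on quasi-identifiers and keep only unambiguous matches."""
    candidates = {}
    for person in public_rows:
        candidates.setdefault(quasi_key(person), []).append(person["name"])
    reidentified = {}
    for row in anon_rows:
        names = candidates.get(quasi_key(row), [])
        if len(names) == 1:  # exactly one candidate: the "anonymous" row now has a name
            reidentified[names[0]] = row["topic"]
    return reidentified


print(link(anonymized_chats, public_profiles))
# {'Maria G.': 'divorce lawyer fees', 'Juan P.': 'hypertension dosage'}
```

Adding a second public profile with the same ZIP code, birth year and gender makes that match ambiguous, which is the intuition behind k-anonymity. In real data, though, the more context a record carries, the rarer such collisions become.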

What companies also don't actively publicize: the data you generate doesn't just serve to train models. It feeds the business intelligence of the most valuable companies on the planet. Even without directly selling data, the information generates value. Microsoft acknowledges it can use Copilot conversation history to personalize ads if the user allows it. "If the user allows it" is a phrase doing a lot of quiet work — especially when the option to allow is activated by default or buried in the fifth level of a settings menu.

For more context on how these systems can be exploited externally, it's worth reviewing the documented risks in AI agents.

Regulation That Arrives Too Late

Legal frameworks exist. The problem is they arrive late, patched together, and they don't reach everyone.

The European GDPR is the world's most robust regulation on this subject. It requires an explicit legal basis for processing personal data and real transparency about its use, and it guarantees concrete rights: access to your own data, deletion, objection to processing, and protection from decisions based solely on automated processing. In August 2024, the EU AI Act came into force; it classifies AI systems by risk level and, as its obligations phase in, requires that artificially generated content be labeled. It's imperfect, and the debate about its adequacy for the generative AI boom is legitimate, but it exists and has real consequences. Meta found out in 2024.

California's CCPA provides similar rights to consumers in that state. Also only in that state.

In Argentina, Personal Data Protection Law 25.326 dates from 2000. It didn't contemplate artificial intelligence, LLMs, or even the smartphones we carry in our pockets today. There are bills in Congress to modernize it, inspired by the GDPR, that incorporate concepts like a "right to non-algorithmic discrimination" and the right to object to automated decisions. But they're proposals. Debates. Real protection today is limited.

The result is a radical asymmetry: companies headquartered in the United States have access to data from hundreds of millions of users in jurisdictions with inadequate regulation. A user in Argentina, Mexico, or most of Latin America and Africa has close to zero negotiating power over their own data. In practice they have two options: accept the privacy policy (which almost nobody reads) or stop using the service.

Where This Is Heading: The Future Nobody Is Talking About

This is where the piece becomes uncomfortable, because what I'm about to describe is not science fiction. It's extrapolation of documented trends, with precedents that have already happened in other industries.

Complete, persistent profiles. Persistent memories in LLMs are still optional and limited. But the direction is unambiguous: an assistant that accumulates years of conversations about your health, your fears, your relationships, your finances, your political opinions. Not as text logs, but as a psychological and contextual profile of a precision unprecedented in the history of commercial surveillance. Cambridge Analytica harvested Facebook data from tens of millions of users, built psychological profiles from it, and used them for targeted political messaging, including in the 2016 US presidential campaign. An LLM that has known you for years builds something far richer.

What happens when that company goes bankrupt? When it's acquired by another with different values? Privacy policies change when ownership changes. User data is a balance-sheet asset in corporate transactions. This has already happened with health apps sold to insurers, with social networks acquired by media conglomerates, with dating platforms whose data migrated to new owners. It will happen with AI data. The question isn't if, but when and at what scale.

Government access. US technology companies operate under laws that allow government agencies to demand user data without notifying the affected users, through National Security Letters and FISA orders. The PRISM program, revealed by Snowden in 2013, documented how the NSA collected user data from Google, Facebook, Microsoft and Apple under Section 702 of FISA. Nothing structural has changed since then: Section 702 has been reauthorized repeatedly, most recently in 2024. If LLMs store detailed profiles of millions of users, those profiles are, under certain legal circumstances, accessible.

Insurance, credit, and employment. Insurance companies already use big data to segment risks with a precision their clients can't imagine. The next step is using AI profiles (richer, more contextual, more precise) to determine life insurance premiums, credit scores, or job candidate evaluations. Would you give a loan to someone whose conversations reveal chronic financial problems? Would you hire someone whose conversations with an AI assistant show patterns of severe anxiety? The logic of automated scoring already exists. LLM data is the ideal input.

Social scoring systems. China's social credit system is frequently cited as the extreme case, the dystopian scenario. But the logic doesn't belong only to China. Credit scoring, insurance profiles, automated content moderation — all of that already exists in the West under different names. The difference between a social credit system and an AI-data-based scoring system is, primarily, the degree of transparency and institutional oversight. Not the underlying logic.

The escalation doesn't require an authoritarian government or a monolithic corporate villain. It requires economic incentives, the absence of effective regulation, and the same gradual normalization process that allowed behaviorally targeted advertising to go from scandalous to invisible. It happened once. Conditions are even more favorable for it to happen again.

What We Can Do (Without Illusions)

The recommendations exist and make sense. But you have to be honest about their limits: these are individual measures in a structural problem.

For users:

Activate the training opt-out where it's available. In ChatGPT, it's the "Improve the model for everyone" setting under Data Controls, and the same request can be made through OpenAI's privacy portal; most people never change it because they don't know it exists. Use temporary chats for sensitive conversations: they don't appear in your history and aren't used for training, although OpenAI says it may keep a copy for up to 30 days for safety purposes. Don't enter truly critical information (ID numbers, passwords, sensitive medical data, banking information) into commercial AI platforms. For high-sensitivity contexts, consider models that run locally, without sending data to the cloud.
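For the advice about critical information, one crude safeguard is to scrub obvious identifiers locally before pasting text into a chat. This is a hypothetical sketch, not a feature of any AI platform: the `redact` helper and its regular expressions are assumptions (including the Argentine DNI format), they will miss many real-world variants, and no pattern can recognize a password or a medical detail, so the advice above still stands.

```python
import re

# Hypothetical patterns for illustration; real formats vary and these will miss cases.
# Order matters: emails are replaced first so their digits can't trigger other patterns.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens (card-like numbers).
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
    # Assumed Argentine DNI format: 8 digits, optionally written as 30.123.456.
    "DNI": re.compile(r"\b\d{2}\.?\d{3}\.?\d{3}\b"),
}


def redact(text: str) -> str:
    """Replace every match with a [LABEL] placeholder before the text leaves your machine."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("My DNI is 30.123.456, card 4111 1111 1111 1111, mail me at ana@example.com."))
# My DNI is [DNI], card [CARD], mail me at [EMAIL].
```

Running the scrubber on your own machine matters: the point is that the original identifiers never reach the provider's servers in the first place.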

These are reasonable steps. But the responsibility for privacy shouldn't fall exclusively on the user, who faces interfaces designed to maximize data sharing, policies drafted to confuse, and control options hidden behind multiple clicks.

For collective debate:

Individual solutions aren't enough. The problem is structural and requires responses at that scale. Regulation with real teeth, not statements of principles. Mandatory transparency about what data is used, for what purpose, and for how long. Genuinely accessible opt-out mechanisms, not buried in settings. A real right to be forgotten at the model level: deleting your data should mean the model stops knowing what it knew about you.

Europe pointed to a path. It's not perfect — but it established that personal data is a right, not a resource. The question is whether the rest of the world will follow that path before or after the damage becomes difficult to reverse.

The track record of similar industries — tobacco, social media, pharmaceuticals — suggests that effective regulation arrives after documented harms, not before.

The Question Worth Asking

Back to the beginning. When you open an AI chat to process something personal, you're making an implicit choice: immediate utility versus long-term control over that information. It's a legitimate choice. Sometimes the benefit justifies the cost.

The problem is that today, that choice is made without complete information. Without really understanding how long data is retained, who else can access it, what happens if the company changes ownership, how that information is used to train models that will then be used by millions of other people. And in a context where the rules of the game can change unilaterally whenever the company decides, with a simple update to the terms of service sent to your inbox in an email you're going to ignore.

The AI that knows you can be an extraordinarily useful tool. It can also be the most detailed file ever constructed about you, administered by a corporation with its own incentives and a longevity and influence that no individual can reliably predict.

Understanding that tension isn't paranoia. It's the minimum level of awareness needed to make informed decisions in the world we actually live in.

Tincho Fuentes — Tech journalist and investigative researcher 🚀