Major technology companies are collecting user-generated data from emails, chat prompts, and browsing activity to train large language models [1, 2].

This trend highlights a growing tension between the rapid development of artificial intelligence and individual digital privacy. As companies scale their AI capabilities, the boundary between personal communication and corporate training data has blurred.

Companies including Google, OpenAI, Anthropic, and Perplexity use interactions with their products to improve model performance and build AI-enabled tools [3, 4]. The practice spans platforms including Gmail and ChatGPT [1, 2].

Microsoft-owned LinkedIn implemented a policy change on Nov. 3, 2024, that allows the platform to collect user data for AI training [5]. SpaceX's Starlink service has similarly sought to use customer data to train AI models [6].

Privacy concerns grew as these practices became more widely known. A Washington Post report published Sept. 5, 2025, described how difficult it is for users to keep their personal data out of AI training [2].

Reports conflict on how email data is used. Google said it disagreed with claims that it trains AI models on Gmail data, though some reports suggest the company may still use such data [1].

Most of these providers offer opt-out mechanisms for users who do not want their information used [1, 4, 5, 6]. These settings are often buried in privacy menus, so users must find and disable data sharing themselves to protect their personal information [3, 4].

The shift toward using personal data for AI training represents a transition from 'opt-in' to 'opt-out' privacy models. By making data collection the default, tech firms accelerate the refinement of their models using massive, real-world datasets, effectively shifting the burden of privacy management onto the end user.