In Q2 alone, the average enterprise saw 23 new GenAI tools used by employees, plus ‘hidden’ GenAI functions in SaaS applications
Organizations are leaking data at a staggering rate, according to new analysis from Harmonic Security of a sample of 1 million prompts and 20,000 files submitted to 300 GenAI tools and AI-enabled SaaS applications between April and June. Of these, 22% of files (4,400 in total) and 4.37% of prompts (43,700 in total) contained sensitive information, including source code, access credentials, proprietary algorithms, M&A documents, customer or employee records, and internal financial data.
In just Q2, the average enterprise saw 23 previously unknown GenAI tools newly used by its employees, stretching security teams, which need to ensure each tool is properly vetted and reviewed. A high proportion of AI use comes from personal accounts, which may be unsanctioned and/or lack safeguards. Some 47.42% of sensitive uploads to Perplexity came from standard (non-enterprise) accounts. The numbers improve for ChatGPT, where 26.3% of sensitive uploads came via personal accounts, and for Google Gemini, where just 15% of use was via personal accounts.
Of all sensitive prompts analyzed in Q2, 72.6% originated in ChatGPT, followed by Microsoft Copilot (13.7%), Google Gemini (5.0%), Claude (2.5%), Poe (2.1%), and Perplexity (1.8%). One dominant trend stands out: code leakage was the most common type of sensitive data sent to GenAI tools and was especially prevalent in ChatGPT, Claude, DeepSeek and Baidu Chat.
The average enterprise uploaded 1.32GB of files in Q2, with PDFs accounting for half. A full 21.86% of these files contained sensitive data, and files showed a disproportionate concentration of sensitive and strategic content compared with prompts. For instance, files were the source of 79.7% of all stored credit card exposures, 75.3% of customer profile leaks, and 68.8% of employee PII incidents, all categories with high regulatory or reputational risk. Even in financial projections, where both channels are active, files edged out prompts with 52.6% of total exposure volume.
Not all GenAI risk comes from obvious chatbots. A growing share now stems from everyday SaaS tools that quietly embed LLMs and train on user content; most enterprise controls do not flag these as GenAI tools, yet they often receive sensitive content. For instance, Canva was used to create documents containing legal strategy, M&A planning, and client data. Replit and Lovable.dev handled proprietary code and access keys, while Grammarly and Quillbot were used to edit contracts, client emails, and internal legal language.
China-based applications are a key concern, and Harmonic reported on them separately earlier in July. In Q2, 7.95% of employees in the average enterprise used a Chinese GenAI tool, with 535 separate incidents of sensitive exposure recorded. Of these, 32.8% involved source code, access credentials, or proprietary algorithms; 18.2% included M&A documents and investment models; 17.8% exposed PII such as customer or employee records; and 14.4% contained internal financial data.
Alastair Paterson, CEO and co-founder of Harmonic Security, comments: “The good news for Harmonic Security customers is that this sensitive customer data, personally identifiable information (PII), and proprietary file contents never actually left any customer tenant; it was prevented from doing so. But had organizations not had browser-based protection in place, sensitive information could have ended up training a model, or worse, in the hands of a foreign state. AI is now embedded in the very tools employees rely on every day, and in many cases employees have little knowledge they are exposing business data.”
Harmonic Security advises enterprises to:
- Gain visibility into tool usage (including free tiers and embedded tools)
- Monitor what types of data are entering GenAI systems
- Enforce context-aware controls at the data layer (a simplified sketch of such a check follows below)
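To make the third recommendation concrete, the Python sketch below shows what a minimal data-layer check might look like: outbound prompt text is scanned for credential-like and card-number-like patterns before it reaches a GenAI tool. The pattern set, function names, and blocking behavior here are illustrative assumptions for explanation only, not Harmonic Security's actual implementation.

# Illustrative sketch only; patterns and logic are assumptions,
# not Harmonic Security's product.
import re

# Hypothetical patterns for two exposure categories named in the report.
PATTERNS = {
    "access_credential": re.compile(r"(?i)\b(api[_-]?key|secret|token)\b\s*[:=]\s*\S+"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def classify_text(text: str) -> list[str]:
    """Return the sensitive-data categories detected in outbound text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]

def allow_submission(text: str) -> bool:
    """Block the submission if any sensitive category is detected."""
    hits = classify_text(text)
    if hits:
        print(f"Blocked submission: matched categories {hits}")
        return False
    return True

# Example: a prompt embedding a hard-coded credential is blocked.
allow_submission("please debug this: api_key = sk-12345abcdef")

A production control would of course go further, for example by using context (which tool, which account tier, which data classification) to decide whether to block, redact, or merely warn, which is what "context-aware" implies.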
Activity was recorded via the Harmonic Security Browser Extension, which captures usage across SaaS environments and GenAI platforms, then sanitizes it for aggregate analysis. The analysis relied exclusively on anonymized data and aggregated counts generated by the Harmonic platform within customer environments.
The full report is available here: https://www.harmonic.security/blog-posts/genai-data-exposure-report
About Harmonic
Harmonic Security lets your teams adopt AI tools safely by protecting sensitive data in real time with minimal effort. It gives you full control and stops leaks so your teams can innovate confidently.
For more information, visit https://www.harmonic.security/
View source version on businesswire.com: https://www.businesswire.com/news/home/20250731456105/en/
Contacts
Press contact: david@harmonic.security