14 Jan 2025

Data Leakage

Data leakage in machine learning occurs when a model is trained on information that would not be available at prediction time. This unintended access to future or external data during training can make the model appear highly accurate in development but perform poorly once deployed, producing inaccurate predictions and unreliable insights.

Examples of Data Leakage:
  • Target leakage: Including information that is only known after the outcome, such as using a "payment status" feature to predict loan approval when payment status is recorded only after the loan is issued (see the first sketch after this list).
  • Data split leakage: Accidentally allowing the same data points, or statistics derived from them, to appear in both the training and testing sets, giving the model an unfair preview of the test data (see the second sketch after this list).
  • External feature leakage: Incorporating external variables (e.g., weather forecasts) that aren’t accessible during real-time prediction.
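
To make target leakage concrete, here is a minimal sketch that builds a hypothetical loan-approval dataset in which a post-outcome column mirrors the target, then applies a simple correlation smoke test to flag it. The column names, data, and threshold are assumptions chosen for illustration, not a prescribed workflow.

```python
import numpy as np
import pandas as pd

# Hypothetical loan-approval dataset. "payment_status" is recorded
# only AFTER a loan is issued, so it cannot be a legitimate input
# when predicting approval.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "income": rng.normal(60_000, 15_000, 500),
    "credit_score": rng.integers(300, 850, 500),
    "payment_status": rng.integers(0, 2, 500),
})
df["approved"] = df["payment_status"]  # the leaky column mirrors the target

# Smoke test: a feature almost perfectly correlated with the target
# is a strong hint of target leakage.
corr = df.corr(numeric_only=True)["approved"].drop("approved").abs()
suspects = corr[corr > 0.95].index.tolist()
print("possible target leakage:", suspects)

# Remedy: drop post-outcome columns before training.
X = df.drop(columns=["approved", *suspects])
y = df["approved"]
```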
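
The second sketch shows a common form of data split leakage: fitting a preprocessing step (here, a scaler) on the full dataset before splitting, which lets test-set statistics influence the training features. It assumes scikit-learn and synthetic data; the fix is to split first and keep preprocessing inside a pipeline so it is fit on training rows only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 1,000 samples, 5 features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

# LEAKY: the scaler is fit on ALL rows, so the test set's mean and
# standard deviation shape the features the model trains on.
X_leaky = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_leaky, y, random_state=0)
LogisticRegression().fit(X_tr, y_tr)

# CORRECT: split first, then fit preprocessing inside a pipeline so
# the scaler only ever sees training rows.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```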
Data leakage undermines the integrity of machine learning models and can result in false confidence in their performance, leading to flawed decision-making.

To learn more about our Inference Platform, arrange a callback.
