GitLab Duo data usage

GitLab Duo uses generative AI to help increase your velocity and make you more productive. Each AI-powered feature operates independently and is not required for other features to function.

GitLab uses best-in-class large language models (LLMs) for specific tasks. These LLMs are Google Vertex AI Models and Anthropic Claude.

Progressive enhancement

GitLab Duo AI-powered features are designed as a progressive enhancement to existing GitLab features across the DevSecOps platform. These features are designed to fail gracefully and should not prevent the core functionality of the underlying feature. You should note each feature is subject to its expected functionality as defined by the relevant feature support policy.

Stability and performance

GitLab Duo AI-powered features are in a variety of feature support levels. Due to the nature of these features, there may be high demand for usage which may cause degraded performance or unexpected downtime of the feature. We have built these features to gracefully degrade and have controls in place to allow us to mitigate abuse or misuse. GitLab may disable beta and experimental features for any or all customers at any time at our discretion.

Data privacy

GitLab Duo AI-powered features are powered by a generative AI model. The processing of any personal data is in accordance with our Privacy Statement. You may also visit the Sub-Processors page to see the list of Sub-Processors we use to provide these features.

Data retention

The below reflects the current retention periods of GitLab AI model Sub-Processors:

Anthropic discards model input and output data immediately after the output is provided. Anthropic currently does not store data for abuse monitoring. Model input and output is not used to train models. GitLab has arranged zero data retention with Anthropic for GitLab Duo requests.
Google discards model input and output data immediately after the output is provided. Google currently does not store data for abuse monitoring. Model input and output is not used to train models. Additionally, GitLab has disabled caching for GitLab Duo requests.

All of these AI providers are under data protection agreements with GitLab that prohibit the use of Customer Content for their own purposes, except to perform their independent legal obligations.

GitLab Duo Chat retains chat history to help you return quickly to previously discussed topics. You can delete chats in the GitLab Duo Chat interface. GitLab does not otherwise retain input and output data unless customers provide consent through a GitLab Support Ticket. Learn more about AI feature logging.

Training data

GitLab does not train generative AI models based on private (non-public) data. The vendors we work with also do not train models based on private data.

For more information on our AI sub-processors, see:

Google Vertex AI models API data governance, responsible AI, details about foundation model training, Google Secure AI Framework (SAIF), and release notes.
Anthropic Claude's constitution, training data FAQ, models overview, and data recency article.

Telemetry

GitLab Duo collects aggregated or de-identified first-party usage data through a Snowplow collector. This usage data includes the following metrics:

Number of unique users
Number of unique instances
Prompt and suffix lengths
Model used
Status code responses
API responses times
Code Suggestions also collects:
- Language the suggestion was in (for example, Python)
- Editor being used (for example, VS Code)
- Number of suggestions shown, accepted, rejected, or that had errors
- Duration of time that a suggestion was shown

Model accuracy and quality

Generative AI may produce unexpected results that may be:

Low-quality
Incoherent
Incomplete
Produce failed pipelines
Insecure code
Offensive or insensitive
Out of date information

GitLab is actively iterating on all our AI-assisted capabilities to improve the quality of the generated content. We improve the quality through prompt engineering, evaluating new AI/ML models to power these features, and through novel heuristics built into these features directly.