When you use Gemini Code Assist for individuals, Google collects your prompts, related code, generated output, code edits, related feature usage information, and your feedback to provide, improve, and develop Google products and services and machine learning technologies.
To help with quality and improve our products (such as generative machine-learning models), human reviewers may read, annotate, and process the data collected above. We take steps to protect your privacy as part of this process. This includes disconnecting the data from your Google Account before reviewers see or annotate it, and storing those disconnected copies for up to 18 months. Please don't submit confidential information or any data you wouldn't want a reviewer to see or Google to use to improve our products, services, and machine-learning technologies.
It's a lot more nuanced than that. If you use the free edition of Code Assist, your data can be used UNLESS you opt out; the opt-out is described at the bottom of the support article you link to:
"If you don't want this data used to improve Google's machine learning models, you can opt out by following the steps in Set up Gemini Code Assist for individuals."
If you pay for Code Assist, no data is used for improvement. If you use a Gemini API key on a pay-as-you-go account instead, it doesn't get used for improvement either. It's only if you're using a non-paid consumer account and you didn't opt out.
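For example, you can point the CLI at an API key instead of the consumer login; a minimal sketch, assuming the CLI's documented GEMINI_API_KEY environment variable and a key created on a billed account (the value below is a placeholder):

export GEMINI_API_KEY="your-key-here"  # placeholder; create a real key in Google AI Studio
gemini                                 # the CLI picks the key up from the environment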
Google recently testified in court that they still train on user data after users opt out from training [1]. The loophole is that the opt-out only applies to one organization within Google, but other organizations are still free to train on the data. They may or may not have cleaned up their act given that they're under active investigation, but their recent actions haven't exactly earned them the benefit of the doubt on this topic.
Another dimension here is that any "we don't train on your data" promise is useless without a matching data retention policy that deletes your data. Case in point: 23andMe didn't sell your data, until they decided to change that policy.
I'll go ahead and say that, even if there were a method that deletes your data when you request it, nothing stops them from using that data to train the model up until that point, which is "good enough" for them.
This is incorrect. The data discussed in court is data freely visible on the web, not user data that users sent to Google.
If the data is sent by a user to sub-unit X of Google, and X promised not to use it for training, it implies that X can share this data with sub-unit Y only if Y also commits not to use the data for training. Breaking this rule would get everyone in huge trouble.
OTOH, when sub-unit X said "We promise not to use data from the public website if the website owner asks us not to", it does not imply another sub-unit Y must follow that commitment.
Reading about all the nuances is such a trigger for me. Covering your ass is one thing; implying one thing in the lay sense and then doing something that contradicts it (in bad faith) is douchebaggery. I am very sad and deeply disappointed in Google for this. This completes their transformation into Evil Corp after repealing the “don’t be evil” clause in their code of conduct[1].
Isn't that just as toxic? I've read a bunch about Walmart, and the whole thing is basically a scam.
They get a ton of tax incentives, subsidies, etc. to build shoddy infrastructure that can only be used for big-box stores (pretty much), so the end cost for Walmart to build their stores is quite low.
They promise to employ lots of locals, but many of those jobs are intentionally paid so low that they're not actually living wages, and employees are intentionally driven to government help (food stamps, etc.). Together with various other tax cuts, there's a chance that even their labor costs basically break even.
Integrated local stores are better for pretty much everything except having a huge mass to throw around and bully, bribe (pardon me, lobby) and fool (aka persuade aka PR/marketing).
Integrated local stores are better for pretty much everything except for actually having what you want in stock.
There is a reason why rural communities welcome Wal-Mart with open arms. Not such a big deal now that you can mail-order anything more-or-less instantly, but back in the 80s when I was growing up in BFE, Wal-Mart was a godsend.
True. A good example being Sears, which should have become Amazon but didn't. Prior to the arrival of Wal-Mart, if you couldn't find something locally (which, again, was true more often than not) your options were to drive 50-150 miles to the nearest large city, or order from the local Sears catalog merchant.
The latter wasn't what most people think of as a Sears store, because the local economy could never have supported such a thing. It was more like a small office with a counter and a stockroom behind it. They didn't keep any inventory, but could order products for pickup in about a week. Pickup, mind you. You still had to drive to town to get your order. As stupid as this sounds, it was 10x worse in person.
So if Wal-Mart didn't exist, it would have had to be invented. It was not (just) a monster that victimized smaller merchants and suppliers, a tax scam, or a plot to exploit the welfare system. It was something that needed to happen, a large gap in the market that eventually got filled.
Nowadays I wouldn't set foot in one, but it was different at the time. I didn't mean to write a long essay stanning for Wal-Mart, but your original post is a bit of a pet peeve.
Yeah, and because of those 2 words, especially "convenience", we're going to burn the planet down.
Also, did you read my original comment and miss the part about Walmart and co. being predatory businesses? That's why they can keep those prices so low: they're socializing their costs onto everyone else.
If you scroll to the bottom, it says that the terms of service are governed by the mechanism through which you access Gemini. If you access via Code Assist (which the OP posted), you abide by the Code Assist privacy terms; one of the ways you access it is VS Code. If you access via the Gemini API, then those terms apply instead.
So the Gemini CLI (as I understand it) doesn't have its own privacy terms, because it's an open-source shell on top of another Gemini system, which could fall under one of a few different privacy policies based on how you choose to use it and your account settings.
(Note: I work for Google, but not on this; this is just my plain reading of the documentation.)
I guess the key question is whether the Gemini CLI, when used with a personal Google account, is governed by the broader Gemini Apps privacy settings here? https://myactivity.google.com/product/gemini?pli=1
If so, it appears it can be turned off. However, my CLI activity isn't showing up there?
At the bottom it specifies that the terms of service depend on the underlying mechanism the user chooses to fulfill the requests: you can use Code Assist, the Gemini API, or Vertex AI. My layperson's perspective is that it's positioned as a wrapper around another service whose terms you have already accepted/enabled. I would imagine that is separate from the Gemini app, whose settings you linked to.
Looking at my own settings, my searches in the Gemini app appear, but none of my Gemini API queries do.
However, as others pointed out, that link takes you here: https://developers.google.com/gemini-code-assist/resources/p... which, at the bottom, says: "If you don't want this data used to improve Google's machine learning models, you can opt out by following the steps in Set up Gemini Code Assist for individuals." and links to https://developers.google.com/gemini-code-assist/docs/set-up.... That page says "You'll also see a link to the Gemini Code Assist for individuals privacy notice and privacy settings. This link opens a page where you can choose to opt out of allowing Google to use your data to develop and improve Google's machine learning models. These privacy settings are stored at the IDE level."
The issue is that there is no IDE; this is the CLI, and no such menu options exist.
Are you saying the Gemini Apps Activity switch controls it? Or that if I download VS Code or IntelliJ and make the change, it applies to the CLI? https://developers.google.com/gemini-code-assist/docs/set-up... says "These privacy settings are stored at the IDE level."
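For what it's worth, the CLI itself reads a settings file, and its telemetry docs describe a usageStatisticsEnabled key as the opt-out switch. A minimal sketch, assuming that key name is still current (note this overwrites any existing settings.json, so merge by hand if you already have one):

# assumes ~/.gemini/settings.json is where your CLI reads its settings from
mkdir -p ~/.gemini
cat > ~/.gemini/settings.json <<'EOF'
{
  "usageStatisticsEnabled": false
}
EOF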
"1. Is my code, including prompts and answers, used to train Google's models?
This depends entirely on the type of auth method you use.
Auth method 1: Yes. When you use your personal Google account, the Gemini Code Assist Privacy Notice for Individuals applies. Under this notice, your prompts, answers, and related code are collected and may be used to improve Google's products, which includes model training."
The opt-out appears to be about other types of stats, no?
Yes, I'm just about to trust Google to do what they pinky-swear.
EDIT: Lmao, case in point, two sibling comments pointing out that Google does indeed do this anyway via some loophole; also they can just retain the data and change the policy unilaterally in the future.
If you want privacy, do it locally with Free software.
To be honest this is by far the most frustrating part of the Gemini ecosystem, to me. I think 2.5 pro is probably the best model out there right now, and I'd love to use it for real work, but their privacy policies are so fucking confusing and disjointed that I just assume there is no privacy whatsoever. And that's with the expensive Pro Plus Ultra MegaMax Extreme Gold plan I'm on.
I hope this is something they're working on making clearer.
In my own experience, 2.5 Pro 03-26 was by far the best model at the time.
The newer models are quantized and distilled (I confirmed this with someone who works on the team) and are a significantly worse experience. I prefer OpenAI's o3 and o4-mini models to Gemini 2.5 Pro for general-knowledge tasks, and Sonnet 4 for coding.
For coding, in my experience, Claude Sonnet/Opus 4.0 is hands-down better than Gemini 2.5 Pro. I just end up fighting with Claude a lot less than I do with Gemini. I had Gemini start a project that involved creating a recursive descent parser for a language in C. It was full of segfaults. I'd ask Gemini to fix them, it would end up breaking something else, and then we'd get into a loop. Finally I had Claude Sonnet 4.0 take a look at the code Gemini had created. It fixed the segfaults in short order and was off adding new features - even anticipating features that I'd be asking for.
Did you try Gemini with a fresh prompt too when comparing against Claude? Sometimes you just get better results starting over with any leading model, even if it gets access to the old broken code to fix.
I haven't tried Gemini since the latest updates, but earlier ones seemed on par with opus.
If I'm being cynical: it's easy to say either "we use it" or "we don't touch it", but they'd lose everyone who cares about this question if they just said "we use it" - so the most beneficial position is to keep it as murky as possible.
If I were you I'd assume they're using all of it for everything forever and act accordingly.
Hey all,
This is a really great discussion, and you've raised some important points. We realize the privacy policies for the Gemini CLI were confusing depending on how you log in, and we appreciate you calling that out.
To clear everything up, we've put together a single doc that breaks down the Terms of Service and data policies for each account type, including an FAQ that covers the questions from this thread.
Is there any way for a user using the "Login with Google ... for individuals" auth method (I guess auth method 1) to opt out of, and prevent, their input prompts and output responses from being used as training data?
From an initial parse of your linked tos-privacy.md doc, it seems like the answer is "no" -- but that seems bonkers to me, so I hope I'm misreading or misunderstanding something!
I think you did a good job of CYA on this, but what people were really looking for was a way to opt out of Google collecting code, similar to the opt-out process that's available for the IDE.
Usage statistics includes "your prompts and answers", see the last paragraph in the ToS. I have no idea why legal insists we write "statistics" rather than "data".
So does that mean that if you "opt out", Google _won't_ use your code for training, even on a personal/free plan?
### 1. Is my code, including prompts and answers, used to train Google's models?
This depends entirely on the type of auth method you use.
- *Auth method 1:* Yes. When you use your personal Google account, the Gemini Code Assist Privacy Notice for Individuals applies. Under this notice, your *prompts, answers, and related code are collected* and may be used to improve Google's products, which includes model training.
### 2. What are "Usage Statistics" and what does the opt-out control?
The "Usage Statistics" setting is the single control for all optional data collection in the Gemini CLI. The data it collects depends on your account type:
- *Auth method 1:* When enabled, this setting allows Google to collect both anonymous telemetry (like commands run and performance metrics) and *your prompts and answers* for model improvement.
Does this mean that for a personal account your data is always "collected", but the opt-out may prevent it from being used for training? Also, the question was about "code", but this only addresses "prompts and answers". Is code covered under prompts? The first FAQ lists "*prompts, answers, and related code are collected*" as separate items, so it's still not clear what happens to code, or whether there's a way to opt out of your code being used for model training, IMO.
Thanks - one more clarification, please. The heading of point #3 mentions Google Workspace: "3. Login with Google (for Workspace or Licensed Code Assist users)", but the text content only talks about Code Assist: "For users of Standard or Enterprise edition of Gemini Code Assist"... Could you clarify whether point #3 applies when logging in via a Google Workspace Business account?
This is useful, and it directly contradicts the terms and conditions for the Gemini CLI (edit: if you use a personal account, then it's governed under the Code Assist T&C). I wonder which one is true?
If you're using the Gemini CLI through your personal Google account, then you are using a Gemini Code Assist license and need to follow the T&C for that. Very confusing.
Collection means it gets sent to a server; logging implies (permanent or temporary) retention of that data. I tried to find a specific line or context in their privacy policy to link to, but maybe someone else can help provide a good reference. Logging is a form of collection, but not everything collected is logged unless stated as such.
It would be more accurate to say I packaged it. llamafile is a project I did for Mozilla Builders where we compiled llama.cpp with cosmopolitan libc so that LLMs can be portable binaries. https://builders.mozilla.org/ Last year I concatenated the Gemma weights onto llamafile and called it gemmafile and it got hundreds of thousands of downloads. https://x.com/JustineTunney/status/1808165898743878108 I currently work at Google on Gemini improving TPU performance. The point is that if you want to run this stuff 100% locally, you can. Myself and others did a lot of work to make that possible.
My early contributions to https://github.com/jart/cosmopolitan were focused on getting a single-file Python executable. I wanted my Python scripts to run on both Windows and Linux, and now they do. To try out Python, you can:
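(A minimal sketch, assuming the prebuilt binary is still hosted on the cosmo.zip mirror:)

curl -L -o python.com https://cosmo.zip/pub/cosmos/bin/python  # assumed mirror URL
chmod +x python.com
./python.com  # the same file runs on Linux, Windows, macOS, and the BSDs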
Adding pure-Python libraries just means downloading the wheel and adding files to the binary using the zip command:
./python.com -m pip download Click  # fetch the pure-Python wheel into the current directory
mkdir -p Lib && cd Lib
unzip ../click*.whl                 # extract the wheel's contents under Lib/
cd ..
zip -qr ./python.com Lib/           # append Lib/ to the binary (APE executables double as zip archives)
./python.com                        # can now import click
Cosmopolitan Libc provides some nice APIs to load arguments at startup, like cosmo_args() [1], if you'd like to run the Python binary as a specific program. For example, you could set the startup arguments to `-m datasette`.
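As a hedged sketch of that datasette idea (assuming this Python build reads a /zip/.args file via cosmo_args() the way llamafile does, and that datasette and its dependencies were already added under Lib/ as above):

printf '%s\n' -m datasette > .args  # default startup arguments, one per line
zip -qr ./python.com .args          # embed them; the binary now launches datasette by default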
Gemma 27b can write working code in dozens of programming languages. It can even translate between languages. It's obviously not as good as Gemini, which is the best LLM in the world, but Gemma is built from the same technology that powers Gemini and Gemma is impressively good for something that's only running locally on your CPU or GPU. It's a great choice for airgapped environments. Especially if you use old OSes like RHEL5.
It may be sufficient for generating serialized data and for some level of autocomplete, but not for any serious agentic coding, if you don't want to end up wasting time. Maybe some junior level programmers may still find it fascinating, but senior level programmers end up fighting with bad design choices, poor algorithms and other verbose garbage most of the time. This happens even with the best models.
> senior level programmers end up fighting with bad design choices, poor algorithms and other verbose garbage most of the time. This happens even with the best models.
Even senior programmers can misuse tools; happens to all of us. LLMs suck at software design and choosing algorithms, and they're extremely crap unless you tell them exactly what to do and what not to do. I leave the design to myself and just use OpenAI and local models for implementation, and with proper system prompting you can get OK code.
But you need to build up a base prompt you can reuse, basically describing what good code means to you, as that differs quite a bit from person to person. This is what I've been using as a base for agent use: https://gist.github.com/victorb/1fe62fe7b80a64fc5b446f82d313..., but it needs adjustment depending on the specific use case.
Although I've tried to steer Google's models in a similar way, most of them are still overly verbose and edit-happy; not sure if it's some Google practice that leaked through or something. Other models are way easier to stop from outputting so much superfluous code, and are better at following system prompts overall.
I've spent a long time with these models; gemma-3-27b feels distilled from Gemini 1.5. I think the useful coding abilities really started to emerge with 2.5.
This is just for free use (individuals), for standard and enterprise they don't use the data.
Which pretty much means if you are using it for free, they are using your data.
I don't see what is alarming about this; everyone else has either the same policy or no free usage. Hell, the surprising thing is that they still let free users opt out...
That bears no relation to OpenAI using data for training purposes. Although the court’s decision is problematic, user data is being kept for legal purposes only, and OpenAI is not authorized to use it to train its models.
I mean, using data that has been explicitly opted out of training paves the way for lawsuits and huge administrative fines in various jurisdictions. I might be naive, but I don’t think that’s something OpenAI would deliberately do.
They really need to provide some clarity on the terms around data retention and training for users who access the Gemini CLI for free by signing in with a personal Google account. It's not clear whether the Gemini Code Assist terms are relevant, or indeed which of the three sets of terms linked at the bottom of the README.md apply here.
Thank you, this is helpful, though I am left somewhat confused as a "1. Login with Google" user.
* The first section states "Privacy Notice: The collection and use of your data are described in the Gemini Code Assist Privacy Notice for Individuals." That in turn states "If you don't want this data used to improve Google's machine learning models, you can opt out by following the steps in Set up Gemini Code Assist for individuals.". That page says to use the VS Code Extension to change some toggle, but I don't have that extension. It states the extension will open "a page where you can choose to opt out of allowing Google to use your data to develop and improve Google's machine learning models." I can't find this page.
* Then later we have this FAQ: "1. Is my code, including prompts and answers, used to train Google's models? This depends entirely on the type of auth method you use. Auth method 1: Yes. When you use your personal Google account, the Gemini Code Assist Privacy Notice for Individuals applies. Under this notice, your prompts, answers, and related code are collected and may be used to improve Google's products, which includes model training." This implies Login with Google users have no way to opt out of having their code used to train Google's models.
* But then in the final section we have: "The "Usage Statistics" setting is the single control for all optional data collection in the Gemini CLI. The data it collects depends on your account type: Auth method 1: When enabled, this setting allows Google to collect both anonymous telemetry (like commands run and performance metrics) and your prompts and answers for model improvement." This implies prompts and answers for model improvement are considered part of "Usage Statistics", and that "You can disable Usage Statistics for any account type by following the instructions in the Usage Statistics Configuration documentation."
So these three sections appear contradictory, and I'm left puzzled and confused. It's a poor experience compared to competitors like GitHub Copilot, which makes opting out of model training easy via a checkbox on the GitHub Settings page - or Claude Code, where Anthropic has a policy that code will never be used for training unless the user specifically opts in, e.g. via the reporting mechanism.
I'm sure it's a great product - but this is, for me, a major barrier to adoption for anything serious.
Kinda a tragedy-of-the-commons situation. Everyone wants to use these tools, which must be trained on more and more code to get better, but nobody wants them trained on their own code. Bit silly imo.
Do you honestly believe that the opt-out by Anthropic and Cursor means your code won't be used for training their models? It seems likely that they would rather risk a massive fine for potentially solving software development than let some competitor try it instead.
> For API users, we automatically delete inputs and outputs on our backend within 30 days of receipt or generation, except when you and we have agreed otherwise (e.g. zero data retention agreement), if we need to retain them for longer to enforce our Usage Policy (UP), or comply with the law.
If this is due to compliance with the law, I wonder how they can make the zero-data-retention agreements work... The companies I've seen that have these haven't mentioned that they themselves retain the data...
>Anthropic spent "many millions of dollars" buying used print books, then stripped off the bindings, cut the pages, and scanned them into digital files.
The judge, Alsup J, ruled that this was lawful.
So they cared at least a bit - enough to spend a lot of money buying books. But they didn't care enough to avoid acquiring online libraries that were apparently held without proper licensing.
>Alsup wrote that Anthropic preferred to "steal" books to "avoid 'legal/practice/business slog,' as cofounder and CEO Dario Amodei put it."
Aside: using the term "steal" for copyright infringement is a particularly egregious misuse for a judge, who should know that stealing requires denying others the use of the stolen articles - something which copyright infringement via an online text repository simply could not do.
Using torrented books in a way that possibly (well, almost certainly) violates copyright law is a world of difference from going after your own customers (and revenue) in a way that directly violates the contract that you wrote and had them agree to.
Insane to me that there isn't even an asterisk in the blog post about this. The data collection is so over the top that I don't think users suspect it, because it's just absurd. For instance, Gemini Pro chats are trained on too.
Good to know; I won't be using this. Curious: do you know if OpenAI Codex and Claude also do the same? I was under the impression that they don't share code.
I still have yet to replace a single application with an LLM, except for (ironically?) Google search.
I still use all the same applications as part of my dev work/stack as I did in the early 2020s. The only difference is occasionally using an LLM baked into one of them, but the reality is I don't do that much.