When you use Gemini Code Assist for individuals, Google collects your prompts, related code, generated output, code edits, related feature usage information, and your feedback to provide, improve, and develop Google products and services and machine learning technologies.
To help with quality and improve our products (such as generative machine-learning models), human reviewers may read, annotate, and process the data collected above. We take steps to protect your privacy as part of this process. This includes disconnecting the data from your Google Account before reviewers see or annotate it, and storing those disconnected copies for up to 18 months. Please don't submit confidential information or any data you wouldn't want a reviewer to see or Google to use to improve our products, services, and machine-learning technologies.
It's a lot more nuanced than that. If you use the free edition of Code Assist, your data can be used UNLESS you opt out; the opt-out is described at the bottom of the support article you link to:
"If you don't want this data used to improve Google's machine learning models, you can opt out by following the steps in Set up Gemini Code Assist for individuals."
If you pay for Code Assist, no data is used for improvement. If you use a Gemini API key on a pay-as-you-go account instead, it doesn't get used for improvement either. It's only if you're using a non-paid consumer account and you didn't opt out.
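For example, you can point the CLI at an API key instead of the consumer login; a minimal sketch, assuming the CLI's documented GEMINI_API_KEY environment variable and a key created on a billed account (the value below is a placeholder):

export GEMINI_API_KEY="your-key-here"  # placeholder; create a real key in Google AI Studio
gemini                                 # the CLI picks the key up from the environment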
Google recently testified in court that they still train on user data after users opt out from training [1]. The loophole is that the opt-out only applies to one organization within Google, but other organizations are still free to train on the data. They may or may not have cleaned up their act given that they're under active investigation, but their recent actions haven't exactly earned them the benefit of the doubt on this topic.
Another dimension here is that any "we don't train on your data" promise is useless without a matching data retention policy that deletes your data. Case in point: 23andMe didn't sell your data, until they decided to change that policy.
I'll go ahead and say that, even if there were a method that deletes your data when you request it, nothing stops them from using that data to train the model up until that point, which is "good enough" for them.
This is incorrect. The data discussed in court is data freely visible on the web, not user data that users sent to Google.
If the data is sent by a user to sub-unit X of Google, and X promised not to use it for training, it implies that X can share this data with sub-unit Y only if Y also commits not to use the data for training. Breaking this rule would get everyone in huge trouble.
OTOH, when sub-unit X said "We promise not to use data from the public website if the website owner asks us not to", it does not imply another sub-unit Y must follow that commitment.
Reading about all the nuances is such a trigger for me. Covering your ass is one thing; implying one thing in the lay sense and then doing something that contradicts it (in bad faith) is douchebaggery. I am very sad and deeply disappointed in Google for this. This completes their transformation into Evil Corp after repealing the “don’t be evil” clause in their code of conduct[1].
Isn't that just as toxic? I've read a bunch about Walmart, and the whole thing is basically a scam.
They get a ton of tax incentives, subsidies, etc. to build shoddy infrastructure that can only be used for big-box stores (pretty much), so the end cost for Walmart to build their stores is quite low.
They promise to employ lots of locals, but many of those jobs are intentionally paid so low that they're not actually living wages, and employees are intentionally driven to government help (food stamps, etc.). Together with various other tax cuts, there's a chance that even their labor costs basically break even.
Integrated local stores are better for pretty much everything except having a huge mass to throw around and bully, bribe (pardon me, lobby) and fool (aka persuade aka PR/marketing).
Integrated local stores are better for pretty much everything except for actually having what you want in stock.
There is a reason why rural communities welcome Wal-Mart with open arms. Not such a big deal now that you can mail-order anything more-or-less instantly, but back in the 80s when I was growing up in BFE, Wal-Mart was a godsend.
True. A good example being Sears, which should have become Amazon but didn't. Prior to the arrival of Wal-Mart, if you couldn't find something locally (which, again, was true more often than not) your options were to drive 50-150 miles to the nearest large city, or order from the local Sears catalog merchant.
The latter wasn't what most people think of as a Sears store, because the local economy could never have supported such a thing. It was more like a small office with a counter and a stockroom behind it. They didn't keep any inventory, but could order products for pickup in about a week. Pickup, mind you. You still had to drive to town to get your order. As stupid as this sounds, it was 10x worse in person.
So if Wal-Mart didn't exist, it would have had to be invented. It was not (just) a monster that victimized smaller merchants and suppliers, a tax scam, or a plot to exploit the welfare system. It was something that needed to happen, a large gap in the market that eventually got filled.
Nowadays I wouldn't set foot in one, but it was different at the time. I didn't mean to write a long essay stanning for Wal-Mart, but your original post is a bit of a pet peeve.
Yeah, and because of those 2 words, especially "convenience", we're going to burn the planet down.
Also, did you read my original comment and miss the part about Walmart and co. being predatory businesses? That's why they can keep those prices so low: they're socializing their costs onto everyone else.
If you scroll to the bottom, it says that the terms of service are governed by the mechanism through which you access Gemini. If you access via Code Assist (which the OP posted), you abide by the Code Assist privacy terms; one of the ways you access it is VS Code. If you access via the Gemini API, then those terms apply instead.
So the Gemini CLI (as I understand it) doesn't have its own privacy terms, because it's an open-source shell on top of another Gemini system, which could fall under one of a few different privacy policies based on how you choose to use it and your account settings.
(Note: I work for Google, but not on this; this is just my plain reading of the documentation.)
I guess the key question is whether the Gemini CLI, when used with a personal Google account, is governed by the broader Gemini Apps privacy settings here? https://myactivity.google.com/product/gemini?pli=1
If so, it appears it can be turned off. However, my CLI activity isn't showing up there?
At the bottom it specifies that the terms of service depend on the underlying mechanism the user chooses to fulfill the requests: you can use Code Assist, the Gemini API, or Vertex AI. My layperson's perspective is that it's positioned as a wrapper around another service whose terms you have already accepted/enabled. I would imagine that is separate from the Gemini app, whose settings you linked to.
Looking at my own settings, my searches in the Gemini app appear, but none of my Gemini API queries do.
However, as others pointed out, that link takes you here: https://developers.google.com/gemini-code-assist/resources/p... which, at the bottom, says: "If you don't want this data used to improve Google's machine learning models, you can opt out by following the steps in Set up Gemini Code Assist for individuals." and links to https://developers.google.com/gemini-code-assist/docs/set-up.... That page says "You'll also see a link to the Gemini Code Assist for individuals privacy notice and privacy settings. This link opens a page where you can choose to opt out of allowing Google to use your data to develop and improve Google's machine learning models. These privacy settings are stored at the IDE level."
The issue is that there is no IDE; this is the CLI, and no such menu options exist.
Are you saying the Gemini Apps Activity switch controls it? Or that if I download VS Code or IntelliJ and make the change, it applies to the CLI? https://developers.google.com/gemini-code-assist/docs/set-up... says "These privacy settings are stored at the IDE level."
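For what it's worth, the CLI itself reads a settings file, and its telemetry docs describe a usageStatisticsEnabled key as the opt-out switch. A minimal sketch, assuming that key name is still current (note this overwrites any existing settings.json, so merge by hand if you already have one):

# assumes ~/.gemini/settings.json is where your CLI reads its settings from
mkdir -p ~/.gemini
cat > ~/.gemini/settings.json <<'EOF'
{
  "usageStatisticsEnabled": false
}
EOF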
"1. Is my code, including prompts and answers, used to train Google's models?
This depends entirely on the type of auth method you use.
Auth method 1: Yes. When you use your personal Google account, the Gemini Code Assist Privacy Notice for Individuals applies. Under this notice, your prompts, answers, and related code are collected and may be used to improve Google's products, which includes model training."
The opt-out appears to be about other types of stats, no?
Yes, I'm just about to trust Google to do what they pinky-swear.
EDIT: Lmao, case in point, two sibling comments pointing out that Google does indeed do this anyway via some loophole; also they can just retain the data and change the policy unilaterally in the future.
If you want privacy, do it locally with Free software.
To be honest this is by far the most frustrating part of the Gemini ecosystem, to me. I think 2.5 pro is probably the best model out there right now, and I'd love to use it for real work, but their privacy policies are so fucking confusing and disjointed that I just assume there is no privacy whatsoever. And that's with the expensive Pro Plus Ultra MegaMax Extreme Gold plan I'm on.
I hope this is something they're working on making clearer.
In my own experience, 2.5 Pro 03-26 was by far the best model at the time.
The newer models are quantized and distilled (I confirmed this with someone who works on the team) and are a significantly worse experience. I prefer OpenAI's o3 and o4-mini models to Gemini 2.5 Pro for general-knowledge tasks, and Sonnet 4 for coding.
For coding, in my experience, Claude Sonnet/Opus 4.0 is hands-down better than Gemini 2.5 Pro. I just end up fighting with Claude a lot less than I do with Gemini. I had Gemini start a project that involved creating a recursive descent parser for a language in C. It was full of segfaults. I'd ask Gemini to fix them, it would end up breaking something else, and then we'd get into a loop. Finally I had Claude Sonnet 4.0 take a look at the code Gemini had created. It fixed the segfaults in short order and was off adding new features - even anticipating features that I'd be asking for.
Did you try Gemini with a fresh prompt too when comparing against Claude? Sometimes you just get better results starting over with any leading model, even if it gets access to the old broken code to fix.
I haven't tried Gemini since the latest updates, but earlier ones seemed on par with opus.
If I'm being cynical: it's easy to say either "we use it" or "we don't touch it", but they'd lose everyone who cares about this question if they just said "we use it" - so the most beneficial position is to keep it as murky as possible.
If I were you I'd assume they're using all of it for everything forever and act accordingly.
Hey all,
This is a really great discussion, and you've raised some important points. We realize the privacy policies for the Gemini CLI were confusing depending on how you log in, and we appreciate you calling that out.
To clear everything up, we've put together a single doc that breaks down the Terms of Service and data policies for each account type, including an FAQ that covers the questions from this thread.
Is there any way for a user using the "Login with Google ... for individuals" auth method (I guess auth method 1) to opt out of, and prevent, their input prompts and output responses from being used as training data?
From an initial parse of your linked tos-privacy.md doc, it seems like the answer is "no" -- but that seems bonkers to me, so I hope I'm misreading or misunderstanding something!
I think you did a good job of CYA on this, but what people were really looking for was a way to opt out of Google collecting code, similar to the opt-out process that's available for the IDE.
Usage statistics includes "your prompts and answers", see the last paragraph in the ToS. I have no idea why legal insists we write "statistics" rather than "data".
So does that mean that if you "opt out", Google _won't_ use your code for training, even on a personal/free plan?
### 1. Is my code, including prompts and answers, used to train Google's models?
This depends entirely on the type of auth method you use.
- *Auth method 1:* Yes. When you use your personal Google account, the Gemini Code Assist Privacy Notice for Individuals applies. Under this notice, your *prompts, answers, and related code are collected* and may be used to improve Google's products, which includes model training.
### 2. What are "Usage Statistics" and what does the opt-out control?
The "Usage Statistics" setting is the single control for all optional data collection in the Gemini CLI. The data it collects depends on your account type:
- *Auth method 1:* When enabled, this setting allows Google to collect both anonymous telemetry (like commands run and performance metrics) and *your prompts and answers* for model improvement.
Does this mean that for a personal account your data is always "collected", but the opt-out may prevent it from being used for training? Also, the question was about "code", but this only addresses "prompts and answers". Is code covered under prompts? The first FAQ lists "*prompts, answers, and related code are collected*" as separate items, so it's still not clear what happens to code, or whether there's a way to opt out of your code being used for model training, IMO.
Thanks - one more clarification, please. The heading of point #3 mentions Google Workspace: "3. Login with Google (for Workspace or Licensed Code Assist users)", but the text content only talks about Code Assist: "For users of Standard or Enterprise edition of Gemini Code Assist"... Could you clarify whether point #3 applies when logging in via a Google Workspace Business account?
This is useful, and it directly contradicts the terms and conditions for the Gemini CLI (edit: if you use a personal account, then it's governed under the Code Assist T&C). I wonder which one is true?
If you're using the Gemini CLI through your personal Google account, then you are using a Gemini Code Assist license and need to follow the T&C for that. Very confusing.
Collection means it gets sent to a server; logging implies (permanent or temporary) retention of that data. I tried to find a specific line or context in their privacy policy to link to, but maybe someone else can help provide a good reference. Logging is a form of collection, but not everything collected is logged unless stated as such.
It would be more accurate to say I packaged it. llamafile is a project I did for Mozilla Builders where we compiled llama.cpp with cosmopolitan libc so that LLMs can be portable binaries. https://builders.mozilla.org/ Last year I concatenated the Gemma weights onto llamafile and called it gemmafile and it got hundreds of thousands of downloads. https://x.com/JustineTunney/status/1808165898743878108 I currently work at Google on Gemini improving TPU performance. The point is that if you want to run this stuff 100% locally, you can. Myself and others did a lot of work to make that possible.
My early contributions to https://github.com/jart/cosmopolitan were focused on getting a single-file Python executable. I wanted my Python scripts to run on both Windows and Linux, and now they do. To try out Python, you can:
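(A minimal sketch, assuming the prebuilt binary is still hosted on the cosmo.zip mirror:)

curl -L -o python.com https://cosmo.zip/pub/cosmos/bin/python  # assumed mirror URL
chmod +x python.com
./python.com  # the same file runs on Linux, Windows, macOS, and the BSDs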
Adding pure-Python libraries just means downloading the wheel and adding files to the binary using the zip command:
./python.com -m pip download Click  # fetch the pure-Python wheel into the current directory
mkdir -p Lib && cd Lib
unzip ../click*.whl                 # extract the wheel's contents under Lib/
cd ..
zip -qr ./python.com Lib/           # append Lib/ to the binary (APE executables double as zip archives)
./python.com                        # can now import click
Cosmopolitan Libc provides some nice APIs to load arguments at startup, like cosmo_args() [1], if you'd like to run the Python binary as a specific program. For example, you could set the startup arguments to `-m datasette`.
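As a hedged sketch of that datasette idea (assuming this Python build reads a /zip/.args file via cosmo_args() the way llamafile does, and that datasette and its dependencies were already added under Lib/ as above):

printf '%s\n' -m datasette > .args  # default startup arguments, one per line
zip -qr ./python.com .args          # embed them; the binary now launches datasette by default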
Gemma 27b can write working code in dozens of programming languages. It can even translate between languages. It's obviously not as good as Gemini, which is the best LLM in the world, but Gemma is built from the same technology that powers Gemini and Gemma is impressively good for something that's only running locally on your CPU or GPU. It's a great choice for airgapped environments. Especially if you use old OSes like RHEL5.
It may be sufficient for generating serialized data and for some level of autocomplete, but not for any serious agentic coding, if you don't want to end up wasting time. Maybe some junior level programmers may still find it fascinating, but senior level programmers end up fighting with bad design choices, poor algorithms and other verbose garbage most of the time. This happens even with the best models.
> senior level programmers end up fighting with bad design choices, poor algorithms and other verbose garbage most of the time. This happens even with the best models.
Even senior programmers can misuse tools; happens to all of us. LLMs suck at software design and choosing algorithms, and they're extremely crap unless you tell them exactly what to do and what not to do. I leave the design to myself and just use OpenAI and local models for implementation, and with proper system prompting you can get OK code.
But you need to build up a base prompt you can reuse, basically describing what good code means to you, as that differs quite a bit from person to person. This is what I've been using as a base for agent use: https://gist.github.com/victorb/1fe62fe7b80a64fc5b446f82d313..., but it needs adjustment depending on the specific use case.
Although I've tried to steer Google's models in a similar way, most of them are still overly verbose and edit-happy; not sure if it's some Google practice that leaked through or something. Other models are way easier to stop from outputting so much superfluous code, and are better at following system prompts overall.
I've spent a long time with these models; gemma-3-27b feels distilled from Gemini 1.5. I think the useful coding abilities really started to emerge with 2.5.
This is just for free use (individuals), for standard and enterprise they don't use the data.
Which pretty much means if you are using it for free, they are using your data.
I don't see what is alarming about this; everyone else has either the same policy or no free usage. Hell, the surprising thing is that they still let free users opt out...
That bears no relation to OpenAI using data for training purposes. Although the court’s decision is problematic, user data is being kept for legal purposes only, and OpenAI is not authorized to use it to train its models.
I mean, using data that has been explicitly opted out of training paves the way for lawsuits and huge administrative fines in various jurisdictions. I might be naive, but I don’t think that’s something OpenAI would deliberately do.
They really need to provide some clarity on the terms around data retention and training for users who access the Gemini CLI for free by signing in with a personal Google account. It's not clear whether the Gemini Code Assist terms are relevant, or indeed which of the three sets of terms linked at the bottom of the README.md apply here.
Thank you, this is helpful, though I am left somewhat confused as a "1. Login with Google" user.
* The first section states "Privacy Notice: The collection and use of your data are described in the Gemini Code Assist Privacy Notice for Individuals." That in turn states "If you don't want this data used to improve Google's machine learning models, you can opt out by following the steps in Set up Gemini Code Assist for individuals.". That page says to use the VS Code Extension to change some toggle, but I don't have that extension. It states the extension will open "a page where you can choose to opt out of allowing Google to use your data to develop and improve Google's machine learning models." I can't find this page.
* Then later we have this FAQ: "1. Is my code, including prompts and answers, used to train Google's models? This depends entirely on the type of auth method you use. Auth method 1: Yes. When you use your personal Google account, the Gemini Code Assist Privacy Notice for Individuals applies. Under this notice, your prompts, answers, and related code are collected and may be used to improve Google's products, which includes model training." This implies Login with Google users have no way to opt out of having their code used to train Google's models.
* But then in the final section we have: "The "Usage Statistics" setting is the single control for all optional data collection in the Gemini CLI. The data it collects depends on your account type: Auth method 1: When enabled, this setting allows Google to collect both anonymous telemetry (like commands run and performance metrics) and your prompts and answers for model improvement." This implies prompts and answers for model improvement are considered part of "Usage Statistics", and that "You can disable Usage Statistics for any account type by following the instructions in the Usage Statistics Configuration documentation."
So these three sections appear contradictory, and I'm left puzzled and confused. It's a poor experience compared to competitors like GitHub Copilot, which makes opting out of model training easy via a checkbox on the GitHub Settings page - or Claude Code, where Anthropic has a policy that code will never be used for training unless the user specifically opts in, e.g. via the reporting mechanism.
I'm sure it's a great product - but this is, for me, a major barrier to adoption for anything serious.
Kinda a tragedy-of-the-commons situation. Everyone wants to use these tools, which must be trained on more and more code to get better, but nobody wants them trained on their own code. Bit silly imo.
Do you honestly believe that the opt-out by Anthropic and Cursor means your code won't be used for training their models? It seems likely that they would rather risk a massive fine for potentially solving software development than let some competitor try it instead.
> For API users, we automatically delete inputs and outputs on our backend within 30 days of receipt or generation, except when you and we have agreed otherwise (e.g. zero data retention agreement), if we need to retain them for longer to enforce our Usage Policy (UP), or comply with the law.
If this is due to compliance with the law, I wonder how they can make the zero-data-retention agreements work... The companies I've seen that have these haven't mentioned that they themselves retain the data...
>Anthropic spent "many millions of dollars" buying used print books, then stripped off the bindings, cut the pages, and scanned them into digital files.
The judge, Alsup J, ruled that this was lawful.
So they cared at least a bit - enough to spend a lot of money buying books. But they didn't care enough to avoid acquiring online libraries that were apparently held without proper licensing.
>Alsup wrote that Anthropic preferred to "steal" books to "avoid 'legal/practice/business slog,' as cofounder and CEO Dario Amodei put it."
Aside: using the term "steal" for copyright infringement is a particularly egregious misuse for a judge, who should know that stealing requires denying others the use of the stolen articles - something which copyright infringement via an online text repository simply could not do.
Using torrented books in a way that possibly (well, almost certainly) violates copyright law is a world of difference from going after your own customers (and revenue) in a way that directly violates the contract that you wrote and had them agree to.
Insane to me that there isn't even an asterisk in the blog post about this. The data collection is so over the top that I don't think users suspect it, because it's just absurd. For instance, Gemini Pro chats are trained on too.
Good to know; I won't be using this. Curious: do you know if OpenAI Codex and Claude also do the same? I was under the impression that they don't share code.
I still have yet to replace a single application with an LLM, except for (ironically?) Google search.
I still use all the same applications as part of my dev work/stack as I did in the early 2020s. The only difference is occasionally using an LLM baked into one of them, but the reality is I don't do that much.