One of Go's big selling points is the avoidance of the "red-blue problem" (https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...). The standard Go implementation does that by using lightweight
"goroutines" that automatically suspend themselves whenever they do some operation that might block - there's no need to use an OS-level thread for each thread of control.
I wonder how the authors plan to accomplish a similar thing while remaining fully FFI-compatible.
Note that this project is not an attempt to replace the Go runtime, but here is what we will do:
Precondition: in Go, each goroutine has its own `g` structure (like a TCB, but for controlling goroutines), and its address is stored in the `g` register, which is reserved by the Go compiler. An active goroutine can call runtime.getg() to obtain the value of the `g` register.
In the official runtime, the `g` structure is a concrete type and the scheduler is implemented globally in the runtime package.
But we have made the `g` structure customizable (it still requires the fixed header expected by the function prologue/epilogue and the linker), so custom `g` implementations may contain a link to a custom scheduler, and the scheduler can decide how to handle an active goroutine.
> "goroutines" that automatically suspend themselves
The automatic suspend and resume happens by
- calling `runtime.entersyscall` from a running goroutine
- and calling `runtime.exitsyscall` from a finished systemstack
All of these operations can interact with the custom `g`, and thus with the custom scheduler.
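To make that concrete, here is a purely illustrative sketch of the shape of the thing; every identifier below is hypothetical and is not the project's actual API:

    // Hypothetical sketch only -- not the project's real types.
    package sched

    // gHeader stands in for the fixed header that the compiler's function
    // prologue/epilogue and the linker expect at the start of every g.
    type gHeader struct {
        stackLo, stackHi uintptr // stack bounds checked by the prologue
    }

    // Scheduler is a hypothetical interface a custom g could link to.
    type Scheduler interface {
        Park(g *CustomG)  // the goroutine is about to block (e.g. entersyscall)
        Ready(g *CustomG) // the goroutine can run again (e.g. exitsyscall)
    }

    // CustomG keeps the required header first, then project-specific fields.
    type CustomG struct {
        header gHeader
        sched  Scheduler // the custom scheduler that decides what to do with this g
    }

    // Hooks like these could be driven from entersyscall/exitsyscall,
    // letting the custom scheduler decide what happens to the goroutine.
    func onEnterSyscall(g *CustomG) { g.sched.Park(g) }
    func onExitSyscall(g *CustomG)  { g.sched.Ready(g) }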
So you see this as just the same, from a privacy perspective, as the way that the Go tool already dials out to the Go proxy by default? That is, if you're OK with that (I'd assume not, but it is at least existing functionality), you'd see the telemetry proposal as similar?
In other words, I think you object to the Go tool already, so this is really no different?
> The "data put up for sale" is to be made available publicly.
How can users verify this?
> IP logging can already be done (the Go proxy is enabled by default).
Sure, but more data will be attached to it. Also, in his proposal, he said that IP addresses will not be logged. I seriously doubt that.
> What's your actual problem with this, beyond a knee-jerk reaction to the idea?
Putting telemetry in a programming language. Working with a programming language is the number one thing I do on a computer. This means that, if it weren't for the fact that I don't work in Go, most of my private conversation with a machine could be backdoored.
> Sure, but more data will be attached to it. Also, in his proposal, he said that IP addresses will not be logged. I seriously doubt that.
I think it's worth quoting what Russ said in the article, which sounds very reasonable to me:
> The server would necessarily observe the source IP address in the TCP session uploading the report, but the server would not record that address with the data, a fact that can be confirmed by inspecting the reporting server source code (the server would be open source like the rest of Go) or by reference to a stated privacy policy like the one for the Go module mirror, depending on whether you lean more toward trusting software engineers or lawyers. A company could also run their own HTTP proxy to shield individual system’s IP addresses and arrange for employee systems to set GOTELEMETRY to the address of that proxy. It may also make sense to allow Go module proxies to proxy uploads, so that the existing GOPROXY setting also works for redirecting the upload and shielding the system’s IP address.
> This means that, except for the fact that I don't work in Go, most of my private conversation with a machine could be backdoored.
I don't get this. Given the design that the article is describing, how could most of your private conversation with a machine be backdoored? Specifically given that the Go tool is open source and used by millions already. Are you worried about sneaky code hidden inside that source code? If so, you should be worried already, because there's no reason that they couldn't already be doing that if they were so inclined.
> The server would necessarily observe the source IP address in the TCP session uploading the report, but the server would not record that address with the data
Users can't confirm this. In fact, this makes the next part a falsehood:
> a fact that can be confirmed by inspecting the reporting server source code (the server would be open source like the rest of Go) or by reference to a stated privacy policy like the one for the Go module mirror, depending on whether you lean more toward trusting software engineers or lawyers.
Sure, the source code of the server might be available, but you can't confirm that the server wasn't built with modified source code.
Second, as we've seen before, privacy policies are empty; companies violate them all the time.
IOW, I don't trust software engineers, and I don't trust lawyers, and I would bet my life savings that there will be instances of companies lying in the ways I mentioned above.
> I don't get this. Given the design that the article is describing, how could most of your private conversation with a machine be backdoored?
Counts are enough. He says that counts are the only thing that will be uploaded, but he forgot that timing will also come into play.
(A week's delay is only an offset to subtract, by the way.)
Here's how it works: the tool reports counts, maybe in batches per hour. The server logs the counts and the hour those counts came from.
Yes, there's already another piece of data they captured, even though the tool ostensibly only sent counts.
Then those counts plus their timings can be used to infer things. For an example outside of Go (this is one I saw somewhere else), imagine a person texting more and more as the weekend approaches, until they are texting frantically. Then they suddenly stop on Saturday evening.
You only get the report of counts and the hours they happened in. Can you give some plausible explanations?
I can. They were texting someone they were planning on meeting that weekend, and then they met them. Can you give a few guesses as to why they're meeting them?
I'll let you fill in the blank.
Sure, there might be other reasons, but I would bet there are not many. Enumerate them, and you already know more. Find the similarities between all possibilities, and you know even more.
People forget about side channels all the time. In this case, the side channel was timing, but it doesn't matter what the side channel is; data can be extracted from it. And companies will.
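To make the mechanics concrete, here is a toy illustration (made-up numbers, nothing Go-telemetry-specific): the payload is "just counts", but the server also knows when each batch arrived, and grouping by hour of receipt already reconstructs an activity profile.

    package main

    import "fmt"

    func main() {
        // (hour received, count) pairs, as the server would see them
        batches := []struct {
            hour  int // 0-23, server-side receive time
            count int
        }{
            {9, 12}, {10, 40}, {11, 35}, {14, 50}, {21, 3},
        }

        profile := map[int]int{}
        for _, b := range batches {
            profile[b.hour] += b.count
        }
        // the "counts only" upload has become a daily activity histogram
        fmt.Println(profile)
    }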
> Specifically given that the Go tool is open source and used by millions already. Are you worried about sneaky code hidden inside that source code?
Yes. Just because there are eyeballs on that code doesn't mean they won't put sneaky stuff in. For example, the counts could be packed in a different order to tell the server more information. Or the tool could time its uploads. Or it could batch some counts and not batch others.
I'm not smart enough to catch all of the tricks they might pull. Are you?
> If so, you should be worried already, because there's no reason that they couldn't already be doing that if they were so inclined.
That ship has already sailed. The Go tool already by default makes network requests to the Go proxy, which potentially allows everything that you're talking about there. What's significantly different about this telemetry proposal?
First, making network requests when downloading packages is necessary for the tool to function and unavoidable. People who care about this will be using a VPN of some kind. It's just how the Internet works. But telemetry is something the tool author is choosing to add, not something that's necessary due to the architecture of our computing infrastructure.
Second, the Go telemetry would apparently create a unique, persistent user ID. Normal Internet use doesn't; there's just the IP address, which differs from location to location, is shared by a bunch of people behind NAT, and can be masked using common tools.
And yeah, I know this is "anonymised"... but if you have one user ID which uses Go sometimes with an IP address from a particular apartment complex and sometimes from a particular office space, finding out which individual that user ID belongs to is trivial.
> First, making network requests when downloading packages is necessary for the tool to function and unavoidable.
It's technically not unavoidable. The Go authors could have made use of the proxy opt-in rather than opt-out, making the tool less usable as a result. A similar argument applies here, I think.
> Second, the Go telemetry would apparently create a unique, persistent user ID
Where did you see this? I scanned through the "Telemetry Design" article reasonably carefully and couldn't find any mention of this concept, and the type definition for the posted JSON (the `Report` type) doesn't seem to include any such user ID.
In the end, ISTM that you're not complaining about something that actually affects your privacy in any way, but just the _idea_ of telemetry. Is that really something worth taking such a hardline stance on?
I'm not sure that hiding the implementation is worth it here. Why not just make it public that it's actually map[T]struct{} underneath - then anyone can range on it and implement allocation optimisations, etc? I added some other methods too for the crack https://go2goplay.golang.org/p/EI1hYaSohnc
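Something along these lines, roughly (written with current generics syntax rather than the go2go-era syntax in that playground link, and the method set is just a guess):

    // Set is just the map, exposed, so callers can range over it
    // and pre-size it themselves.
    type Set[T comparable] map[T]struct{}

    func (s Set[T]) Add(v T)           { s[v] = struct{}{} }
    func (s Set[T]) Delete(v T)        { delete(s, v) }
    func (s Set[T]) Contains(v T) bool { _, ok := s[v]; return ok }

    // usage:
    //   s := make(Set[string], 128) // allocation optimisation stays in the caller's hands
    //   s.Add("x")
    //   for v := range s { ... }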
I think it's interesting to observe that using the Result type is really not that much different from using a multiple return value - it's actually worse in some ways because you end up using "Unwrap" which can panic.
I like your solution, although it's a tradeoff when the value type is some kind of nested struct whose zero value could be expensive to create.
Agreed that multiple return is actually quite nice in many situations, although `Unwrap()` is generally only called after checking `OK()` and the real benefit is in using `Map()`.
    a, err := DoCalculation()
    if err != nil {
        return err
    }
    b, err := Transform(a)
    if err != nil {
        return err
    }
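For contrast, a rough sketch of what the chained style might look like with a hypothetical Result type (the Unwrap/OK/Map names come from the comments above; everything else is assumed, this is not any standard API):

    // Result is a hypothetical container for a value or an error.
    type Result[T any] struct {
        val T
        err error
    }

    func (r Result[T]) OK() bool   { return r.err == nil }
    func (r Result[T]) Err() error { return r.err }

    // Unwrap panics on error -- the "worse in some ways" part noted above.
    func (r Result[T]) Unwrap() T {
        if r.err != nil {
            panic(r.err)
        }
        return r.val
    }

    // Map has to be a free function: Go methods can't introduce new type parameters.
    func Map[T, U any](r Result[T], f func(T) (U, error)) Result[U] {
        if r.err != nil {
            return Result[U]{err: r.err}
        }
        v, err := f(r.val)
        return Result[U]{val: v, err: err}
    }

Assuming DoCalculation were changed to return a Result, the snippet above would collapse to something like `b := Map(DoCalculation(), Transform)` followed by a single OK() check.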
I happened upon this comment a bit late, and caveat I'm not really a software engineer, but this comment made me think of something...
I've written a decent amount of code professionally in the first and second styles.
I sincerely prefer the second style, for reasons that are hard to articulate, maybe the aesthetic, the space, or something that I assume is equally as questionable.
After I stopped writing a lot of code, when I got pulled back in to debug or provide context long after the fact, it was way easier in "real" code bases to catch myself back up on things when the code was in style 1 than when it was in style 2!
I may be alone in this sentiment, and I even regret that I have it.
I think there is also a bit of satisfaction in writing the code in the second example, especially when the example is a lot more complex than the one provided here.
Maybe it comes down to how much re-reading/maintenance your codebase actually needs. I wonder if coding convention/style has ever been mapped to the types of problems that code creates, and whether fixing those problems ends up depending on the code style... I'm sure if I google it I'll find something :)
This was a fun puzzle, but I came to a roadblock here too.
I have to confess I'm confused by what I believe might be the "local hypothesis block" mentioned above. The confusion is somewhat greater because the blocks don't seem to have any attached name or description, so it's not clear what they're meant to be doing.
Up until task 5 in session 2, all the solutions are pretty trivial. But I can't work out what sort of wiring that block implies (for the record, the task is "given (A->B, B->C), prove (A->C)").
It's really not clear to me what the "notch" at the top of that block implies, or how it might be used.
I think there should be at least one "hand-holding" solution for the first use of each block type. Even the papers linked to don't describe the intended semantics of each block.
OK, replying to myself for future reference. I found the answer here: https://youtu.be/0GeJdTTzaDo (incidentally, that series of videos seems to have solutions for most of these problems).
The answer is that the left and right sides of the notch hold local equivalents to the global "given X, prove Y" connectors. You can use the left side of the notch as an input, and you can connect that up through the global blocks to prove the hypothesis.
That was non-obvious! Also, the technique of connecting the output first and making connections to nothing to see what the implied proposition is was very useful - I should have thought of that.
One is for a proof by cases; it accepts a disjunction ("or") as input, gives you a local branch for each disjoined option, and hopes you'll show that each branch individually proves P. Then the block globally proves P. This one was intuitive to me.
One is for instantiating variables from existential quantifiers. If you send ∃x.P(x) into that block's global input, you will get P(c), where c is a constant, coming out of that block's local output. You're supposed to prove an expression that does not involve c, send that to local input, and get it back out of global output.
As far as I can tell, this is done for reasons of convenient implementation. Mathematically, the notch shouldn't be there at all -- you should just be able to hook up ∃x.P(x) to global input, and get P(c) on global output. But I think the tool author wants to think of c as being a variable which is scoped inside the ∃-instantiation block and can't exist outside it.
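For comparison, here's roughly how those two rules look written out in Lean 4 (just an illustration of the semantics, nothing to do with the tool itself):

    -- Proof by cases: an Or comes in, and each branch must prove P.
    example (A B P : Prop) (h : A ∨ B) (ha : A → P) (hb : B → P) : P :=
      Or.elim h ha hb

    -- Existential instantiation: the witness c only exists inside the
    -- local scope, much like the tool keeps it inside the notch.
    example {α : Type} (P : α → Prop) (Q : Prop)
        (h : ∃ x, P x) (hq : ∀ x, P x → Q) : Q :=
      Exists.elim h (fun c hc => hq c hc)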
You can specify that a given dependency be replaced by another one. That only applies at the top level though, not when the go.mod file with the replace clause is used by another module.
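For example (module paths and versions here are made up):

    // in the top-level module's go.mod; ignored when this module is
    // required by some other module
    replace example.com/upstream v1.4.0 => example.com/upstream-fork v1.4.1

    // or point at a local checkout instead:
    replace example.com/upstream => ../upstream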
Don't keep type safety then. Think of Go as half-way between Python and Haskell in that respect. Types are great when they're useful, but they're not required.
If you’re serious, that’s... the worst solution I’ve heard yet.
That's the same mistake C made with void*, and that Java made with Object and corrected with generics in 1.5; it's just called interface{} this time.
And while with Python and Java, even if I circumvent the type system, I can still use annotations (see typed Python, or JetBrains' @Contract, Google's @IntRange, etc), with Go I have nothing of that sort.
Types are useful because they provide safety; if your type system has to be turned off, then I'm losing all that safety and might as well code in PHP (although they also saw that mistake, and are fixing it now, with 7.0 and later).
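To illustrate the complaint with a minimal made-up example: once everything is interface{}, the compiler can no longer check element types, and the errors move to runtime.

    package main

    // First returns the first element; with interface{} nothing about its type is checked.
    func First(xs []interface{}) interface{} { return xs[0] }

    func main() {
        xs := []interface{}{"a", 42, true} // nothing stops mixing types
        s := First(xs).(string)            // runtime assertion: panics if it isn't a string
        _ = s
    }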
> Mocking out the time functions means you don't get any race conditions.
This is a common misapprehension. Actually, even if you fully mock out time, you can still get race conditions, because goroutines can remain active regardless of the state of the clock, and there's no general way to wait until all goroutines are quiescent waiting on the clock. This is not just a theoretical concern - this kind of problem is not uncommon in practice.
I think clock-mocking can be very useful for testing hard-to-reach places in leaf packages. But at a higher level, I think it can end up producing extremely fragile tests that depend intimately on implementation details of packages that the tests should not be concerned about at all. In these cases, I've come to prefer configuring short time intervals and polling for desired state as being the lesser of two evils.
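A minimal sketch of the "short intervals + polling" approach (the helper name and interval are just placeholders):

    package example_test

    import (
        "testing"
        "time"
    )

    // waitFor polls cond until it returns true or the timeout elapses,
    // instead of mocking the clock.
    func waitFor(t *testing.T, timeout time.Duration, cond func() bool) {
        t.Helper()
        deadline := time.Now().Add(timeout)
        for time.Now().Before(deadline) {
            if cond() {
                return
            }
            time.Sleep(5 * time.Millisecond)
        }
        t.Fatalf("condition not met within %v", timeout)
    }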
OK, correction acknowledged (no sarcasm): mocking out time functions means you can write test code that doesn't have any race conditions.
"and there's no general way to wait until all goroutines are quiescent waiting on the clock."
Hence my semi-frequent usage of "sync" channels which I described in the previous post.
"But at a higher level, I think it can end up producing extremely fragile tests that depend intimately on implementation details of packages that the tests should not be concerned about at all."
I'd rather have a test that reasonably verifies that a package is correct (or at least "passes the race detector consistently") and reaches into some of the private details than fail to test a package at all. I've found too many bugs that way.
It may also help to understand my opinion when I point out that I tend to break my packages down significantly more granularly than a lot of the rest of the Go community, which in my opinion is a little too comfortable having the "main app" directory contain many dozens of .go files. My packages end up way smaller, which also mitigates the issues of excessively-coupled tests. I have a (not publicly published) web framework, for instance, that is, broadly speaking, less featureful than some of the Big Names that live all in one directory (though it has some unique features all its own), but is already broken up into 16 modules.
> I'd rather have a test that reasonably verifies that a package is correct (or at least "passes the race detector consistently") and reaches into some of the private details than fail to test a package at all. I've found too many bugs that way.
I agree with this, with the caveat that if you can test a package with regard to its public API only, it is desirable to do so because it gives much greater peace of mind when doing significant refactoring.
The difficulty comes in larger software where the package you're testing uses other packages as part of its implementation which also have their own time-based logic. Do we export all those synchronisation points so that importers can use them to help their tests too? If we do, then suddenly our API surface is significantly larger and more fragile - what would have been an internal fix can become a breaking change for many importers.