nspaced's comments | Hacker News

Hi, I'm one of the authors and yes it was :)


Did you roll your own die or just trust that Randall's original die roll was sufficiently random?


It’s a well recognized standard


We thank you for your service


The joint entropy of two random variables is never less than the individual entropies.
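In standard Shannon-entropy notation:

  H(X, Y) = H(X) + H(Y \mid X) \geq H(X), \qquad H(Y \mid X) \geq 0

so mixing in another variable can never reduce the total entropy of the pool (which, as the replies note, says nothing about an attacker who controls one of the inputs).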


While that's true, just blindly mixing in everything you've got is a bad idea. Let's say you have "randomness" sources A, B, C, and D, and you just use A+B+C+D. Now suppose an attacker has total control over D. She could then adaptively feed you a D value that cancels out A, B, and C, and the result is deterministic output.

Just adding in more is not always good. You need to trust every source. Or more precisely: You need at least one source in the mix that an attacker can't predict/read out.
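
A toy sketch of that failure mode, with XOR standing in for the naive "+" combiner and hypothetical values (the attack assumes the attacker can also observe the honest inputs):

  import secrets

  def naive_mix(*sources):
      # naive combiner: just XOR every input together
      out = 0
      for s in sources:
          out ^= s
      return out

  # honest sources the attacker happens to be able to observe
  a, b, c = (secrets.randbits(128) for _ in range(3))

  # attacker-controlled source, chosen *after* seeing a, b and c
  target = 0x0123456789ABCDEF
  d = target ^ a ^ b ^ c

  assert naive_mix(a, b, c, d) == target  # fully deterministic output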


> While that's true, just blindly mixing in everything you've got is a bad idea. Let's say you have "randomness" sources A, B, C, and D, and you just use A+B+C+D. Now suppose an attacker has total control over D. She could then adaptively feed you a D value that cancels out A, B, and C, and the result is deterministic output.

This is only true if the attacker knows A, B, and C, in which case you have already lost.

> Just adding in more is not always good. You need to trust every source. Or more precisely: You need at least one source in the mix that an attacker can't predict/read out.

You sort of contradict yourself here. The fact that you need a source that the attacker cannot predict is a reason why it is good to just add more. The more sources you have, the more likely there will be one that the attacker can't predict.


Perhaps a better explanation of what randomnumber42 is getting at comes from Dan Bernstein: https://blog.cr.yp.to/20140205-entropy.html


It's an interesting attack, but as I mentioned, it requires an attacker to know every input to the hash function, which in many ways is a strong argument for including as many sources of entropy as possible. It is certainly a much stronger argument than the other argument he refutes farther down.

There is also a more pressing issue: if an attacker can see all of your RNG inputs, then they can predict your RNG output, and therefore your keys, anyway. This renders Dan Bernstein's attack against DSA/ECDSA moot.


I think his point is that continuously adding entropy might make it easier for an active attacker to exfiltrate data undetected.

  Of course, the malicious device will also be able to see
  other sensitive information, not just x and y. But this
  doesn't mean that it's cheap for the attacker to exfiltrate
  this information! The attacker needs to find a communication
  channel out of the spying device. Randomness generation
  influenced by the device is a particularly attractive choice
  of channel, as I'll explain below
He then goes on to explore a possible scenario in detail.

The point isn't that continuously drawing entropy is necessarily worse. The point is to illustrate that it's not as risk-free as we may think. Furthermore, it doesn't provide much, if any, value. In other words, why bother with the added complexity?

And there is added complexity. Linux[1] and OpenBSD use ad hoc mixing schemes for injecting entropy into their PRNGs, with no proof of security. There are schemes with proofs of security, but they're more complex. Moreover, the ones with the strongest proofs are the most complex of all and often relatively slow, which is why Linux and OpenBSD have been resistant to simply using, e.g., Fortuna.

Similarly, if you give up on constantly trying to inject data, then multi-CPU implementations are easier--just keep a simple PRNG state per CPU, without any fancy waterfall schemes.
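
A minimal sketch of that seed-once, state-per-CPU idea (a hypothetical SimpleDRBG using SHA-256 in counter mode; illustrative only, not a reviewed cryptographic design):

  import hashlib, os

  class SimpleDRBG:
      # seeded once; no reseeding, no waterfall, no shared locks
      def __init__(self, seed: bytes):
          self.key = hashlib.sha256(seed).digest()
          self.counter = 0

      def generate(self, n: int) -> bytes:
          out = b""
          while len(out) < n:
              block = self.key + self.counter.to_bytes(8, "big")
              out += hashlib.sha256(block).digest()
              self.counter += 1
          return out[:n]

  # one independent state per CPU, each seeded once at startup
  per_cpu = [SimpleDRBG(os.urandom(32) + bytes([cpu])) for cpu in range(4)]
  print(per_cpu[0].generate(16).hex())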

These days, even without a hardware RNG, collecting 128 bits of entropy should happen in very short order. Estimates don't really matter: you either have it or you don't, and there's no way to prompt a user with "Do you really want to proceed?" If you can't collect 128 bits of entropy by the time /bin/init loads, you're probably screwed. The best example is a VM--if you have no access to a TRNG or to the host's entropy source, you shouldn't expect the situation to appreciably improve 10 seconds or even 10 minutes later. At the very least, you should assume you're screwed and that the environment isn't a place to put services and assets that depend, directly or indirectly, on secure entropy generation.

The logic is: why? Why bother? In security especially, we shouldn't be doing things without well-quantified justification, particularly given that engineers invariably underestimate the cost of complexity. History has shown that not only do these behaviors create a false sense of security (they're often inadequate in ways we never initially understand[2]), but they can make matters much worse.

[1] At least I think it still does.

[2] No coincidence when the reason for adding them is to solve some ill-defined and poorly understood problem. If you can't quantify the solution you should be wary of relying on it. If you can neither quantify the problem nor the value of the solution (as in the case with constantly adding entropy), it's arguably poor judgment to apply the solution at all.


How you weight the entropy sources should itself be a function of the entropy sources, making the output much more difficult to predict.


Weighting functions are useless. If you mix entropy properly, in the absence of an _active_ attacker, the output will be no worse than the best entropy source. In the presence of an attacker they're just as useless, if not pernicious.
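
For illustration, "mixing properly" here can be as simple as feeding every source through a cryptographic hash with no weights at all (a sketch; the names are made up):

  import hashlib, os

  def mix(*sources: bytes) -> bytes:
      h = hashlib.sha256()
      for s in sources:
          # length-prefix each input so source boundaries can't be confused
          h.update(len(s).to_bytes(8, "big"))
          h.update(s)
      return h.digest()

  # two predictable inputs plus one genuinely unpredictable 256-bit source;
  # under standard assumptions about SHA-256, the digest is as hard to
  # predict as the one good source -- no per-source weighting required
  good_source = os.urandom(32)  # stand-in for whatever source you actually trust
  seed = mix(b"boot timestamp", b"MAC address", good_source)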

Modern PRNG frameworks moved away from weighting years ago[1], not only because it has dubious benefit, but because engineers suck at estimating entropy, so the estimates created a false sense of security. For example, the entropy estimates for hard drives were for a long time based on a single research publication, which became increasingly irrelevant as hard drives evolved over the years--assuming it wasn't irrelevant the day after it was published. Kernel engineers would try to add conservative margins, but if the original quantity has no basis in reality those efforts are just hand-waving.

This is pretty much how all entropy estimates go. The research that goes into the estimates can be extremely rigorous and sound. The problem is not only ensuring the estimates are relevant to your mix of hardware, but that they remain relevant over time. New designs and new manufacturing methods come online on an almost daily basis, even for what appear to be identical products. Combined with the dubious benefit at the outset, trying to maintain the integrity of the estimates is little more than a fool's errand.

With the exception of first-boot behavior, where you want to ensure there's "enough"[1] initial entropy, even accurate estimates don't provide any substantive value. Rather, mixing in a continuous stream of new entropy after you've gathered 128 or 256 bits can actually expose you to other issues, like exfiltration attacks that are more difficult to detect than they otherwise would be. (See https://blog.cr.yp.to/20140205-entropy.html)

Because of the intrinsic difficulty of maintaining accurate estimates, some people would argue that the proper way to solve the first-boot issue is to not solve it at all. The only proper solution is an on-board, robust entropy source. Anything else provides a false sense of security (eventually if not initially), disincentivizing demand for and supply of systems with an on-board source.[2] The Via C3, circa 2003, provided the first on-chip source in the commodity x86 space. Unfortunately it took Intel over a decade to add the similar RDRAND, and longer still to make it ubiquitous across their product line. Now that AMD provides RDRAND and most ARM chips have something similar, we're nearly at a point where there's no excuse for entropy collection and estimation contrivances.

IoT devices and other lower-end devices are still problematic, but those devices provide little if any hardware that entropy collection and estimation proponents can play with.

Dealing with the long tail of legacy hardware is also a PITA. The PCEngines ALIX used an AMD SoC platform (Geode LX800) with an RNG source, but the more recent APU (G-series T40 and GX-412) lacks any on-board source.[3] This issue will eventually pass, but in the meantime devices like the NeuG fill the gap. Which isn't to say that's the only use for the NeuG. There's nothing wrong with using the NeuG to supplement the vendor's RNG; just keep in mind issues like exfiltration attacks before trying to use too much of a good thing.

[1] Compare Yarrow (1999) to Fortuna (2003).

[2] By on-board I mean either on the chip, on a controller (both AMD and Intel put RNG sources on some select I/O controllers), or on a pre-installed device (e.g. HSM module on high-end Unix boxes).

[3] There might actually be one on the NIC controller. I haven't been able to confirm, and in any event it's not detected by the OS and thus unsupported presuming there is one. Which alludes to another issue--the complexity created by supporting so many different mechanisms. All these device drivers and frameworks add attack surface.

