But . . . substantial effort is already being put in globally. Not enough by far, but much more than zero. That would seem to put the lie to your claim that nothing will be done until there is no choice.


She's also a socialist, for what it's worth, and involved in lobbying for quality affordable housing. That doesn't make her immune from accusations of classism, but it might affect how one judges what she says.

(Also worth noting that McMansions are typically homes to the upper middle class and the rich, not the poor.)


Yours seems like a flippant response, but I don't think that it is. We have to be wary of confirmation bias here on HN. Would someone even think to publish a study where no effect was found? And then would someone post that link to HN? And then would it get upvoted? It's important to remember that we are not seeing a uniform sample of news stories here. We are seeing a highly biased selection, and it's important to discount things appropriately.


There's also FarmHash, whose 32-bit version is 2x as fast as xxHash on little-endian machines (at least according to this benchmark suite: https://github.com/rurban/smhasher).


In fact, what performance oriented devs need to be taught about these days is how to not use hash tables, or, more generally, dictionaries. They are a super-easy data structure to reach for, but often it is possible to arrange your data such that their use is unnecessary.

One example that was brought up elsewhere in the thread is the way Python looks up variables in successive scope dictionaries. This is obviously terrible for performance, and that's a big part of why Python is slow.

But how are other languages fast? Doesn't every language have to resolve variable names to memory locations? Well, yes, but fast languages do this mostly at compile time. How? Well, I'm not an expert in this, but at a high level, each variable is assigned a location in a register or in memory, and future references to that variable are rewritten to refer to that location directly, by register name or memory address. This takes the whole "looking up a name" issue out of the work that has to get done at runtime. And you've switched from looking things up in tables to operating directly on registers and memory locations.
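
CPython itself does a limited version of this for function locals, and you can see it with the standard dis module. A rough sketch (the exact bytecode names vary a bit by Python version):

    import dis

    x = 1

    def f():
        y = 2
        return x + y

    dis.dis(f)
    # Typically prints something like:
    #   LOAD_GLOBAL  x   <- resolved through the module's dictionary at runtime
    #   LOAD_FAST    y   <- resolved to a fixed array slot when the function was compiled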

BTW, this has nothing to do with high-versus-low level. It's more about how mutable the process of name resolution is after program compilation. One could theoretically write an assembly language where memory locations are names, not numbers. If reflective features like runtime scope editing are available, this would be a very low-level language that still requires some kind of dictionary lookup at runtime.


> In fact, what performance oriented devs need to be taught about these days is how to not use hash tables, or, more generally, dictionaries. They are a super-easy data structure to reach for, but often it is possible to arrange your data such that their use is unnecessary.

A lot of devs are completely unaware of what's happening at the layer of abstraction below them, and this is one of the ways that comes out. The number of elements it takes before a hash table is faster than iterating through an array can be surprisingly large, and yet it's one of the first "optimizations" that gets made.
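
If you want a feel for where the crossover sits on your own machine, here's an illustrative Python sketch (the exact numbers depend heavily on the language, the runtime, and the key type):

    import timeit

    for n in (4, 16, 64, 256):
        keys = list(range(n))
        key_set = set(keys)
        probe = n - 1  # worst case for the linear scan
        t_list = timeit.timeit(lambda: probe in keys, number=100_000)
        t_set = timeit.timeit(lambda: probe in key_set, number=100_000)
        print(f"n={n:4d}  list scan: {t_list:.4f}s  hash lookup: {t_set:.4f}s")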

Some other related problems are not knowing how expensive network calls are, not knowing what the ORM is doing, caching in memory even though the database already does, and more. They just don't think about performance issues in code they don't write.


99.999% of projects are not going to have any meaningful hot path in variable resolution.

If your program is sensitive to that, even a simple GC pause is going to destroy your performance and you need to be out of managed memory languages at that point.

There are a lot of reasons python can be slow, but this is far from one of them.


> 99.999% of projects are not going to have any meaningful hot path in variable resolution.

It won't show up in the hot path because the performance cost is so pervasive that profiling tools effectively ignore it: it's everywhere, and you can't escape it without compiler optimizations. This cost is in the hot and cold paths alike. This and data locality are the two biggest performance issues in dynamic languages, and a lot of effort goes into compiling them out.

Here is a good article on how V8 tries to deal with it: https://www.html5rocks.com/en/tutorials/speed/v8/

For statically compiled languages it can show up, but often you'll have to write a dictionary-free version to see it. Some profiling I've done with C# at various times over the years shows that a dictionary is slower than a list until you have more than 50 to 100 items. The caveat is that I normally keep the dictionary version because it's more semantically correct.


Eh, one of the standard performance tips for hot loops is to store everything relevant, including functions, in local variables, because locals are resolved first and most cheaply by the interpreter. The last time I optimized Python code, that made a significant difference.
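
For example, a sketch of the usual pattern (how much it buys you depends on how hot the loop really is):

    import math

    def slow(values):
        out = []
        for v in values:
            out.append(math.sqrt(v))  # resolves `math`, then `.sqrt`, then `out.append`, every iteration
        return out

    def fast(values):
        sqrt = math.sqrt              # hoisted into locals once, outside the loop
        out = []
        append = out.append
        for v in values:
            append(sqrt(v))
        return out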


To be clear, I don't think this is the only thing that makes python slow. It's probably one of the top five or ten things preventing it from being fast, though.


> Well, I'm not an expert in this, but at a high level, each variable is assigned a location in memory or registers, and then future references to that variable are rewritten to refer to the memory location by register name or memory address. This takes the whole "looking up a name" issue out of the path of work that has to get done at runtime.

Python does this for local variable names in functions. Because of this, moving a tight loop from global scope to a function can make it a couple percent faster.
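
You can measure the gap with something like this (a rough sketch; the size of the effect varies a lot by workload and Python version):

    import time

    loop_src = (
        "total = 0\n"
        "for i in range(1_000_000):\n"
        "    total += i\n"
    )
    # Compiled as module-level code: every access to `total` and `i` goes
    # through dictionary-based LOAD_NAME/STORE_NAME.
    module_style = compile(loop_src, "<module-scope>", "exec")

    def in_a_function():
        # Same loop, but `total` and `i` are fast locals (array slots).
        total = 0
        for i in range(1_000_000):
            total += i

    start = time.perf_counter()
    exec(module_style, {})
    print("dict-based names:", time.perf_counter() - start)

    start = time.perf_counter()
    in_a_function()
    print("fast locals:     ", time.perf_counter() - start)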


Are you serious that dictionaries are a problem for performance oriented developers?

Performance oriented devs should be concerned with bottlenecks, not incredibly minute details. There's almost no situation I can think of where smallish dictionaries are much better or worse than any other data structure when it comes to performance.

Of course, if you're writing a compiler then it can be a serious difference. Most developers don't write compilers though.


Yes, if you write performance sensitive code you have to be very careful with dictionaries. See for example this nice talk by Chandler Carruth https://www.youtube.com/watch?v=fHNmRkzxHWs


Well of course. :) If you don't write perf sensitive code, don't worry about perf. But if you do, in many cases avoiding hash tables can become important.


I think you might be a little confused. Even in hash tables with chaining, one does not tend to spend much time traversing linked lists, because the typical bucket will have only one member. This depends on the load factor of the table, but most practical implementations will eventually rebucket and rehash if their load factor grows too high. "Getting the key loaded" -- i.e. finding the memory location that contains the key -- is O(1) on average in all practical hash tables. It does not typically require any traversal at all.
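
If it helps, here's a toy chained table that rehashes when its load factor gets too high, which is what keeps the typical bucket at roughly one entry. (Illustrative only; CPython's dict, for instance, actually uses open addressing rather than chaining.)

    class ChainedTable:
        def __init__(self, capacity=8, max_load=0.75):
            self.buckets = [[] for _ in range(capacity)]
            self.size = 0
            self.max_load = max_load

        def put(self, key, value):
            # Grow before the load factor (entries per bucket) gets too high.
            if (self.size + 1) / len(self.buckets) > self.max_load:
                self._rehash()
            bucket = self.buckets[hash(key) % len(self.buckets)]
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)
                    return
            bucket.append((key, value))
            self.size += 1

        def get(self, key):
            bucket = self.buckets[hash(key) % len(self.buckets)]
            for k, v in bucket:
                if k == key:
                    return v
            raise KeyError(key)

        def _rehash(self):
            # Double the bucket count and redistribute every entry.
            old = [pair for bucket in self.buckets for pair in bucket]
            self.buckets = [[] for _ in range(2 * len(self.buckets))]
            self.size = 0
            for k, v in old:
                self.put(k, v)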

You keep talking about ordered tables like red-black trees, as in this comment, which is another thing that makes me wonder whether you might be confused.


Early versions of String.hashCode in Java sampled at most 16 characters of the string; when used for keys like URLs, this led to rather poor performance!

https://en.wikipedia.org/wiki/Java_hashCode()#The_java.lang....


Even if the typical bucket has 100 members, as long as this number is constant and does not grow with the size of the hash table, the performance is still O(1). And the same applies to cache misses. None of these things really matters unless you are using hash tables with very few elements in them.


> Even if the typical bucket has 100 members, as long as this number is constant and does not go up with the size of the hash table, the performance is still O(1).

If you mean in a chained hash table, where you chase up to 100 pointers to get the value, the performance is atrocious.

Friendly reminder: traversal of a linked list and traversal of a contiguous array are both O(n). In the real world, one is two orders of magnitude faster than the other.

> All these things don't really matter except if you are using hash tables with not very many elements in them.

The lookup of a value given a key is probably the least affected operation. If all you care about is a "weak dictionary" (you only really need to store values and look them up by key), all of this is mostly jabber. If you store the keys to iterate through, or to access them for some reason, all those things start to matter a whole lot.


> sorted vector of keys . . . reserve necessary memory and sort it . . . unsorted coindexed arrays . . .

Most of the things you mentioned are not hash tables but members of a parent concept, dictionaries. Hash tables, by definition, all involve some sort of hashing of the key. The two main categories of hash table are chained hash tables (std::unordered_map does this, at least in the implementations I'm aware of) and open-addressed hash tables, which use probing instead of secondary data structures to resolve conflicts.
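
For contrast with chaining, an open-addressed lookup with linear probing looks roughly like this (a toy sketch; real implementations also handle deletion with tombstones, cap the load factor, use better probe sequences, and so on):

    def probe_lookup(slots, key):
        """slots: a list whose entries are (key, value) pairs or None.
        Its length must be a power of two, and it must keep some empty
        slots so an absent key terminates the scan."""
        mask = len(slots) - 1
        i = hash(key) & mask
        while slots[i] is not None:
            k, v = slots[i]
            if k == key:
                return v
            i = (i + 1) & mask  # linear probe: try the next slot
        raise KeyError(key)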


You can implement the sorted vector of keys with a hash function instead of a coindexed vector of values. It still keeps most of the properties we like, especially if doing the coindex-sort operation is too expensive for some reason.


> You can implement the sorted vector of keys with a hash function instead of a coindexed vector of values

So you're saying you're going to hash the keys, then sort them according to the hash, with tie-breaking on the key itself? I'm not aware of any sorted table that does this, but I'm sure some exist. I suppose you'd get something of a win if N were large, the keys had long common prefixes, and you didn't care about the ordering property.

But in that case you'd probably use an actual hash table, not the algorithm you just described. Unless there's something I'm missing.


Sorry, I misspoke. Lookup is always O(1) in a hash table. But this could be a "weak dict": you don't actually store the keys; as long as you can reference a key through its hash, you can look it up.

In the proper "data structure", we usually store the key and the value -- iteration through keys and/or values and/or pairs are probably supported operations.

Finding a key in the structure (whether by binary tree search, binary search on a sorted array, or linear scan of an array) varies in cost. So do iteration and most other operations.


You know that thing about anecdotes. There are a lot of explanations for your situation getting better at the same time as chiro without chiro being the cause: regression to the mean, the placebo effect, etc.

That said, I had been under the impression that chiro is about as effective for lower back pain as evidence-based physical therapy. But that doesn't mean it is effective. It could equally well mean that we don't know of many good treatments for lower back pain, and thus that physical therapy is itself only as good as a placebo such as chiropractic.


When you hurt your back, then come back and respond. There's nothing worse than an armchair expert googling Wikipedia for their "knowledge".


Lots of science-only hecklers here. Seems to me to be a bit arrogant to think science holds all the answers and that holistic treatments are devoid of value?

Should we ideally strive for scientific rigour in everything we do? Yes. Does quantum mechanics prevent us from predicting anything to arbitrary precision? Yes. Are we currently unable to model any three-body interaction analytically? Yes.

Right at the atomic level, our science is built on top of approximations and concessions to rigour. The second you get to anything remotely murky like biological systems, it all gets a bit hand-wavy anyway.

This is not to discount the scientific method or years of evidence-based research, but if chiropractors, Chinese medicine, and even homeopathy can have an effect on people, then who are we to tell them they're wrong?

Even if it is just a placebo, until we can in precise terms explain the pathways of the placebo effect, then various healing methods in their various cultural contexts still have their place in the world imho, provided they are not known to be causing harm.


Even if I did, it would still be an anecdote. It would prove nothing either way. For that we have to look to data.


ya I take this back


> You only need anecdotes to prove that it's not bullshit.

Oh dear. No. That's not true at all. Suppose I have back pain. Then I eat some dirt. Then I feel better. Then I start claiming that dirt ingestion cures back pain. Someone would be well within their rights to call that bullshit, despite my anecdote. Even if I could find a dozen people who had the same experience, it would still be bullshit.


Claim was made. Falsifying evidence was presented. QED.


Studies show that back pain resolves on its own in 95% of cases.


Depends on the team, right? It was a major quality-of-life bump for me when I recently transferred to a team whose members I enjoy spending time with.


In the same sense as Tetris is -- that is, unchallenging at slow pace, extremely difficult to do quickly.

