That's really only a fraction of it. The only definitive documentation that I know of is clang's implementation of its mangling scheme: https://github.com/llvm-mirror/clang/blob/master/lib/AST/Mic...


I use Google Express quite often for grabbing household essentials that I don't have time to make a dedicated trip for. I wonder if my habits will change now that I just need to speak up to place an order.

I also wonder if this will be one of those things that young'uns will take for granted someday...


It is the low-level assembler-related machinery in LLVM.

For more details, see http://blog.llvm.org/2010/04/intro-to-llvm-mc-project.html


Thank you. Guessing the MC stands for 'Machine Code' then.


Clang/C2 doesn't use LLVM; it will use Microsoft's C2 backend.


That may be the case, but whatever the Clang frontend sends to C2 produces better code than the Microsoft frontend does. It probably already folds the stores at the frontend level, instead of leaving it to the backend to figure out.


Wow, the other team members aren't slouches either: Arch D. Robison and Steven Skiena are pretty neat!

Steve's book, The Algorithm Design Manual, was pretty inspirational to me in high school. Arch did a compiler, KAI C++, and designed Intel's TBB library (Threading Building Blocks).


I recently reimplemented the machinery in our compiler that implements C++ exceptions. I figure it might be useful to share a few interesting issues.

The way C++ exceptions are used creates very interesting interactions in the design of their implementation.

There's a hidden cost many forget: call frame information restricts what the compiler can do. Why is this the case? Even if exceptions are super rare, the compiler still needs to make sure the program will obey the language rules if an exception is somehow thrown at an appropriate program point.

The compiler must emit instructions which can be represented by the call frame information in order for the unwinder to be able to reason about the stack.

This is usually not a problem on Linux because the DWARF CFI is very expressive. That expressiveness comes at a cost: it is quite inefficient when it comes to size.

Other platforms, Windows NT (ARM, x64 and IPF) and iOS, recognized that this increase in size is a bad state of affairs and thus aimed to make the CFI more compact. By doing this, they greatly reduced the size of CFI but unfortunately created restrictions on what a compiler can do.

As for trickiness inherent in C++ exceptions, C++ supports a fairly esoteric feature: exceptions may be rethrown from code that is not lexically inside a try or catch:

  void rethrow() {
    throw;
  }
An easy way to make this sort of thing work would be to thread a secret in/out parameter which represents the exception state.

But how is this typically implemented?

Well, remember, the ethos of exceptions in C++ is that they are rare. Rare enough that implementors are discouraged from optimizing the speed of C++ exceptions.

Instead, thread-local storage is typically used to go from any particular thread back to its exception context.
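
To make that concrete, here is a toy sketch of the idea (the names and fields are invented for illustration, not the real Itanium ABI structures like __cxa_eh_globals): a per-thread exception state is what lets a bare "throw;" find something to rethrow without threading a hidden parameter through every call.

  // Toy sketch only: invented types, not any real runtime's layout.
  struct ExceptionRecord {
    const void* type_info;        // type of the in-flight exception object
    void* object;                 // the exception object itself
    ExceptionRecord* next;        // stack of nested caught exceptions
  };

  struct PerThreadEHState {
    ExceptionRecord* caught_exceptions = nullptr;  // innermost handler first
    unsigned uncaught_count = 0;
  };

  // One state per thread; a bare "throw;" can reach it from anywhere.
  thread_local PerThreadEHState eh_state;

  ExceptionRecord* current_exception_for_rethrow() {
    // A real runtime calls std::terminate() if nothing is being handled.
    return eh_state.caught_exceptions;
  }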

Things get pretty darn complicated pretty quickly with features like dynamic exception specifications:

  #include <cstdio>   // for puts

  void callee() throw(double) {
    throw 0;
  }
  void caller() {
    try {
      callee();
    } catch (...) {
      puts("got here!");
    }
  }
On first examination, "got here!" should be unreachable because the call to "callee" results in a violation of the exception specification.

However, this is not necessarily the case! What _actually_ happens is that some code runs between the throw and the catch: std::unexpected is called.

Now, std::unexpected might throw an exception of its own! If this new exception matches the exception specification, the exception would pass into the catch block in "caller". If it doesn't, the exception thrown within std::unexpected might result in another violation!

Wow, this is getting complicated... OK, so what happens if it results in another violation? Well, the exception gets cleaned up and replaced with, you guessed it, another exception! We'd be left with an exception of type std::bad_exception leaving std::unexpected and thus "callee". Because the catch clause in "caller" is compatible with std::bad_exception, control is transferred to the catch block.

This is the tip of the iceberg. A huge amount of machinery is sitting around, waiting to engage to make exceptions work.


Professional compiler engineer here: C is a mediocre intermediate language.

Let's start with an excellent quote from Wittgenstein: "The limits of my language mean the limits of my world."

Using C as your intermediate language means that your expressiveness is limited to valid C programs. This is workable but only if your language can be mapped to C in _useful_ ways.

For example, let's say your language has behavior similar to Scheme's tail calls. How would you get this behavior from a C compiler? You will never be able to make this work reliably across optimization levels, etc.

Guaranteed tail calls are just the tip of the iceberg; there are a lot more features which cannot reasonably be mapped onto C.

Real compiler IRs increase your expressivity beyond what the C language designers decided was important.


Exactly, and this is what most people who work on compilers discovered long ago. It's disappointing to see "just compile to C!" still so popular on HN.

I would cite GC as the most prominent example of a feature that is incompatible with compiling to C. It is impossible to write a competitive high performance tracing GC on top of C. Mandatory register spills at safe points and conservative scanning have huge downsides.
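
To make one of those downsides concrete: precise root scanning through C typically ends up as a Henderson-style shadow stack, which forces every live reference out of registers and into memory around each potential safe point. A rough sketch (names like GcFrame and gc_top are made up, not any real runtime's API):

  struct GcFrame {
    GcFrame* parent;   // chain of frames the collector can walk
    void** roots;      // spilled references live in this frame
    int count;
  };

  thread_local GcFrame* gc_top = nullptr;  // invented root-chain head

  void* example(void* a, void* b) {
    void* roots[2] = {a, b};              // forced spill of live references
    GcFrame frame = {gc_top, roots, 2};
    gc_top = &frame;                      // publish before any safe point
    // ... an allocation here may trigger a collection that walks gc_top ...
    void* result = roots[0];              // must reload through memory
    gc_top = frame.parent;
    return result;
  }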


> It's disappointing to see "just compile to C!" still so popular on HN.

Posts on C get blindly upvoted on HN and proggit. I guarantee it. Methinks it's like the History Channel for programmers: learning about their forerunners and how they had to bang rocks together to make fire.

It's also how I've gotten all of my karma. Unfortunately, the abyss has also looked into me, as I now write C code for a living...


Using C as an intermediate language can still be useful at times. For example, when writing a bootstrap compiler, you can target C and magically support virtually every platform. It's not so great for production compilers for numerous reasons, including performance (poor unless your language maps cleanly to C) and debugging experience (debugging object code is not fun, and if your object code is C, you get stuck with whatever debugging information your C compiler outputs; you don't have any say in the matter). For bootstrapping compilers, I only see three sane options, really:

1. Write an interpreter to host the compiler (probably in C).

2. Write a cross-compiler.

3. Write an X-to-C compiler.

I insist on C here not because of performance, but because it's ubiquitous.


> For example, let's say your language has behavior similar to scheme's tail-call. How would you get this behavior from a C compiler?

Well, there's still the unreasonable way: compile in continuation-passing style and trampoline every call. I shall not be held responsible for any performance problems this advice may cause, however…
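
For the curious, a toy version of that unreasonable way (my sketch, in C-flavored code): every "tail call" is returned as data, and a single driver loop makes the calls, so the stack never grows.

  #include <cstdio>

  struct Thunk;
  typedef Thunk (*Cont)(long long acc, long long n);

  struct Thunk {
    Cont next;      // nullptr means "done"
    long long acc;
    long long n;
  };

  Thunk sum_step(long long acc, long long n) {
    if (n == 0) return {nullptr, acc, 0};   // finished: result is in acc
    return {sum_step, acc + n, n - 1};      // the "tail call", as data
  }

  long long trampoline(Thunk t) {
    while (t.next)                          // the only real stack frame
      t = t.next(t.acc, t.n);
    return t.acc;
  }

  int main() {
    printf("%lld\n", trampoline({sum_step, 0, 1000000}));  // no stack growth
  }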


"SSA-based Compiler Design" [1] is awesome, heaps of interesting algorithms.

[1] http://ssabook.gforge.inria.fr/latest/book.pdf


Speaking as a professional compiler engineer, this is the biggest reason why signed overflow is treated as UB.

It is incredibly useful to talk about a loop's trip count or the values an induction variable can have.

It turns out that taking advantage of signed overflow being undefined lets us do that in more places, a boon to all the programs which never dynamically execute a signed overflow.
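
A concrete illustration (my own example, not from any particular compiler): because overflow of a signed induction variable is UB, the compiler may assume the first loop below runs exactly end - begin + 1 times, which feeds directly into vectorization and closed-form rewrites. The unsigned version has defined wraparound, so if end == UINT_MAX it never terminates, and the same reasoning needs extra proof.

  long long sum_signed(int begin, int end, const long long* a) {
    long long s = 0;
    for (int i = begin; i <= end; ++i)       // ++i may not wrap: trip count known
      s += a[i];
    return s;
  }

  long long sum_unsigned(unsigned begin, unsigned end, const long long* a) {
    long long s = 0;
    for (unsigned i = begin; i <= end; ++i)  // wraps if end == UINT_MAX: maybe infinite
      s += a[i];
    return s;
  }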

The tradeoff, however, is that the programming model gets way more complicated. I have talked to the engineer who introduced this optimizing power to our compiler and he sorta regrets it.

IMO, we need better, higher level ways of talking about a loop's iteration space. Today, in C or C++, we are forced to represent the iteration space of a loop using the language's algebra. This is severely limiting when each operation on a signed type might introduce UB. With a higher level mechanism, we can separate the behavior that we want for loops from the behavior we want for everything else in a nice, clean way.


> With a higher level mechanism, we can separate the behavior that we want for loops from the behavior we want for everything else in a nice, clean way.

Yes! One decisive advantage that high-level iteration constructs have over for loops is that they're easier to optimize.

One obvious example here is bounds checks: there's no need to have infrastructure to optimize out bounds checks over arrays during loops if you have a construct that eliminates the indexing entirely.
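
A small illustration of what I mean, in C++ terms (my example): with an indexed loop the optimizer has to prove the index stays in range before it can drop a checked access, while a range-based loop never materializes an index to check in the first place.

  #include <cstddef>
  #include <vector>

  long long sum_indexed(const std::vector<long long>& v) {
    long long s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
      s += v.at(i);       // checked access: the check must be proven away
    return s;
  }

  long long sum_ranged(const std::vector<long long>& v) {
    long long s = 0;
    for (long long x : v) // no index, so there is nothing to bounds-check
      s += x;
    return s;
  }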


Genuine question: what do you do when you want to implement your own loop mechanism, iterator, array type, or whatever?


It is the job of the language to ensure that language primitives are reusable and composable.

I think this would apply to whatever mechanism is designed to describe a loop's iteration space.


Most languages use some sort of interface whereby if you implement certain methods for a type then that type can be iterated over using a for loop.
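
In C++, for example, begin() and end() returning something iterator-like is all a type needs to work with a range-based for. A minimal sketch (the Countdown type is made up):

  // A made-up type that counts down; enough for range-based for.
  struct Countdown {
    int from;

    struct Iter {
      int value;
      int operator*() const { return value; }
      Iter& operator++() { --value; return *this; }
      bool operator!=(const Iter& other) const { return value != other.value; }
    };

    Iter begin() const { return {from}; }
    Iter end() const { return {0}; }
  };

  // for (int i : Countdown{5}) { ... }   // visits 5, 4, 3, 2, 1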


Spinning in locks is always tricky business.

The sharpest thorn, to me, is: what happens when you are running inside the critical section and your thread gets preempted?

Well, now the next thread which tries to acquire the lock is stuck waiting. But waiting for what? Waiting for the original thread to get scheduled.

Now, the OS has no idea that the original thread should get scheduled again and is free to continue scheduling more and more work items.

Fortunately for the lock in this article, and most other locks which spin, it is adaptive and will not spin for too long. But how long is long enough? If the spin is timed for a few loads and stores, then all is probably well as spins will not be attempted for very long.

I wonder how these locks figure out how long they should spin? In the nasty case I previously mentioned, you'd want to spin for a very short while to avoid large amounts of waste. But spins which are too short lead to higher lock/unlock latency if the lock was held for any appreciable amount of time.

This leads me to the following conclusion: spinning inevitably leads to _some_ number of wasted CPU cycles and therefore increased latency.

I'm curious as to how the amount of spinning was chosen.


The post details why we spin for the amount of time that we do. It turns out that there is a wealth of measurements that support our 40-spins-while-yielding rule. It worked for a strangely diverse set of workloads:

- IBM Research found it to be optimal on the portBOB benchmark on a 12-way AIX POWER box in 1999.

- I found it to be optimal for DaCapo multi-threaded benchmarks on both AMD and Intel SMPs in ~2007.

- I again found it to be optimal for WebKit these days.

The point of spinning for a short time and then parking is that parking is hella slow. That's why you can get away with spinning. Spinning is net profitable so long as the amount you spin for is much smaller than the amount of time the OS will waste while parking.
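
If you want the shape of that in code, here is a rough sketch; it is not WebKit's actual WTF::Lock, and park_on/unpark_one are placeholders standing in for a real parking lot or futex:

  #include <atomic>
  #include <chrono>
  #include <thread>

  struct SpinThenParkLock {
    std::atomic<bool> held{false};

    void lock() {
      for (int i = 0; i < 40; ++i) {                      // bounded spin phase
        if (!held.exchange(true, std::memory_order_acquire))
          return;                                         // acquired cheaply
        std::this_thread::yield();                        // spin while yielding
      }
      while (held.exchange(true, std::memory_order_acquire))
        park_on(&held);                                   // placeholder for a real park
    }

    void unlock() {
      held.store(false, std::memory_order_release);
      unpark_one(&held);                                  // placeholder for a real wake
    }

    // Stand-ins only; a real implementation parks in the kernel and wakes
    // exactly the threads queued on this lock.
    static void park_on(void*) {
      std::this_thread::sleep_for(std::chrono::microseconds(50));
    }
    static void unpark_one(void*) {}
  };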


I attempted to address these concerns using an adaptive algorithm in glibc some 15 years ago:

http://stackoverflow.com/questions/19863734/what-is-pthread-...

The idea is to avoid spinning using hard-coded constants, but put a modicum of on-the-fly empirical science into it: measure the duration of the critical regions with a smoothed estimator, which is maintained in the mutex. If the actual wait exceeds this smoothed average by too much (say double) then conclude that the owner must be pre-empted and break the spin with rescheduling.

The smoothing is done very easily using a few bit operations that implement an IIR filter, similarly to the TCP RTO estimator.
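
The estimator has roughly this shape (my reconstruction from the description above, not the actual glibc patch):

  // new_avg = avg + (sample - avg) / 8, i.e. 7/8 old + 1/8 new; real code
  // would use a shift here, the same trick as the TCP smoothed-RTT estimator.
  struct SmoothedHoldTime {
    long avg = 0;   // smoothed critical-section duration, in ticks

    void record(long sample) {
      avg += (sample - avg) / 8;
    }

    // Stop spinning once the wait is well past the typical hold time (say,
    // double it): the owner has probably been preempted, so reschedule.
    bool should_stop_spinning(long waited) const {
      return waited > 2 * avg;
    }
  };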


See the Doug Lea talk I linked to in another comment. Turns out that even the process of spinning is itself not so simple on some new processors.

