
It's partly an artifact of BCPL, where the only type was one representing the machine word. With a word-sized type you don't get portability in the range of values the type can represent, but you can portably know that using it won't, e.g., take up two registers.

You might consider the int_fastN_t types a sort of spiritual successor to this with fewer downsides since they purposefully guarantee a minimum width.
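
For instance (a minimal sketch; the exact width chosen varies by platform):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* guaranteed at least 32 bits, but the implementation may
           pick whatever wider type it considers fastest for the target */
        int_fast32_t n = 100000;
        printf("int_fast32_t is %zu bytes here\n", sizeof n);
        return 0;
    }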


>Can be macro'd at the function definition, but it's ugly.

I wonder if typeof in c23 has changed this at all. Previously there was no sense in defining an anonymous struct as a function's return type. You could do it, but those structs would not be compatible with anything. With typeof maybe that's no longer the case.

e.g. with clang 16 and gcc 13 at least this compiles with no warning and g() returns 3. But I'm not sure if this is intended by the standard or just happens to work.

    struct { int a; int b; } f() {
        return (typeof(f())){1,2};  /* compound literal of f's own return type */
    }
    int g() {
        typeof(f()) x = f();        /* the anonymous return type, named via typeof */
        return x.a + x.b;
    }
edit: though I suppose this just pushes the problem onto callers, since every function that does this now has a distinct return type that can only be referenced using typeof(yourfn).
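
Under the same compilers, a caller can at least bind the type to a local name via typedef (pair_t is a hypothetical name, and the same caveat applies about whether the standard actually intends this):

    typedef typeof(f()) pair_t;

    int h() {
        pair_t x = f();
        return x.b - x.a;
    }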


I'm the author of the extension that the vtab and define function in that module were adapted from. It allows you to create something like a parameterized view, but the way it works is fairly simple: a prepared statement is created from the provided SQL on the same db connection as the vtab, and is executed each time the virtual table is queried. Parameters can be supplied as constraints (or using table valued function syntax) and the results of executing the statement are the row values.

Did you have any questions in particular?


Doxygen has the ability to generate these with its CALL_GRAPH/CALLER_GRAPH config, at least from each function individually. It can look quite funny when the depth isn't limited: https://i.imgur.com/3LMV71N.png
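
e.g. in the Doxyfile, where MAX_DOT_GRAPH_DEPTH is what keeps the graphs from sprawling like that:

    HAVE_DOT            = YES
    CALL_GRAPH          = YES
    CALLER_GRAPH        = YES
    MAX_DOT_GRAPH_DEPTH = 3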


Something of an informal reverse engineering of SimCity 3000 existed in sc3000.com's Knowledge Neighborhood. The original site is only partially available via the Wayback Machine now, though fortunately the articles have been preserved, albeit in a slightly less convenient format, here: https://community.simtropolis.com/profile/157989-catty-cb/co...

The site contained an amazingly comprehensive detailing of the game's mechanics and various algorithms, scattered throughout the articles, to the point where it seemed to me like it'd be possible to implement a lot of the game's engine using it.

Some good examples of the more detailed articles:

The economy https://community.simtropolis.com/omnibus/other-games/the-ec...

Land value specifics https://community.simtropolis.com/omnibus/other-games/land-v...

Traffic and transportation specifics https://community.simtropolis.com/omnibus/other-games/traffi...

Zone development rules https://community.simtropolis.com/omnibus/other-games/zone-d...


The article doesn't touch on this, but in C pointer types are also the only kind that can be invalidated without any apparent change to their value:

  void *x = malloc(...);
  free(x);
  if(x); // undefined behavior
Note that this isn't about dereferencing x after free, which is understandably not valid. Rather, the standard specifies that any use of the pointer's value itself is undefined after it has been passed to free, even though syntactically free could not have altered that value.

This special behavior is also specifically applied to FILE* pointers after fclose() has been called on them.
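
To make the FILE* case concrete, a minimal sketch (the file name is just an illustration):

    #include <stdio.h>

    int main(void) {
        FILE *fp = fopen("example.txt", "r");
        if (fp == NULL)
            return 1;
        fclose(fp);
        /* from here on, even reading fp's value is undefined,
           exactly like the freed malloc pointer above: */
        /* if (fp != NULL) ...  -- undefined behavior */
        return 0;
    }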

If there is some historical reason or architecture that could explain this part of the specification, I would be interested to hear the rationale; it has been present in mostly the same wording since C89.


My gut instinct is as follows:

  void *x = malloc(...);
  void *y = malloc(...);
  assert(x != y); // standard guarantees this [1]
Yet it's fairly reasonable that:

  void *x = malloc(...);
  free(x);
  void *y = malloc(...); // malloc reused x's allocation here.
So, in effect, guaranteeing that the results of two mallocs can never alias each other, while allowing the implementation to reuse freed memory, requires semantically adjusting the value of a pointer to a unique, unaddressable value.
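
Putting the two together (sizes arbitrary), a minimal sketch of the conflict:

    #include <stdlib.h>

    int main(void) {
        void *x = malloc(16);
        free(x);
        void *z = malloc(16); /* may reuse x's storage */
        /* if (x == z) ... would let the program observe the reuse,
           which is why free() renders x's value indeterminate */
        free(z);
        return 0;
    }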

[1] I think, but I'm not sure which versions of C/C++ added this guarantee


This would be kind of hilarious if true.

It seems like they could have just said: malloc won't give you a pointer that overlaps with the storage of any live malloc'd object. Such a malloc is implementable without too much trouble. But instead, they gave a stronger guarantee--that all malloc'd pointers would be "unique". It would be unboundedly burdensome on the implementation to meet this property, so what do they do? Update the standard to offer the achievable guarantee? No! They add a new rule, ensuring that it's impossible to observe that the stronger guarantee is not met without doing something "illegal". Instead of getting their act together, they have elected to punish whistleblowers.


On second thought, the choice they've made isn't as sadistic as it sounds. I was thinking of the standard as a contract between the language implementor and the programmer, but actually it is a contract between the language implementor, the programmer, and arbitrarily many other programmers. The stance they have chosen mandates a social convention, that the names of the dead will never be spoken. If everyone builds their APIs with this covenant in mind, it makes it possible to use pointers as unique ids (for whatever that's worth). C has never had much in the way of widely-followed social conventions, so practically speaking, the only way to ensure everyone knows they can depend on other programmers behaving this way is for the compiler to flagellate anyone who steps out of line.


I feel like you should not need to carve out extra language in the standard to explain this. It's very clear that following *x is undefined after free(x). It's also clear from any reasonable understanding of malloc that x's numeric value might collide with a later allocation coincidentally. Why should that make "if (x)" undefined?

It's true that the result of "if (x == y)" would depend on coincidences lining up and you should not rely on either one. Calling any evaluation of x "undefined" seems much more extreme than that though.


At first glance, it does seem like an unnecessarily gratuitous instance of undefined behavior. However... what could you actually meaningfully do with a pointer to freed memory anyways? You clearly can't dereference it. The only pointers you can compare it to are other pointers to the now-freed object (cross-object pointer comparisons are UB in C) and NULL. But if you were going to compare it to NULL, it's presumably to guard against a dereference, so you'd end up with UB anyways if you didn't set it to NULL in the first place, at which point it's not being compared with anything anymore.


I can think of one use.

Let's say you have two pointers that are sometimes unique and sometimes aliases. Maybe they mean semantically different things, but they happen to be the same in some cases and different in others. They are always on the heap. You want to clean them up when your function exits, freeing them both, or only once if they alias.

    free(p);
    if (p != q)
       free(q);
Believe it or not, I have written something like this, although with integer file descriptors being closed rather than heap buffers being freed. e.g. maybe the same fd is passed for both reading and writing, or sometimes you have a unique fd for each, but they all need to hit close(2).

To exist within the standard I guess you need to do the comparison first:

    if (p != q)
       free(p);
    free(q);
Edit: ah, but you just said "cross-object pointer comparisons are UB". I can't see a good reason for that either, but I do suppose it might make some architecture's non-linear pointer representation work better.
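
Back to the file descriptor variant, a minimal sketch (the names and situation are illustrative); since descriptors are plain ints, the comparison raises none of these pointer questions:

    #include <unistd.h>

    /* close a read fd and a write fd that may alias each other */
    static void close_both(int rfd, int wfd) {
        close(rfd);
        if (wfd != rfd)
            close(wfd);
    }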


For some reason, I thought equality and inequality pointer comparisons required pointers to be in the same object. It's actually only the relational operators that are undefined if not in the same object. (Although I believe most compilers will treat them as unspecified instead of undefined).


Malloc can return NULL so there is no guarantee of unique return values.


NULL != NULL, so at least no two values will be equal, right?


I think that's another one of those "in theory, there is no difference between theory and practice; in practice, there is" sort of things.

One must keep in mind that the standard does not exist in a void, despite what the smartass language-lawyers unfortunately think/want. The standards committee simply decided not to standardise things that they understood could vary between implementations. In fact, the standard has always said that "behaving during translation or program execution in a documented manner characteristic of the environment" is a possible outcome of UB.


In the dark ages, that would have allowed a single-pass compiler to reuse registers more aggressively.


I suspect the explanation is mundane: I'd expect it's just to allow a compiler to actually null out the pointer if it is able and willing to (for debugging or whatever) without affecting the correctness of the program.


There might be an architecture out there where malloc() creates a new memory segment, free() invalidates it, and merely loading an invalid pointer into a register causes a hardware trap. It may not even be possible to load a pointer into an integer register to test it against NULL.

I'm not aware of any architecture that does this; however, I think this is exactly how the 80286 and later behave in protected mode, if malloc() allocated fresh segments.


Can you cite chapter and verse in the standard for this?

It certainly does not make sense for current implementations, and I find it difficult to imagine a pointer representation where it does make sense. Perhaps if reading the address itself involved some indirection.


In C99:

Annex J.2 "The value of a pointer to an object whose lifetime has ended is used (6.2.4)."

6.2.4 "The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime."

The only thing I can think of is that it allows the compiler to recycle the storage occupied by the pointer itself for something else after free() is called, but I don't see much value(!) in that either, given that if it could do variable lifetime analysis, it would be able to do it for pointers too.


In addition to this, Annex J.2 (Undefined Behavior) and Annex L.3 (Analyzability Requirements) both specifically include: "The value of a pointer that refers to space deallocated by a call to the free or realloc function is used (7.22.3)."


Though scan-build is usually the simpler option, clang itself does have an --analyze flag which writes analysis results in various formats, including the same HTML reports that scan-build would generate. To see the results on standard output instead:

   clang --analyze --analyzer-output text ...
This will print the entire analysis tree in the same format as regular diagnostics.
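
For comparison, scan-build just wraps the build command and drops its HTML reports into an output directory, e.g. (paths illustrative):

    scan-build -o /tmp/analysis make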


The only problem is the cross-translation-unit (CTU) mess if you're analyzing more than one file, hence the need for the aforementioned tools.

Hopefully in a future version the kinks will be ironed out and we can just use the flag without any hassle. It's as if we still had to manually link our files with ld after compiling them, instead of clang doing it automatically.


Not about the language exactly, so maybe not fair game, but: how did you all find yourselves joining ISO? And maybe more generally, what's the path for someone like a regular old software engineer to come to participate in the standardization process for something as significant and ubiquitous as the C programming language?


Great question!

Joining the committee requires you to be a member of your country's national body group (in the US, that's INCITS) and attend at least some percentage of the official committee meetings, and that's about it. So membership is not difficult, but it can be expensive. Many committee members are sponsored by their employers for this reason, but there's no requirement that you represent a company.

I joined the committees because I have a personal desire to reduce the amount of time it takes developers to find the bugs in their code, and one great way to reduce that is to design features that make it harder to write the bugs in the first place, or that turn unbounded undefined behavior into something more manageable. Others join because they have specific features they want to see adopted or want to lend their domain expertise in some area to the committee.


Related to that: the C++ standards body seems to be quite open to allowing non-members to participate (outside of official votes, while respecting them when looking for consensus). Is it just my limited observation, or is the C group less open? Any plans in that regard?


Most of us on the committee would like to see more participation from other experts. The committee's mailing list should be open even to non-members. Attendance by non-members at meetings might require an informal invitation (I imagine a heads up to the convener should do it).


I think that's right. These days, much of the discussion occurs through study subgroups (like the floating-point guys) and the committee's mailing list.


I would love to see more open interactions between the broader C community and the WG14 committee. One of the plans I am currently working on is an update to the committee's webpage to at least make it more obvious as to how you can get involved. The page isn't ready to go live yet, but will hopefully be in shape soon.


Apple did submit a proposal based on the then-new blocks extension in 2010: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1451.pdf

There was an analysis of this and the C++11 lambda specification done shortly after at http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1483.htm, but it was inconclusive and there doesn't seem to have been any followup since then.


I like using pointer-to-array types for this purpose as, like an array in a struct, the array's length is encoded in the type, which likewise allows the compiler to warn if an incompatible array is provided. e.g.

   void f(char (*n)[255]);
   char array[6];
   f(&array);
warns of "incompatible pointer types passing 'char (*)[6]' to parameter of type 'char (*)[255]'"

This won't produce a diagnostic for f(NULL) like "static" does, but does have two properties that might be considered benefits:

1) The length is exact rather than a minimum.

2) The type of "*n" is still char[255], whereas a char[static 255] parameter is still a decayed pointer-to-char. Thus with the former sizeof(*n) behaves as expected inside of "f", yielding 255.

These are true of the array-in-struct method as well.
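
To illustrate property 2, a minimal sketch (the function names are made up):

    #include <stdio.h>

    void by_ptr(char (*n)[255]) {
        printf("%zu\n", sizeof *n);  /* 255: *n really has type char[255] */
    }

    void by_static(char n[static 255]) {
        /* n has decayed to char *, so this prints sizeof(char *);
           compilers typically warn about sizeof on an array parameter */
        printf("%zu\n", sizeof n);
    }

    int main(void) {
        char big[255] = {0};
        by_ptr(&big);
        by_static(big);
        return 0;
    }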


This is a fantastic technique that didn't occur to me for an embedded project I was developing in C, where passing arrays of known size was a frequent thing. It was also desirable to save space by avoiding the padding that an "array+size" struct would contain.

Thanks for posting this.

