C++ Modules: Packaging Story (conan.io)
133 points by ibobev on Oct 18, 2023 | hide | past | favorite | 101 comments


I think Conan and modules will make C++ more popular in the future. It is a behemoth of a language: very hard to master, and it is still too easy to produce difficult-to-debug errors. So it will be very important that the language changes to address these.

But working with C++ day to day, I feel the most annoying things are indeed dealing with dependencies and headers. It can be a pain to set up a complex C++ project on a new machine. Even with Conan it can still be a pain, because configuration management is a mess in C++.

As for headers, I can’t wait to see them rot in hell. Some people say that they help you reason about your program, but that is a very small upside. Too many times I have had problems with includes that only worked because a translation unit included some other header beforehand. I once read a blog post by the developer of SumatraPDF where he describes never including anything directly[1]. This can be an improvement for compile time, but if someone else has to work on or refactor your code, it can be impossible to get up to speed on it or to track down any compile error. If one only used modules, these errors would simply not exist.

[1]https://blog.kowalczyk.info/article/96a4706ec8e44bc4b0bafda2...


It's crazy to see how much work header files make the C++ compiler do just to build hello world:

    $ gcc -E hello.c | cloc --force-lang=c -
    419
    $ g++ -E -std=c++11 hello_iostream.cpp | cloc --force-lang=c++ -
    20707
    $ g++ -E -std=c++20 hello_iostream.cpp | cloc --force-lang=c++ -
    30876
    $ g++ -E -std=c++20 hello_std_format.cpp | cloc --force-lang=c++ -
    42757
Just bumping the C++ std version can add thousands of lines of code.


> Just bumping the C++ std version can add thousands of lines of code.

And yet none of this heavily influences compile times. At least I have not found it to on the multi-million-LoC projects I've been working on so far. I'm not sure what the fuss around modules is; I honestly don't think they will solve C++ build times. Happy to be proved otherwise.


Try compiling a small source file that includes a handful of STL headers, and compare to one that doesn’t. Even just including without instantiating any of the templates, the difference is a substantial fraction of a second. That can be a heavy influence, though maybe not in a huge codebase which also has plenty of large headers of its own.


I tried a couple of times to work around the problem you're describing through PCH on some of my past projects. All of them were on the multi-million-LoC scale, and build-from-scratch took roughly 45 min to 1 hr on a developer machine. I never managed a substantial cut in the build time through PCH, only perhaps a very few percent, but YMMV.

Given that modules are PCH in disguise, I don't believe modules are going to have a dramatically bigger, and better, effect on C++ build times. I also couldn't care less whether they cut a build from 45 seconds to 30 seconds; small projects don't even have a build-time problem worth solving. Mid-size and big projects are where it will really count, and if modules have a substantial effect, in the range of at least a 20-30% cut in build time, I will be all ears.


Yeah, how much modules/PCHs would help probably depends a lot on your code base. I read about clang's work on modules, and they seemed a bit uncertain whether you'd get much benefit at all once you switch on optimizations. https://clang.llvm.org/docs/StandardCPlusPlusModules.html#ho...


iostreams are more or less just a straight conversion of everything that C streams were (but nobody ever used) into polymorphic interfaces and templates.

It's a shame they add so much header bloat, but there's a lot of functionality in there and it's hard to see a way it could have been avoided if you want type safety.


Also taking us back to the 1990s: anyone coming from Object Pascal, Smalltalk or Common Lisp would feel right at home with the iostreams architecture.

As for the header bloat, fear not: import std;, for the complete C++ standard library, is faster than a plain #include <iostream>, per the VC++ team's measurements.


Most of the sin of iostream is having a humongous overload set for operator<<. AFAIK it's not too bad to just compile the stream headers, but the overload resolution for `std::cout << anything` is expensive.


And after all that effort, an operator overload is still the "wrong" API for the job anyway, because it's missing a crucial argument! Any binary (infix) operator can only relate two things, but typically when you want to print something you actually want to relate three things: the stream, the object or value to print, and the way you want values represented.

Oh, sure, you can manipulate this by inserting magic formatting objects, but that's no different from saying all functions take one argument, a la category theory or Haskell without syntactic sugar. Or you could write a class or function that lets you tag your object or value by changing its type to a different one, so a different operator<< overload is called. In the iostreams design, you're only indirectly selecting which print function to use by manipulating either the stream or the printed object. But in actuality the types of your two arguments are not different, so you must either munge global settings on the stream or "hack" which operator<< overload is selected.

And that's the crux of the issue: what you "really want" is a stream API with three arguments: the stream, the object or value, and a pointer to a function (or function object) that takes the value and transforms it into a stream of characters. Why not just pass that function directly and explicitly? Of course that requires something that's not just a single chained infix operator; my thesis being that starting with that quirky choice leads down a path that ends in a bad design.


> what you "really want" is a stream API with three arguments: the stream, the object or value, and a pointer to a function (or function object) that takes the value and transforms it to a stream of characters.

Or maybe a format object that returns a string in the way described by some directive, the way printf() worked in C or format conversions work in Python? You could call it something like std::format and make it available through a standard header like <format>.


I don't really see many codebases using iostreams anymore, it's all either cstdio or fmt


How many of those lines are blank after #ifdef?


I used cloc so blank lines and comments should be ignored.


When you consider that most popular modern languages long ago abandoned the idea of a separate header file, you see that the advantages far outweigh the disadvantages.


Except for C, no popular language (modern or ancient) has ever used header files.

So it's not a story of "abandoning", more like an insane C idiosyncrasy that wasn't ever used anywhere else.


It was only ever a thing in the first place because, almost half a century ago, some people decided to write a one-pass compiler due to storage/memory limitations. Even 40+ years ago, C was the only popular language that required them - to never address the issue is practically inexcusable.


I personally like headers very much and do not understand the hate: They are super simple and efficient.

They do not work well in C++ though, because it puts the implementation into the header for some reason.


The reason being that the compiler needs to see the template's code to be able to generate a specialization of it.

ELF files based on the C type system can't provide such information in object files and binary libraries.

In the modules world it can read it from the symbol table in the BMI, like in any other module-based language.


Not sure what ELF and C type system has to do with this. C type information can be stored in ELF (as well as many other things). This is how a debugger can know the types.

It is simply a language design flaw of C++ that class implementation (not just templates) ended up in the headers.


In Python I sometimes miss "header files". Quite often you see:

    import x

    class Foo:
        def bar(self):
            import y  # break circular import
            y.something()

Would be nice to be able to import `Foo` without pulling in also `y`, or moving `y` inline.

Can be solved in different ways, but you see inline imports everywhere.


That's not a problem with modules, per se, but a problem with Python's module system. Importing a module in Python just executes the whole file. JavaScript used to have this issue with `require` years ago. Each `require` statement essentially executed that import right then and there, and would bug out if circular dependencies existed. However, they fixed it with `import`.

Almost every other module-based language does not have issues with circular dependencies. Python, in theory, could follow their lead, but they won't.

EDIT: I wasn't as clear as I could be. The issue again isn't modules, but scripting languages that allow top-level statements at all. Intermixing type and method declarations with executable code makes circular dependencies an issue. Compiled languages don't allow top-level statements, so they don't have the dependency-resolution problem.


> Each `require` statement essentially executed that import right then and there

That still happens.

It takes a lot of magic trickery to make cyclical require/imports work for JavaScript and a lot of times they don't/can't.


That's true. I guess I misspoke and wasn't as clear as I could be. What I meant to say was: the problem with Python's and JavaScript's module systems isn't modules themselves, but top-level statements. You are correct that `import` doesn't solve that problem; it solves the synchronous dependency-resolution problem.

Realistically, that's just what happens when a language allows top-level statements, as they are executed when a file is loaded. As such, scripting languages tend to fall victim to the problem, but compiled ones don't.

I'll update my comment.


Worth noting that ES-style imports do improve the scenario for cyclic dependencies though, assuming you do not need to reference the symbols imported during the execution of the top level statements. This is because `import { bar } from './bar.js'` makes `bar` a "live binding" -- even if `bar.js` also imports `foo.js` and thus `bar` may initially be undefined due to the cyclic dependency, later when `bar.js` completes its initialization of `bar`, the identifier `bar` in `foo.js` will be updated to match.

So if you just need to call that binding within, say, a function or a method call, it's typically fine to do so, assuming that by the time that function or method is executed, both modules will have completed their initializations.


And that is similar to require's behavior for properties on the returned object.

   const foo = require("foo"); // starts out as empty object

   exports.bar = function() {
     return foo.fn();
   }


Yes but the difference is that this:

let x = require('foo').x

of course, cannot do that. It's a small difference but it does make it easier to fall into the happy path in more cases.


> Realistically, that's just what happens when a language allows top-level statements, as they are executed when a file is loaded. As such, scripting languages tend to fall victim to the problem, but compiled ones don't.

Lots of languages allow initialization code (e.g. Java static initializer).

But in the case of scripting languages, usually the classes and functions themselves are initialization code, i.e. everything is initialization code.


Go forbids cyclical imports, and, in most cases, I really appreciate it. Makes the code much simpler to reason about


As someone who doesn't use Go, how would one handle a situation where two modules actually depend on each other? For example, a parent object owning a child object is normal, but what if that child object sometimes needs a (weak) reference to the parent?


Just put them in the same module?


What if the child is a user-implemented piece of code and the parent is in a library?


Then there wouldn't be a circular dependency. If the parent is in a library and was written independently of the child, then the parent could not possibly import the child.


> Almost every other module-based language does not have issues with circular dependencies.

I recall having seen a circular dependency compiler error in Go.


wait, is this even possible? I just presume that all import statements are executed first.


Ironically you can solve this with header guards in python as well... Old problem requires old solution


One justification I've seen is that the header defines the interface of a library and allows its implementation to change; the argument reads remarkably like that for pure abstract base classes or what other OOP languages call "interface". The problem, of course, being that C's version of this idea is a bastardized compile-time-only polymorphism.

But from a developer user experience standpoint, what is the difference between having to write things twice for this reason:

    // header.h
    int foo();

    // library.c
    int foo() { return 42; }

Versus writing things twice for this reason:

    // interface.cs
    interface IBar { int foo(); }

    // library.cs
    class Bar : IBar { int foo() { return 42; } }

Granted, in C it was for dumb reasons whereas in a modern language it's for better (?) reasons, but: you're still writing things twice!


> writing things twice

Not in Oberon.


The Ada concept of "package spec" and "package body" does headers the right way.

Expose the types, and the public procedures. Users only need to see the spec during compilation.


Pascal has include files which work very similarly. Cobol has the COPY statement which is commonly used to include shared data structure definitions (“copybooks”).


So modules should actually have been one of the first features added to C++? Because in C the usage of header files is still more or less sane; it's only C++ that drives it too far...


Impossible, given that the original design requirements for C with Classes, while it was being developed at Bell Labs, were to require nothing more than a standard UNIX C toolchain.

Hence name mangling: the only way to add some linking type safety while using a bare-bones UNIX linker that only knows C and assembly.


The problem isn't separate header files, it's textual inclusion. With C++ modules you still have separate module interface units. Separation between interface and implementation is important for enabling circular dependencies (mutual use) between implementations.


I think headers were a great tool from a time before IntelliSense, when developers and teams would include functions + comments on how to use them.

On the flip side, I think there are still a lot of developers who don't use language servers, or who use the equivalent of notepad, who do rely on separate header files as a means of documentation.

That said, it's always been painful to structure inter-related headers and source files to avoid circular dependencies, and if modules truly resolve that, I'm happy to see it.


Conan 2 breaking backwards compatibility with Conan 1 is... rough. In my first Conan experience, everything broke because of that...


>If one would only use modules these errors would simply not exist.

In exchange for reducing parallelism, because you are not using forward declarations to break dependency chains up.


Only for the mindset stuck in the classical UNIX tooling approach; module-based languages have used parallel compilation for a long time.

The big difference is that those communities embrace that the compiler and tooling are part of the same story.

Thankfully we now have a tools working group trying to bring the C++ community into the modern world of module-based compiler toolchains.


Show me a module based language that maintains the same parallelism of compiles.


C#, compiling in parallel since its C++-written compiler was replaced with one written in C#, beating traditional C++ compile times hands down.

Go, compiling code in parallel since Go 1.19. One is able to compile the whole toolchain from scratch faster than many C++ codebases.

Active Oberon, compiling in parallel via the Paco toolchain since 2003, a full graphical workstation OS, built in a couple of minutes.

Ada, parallel compilation available in most toolchains since 1989. Also available in GNAT.

Delphi, yet another example.

It is C++ that needs to get up to date with modern toolchains and away from hacks like unity builds.


The Go compiler does incomparably less work than GCC. Arguably, GCC could have been faster, but this is somewhat an apples-to-oranges comparison. The closest comparison is possibly .NET Native AOT versus GCC, where the compilation times and memory use start not to differ that much, because of the significant amount of work required to properly optimize and link all assemblies/compilation units.

Otherwise, I agree that LLVM and GCC are not the end-all and be-all, and have significant issues with compilation performance.


Notice how you ignored the existence of .NET Native, Ada, and Delphi.

While I could pick other examples, I can't be bothered providing an exhaustive list of every single language with compiler toolchains doing parallel compilation.


.NET Native is dead and when Ada or Delphi were in their prime I did not even have a computer which is why they got no mention :)


It is going to be a struggle. Headers work well for full recompilation, as they provide significant parallelism, but they are very bad for incremental compilation.


+1

I once refactored part of my code so that headers only #include forward declarations (and a few generics, of course), except where really needed (implementation files).

It made compile times 2x faster; it could probably get to 3x if fully done.


C++ is really becoming easier to use!

    compiler error: you have to match the flags of that one dependency you're not familiar with and you didn't even compile yourself, rookie!


You've always needed to do this.

If you link against a static library that's compiled with -D_GLIBCXX_ASSERTIONS=1, it needs to be defined in your code too. Same for the CRT flags with MSVC.


You'd hope that modules would make those issues go away, instead of making them worse.

Also, CPPFLAGS ensures the shared code in headers is the same, whereas with modules it feels as if you have to manually bring the compiler itself into the right state.


ABI-breaking flags break the ABI. There is no way around that.

edit: incidentally, _GLIBCXX_ASSERTIONS is not ABI breaking.


TIL!


No, it has been the same for as long as compiled languages have existed; modules only improve the packaging and type-safety story.

CPPFLAGS ensures no such thing when using binary libraries.


You really just have to compile all of your dependencies yourself (or use pre-compiled distro packages and compile your code with the distro toolchain).

Of course, compiling everything yourself is the only way to get LTO to work well, and you probably want that.


You can use pkg-config for this!

I'm not sure if a lot of people are using pkg-config manually but you can do it easily. If you have a pkg-config file for the library it will handle the flags for you.

I guess a build system would handle this for you also.

Even for "hello world" I've found meson to be useful, and it's only two lines of meson to get "hello world" to compile. I would prefer this to manually compiling by calling "c++ main.c++ -o hello" from the shell.
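For reference, the two-line meson.build being alluded to would look something like this (project and file names are placeholders):

```meson
project('hello', 'cpp')
executable('hello', 'main.c++')
```

`meson setup build && meson compile -C build` then builds it, and dependencies declared via `dependency()` are resolved through pkg-config where available.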


Wouldn’t it make sense to make the BMI file compiler-neutral? I’m not sure there is an actual need for it to be that intimately tied to the compiler. What information is required that's impossible to recreate from the equivalent of a header file?


I think they are just an optimization, like precompiled header files. Theoretically they could probably take the form of C++ module interface units without implementation, but that wouldn't gain you that much in terms of compilation time.


There is a proposal for this: https://github.com/microsoft/ifc-spec

Only Microsoft implements that so far. I gather EDG (which powers IntelliSense) has been at least researching support for consuming IFC files.

I'm not aware of anyone sponsoring work in Clang or GCC to add IFC support.


Yeah, that'd make a lot of sense.

With Fortran mod files I believe it's not even backwards compatible for a single compiler: gfortran.

If it's not standardized, you end up recompiling the world, which defeats the purpose.


I'm hoping C++ modules take off, because getting rid of naive textual inclusion enables mixing of code from different language versions, even if they're slightly incompatible.

It would enable C++ to have an Edition concept like Rust, and be able to start removing warts and poor defaults from the language, without breaking old code, and without needing the whole world to upgrade at once (well, except the modules adoption).


> So far, CMake 3.28 (still a Release Candidate at time of writing) is the only build system that implements dependency scanning and the ability to consume external libraries that provide modules, and the BMIs are built locally rather than distributed.

Hm, build2 was able to do this back in 2021: https://build2.org/blog/build2-cxx20-modules-gcc.xhtml And it was able to do it for both named modules and header units (the mentioned CMake release can only handle named modules).


There's no way build2 did exactly this back in 2021, given that the communication APIs for GCC just landed in its main branch a month ago, and in Clang and GCC only a year ago or so.

How does build2 understand module dependencies without that?


build2 use the module mapper API which was available long before 2021: https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Module-Mapper... You can try the examples here https://github.com/build2/cxx20-modules-examples/ with GCC 13 or 12 to confirm, if you wish.

> given communication APIs for GCC just landed in its main branch a month ago

Can you elaborate on what are these "communication APIs"?


https://wg21.link/p1689

Well, the daemon-based approach won't really work with certain kinds of distributed build and caching systems. I suspect it will work for a lot of use cases though. It's also problematic if you want to parse code without a build system as such. Does every analysis tool and IDE need to add support for that GCC specific API?

But the p1689 approach is implemented the same way for all three major compilers. And it should be reasonably adoptable by anything else that might want to work with CMake and other build systems that choose to support p1689.


> https://wg21.link/p1689

Ah, that. It's just a new -M output. It still cannot handle header units, though, without a major extension to the preprocessor semantics, which so far only Clang has managed to implement (for details, see https://developercommunity.visualstudio.com/t/scanDependenci... ; in particular, notice how it was reported 1.5 years ago but is still unfixed).

> Does every analysis tool and IDE need to add support for that GCC specific API?

That's beside the original point, which was my claim that build2 has supported this since 2021, to which you replied that it couldn't have.


I still think p1689 is a more robust implementation, but I'll stand corrected on the technical details of the build2 implementation of modules.

Viable designs to support header units were still up in the air until six months ago or so. I don't see significant investment in that happening in GCC or Clang, especially with respect to how build systems are supposed to understand how to invalidate BMIs and such. The current trajectory is that we'll consider header units a niche technology or even an unfinished idea.


> I'll stand corrected on the technical details of the build2 implementation of modules.

Thank you.

> I still think the p1689 is a more robust implementation

I agree prescan has advantages, like being easier to integrate into existing build systems/analyzers/IDEs, but robustness is definitely not one of them. What can be more robust than the compiler asking the build system directly during compilation for the information it needs? Compared to the prescan, where the build system first scans the world with one set of command lines, digests the dependencies, and then starts invoking the compiler with another set of command lines.

Also, the mapper approach could be used to address other long-standing issues, like proper generated header support: https://wg21.link/P1842R0


Again, the build system and compiler are often on different systems. The design has a specific drawback in that respect. Even for machine local caching, there are advantages to having the communication with the compiler be unidirectional.


Totally agree that plain header files have many issues, and they are a common source of headache, but can someone explain to me why replacing plaintext, greppable header files with a seemingly very fragile/unportable binary format is a good idea? Surely I must be missing something? I get the sense that they tried to fix complex issues of header files by introducing more complexity. Is it assumed that new tooling will hide this complexity? All genuine questions, I haven't tried out C++ modules myself at all yet.


Yeah this just sounds like precompiled headers, except the only part they actually standardized was the name.


Conan indulges the fantasy that C++ code is packageable. Sorry, in 2023 the only portable way to distribute C++ libraries is still by source. This is why header-only libraries are so desirable.


Portable in what dimensions? There's no portable way to describe how to build an archive containing C++ source code.


I've been using Conan for years with great success. I think you might have missed the documentation: https://docs.conan.io


Conan begs to differ: they distribute binary packages for many options: arch, compiler, debug/release, flags, custom, ...


Conan can beg to differ all they want. Offering source meets all of those needs without having to rely on a third party guessing your needs.


Conan is a corporate packaging framework, which obviously favors binary packages over source-only. And there's no guessing: all the options are hashed, and the binary is identified by this hash. A single option change leads to different binaries.

Even in open source it's extremely important to use proper options (arch, flags, deps in which variant, ...). With something like Debian or Red Hat, they decide on these options. With Conan you can decide for yourself, e.g. use better hardening flags, or a better libc, or a more stable dependency package.


  └-- lib
          |-- cxx
          |   └-- foo.cppm     ---> this is a module interface (does `export module foo`)
          └-- libfoo.a
Hm, .../lib/ normally contains architecture-specific files so I wonder what was the rationale behind installing architecture-independent source code (foo.cppm is a source file) there instead of something architecture-independent like .../include/?


That example isn't a Linux file system; it's a Conan package. Conan packages are built for a single configuration/architecture, and are only used when compiling your project. They live in a local Conan cache folder and are not installed on the system.


lib on Linux contains library files. For native languages this means they contain native code, but not all library files are architecture-dependent. See the Python files at /usr/lib/python<version>.

/usr/include, on the other hand, is only used for C and C++ header files. Modules are not header files.


> but not all library files are architecture-dependent.

While this is definitely true, the presence in /usr/share/ of many files from library packages on my system (Debian) still suggests that architecture-independent files should not go into /usr/lib/.

> See Python files at /usr/lib/python<version>

Aren't the compiled (.pyc) files in there architecture-dependent (or could be; genuine question, I have no idea)?

> /usr/include on the other hand only is used for C and C++ header files. Modules are not header files.

Yes, but it doesn't follow they cannot be installed there if that location is the most suitable from the distribution packaging point of view.


This BMI compiler dependency seems quite the limitation, surely this was a chance to tidy and unify name mangling. So we can’t import binaries across even minor compiler version changes?!


You don’t want unified mangling, because different compilers make different default decisions on laying out data. While that’s mostly been stamped out in C++ (mixed feelings about that), their library implementations are quite different, which is a good thing.


I want unified mangling only if it comes with unified basic types. I want a string (maybe not std::string) that I can use not just in C++, but also in Rust, Ada, Python, C#, Java, and whatever other language is hot this week that is willing to provide some guarantees (this might include something like deterministic destruction, which could toss some of the above out; something to debate). Likewise I want a list type (like std::vector) that I can put arbitrary types in and access from the above languages. The above quickly forces some form of struct type with a layout, and likely also functions (thus classes), though we can debate how much we include.


Seems like you want winrt


So the BMI file is the replacement for the header file, and the module the replacement for the lib? Got it.

I’d like to have the option of a BMI section in the module file itself; that of course would mean that most people would either have to have a writable /lib & /usr/lib etc., or constrain the compiler flags available. I wonder how many people would actually be inconvenienced by the latter.


No. The BMI is more like an object file. It's not likely anyone would ship one.

The lib is still the lib. The header files are replaced with module interfaces. It's just that those are probably going to be shipped with some parsing instructions, like required preprocessor flags, the C++ version used, and that sort of thing.


C++ is getting blue haired language features.

Quick, someone make a mascot!


Are there any modern "package manager" for C?


Conan, Spack, Nix, vcpkg, not to mention the OS specific ones. All are actively maintained. Modern means different things to different people, so I expect some people will still be looking for more answers.


In my head, these are:

- Conan and vcpkg are probably the closest equivalent to other "language package managers" (e.g. Rust's cargo, or NodeJS's NPM).

- Spack is typically used in HPC/Scientific Computing domain.

- Nix (and Guix) are very powerful (system) package managers; these have enough nifty features that I'd call them "next-generation".

Probably also for sufficiently complicated projects, it may be worth mentioning sophisticated build tools like Bazel (which I think manages dependencies in its own way).


Spack dev here. Yep, it's mainly used in HPC, but we'd love to see more folks using it in other areas.

There is Windows support now (in addition to Linux and macOS), which I think was a roadblock for many, and we have gotten some real interest from folks outside the HPC community, e.g. Replica.one, who hope to use it for a software-defined OS for embedded devices: https://www.youtube.com/watch?v=sMxNafpDhng (skip to ~12 min for some discussion of Spack, Nix, Guix, and Portage)


See the other replies for options.

Nobody in the C world wants a package manager for C. We want a package manager for whatever languages we write our code in. A lot of C code is not a single-language C ecosystem but coexists with other languages; even where it is all C, we often have various code-generation tools that look a lot like different languages that just happen to generate C, and sometimes we want to package the input for those tools, not the C output.


Same goes for C++. And successor languages will have limited reach as long as they don't meet this requirement by, for instance, exclusively servicing a language-specific packaging solution.


https://xrepo.xmake.io does a decent job at that, but then you're stuck with a modern "build system" too



