I think Conan and modules will make C++ more popular in the future.
It is a behemoth of a language: very hard to master, and it is still too easy to produce difficult-to-debug errors. So it will be very important that the language changes to address these issues.
But working with C++ day to day, I feel the most annoying things are indeed dealing with dependencies and headers. It can be a pain to set up a complex C++ project on a new machine. Even with Conan it can still be a pain, because configuration management is a mess in C++.
As for headers, I can't wait to see them rot in hell. Some people say they help you reason about your program, but that is a very small positive. Too many times I have had problems with includes that only worked because a translation unit included some other header beforehand. I once read a blog post by the developer of SumatraPDF describing how they never include anything directly[1]. That can be an improvement for compile times, but if someone else has to work on or refactor your code, it can be nearly impossible to find your way in or track down a compile-time error. If one used only modules, these errors would simply not exist.
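For illustration, a minimal sketch of how a C++20 named module avoids that include-order trap (the module name `foo` and the function are made up; the exact compiler invocation varies and typically needs flags like `-std=c++20 -fmodules-ts`):

```cpp
// foo.cppm -- module interface unit
module;              // global module fragment: #includes here stay private
#include <string>
export module foo;   // everything below is owned by module foo

export std::string greet() { return "hello"; }

// main.cpp -- the import brings in exactly foo's exports, independent of
// which headers foo.cppm happened to include, and in any import order
import foo;

int main() { return greet() == "hello" ? 0 : 1; }
```

Because the consumer sees only what `foo` exports, there is no way for `main.cpp` to silently depend on `<string>` having been dragged in transitively.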
> Just bumping the C++ std version can add thousands of lines of code.
And yet none of this heavily influences compile times. At least I have not found it to on the multi-million-LoC projects I've worked on so far. Not sure what the fuss is around modules; I honestly don't think they will solve C++ build times. Happy to be proved otherwise.
Try compiling a small source file that includes a handful of STL headers, and compare to one that doesn’t. Even just including without instantiating any of the templates, the difference is a substantial fraction of a second. That can be a heavy influence, though maybe not in a huge codebase which also has plenty of large headers of its own.
I tried a couple of times to work around the problem you're describing with PCH on some of my past projects. All of them were on the multi-million-LoC scale, and a build from scratch took roughly 45 min to 1 hr on a developer machine. I never managed a substantial cut in build time through PCH, only a very few percent, but YMMV.
Given that modules are PCH in disguise, I don't believe modules are going to have a dramatically bigger, and better, effect on C++ build times. Whether they cut the build time from 45 seconds to 30 seconds, I couldn't care less: small projects don't even have a build-time problem worth solving. Mid-size and big projects are where it will really count, and if modules have a substantial effect, in the range of at least a 20-30% cut in build time, I will be all ears.
Yeah, how much modules/PCHs would help probably depends a lot on your code base. I read about Clang's work on modules, and they seemed a bit uncertain whether you'd get much benefit at all once you switch on optimizations. https://clang.llvm.org/docs/StandardCPlusPlusModules.html#ho...
iostreams are more or less just a straight conversion of everything that C streams were (but nobody ever used) into polymorphic interfaces and templates.
It's a shame they add so much header bloat, but there's a lot of functionality in there and it's hard to see a way it could have been avoided if you want type safety.
It also takes us back to the 1990s: anyone coming from Object Pascal, Smalltalk, or Common Lisp would feel right at home with the iostreams architecture.
As for the header bloat, fear not: import std; for the complete C++ standard library is faster than a plain #include <iostream>, per the VC++ team's measurements.
Most of the sin of iostream is having a humongous overload set for operator<<. AFAIK it's not too bad to just compile the stream headers, but the overload resolution for `std::cout << anything` is expensive.
And after all that effort, an operator overload is still the "wrong" API for the job anyway, because it's missing a crucial argument! Any binary (infix) operator can only relate two things, but typically when you want to print something you actually want to relate three: the stream, the object or value to print, and the way you want the value represented.
Oh, sure, you can manipulate this by inserting magic formatting objects, but that's not different from saying all functions take one argument a la Category Theory or Haskell without syntactic sugar. Or you could write a class or function that allows you to tag your object or value by changing its type to a different one so a different operator<< overload is called. In the iostreams design, you're only indirectly selecting which print function to use by manipulating either the stream or the printed object. But in actuality the types of your two arguments are not different, so either you must munge global settings on the stream or "hack" which operator<< overload is selected.
And that's the crux of the issue; what you "really want" is a stream API with three arguments: the stream, the object or value, and a pointer to a function (or function object) that takes the value and transforms it into a stream of characters. Why not just pass that function directly and explicitly? Of course that requires something that's not just a single chained infix operator; my thesis being that starting with that quirky choice leads you down a path that ends in a bad design.
> what you "really want" is a stream API with three arguments: the stream, the object or value, and a pointer to a function (or function object) that takes the value and transforms it to a stream of characters.
Or maybe a format object that returns a string in the way described by some directive, the way printf() worked in C or format conversions work in Python? You could call it something like std::format and make it available through a standard header like <format>.
When you consider that most popular modern languages long ago abandoned the idea of a separate header file, you see that the advantages far outweigh the disadvantages.
It was only ever a thing in the first place because, almost half a century ago, some people decided to write a one-pass compiler due to storage/memory limitations. Even 40+ years ago, C was the only popular language that required them; never addressing the issue is practically inexcusable.
Not sure what ELF and the C type system have to do with this. C type information can be stored in ELF (as well as many other things); that is how a debugger can know the types.
It is simply a language design flaw of C++ that class implementation (not just templates) ended up in the headers.
That's not a problem with modules, per se, but a problem with Python's module system. Importing a module in Python just executes the whole file. JavaScript used to have this issue with `require` years ago. Each `require` statement essentially executed that import right then and there, and would bug out if circular dependencies existed. However, they fixed it with `import`.
Almost every other module-based language does not have issues with circular dependencies. Python, in theory, could follow their lead, but they won't.
EDIT: I wasn't as clear as I could have been. The issue again isn't modules, but scripting languages that allow top-level statements at all. Intermixing type and method declarations with executable code makes circular dependencies an issue. Compiled languages don't allow top-level statements, so they don't have the dependency resolution problem.
That's true. I guess I misspoke and wasn't as clear as I could have been. What I meant to say was: the problem with Python and JavaScript's module systems isn't modules themselves, but top-level statements. You are correct that `import` doesn't solve that problem; it solves the synchronous dependency resolution problem.
Realistically, that's just what happens when a language allows top-level statements, as they are executed when a file is loaded. As such, scripting languages tend to fall victim to the problem, but compiled ones don't.
Worth noting that ES-style imports do improve the scenario for cyclic dependencies though, assuming you do not need to reference the symbols imported during the execution of the top level statements. This is because `import { bar } from './bar.js'` makes `bar` a "live binding" -- even if `bar.js` also imports `foo.js` and thus `bar` may initially be undefined due to the cyclic dependency, later when `bar.js` completes its initialization of `bar`, the identifier `bar` in `foo.js` will be updated to match.
So if you just need to call that binding within, say, a function or a method call, it's typically fine to do so, assuming that by the time that function or method is executed, both modules will have completed their initialization.
> Realistically, that's just what happens when a language allows top-level statements, as they are executed when a file is loaded. As such, scripting languages tend to fall victim to the problem, but compiled ones don't.
Lots of languages allow initialization code (e.g. Java static initializer).
But in the case of scripting languages, usually the classes and functions themselves are initialization code, i.e. everything is initialization code.
As someone who doesn't use Go, how would one handle a situation where two modules actually depend on each other? For example, a parent object owning a child object is normal, but what if that child object sometimes needs a (weak) reference to the parent?
Then there wouldn't be a circular dependency. If the parent is in a library and was written independently of the child, then the parent could not possibly import the child.
One justification I've seen is that the header defines the interface of a library and allows its implementation to change; the argument reads remarkably like that for pure abstract base classes or what other OOP languages call "interface". The problem, of course, being that C's version of this idea is a bastardized compile-time-only polymorphism.
But from a developer user experience standpoint, what is the difference between having to write things twice for this reason:
// header.h
int foo();
// library.c
int foo() { return 42; }
Versus writing things twice for this reason:
// interface.cs
interface IBar { int foo(); }
// library.cs
class Bar : IBar { public int foo() { return 42; } }
Granted in C it was for dumb reasons whereas in a modern language it's for better (?) reasons, but: You're still writing things twice!
Pascal has include files which work very similarly. Cobol has the COPY statement which is commonly used to include shared data structure definitions (“copybooks”).
So modules should actually have been one of the first features added to C++? Because in C the usage of header files is still more or less sane; it's only C++ that drives it too far...
Impossible, given that the original design requirements for C with Classes, while it was being developed at Bell Labs, required nothing more than a standard UNIX C toolchain.
Hence name mangling, the only way to add some link-time type safety while using a bare-bones UNIX linker that only knows C and assembly.
The problem isn't separate header files, it's textual inclusion. With C++ modules you still have separate module interface units. Separation between interface and implementation is important for enabling circular dependencies (mutual use) between implementations.
I think headers were a great tool from a time before IntelliSense, when developers and teams would include the function plus comments on how to use it.
On the flip side, I think there are still a lot of developers who don't use language servers or use the equivalent of notepad who do rely on separate header files as a means of documentation.
That said, it's always been painful to structure inter-related headers and source files to avoid circular dependencies, and if modules truly resolve that, I'm happy to see it.
The Go compiler does incomparably less work than GCC. Arguably, GCC could have been faster, but this is somewhat an apples-to-oranges comparison. A closer one is perhaps comparing .NET Native AOT and GCC, where the compilation times and memory use start not to differ that much, because of the significant amount of work required to properly optimize and link all assemblies/compilation units.
Otherwise, I agree that LLVM and GCC are not the be-all and end-all, and have significant issues with compilation performance.
Notice how you ignored the existence of .NET Native, Ada, Delphi.
While I can pick other examples, I can't be bothered providing an exhaustive list of every single language with compiler toolchains doing parallel compilation.
It is going to be a struggle. Headers work well for full recompilation, as they provide significant parallelism, but they are very bad for incremental compilation.
I once refactored part of my code so that headers only #include forward declarations (and a few generics, of course), except where really needed (in the implementation).
It made compile times 2x faster; it could probably reach 3x if fully done.
If you link against a static library that was compiled with -D_GLIBCXX_ASSERTIONS=1, it needs to be defined in your code too. Same for the CRT flags with MSVC.
You'd hope that modules would make those issues go away, instead of making them worse.
Also, CPPFLAGS ensure the shared code in headers is the same, whereas with modules it feels as if you have to manually bring the compiler itself in the right state.
You really just have to compile all of your dependencies yourself (or use pre-compiled distro packages and compile your code with the distro toolchain).
Of course, compiling everything yourself is the only way to get LTO to work well, and you probably want that.
I'm not sure if a lot of people use pkg-config manually, but you can do it easily. If you have a pkg-config file for the library, it will handle the flags for you.
I guess a build system would handle this for you also.
Even for "hello world" I've found meson to be useful, and it's only two lines of meson to get "hello world" to compile. I would prefer this to manually compiling by calling "c++ main.c++ -o hello" from the shell.
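For reference, the whole meson.build for that hello-world case really is just two lines (project and target names are arbitrary):

```meson
project('hello', 'cpp')
executable('hello', 'main.c++')
```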
Wouldn’t it make sense to make the BMI file compiler-neutral? I’m not sure there is an actual need for them to be that intimately tied to the compiler. What information that’s impossible to recreate from the equivalent of a header file is required?
I think they are just an optimization, like precompiled header files. Theoretically they could probably take the form of C++ module interface units without implementation, but that wouldn't gain you that much in terms of compilation time.
I'm hoping C++ modules take off, because getting rid of naive textual inclusion enables mixing of code from different language versions, even if they're slightly incompatible.
It would enable C++ to have an edition concept like Rust's, and to start removing warts and poor defaults from the language without breaking old code, and without needing the whole world to upgrade at once (well, except for adopting modules).
> So far, CMake 3.28 (still a Release Candidate at time of writing) is the only build system that implements dependency scanning and the ability to consume external libraries that provide modules, and the BMIs are built locally rather than distributed.
Hm, build2 was able to do this back in 2021: https://build2.org/blog/build2-cxx20-modules-gcc.xhtml And it was able to do it for both named modules and header units (the mentioned CMake release can only handle named modules).
There's no way build2 did exactly this back in 2021, given that the communication APIs for GCC just landed in its main branch a month ago, and in Clang and GCC only a year ago or so.
How does build2 understand module dependencies without that?
Well, the daemon-based approach won't really work with certain kinds of distributed build and caching systems. I suspect it will work for a lot of use cases though. It's also problematic if you want to parse code without a build system as such. Does every analysis tool and IDE need to add support for that GCC specific API?
But the p1689 approach is implemented the same way for all three major compilers. And it should be reasonably adoptable by anything else that might want to work with CMake and other build systems that choose to support p1689.
Ah, that. It's just a new -M output. It still cannot handle header units, though, without a major extension to the preprocessor semantics, which so far only Clang has managed to implement (for details, see https://developercommunity.visualstudio.com/t/scanDependenci... ; in particular, notice how it was reported 1.5 years ago but is still unfixed).
> Does every analysis tool and IDE need to add support for that GCC specific API?
That's beside the original point, which was that I claimed build2 had supported this since 2021, to which you replied that it couldn't have.
I still think the p1689 is a more robust implementation, but I'll stand corrected on the technical details of the build2 implementation of modules.
Viable designs to support header units were still up in the air until six months ago or so. I don't see significant investment in that happening in GCC or Clang still, especially with respect to how build systems are supposed to understand how to invalidate BMIs and such. The current trajectory is that we'll consider header units a niche technology, or even an unfinished idea.
> I'll stand corrected on the technical details of the build2 implementation of modules.
Thank you.
> I still think the p1689 is a more robust implementation
I agree the prescan has advantages, like being easier to integrate into existing build systems/analyzers/IDEs, but robustness is definitely not one of them. What can be more robust than the compiler asking the build system directly, during compilation, for the information it needs? Compare that to the prescan, where the build system first scans the world with one set of command lines, digests the dependencies, and then starts invoking the compiler with another set of command lines.
Also, the mapper approach could be used to address other long-standing issues, like proper generated header support: https://wg21.link/P1842R0
Again, the build system and compiler are often on different systems. The design has a specific drawback in that respect. Even for machine local caching, there are advantages to having the communication with the compiler be unidirectional.
Totally agree that plain header files have many issues, and they are a common source of headache, but can someone explain to me why replacing plaintext, greppable header files with a seemingly very fragile/unportable binary format is a good idea? Surely I must be missing something? I get the sense that they tried to fix complex issues of header files by introducing more complexity. Is it assumed that new tooling will hide this complexity?
All genuine questions, I haven't tried out C++ modules myself at all yet.
Conan indulges the fantasy that C++ code is packageable. Sorry, in 2023 the only portable way to distribute C++ libraries is still by source. This is why header-only libraries are so desirable.
Conan is a corporate packaging framework, which obviously favors binary packages over source-only. And there's no guessing: all the options are hashed, and the binary is based on this hash. A single option change leads to a different binary.
Even in open source it's extremely important to use proper options (arch, flags, which variant of each dependency, ...). With something like Debian or Red Hat, they decide on these options. With Conan you can decide yourself, e.g. use better hardening flags, a better libc, or a more stable dependency package.
└-- lib
    |-- cxx
    |   └-- foo.cppm ---> this is a module interface (does `export module foo`)
    └-- libfoo.a
Hm, .../lib/ normally contains architecture-specific files so I wonder what was the rationale behind installing architecture-independent source code (foo.cppm is a source file) there instead of something architecture-independent like .../include/?
That example isn't a linux file system, it's a Conan package. Conan packages are built for a single configuration/architecture, and only used when compiling your project. They live in a local Conan cache folder, and not installed on the system.
lib on Linux contains library files. For native languages this means native code, but not all library files are architecture-dependent. See the Python files at /usr/lib/python<version>
/usr/include, on the other hand, is only used for C and C++ header files. Modules are not header files.
> but not all library files are architecture-dependent.
While this is definitely true, the presence in /usr/share/ of many files from library packages on my system (Debian) still suggests that architecture-independent files should not go into /usr/lib/.
> See Python files at /usr/lib/python<version>
Aren't the compiled (.pyc) files in there architecture-dependent (or could be; genuine question, I have no idea)?
> /usr/include on the other hand only is used for C and C++ header files. Modules are not header files.
Yes, but it doesn't follow they cannot be installed there if that location is the most suitable from the distribution packaging point of view.
This BMI compiler dependency seems quite the limitation, surely this was a chance to tidy and unify name mangling. So we can’t import binaries across even minor compiler version changes?!
You don’t want unified mangling, because different compilers make different default decisions on laying out data. While that’s mostly been stamped out in C++ (mixed feelings about that), their library implementations are quite different, which is a good thing.
I want unified mangling only if it comes with unified basic types. I want a string (maybe not std::string) that I can use not just in C++ but also in Rust, Ada, Python, C#, Java, and whatever other language is hot this week that is willing to provide some guarantees (this might include something like deterministic destruction, which could toss some of the above out; something to debate). Likewise, I want a list type (like std::vector) that I can fill with arbitrary types and access from the above languages. The above quickly forces some form of struct type with a defined layout, and likely also functions (thus classes), though we can debate how much we include.
So the bmi file is the replacement for the header file and the module the replacement for the lib? Got it.
I’d like to have the option to have a BMI section in the module file itself; that of course would mean that most people would either have to have a writable /lib & /usr/lib etc., or constrain the compiler flags available. I wonder how many people would actually be inconvenienced by the latter.
No. The BMI is more like an object file. It's not likely anyone would ship one.
The lib is still the lib. The header files are replaced with module interfaces. It's just that those are probably going to be shipped with some parsing instructions, like required preprocessor flags, the C++ version used, and that sort of thing.
Conan, Spack, Nix, vcpkg, not to mention the OS specific ones. All are actively maintained. Modern means different things to different people, so I expect some people will still be looking for more answers.
- Conan and vcpkg are probably the closest equivalent to other "language package managers" (e.g. Rust's cargo, or NodeJS's NPM).
- Spack is typically used in HPC/Scientific Computing domain.
- Nix (and Guix) are very powerful (system) package managers; these have enough nifty features that I'd call them "next-generation".
Also, for sufficiently complicated projects, it may be worth mentioning sophisticated build tools like Bazel (which I think manages dependencies in its own way).
Spack dev here. Yep, it's mainly used in HPC, but we'd love to see more folks using it in other areas.
There is Windows support now (in addition to Linux and macOS), which I think was a roadblock for many, and we have gotten some serious interest from folks outside the HPC community, e.g. Replica.one, who hope to use it for a software-defined OS for embedded devices: https://www.youtube.com/watch?v=sMxNafpDhng (skip to ~12min for some discussion of Spack, Nix, Guix, and portage)
Nobody wants a package manager for C in the C world. We want a package manager for whatever language we write our code in. A lot of C code is not a single-language C ecosystem but coexists with other languages; even where it is all C, we often have various code-generation tools that look a lot like different languages that just happen to generate C, and sometimes we want to package the input for those tools, not the C output.
The same goes for C++. And successor languages will have limited reach as long as they don't get this requirement right, for instance by exclusively servicing a language-specific packaging solution.
[1]https://blog.kowalczyk.info/article/96a4706ec8e44bc4b0bafda2...