In short: I wanted to talk a bit about ASN.1, a bit about D, and a bit about the compiler itself, but couldn't think of any real cohesive format.
So I threw a bunch of semi-related ramblings together and I'm daring to call it a blog post.
Sorry in advance since I will admit it's not the greatest quality, but it's really not easy to talk about so much with such brevity (especially since I've already forgotten a ton of stuff I wanted to talk about more deeply :( ).
A small nitpick: I don’t think your intersection example does what you want it to do. Perhaps there’s some obscure difference in “PER-visibility” or whatnot, but at least set-theoretically,
LegacyFlags2 ::= INTEGER (0 | 2 ^ 4..8) -- as in the article
is exactly equivalent to
LegacyFlags2 ::= INTEGER (0) -- only a single value allowed
as (using standard mathematical notation and making precedence explicit) {0} ∪ ({2} ∩ {4,5,6,7,8}) = {0} ∪ ∅ = {0}.
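For anyone who wants to check the arithmetic mechanically, here is a minimal Python sketch. It assumes the reading used above (`|` is union, `^` is intersection, and intersection binds tighter); the variable names are only for illustration and say nothing about PER-visibility.

```python
# Value sets for the constraint INTEGER (0 | 2 ^ 4..8),
# read as {0} ∪ ({2} ∩ {4..8}) with intersection binding tighter than union.
zero = {0}
two = {2}
four_to_eight = set(range(4, 9))  # 4..8 inclusive

# Python's & (intersection) also binds tighter than | (union);
# the parentheses just make the grouping explicit.
allowed = zero | (two & four_to_eight)

print(allowed)
```

The intersection term is empty because 2 is not in 4..8, so only 0 survives.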
At least you might be summoning Walter Bright in talking about D. One of my favorite languages I wish more companies would use. Unfortunately for its own sake, Go and Rust are way more popular in the industry.
C#, Java and C++ have poor copies of the D features. For example, C++ constexpr is a bad design because a special keyword to signify evaluating at compile time is completely redundant. (Just trigger on constant expressions in the grammar.)
C++ modules, well, they should have more closely copied D modules!
They might, but they are still good enough to keep people in those ecosystems, so nobody bothers with D and its current adoption warts around IDE tooling and available libraries.
I wouldn't say that it's unable to make a comeback, there is still a valid use case from my experience with it. The syntax, mixed-memory model, UFCS, and compilation speed are nice quality of life features compared to C++, and it's still a native binary compared to C# and Java. So if you're starting out with a new project from scratch there's not much reason not to beyond documentation reasons. And you can interface pretty easily to C/C++ as well as pretty much any other language designed for that sort of thing, but without a lot of syntax changes like Carbon.
I imagine that the scope of its uses has shrunk as other languages caught up, and I don't think it's necessarily a good language for general enterprise stuff (unless you're dealing with C++), but for new projects it's still valid IMO. I think that the biggest field it could be used in is probably games too, especially if you're already writing a new engine from scratch. You could start with the GC and then ease off of it as the project develops in order to speed up development, for example. And D could always add newer features again too, tbh.
You always have to compare ecosystems, not programming language syntax on its own.
Another thing Java and C# have gained since 2011 is that AOT compilation is also part of the ecosystem and free of charge (Java has had commercial compilers for a while), so not even a native binary is the advantage you imagine.
First D has to finish what is already there in features that are almost done but not quite.
TBH I don't necessarily think that ecosystem is what matters in every application, but it is necessary for most people, I agree. And I do agree with finishing a lot of the half-baked features too, but I'm unsure if the people maintaining the language have the will or the means to do that.
Do you have any other ideas about how D could stand out again?
I feel like back when D might've been a language worth looking into, it was hampered by the proprietary compilers.
And still today, the first thought that comes to mind when I think D is "that language with proprietary compilers", even though there has apparently been some movement on that front? Not really worth looking into now that we have Go as an excellent GC'd compiled language and Rust as an excellent C++ replacement.
Having two different languages for those purposes seems like a better idea anyway than having one "optionally managed" language. I can't even imagine how that could possibly work in a way that doesn't just fragment the community.
But that's only the reference compiler, DMD. The other two compilers were fully open source (including gcc, which includes D) before that.
Fully disagree on your position that having all possibilities with one language is bad. When you have a nice language, it's nice to write with it for all things.
Sounds like you should look into it instead of idly speculating! Also, the funny thing about a divisive feature is that it doesn't matter if it fragments the community if you can use it successfully. There are a lot of loud people in the D community who freak out and whine about the GC, and there are plenty more quiet ones who are happily getting things done without making much noise. It's a great language.
Go is a GC language that has eaten a chunk of the industry (Docker, TypeScript, Kubernetes... Minio... and many more I'm sure) and only some people cry about it, but you know who else owns sizable chunks of the industry? Java and C# which are both GC languages. While some people waste hours crying about GCs the rest of us have built the future around it. Hell, all of AI is eaten by Python another GC language.
And in D, there's nothing stopping from either using or not using the GC. One of the key features of D is that it's possible to mix and match different memory management strategies. Maybe I have a low level computational kernel written C-style with memory management, and then for scripting I have a quick and dirty implementation of Scheme also written in D but using the GC. Perfectly fine for those two things to co-exist in the same codebase, and in fact having them coexist like that is useful.
> And in D, there's nothing stopping from either using or not using the GC.
Wait so are you, or are you not, saying that a GC-less D program can use libraries written with the assumption that there's a GC? The statement "there's nothing stopping [you] from not using the GC" implies that all libraries work with D-without-GC, otherwise the lack of libraries written for D-without-GC would be stopping you from not using the GC
Sorry, but it's more complicated than this. I understand the point you're making, but if your desideratum for using a language is "If any feature prevents me from using 100% of the libraries that have been written for the language, then the language is of no use to me", well... I'm not sure what to tell you.
It's not all or nothing with the GC, as I explained in another reply. There are many libraries that use the GC, many that don't. If you're writing code with the assumption that you'll just be plugging together libraries to do all the heavy lifting, D may not be the right language for you. There is a definite DIY hacker mentality in the community. Flexibility in attitude and approach are rewarded.
Something else to consider is that the GC is often more useful for high level code while manual memory management is more useful for low level code. This is natural because you can always use non-GC (e.g. C) libraries from GC, but (as you point out) not necessarily the other way around. That's how the language is supposed to be used.
You can use the GC for a GUI or some other loose UI thing and drop down to tighter C style code for other things. The benefit is that you can do this in one language as opposed to using e.g. Python + C++. Debugging software written in a mixture of languages like this can be a nightmare. Maybe this feature is useful for you, maybe not. All depends on what you're trying to do.
You're the guy who said that "nothing is preventing you" from using D without a GC. A lack of libraries which work without a GC is something preventing you from using D without a GC. Just be honest.
I just said there are loads of libraries which have been written explicitly to work with the GC disabled. Did you read what I wrote?
It looks like you work in a ton of different domains. I think based on what I've written in response to you so far, it should be easy to see that D is likely a good fit for some things you work on and a bad fit for others. I don't see what the problem is.
The D community is full of really nice and interesting people who are fun to interact with. It's also got a smaller number of people who complain loudly about the GC. This latter contingent is perceived as being maybe a bit frustrating and unreasonable.
I don't care whether you check D out or not. But your initial foray into this thread was to cast shade on D by mentioning issues with proprietary compilers (hasn't been a thing in years), and insinuating that the community was fractured because of the GC. Since you clearly don't know that much about the language and have no vested interest in it, why not leave well enough alone instead of muddying the waters with misleading and biased commentary?
My conclusion remains that the language is fractured by the optional GC and that its adoption was severely hampered by the proprietary toolchain. Nothing you have said meaningfully challenges that in my opinion, and many things you've said support it. I don't think anything I've said is misleading.
Well, no, it isn't. There's some frustration. But mostly what I've seen is a lot of lengthy conversations with a lot of give and take. There's a small number of loud people who complain intensely, but there are also people who are against the GC who write lots of libraries that avoid it and even push the language in a healthy direction. If these people hadn't argued so strenuously against the GC, I doubt Phobos would have started moving in the direction of using the GC less and less. This is actually the sign of a healthy community.
I'm not strongly for or against (non-deterministic) GC. Deterministic GC in Rust or the (no true Scotsman) correctly written C++ has benefits, but often I don't care and Go / Java / C# / Python are all fine.
I think you're really overstepping with "AI is eaten by Python". I can imagine an AI stack without Python: llama.cpp (for inference, not training) isn't completely that, but most of its core functionality is not Python, and not GC'd at all. I cannot imagine an AI stack without CUDA + C++. Even the premier Python tools (PyTorch, vLLM) would be non-functional without these tools.
While some very common interfaces to AI require a GC'd language, I think if you deleted the non-GC parts you'd be completely stuck and spend years digging yourself out, but if you deleted the 'GC' parts you could end up with a usable thing in very short order.
NVidia has decided that the market demand to do everything in Python justifies the development cost of making Python fast in CUDA.
Thus now you can use PTX directly from Python, and with the new cu Tiles approach, you can write CUDA kernels in a Python subset.
Many of these tools get combined because that is what is already there, and the large majority of us don't want to, or don't have the resources to, bootstrap a whole new world.
Until there is some monetary advantage in doing so.
For 99% of the industry use cases, some kind of GC is good enough, and even when that isn't the case, there is no need to throw the baby out with the bathwater; a two-language approach also works.
Unfortunately those that cry about GCs are still quite vocal; at least we can now throw Rust their way.
There's a place for GC languages, and there's a place for non-GC languages. I don't understand why you seem so angry towards people who write in non-GC languages.
Are you saying that if I'm using D-without-GC, I can use any D library, including ones written with the assumption that there is a GC? If not, how does it not fracture the community?
> There are a lot of loud people in the D community who freak out and whine about the GC, and there are plenty more quiet ones who are happily getting things done without making much noise
This sounds like an admission that the community is fractured, except with a weirdly judgemental tone towards those who use D without a GC?
> Are you saying that if I'm using D-without-GC, I can use any D library, including ones written with the assumption that there is a GC? If not, how does it not fracture the community?
"Are you saying that if I'm using Rust in the Linux kernel, I can use any Rust library, including ones written with the assumption they will be running in userspace? If not, how does that not fracture the community?"
"Are you saying that if I'm using C++ in an embedded environment without runtime type information and exceptions, I can use any C++ library, including ones written with the assumption they can use RTTI/exceptions? If not, how does that not fracture the community?"
You can make this argument about a lot of languages and particular subsets/restrictions on them that are needed in specific circumstances. If you need to write GC-free code in D you can do it. Yes, it restricts what parts of the library ecosystem you can use, but that's not different from any other language that has wide adoption in a wide variety of applications. It turns out that in reality most applications don't need to be GC-free (the massive preponderance of GC languages is indicative of this) and GC makes them much easier and safer to write.
I think most people in the D community are tired of people (especially outsiders) constantly rehashing discussions about GC. It was a much more salient topic before the core language supported no-GC mode, but now that it does it's up to individuals to decide what the cost/benefit analysis is for writing GC vs no-GC code (including the availability of third-party libraries in each mode).
The RTTI vs no-RTTI thing and the exceptions vs no-exceptions thing definitely do fracture the C++ community to some degree, and plenty of people have rightly criticized C++ for it.
> If you need to write GC-free code in D you can do it.
This seems correct, with the emphasis. Plenty of people make it sound like the GC in D is no problem because it's optional, so if you don't want GC you can just write D without a GC. It's a bit like saying that the stdlib in Rust is no problem because you can just use no_std, or that exceptions in C++ are no problem because you can just use -fno-exceptions. All these things are naïve for the same reason: it locks you out of most of the ecosystem.
> This sounds like an admission that the community is fractured, except with a weirdly judgemental tone towards those who use D without a GC?
That's not what I'm saying, and who cares if it's fractured or not? Why should that influence your decision making?
There are people who complain loudly about the GC, and then there are lots of other people who do not complain loudly and also use D in many different interesting ways. Some use the GC, some don't. People get hyper fixated on the GC, but it isn't the only thing going on in the language.
What domain are you working in? It's hard to be able to say anything specific that might help you if you can't explain a little more clearly what you're working on.
I will say there are a larger and larger number of no-GC libraries. Phobos is getting an overhaul which is going to seriously reduce the amount that the GC is used.
It is probably also worth reflecting for a moment why the GC is causing problems for you. If you're doing something where you need some hard-ish realtime guarantee, bear in mind that the GC only collects when you allocate.
It's also possible to roll your own D runtime. I believe people have done things like replace the GC with allocators. The interface to the GC is not unduly complicated. It may be possible to come up with a creative solution to your problem.
I work in many domains, some where a GC is totally no problem, and some where I'd rather not have a GC (such as microcontrollers, games, the occasional kernel code, software running on embedded Linux devices with very limited memory).
I'm happy with C++ and Rust for those tasks. For other tasks where I want a GC, I'm perfectly happy with some combination of Go, Python and Typescript. I'm realistically never going to look into D.
I'm glad you've found a nice stack that you like working with! D has been used for every domain mentioned. Whether or not you look into D in the future is no business of mine.
Once one realizes that the GC is just another way to allocate memory in D, it becomes quite wonderful to have a diverse collection of memory management facilities at hand. They coexist quite smoothly. Why should programs be all GC or no GC? Why should you have to change languages to switch between them?
Indeed the GC is just a library with some helpful language hooks to make the experience nice.
If you understand how it's hooked into, it's very easy to work with. There is only one area of the language related to closure context creation that can be unexpected.
I don't think the proprietary compilers were a true setback; look at, for example, C# before it became as open as .NET has become today (MIT licensed!), and yet the industry took it up. I think what D needed was what made Ruby mainly relevant: Rails. D needs a community framework that makes it a strong candidate for a specific domain.
I honestly think if Walter Bright (or anyone within D) invested in having a serious web framework for D, even if it's not part of the standard library, it could be worth its weight in gold. Right now there's only Vibe.d that stands out, but I have not seen it grow very much since its inception; it's very slow moving. Give me a feature-rich web framework in D comparable to Django or Rails and all my side projects will shift to D. The real issue is it needs to be batteries included, since D does not have dozens of OOTB libraries to fill in gaps with.
Look at Go as an example: a built-in HTTP server library, production ready. It's not ultra fancy, but it does the work.
There are plenty of people who aren't interested in using languages with proprietary toolchains. Those people typically don't use C#. The people who don't mind proprietary toolchains typically write software for an environment where D isn't relevant, such as .NET or the Apple world.
I do agree with you that there needs to be a good framework though. Either in Web or Games. Web because it's more familiar than Go but also has Fibers, and Games because it's an easier C++. There is also Inochi2D which looks rather professional: https://inochi2d.com/
One of the issues I've seen in the community is just that there aren't enough people in the community with enough interest and enough spare time to spend on a large project. Everyone in the core team is focused on working on the actual language (and day-jobs), while everyone else is doing their own sort of thing.
From your profile you seem to have a lot of experience in the field and in software in general, so I'd like to ask you if you have any other advice for getting the language un-stuck, especially with regards to the personnel issues. I think I'd like to take up your proposal for a web framework as well, but I don't really have any knowledge of web programming beyond the basics. Do you have any advice on where to start or what features/use case would be best as well?
Getting a web framework into the standard library is something I want to get working, along with a windowing library.
Currently we need to get a stackless coroutine into the language, actors for windowing event handling, reference counting and a better escape analysis story to make the experience really nice.
This work is not scheduled for PhobosV3 but a subset such as a web client with an event loop may be.
Lately I've been working on some exception handling improvements and starting on the escape analysis DFA (but not on the escape analysis itself). So the work is progressing. The stackless coroutine proposal needs editing, but it is intended to be done at the start of next year for the approval process.
As someone who had the displeasure of working with ASN.1 data (yes, certificates), I fully sympathise with the anguish you've gone through (those 6 months of Ansible HR comments cracked me up too :D ).
Bzzt! Wrong! I have worked with ASN.1 for many years, and I love ASN.1. :)
Really, I do.
In particular I like:
- that ASN.1 is generic, not specific to a given encoding rules (compare to XDR, which is both a syntax and a codec specification)
- that ASN.1 lets you get quite formal if you want to in your specifications
For example, RFC 5280 is the base PKIX spec, and if you look at RFCs 5911 and 5912 you'll see the same types (and those of other PKIX-related RFCs) with more formalisms. I use those formalisms in the ASN.1 tooling I maintain to implement a recursive, one-shot codec for certificates in all their glory.
- that ASN.1 has been through the whole evolution of "hey, TLV rules are all you need and you get extensibility for free!!1!" through "oh no, no that's not quite right is it" through "we should add extensibility functionality" and "hmm, tags should not really have to appear in modules, so let's add AUTOMATIC tagging" and "well, let's support lots of encoding rules, like non-TLV binary ones (PER, OER) and XML and JSON!".
Protocol Buffers is still stuck on TLV, all done badly by comparison to BER/DER.
Yeah I know I'm making fun of it a lot (mostly in jest) but it genuinely is a really interesting specification, and it's definitely sad - but not surprising - it's not a very popular choice outside of its few niche areas.
:) Glad to see someone else who's gone down this road as well.
I feel the experience of many people writing with ASN.1 is that of dealing with PKI or telecom protocols, which attempt to build worldwide interop between actually very different systems. The spec is one thing, but implementing it by the book is not sufficient to get something actually interoperable, there are a ton of quirks to work around.
If it was being used in homogenous environments the way protocol buffers typically are, where the schemas are usually more reasonable and both read and write side are owned by the same entity, it might not have gotten such a bad rap...
How do you feel about something like CBOR? In which stage would you say it's stuck in evolution compared to ASN.1 (since you said Protobuf is still TLV)?
CBOR and JSON are just encodings, not schema, though there are schemas for them. I've not looked at their schema languages but I doubt they support typed hole formalisms (though they could be added as it's just schema). And since CBOR and JSON are just encodings, they are stuck being what they are -- new encodings will have compatibility problems. For example, CBOR is mostly just like JSON but with a few new types, but then things like jq have to evolve too or else those new types are not really usable. Whereas ASN.1 has much more freedom to introduce new types and new encoding rules because ASN.1 is schema and just because you introduce a new type doesn't mean that existing code has to accept it since you will evolve _protocols_. But to be fair JSON is incredibly useful sans schema, while ASN.1 is really not useful at all if you want to avoid defining modules (schemas).
I worked with ASN.1 for a few years in the embedded space because it's used for communications between aircraft and air traffic control in Europe [1]. I enjoyed it. BER encoding is pretty much the tightest way to represent messages on the wire and when you're charged per-bit for messaging, it all adds up. When a messaging syntax is defined in ASN.1 in an international standard (ICAO 9880 anyone?), it's going to be around for a while. Haven't been able to get my current company to adopt ASN.1 to replace our existing homegrown serialization format.
As a former PKI enthusiast (tongue firmly in cheek with that description) I can say if you can limit your exposure to simply issuing certs so you control the data and thus avoid all edge cases, quirks, non-canonical encodings, etc, dealing with ASN.1 is “not too terrible.” But it is bad. The thing that used to regularly amaze me was the insane depths of complexity the designers went to … back in the 70’s! It is astounding to me that they managed to make a system that encapsulated so much complexity and is still in everyday use today.
ASN.1 is from the mid-80s, and PKI is from the late 80s.
The problems with PKI/PKIX all go back to terrible, awful, no good, very bad ideas about naming that people in the OSI/European world had in the 80s -- the whole x.400/x.500 naming style where they expected people to use something like street addresses as digital names. DNS already existed, but it seems almost like those folks didn't get the memo, or didn't like it.
They got grant money to work on anything but TCP/IP. :-) A lot of European oral history about how "the Internet" got to a Uni talks about how they were supposed to only use ISO/OSI but eventually unofficially installed IP anyway.
There's the other story of corporate vendors saying "yes, we will implement OSI, give us X time but buy our product now and we will deliver OSI" then actually going "we mangled BSD Sockets enough to work if you squint enough, let's try to wait the client out while raking in the profit".
Unless you need things like ability to address groups in flexible ways, which is why X.400 survives in various places (in addition to actually supporting inline cryptography and binary attachments).
What people forget is that you do not have to use the whole set of schema attributes.
Does Internet email not support binary attachments? Of course it does.
And encrypted and/or signed email? That too, though very poorly, but the issue there is key management, and DAP/LDAP don't help because in the age of spam public directories are not a thing. Right now the best option for cryptographic security for email is hop-by-hop encryption using DANE for authentication in SMTP, with headers for requesting this, and headers for indicating whether received email transited with cryptographic protection all the way from sender to recipient.
As for the "ability to address groups in flexible ways", I'm not sure what that means, but I've never seen group distribution lists not be sufficient.
No we're not. We're using dNSName subjectAlternativeName values. We used to use the CN attribute of the subject DN, and... there is still code for that, but it's obsolete.
We _are_ using subject DNs for linking certs to their issuers, but though that's "free-form", we don't parse them, we only check for equality.
SANs are not free-form. A dNSName SAN is supposed to have an FQDN. An rfc822Name SAN is supposed to carry an email address. And, ok, sure, email addresses' mailbox part is basically free-form, but so what, you don't interpret that part unless you've accepted that certificate for that email address' domain part, and then you interpret the mailbox part the way a mail server would because you're probably the mail server. Yes, you can have directoryName SANs, but the whole point of SANs is that DNs suck because x.400/x.500 naming sucks so we want to use something that isn't that.
There was an amusing chain of comments the last time protobuf was mentioned in which some people were arguing that it had been a terrible idea and ASN.1, as a standard, should have been used.
It was hilarious because clearly none of the people who were in favor had ever used ASN.1.
Cryptonector[1] maintains an ASN.1 implementation[2] and usually has good things to say about the language and its specs. (Kind of surprised he’s not in the comments here already :) )
Thanks for the shout-out! Yes, I do have nice things to say about ASN.1. It's all the others that mostly suck, with a few exceptions like XDR and DCE/Microsoft RPC's IDL.
Derail accepted! Is your approval of DCE based only on the serialization not being TLV or on something else too? I have to say, while I do think its IDL is tasteful, its only real distinguishing feature is AFAICT the array-passing/returning stuff, and that feels much too specialized to make sense of in anything but C (or largely-isomorphic low-level languages like vernacular varieties of Pascal).
Well, I do disapprove of the RPC-centric nature of both, XDR and DCE RPC, and I disapprove of the emphasis on "pointers" and -in the case of DCE- support for circular data structures and such. The 1980s penchant for "look ma'! I can have local things that are remote and you can't tell because I'm pretending that latency isn't part of the API hahahaha" research really shows in these. But yeah, at least they ain't TLV encodings, and the syntax is alright.
I especially like XDR, though maybe that's because I worked at Sun Microsystems :)
"Pointers" in XDR are really just `OPTIONAL` in ASN.1. Seems so silly to call them pointers. The reason they called them "pointers" is that that's how they represented optionality in the generated structures and code: if the field was present on the wire then the pointer is not null, and if it was absent then the pointer is null. And that's exactly what one does in ASN.1 tooling, though maybe with a host language Optional<> type rather than with pointers and null values. Whereas in hand-coded ASN.1 codecs one does sometimes see special values used as if the member had been `DEFAULT` rather than `OPTIONAL`.
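To make the "pointer is really just optional" point concrete, here is a rough Python sketch of XDR-style optional data: a 4-byte big-endian presence word followed by the value if present. The function names are made up for illustration, and the payload is assumed to be a single unsigned 32-bit integer; real generated code would handle arbitrary types.

```python
import struct
from typing import Optional, Tuple


def encode_optional_u32(value: Optional[int]) -> bytes:
    """Encode an optional unsigned 32-bit integer, XDR-style."""
    if value is None:
        # Absent: only the presence word, set to 0.
        return struct.pack(">I", 0)
    # Present: presence word set to 1, then the value itself.
    return struct.pack(">II", 1, value)


def decode_optional_u32(data: bytes, offset: int = 0) -> Tuple[Optional[int], int]:
    """Decode one optional u32; return (value or None, next offset)."""
    (present,) = struct.unpack_from(">I", data, offset)
    offset += 4
    if present == 0:
        return None, offset  # what C tooling would surface as a null pointer
    (value,) = struct.unpack_from(">I", data, offset)
    return value, offset + 4
```

The host-language `Optional` plays the role that the nullable pointer plays in generated C structures; nothing about the wire format involves addresses.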
You're likely to find my comments among those saying that. I've been using ASN.1 in some way for a couple of decades, and I've been an ASN.1 implementor for about half a decade.
It's not entirely horrible, parsing DER dynamically enough to handle interpreting most common certificates can be done in some 200-300 lines of C#, so I'd take that any day over XML.
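As a rough illustration of why a dynamic DER reader stays small, here is a minimal Python sketch of the tag-length-value walk. It is not the C# code mentioned above; `read_tlv` and `walk` are hypothetical names, multi-byte tags are rejected, and nothing here interprets OIDs or certificate semantics.

```python
from typing import Iterator, Tuple


def read_tlv(data: bytes, offset: int = 0) -> Tuple[int, bytes, int]:
    """Read one DER element; return (tag byte, content, next offset)."""
    tag = data[offset]
    if tag & 0x1F == 0x1F:
        raise ValueError("multi-byte tags are not handled in this sketch")
    first = data[offset + 1]
    offset += 2
    if first < 0x80:
        length = first  # short form: the length fits in one byte
    else:
        n = first & 0x7F  # long form: the next n bytes hold the length
        if n == 0:
            raise ValueError("indefinite length is not allowed in DER")
        length = int.from_bytes(data[offset:offset + n], "big")
        offset += n
    end = offset + length
    if end > len(data):
        raise ValueError("truncated element")
    return tag, data[offset:end], end


def walk(data: bytes, depth: int = 0) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (depth, tag, content) for every element, descending into constructed ones."""
    offset = 0
    while offset < len(data):
        tag, content, offset = read_tlv(data, offset)
        yield depth, tag, content
        if tag & 0x20:  # constructed bit set: content is itself a run of TLVs
            yield from walk(content, depth + 1)


# SEQUENCE { INTEGER 5, NULL }
for depth, tag, content in walk(bytes.fromhex("30050201050500")):
    print("  " * depth + f"tag=0x{tag:02x} len={len(content)}")
```

Everything beyond this (mapping tags to types, decoding OIDs, knowing what an extension means) is where the real work is.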
The main problem is that to work with the data you need to understand the semantics of the magic object identifiers and while things like the PKIX module can be found easily, the definitions for other more obscure namespaces for extensions can be harder to locate as it's scattered in documentation from various standardization organizations.
So, protobuf could very well have been transported in DER; the issue was probably more one of Google not seeing any value in interoperability and wanting to keep it simple (or worse, clashes caused by oblivious users re-using the wrong, less well documented namespaces).
I suspect that typical interactions with ASN.1 are benign because people are interested in reading and writing a few specific preexisting data structures with whatever encoding is required for interoperability, not in designing new message structures and choosing encodings for them.
For example, when I inherited a public key signature system (mainly retrieving certificates and feeding them to cryptographic primitives and downloading and checking certificate revocation lists) everything troublesome was left by dismissed consultants; there were libraries for dealing with ASN.1 and I only had to educate myself about the different messages and their structure, like with any other standard protocol.
Just wanted to say I enjoyed your post very much. Thank you for writing it. I love D but unfortunately I haven't touched it for several years. I also have some experience writing parsers and implementing protocols.
For those of you who missed this, there was a very interesting thing that happened in the growth of the internet.
At the time people were evolving the protocols through the IETF. So all the things that you rely on now - for the most part - just came into being. One day there was email. There was ftp. There was TCP. There were the Van Jacobson TCP mods.
At this time corporate types paid no attention to the internet. Academic types and the IETF were from what I saw the main developers.
Then one day the corporate world realized they might make money. But the development process of the protocols was incomprehensible to (and incompatible with) the corporate culture. TCP was clearly a mess, and all these protocols like DNS were a mess, from the corporate perspective.
Whether ASN.1 was a product of that war or just a product of the corporate mentality, it serves as a powerful instance of what the corporate world looks like vs. what the academic world looks like. You can find the wreckage from the war littered around. If you see an X.something protocol it could well be one of the relics. There were a few X.things that were adopted and useful, but there were others that would haunt your dreams.
Although this is ancient history, and pretty much now told from the corporate perspective, it suggests to us that the corporate process for thinking is not as effective as the alternative - the IETF and Academic.
One is a sort of recipe culture. You write a recipe, everyone follows it and you are happy.
The other is a sort of functional culture. If you can make bread and eat it you are happy. When the bread doesn't taste good you fix it.
Given the kind of bread that is commonly available in the US now, we can draw some conclusions about recipe thinking, recipe culture, corporate culture etc. One could even extend this paradigm of thinking to new things like AI. Or not.
My partner and I were re-watching Father of the Bride the other day (rest in peace, Diane Keaton) and during the early parents meeting the son-in-law to-be describes himself as a communications consultant, working on X.25 networking installations.
I had to pause the movie and explain to my partner just how close the world came to missing out on The Internet, and having instead to suffer the ignominy of visiting sites with addresses like “CN=wikipedia, OU=org, C=US” and god knows what other dreadful protocols underlying them. I think she was surprised how angry and distressed I sounded! It would have been awful!
> how close the world came to missing out on The Internet
Monday-morning-quarterbacking is an unproductive pastime, but I don't think it was very close, on account of the Internet side having developed a bunch of useful (if a bit ramshackle) protocols and applications much faster than the ISO team, because the specs were freely available (not to mention written in a much more understandable manner). I still rue the day the IETF dropped the "distribution of this memo is unlimited" phrase from the RFC preambles. Yeah I understand that it originally had more to do with classification than general availability, but it described the ethos perfectly.
It's not all roses and we're paying for the freewheeling approach to this day in some cases, cf. email spam and BGP hijacking. But it gave results and provided unstoppable momentum.
> It's not all roses and we're paying for the freewheeling approach to this day in some cases, cf. email spam and BGP hijacking. But it gave results and provided unstoppable momentum.
Those are hard problems that ITU/OSI did not exactly have solutions for. Literally any thing that can be a target of spam becomes a target of spam soon enough, and fixing that is hard.
As for BGP, the rpki should fix that, though I'm told if I look I'll be sad (so I'm not looking).
> As for BGP, the rpki should fix that, though I'm told if I look I'll be sad (so I'm not looking).
Afaiu, it’s even worse than you might think: RPKI doesn’t actually secure BGP. It provides only origin validation (i.e. which ASes may use which IP blocks). It critically does not provide path validation (i.e. which ASes may provide transit for which other ASes). Which is kind of a big deal.
I get your point and it is reasonable. We are paying today. However, I believe part of the problem is that when you could make money from email, it froze. The evolution stopped. We could easily evolve email if ...
The "if..." is one of the two VERY BIG INTERNET PROBLEMS. How do you pay for things? We have an answer which pollutes. Ads => enshittification. Like recipes for how to boil an egg that are three pages of ads, and then are wrong. But we now have AI, right?
The other problem is identities on the internet. This is hard. But email? Nope. Login with Apple? Nope. Login with Google? Double, Quadruple Nope.
In the real world we have both privacy AND accountability. And. It is very difficult to maintain two identities in real life.
Privacy on the internet? Nope. Accountability? Only if you are invested in your account. Privacy and Accountability together? Nope. Two identities? You can easily do 100's or more. freg@g*.com, greg33222@g*.com, janesex994@g*.com, dogs4humanity@g*.com etc.
That's roughly the same thing as saying there would not have been an Internet if the ITU/OSI crowd won. I think that's quite right. The yahoos at Berkeley and a bunch of other U.S. universities won, and I can just imagine how the Europeans who had high falutin ideas about how the Internet should be felt, and it's delicious.
Something tells me the corporations will get the last laugh, once web browsers stop showing you "the web" and only show you LLM hallucinations that superficially seem like the web.
I'm confused. much of your story is correct, but you replace the primary actors (the ITU and ISO) with 'corporate'. This is true inasmuch as the ITU represented telephony culture, but isn't really representative of corporatism as a whole.
there is _another_ 'protocol war', but it was certainly a cold one. Internet companies starting in the late 90's just decided they weren't going to care any more about standardization efforts. They could take existing protocols and warp their intent. they could abandon the goal of universal reachability in order to make a product more consumable by the general public and add 'features'. basically whatever would stick. the poster child for this division was the development of IPv6 and the multicast protocols. The IETF just assumed that like the last 20 years, they would hash out the solutions and the network would deploy them. Except the rules had changed out from under them, the internet wasn't being run by government and academic agencies anymore, and the new crew just couldn't be bothered.
two wars. the IETF won the first through rough consensus and running code, but lost the second for nearly the same reason.
Protocol Wars are also a story of early enshittification of Internet, where attempts to push forward with solutions to already known problems were pushed back because they would require investment on vendor side instead of just carrying on using software majorly delivered free of charge because DoD needed a quick replacement for their PDP-10 fleet. (Only slight hyperbole)
A lot of issues also came from ISO standards refusing to ship without known, anticipated issues taken care of, or refusing to get stuck with unextendable lock-in from an accidental temporary solution ending up as the long-term one, while IETF protocols happily ran forward "because we will fix it later" only to find out that the installed base had ossified things. One of the lessons is to add randomness to new protocols so that a naive implementation will fail on day one.
Then there were accidental things, like a major ASN.1 implementation for C in 1990 being apparently shit (a tradition picked up in even worse way by OpenSSL and close to most people playing with X.509 IMO), or even complaints about ASN.1 encodings being slow due to CPU lacking barrel shifters (I figure it must refer to PER somehow)
There's a Turkish saying, "a human will [use] this, a human!", to signify that the thing is so abnormal/out-of-proportion that it doesn't seem to be made for people. The verb changes based on the context. If you had made too much food for example, the verb would be "eat". I think it's a great motto for design.
Remember the Game of Thrones quote, "the man who passes the sentence should swing the sword"? I think it should also be applied to specs. Anyone who comes up with a spec must be the first responsible party to develop a parser for it. The spec doesn't get ratified unless it comes with working parser code with unit tests.
That kind of requirement might actually improve specs.
I really love D, it's one of my favorite languages. I've started implementing a vim-like text editor in it from scratch (using only Raylib as a dependency) and was surprised how far I was able to get and how good my test coverage was for it. My personal favorite features of D:
* unit tests anywhere, so I usually write my methods/functions with unit tests following them immediately
* blocks like version(unittest) {} make it easy to exclude/include things that should only be compiled for testing
* enums, unions, asserts, contract programming are all great
I would say I didn't have to learn D much. Whatever I wanted to do with it, I would find in its docs or ask ChatGPT and there would always be a very nice way to do things.
Isn't D supported by the GNU compiler collection? I personally would prefer this type of tooling over what Rust and Go do (I can't even get their compilers to run on my old platform anymore; not to mention all these dependencies on remote resources typical Rust/Go projects seem to have: which seems to be enforced by the ecosystem?)
It is supported. However on Windows GDC does not work. LDC based on LLVM needs Visual Studio but I may be wrong since there are clang/llvm distributions based on mingw64. Other than that DMD works fine, a bit slower than the other compilers, but a charm to use.
Yeah, the foundations of the language are incredible. It's just everything else around it that brings it down (and is unfortunately very hard to motivate people to solve).
D definitely missed a critical period, but I love it all the same.
So, I also write Go and I don't get the part about tooling. I don't need formatters or linters as I'm adult enough to know how to format my code (in fact I dislike tools doing it for me). D also has dub, which is fine, as far as package managers go. The ecosystem is the only issue and Go does arguably have a lot of very cool libraries for virtually anything, but outside of webdev, I can't see myself using them. This is why D works a lot better for projects where I don't need all those dependencies and would do better without them.
I freely admit to not being a Go or Rust expert, but from what I can tell using C from D is even easier than in either of these languages. The C++ interop is also decently usable.
If it makes you feel better I have some projects that I'm working on in order to improve the tooling. Would you mind listing out all of the things you think are missing, so I can work on those once I get the other ones done?
I like a lot about D. My main criticism of the language is that it quickly becomes too noisy as more of its nice features are used. I think this could be fixed, for example with better attribute defaults in function signatures, and I think Walter is aware of this.
I could tolerate the noisy language bits, but:
The standard library (Phobos) was so riddled with paper cuts that every day I used it felt like trying to navigate the surface of a coral reef... barefoot... in a hurricane... while blindfolded. It drove me off after a few months. (That was last year.)
A Phobos V3 design has begun, but given how few people they have to work on it, I am skeptical of it ever developing into a library that I would want to use. Here's hoping for a pleasant surprise. :)
Heavy reliance on the range interface, without automatic support for common range-like types like arrays, so calling code must be cluttered with wrappers.
Insistence on returning things exclusively as ranges, even when a single item is wanted, so calling code must be cluttered with dereferencing. (e.g. std.algorithm.searching.find)
The doubly-linked list implementation had no obvious way to remove a node by reference; removal of an inner item seemed to require traversing the list.
Inconsiderately designed class hierarchies. (e.g. InternetAddress & Internet6Address shared a base class that lacked a port field.)
Inconsiderately designed interfaces. (e.g. CRC32 could deliver a result only as a byte array, which fits the generic std.digest interface, but not as an integer, which is how 32-bit CRCs are often used in the real world.)
Unfortunate naming choices scattered about, such as a function name implying behavior subtly different from what the function does and a user probably expects (leading to surprising bugs) or named after a specific computer science algorithm instead of its purpose (making the needed operation hard to discover).
Various types and functions defined without the attributes required by other parts of the stdlib, or required in order to use/override them in safety-minded code. (e.g. nothrow, @safe, scope)
Time module lied about system clock resolution despite having internally retrieved the correct value from the OS.
Generally useful functionality kept private within the library, rather than made public, forcing users to either re-implement it themselves or lean on arcane trickery to get at it. (e.g. string escaping.)
Logger output customization was needlessly difficult.
No viable async/coroutine support. Vibe.d is the closest substitute, but is a third-party package, integrates poorly with Phobos, isn't documented very well, and is buggy.
...
Mind, these are just highlights of things I discovered while using D for one small project (a network protocol library and tool) so this is by no means a comprehensive account of problems in Phobos. All these problems could be worked around, but I ended up spending far too much time investigating them and developing workarounds instead of being productive.
I acknowledge that I was new to D, so some of my complaints might have had solutions. (However, I would point out that a solution that isn't easy to find might as well not exist.) Regardless, the pattern was clear: Phobos was a PITA to use. Too many paper cuts.
> a standard library you like?
I stuck with C, C++, and Python for more than a few years. Their standard libraries all have significant problems, but none frustrated me so much as to drive me away from the language.
The following statement originates from Adam Wilson who is heading the PhobosV3 project, and I will second it.
The issues you have are examples of known problems that PhobosV2 has suffered under. Coroutines, for instance, are something we would like to have in the language; a proposal has already been made (by me, not Adam), but it's sitting until next year. Naming and interface issues are a big part of what will change in PhobosV3.
Thank you for this post. I can tell you that we do appreciate comments like this, and the deficiencies you cite are many of the reasons we are rewriting phobos and druntime.
The time module "lied" seems like a straight up bug. Can you give any more details on it?
The schema format is complicated, and so is the data format, but most of it is ignorable complexity, which often makes it simpler to use than it looks.
I do not use the ASN.1 schema notation, since I have been able to use ASN.1 without it. (I wrote my own implementation of the data format (DER) in C.)
> On the one hand the sheer amount of possible encodings is daunting, but on the other hand it shows a certain flexbility that ASN.1 provides - you could even invent your own domain-specific encoding if needed.
It is true. My opinion is, of the standard encodings, DER is the only one that is any good (and is the one that I usually use (although most of my uses are not having to do with cryptography); X.509 (which is a common use of ASN.1) also uses DER), but I did make up my own encodings as well: DSER (between BER and DER), SDSER (like DSER but the length is encoded differently so that streaming is possible, in a (in my opinion) better way than CER does), and TER (a text format which is intended to be converted to DER).
> ANY DEFINED BY
I still effectively use it (as well as ANY), since I think it is very useful. (As I mentioned, I did not actually use the ASN.1 schema format, but I sometimes use what effectively works like ANY and ANY DEFINED BY.)
(I also sometimes use a nonstandard feature that I made up, which is called OBJECT IDENTIFIER RELATIVE TO, which is like OBJECT IDENTIFIER but it can be encoded more efficiently (using the RELATIVE OID type) if the value has the prefix that the schema expects it to have.)
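For anyone wondering why a relative form saves space, here is a rough Python sketch of the standard OBJECT IDENTIFIER and RELATIVE-OID content encodings (base-128 arcs, universal tags 6 and 13). The `encode_relative_to` helper is only my guess at how a prefix-aware encoder like the one described above could behave; the function names are made up for illustration and it only handles short-form lengths.

```python
def base128(n: int) -> bytes:
    """Encode one arc as base-128, high bit set on all but the last byte."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))


def _tlv(tag: int, body: bytes) -> bytes:
    # Sketch only: short-form lengths (content under 128 bytes).
    assert len(body) < 0x80
    return bytes([tag, len(body)]) + body


def encode_oid(arcs) -> bytes:
    """OBJECT IDENTIFIER: the first two arcs are packed into one subidentifier."""
    first, second, *rest = arcs
    if first > 2 or (first < 2 and second >= 40):
        raise ValueError("invalid leading arcs")
    body = base128(first * 40 + second) + b"".join(base128(a) for a in rest)
    return _tlv(0x06, body)


def encode_relative_oid(arcs) -> bytes:
    """RELATIVE-OID: every arc is encoded on its own, no 40*x+y packing."""
    return _tlv(0x0D, b"".join(base128(a) for a in arcs))


def encode_relative_to(arcs, prefix) -> bytes:
    """Hypothetical helper: use RELATIVE-OID when the expected prefix matches,
    otherwise fall back to a full OBJECT IDENTIFIER. A decoder can tell the
    two cases apart by the tag."""
    arcs, prefix = tuple(arcs), tuple(prefix)
    if len(arcs) > len(prefix) and arcs[:len(prefix)] == prefix:
        return encode_relative_oid(arcs[len(prefix):])
    return encode_oid(arcs)


pkcs1 = (1, 2, 840, 113549, 1, 1)
sha256_with_rsa = pkcs1 + (11,)
print(encode_oid(sha256_with_rsa).hex())                 # 06092a864886f70d01010b
print(encode_relative_to(sha256_with_rsa, pkcs1).hex())  # 0d010b
```

In this example the full form takes 11 bytes and the relative form 3, which is the whole appeal when most values share a known prefix.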
Very neat article. I too have spent countless hours (but not as many) hacking on an ASN.1 compiler, adding a subset of X.681, X.682, X.683 functionality to make it possible to decode (in a single codec invocation!) a whole certificate, with all its details like extensions and OtherName SANs and whatnot decoded recursively. So it's very nice to see a post about similar work!
ASN.1 really is quite awesome. It gets a lot of hate, but it's not deserved. ASN.1 is not just a syntax or set of encoding rules -- it's a type system, and a very powerful one at that.
I worked on a Swift ASN.1 compiler [1] a while back (not swift-asn1, mine used Codable). I saved myself some time by using the Heimdal JSON compiler, which can transform ASN.1 into a much more parseable JSON AST.
> ASN.1 is a... some would say baroque, perhaps obsolete, archaic even, "syntax" for expressing data type schemas, and also a set of "encoding rules" (ERs) that specify many ways to encode values of those types for interchange.
for me expressing disdain for ASN.1. On the contrary: I'm saying those who would say that are wrong:
> ASN.1 is a wheel that everyone loves to reinvent, and often badly. It's worth knowing a bit about it before reinventing this wheel badly yet again.
Hey, I love how the author describes ASN.1 as a "syntax" in quotes.
What I disagree with is the disdain being veiled. Seems very explicit to me.
Anyway, yeah, I hadn't heard about it before either, and it's great to know that somebody out there did solve that horrible problem already, and that we can use the library.
I'm fascinated by ASN.1, I don't know why it appeals to me so much but I find working with it oddly fun. I've always wanted to find the time to write an ASN.1 compiler for Rust, because for some reason all of the Rust implementations I've seen end up just either being derive macros on Rust structs (so going the other direction), or even just providing a bunch of ASN.1 types and functions and expecting you to manually chain them together using nom.
Normally, you could say when implementing some standard that you get 80% of the functionality with 20% of the planned time. But with ASN.1 the remaining 20% could take the rest of your life.
As someone who extended the ASN.1 parser in the Netscape code base to handle PKCS#12 and at one point knew more about RSA standards and various ASN.1 definitions than anyone should, I respect the blog owner's diligence and masochism.
It would not be the project I would pick out to learn D.
I salute you for the deep dive into this. History would have it that ASN.1 was already there as both an IDL and serialization format when HTTPS certs were defined. If it were today, would it be the same or would we end up with protobuf or thrift or similar?
> If it were today, would it be the same or would we end up with protobuf or thrift or similar?
The main advantage of ASN.1 (specifically DER) in an HTTPS/PKI context is that it's a canonical encoding. To my understanding Protobuf isn't; I don't know about Thrift.
(A lot of hay is made about ASN.1 being bad, but it's really BER and other non-DER encodings of ASN.1 that make things painful. If you only read and write DER and limit yourself to the set of rules that occur in e.g. the Internet PKI RFCs, it's a relatively tractable and normal looking serialization format.)
> The main advantage of ASN.1 (specifically DER) in an HTTPS/PKI context is that it's a canonical encoding. To my understanding Protobuf isn't; I don't know about Thrift.
There should be no need for a canonical encoding. 40 years ago people thought you needed that so you could re-encode a TBSCertificate and then validate a signature, but in reality you should keep the encoding as-received of that part of the Certificate. And so on.
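To make "keep the encoding as-received" concrete, here is a small Python sketch that slices the first element of an outer SEQUENCE out of the input without re-encoding it. The function names are mine, it only handles single-byte tags and definite lengths, and the test input is a toy structure rather than a real certificate.

```python
def read_header(buf: bytes, pos: int):
    """Return (tag_byte, content_start, content_end) for the TLV at pos."""
    tag = buf[pos]
    first = buf[pos + 1]
    if first < 0x80:
        length, start = first, pos + 2
    else:
        n = first & 0x7F
        if n == 0:
            raise ValueError("indefinite length not handled in this sketch")
        length = int.from_bytes(buf[pos + 2:pos + 2 + n], "big")
        start = pos + 2 + n
    end = start + length
    if end > len(buf):
        raise ValueError("truncated input")
    return tag, start, end


def first_element_as_received(outer: bytes) -> bytes:
    """Return the exact bytes (header included) of the first element of the
    outer SEQUENCE. For a certificate that element is the TBSCertificate."""
    tag, start, end = read_header(outer, 0)
    if tag != 0x30:
        raise ValueError("expected an outer SEQUENCE")
    _, _, inner_end = read_header(outer, start)
    if inner_end > end:
        raise ValueError("inner element overruns the outer SEQUENCE")
    return outer[start:inner_end]


# Toy input: the inner SEQUENCE uses a non-minimal (BER-style) length, 81 03.
inner = bytes.fromhex("308103020105")
outer = bytes.fromhex("300c") + inner + bytes.fromhex("3000030200ff")
print(first_element_as_received(outer).hex())  # 308103020105
```

A verifier that hashes this slice does not care whether the sender used minimal lengths; one that decodes and re-encodes would produce 30 03 02 01 05 here and hash different bytes.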
I'm hardly a connoisseur of DER implementations, but my understanding is that there are two main problems with DER. The first is that the format isn't really parseable without using a schema, unlike (say) XML or JSON. This means your generic DER parser needs to have an ASN.1 schema passed into it to parse the DER, and this leads to the second problem, which is that this ends up being complex enough that basically every attempt to do so is full of memory safety issues.
> The first is that the format isn't really parseable without using a schema, unlike (say) XML or JSON.
You can parse DER perfectly well without a schema, it's a self-describing format. ASN.1 definitions give you shape enforcement, but any valid DER stream can be turned into an internal representation even if you don't know the intended structure ahead of time.
rust-asn1[1] is a nice demonstration of this: you can deserialize into a structure if you know your structure AOT, or you can deserialize into the equivalent of a "value" wrapper that enumerates/enforces all valid encodings.
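As a rough illustration of the schemaless case (this is not how rust-asn1 does it, just a minimal Python sketch with names I made up), a generic TLV walker only needs the identifier and length octets to build a tree. It assumes definite lengths and tag numbers below 31, and it does not enforce DER's minimal-encoding rules.

```python
UNIVERSAL = {
    0x01: "BOOLEAN", 0x02: "INTEGER", 0x03: "BIT STRING", 0x04: "OCTET STRING",
    0x05: "NULL", 0x06: "OBJECT IDENTIFIER", 0x0C: "UTF8String",
    0x10: "SEQUENCE", 0x11: "SET", 0x13: "PrintableString", 0x16: "IA5String",
    0x17: "UTCTime", 0x18: "GeneralizedTime",
}
CLASSES = ["UNIVERSAL", "APPLICATION", "CONTEXT", "PRIVATE"]


def parse(buf: bytes, pos: int = 0, end=None):
    """Parse consecutive TLVs in buf[pos:end] into (label, value) pairs.
    Constructed values become nested lists, primitive values stay as bytes."""
    end = len(buf) if end is None else end
    items = []
    while pos < end:
        ident = buf[pos]
        cls, constructed, number = ident >> 6, bool(ident & 0x20), ident & 0x1F
        if number == 0x1F:
            raise ValueError("high tag numbers not handled in this sketch")
        first = buf[pos + 1]
        pos += 2
        if first < 0x80:
            length = first
        else:
            n = first & 0x7F
            if n == 0:
                raise ValueError("indefinite length is not valid DER")
            length = int.from_bytes(buf[pos:pos + n], "big")
            pos += n
        if pos + length > end:
            raise ValueError("truncated input")
        if cls == 0:
            label = UNIVERSAL.get(number, f"UNIVERSAL {number}")
        else:
            label = f"[{CLASSES[cls]} {number}]"
        value = parse(buf, pos, pos + length) if constructed else buf[pos:pos + length]
        items.append((label, value))
        pos += length
    return items


# SEQUENCE { UTF8String "abc", INTEGER 1, [0] primitive with content 2a }
print(parse(bytes.fromhex("300b0c0361626302010180012a")))
```

The universal-class items come back with their types; the context-tagged item comes back as opaque bytes, which is the part where a schema is needed to say what it is.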
> which is that this ends up being complex enough that basically every attempt to do so is full of memory safety issues.
Sort of -- DER gets a bad rap for two reasons:
1. OpenSSL had (has?) an exceptionally bad and permissive implementation of a DER parser/serializer.
2. Because of OpenSSL's dominance, a lot of "DER" in the wild was really a mixture of DER and BER. This has caused an absolutely obscene amount of pain in PKI standards, which is why just about every modern PKI standard that uses ASN.1 bends over backwards to emphasize that all encodings must be DER and not BER.
(2) in particular is pernicious: the public Web PKI has successfully extirpated BER, but it still skulks around in private PKIs and more neglected corners of the Internet (like RFC 3161 TSAs) because of a long tail of OpenSSL (and other misbehaving implementation) usage.
Overall, DER itself is a mostly normal looking TLV encoding; it's not meaningfully more complicated than Protobuf or any other serialization form. The problem is that it gets mashed together with BER, and it has a legacy of buggy implementations. The latter is IMO more of a byproduct of ASN.1's era -- if Protobuf were invented in 1984, I imagine we'd see the same long tail of buggy parsers regardless of the quality of the design itself.
Even if there is no use of IMPLICIT you still have the problem that it's just a bunch of primitive values and composites of them, but you don't know what anything means w/o reference to the defining module. And then there's all the OCTET STRING wrappers of things that are still DER-encoded -- there are lots of these in PKIX, even just in Certificate you'll find:
- the parameters in AlgorithmIdentifier
- the attribute values in certificate names
- all the extensions
- otherName choices of SubjectAlternativeName
- certification policies
- ...
Look at RFCs 5911 and 5912 and look for all the places where `CLASS` is used, and that's roughly how many "typed holes" there are in PKIX.
Sure, but that's the same thing as you see with "we've shoved a base64'd JSON object in your JSON object." Value opacity is an API concern, not evidence that DER can't be decoded without a schema.
The wikipedia page on serialization formats[0] calls ASN.1 'information object system' style formalisms (which RFCs 5911 and 5912 make use of, and which Heimdal's ASN.1 makes productive use of) "references", which I think is a weird name.
You can parse DER, but you have no idea what you've just parsed without the schema. In a software library, that's often not very useful, but at least you can verify that the message was loaded correctly, and if you're reverse engineering a proprietary protocol you can at least figure out the parts you need without having to understand the entire thing.
Yes, it's like JSON in that regard. But the key part is that the framing of DER doesn't require a schema; that isn't true for all encoding formats (notably protobuf, where types have overlapping encodings that need to be disambiguated through the schema).
It's really not an advantage that DER can be "parsed" without a schema. (As compared to: XDR, PER, OER, DCE RPC, etc., which really can't be.) It's only possible because of the use of tag-length-value encoding, which is really wasteful and complicates life (by making it harder or impossible to do online encoding, since you have to compute the length before you can place the value, because the length itself is variable length, so you have to reserve the correct number of bytes for it and shoot-me-now).
I don't have a strong opinion about whether it's an advantage or not, that was more just about the claim that it can't be parsed without a schema.
(I don't think variable-length-lengths are that big of a deal in practice. That hasn't been a significant hurdle whenever I've needed to parse DER streams.)
Variable length lengths are not a big deal, but they prevent online encoding. The way you deal with that anyways is that you make your system use small messages and then stream those.
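A quick Python sketch of the definite-length rule, to show why the header can't be written before the content size is known. The helper names are mine; `stream_small_messages` is just one assumed way to do the "small messages" workaround, wrapping each chunk as its own OCTET STRING.

```python
def der_length(n: int) -> bytes:
    """Definite-length octets: short form below 128, long form otherwise."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, content: bytes) -> bytes:
    """The content has to be complete before the header can be emitted."""
    return bytes([tag]) + der_length(len(content)) + content


def stream_small_messages(chunks):
    """Workaround sketch: emit each chunk as an independent OCTET STRING so
    only one chunk at a time needs to be buffered."""
    for chunk in chunks:
        yield tlv(0x04, chunk)


for size in (127, 128, 256, 65536):
    print(size, der_length(size).hex())
# 127 7f
# 128 8180
# 256 820100
# 65536 83010000
```

The header grows from two bytes to five as the content grows, so an encoder either buffers the value, makes two passes, or keeps messages small.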
I'd argue that JSON is still easier as it allows you to reason about the structure and build up a (partial) schema at least. You have the keys of the objects you're trying to parse. Something like {"username":"abc","password":"def","userId":1,"admin":false} would end up something like Utf8String(3){"abc"}+Utf8String(3){"def"}+Integer(1){1}+Integer(1){0} if encoded in DER style.
This has the fun side effect that DER essentially allows you to process data ("give me the 4th integer and the 2nd string of every third optional item within the fifth list") without knowing what you're interpreting.
> You can parse DER perfectly well without a schema, it's a self-describing format. ASN.1 definitions give you shape enforcement, but any valid DER stream can be turned into an internal representation even if you don't know the intended structure ahead of time.
> rust-asn1[1] is a nice demonstration of this: you can deserialize into a structure if you know your structure AOT, or you can deserialize into the equivalent of a "value" wrapper that enumerates/enforces all valid encodings.
Almost. The "tag" of the data doesn't actually tell you the type of the data by itself (most of the time at least), so while you can say "there is something of length 10 here", you can't say if it's an integer or a string or an array.
> The "tag" of the data doesn't actually tell you the type of the data by itself (most of the time at least), so while you can say "there is something of length 10 here", you can't say if it's an integer or a string or an array.
Could you explain what you mean? The tag does indeed encode this: for an integer you'd see `INTEGER`, for a string you'd see `UTF8String` or similar, for an array you'd see `SEQUENCE OF`, etc.
You can verify this for yourself by using a schemaless decoder like Google's der-ascii[1]. For example, here's a decoded certificate[2] -- you get fields and types, you just don't get the semantics (e.g. "this number is a public key") associated with them because there's no schema.
It's been a long time since I last stared at DER, but my recollection was for the ASN.1 schema I was decoding, basically all of the tags ended up not using the universal tag information, so you just had to know what the type was supposed to be. The fact that everything was implicit was why I qualified it with "most of the time"; it was that way in my experience.
Oh, that makes sense. Yeah, I mostly work with DER in contexts that use universal tagging. From what I can tell, IMPLICIT tagging is used somewhat sparingly (but it is used) in the PKI RFCs.
So yeah, in that instance you do need a schema to make progress beyond "an object of some size is here in the stream."
IMPLICIT tagging is used in PKIX (and other protocols) whenever a context or application tag is needed to disambiguate due to either a) OPTIONAL members, b) members that were inserted as if the SEQUENCEs/SETs were extensible, or c) CHOICEs. The reason for IMPLICIT tagging instead of EXPLICIT is simply to optimize on space: if you use EXPLICIT you add a constructed tag-length in front of the value that already has a tag and length, but if you use IMPLICIT then you merely _replace_ the tag of the value, thus with IMPLICIT you save the bytes for one tag and one length.
Kerberos uses EXPLICIT tagging, and it uses context tags for every SEQUENCE member, so these extra tags and lengths add up, but yeah, dumpasn1 on a Kerberos PDU (if you have the plaintext of it) is more usable than on a PKIX value.
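The byte-level difference is easy to see in a few lines of Python. This is only a sketch with made-up helper names, limited to context tag numbers below 31 and contents under 128 bytes.

```python
def tlv(tag: int, content: bytes) -> bytes:
    # Sketch only: short-form lengths.
    assert len(content) < 0x80
    return bytes([tag, len(content)]) + content


def explicit(tag_number: int, inner: bytes) -> bytes:
    """EXPLICIT: wrap the complete inner TLV in a constructed context tag."""
    return tlv(0xA0 | tag_number, inner)


def implicit(tag_number: int, inner: bytes) -> bytes:
    """IMPLICIT: replace the inner tag, keeping its primitive/constructed bit."""
    constructed_bit = inner[0] & 0x20
    return bytes([0x80 | constructed_bit | tag_number]) + inner[1:]


integer_5 = tlv(0x02, b"\x05")
print(integer_5.hex())               # 020105
print(explicit(0, integer_5).hex())  # a003020105
print(implicit(0, integer_5).hex())  # 800105
```

The EXPLICIT form still shows the 02 INTEGER tag inside the wrapper, which is why a generic dumper does better on Kerberos; the IMPLICIT form is two bytes shorter but the universal tag is gone.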
> The "tag" of the data doesn't actually tell you the type of the data by itself (most of the time at least)
In my experience it does tell you the type, but it depends on the schema. If implicit types are used, then it won't tell you the type of the data, but if you use explicit, or if it is neither implicit nor explicit, then it does tell you the type of the data. (However, if the data type is a sequence, then you might not lose much by using an implicit type; the DER format still tells you that it is constructed rather than primitive.)
DER is TLV. You don't know the specifics ("this integer is a value between 10 and 53") that the schema contains, but you know it's an integer when you read it.
PER lacks type information, making encoding much more efficient as long as both sides of the connection have access to the schema.
One of my big problems with ASN.1 (and its encodings) is how _crusty_ it is.
You need to populate a string? First look up whether it's a UTF8String, NumericString, PrintableString, TeletexString, VideotexString, IA5String, GraphicString, VisibleString, GeneralString, UniversalString, CHARACTER STRING, or BMPString. I'll note that three of those types have "Universal" / "General" in their name, and several more imply it.
How about a timestamp? Well, do you mean a TIME, UTCTime, GeneralizedTime, or DATE-TIME? Don't be fooled, all those types describe both a date _and_ time, if you just want a time then that's TIME-OF-DAY.
It's understandable how a standard with teletex roots got to this point, but it doesn't lead to good implementations when there is that much surface area to cover.
I think that it is useful to have different types for different purposes.
> You need to populate a string? First look up whether it's a UTF8String, NumericString, PrintableString, TeletexString, VideotexString, IA5String, GraphicString, VisibleString, GeneralString, UniversalString, CHARACTER STRING, or BMPString.
They could be grouped into three groups: ASCII-based (IA5String, VisibleString, PrintableString, NumericString), Unicode-based (UTF8String, BMPString, UniversalString), and ISO-2022-based (TeletexString, VideotexString, GraphicString, GeneralString). (CHARACTER STRING allows arbitrary character sets and encodings, and does not fit into any of these groups. You are unlikely to need it, but it is there in case you do need it.)
IA5String is the most general ASCII-based type, and GeneralString is the most general ISO-2022-based type. For decoding, you can treat the other ASCII-based types as IA5String if you do not need to validate them, and you can treat GraphicString like GeneralString (for TeletexString and VideotexString, the initial state is different, so you will have to consider that). For the Unicode-based types, BMPString is UTF-16BE (although normally only BMP characters are allowed) and UniversalString is UTF-32BE.
When making your own formats, you might just use the most general ones and specify your own constraints, although you might prefer to use the more restrictive types if they are known to be suitable; I usually do (for example, PrintableString is suitable for domain names (as well as ICAO airport codes, etc) and VisibleString is suitable for URLs (as well as many other things)).
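For decoding without validation, the grouping above boils down to a small lookup from universal tag number to codec. Here is a minimal Python sketch; the table and function name are mine, the ISO-2022-based types are deliberately left out, and nothing here checks the per-type character restrictions (e.g. that a BMPString stays within the BMP).

```python
# Universal tag number -> Python codec name.
CODECS = {
    12: "utf-8",      # UTF8String
    18: "ascii",      # NumericString   (treated like IA5String)
    19: "ascii",      # PrintableString (treated like IA5String)
    22: "ascii",      # IA5String
    26: "ascii",      # VisibleString   (treated like IA5String)
    28: "utf-32-be",  # UniversalString
    30: "utf-16-be",  # BMPString
}


def decode_string(tag_number: int, content: bytes) -> str:
    """Decode string content by universal tag number, without validating
    the character subset that the specific type allows."""
    try:
        codec = CODECS[tag_number]
    except KeyError:
        raise ValueError(f"string type {tag_number} not handled in this sketch")
    return content.decode(codec)


print(decode_string(30, b"\x00h\x00i"))                  # hi
print(decode_string(28, b"\x00\x00\x00h\x00\x00\x00i"))  # hi
print(decode_string(19, b"example.org"))                 # example.org
```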
> How about a timestamp? Well, do you mean a TIME, UTCTime, GeneralizedTime, or DATE-TIME?
UTCTime probably should not be used for newer formats, since it is not Y2K compliant (although it may be necessary when dealing with older formats that use it, such as X.509); GeneralizedTime is better.
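The two-digit year is the whole problem. As a sketch in Python (function names are mine), this is the interpretation RFC 5280 prescribes for X.509: YY of 50 or more means 19YY, anything below means 20YY. It only accepts the YYMMDDHHMMSSZ and YYYYMMDDHHMMSSZ forms, with no fractional seconds or offsets.

```python
from datetime import datetime, timezone


def _fields(digits: str):
    """Split MMDDHHMMSS into five integers."""
    return [int(digits[i:i + 2]) for i in range(0, 10, 2)]


def parse_utctime(text: str) -> datetime:
    """UTCTime as used in X.509: two-digit year with a pivot at 50."""
    if len(text) != 13 or not text.endswith("Z") or not text[:-1].isdigit():
        raise ValueError("expected YYMMDDHHMMSSZ")
    yy = int(text[:2])
    year = 1900 + yy if yy >= 50 else 2000 + yy
    return datetime(year, *_fields(text[2:12]), tzinfo=timezone.utc)


def parse_generalizedtime(text: str) -> datetime:
    """GeneralizedTime: four-digit year, so no pivot is needed."""
    if len(text) != 15 or not text.endswith("Z") or not text[:-1].isdigit():
        raise ValueError("expected YYYYMMDDHHMMSSZ")
    return datetime(int(text[:4]), *_fields(text[4:14]), tzinfo=timezone.utc)


print(parse_utctime("491231235959Z").year)            # 2049
print(parse_utctime("500101000000Z").year)            # 1950
print(parse_generalizedtime("20500101000000Z").year)  # 2050
```

So a UTCTime can only express 1950 through 2049 under that rule, which is why X.509 requires GeneralizedTime for dates from 2050 on.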
In all of these cases, you only need to implement the types you are using in your program, not necessarily all of them.
(If needed, you can also use the "ASN.1X" that I made up which adds some additional nonstandard types, such as: BCD string, TRON string, key/value list, etc. Again, you will only need to implement the types that you are actually using in your program, which is probably not all of them.)
> The first is that the format isn't really parseable without using a schema, unlike (say) XML or JSON.
You can parse DER without using a schema (except for implicit types, although even then you can always parse the framing even if the value cannot necessarily be parsed; for this reason I only use implicit types for sequences and octet strings (and only if an implicit type is needed, which it often isn't), in my own formats). (The meaning of many fields will not be known without the schema, but that is also true of XML and JSON.)
I wrote a DER parser without handling the schema at all.
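For anyone curious what that looks like, here is a minimal sketch of a schema-less walk over DER (not my actual parser; the names `read_tlv` and `walk` are invented for this example). It only handles tag numbers below 31 and definite lengths, and it makes no attempt to interpret values.

```python
def read_tlv(data: bytes, pos: int = 0):
    """Read one DER element; return (tag_octet, content, next_pos).
    Sketch only: low-tag-number form and definite lengths."""
    tag = data[pos]
    if tag & 0x1F == 0x1F:
        raise ValueError("high-tag-number form not handled in this sketch")
    length = data[pos + 1]
    pos += 2
    if length & 0x80:                 # long form: low 7 bits count the length octets
        n = length & 0x7F
        if n == 0:
            raise ValueError("indefinite length is not allowed in DER")
        length = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    end = pos + length
    if end > len(data):
        raise ValueError("truncated element")
    return tag, data[pos:end], end

def walk(data: bytes):
    """Parse a run of elements into (tag, value) pairs: a nested list for
    constructed elements, raw bytes for primitive ones."""
    out, pos = [], 0
    while pos < len(data):
        tag, content, pos = read_tlv(data, pos)
        constructed = bool(tag & 0x20)
        out.append((tag, walk(content) if constructed else content))
    return out
```

An implicitly tagged primitive such as `80 01 FF` still comes back as `(0x80, b"\xff")`: the framing is recovered, but you cannot tell whether those content octets are an INTEGER, a BOOLEAN, or something else without the schema.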
> I'm hardly a connoisseur of DER implementations, but my understanding is that there are two main problems with DER. The first is that the format isn't really parseable without using a schema, unlike (say) XML or JSON.
That's not really the problem. The problem is that DER is a tag-length-value encoding, which is quite redundant and inefficient and a total crutch that people who didn't see XDR first could not imagine not needing, but yeah, they really didn't need it. That crutch made it harder, not easier, to implement ASN.1/DER.
XML is no picnic either, by the way. JSON is much much simpler, and it's true you don't need a schema, but you end up wanting one anyways.
I wrote an ASN.1 decoder, and since the encoding carries type and size information, you can often read a subset and handle the rest as opaque data objects if you need round-tripping. This is required because there can be plenty of data that is unknown to older consumers (like the ETSI eIDAS/PAdES personal information extensions in PDF signatures).
However, to have a sane interface for actually working with the data you do need a schema that can be compiled to a language specific notation.
It is not only that ASN.1 was there before SSL, but even the certificate format was there before SSL. The certificate format comes from X.500, which is the "DAP" part of "LDAP"; the "L" (as in "Lightweight") in "LDAP" refers mostly to LDAP not using public key certificates for client authentication, in contrast to X.500 [1]. A bunch of other related stuff comes from RSA's PKCS series of specifications, which also mostly use ASN.1.
[1] The somewhat ironic part is that when it was discovered that using just passwords for authentication is not enough, the so-called "lightweight" LDAP got arguably more complex than X.500. The same thing happened to SNMP (another IETF protocol using ASN.1) being "Simple" for similar reasons.
If it were designed today, I would imagine it could end up looking like JWT (JOSE) and use JSON. I've seen several key exchange formats in JSON beyond JWT/JOSE in the wild today as well, so we may even get there eventually in a future upgrade of TLS.
Yes and no. The JSON handling of things like binary data (hashes) and big-ints leaves a bit to be desired (sure, we can use base64 encoding). ASN.1 isn't great by any stretch, but for this JSON really isn't much better, apart from better library support.
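A quick back-of-the-envelope illustration of both points, in Python. The input bytes are arbitrary, and the DER side assumes a plain OCTET STRING with a short-form length.

```python
import base64
import hashlib

digest = hashlib.sha256(b"example").digest()     # 32 raw bytes

# In JSON the hash has to travel as text, typically base64.
json_value_len = len(base64.b64encode(digest))   # 4 characters per 3 bytes, padded

# In DER it is an OCTET STRING: tag 0x04, one length octet, then the raw bytes.
der_len = len(bytes([0x04, len(digest)]) + digest)

# Big integers: a JSON parser that maps numbers to IEEE doubles cannot
# tell these two values apart.
collides = float(2**53 + 1) == float(2**53)
```

Here `json_value_len` is 44 (before quotes and the key), `der_len` is 34, and `collides` is `True`. The JSON grammar itself puts no limit on number size, so the big-int problem depends on the parser, which is arguably worse.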
Yes, JOSE is still infinitely better than XmlSignatures and the canonical XML madness to allow signatures _inside_ the document to be signed.
- huge breaking change with the whole cert infrastructure
- this question was asked of the people who chose ASN.1 for X.509, and AFAIK they said that today they would use protobuf. But I don't remember where I have that from.
- JOSE/JWT etc. aren't exactly that well regarded in the crypto community AFAIK, nor designed with modern insights about how best to do such things: too much header malleability, too much crypto flexibility, too little deterministic encoding of JSON, too many imprecisely defined corner cases related to JSON, and too much encoding overhead for keys and similar (which for some PQ stuff can get into the 100 KiB range). The argument of it being readable with a text editor also falls apart if anything you care about is binary (keys, etc.) and often encrypted (producing binary). (IMHO the plain text argument also falls apart for most non-crypto stuff: if you add a base64 encoding anyway, you already need dev tooling to read it, and whether your debug tooling does a base64 decode or a (maybe additional) data decode step isn't really relevant; same for viewing in an IDE, which can handle binary formats just fine. But that's an off-topic discussion.)
- if we look at some modern protocols that were designed by security specialists/cryptographers and have been standardized, we often find other stuff (e.g. protobuf for some JWT alternatives or CBOR for HSK/AuthN related stuff).
> JOSE/JWT etc. aren't exactly that well regarded in the crypto community
That is true, but it's also true that JWT/JOSE is a market winner and "everywhere" today. Obviously, it's not a great one and not without flaws, and its "competition" is things like SAML which even more people hate, so it had a low bar to clear when it was first introduced.
> CBOR
CBOR is a good mention. I have met at least one person hoping a switch to CWT/COSE happens to help somewhat combat JWT bloat in the wild. With WebAuthN requiring CBOR, there's more of a chance to get an official browser CBOR API in JS. If browsers had an out-of-the-box CBOR.parse() and CBOR.stringify(), that would be interesting for a bunch of reasons (including maybe even making CWT more likely).
One of the fun things about CBOR, though, is that it shares the JSON data model and is intended to be a sibling encoding, so I'd also maybe argue that if CBOR ultimately wins, that's still somewhat indirectly a "JSON win".
The IETF has made a bunch of standards lately like COSE for doing certificates and encryption stuff with CBOR. It’s largely for embedded stuff, but I could see it being a modern alternative. I haven’t used it myself yet.
CBOR is self-describing like JSON/XML, meaning you don't need a schema to parse it. Unlike JSON, it has a better set of specific types for integers and binary data. Unlike MsgPack, it has an IANA database of tags and a canonical serialization form.
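To show what "specific types for integers and binary data" means on the wire, here is a small hand-rolled sketch of just two CBOR major types (unsigned integer and byte string), following the head encoding in RFC 8949. The function names are made up for this example; a real project would use a proper CBOR library.

```python
def cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument in the shortest form."""
    if value < 0:
        raise ValueError("argument must be non-negative")
    if value < 24:
        return bytes([(major << 5) | value])      # argument fits in the initial byte
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | info]) + value.to_bytes(size, "big")
    raise ValueError("value too large for this sketch")

def cbor_uint(n: int) -> bytes:
    return cbor_head(0, n)             # major type 0: unsigned integer

def cbor_bytes(b: bytes) -> bytes:
    return cbor_head(2, len(b)) + b    # major type 2: byte string, raw octets follow
```

A 32-byte hash costs 34 bytes this way (two bytes of head plus the raw octets), with no base64 step, and a decoder can tell it is binary without any schema.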
No, we would use something similar to S-Expressions [1]. Parsing and generation would be at most a few hundred lines of code in almost any language, easily testable, and relatively extensible.
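As a rough illustration of how small such a parser can be, here is a sketch for the canonical, length-prefixed flavor of S-expressions, where atoms look like `3:foo` and lists are parenthesized. That flavor is an assumption on my part, since the footnote target is not shown here, and `parse_csexp` is an invented name.

```python
def parse_csexp(data: bytes, pos: int = 0):
    """Parse one canonical S-expression; return (value, next_pos).
    Atoms become bytes, lists become Python lists."""
    if data[pos:pos + 1] == b"(":
        pos += 1
        items = []
        while data[pos:pos + 1] != b")":
            if pos >= len(data):
                raise ValueError("unterminated list")
            item, pos = parse_csexp(data, pos)
            items.append(item)
        return items, pos + 1          # skip the closing parenthesis
    colon = data.index(b":", pos)      # raises ValueError if no length prefix
    length = int(data[pos:colon])
    start = colon + 1
    if start + length > len(data):
        raise ValueError("truncated atom")
    return data[start:start + length], start + length
```

For example, `parse_csexp(b"(3:foo(3:bar1:x))")` yields the nested list `[b"foo", [b"bar", b"x"]]` along with the position just past the expression.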
With the top level encoding solved, we could then go back to arguing about all the specific lower level encodings such as compressed vs uncompressed curve points, etc.
- ASN.1 is a set of a dozen different binary encodings
- ASN.1's schema language is IMHO way better designed than Protobuf's, but also more complex, as it has more features
- ASN.1 can encode many more data layouts (e.g. things where in Protobuf you have to use "tricks"), each laid out differently in the output depending on the specific encoding format, annotations on the schema, and options during serialization
- ASN.1 has many ways to represent things more "compactly", which all come with their own complexity (like bit-mask-encoded boolean maps)
Overall, the problem with ASN.1 is that it's absurdly over-engineered, so you need to know many hundreds of pages across multiple standard documents just to implement a single one of the dozen existing encodings, and even then you might run into ambiguous, unclear definitions where you have to ask on the internet for clarification.
If we ignore the schema language for a moment, most senior devs can probably write a crappy protobuf implementation over a weekend, but for ASN.1 you might not even be able to digest all the relevant standards in that time :/
Realistically, if ASN.1 weren't as badly overengineered and had shipped only with some of the more modern of its encoding formats, we probably would all be using ASN.1 for many things, including maybe your web server responses, and this probably would cut non-image/video network bandwidth by 1/3 or more. But then the network is overloaded by image/video transmissions and similar, not other stuff, so I guess who cares???!???
> Realistically, if ASN.1 weren't as badly overengineered and had shipped only with some of the more modern of its encoding formats, we probably would all be using ASN.1 for many things, including maybe your web server responses, and this probably would cut non-image/video network bandwidth by 1/3 or more. But then the network is overloaded by image/video transmissions and similar, not other stuff, so I guess who cares???!???
ASN.1 was not over-engineered in 1990. The things that kept it from ruling the world are:
- the ITU-T specs for it were _not_ free back then
- the syntax is context dependent, so using a LALR(1) parser generator to parse ASN.1 is difficult (though not really any more than it is to parse C with one); if it had had a LALR(1)-friendly syntax, ASN.1 would have been much easier to write tooling for
- competition from XDR, DCE/MS RPC, XML, JSON, Protocol Buffers, Flat Buffers, etc.
The over-engineering came later, as many lessons were learned from ASN.1's early years. Lessons that the rest of the pack mostly have not learned.
ASN.1 doesn't have a schema language. It has a schema spec, how you encode the schema is up to you. This is a huge boon.
ASN.1 has many encoding standards, but you don't need to implement them all, only the specific one for your needs.
ASN.1 has a standard and an easy to follow spec, which Protobuf doesn't.
In sum: I could cobble together a working ASN.1 implementation over a weekend.
In contrast, getting to a clean-room working Protobuf library is a month's work.
Caveat: I have not had to deal with PKI stuff. My experience with ASN.1 is from LDAP, one of the easiest protocols to implement ever, IMO.
My experience is from helping a coworker write an ASN.1 serialization/deserialization library limited to a subset of encodings, but covering (close to) the full spec of each encoding ;)
> to follow spec, which Protobuf doesn't.
I can't say so personally, but from what I heard from the coworker I helped, the spec isn't always easy to follow, as there are many edge cases where you can "guess" what they probably mean but they aren't precise enough. Though they had a surprising amount of success getting clarification from authoritative sources (I think some author or maintainer of the standard, but I'm not fully sure; it was a few years ago).
In general there seems to be a huge gap between writing something which works for some specific usage(s) of ASN.1 and something which works "in general/for all of the standard" (for the relevant encodings: mainly DER, but also at least part of non-DER BER as far as I remember).
> Protobuf doesn't.
Yes, but its wire format is relatively simple and documented (not as a specification, but documented anyway), so getting something going which can (de-)serialize the wire format really isn't that hard, and the mapping from that to actual data types is also simpler (though it also won't work for arbitrary types due to dumb design limitations). I would be surprised if you need a month of work to get something going there. Though if you want to reproduce all the tooling ecosystem or the special ways some libraries can interact with it (e.g. in-place edits etc.), it's a different project altogether. What I mean is just a (de-)serializer for the wire format with an appropriate mapping to data types (objects, structs, whatever the language you use prefers).
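To give an idea of how little is needed to get started, here is a minimal sketch of reading the wire format without any `.proto` file. It only covers wire types 0 (varint) and 2 (length-delimited), does no bounds or overflow checking, and the function names are invented for this example.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint (least significant group first)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:            # high bit clear marks the last byte
            return result, pos
        shift += 7

def read_fields(buf: bytes):
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 7
        if wire_type == 0:             # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:           # length-delimited: bytes, string, or nested message
            n, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + n], pos + n
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        yield field, wire_type, value
```

The bytes `08 96 01` decode to field 1 holding the varint 150. What the wire format does not tell you is whether a length-delimited field is a string, raw bytes, or a nested message, which is where the mapping to data types (and the schema) comes in.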
Protobuf is pretty much ASN.1 with better tooling, optimized for message exchange protocols rather than files, when it comes down to the details. Without ASN.1 and the lessons learned from it, another binary serialization protocol would've probably taken its place, and I bet Protobuf and similar tools would look and perhaps work quite differently. The same way JSON would look and act quite differently if XML had never been invented.
They share some ideas, but that doesn't make it "pretty much ASN.1". It's only "pretty much the same" if you argue all schema based general purpose binary encoding formats are "pretty much the same".
ASN.1 also isn't "file" specific at all; its main use case is, and always has been, message exchange protocols.
(Strictly speaking, ASN.1 is also not a single binary serialization format but 1. one schema language, 2. some rules for mapping things to some intermediate concepts, 3. a _dozen_ different ways to "exactly" serialize things. And in the 3rd point the difference can be pretty huge, from having something you can partially read even without a schema (like protobuf) to more compact representations you can't read without a schema at all.)
> if you argue all schema based general purpose binary encoding formats are "pretty much the same"
At the implementation level they are different, but when integrating these protocols into applications, yeah, pretty much. Schema + data goes in, encoded data comes out, or the other way around. In the same way YAML and XML are pretty much the same, just different expressions of the same concepts. ASN.1 even comes with multiple expressions of exactly the same grammar, both in text form and binary form.
ASN.1 was one of the early standardised protocols in this space, though, and suffers from being used mostly in obscure or legacy protocols, often with proprietary libraries if you go beyond the PKI side of things.
ASN.1 isn't file specific, it was designed for use in telecoms after all, but encodings like DER work better inside of file formats than Protobuf and many protocols like it. Actually having a formal standard makes including it in file types a lot easier.
PB is a lot more invasive at the build system layer, and in the libraries you have to link with. But that's not an essential aspect of PB, more like accidental, thus you're quite right :)
I have also had to work with this in many contexts... Deeply embedded systems with no parsers available and where no "proper" ones would fit. So I have hand-written basic parsing and generation a few times.
Oh, and there are also non-compliant implementations. E.g. some passports (yes, the passports with a chip use tons of ASN.1) even have incorrect encoding of big integers. It is supposed to be the minimal two's complement form; as I recall, some passports used a fixed-width non-complement format yanked into the 0x02 INTEGER type, and some libraries have special non-compliant parsing modes to deal with it.
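For reference, here is a small sketch of what "minimal two's complement" means for a DER INTEGER, plus the check a strict decoder would apply. The function names are made up for this example.

```python
def der_length(n: int) -> bytes:
    """Encode a definite length in the shortest form."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def der_integer(n: int) -> bytes:
    """Encode n as INTEGER (tag 0x02) using the fewest content octets."""
    magnitude_bits = (n if n >= 0 else ~n).bit_length()
    size = magnitude_bits // 8 + 1            # always leaves room for the sign bit
    content = n.to_bytes(size, "big", signed=True)
    return b"\x02" + der_length(size) + content

def is_minimal_integer(content: bytes) -> bool:
    """True if the content octets are a minimal two's complement encoding."""
    if not content:
        return False
    if len(content) > 1:
        if content[0] == 0x00 and content[1] < 0x80:
            return False                      # redundant leading zero octet
        if content[0] == 0xFF and content[1] >= 0x80:
            return False                      # redundant leading 0xFF octet
    return True
```

A fixed-width field such as `00 00 00 05` fails `is_minimal_integer`. The opposite mistake, an unsigned magnitude with the top bit set and no leading zero, passes the check but reads back as a negative number, which is presumably why lenient parsing modes exist.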
According to the ASN.1 Wikipedia entry, most of the tools supporting ASN.1 do the following:
1) parse the ASN.1 files,
2) generate the equivalent declarations in a programming language (like C or C++),
3) generate the encoding and decoding functions based on the previous declarations
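As a rough idea of what steps 2 and 3 might produce, here is a hand-written sketch for a made-up schema, `Point ::= SEQUENCE { x INTEGER, y INTEGER }`, encoded as DER. It is not the output of any real tool; the helper names are hypothetical, and it assumes every length fits in a single short-form octet.

```python
from dataclasses import dataclass

def _enc_int(n: int) -> bytes:
    """INTEGER in minimal two's complement; assumes fewer than 128 content octets."""
    size = (n if n >= 0 else ~n).bit_length() // 8 + 1
    return bytes([0x02, size]) + n.to_bytes(size, "big", signed=True)

def _dec_int(buf: bytes, pos: int):
    """Read an INTEGER at pos; return (value, next_pos)."""
    if buf[pos] != 0x02:
        raise ValueError("expected INTEGER")
    end = pos + 2 + buf[pos + 1]
    return int.from_bytes(buf[pos + 2:end], "big", signed=True), end

# Step 2: the declaration a compiler might emit for the schema.
@dataclass
class Point:
    x: int
    y: int

    # Step 3: encoding and decoding functions derived from the declaration.
    def encode(self) -> bytes:
        body = _enc_int(self.x) + _enc_int(self.y)
        return bytes([0x30, len(body)]) + body    # SEQUENCE, short-form length

    @classmethod
    def decode(cls, buf: bytes) -> "Point":
        if buf[0] != 0x30:
            raise ValueError("expected SEQUENCE")
        x, pos = _dec_int(buf, 2)
        y, pos = _dec_int(buf, pos)
        return cls(x, y)
```

The application code then works with `Point(3, -1)` rather than with tags and lengths, which is most of the appeal of having a compiler.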
All of these exercises are apparently part of the data engineering process or lifecycle [1].
Back in the early 21st century, Python was just another interpreted general-purpose programming language alternative: not for the web (PHP), not for command-line tools (Tcl), not for systems (C/C++), not for data wrangling (Perl), not for numerics (Matlab/Fortran), not for statistics (R).
D will probably follow a trajectory similar to Python's, but it really needs a special kind of killer application that will bring it to the fore.
I'm envisioning that real-time data streaming, processing and engineering could be D's killer application and the defining moment showing that D is for data.
Ack. I wrote an ASN.1 compiler in Java in the 90s. Mostly just to make sure I understood how it and BER/DER were used in X.509. I think the BER interpretation bits are still being used somewhere
I'm sorry you had to waste a year of your life.
There are few things I dislike more in the computing world than ASN.1/BER. It seems to encourage over-specification and is suboptimal for loosely coupled systems.
Looking at the comments, it seems that many people hate ASN.1 and many people like ASN.1 (I am the latter). (I consider BER to be messy and overly complicated, but DER is much better.) (I wrote an implementation of DER (decoding and encoding), although not the schema format (which I have never needed).)
(I worked with asn1c (not sure which fork) and had to hack in a custom allocator and 64-bit support. I shiver every time something needs attention in there.)
I was using asn1c with a Rust project since there was no Rust asn1 compiler at the time. It became a bottleneck, and in profiling I found that the string copying helper used everywhere was doing bit-level copying even in our byte-aligned mode, which was extra weird cause that function had a param for byte alignment.
Wow, I needed to parse just one small ASN.1 with one value (one signature), but I didn't know ASN.1 can have a specification (to generate parser from it). So I ended up untangling it myself, just for that specific 256 bits.
Still, I think it's better to have an overspecified format for security stuff; JSON and XML are just too vague and parsers are unpredictable.
Every time I have ever had the displeasure of looking at an X.whatever spec, I always end up coming away with the same conclusion.
Somehow, despite these specifications being 90% metadata by weight, they seem to consistently forget the part of the spec that lets you actually know what something is, and that part is just left up to context.
I could well be missing something, but a majority of the time it feels to me like they set out to make a database schema, and accidentally wrote the sqlite file format spec instead.
Like thanks, it's nice that I can parse this into a data structure :). It would be nicer, however, if doing so gave me any idea of what I can do with the data I've parsed.
Though to be fair, I feel the same way about the entire concept of XML schemas. The fact that you theoretically can validate an XML document against a schema is honestly completely useless. If I am parsing XML, it's because my code already knows what information it needs from the XML document, and really it should also know where that information is. I don't need a separate schema definition to tell me that! It's already expressed!! In the part where I am parsing out the data I need!!!
The quality does definitely seem to vary depending on the spec.
As someone who's not very well-learned I found x.690 to be my personal gold standard for how a specification should be written - it left me with very few questions, and after a small period of getting used to its general use of language, didn't struggle too much with it.
x.680 (the 2003 revision more so than the 2021 one), on the other hand, has definitely given me a few struggles with general comprehension, and left me with some similar feelings. I believe that to be an issue of how they've structured the spec (and how they've spread out some important info across multiple specs).
> The fact that you theoretically can validate an XML document against a schema is honestly completely useless. If I am parsing XML, it's because my code already knows what information it needs from the XML document, and really it should also know where that information is.
You seem to miss the entire point of XML schemas, or any schema really. Validating a document against a schema isn’t really for your code. It’s for documentation of what can be in a given document and how it needs to be structured. So others don’t need to read your code to understand that.
It then allows editing tools to verify generated documents. Or developers to understand how they can structure XML output properly.
Your code could also use it to verify an XML document before passing it to your code. Then you can inform the user of an invalid document and why instead of just crashing at a random point in code without rolling your own. It can also verify an entire document whereas code may only parse portions leading to later corruption.
> Realistically, if ASN.1 weren't as badly overengineered and had shipped only with some of the more modern of its encoding formats, we probably would all be using ASN.1 for many things, including maybe your web server responses, and this probably would cut non-image/video network bandwidth by 1/3 or more. But then the network is overloaded by image/video transmissions and similar, not other stuff, so I guess who cares???!???
For payment systems people really do validate messages' encoding.
> You seem to miss the entire point of XML schemas, or any schema really. Validating a document against a schema isn’t really for your code. It’s for documentation of what can be in a given document and how it needs to be structured. So others don’t need to read your code to understand that.
Schemas also let you parse data into ergonomic data structures in the host programming language. That's really the biggest win in having a schema.
Schemas and schema-aware tooling also help you not produce invalid messages that others then have to put up with and hack their parsers to handle when you present them with a fait accompli and you have the market power to make it stick.
Schemas also let you specify things formally rather than having to use English prose (or worse, prose in not-English, or even worse, having to produce prose in multiple languages and make sure they all say the same thing!).
The downside to schemas is that you have to do the work of writing them, and if you're the first implementor and you're using JSON and the messages are simple you just won't care to.
ITU-T specs are a pleasure to read and use. The problem is that x.4xx and x.5xx are horrible in details that have nothing to do with ASN.1, like everything to do with naming. The x.68x and x.69x specs are truly beautiful and well written. You will not see many Internet RFCs written remotely as well.
Oh yeah, amazing tooling no doubt, and non-free. Fabrice Bellard knew he could make good bank from this. ASN.1 for 3GPP and finance is a great little niche if you have a chance to exploit it.
The only goal of such ridiculous standards is to act as a form of vendor lock-in for vendors implementing those standards; the vendors get to say to governments that it is a standard and the sellers of the standards also get some money.
Any system designed picking such standards is basically betraying their client.
I think, if you want to annoy these people maximally, you should write an annotated version of the standard in a mathematical formal language.
I read the table constraints, which try to do something simple, but they're written in the most convoluted way possible.
I think I considered ASN.1 for a system once, but rejected it in favor of a more modern, technically superior system.
If the parser for something like ASN.1 doesn't fit in 80 lines of Haskell, perhaps you just shouldn't use it.
I don't know who these assholes are that say "Sure, let's make things slow and buggy, since we all hail Satan after all".