> The first is that the format isn't really parseable without using a schema, unlike (say) XML or JSON.
You can parse DER perfectly well without a schema; it's a self-describing format. ASN.1 definitions give you shape enforcement, but any valid DER stream can be turned into an internal representation even if you don't know the intended structure ahead of time.
rust-asn1[1] is a nice demonstration of this: you can deserialize into a structure if you know your structure AOT, or you can deserialize into the equivalent of a "value" wrapper that enumerates/enforces all valid encodings.
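To make that concrete, here's roughly what a schemaless walk over a DER stream looks like. This is a hand-rolled sketch rather than rust-asn1's actual API, and it assumes single-byte tags (which covers all of the universal types; high-tag-number form isn't handled):

```rust
// Schemaless DER TLV walk (sketch). Assumes single-byte tags and the
// definite lengths DER requires.
fn read_tlv(buf: &[u8]) -> Option<(u8, &[u8], &[u8])> {
    let (&tag, rest) = buf.split_first()?;
    let (&first_len, rest) = rest.split_first()?;
    let (len, rest) = if first_len & 0x80 == 0 {
        (first_len as usize, rest) // short form: the length is this byte
    } else {
        let n = (first_len & 0x7f) as usize; // long form: n length octets follow
        if n == 0 || n > 8 || n > rest.len() {
            return None;
        }
        let len = rest[..n].iter().fold(0usize, |acc, &b| (acc << 8) | b as usize);
        (len, &rest[n..])
    };
    if len > rest.len() {
        return None;
    }
    Some((tag, &rest[..len], &rest[len..]))
}

fn dump(mut buf: &[u8], depth: usize) {
    while let Some((tag, value, rest)) = read_tlv(buf) {
        println!("{}tag=0x{:02x} len={}", "  ".repeat(depth), tag, value.len());
        if tag & 0x20 != 0 {
            dump(value, depth + 1); // constructed: the contents are more TLVs
        }
        buf = rest;
    }
}

fn main() {
    // SEQUENCE { INTEGER 5, UTF8String "hi" }, hand-encoded for illustration.
    dump(&[0x30, 0x07, 0x02, 0x01, 0x05, 0x0c, 0x02, b'h', b'i'], 0);
}
```

No schema is involved anywhere: the tags and lengths alone are enough to recover the tree.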
> which is that this ends up being complex enough that basically every attempt to do so is full of memory safety issues.
Sort of -- DER gets a bad rap for two reasons:
1. OpenSSL had (has?) an exceptionally bad and permissive implementation of a DER parser/serializer.
2. Because of OpenSSL's dominance, a lot of "DER" in the wild was really a mixture of DER and BER. This has caused an absolutely obscene amount of pain in PKI standards, which is why just about every modern PKI standard that uses ASN.1 bends over backwards to emphasize that all encodings must be DER and not BER.
(2) in particular is pernicious: the public Web PKI has successfully extirpated BER, but it still skulks around in private PKIs and more neglected corners of the Internet (like RFC 3161 TSAs) because of a long tail of usage of OpenSSL (and other misbehaving implementations).
Overall, DER itself is a mostly normal-looking TLV encoding; it's not meaningfully more complicated than Protobuf or any other serialization form. The problem is that it gets mashed together with BER, and it has a legacy of buggy implementations. The latter is IMO more of a byproduct of ASN.1's era -- if Protobuf had been invented in 1984, I imagine we'd see the same long tail of buggy parsers regardless of the quality of the design itself.
Even if there is no use of IMPLICIT, you still have the problem that it's just a bunch of primitive values and composites of them, but you don't know what anything means w/o reference to the defining module. And then there are all the OCTET STRING wrappers of things that are still DER-encoded -- there are lots of these in PKIX; even just in Certificate you'll find:
- the parameters in AlgorithmIdentifier
- the attribute values in certificate names
- all the extensions
- otherName choices of SubjectAlternativeName
- certificate policies
- ...
Look at RFCs 5911 and 5912 and count the places where `CLASS` is used; that's roughly how many "typed holes" there are in PKIX.
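For a concrete picture of one of those typed holes, here's a basicConstraints extension with the bytes written out by hand (a sketch for illustration). A schemaless parser sees extnValue as an opaque OCTET STRING; only the schema tells you to parse its contents as DER again:

```rust
fn main() {
    // X.509 Extension: basicConstraints (OID 2.5.29.19), critical, CA = TRUE.
    let extension = [
        0x30, 0x0f,                   // SEQUENCE (Extension), 15 content bytes
        0x06, 0x03, 0x55, 0x1d, 0x13, //   OBJECT IDENTIFIER 2.5.29.19 (basicConstraints)
        0x01, 0x01, 0xff,             //   BOOLEAN TRUE (critical)
        0x04, 0x05,                   //   OCTET STRING extnValue: just bytes to a schemaless parser...
        0x30, 0x03, 0x01, 0x01, 0xff, //   ...but actually DER again: BasicConstraints SEQUENCE { cA TRUE }
    ];
    println!("{} bytes", extension.len());
}
```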
Sure, but that's the same thing as you see with "we've shoved a base64'd JSON object in your JSON object." Value opacity is an API concern, not evidence that DER can't be decoded without a schema.
The Wikipedia page on serialization formats[0] calls ASN.1 'information object system' style formalisms (which RFCs 5911 and 5912 use, and which Heimdal's ASN.1 makes productive use of) "references", which I think is a weird name.
You can parse DER, but you have no idea what you've just parsed without the schema. In a software library, that's often not very useful, but at least you can verify that the message was loaded correctly, and if you're reverse engineering a proprietary protocol you can at least figure out the parts you need without having to understand the entire thing.
Yes, it's like JSON in that regard. But the key part is that the framing of DER doesn't require a schema; that isn't true for all encoding formats (notably protobuf, where types have overlapping encodings that need to be disambiguated through the schema).
I'd argue that JSON is still easier, as it allows you to reason about the structure and build up a (partial) schema at least: you have the keys of the objects you're trying to parse. Something like {"username":"abc","password":"def","userId":1,"admin":false} would end up as something like Utf8String(3){"abc"}+Utf8String(3){"def"}+Integer(1){1}+Integer(1){0} if encoded in DER style.
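To put bytes to that, here's roughly what that record looks like DER-encoded once the keys are gone (a sketch; I've used a BOOLEAN for the last field where the rendering above used an INTEGER):

```rust
fn main() {
    // SEQUENCE { "abc", "def", 1, FALSE } -- the field names only exist in the schema.
    let record = [
        0x30, 0x10,                   // SEQUENCE, 16 content bytes
        0x0c, 0x03, b'a', b'b', b'c', //   UTF8String "abc"  (was "username")
        0x0c, 0x03, b'd', b'e', b'f', //   UTF8String "def"  (was "password")
        0x02, 0x01, 0x01,             //   INTEGER 1         (was "userId")
        0x01, 0x01, 0x00,             //   BOOLEAN FALSE     (was "admin")
    ];
    println!("{} bytes, no field names on the wire", record.len());
}
```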
This has the fun side effect that DER essentially allows you to process data ("give me the 4th integer and the 2nd string of every third optional item within the fifth list") without knowing what you're interpreting.
It's really not an advantage that DER can be "parsed" without a schema. (As compared to XDR, PER, OER, DCE RPC, etc., which really can't be.) It's only possible because of the use of tag-length-value encoding, which is really wasteful and complicates life: it makes online encoding harder or impossible, since you have to compute the length before you can place the value, and because the length itself is variable-length you have to reserve the correct number of bytes for it (shoot me now).
I don't have a strong opinion about whether it's an advantage or not, that was more just about the claim that it can't be parsed without a schema.
(I don't think variable-length lengths are that big of a deal in practice. That hasn't been a significant hurdle whenever I've needed to parse DER streams.)
Variable-length lengths are not a big deal, but they prevent online encoding. The way you deal with that anyway is to make your system use small messages and then stream those.
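For anyone who hasn't hand-rolled this, the length field itself is what gets in the way. A sketch of DER's definite-length encoding, showing why the header for a big nested value can't be emitted until you already know how long its contents will be:

```rust
// DER definite-length encoding (sketch). The header can't be written until
// the value's full length is known, which is what blocks one-pass "online"
// encoding of large nested structures.
fn encode_length(len: usize, out: &mut Vec<u8>) {
    if len < 0x80 {
        out.push(len as u8); // short form: one byte
    } else {
        let bytes = len.to_be_bytes();
        let skip = bytes.iter().take_while(|&&b| b == 0).count();
        out.push(0x80 | (bytes.len() - skip) as u8); // long form: count of length octets
        out.extend_from_slice(&bytes[skip..]);
    }
}

fn main() {
    for len in [5usize, 0x80, 0x1234] {
        let mut out = Vec::new();
        encode_length(len, &mut out);
        println!("{len:#x} -> {out:02x?}");
    }
}
```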
> You can parse DER perfectly well without a schema, it's a self-describing format. ASN.1 definitions give you shape enforcement, but any valid DER stream can be turned into an internal representation even if you don't know the intended structure ahead of time.
> rust-asn1[1] is a nice demonstration of this: you can deserialize into a structure if you know your structure AOT, or you can deserialize into the equivalent of a "value" wrapper that enumerates/enforces all valid encodings.
Almost. The "tag" of the data doesn't actually tell you the type of the data by itself (most of the time at least), so while you can say "there is something of length 10 here", you can't say if it's an integer or a string or an array.
> The "tag" of the data doesn't actually tell you the type of the data by itself (most of the time at least), so while you can say "there is something of length 10 here", you can't say if it's an integer or a string or an array.
Could you explain what you mean? The tag does indeed encode this: for an integer you'd see `INTEGER`, for a string you'd see `UTF8String` or similar, for an array you'd see `SEQUENCE OF`, etc.
You can verify this for yourself by using a schemaless decoder like Google's der-ascii[1]. For example, here's a decoded certificate[2] -- you get fields and types, you just don't get the semantics (e.g. "this number is a public key") associated with them because there's no schema.
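For reference, the universal tag numbers are fixed by X.680, so a schemaless decoder can name the type from the tag byte alone whenever the universal class is used. A rough (non-exhaustive) sketch of that mapping:

```rust
// Map a DER tag byte to a universal type name (sketch; universal class only,
// i.e. the top two bits of the tag are 00).
fn universal_name(tag: u8) -> &'static str {
    match tag & 0x1f { // low five bits are the tag number
        0x01 => "BOOLEAN",
        0x02 => "INTEGER",
        0x03 => "BIT STRING",
        0x04 => "OCTET STRING",
        0x05 => "NULL",
        0x06 => "OBJECT IDENTIFIER",
        0x0c => "UTF8String",
        0x10 => "SEQUENCE", // 0x30 on the wire, since SEQUENCE is always constructed
        0x11 => "SET",
        0x17 => "UTCTime",
        _ => "other",
    }
}

fn main() {
    assert_eq!(universal_name(0x30), "SEQUENCE");
    assert_eq!(universal_name(0x02), "INTEGER");
}
```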
It's been a long time since I last stared at DER, but my recollection is that, for the ASN.1 schema I was decoding, basically all of the tags ended up not using the universal tag information, so you just had to know what the type was supposed to be. The fact that everything was implicit was why I qualified it with "most of the time"; it was that way in my experience.
Oh, that makes sense. Yeah, I mostly work with DER in contexts that use universal tagging. From what I can tell, IMPLICIT tagging is used somewhat sparingly (but it is used) in the PKI RFCs.
So yeah, in that instance you do need a schema to make progress beyond "an object of some size is here in the stream."
IMPLICIT tagging is used in PKIX (and other protocols) whenever a context or application tag is needed to disambiguate, due to either a) OPTIONAL members, b) members that were inserted as if the SEQUENCEs/SETs were extensible, or c) CHOICEs. The reason for IMPLICIT tagging instead of EXPLICIT is simply to save space: with EXPLICIT you add a constructed tag-length in front of the value, which already has its own tag and length, while with IMPLICIT you merely _replace_ the tag of the value. So IMPLICIT saves you the bytes for one tag and one length.
Kerberos uses EXPLICIT tagging, and it uses context tags for every SEQUENCE member, so these extra tags and lengths add up, but yeah, dumpasn1 on a Kerberos PDU (if you have the plaintext of it) is more usable than on a PKIX value.
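The byte-level difference, for anyone who wants to see it (hand-encoded sketch: an INTEGER 5 under a [0] context tag):

```rust
fn main() {
    let untagged = [0x02, 0x01, 0x05];             // INTEGER, length 1, value 5
    let explicit = [0xa0, 0x03, 0x02, 0x01, 0x05]; // [0] EXPLICIT: constructed wrapper around the whole TLV
    let implicit = [0x80, 0x01, 0x05];             // [0] IMPLICIT: the INTEGER tag is replaced; two bytes saved
    println!("{}, {}, {} bytes", untagged.len(), explicit.len(), implicit.len());
}
```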
DER is TLV. You don't know the specifics ("this integer is a value between 10 and 53") that the schema contains, but you know it's an integer when you read it.
PER lacks type information, making encoding much more efficient as long as both sides of the connection have access to the schema.
> The "tag" of the data doesn't actually tell you the type of the data by itself (most of the time at least)
In my experience it does tell you the type, but it depends on the schema. If implicit types are used, then it won't tell you the type of the data, but if you use explicit, or if it is neither implicit nor explicit, then it does tell you the type of the data. (However, if the data type is a sequence, then you might not lose much by using an implicit type; the DER format still tells you that it is constructed rather than primitive.)