
The notation is an insurmountable barrier to many.


Many introductory texts, such as Types and Programming Languages, explain the notation thoroughly and start really simple: introducing the inference-rule notation itself, then discussing reduction of untyped arithmetic expressions, and building up to the standard notation of type theory.


It's interesting how calling it insurmountable both panders to and insults the people you're talking about.

Maybe a better way to describe the notation is "off-putting"?


It's not supposed to be insulting. I think the notation is more than off-putting, and describing it as a barrier to entry is appropriate.

Like many kinds of formal notation, it almost requires a university class to have it explained. The same is true of other fields dense in notation, where you need to learn the conventions as much as the basics.


In my experience, any sufficiently deep study of any subject requires the introduction of at least some degree of specialized terminology ("jargon") to be able to express things precisely and concisely. Sure, you can try to avoid this and use only "generic" language, but you almost always have to sacrifice one or both of these, at least partially. That can very well be worth it in certain cases (e.g. in an introductory text). But the site being discussed here is a list of specific (counter)examples. It would seem counterproductive to me to "dumb it down" given that its primary purpose seems to be to discuss very specific situations; indeed, one common feature among many of these examples is subtle differences in what things mean in different parts of a language specification -- i.e., one of the reasons why domain experts invariably start using jargon.

That said, of course jargon is also often abused, or used in cases where it doesn't have to be, etc. -- that's bad! And many papers and books suffer from this and would be improved by using less jargon.

But in general I find this negative stance against formal notation, and the expectation that one should be able to dive into subtle and advanced examples without the need for at least some "studying" (not at all necessarily at a university!), quite odd and unrealistic.


My main question here is: why is conciseness a good thing? Precision is a must, no argument here, but conciseness, to me, seems like a premature, and harmful, optimization. Is it really that much harder to stay within standard ASCII (EDIT: I mean "stay within the characters that the reader already knows how to read", which on second thought doesn't have to be ASCII at all...)? Do you really need to express concepts with a single strange character? What does it buy you, given that the downsides for readability are quite obvious? There are arguments that using a full word for something would be confusing due to other/overloaded meanings of the word, but then I go read some papers and I see the same ideograms used with drastically different meanings, too.

Jargon is OK. Jargon is what happens when you want to be precise while re-using the brain machinery you already have for reading. The exotic notations are, on the other hand, just jargon put in a form which you have to (re)learn how to read, for the sake of conciseness alone. That conciseness may pay off if you're going to be working with the notation 8h/day for 10 years (or a year, or maybe even half a year), but if you just want to read and understand concepts and learn facts... To paraphrase: you wanted a banana, but what you got was a gorilla holding the banana and the whole jungle along with it.

Hence my question: isn't creating exotic notations, where one character stands for 2.5 concepts, simply an optimization which is harmful to the vast majority of potential readers?


> My main question here is: why is conciseness a good thing?

Going through a single line of text to understand a concept is much better than having to go through two or more. I believe you should understand that, if your definition of conciseness matches mine.

As a programmer, you know first hand that repeating oneself is bad practice: it's better to define a function which encapsulates the computation you need to repeat rather than using copy and paste. The same goes for any language: having vocabulary to embody recurring ideas helps us discuss complex ideas more easily.

> Do you really need to express concepts with a single strange character?

It is only strange to you because you don't speak Greek. Of course many don't: my pedantic point is that Greeks might disagree with you about these characters being strange, and, less pedantically, the scientific community will argue that it's not that strange once you belong to their group. There are only a handful of them to remember in the context of type theory.

I guess you would want these characters, like Gamma (Γ), Delta (Δ), tau (τ) or sigma (σ), to be replaced by their ASCII equivalents (e.g. G, D, t, s), or perhaps even more evocative names? But even if this were done, you'd still have to understand the underlying concepts they represent, such as the typechecking context, which is usually represented by Gamma.

These concepts are indeed not easy to understand: you would have to learn to become a gorilla in order to hold that banana you long for, or to survive in the jungle that surrounds it.

One good thing about using characters from a different alphabet is that they stand out. There's little chance of confusing them with words of vernacular English. Greek mathematicians actually might have a harder time with these notations than you do :)


> As a programmer, you know first hand that repeating oneself is bad practice

Yes, but we're talking about the syntax/notation here, not the underlying concepts. I see it like this: imagine a function which has a single argument, in a dynamic language so that we have no type annotation. Let's say I named the argument "pair_of_ints". Following your logic, I should have named it "i2" instead, because then I wouldn't have to repeat myself by writing all the p,a,r,o,f,n,t,s characters all over the function!

Another way of saying this: yes, it's good to encapsulate pieces of logic in subroutines/functions/methods/etc. It's not good, however, to name the functions (let's add: in global scope) f, g, h, f2, f3, gh, etc. just to avoid typing more characters. Instead, we choose a name which can be understood with as little additional context as possible.

> having vocabulary to embody recurring ideas helps us discuss complex ideas more easily.

Yes. My problem is when all the entries in the vocabulary are one (Unicode) character long.

> I guess you would want these characters, like Gamma (Γ), Delta (Δ), tau (τ) or sigma (σ) to be replaced by their ascii equivalent (eg. G, D, t, s), or perhaps even more evocative names?

The latter, preferably. If something is meant to denote a context, I see little reason to name it "Γ" instead of, I don't know, "context"?

> But even if this would be done, you'd still have to understand the underlying concepts they represent, such as the typechecking context which is usually represented by Gamma.

Yes, but then I wouldn't need to work to understand the syntax, making the process of understanding the concepts easier.
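To make it concrete, here's a rough sketch (a toy simply-typed checker with hypothetical names, written in OCaml since that's what TAPL uses) of what Γ ⊢ e : τ tends to become once it's code -- the Γ is literally just a value named "context":

    (* A minimal sketch, not from any particular paper: the judgment
       Γ ⊢ e : τ  becomes a function that takes a context and a term
       and returns a type, with Γ spelled out as "context". *)
    type ty = TBool | TArrow of ty * ty

    type term =
      | Var of string
      | Abs of string * ty * term   (* fun (x : ty) -> body *)
      | App of term * term
      | True
      | False

    type context = (string * ty) list

    (* type_of context e = tau   plays the role of   Γ ⊢ e : τ *)
    let rec type_of (context : context) (e : term) : ty =
      match e with
      | True | False -> TBool
      | Var x -> List.assoc x context
      | Abs (x, t1, body) -> TArrow (t1, type_of ((x, t1) :: context) body)
      | App (f, arg) ->
          (match type_of context f with
           | TArrow (t1, t2) when t1 = type_of context arg -> t2
           | _ -> failwith "type error")

Whether you then write it as Γ or as "context" is exactly the taste question being argued here; the underlying object is the same either way.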


> It's not good, however, to name the functions (let's add: in global scope) f, g, h, f2, f3, gh, etc. just to avoid typing more characters. Instead, we choose a name which can be understood with as little additional context as possible.

Ah, you're not talking about the same things: you speak code, he speaks type theory.

You are completely correct on that: a good naming convention is better for readability, particularly with a large codebase.

So, let's try to find a compromise. It really depends on the size of the lexicon in the language considered: if all you have is 50 rules in your type theory, do you really need to give each of them a name fully describing its semantics? Of course you don't, so calling them inl1, sub2, etc. shouldn't be an issue, as long as the name can still act as a reminder of the rule's meaning. In the same spirit, rules usually connect together only a few objects: a context, at most half a dozen type variables which come up all the time in the rule set, a few operators. Maybe someone with some experience with functional programming would find it easier to understand. In fact, I think I pretty much paraphrased the quote above.

> Yes, but then I wouldn't need to work to understand the syntax, making the process of understanding the concepts easier.

I very much agree with that, but I also think that it's a question of training. In software engineering, people routinely devise DSLs, learn new languages to handle a specific problem or adapt to new working conditions. Once you get the hang of it, it should become easier.

Here's a small tip: Gamma in the Greek alphabet corresponds to the letter C in the Roman one, so it sort of makes sense, if one's using Greek letters, to use Gamma for Context - like one would use Delta for difference in math, for instance. It's confusing at first, but it might also help in learning the Greek alphabet - again, same as what we have to go through in math classes.


This discussion reminds me of the style guide for a Coq PL formalization project:

> No unicode or infix operators for judgement forms. When I use them in my proofs they make perfect sense, but when you use them in yours they're completely unreadable.

Computers (parsers, rather) are extremely sensitive to notation, and indeed in this project they eschew Greek letters and Unicode altogether. That makes things more verbose, but it is almost certain that the authors communicate the ideas on paper with notation like Γ ⊢ e : σ instead of `TYPE Gamma e s`.
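To give a flavour of the contrast (a made-up rendering, not that project's actual syntax), the on-paper version and the ASCII-only version of the same abstraction rule might look roughly like:

    Γ, x:σ ⊢ e : τ                  TYPE (cons (x,s) Gamma) e t
    --------------------    vs.     --------------------------------
    Γ ⊢ λx:σ. e : σ → τ             TYPE Gamma (abs x s e) (arrow s t)

Both say exactly the same thing; the first is easier to scan once you know the conventions, the second is easier to type, grep, and feed to a parser.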


> [...] in a dynamic language so that we have no type annotation.

We're talking about type theory and you took the escape hatch.

> Let's say I named the argument "pair_of_ints". Following your logic, [...]

You spoke of conciseness, that's the logic you introduced, not hers.

> It's not good, however, to name the functions (let's add: in global scope) f, g, h, f2, f3, gh, etc. just to avoid typing more characters. Instead, we choose a name which can be understood with as little additional context as possible.

It's perfectly reasonable to expect programmers to use long enough, meaningful names in a complex program. But here the topic is type theory, where a type system is made up of at most a few dozen rules (unless you already want to tackle a complex type system, but that would be like trying to learn how to drive in an IndyCar). In such a setting, long, comprehensive words are not really needed.

If you allow me to transpose this issue of yours to your domain: software engineering newcomers might find this arcane notation of using brackets everywhere quite confusing. Why not simply use plain English to describe a program?

> Yes. My problem is when all the entries in the vocabulary are one (Unicode) character long.

It's not so bad: why limit yourself to 52 characters when you can have so many more to choose from? Korean has more than 20 different characters for vowels alone (luckily they don't have capital letters), Japanese has 2000 kanji to convey meaning, not counting the (200 or so) kana used for spelling and connecting words, and Chinese has 10k symbols in everyday use, plus many more obscure ones.

The good thing about these Unicode characters is that they have a short notation, but a long pronunciation, a bit like the spelling alphabet (Alpha, Bravo, ...).

I guess that my point is: English speakers have it easy at the morphological level, so learning a few more characters cannot be that difficult.

> The latter, preferably. If something is meant to denote a context, I see little reason to name it "Γ" instead of, I don't know, "context"?

If there's only one context being considered, one might as well use Gamma (that letter corresponds to C in the Roman alphabet, btw). Perhaps the hidden secret of all this discussion is that type theory is more a math subject than an engineering one. Of course, at some point a type system has to cross this boundary to be implemented, but that doesn't mean that math people have to speak pseudo-code.

> Yes, but then I wouldn't need to work to understand the syntax.

Math uses Greek letters by tradition, perhaps because the first occidental mathematicians were Greek? As for the horizontal bar separating the premises from the conclusion, it's an inheritance from Gentzen's notation for natural deduction (iirc). Obviously, our knowledge builds up incrementally from former discoveries/inventions, because (at the risk of this discussion becoming absurd) how would you be able to extend something if you had to start from scratch every time? Conversely, what would be the point of using a different language once you've made the effort of learning the one used throughout the literature?

Sometimes things do change, however. A few centuries ago, scientific articles were still written in Latin, or sometimes French. In a sense, it has to be a step in the right direction for native English speakers that papers are now in English (those aimed at an international readership, at least), perhaps not so much for Latinists and French speakers though. Maybe someone like you will make the extra effort of learning the old way, and push for a new way without the fancy typography? But you'd probably have to publish significant papers for that.


This is about programming, but the "theory" part of type theory is an area of mathematics. Mathematics has always operated this way, going back to + and -.

You could make a case that, in order to be of value to (most ordinary) programmers, type theory needs to present its results in programmer-speak, not in math-speak. But then programmers ask questions like "why is that true", and the answer is the proof which, being a mathematical proof, is in math-speak.


> Mathematics has always operated this way, going back to + and -.

Wikipedia says[1] the earliest use of characters resembling + and - was in the 14th century. From what I remember, math books were light on special notation until (at least) the 18th century, and another poster here says it could be even newer (the beginning of the 20th century).

> and the answer is the proof which, being a mathematical proof, is in math-speak.

There are many, many proofs in Euclid's Elements, and the only unusual (i.e. not plain natural language) notation used there (from what I remember, and after a cursory glance now) is the use of clusters of capital letters to denote line segments.

Proofs are just logic, and logic was used for millennia (I think?) before someone decided that `∧` is better than "and" and we should all use it.

What I'm trying to say is that "math-speak" is (or should be?) defined by what you're talking about, not by what syntax you use. And if this is true, then using more familiar syntax would be better for lowering the barrier to entry.

On the other hand, as Twisol notes, the modern terse syntax probably has its merits for experts. I'm a casual user - I won't be writing papers or checking their correctness - so I get all the bad parts (unfamiliar, strange symbols, context-dependent syntax) without any of the good. :(

[1] https://en.wikipedia.org/wiki/Plus_and_minus_signs


> Is it really that much harder to stay within standard ASCII?

Sorry, I don't follow. What is ASCII? Can you please avoid using computer jargon in your text? It makes my head hurt and it's an unnecessary barrier to entry. Can you please write "7-bit encoding of text characters" instead?

:)


No, that's jargon - and I explicitly said it's ok. You can read the word just fine, and you can also easily search for its meaning. Compare with this:

> ...that much harder to stay ∋⌞0x21,0x7D⌝

It may be just me, but I think there's a huge difference between the two, even though both are equally made-up.


> ...that much harder to stay ∋⌞0x21,0x7D⌝

You are using wrong / nonstandard notation with no context or definition. Of course I have no idea what you're talking about.

Imagine if instead of writing

(-ℏ/2m ∂²/∂x² + V) φ = E φ

you had to write

“Divide the reduced Planck constant by twice the mass of the particle, then multiply by the second derivative, with respect to the x-coordinate, of the wave-function. To this quantity, add the potential multiplied by the wave-function. This is equal to the product of the energy eigenvalue and the wave-function.”

because you don't want to memorise what ‘∂’ means, or the superscript 2, or one number over another with a bar, or that the funny Greek letter means wave-function in this context x)


> nonstandard notation with no context or definition.

Which is exactly how the notation is used in many papers. You're supposed to have the context (from reading books and other papers in the area) and infer the definitions ("it's trivial to show, so we won't"). It's a passive-aggressive stance from the perspective of someone not belonging to the club.

> “Divide the reduced Planck constant by twice the mass of the particle, then multiply by the second derivative, with respect to the x-coordinate, of the wave-function. To this quantity, add the potential multiplied by the wave-function. This is equal to the product of the energy eigenvalue and the wave-function.”

When I imagine this, I'm ecstatic. The difference between the two formulations is that I can work my way through the latter with a search engine, WolframAlpha, and Wikipedia, while I have no way - other than learning the whole subject matter - to decipher the former. I just don't understand what's wrong with the second formulation. From what I read, this was the standard way of writing about maths and physics until the 17th century - at least. Yes, it's more characters, but it's infinitely more readable and discoverable to the uninitiated.

Not to mention, from the classes I attended, lecturers often say it out loud in exactly this form, while writing the symbols on a whiteboard. So it's obvious that the two are equivalent.

The former notation lends itself better to algebraic transformation, so I don't have any issues with using it for the intermediate steps, but the result - which is what I'm after, as a casual user and not a practitioner - could be presented in both forms. It would immediately unlock a lot of knowledge for use by non-experts. What's wrong with hoping for that?


Most of this notation predates ASCII and computer typesetting in general.


I see using this word was a bad choice; I edited the post. What I meant was "to stay within the character set the reader already knows how to read".


A more interesting question.

I think there are two answers:

1. This style of notation was, I believe, first really developed in Principia Mathematica in 1910ish. Some of the proofs in that book are big, and would be vastly longer if they were in plain English.

2. Part of the goal of that book was precision. They could reuse existing words, but those have pesky connotations that might mislead. So instead they'd either be left to invent new terminology ("sqaggle x, y lemmy ror, x grorple y") or pick something symbological.

There are really good conversations about intuitiveness of notation (see, e.g. https://mathoverflow.net/questions/366070/what-are-the-benef...), and I think that comes down to things being hard. Notation is almost always created by an expert who is making something useful to them. It's not clear that the correct expressive notation would even be the same as the correct layman's/teaching notation. Those two concepts may simply be at odds.


The linked MathOverflow answer was written by Dr. Terence Tao, and it really is an excellent dissection of the many needs levied upon notation. I find the two "Suggestiveness" items to be most related to concision as practiced in research papers.

This bit from the tail end should not be missed, in our context:

> ADDED LATER: One should also distinguish between the "one-time costs" of a notation (e.g., the difficulty of learning the notation and avoiding standard pitfalls with that notation, or the amount of mathematical argument needed to verify that the notation is well-defined and compatible with other existing notations), with the "recurring costs" that are incurred with each use of the notation. The desiderata listed above are primarily concerned with lowering the "recurring costs", but the "one-time costs" are also a significant consideration if one is only using the mathematics from the given field X on a casual basis rather than a full-time one. In particular, it can make sense to offer "simplified" notational systems to casual users of, say, linear algebra even if there are more "natural" notational systems (scoring more highly on the desiderata listed above) that become more desirable to switch to if one intends to use linear algebra heavily on a regular basis.


It's a great explanation, thank you. Yes, as a casual user I'm concerned with the "one-time costs", i.e. the barrier to entry. I'm glad that I'm not the only one who noticed that these costs are real and make "casual usage" harder than it could be otherwise.

Your previous comment, though, is much less optimistic (to me): you say that, even if they are aware of this, the experts have no incentive to use "simplified notational systems". That means that I can either become an expert myself, find an expert who will translate for me, or be unable to make use of the knowledge contained in materials for experts. I wish there were some other choice, like a Google Translate from math to English...


Yes, I would like to see more efforts out there like Distill in making cutting-edge research more accessible. I don't think we can wish away the significant gap in expertise and knowledge embodied by research, though. Somebody has to put in the effort to understand it for themselves and then unpack it for others. What may be one page in a research paper could easily turn into dozens.

I myself have been studying category theory (CT) off and on for a few years now, trying to get enough of an understanding to be able to explain it to others in the software engineering (SE) field. I think there's a lot to gain from CT, but it's so strongly founded on distant mathematical fields like algebraic topology that it's very hard to get at its essence and find tractable connections from an SE perspective.

Finding good explanations is really a hard research problem of its own; it just isn't funded that way.

> That means that I can either become an expert myself

I don't think there's any way around this. If you understand a topic, you have obtained some amount of expertise thereof. An expert cannot simply translate for you; it's not a merely notational difference. There's a body of knowledge that must be transferred.

I would recommend finding an expert who's willing to correspond with you on occasion, and look for more introductory materials like textbooks and position papers in the field. If there's a particular goal you have in mind -- say there's a specific paper that achieves something you have an interest in -- be up front about that; it helps focus the explanations and recommendations.

Being an expert doesn't mean you have to have a very broad base of expertise. In fact, it could be argued that most experts are expert in a very, very small and focused niche. The smaller the niche, the less you need to digest; but the more rarefied the niche, the more stable the successive foundations below you need to be.


> Do you really need to express concepts with a single strange character? What does it buy you, given that the downsides for readability are quite obvious?

As someone who dabbles in the field, I'll bite. I find that more succinct expressions make their structure more apparent than less succinct expressions. This is critical for identifying (and then demonstrating) the general principles underlying any particular example, which is a great deal of what we do in theoretical research. (It's usually better to describe a whole family of solutions than to describe a single solution; and failing that, to indicate some directions potentially leading to general principles.)

What do I mean by structure? This is a bit of a dodge, but structure is what you get when you remove all the data, all the aspects that pin an expression to a specific scenario instead of a general pattern. If you can recognize and memorize the pattern, you can apply it to a much broader class of examples than the one you learned it from. Concise notation is one tool for downplaying the concrete data and calling out the common aspects more deliberately.

As other posters have said, this can be abused. I'd even say it's a very rare paper that uses concise notation appropriately to its fullest extent as an aid to the reader. But the concision does serve an important purpose to experts in the field (who need rather less aid): it makes the newly contributed patterns more visible (and uses existing patterns to effectively de-emphasize parts).

> but if you just want to read and understand concepts and learn facts...

Most research papers do not have as a goal for a technical reader to understand concepts and learn facts. (There are certainly visible exceptions, see just about anything written by Simon Peyton Jones.) Research papers are evidence of progress at the forefront of human knowledge; they're shared amongst an expert community to help drive that whole community forward.

We absolutely need more effort to distill research and collect it into a more cohesive picture. Unfortunately, that responsibility does not (and probably cannot, in the current system) fall on the original researchers themselves. There are a few organized efforts out there for some fields; the one I know of is Distill.pub, for machine learning: https://distill.pub/about/

>> When we rush papers out the door to meet conference deadlines, something suffers — often it is the readability and clarity of our communication. This can add severe drag to the entire community as our readers struggle to understand our ideas. We think this "research debt" can be avoided.

But I don't think the concision of notation is at fault. It's just the most obvious roadblock to a (relatively) lay reader. The truth is simply that the paper wasn't written with you in mind.


I don't get it, do you feel like you need a class in a university to learn programming? Source code is nothing but formal notation. The fact that it kind of looks like English text if you squint makes no difference, as far as I'm concerned. There's only a handful of symbols you have to learn to read type theory stuff.


I believe you that it wasn’t intended to be insulting.

But let’s think about it.

I feel pretty confident that most people on this website are capable of learning the core 2400 Chinese characters in a year if they spent a few hours a day, and that's literally a foreign notation. A lot of people learn new languages all the time.

Kids who don’t want to learn calculus learn calculus every day. The notation isn’t just awkward, the concepts are too. Yet they learn it.

What we’re talking about is a small notation. It’s a handful of symbols. They work predictably and consistently and the people learning it are usually familiar with the subject on some level.

It’s off-putting because it appears foreign, but the concepts and actual mechanics are already familiar to most of these readers. They just need a Rosetta Stone to help get past the initial awkwardness.

When you look at it from that perspective, it is kind of insulting to use a hyperbolic word like "insurmountable" with a group of programmers.

Many choose not to learn it because they don’t want to be bothered or don’t see the value in investing their time.

Throwing words like insurmountable around seems like it breeds learned helplessness.


This is coming from my own experience trying to teach theoretical concepts via notation to very smart engineers. I won't argue vocabulary, but I stand by what I say: it's an insurmountable barrier for many people.

Teaching calculus to children is a good example of why. They have a teacher to hold their hands and answer questions 40 minutes a day with mandatory homework assignments.

If you study concepts on your own, you do not have this luxury. It's very difficult to internalize notation when you do not use it every day. The problem is not the notation itself, but the fact that no notation can be categorically searched and referenced. It cannot be typed or entered into Google easily. It is rarely consistent between authors, which is not a problem if you are already fluent in the notation.

Sometimes you can get away with "what does upside down A mean" but consider something like `\forall n \in \mathbb{N}`. Imagine if you simply had `unsigned int` (I don't want to debate the exact implication of unsigned int versus the naturals, but it should serve a point).

When I have tried to explain relatively simple notational structures to working engineers, the universal feedback is along the lines of "I wish there was a good reference for notation because I can't understand this or keep up with it."

Pseudo code, or something akin to it, may be more ambiguous but is much easier to grok and document.

As an aside: something that constantly irks me is the celebration of terseness and convenience. This plagues many texts and robs novices and experts alike of understanding. Notation that is terse to the point of resembling arcane incantations is a problem, and it bothers me that academic publications in the applied sciences don't recognize it as one.


I absolutely agree with this. My background isn't at all mathematical, but I've ended up working with machine learning. As part of that, I naturally want to understand what I'm doing, so I try to read the papers/textbooks/etc.

The problem I run into, again and again, is that all the notation is presented as already understood, even in introductory texts. It's not explained, and it's not clear where I'm supposed to go to get it explained. I can't search for it directly, because all I have is the symbol, not a name for it, and if I do manage to find the name, I'm left to wade through all the different contexts in which it might be used to find out which meaning applies.

It's not an insurmountable barrier because I'm incapable of learning. It's an insurmountable barrier because I can't find a reference to learn from that doesn't already assume I know all the answers.


I am pretty sure anyone for whom the notation in `\forall n \in \mathbb{N}` presents a barrier to learning meaningful amounts of type theory will have greater problems from lack of mathematical background.


The mathematical background required is just familiarity with the notation... which is my entire point. It sounds like you don't have a good handle on the mathematical pedagogy required to be a good engineer today - it's not what you could describe as extensive.


I think it's easier to do if you work up to it gradually. I read Logic by Wilfrid Hodges a few years back on a whim from some recommendation on HN. I didn't think much of the book at the time, though I recently picked up Benjamin Pierce's Types and Programming Languages. Having seen notation like that before made digesting the book much easier (along with starting with other introductory texts like Friedman and Wand's Essentials of Programming Languages).


Yeah, Essentials of Programming Languages is another great implementation-oriented text; it has some formal things like inductive sets (which pop up in PLT anyway) and also lots of programming exercises.


Looking at Java, Rust, or assembly, I find this argument hard to swallow ;) It's really worth it to learn some sequent calculus basics to be able to read typing rules. It's mostly the same thing as a grammar, or a weird inductive type ("enum").


It shouldn't be hard to swallow. Monospaced ASCII text that is (usually) unambiguous and has extensive documentation is night and day from formal notation.

At least BNF grammars are legible.


Look at APL, K, and J. The notation based on keywords is not a problem; the problematic notations are the ones which abuse strange characters or ideograms, because they require you to re-learn the process of reading.


The notation really isn't as bad as it originally looks, trust me. I was in the same boat recently, but after cracking open TAPL, on page 26 you basically get all the notation you need for the rest of the book.

Inference rules just transform the traditional format of

    (A && B) -> C
into

    A B
    ---
     C
This format allows you to stack rules on top of each other when writing out the derivation of some term, without running out of page space.
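For example (a made-up derivation, using the usual simply-typed rules for variables and application), stacking rules looks like:

    x : Bool→Bool ∈ Γ            y : Bool ∈ Γ
    -----------------  T-Var     ------------  T-Var
    Γ ⊢ x : Bool→Bool            Γ ⊢ y : Bool
    -----------------------------------------  T-App
                 Γ ⊢ x y : Bool

Each premise of the bottom rule is itself the conclusion of a rule above it, so the whole proof reads as a tree rather than a long chain of "and then, because...".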


It seems absurd to care about 'page space' when discussing computer science notation...


Ah, notation is such a subtle and often sad topic.

When it clicks you never give a damn about notation, but before it does, it's such a drag.

It's like monads.


