More

cooljoseph · 2025-12-13T19:34:34 1765654474

I've done something similar for a couple tools.

I tend to make them as Python servers which serve plain html/js/css with web components. I know this is a bit more complicated than just having a single html file with inline js and css, but the tools I made were a bit too complicated for the LLMs to get just right, and separating out the logic into separate js files as web components made it easy for me to fix the logic myself. I also deliberately prompted the LLMs to avoid React because adding I didn't want to need a build step.

The only one I actually still use is the TODO app I made: https://github.com/cooljoseph1/todo-app It stores everything in a JSON file, and you can have multiple TODO lists at once by specifying that JSON file when you launch it.

cooljoseph · 2025-10-11T04:47:13 1760158033

This sounds somewhat like a normalizing flow from a discrete space to a continuous space. I think there's a way you can rewrite your DDN layer as a normalizing flow which avoids the whole split and prune method.

1. Replace the DDN layer with a flow between images and a latent variable. During training, compute in the direction image -> latent. During inference, compute in the direction latent -> image. 2. For your discrete options 1, ..., k, have trainable latent variables z_1, ..., z_k. This is a "code book".

Training looks like the following: Start with an image and run a flow from the image to the latent space (with conditioning, etc.). Find the closest option z_i, and compute the L2 loss between z_i and your flowed latent variable. Additionally, add a loss corresponding to the log determinant of the Jacobian of the flow. This second loss is the way a normalizing flow avoids mode collapse. Finally, I think you should divide the resulting gradient by the softmax of the negative L2 losses for all the latent variables. This gradient division is done for the same reason as dividing the gradient when training a mixture-of-experts model.

During inference, choose any latent variable z_i and flow from that to a generated image.

diyer22 · 2025-10-11T20:26:29 1760214389

Thanks for the idea, but DDN and flow can’t be flipped into each other that easily.

1. DDN doesn’t need to be invertible. 2. Its latent is discrete, not continuous. 3. As far as I know, flow keeps input and output the same size so it can compute log|detJ|. DDN’s latent is 1-D and discrete, so that condition fails. 4. To me, “hierarchical many-shot generation + split-and-prune” is simpler and more general than “invertible design + log|detJ|.” 5. Your design seems to have abandoned the characteristics of DDN. (ZSCG, 1D tree latent, lossy compression)

The two designs start from different premises and are built differently. Your proposal would change so much that whatever came out wouldn’t be DDN any more.

godelski · 2025-10-16T06:23:22 1760595802

Fwiw, I'm not convinced its a Flow and that's my niche. But there are some interesting similarities that actually make me uncertain. A deeper dive is needed.

But you address your points

  > 1. DDN doesn’t need to be invertible

The flow doesn't need to be invertible at every point in the network. As long as you can do the mapping the condition will hold. Like the classic layer is [x_0, s(x_1)*x_0 + t(x_1)]. s,t are parametrized by an arbitrary neural network. But some of your layers look more like invertible convolutions.

I think it is worth checking. FWIW I don't think an equivalence would undermine the novelty here.

  > 2. Its latent is discrete, not continuous.

That's perfectly fine. Flows aren't restricted that way. Technically all flows aren't exactly invertible as you noise the data to dequantize it.

Also note that there are discrete flows. I'm not sure I've seen an implementation where each flow step is discrete but that's more an implementation issue.

  > 3. As far as I know, flow keeps input and output the same size so it can compute log|detJ|.

You have a unet, right? You're full network is doing T:R^n -> R^n? Or at least excluding the extra embedding information? Either way I think you might not interested in "Approximation Capabilities of Neural ODEs and Invertible Residual Networks". At minimum their dimensionality discussion and reference to the Whitney Embedding Theorem is likely valuable to you (I don't think they say it by name?).

You may also want to look at RealNVP since they have a hierarchical architecture which does splitting.

Do note that NODEs are flows. You can see Ricky Chen's works on i-resnets.

As for the Jacobian, I actually wouldn't call that a condition for a flow but it sure is convenient. The typical Flows people are familiar with use a change of variables formula via the Jacobian but the isomorphism is really the part that's important. If it were up to me I'd change the name but it's not lol.

  > 5. Your design seems to have abandoned the characteristics of DDN. (ZSCG, 1D tree latent, lossy compression)

I think you're on the money here. I've definitely never seen something like your network before. Even if it turns out to not be its own class I don't think that's an issue. It's not obviously something else but I think it's worth digging into.

FWIW I think it looks more like a diffusion model. A SNODE. Because I think you're right that the invertibility conditions likely don't hold. But in either case remember that even though you're estimating multiple distributions that that's equivalent to estimating a single distribution.

I think the most interesting thing you could do is plot the trajectories like you'll find in Flow and diffusion papers. If you get crossing you can quickly rule out flows.

I'm definitely going to spend more time with this work. It's really interesting. Good job!

cooljoseph · 2025-09-08T17:33:50 1757352830

> The system is full up at the moment and you can't really add anything without removing some things.

Middle school seems rather un-full to me. Right now students start learning about fractions in 4th grade. They don't move on to algebra until 9th grade. What is there in the middle? Not much, in my experience.

Maybe instead of having a giant no-math gap during middle school, they could move everything down and free up some space later on.

jerf · 2025-09-09T13:49:51 1757425791

I doubt it. Local math programs tried to jam those things down sooner, but it doesn't do much good and may do harm. While I understand Piaget is not necessarily the last word in human development, using his terminology, real mathematical education can't really proceed until the formal operational stage is attained. Prior to that all you can really expect is extremely concrete things (in the aptly-named concrete operational stage) like arithmetic. Only single-digit percentages can handle simply moving that stuff earlier into the curriculum. The local conversation on that is distorted by HN having a lot of those single-digit percentages in question, but it's not the normal case.

While I didn't lay it out in my curriculum discussion, I also extremely, extremely strongly support using computers to provide personalized curricula to students designed to probe them for when they are ready to start more advanced math, leading to the complete destruction of the unbelievable awful cohort system in use today, and giving those single-digit percentages the ability to proceed onwards at their own pace, however much faster or slower it may be than anyone else's. But if my curriculum is goring a sacred cow, this idea is goring an entire herd of them at once. The field of education's resistance to computerization has been incredibly effective, to their personal benefit but to all our children's detriment.

aragilar · 2025-09-09T10:06:02 1757412362

Huh, Australia had (many years ago when I did it) algebra in year 7, and if you were in any kind of accelerated/gifted-and-talented class you'd see it even earlier.

OkayPhysicist · 2025-09-10T19:00:48 1757530848

Where the heck are they teaching algebra in 9th grade? Here in California that's 6th-7th grade curriculum.

cooljoseph · 2025-08-22T22:39:22 1755902362

> I'd bet accidents are more often than not mostly their fault.

That's actually not true. Most surveys I've seen show that drivers are at fault ~80% of the time.

MisterMower · 2025-08-23T02:06:50 1755914810

Surveys?

“Yes I’ve been in an accident on my bike Mr. Poll Taker.”

“What? Of course it was the other guy’s fault!”

cooljoseph · 2025-08-23T03:19:59 1755919199

No, surveys like where researchers show up to hospitals and look at the police reports for the injured cyclists.

Sorry if I wasn't clear in my wording. By "survey" I was trying to point to the specific kind of research methodology where you survey people about what has happened in the past instead of trying to control variables like in a typical experiment.

I wasn't talking about random internet polls or self-reported blame analyses, but actual research papers.

cooljoseph · 2025-05-24T05:45:58 1748065558

It is possible to construct φ exactly with a straight-edge and compass. Would the approximation of 5π/6 - 1 be used because it's easier to calculate quickly?

Xmd5a · 2025-05-24T14:46:53 1748098013

Yes φ is a constructible number and more generally an algebraic number, solution of the polynomial equation x^2 = x + 1. However π is not, and so is my approximation of φ as 5π/6 - 1. Here non-constructibility (in the mathematical sense) translates to the fact there is no method to "straighten" an arc into a segment using a compass and a ruler. But bear with me because the 5π/6 - 1 approximation of phi has more to say.

First, the "conspiracy theory" that the meter is linked to Earth's dimensions and harmonizes with ancient measurement units through a shared reference actually predates the meter's definition. This idea was a thread of interest among the scientists who developed a universal measurement system – one that could be derived anywhere on the planet.

>One can well sense that it can only be through comparisons of measurements made in ancient times & in our days on monuments still existing, that I can determine to how many of our toises the Geometers of antiquity would have evaluated a degree of Meridian. Now I find, 1st. that the side of the base of the great pyramid of Egypt taken five hundred times; 2nd. that the cubit of the Nilometer taken two hundred thousand times; 3rd. that a stadium existing & measured at Laodicea in Asia Minor, by Mr. Smith, & taken five hundred times; I find, I say, that these three products are each of the same value, & that each in particular is precisely the same measure of a degree [of a Meridian], which has been determined by our modern Geometers.

Alexis-Jean-Pierre Paucton, Metrology, or Treatise on measures, weights and currencies, of the ancients and the moderns, 1780

more context: https://anonpaste.pw/v/71abb0f8-5a03-4cb5-879a-d4f44ad6d57c#...

original: https://gallica.bnf.fr/ark:/12148/bpt6k55491755/f126.item

>Newton was trying to uncover the unit of measurement used by those constructing the pyramids. He thought it was likely that the ancient Egyptians had been able to measure the Earth and that, by unlocking the cubit of the Great Pyramid, he too would be able to measure the circumference of the Earth.

https://www.theguardian.com/science/2020/dec/06/revealed-isa...

Having said that(-1 downvotes!), let's recap:

This is how we can construct a royal cubit from a circle of diameter = 1m:

https://imgur.com/a/HmnfDKR

φ = 2cos(π/5) lead us to this construction around a pentagon from which we can derive the "pige" or "quine" of cathedral builders (for now consider this is historically true) https://fr.wikipedia.org/wiki/Pige_(mesure)

https://imgur.com/a/ZqprfAd

What I mentioned earlier was that using a circle-based construction(diameter = 1m), one can derive a non-constructible approximation of φ, namely φ̃ = 5π/6 – 1, with the remarkable property that 0.2 × φ̃² = π/6, thanks to φ² = φ + 1.

But what’s truly elegant is that this process has a symmetric counterpart, where we approximate π using φ. This time, we begin with a constructible triangle, sometimes called the triangle of the builders (1, 2, √5), whose perimeter is:

    t = (1 + 2 + √5)/10

This value is fully constructible with compass and straightedge, and numerically it approximates π/6 to four digits. If we treat this `t` as a stand-in for π/6 in the previous formula:

    φ = 5t – 1

we recover the *exact golden ratio*:

    φ = (5 × (3 + √5)/10) – 1 = (1 + √5)/2

And then, going full circle:

    0.2 × φ² = t again

In both directions, 0.2 (i.e., 1/5) emerges as the key scaling factor, bridging the decimal system, φ, and π through geometry. It ties together:

    - the constructible (t from the triangle),
    - the transcendental (π/6 from the circle), and
    - the algebraic (φ² = φ + 1)

^this is a new result I just found.

For the historically conservative, arguments can be made that these considerations are pseudo-historical, that the "quine of cathedral builders" is an unsubstantiated myth. See the wikipedia link above for the "pige"

or this recent article: https://classiques-garnier.com/aedificare-2021-2-revue-inter...

this one too: http://compagnonsdudevoir.fr/?p=790

>This greatly saddens those who have built an entire "operative" narrative around this kind of knowledge supposedly passed down in secret among the compagnons of the Tour de France for centuries… and have made it their pedestal. The question of how "tradition" is constructed among the compagnons (and incidentally among the Freemasons) remains a taboo that absolutely needs to be broken — and not just for the sake of advancing historical knowledge.

Also this blog post traces the confabulation of the quine to Le Corbusier's Modulor system based on the golden ratio: https://blogruz.blogspot.com/2007/12/en-qui-quine.html

>Le Corbusier considered various sets of proportions, notably using a human height of 1.75 meters, before settling in 1947 on a single set based on a height of 1.83 meters. He chose this because the associated Modulor measurement of 226 cm corresponds to within less than a millimeter of 89 inches — 89 being a number in the Fibonacci sequence that provides some of the best approximations of the golden ratio.

>This system was intended to unite all nations around a universal standard, effectively casting aside the metric system, if not the decimal system entirely. We know how that turned out: the Modulor was essentially used for only one major creation — albeit a significant one — the Cité Radieuse in Marseille, completed in 1952, where all dimensions, down to the built-in furniture, are derived from the Modulor.

Makes you think... The fact we don't have documents isn't surprising given that the campagnons (or later freemasons) communicated practical (then mystical) knowledge esoterically for political reasons (See https://fr.wikipedia.org/wiki/Compagnonnage).

Nonetheless, the same motivations and the same quest for harmony (in the obsessive, symbolic sense) can be observed in Le Corbusier. As if the situation follows a geometric progression: in this sense, the "ancients" were as puzzled as we are by unexpected harmonies and actively sought them, and if you look at the historical sequence that lead to the definition of the meter, this is what you find.

Compare the length of a greek foot with a roman foot: 30.87cm vs 29.62cm. The ratio matches 24/25 with 3 nines of precision. 24•25•7 forms a pythagorean triangle. As if the definition of some measurement units were retrofitted to facilitate conversion. If this kind of behavior leads to the formation of a strange graph of quasi-conversions or numerical coincidences, then maybe we could explain the emergence of patterns such as the 5π/6 - 1 approximation of φ without needing to argue for (or against) someone's intention behind what appears as a design choice.

Alternatively the measures of the tools or geometric constructs that drive these conversions are idealized/approximated with a ratio, hence the delusion of the conspiracy theorists. But as I said, "ancients" had the same attitude, in particular with irrational numbers they wished to express as a ratio. Imagine the kind of problem the pseudo-phi <-> pseudo-π/6 complex I desribed above posed to people who where attempting to construct a straight line of length pi using only a compass and a straight-edge and establishing mathematics more rigorously. That's quite a nasty trap. Surely they found themselves in a mindstate that must not be that different from ours. Put in other words, the situation is hyperstionnal, and if we want to understand what is happening (whether this is an illusion or not) I think we should try to tackle this from a cognitive angle and model surprise explicitly.

Some more links:

https://www.messagedelanuitdestemps.org/les-principales-unit...

https://martouf.ch/2021/03/le-metre-une-matrice-universelle-...

cooljoseph · 2025-03-18T04:44:34 1742273074

Are you sure about that? According to Wikipedia both gypsum and calcite are used. Apparently, gypsum is used for colored chalk, and calcite is used for white chalk:

> Chalk sticks are produced in white and in various colours, especially for use with blackboards. White chalk sticks are made mainly from calcium carbonate derived from mineral chalk or limestone, while coloured chalk sticks are made from calcium sulphate in its dihydrate form, CaSO4·2H2O, derived from gypsum.[6][7] Chalk sticks containing calcium carbonate typically contain 40–60% of CaCO3 (calcite).

https://en.wikipedia.org/wiki/Blackboard#Chalk_sticks

Wikipedia cites the following articles:

[6] "How chalk is made – material, making, used, processing, procedure, product, industry". madehow.com. Retrieved February 17, 2021.

[7] Corazza, M.; Zauli, S.; Pagnoni, A.; Virgili, A. (2012). "Allergic contact dermatitis caused by metals in blackboard chalk: a case report". Acta Dermato-Venereologica. 92 (4): 436–437. doi:10.2340/00015555-1296. PMID 22367154.

The first of these seems more relevant... I'm not quite sure what the second citation adds.

cooljoseph · on Nov 26, 2024

The camera is positioned so that the side window is visible through the front windshield. I think the "black coating" you are seeing is just the interior of the van, and that the entire front windshield is usable.

cooljoseph · on Oct 23, 2024

They seem a lot easier than USAMO problems, or even Putnam problems. I suspect that top students nowadays could easily solve them all in a day.

cooljoseph · on Sept 22, 2024

I was having some difficulty figuring out how Hy actually is translated to Python (and wasn't even sure if it was compiled or interpreted). Eventually I found on Wikipedia the following: > Hy is a dialect of the Lisp programming language designed to interact with Python by translating s-expressions into Python's abstract syntax tree (AST).

Also, looking at the code on Github suggests this compiler is written in Python (see https://github.com/hylang/hy/blob/master/hy/compiler.py).

I kind of wish this was made more clear on the main website. Perhaps, instead of introducing Hy as "a Lisp dialect that's embedded in Python", introduce it as "a Lisp dialect that compiles to Python's AST". The words "embedded in Python" don't make it very clear just how it's embedded into Python. The various ways you can embed a Lisp look very different and have very different tradeoffs.

For example, off the top of my head, I could "embed" a Lisp by writing an interpreter (in C if I care about performance) and letting it be called from Python, perhaps passing in a Python list instead of a string to make it more "native". Or I could "embed" a Lisp by compiling to Python bytecode. Or I could "embed" a Lisp by translating it directly to Python source code. Etc.

Regardless, interesting project!

wodenokoto · on Sept 22, 2024

From the readme / github page:

> Hy is a Lisp dialect that's embedded in Python. Since Hy transforms its Lisp code into Python abstract syntax tree (AST) objects, you have the whole beautiful world of Python at your fingertips, in Lisp form.

Kodiologist · on Sept 22, 2024

> this compiler is written in Python

Yes, that's right. Hy is not self-hosted.

> The various ways you can embed a Lisp look very different and have very different tradeoffs.

Hy itself provides options. Typically the process is that the Hy source code becomes Python AST objects, which Python then complies and executes, but you can also translate the Python AST objects into Python source text. Or you can use Python from Hy or vice versa: https://hylang.org/hy/doc/v1.0.0/interop

rcarmo · on Sept 22, 2024

The "embed" part stems from the fact that you can mix Python and Hy in a project with bi-directional calling. Works great, because it is all Python byte code in the end.

PuercoPop · on Sept 22, 2024

The original hy annoucement makes it clear that they embed a Lisp by compiling with Python bytecode. You can see it in the following video about the 16:25 mark

https://m.youtube.com/watch?v=1vui-LupKJI

Foxboron · on Sept 22, 2024

and for those interested in history, Docker was first announced 10 minutes afterwards on the 26:24 mark.

Kodiologist · on Sept 22, 2024

Now I know how those guys felt who were on the same episode of Ed Sullivan that introduced the Beatles.

Foxboron · on Sept 22, 2024

There is a reason why Hylang was one of the first official Docker images!

Slackwise · on Sept 23, 2024

The actual statement in the video is:

> ...because this is a frontend like LLVM or GCC that compiles instead of bytecode, uh, to Python AST, um, so this Lisp compiles entirely to Python

@ https://youtu.be/1vui-LupKJI?t=1020

cooljoseph · on Sept 17, 2024

The Hilbert curve does contain every point in the unit square. It is a limit of curves, and so can contain points even not in the intermediate constructions. This is similar to how the limit of 1/x as x -> infinity can be 0, even though 1/x never equals 0.

cooljoseph · on Sept 17, 2024

Also, a curve which gets arbitrarily close to every point in the unit square actually touches every point in the unit square. This is because (by definition) a curve is a continuous map from a compact space (the unit interval) to a Hausdorff space (R^2), and so its image is compact, and thus closed. A closed set contains every point that it is arbitrarily close to.

yarg · on Sept 18, 2024

If I travel one half of the distance from where I am to the finishing line an infinite number of times, I reach the finishing line but still never finish the race.

With a Hilbert curve the entire plane becomes a limit.

yarg · on Sept 18, 2024

This doesn't seem to fly with the inductive fact that 1/2 of a power of two is always one over a power of two no matter how many times you perform the iteration.

There are a countably infinite number of rationals between any two rationals, you can even keep splitting up those rational infinitesimal gaps into countably many rationals that are infinitesimal even relative to the earlier infinitesimals.

And you still only end up with a countably infinite set of expressible locations and not the real continuum.

Either x, y, or both are guaranteed to be a number of that form for all values on the curve.