Hacker News | 6d65's comments

Keep up the good work and thank you for sharing.

I wasn't sold on source-level snapshot tests; they seemed clunky and not that universal. But I'm definitely giving snapit a try after this blog post and the other two linked.

I might have to add a few things, most notably support for multiline strings for more readable ASCII-art prints. I have a good feeling about this, but we'll see how it goes.


I have been tracking 5950X prices, and anything over $300 seems unreasonable. I think I'm just going to get a 7700X, a new motherboard, and 32GB of RAM; it should be a $500-ish upgrade. That should give me M3-like single-threaded performance, multi-threaded performance better than the 5950X (though slightly worse than an M3 Max), and better memory bandwidth than the 5950X.

You're right about just having a beefier PC, but maybe a baseline M2/M3 Air to SSH into a local machine, plus a beefier PC, is a good combination. Luckily I switched to an all-in-terminal dev experience with Neovim/Helix/tmux, and the advantage is that one can seamlessly mix ssh/mosh shells into this workflow. I don't even need to set up VNC/RDP/Moonlight. Maybe even that is unnecessary if your language supports ccache-like tools.
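For concreteness, the whole remote setup can be as small as one command from the laptop (assuming the desktop is reachable as "devbox" and has tmux installed; the host name is made up):

    # Attach to a persistent tmux session on the beefier PC, creating it if missing
    mosh devbox -- tmux new-session -A -s main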

I'm mostly interested in compilation speed, and maybe running some programs faster on beefier hardware, and I don't mind working on Linux.

So to recap, a baseline Air for portability (or some light Ryzen laptop) plus a mid-range PC seems like a winning combination for me. Given you already have a case/PSU/SSD, together they can cost about as much as an entry-level M3 MacBook Pro with 8GB of RAM.


Since everyone is sharing: I use bookmarklets daily, the main ones being 3x and 2x, which find the video element on the current page and speed it up.
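For reference, a minimal sketch of the 2x one (the installed version is squeezed onto a single line; the selector is an assumption that covers most HTML5 players):

    javascript:(() => {
      // Find the first <video> element on the page and double its speed
      const v = document.querySelector('video');
      if (v) v.playbackRate = 2; // the 3x variant just uses 3 here
    })();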

I also made some overlay grid helpers, like 8h and 8v, which create pass-through overlay elements and render red grid lines (8px spacing) using CSS. It's like an alignment tool for every page.
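Roughly what the 8h helper does (a sketch, not my exact code; pointer-events: none is what makes the overlay pass-through):

    javascript:(() => {
      // Pass-through overlay drawing a 1px red line every 8px
      const d = document.createElement('div');
      d.style.cssText = 'position:fixed;inset:0;z-index:99999;pointer-events:none;'
        + 'background:repeating-linear-gradient(to bottom,red 0,red 1px,transparent 1px,transparent 8px);';
      document.body.appendChild(d);
    })();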

The bad thing is that Chrome and Edge don't highlight bookmarks by default when searching, forcing one to press the up arrow key. Firefox used to be better in this regard, with the bookmarklet highlighted and ready to run. The workaround for Chrome and Edge is to register the bookmarklet as a search engine, which makes it selected by default.

Bookmarklets and custom search engines (for various online documentation sites, or plain Jira, or AWS) can be a productivity booster.
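Setting one up is just a URL template with %s where the query goes; for example, a documentation engine for MDN (in Chrome, under Settings → Search engine → Add):

    https://developer.mozilla.org/en-US/search?q=%s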

I wouldn't be able to navigate AWS SSM without my trusty SSM search engine. Same for the others: EC2, CF, S3.

I should be using bookmarklets more though.


Another great post in the series.

I've been trying to bootstrap my own deep learning framework in Rust for a while now. I'm still stuck implementing things on the CPU.

But I've always wanted something running on the GPU as well. I've poked at OpenCL and WebGPU, but piet-gpu (more probably piet-gpu-hal) seems like the best starting point in Rust land.

Once I get to the GPU compute part, I'll start with piet-gpu. I'm not looking forward to debugging GPU compute kernels, but maybe it will be fun.

PS: It's a bit disheartening to see such buggy Vulkan implementations, since one of Vulkan's main selling points was fewer bugs in the drivers. I'm not sure what I'll see on Linux.

Also, I had hoped that between Vulkan and Metal one could cover all the major desktop OSes with GPU-accelerated software. It's sad to see Apple dropping the ball in this regard.

There is also the concern of code reuse and code quality. Doing something complex in one pass would mean a huge kernel. It would be interesting to investigate whether something like OpenAI's Triton could be implemented on top of piet-gpu (or its HAL): a DSL embedded in Rust that could do kernel fusion and generate one shader from composed pieces. I'll poke around at this when I get to piet-gpu.


Let's talk if and when you want to build something. I make no promises that piet-gpu-hal is suitable for other workloads (right now I would say it barely meets the needs of piet-gpu), but on the other hand I am interested in the question of what it lacks and what it would take to run, e.g., machine learning workloads on top of it.


There's a Twitch streamer (tsoding) building a FORTH-like language from first principles.

I've watched a couple of episodes and then started my own. I learned a bit of ARM64 assembly, and now I have both an interpreter and a compiler to M1 binaries. A very fun experience.

One day I'll slap an F#/OCaml-esque syntax on top of it, with typed functions, structs, and interfaces, and see if it works.

At their core (it seems to me), concatenative languages are about function (word) composition.

So, instead of having "words", I would make them functions, with typed arguments and simple argument matching (no more dups and swaps). This would probably make the language less flexible, but more readable.

Anyway, FORTH is nice.


Sounds like you're in the process of reinventing Factor.

Which is not a bad thing. Go nuts. Just realize that there may be some degree of prior art for what you're doing.


I'm vaguely familiar with Factor and hopefully not in the process of reinventing it.

I was planning to investigate this space just for fun. Most likely nothing will come of it, and that's OK; it's just a learning opportunity.


Nah fam, like I said, go nuts. Even if something exists, reimplementing it is great for the challenge/intellectual curiosity/fun factor. I learned Ada by reimplementing Unix utilities in it.


> So, instead if having "words", I would make them functions, with typed arguments

Interesting idea. Wouldn't it require some compiler magic, though? E.g. you can't type a Y combinator.


Maybe I'm not using the proper FP term for "function". I don't mean functions as objects that you can store or pass around, or higher-order functions. Just something that takes arguments from the stack and returns some values.

What I mean is that FORTH words seem to be functions taking stuff from the stack and pushing stuff back. Composition is done by chaining them together.

This probably works fine with REPL-driven development (same as in Lisp): you find out during development when your words cannot be composed. But as mentioned by others, it may make the code hard to read, having to keep the state of the stack in your head at all times.

I was thinking of just adding a light syntax, e.g.:

    let add a b = a b +
    // optional type annotations
    let square a = a a *
    let add-square = add square

Maybe add some type inference, support for structs, and match expressions. But at the core, keep the low-level FORTH nature, with both an interpreter and a compiler.

It remains to be seen whether it's possible to strike a balance here. FP programmers may be disappointed that this is not a proper FP language, while FORTH programmers may say it's a syntax-ridden abomination.

I would say that with some tooling (editor support, a test framework, God forbid a package manager), it could be an interesting option for embedded development.


In many Forths you can have what they call locals, so your example can be:

    : add {: a b :} a b + ;


I was thinking not too long ago about how phones are missing an installable OS with a clear business plan.

There is the potential (given an unlocked bootloader) to use WebUSB to let people flash the OS from an official website.

And maybe at a later date, allow installing from phone to phone via a cable, as a lot of people nowadays don't have a PC.

The pricing will be tricky though: $5 per install, given a large user base, plus maybe some additional features for sale.

That being said, having an OS also means having an app store; that alone could generate enough revenue to make the OS free.

By OS I mean an Android fork, or a Fuchsia-based OS if the license allows it.


Yup. Considering my Android ROM can reach thousands of phone models, from $30 devices to $1000 devices, and I can upgrade devices stuck on 3-year-old Android to the latest Android versions, I believe I have a good basis to monetize it.

And so far, I haven't seen any path towards monetization.

Most people won't give you money directly, unless maybe you target very specific niches, like privacy or FLOSS (and I don't believe I can address either of those).

As you mentioned, another possibility is to get revenue through an app store (or other revenue-sharing sources). There is the obvious one, the Google Play Store, but I simply can't bear Google's bureaucracy; if I were to try that, I'd die of old age first.

Other actors would be much more willing to talk to third parties; maybe Aptoide or the Amazon Appstore would be interested? But then you can't have Google services, and usage would be much more limited. Also, I believe the per-user revenue from the Amazon Appstore is much lower than from the Play Store, because people have less trust in it (but I could be wrong there).

$5 per install is IMO far from enough, because it barely pays for the bandwidth to download OTAs for one year. Unless you mean $5 per install and per OTA, but I don't like the idea of making users pay for security upgrades. Also, it lacks an important property needed to replace revenue sharing: adapting to users' means. If a user has a lot of money, revenue sharing will bring in more money, and it's better for reach to still have room for low-income users; notably, Indian users can give you a lot of advertisement "for free" even though they don't have much money.


You have fair points and much more experience than I have in this domain.

Yeah, $5 even at 1M users is probably not that much, given the need to maintain drivers and, as you mentioned, serve updates; that would require a team.

With regards to the Google apps, I think a company would be better off trying to make something of its own, or collaborating with some app makers, at least for the difficult stuff like maps, camera, gallery, and browser (not sure about the paid codecs).

As for the app store, I know that's a big project, but it would have to be custom and a core product of the company.

Hope you'll eventually find a way to monetize your ROM.


For what it's worth, I did think of a direction where the app store would be the major aspect of it. The idea is that this ROM would be a "safe place". With regards to the app store, this would mean:

- with "all you can eat" monthly subscription that gets you all the apps included, so no whaling allowed

- severe policing against dark patterns

- including a "panic button" so users can report whether they got addicted to some app.

But when I tried to design that, I very quickly fell down a rabbit hole of "how to compute which app deserves which share of the revenue". And then, how to define "dark patterns".

And here I am, still haven't actually tried anything.


Is there a desktop OS with a clear business plan?

Windows is collecting data and milking a declining enterprise client base, while all the early "influencer" companies use macOS. Regular consumers essentially refuse to pay for Windows even if they prefer it.

macOS isn't installable and is subsidized by hardware sales and monthly services.

That leaves you with *nix, which is free. No business plan!

In 2021, I don't think operating systems are a good business to be in! You can make more money selling subscription sleep apps or budgeting apps, which have far less complexity.


Well, Microsoft became quite a big company off the OS and its business suite.

Of course it will be hard to find corporate clients for a mobile OS, but it may be doable.

But you're right: aside from Microsoft and Red Hat, and maybe CoreOS some time ago, people don't seem to earn money from OSes, unless I'm missing something.


Automotive. It looks like combined native Qt + Android, which Jolla knows...


I will definitely give it a try.

Even after many years of not using Terra, I still cannot forget what a good idea it is.

Nelua seems like a more pragmatic implementation of similar ideas: it generates C code instead of embedding LLVM, and doesn't generate code at runtime. But still, things like ecotypes should be possible.

It will be interesting to play with the compile-time Lua scripting. Also, as mentioned in the other comments, I'm not sure about the GC, but there seems to be a manual memory management option.

But still, looks great, kudos to the author, keep up the great work.

PS: If I were implementing it, I would deviate a bit from Lua and replace local with let. It's highly subjective, but I think it would make code "prettier" (whatever that means).


> PS: If I were implementing it, I would deviate a bit from Lua and replace local with let. It's highly subjective, but I think it would make code "prettier" (whatever that means).

The idea behind that is to have the lowest possible syntax barrier for Lua developers, so they can migrate from Lua to Nelua without breaking a sweat.


Yep, it makes sense.

With the popularity of Roblox among young people, and its use of Lua for scripting, Lua could see a renaissance of sorts.


Could you macro up an alias if you wanted to?


You could change the language grammar through the preprocessor to accept "let" as an alias for "local". But I recommend people get used to Lua syntax, because the metaprogramming will be done in Lua anyway; that way both the programming and metaprogramming contexts have similar syntax.


As long as we're bikeshedding: I've often thought that Lua should warm up to using `my` as a keyword in place of `local`. `let` doesn't convey that it's a scope keyword (LISP heritage notwithstanding), while `local` is long and `loc` is an eyesore.


"Eyesore" is a precise word for the local keyword. From what I see, variable names in scripts are usually short, and the local keyword draws more attention than the variable names themselves.

Of course, this can be alleviated with syntax highlighting that slightly mutes the local keyword. And people using Lua every day probably learn to ignore the keyword automatically.


By ecotypes I meant exotypes (see https://cs.stanford.edu/~zdevito/pldi083-devito.pdf).


Bought the PDF after reading the free web version; the work definitely has a lot of love put into it, and that has to be rewarded.

I've skimmed the web version to the end, but thoroughly read half of it, and still learned a lot. Especially the pragmatic approach to code generation via string concatenation. I went a bit further with a small DSL that also does string concatenation.
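As a tiny illustration of that string-concatenation style (a made-up JavaScript sketch, not the book's actual generator):

    // Emit a class definition by gluing strings together
    function defineType(name, fields) {
      let src = `class ${name} {\n`;
      src += `  constructor(${fields.join(', ')}) {\n`;
      for (const f of fields) {
        src += `    this.${f} = ${f};\n`;
      }
      src += '  }\n}\n';
      return src;
    }

    // Prints a ready-to-paste AST class
    console.log(defineType('Binary', ['left', 'operator', 'right']));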

My C# and Rust implementations are half-baked, but I still had a lot of fun doing them, and that's all that counts.

Can't wait to see Bob's next endeavor.


AFAIK, it's mainly used for implementing gradient descent, which is used for training neural networks.

Frameworks like PyTorch and TensorFlow use backpropagation to calculate the gradient of a multidimensional function, but it involves tracing, and storing the network state during the forward pass.

Static automatic differentiation should be faster, and should look more like differentiation done mathematically rather than numerically.

Of course there are more applications of AD in scientific computing.


I think Swift is also going in this direction: baking it directly into the compiler and providing it as a higher-level language construct.

https://github.com/apple/swift/blob/main/docs/Differentiable...

This leads to "Swift for TensorFlow", which, unlike the offerings in languages like Java, Go, or Python, is not just bindings to the C++ TensorFlow library.


You're right about Swift for TensorFlow; it is an interesting development. I installed Swift on Linux just to play with it. I was disappointed to see that Swift for TensorFlow was a fork of the Swift compiler, and I haven't followed whether it's being kept up to date, or whether there are any plans to merge it back.

Ideally, an LLVM tool should allow languages that compile to LLVM (I'm mostly interested in Rust and Julia) to leverage it without dramatically changing their compilers.

Something like JAX in Rust, exposing the AD functionality in the standard library and paired with a high-performance SIMD/GPU array type, could get very interesting.


I don't see how static AD removes the need to store the network state. Is this a fundamental property of static AD?

Also, your statement sounds like PyTorch/TF do AD numerically, which is not the case. They build the analytical gradient from the traced computation graph.


Reverse mode AD can always get into situations where it needs to store original values (i.e. network state).

One advantage, however, of doing whole-program AD rather than differentiating individual operators is that one might be able to avoid caching values unnecessarily. For example, if an input isn't modified (and still exists) by the time the value is needed in the reverse pass, you don't need to cache it; you can simply use the original input without a copy.

And yes, PyTorch/TF do perform a (limited) form of AD as well, rather than numerical differentiation (though I do think there may be an option for numerical?).

I wouldn't really position a tool like Enzyme as a competitor to PyTorch/TF (they may have better domain-specific knowledge, after all), but rather as a really nice complement. Enzyme can take derivatives of arbitrary functions in any LLVM-based language, rather than in the DSL of operators supported by PyTorch/TF. In fact, we built a plugin for PyTorch/TF that uses Enzyme to import custom foreign code as a differentiable layer!


> One advantage, however, of doing a more whole-program approach to AD rather than individual operators

I was under the impression that the big ML frameworks (and surely JAX with jit) do optimization on the complete compute graph, too.

I didn't want to make this discussion too TF/PyTorch-focused (I'm not even an ML researcher). But your optimization claims sound like the other AD frameworks are not doing any optimization at all, which is not the case.

I was also thinking about derivatives of functions that do something iterative on the inside, like a matrix decomposition (combined with a linear solve and/or matrix inversion). While a "high-level" AD tracer can identify an efficient derivative of these operations, your LLVM introspection would only be able to compute the derivative through all the internal steps of the matrix decomposition?


Oh for sure, any ML framework worth its salt should do some amount of graph rewriting / transformations.

I was (perhaps poorly) trying to explain that while AD (regardless of implementation in Enzyme, PyTorch, etc.) _can_ avoid caching values using clever tricks, it can't always get away with it. The cache-reduction optimizations really depend on the abstraction level the tool works at. If a tool can only represent the binary choice of whether an input is needed or not, it could miss the fact that perhaps only the first element (and not the whole array/tensor) is needed.

Regarding Enzyme vs. JAX/etc., again I think that's the wrong way to think about these tools. They solve problems at different levels and in fact can be used together for mutual benefit.

For example, a high-level AD tool in a particular DSL might know that, algebraically, you don't need to compute the derivative of something, since domain knowledge says it is always a constant. Without that domain knowledge, a tool will have to actually compute it. On the other side of the coin, there's no way such a high-level AD tool would do all the drudgery of loop-invariant code motion, or even lower-level scheduling/register allocation (see the Enzyme paper for reasons why these can be really useful optimizations for AD).

In an ideal world you want to combine all this together and have AD done in part wherever there's some amount of meaningful optimization (and ideally remove abstraction barriers like, say, a black-box call to cuDNN). We demonstrate this high- plus low-level AD in a minimal test case against Zygote [a high-level Julia AD], replacing a scalar code, which is something Zygote is particularly bad at. This enables both the high-level algebraic transformations of Zygote and the low-level scalar performance of Enzyme, which is what you'd really want.

It looks like the discussion of this has dropped off for now, but I'm sure shoyer would be able to do a much better job of listing interesting high-level tricks JAX does [and perhaps low-level ones it misses] as a consequence of its choice of where to live on the abstraction spectrum.

Also, thanks for reminding me about matrix decomposition; I actually think there's a decent chance of doing that somewhat nicely at a low level from various loop analyses, but I got distracted by a large Fortran code for nuclear particles.


You might be right; I seem to have confused the PyTorch (and TF eager mode) differentiation approach with numerical methods.

I'm making a PyTorch-inspired ML framework, and indeed, each op node also defines a backward pass, which is a manual definition of its derivative. Going backwards over the op graph and combining the derivatives of each op via the chain rule to get the final gradient does indeed look like a runtime analytical method rather than a numerical one.
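To make that concrete, here is a toy version of the idea (in JavaScript for brevity rather than my actual Rust code; all names are made up):

    // Each value carries a gradient; each op defines its own backward pass.
    function mul(a, b) {
      const out = {
        value: a.value * b.value,
        grad: 0,
        backward() {
          // Chain rule: d(a*b)/da = b, d(a*b)/db = a
          a.grad += b.value * out.grad;
          b.grad += a.value * out.grad;
          a.backward();
          b.backward();
        },
      };
      return out;
    }

    const leaf = (value) => ({ value, grad: 0, backward() {} });
    const x = leaf(3), y = leaf(4);
    const z = mul(x, y);
    z.grad = 1;   // seed the output gradient
    z.backward(); // now x.grad === 4 and y.grad === 3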

The advantage of automatic AD is not having to define the backward pass for each op, with the function that calculates the derivative being generated at compile time instead.

I've left the project to marinate a bit, so the little knowledge I had is fading away.


The way I see it, this is what programmers do: they turn things into programs. A good example is hardware description languages, where hardware designers write programs and, at the end, a very complex chip design is generated, which is then taped out. These are very complex systems with very complex constraints; their output is the locations of hundreds of billions of transistors along with their interconnects.

Assuming you know JavaScript, you get some advantages over doing it via gui:

* Source control support: the model is text and can be stored in git. Multiple people can work incrementally in a branch. I assume CAD models are binary, or very complex text. Having readable code also allows for code reviews.

* Parametric modeling: you can define components (parts) as functions, with parameters to customize the behavior. You can have if/else statements inside the functions to change the behavior. For example, you can have a bolt function that generates any type of bolt based on its parameters (see the sketch after this list).

* Automation: in a programming language you can have loops that generate any number of components, and you can customize position and object parameters in the loop. Making grids of objects is trivial.

* Testing: if you're using a full language like JavaScript, you can write unit tests for your more complicated parametric models. This can help with edge cases and make your models more resilient.

* Code reuse: you can create a standard library of parametric models (e.g. pipes, bricks, bolts, and so on) and use them quickly on new projects.

* IDE support: programming languages often have IDE support that makes refactoring, code navigation, and code exploration much simpler.

* Familiarity: learning a complex GUI app can take time, especially when the GUI is very deep, whereas the person may already know JavaScript and can use an IDE with fuzzy searching through symbols. In my opinion this makes it much simpler to start and get complex results than hunting for some obscure feature in a deeply nested GUI. Of course, this can be alleviated by a fuzzy search over UI functionality from a single place, e.g. GIMP's / command or the new searchable Blender menus.
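To make the parametric modeling and automation points concrete, here is a sketch in the spirit of JSCAD-style libraries; cylinder, translate, and union are stand-ins for whatever primitives your CAD tool actually exposes:

    // Parametric part: one function can generate any size of bolt
    function bolt({ headRadius, headHeight, shaftRadius, shaftLength }) {
      const head = cylinder({ radius: headRadius, height: headHeight });
      const shaft = translate([0, 0, -shaftLength],
        cylinder({ radius: shaftRadius, height: shaftLength }));
      return union(head, shaft);
    }

    // Automation: a 4x4 grid of bolts from two plain loops
    const parts = [];
    for (let i = 0; i < 4; i++) {
      for (let j = 0; j < 4; j++) {
        parts.push(translate([i * 20, j * 20, 0],
          bolt({ headRadius: 5, headHeight: 3, shaftRadius: 2.5, shaftLength: 15 })));
      }
    }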

Anyway, these are a few off the top of my head; there are probably many more. Of course, there are disadvantages, mainly that not everyone can program, and the people who use CAD are less likely to be familiar with coding. But capable people can learn quickly.

I'm looking at this as a software developer who wants to design some complex models, but doesn't want to learn the complex UI of a CAD system. I'm quite familiar with working on complex systems in code, and I find this approach much more suitable for me.

