I can't share pricing details since they're confidential, but if you just want to play with MIP you don't need to buy one of the big three (XPRESS, Gurobi, CPLEX), which are all very expensive but usually available for free to students. There are at least two good MIP solvers that are open source or free for non-commercial use:
https://highs.dev/
https://www.scipopt.org/
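If you just want to get a feel for the modelling side, here's a minimal sketch in Python using PuLP, which bundles the open-source CBC solver (HiGHS and SCIP ship their own Python bindings with a similar flavour); the knapsack data below is made up for illustration:

```python
# Minimal sketch of a tiny 0/1 knapsack MIP using PuLP + the bundled
# open-source CBC solver. The item values/weights are made up.
from pulp import LpProblem, LpVariable, LpMaximize, lpSum, PULP_CBC_CMD

values = [10, 13, 18, 31]
weights = [2, 3, 4, 7]
capacity = 10

prob = LpProblem("toy_knapsack", LpMaximize)
x = [LpVariable(f"x{i}", cat="Binary") for i in range(len(values))]

prob += lpSum(v * xi for v, xi in zip(values, x))               # objective
prob += lpSum(w * xi for w, xi in zip(weights, x)) <= capacity  # capacity constraint

prob.solve(PULP_CBC_CMD(msg=False))
print([int(xi.value()) for xi in x], prob.objective.value())
```

PuLP also has interfaces for the commercial solvers, so swapping one in later is roughly a one-line change to the solve call.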
I've used both. The commercial solvers are way, way faster, way more reliable, and actually have support. You're not going to want to run a product that's responsible for millions on something without really solid support.
You can get a temporary free license for Gurobi. You're limited to a 1,000-node problem size, but you can learn how to use the tool and set up your problem.
If you have a problem that needs Gurobi, it's worth paying for it. Talk with their sales team; they are happy to help you get started. They know that once you know how to use it and see what problems it can solve, you'll be inclined to use it in the future.
> If you have a problem that needs Gurobi, it’s worth paying for it.
This statement is based on the assumption that it is a "big money" problem. On the other hand, I know lots of problems interesting to nerds for which Gurobi would help (but nerds don't have the money).
If you have a "nerdy" problem you can probably get someone to write it up as a research paper and then it would easily fall under the academic license. To some extent, if you're buying a commercial license you're just paying for secrecy.
This is not true: to get an academic license for Gurobi, you have to be a member of a degree-granting academic institution (otherwise anyone could easily, and illegally, get one).
Plus, the setup is really fucking annoying... The number of times I had to reactivate my academic license for Gurobi while in uni...
The speed is totally worth it though: literally orders of magnitude better than any alternative for wide problem classes. Plus the bindings are good enough that you rarely ever need to drop into C++.
If you can't even get a random member of any degree-granting institution (could just be random research staff, a student or adjunct faculty) to take some interest in your optimization problem as a subject for publishable research, does it even qualify as a "nerdy" problem?
Their price list wasn't that confidential last I spoke with the sales team. It depends on the license type. Last I heard, it's around $15k/year for a standard subscription license. You can probably trial it for free, or be a student and have longer free access.
We kind of have that in DeepSeek-R1-Zero [1], but it has problems. From the original authors:
> With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.
A lot of these we can probably solve, but as others have pointed out, we want a model that humans can converse with, not an AI made for the purposes of other AIs.
That said, it seems like a promising area of research:
> DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community.
Despite the similar "zero" names, DeepSeek-R1 Zero and AlphaGo Zero have nothing in common.
AlphaGo came before AlphaGo Zero; it was trained on human games, then improved further via self-play. The later AlphaGo Zero proved that pre-training on human games was not necessary, and the model could learn from scratch (i.e. from zero) just via self-play.
For DeepSeek-R1, or any reasoning model, training data is necessary, but hard to come by. One of the main contributions of the DeepSeek-R1 paper was describing their "bootstrapping" (my term) process whereby they started with a non-reasoning model, DeepSeek-V3, and used a three step process to generate more and more reasoning data from that (+ a few other sources) until they had enough to train DeepSeek-R1, which they then further improved with RL.
DeepSeek-R1 Zero isn't a self-play version of DeepSeek-R1 - it was just the result of the first (0th) step of this bootstrapping process whereby they used RL to finetune DeepSeek-V3 into the (somewhat of an idiot savant - one trick pony) R1 Zero model that was then capable of generating training data for the next bootstrapping step.
That's not what happened. R1-Zero is a model in its own right, released with a different set of weights, and it's not an intermediate step obtained while making R1. For R1, an initial SFT was performed before the RL training, whereas R1-Zero got ONLY the RL training (on top of the raw V3).
Of course it's hard to argue that R1-Zero and AlphaZero are very similar, since in the case of AlphaZero (I'm referring to the chess model, not Go) only the rules were known to the model and no human game was shown, while here:
1. The base model is V3, which saw a lot of things in pre-training.
2. The RL for the chain of thought targets math problems that are annotated with the correct result (see the toy reward sketch below). This can be seen as somewhat similar to a chess game finishing with a win, loss, or draw. But still... it's text with a problem description.
However, the similarity is that in the RL used for R1-Zero, the chain of thought that improves problem solving is learned starting cold, without showing the model any CoT to fine-tune on. That said, the model could still sample from the V3 latent space itself, which was full of CoT examples from humans, other LLMs, ...
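To make the "annotated with the correct result" point concrete: the reward can be computed purely from the final answer, with no reference chain of thought at all. Here is a toy sketch in Python with a hypothetical answer format and checker of my own; it is not DeepSeek's actual reward code (the paper also describes format rewards on top of accuracy):

```python
# Hypothetical outcome-based reward: score a sampled completion only by
# whether its final answer matches the annotated one, regardless of what
# the chain of thought in between looks like.
import re

def outcome_reward(completion: str, reference_answer: str) -> float:
    # Assume answers are reported as "Answer: <something>" on the last line;
    # real pipelines use stricter formats and more robust checkers.
    match = re.search(r"Answer:\s*(.+?)\s*$", completion.strip(), re.MULTILINE)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0

# A long, messy chain of thought still gets full reward if the final answer is right.
sample = "Let me think... 12 * 7 = 84, minus 5 is 79.\nAnswer: 79"
print(outcome_reward(sample, "79"))  # 1.0
```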
From reading the R1 paper, it seems the steps were:
1) V3 --RL--> R0
2) R0 generates reasoning data, which is augmented to become the "cold start" dataset
3) V3 --SFT on cold-start dataset--> intermediate model --RL--> improved intermediate model
4) the intermediate model generates reasoning data, which is augmented to create 600K reasoning samples; 200K non-reasoning samples are added for 800K total
5) V3 --SFT on the 800K samples--> R1 --RL--> final R1
Is that not a correct understanding?
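For what it's worth, here is the same flow written out as a runnable toy sketch in Python; every function is a made-up placeholder that only tracks the data flow, and nothing here resembles DeepSeek's actual training code:

```python
# Made-up placeholders that only track data flow. Sample counts are shrunk
# for the toy run (the paper's numbers are ~600K reasoning + ~200K non-reasoning).

def rl_finetune(model, tasks):
    return f"{model}+RL"

def sft(model, dataset):
    return f"{model}+SFT[{len(dataset)} samples]"

def generate_reasoning(model, n):
    return [f"trace from {model} #{i}" for i in range(n)]

def augment(samples):
    return samples  # stand-in for filtering / rewriting / rejection sampling

v3 = "V3"
tasks = "math problems annotated with the correct answer"

r0 = rl_finetune(v3, tasks)                                 # 1) V3 --RL--> R0
cold_start = augment(generate_reasoning(r0, 10))            # 2) cold-start dataset
intermediate = rl_finetune(sft(v3, cold_start), tasks)      # 3) SFT, then RL
reasoning = augment(generate_reasoning(intermediate, 60))   # 4) reasoning samples...
full_data = reasoning + ["non-reasoning sample"] * 20       #    ...plus non-reasoning ones
r1 = rl_finetune(sft(v3, full_data), tasks)                 # 5) SFT on combined data, then RL
print(r0)
print(r1)
```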
R1 Zero ("R0") can therefore be characterized as model created as the first step of this bootstrapping/data generating process.
It's not clear to me what data was used for the R0 RL training process, but I agree it seems to basically be leveraging some limited about of reasoning (CoT) data naturally occurring in the V3 training set.
From a quick survey of the implementation, probably not very well, since, for example, it uses dynamic dispatch for all distance calculations and there are a lot of allocations in the hot path.
Maybe it would be better to post this repository as a reference / teaching implementation of HNSW.
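To make the hot-path point concrete without referring to that repo's code, here is a generic Python/NumPy sketch of the usual fix: resolve the metric once up front instead of dispatching per distance call, and reuse preallocated scratch buffers instead of allocating per query. It's brute-force search just to keep it short; the same idea applies inside an HNSW traversal:

```python
# Generic illustration (not from the repo in question): hoist dispatch out of
# the hot path and avoid per-query allocations.
import numpy as np

def build_search(vectors: np.ndarray, metric: str = "l2"):
    n, _ = vectors.shape
    diff = np.empty_like(vectors)             # scratch, allocated once
    dists = np.empty(n, dtype=vectors.dtype)  # scratch, allocated once

    # Resolve the metric once instead of branching on it in every distance call.
    if metric == "l2":
        def all_dists(q):
            np.subtract(vectors, q, out=diff)
            np.einsum("ij,ij->i", diff, diff, out=dists)
    elif metric == "ip":
        def all_dists(q):
            np.dot(vectors, q, out=dists)
            np.negative(dists, out=dists)     # larger inner product = smaller "distance"
    else:
        raise ValueError(f"unknown metric: {metric}")

    def nearest(q: np.ndarray, k: int = 10) -> np.ndarray:
        all_dists(q)                          # no per-query allocations for distances
        return np.argpartition(dists, k)[:k]  # indices of the k smallest distances

    return nearest

# Usage:
# search = build_search(np.random.rand(10_000, 64).astype(np.float32))
# idx = search(np.random.rand(64).astype(np.float32), k=5)
```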
The "autofree" memory management [1] seems quite interesting, and a very cool mixture between garbage collection with static memory management. Has been there been any seminar / conference talk held on this topic related to the pros / cons of this approach vs a pure GC memory management strategy, since it seems like it could be used in _almost_ all modern GC languages.
This is an interesting lawsuit as it is a *company* seeking damages due to DMCA abuse by a third party, and my understanding is that this is very widespread (no source sorry). Is anyone aware of any studies done on the total cost¹ (vs benefit¹) of the DMCA due to malicious actors?
¹ defining what is a cost vs benefit (and how much) is probably the hardest part, maybe after acquiring the necessary data.
I don't believe this is how it is actually implemented in _most_ companies. Where I work, every PR must have a linked story / bug / etc., but anyone has the right to create a story, so it acts more as a way to track which changes actually go into a release, so that cross-team reviewers can see whether they need to document them, etc.
In regard to refactors, people tend to just squash them into another change they are making. This makes the git log a bit harder to follow at times, but people did this back when we just used to push to trunk too, so I don't think the story requirement is the deciding factor.
You wrote: <<I don't believe this is how it is actually implemented in _most_ companies.>>
I would say for non-tech companies with a strict set of IT guidelines, this is mostly true. Please ignore non-tech companies with weak or zero IT culture. It will be the 'Wild West' at those places! Nothing will be maintainable beyond a certain size because there will be so much key person dependency.
For pure tech or tech-heavy companies (banking, insurance, oil & gas, etc.), there is frequently more flexibility, including "dummy Jiras" just to track a non-QA'able code change like upgrading a C++ / .NET / Java / Python library, or refactoring some code. In my experience, a 'Jira-per-commit' rule isn't awful, as long as tech debt does not require non-tech approval and the ticket is just a tracking device. (A few different vendors offer very nice total integration between issue ticket, bug ticket, pull request, code review, etc.) Just a one-liner in the Jira should be enough. In my experience, the best teams try hard to "do what works for us" instead of being a slave to the Jira process. Yes, I realise this is highly dependent upon team and corporate culture!
Finally, I would be curious to hear from people who work in embedded programming -- like automotive, aeronautical, other transport, and consumer electronics. I have no experience in those areas, but there is a huge number of embedded programmers in the world! Do you also have a very strict 'Jira-per-commit' rule?
I've taken the opposite approach a lot of the time. Instead of asking what you can add, you might want to look at which apps or bookmarks you can remove. Do you need a link to the Reddit front page, or can you narrow it down to one or two specific subreddits you want to check, or maybe remove it entirely if it doesn't add much value to your daily life?
What I would recommend, however, are the many apps that try to gamify physical activity. I use Garmin, but I'm not sure whether the app works without owning one of the accompanying smartwatches, and there are plenty of alternatives. The gamification and accountability it offers around physical activity make it a lot easier to get out the door (which is always the hard part). Physical activity has long been known to have a huge positive effect on mental capacity and health, so it's well worth spending an hour or so on it every day.
This reminds me of an old paper [1] that discusses the performance characteristics of different array layouts, for searching in particular. Its conclusions are heavily based on the number of cache misses and branch-predictor misses that binary search incurs with different array layouts.
Unfortunately it doesn't have much practical application, since there is almost zero support for things like the Eytzinger layout in most standard libraries, and sorting an array into an Eytzinger layout is a bit harder than keeping it in plain non-decreasing order.
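For anyone curious, here's a small illustrative Python sketch of the Eytzinger (BFS) layout and a lower_bound over it; the index arithmetic follows the usual write-ups of this layout, and Python obviously won't show the cache or branch-prediction effects the paper is about:

```python
# Illustrative sketch of the Eytzinger (BFS) layout and a lower_bound over it.

def to_eytzinger(sorted_vals):
    """Place a sorted list into 1-indexed BFS (Eytzinger) order."""
    n = len(sorted_vals)
    out = [None] * (n + 1)          # slot 0 is a sentinel for "not found"
    it = iter(sorted_vals)

    def fill(k):                    # in-order traversal of the implicit tree
        if k <= n:
            fill(2 * k)
            out[k] = next(it)
            fill(2 * k + 1)

    fill(1)
    return out

def lower_bound(eyt, x):
    """Smallest element >= x, or None if every element is < x."""
    n = len(eyt) - 1
    k = 1
    while k <= n:
        k = 2 * k + (eyt[k] < x)    # go right if the current element is too small
    # Undo the trailing "right" moves: shift by (number of trailing 1-bits) + 1.
    k >>= ((k + 1) & -(k + 1)).bit_length()
    return eyt[k] if k else None

eyt = to_eytzinger([1, 3, 5, 7, 9, 11])
print(lower_bound(eyt, 6))   # 7
print(lower_bound(eyt, 12))  # None
```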