A first-order Markov-chain text generator is not complex at all; it's a one-line shell script, reformatted here onto three lines for clarity:
perl -ane 'push@{$m{$a}},$a=$_ for@F;END{
print$a=$m{$a}[rand@{$m{$a}}]," "for 1..100_000
}' <<<'fish fish for fish.'
Given the King James Bible instead of "fish fish for fish." as input, this program produces output like this:
25:13 As the hands upon his disciple; but make great matter wisely in their calamity; 1:14 Sanctify unto the arches thereof unto them: 28:14 Happy is our judges ruled, that believe. 19:36 Thus saith the first being one another's feet.
This leans pretty hard on Perl DWIM; a Python version is many times longer:
import random, sys
model, last, word = {None: []}, None, None
for word in (word for line in sys.stdin for word in line.split()):
    model.setdefault(last, []).append(last := word)
sys.stdout.writelines(str(word := random.choice(model.get(word) or words)) + " "
                      for words in [list(model.keys())] for i in range(100_000))
(If you write code like that at a job, you might get fired. I would reject your pull request.)
It's probably worth mentioning:
- Markov-chain states don't have to be words. For text modeling, a common thing to do is to use N-grams (of either words or characters) instead of single words for your states.
- It's good to have some nonzero probability of taking a transition that hasn't occurred in the training set; this is especially important if you use a larger universe of states, because each state will occur fewer times in the training set. "Laplacian smoothing" is one way to do this (a sketch combining this with N-gram states appears after this list).
- Usually (and in my code examples above) the probability distribution of transitions we take in our text generator is the probability distribution we inferred from the input. Potter's approach of multiplying his next-state-occupancy vector Ms⃗ by a random diagonal matrix R and then taking the index of the highest value in the resulting vector produces somewhat similar results, but they are not the same; for example
sum(random.random() * 0.75 > random.random() * 0.25 for i in range(1_000_000))
is roughly 833,000, not roughly 750,000: for independent uniform u and v, P(0.75u > 0.25v) works out to 5/6 rather than 3/4, so the higher-weighted option wins 5× as often as the other, not 3× as often. But I imagine that it still produces perfectly cromulent text.
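To make those bullet points concrete, here is a minimal, readable sketch; it is not Potter's code and not the one-liners above restructured, and the function names and the alpha fallback are my own. It uses bigram (two-word) states, a small uniform fallback so transitions that never occurred in the training text keep a nonzero probability (a crude stand-in for the smoothing idea, not textbook add-one Laplace), and draws each next word from the inferred distribution with random.choices:

import random
import sys
from collections import Counter, defaultdict

def train(words, n=2):
    # Count transitions from each n-gram state to the word that follows it.
    counts = defaultdict(Counter)
    for i in range(len(words) - n):
        counts[tuple(words[i:i + n])][words[i + n]] += 1
    return counts

def generate(counts, length=100, alpha=0.01):
    # Walk the chain, sampling each next word in proportion to its observed
    # count (random.choices does the weighted draw). With probability alpha,
    # ignore the counts and pick any seen word, so transitions absent from
    # the training text keep a nonzero probability -- a crude stand-in for
    # Laplacian smoothing.
    vocab = sorted({w for c in counts.values() for w in c})
    state = random.choice(list(counts))
    out = list(state)
    for _ in range(length):
        successors = counts.get(state)
        if successors and random.random() > alpha:
            nxt = random.choices(list(successors), weights=list(successors.values()))[0]
        else:
            nxt = random.choice(vocab)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out)

print(generate(train(sys.stdin.read().split())))

Piping the King James Bible into this should give the same sort of half-grammatical output as the one-liners above, with a bit more local coherence thanks to the two-word states.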
Interestingly, modern compilers derive from Chomsky's theory of linguistics, which he formulated while working on an AI project in the 01950s in order to demolish the dominant theory of psychology at the time, behaviorism. Behaviorism essentially held that human minds were Markov chains, and Chomsky showed that Markov chains couldn't produce context-free languages.
This round started with a $5bn target and ended at $13bn. When this sort of thing happens it's normally because the company 1) wants to hit the "hot" market, and 2) has uncertainty about its ability to raise revenues at higher valuations in the future.
Whatever it is, the signal it sends about Anthropic insiders is negative for AI investors.
Other observations, having read a few hundred comments here:
- there is so much confusion, uncertainty, and fanciful thinking that it reminds me of the other bubbles that existed when people had to stretch their imaginations to justify valuations
- there is increasing spend on training models, and decreasing improvements in new models. This does not bode well
- wealth is an extremely difficult thing to define. It's defined vaguely through things like cooperation and trade. Ultimately these LLMs actually do need to create "wealth" to justify the massive investments made. If they don't do this fast, this house of cards is going to fall, fast.
- having worked in finance and spoken to finance types for a long time: they are not geniuses. They are far from it. Most people went into finance because of an interest in money. Just because these people have $13bn of other people's money at their disposal doesn't mean they are any smarter than people orders of magnitude poorer. Don't assume they know what they are doing.
Many years ago [1] I was led to a marvelous site explaining many subtleties of human color perception, including the three cone primaries. The author called this pure-M-cone response "psychedelic aquamarine"; the page is offline but the archive has a capture [2]. I haven't seen this color referred to by that name elsewhere, but I think it's a good name.
I think I can see psychedelic aquamarine and the other cone primaries by closing my eyes and rubbing my eyelids while pressing in gently but firmly.
"Look, I don't know... I really don't know. We're talking about a country's gold, very valuable, very beautiful. And we might freeze it. Or we might not. I mean, who knows? It's a big decision, a very big decision. Some people say, 'Freeze it, Mr. President! It'll be great, the best!' Others, and good people, they say, 'Maybe not, maybe we do something else.' But I'll tell you what, I do like frozen things. They're nice. Very nice. You know, you freeze something, it's solid, it's secure. It's beautiful. It's really beautiful. So we'll see. We'll see what happens. It's all on the table. Believe me."
The NYT was writing a piece about rationality, SSC, LessWrong, the East Bay, maybe AGI, etc.
Evidence suggests it wasn't a hit piece (at least initially) and was just about the rationality community and its Bay Area influence, since a lot of people don't really know about LessWrong.
As part of it, the NYT said they had to reveal his real name, and he asked them not to (details described in that blog post). This created controversy.
The ? syntax agrees that errors should just be regular values returned from functions, and that handling of errors should be locally explicit. It's not a different approach from `if err != nil { return err }`; it merely codifies the existing practice and makes expressing the most common cases more convenient and clearer.
It's clearer because when you see ? you know it's returning the error in the standard way, and it can't be some subtly different variation (like checking err, but returning err2 or a non-nil ok value).
The code around it also becomes clearer, because you can see the happy path without it being chopped up by error branches, so you get a higher signal-to-noise ratio and fewer variables in scope, without losing the error handling.
For me, Urban Dictionary[0] defines this issue much more clearly:
> When this term became popularized, initially the meaning of this term was when an individual become more aware of the social injustice. Or basically, any current affairs related like biased, discrimination, or double-standards.
> However, as time passed by, people started using this term recklessly, assigning this term to themselves or someone they know to boost their confidence and reassure them that they have the moral high grounds and are fighting for the better world. And sometimes even using it as a way to protect themselves from other people's opinion, by considering the 'outsider' as non-woke. While people that are in line with their belief as woke. Meaning that those 'outsiders' have been brainwash by the society and couldn't see the truth. Thus, filtering everything that the 'outsider' gives regardless whether it is rationale or not.
> And as of now, the original meaning is slowly fading and instead, is used more often to term someone as hypocritical and think they are the 'enlightened' despite the fact that they are extremely close-minded and are unable to accept other people's criticism or different perspective. Especially considering the existence of echo chamber(media) that helped them to find other like-minded individuals, thus, further solidifying their 'progressive' opinion.
> 1st paragraph
>"Damn bro, I didn't realize racism is such a major issue in our country! I'm a woke now!"
> 2nd paragraph
> "I can't believe this. How are they so close-minded? Can't they see just how toxic our society is? The solution is so simple, yet they refused to change! I just don't understand!"
> 3rd paragraph
> "Fatphobic?! Misogyny?! What's wrong with preferring a thin woman?! And she is morbidly obese for god sake! Why should I be attracted to her?! Why should I lower myself while she refuse to better herself?! These woke people are a bunch of ridiculous hypocrite!"
I honestly don't think it's coming through to people just how limited the Switch hardware really is. Sure, it is a modern machine, but the frame budget is just nothing compared to most home consoles. The Tegra X1 was a pretty decent mobile SoC... For 2015. And the one in the Switch actually runs at a lower clock speed than NVIDIA was running it at before. The crowning achievement for the X1 was that it could technically boot UE4 games, but frankly, as evidenced by a lot of Switch titles, it doesn't do all that well if the developers don't put a lot of effort into optimizing it. Nintendo is overall pretty good at making stuff that looks good on their own hardware, but porting games designed to run on other flagship consoles to the Nintendo flagship seems to have been quite a challenge for the past ten years or so.
> Miracle? Eh. I'd rather pick up the PS4 copy for half the price and enjoy the game. It's not really a game designed to be played on a bus.
This is more of a nitpick but the Nintendo Switch definitely is made with Japan in mind. I am not a cultural expert on Japan by any means but it surely seems like people living near city centers have quite little space to work with, leading to folks using iPad setups instead of desktop or laptop computers. The Switch is certainly compelling in this regard, if you don't have much space for a comfortable TV setup at home. I think viewed through this lens, the contortions make a lot more sense. This also explains why Valve has been pushing very hard in Asia in general with the Steam Deck, and aggressively pricing it as well.
This is a wonderful write-up and a very enjoyable read. Although my knowledge about systems programming on ARM is limited, I know that it isn't easy to read hardware-based time counters; at the very least, it's not as simple as the x86 rdtsc [1]. This is probably why the author writes:
> This code is more complicated than what I expected to see. I was thinking it would just be a simple register read. Instead, it has to write a 1 to the register, and then delay for a while, and then read back the same register. There was also a very noticeable FIXME in the comment for the function, which definitely raised a red flag in my mind.
Regardless, this was a very nice read and I'm glad they got to the bottom of the issue and got the problem fixed.
"I'm Starting to Worry About This Black Box of Doom"
Highly recommend it to folks, especially if you enjoy Pargin's other works ("John Dies at the End", "Futuristic Violence and Fancy Suits"). I am continually in awe at how he is evolving as a writer.
His characterizations and insights convey a unique and profound interest in the world we live in, and it’s clear he takes great care in understanding others and what makes them who they are.
It'll make you rethink some of your relationships / reactions to the current world (social media, other humans, etc.).
Isn't it more sensible to just check that the params that are about to be sent to memcpy are reasonable?
That is why I tend to wrap my system calls with my own internal function (which can be inlined in certain PLs), where I can standardize such tests. Otherwise, the resulting code that performs the checks and does the requisite error handling is bloated.
Note that I am also loath to #define such code, because C is already rife with macros and my perspective is that the fewer of them the better.
At the end of the day, quick and dirty fixes will prove the adage "short cuts make long delays", and OpenBSD's approach is the only really viable long-term solution, where you just have to rewrite your code if it has ill-advised constructs.
For designing libraries such as C's stdlib, I don't believe in 'undefined behavior': clearly define your semantics and say, "If you pass a NULL to memcpy, this is what will happen." Same for passing (n == 0), or (src == dst).
And if, for some strange reason, fixing the semantics breaks calling code, then I can't imagine that their code wasn't f_cked in the first place.
I also had an issue where Rust was recompiling dependencies unnecessarily. It turned out to be because rust-analyzer has one more level of `bash` than VSCode's integrated terminal. My `.bashrc` was loading something unconditionally, which meant that if you nested bash sessions, PKG_CONFIG_PATH changed (duplicate entries).
I had OpenSSL as a dependency, and if `PKG_CONFIG_PATH` changes, Cargo rebuilds it (this is correct). That meant that if I made an edit to a file and saved it, rust-analyzer would blow away the cache; then I'd build on the command line and that would blow it away again.
To test:
1. If you quit VSCode and do an incremental build is it still slow?
2. Try `CARGO_LOG=cargo::core::compiler::fingerprint=info cargo build` (took me a while to find that; there is a bug open to make it less stupidly hard to find).
That will print a load of info about why Cargo is rebuilding stuff. Note that it isn't really in a sensible order - the first message isn't necessarily the cause. The message about the environment variable changing was somewhere in the middle for me, so read the whole log.
I've tried following that same guide, and I believe I've tried everything on that blog post except the mold linker, because I'm on macOS, and the macOS version of the mold repo says "use the default linker if you have Xcode 15 or higher."
And I do think I have a pretty granular crate system (would be _very_ happy to hear otherwise, because that would mean there's low hanging compile time fruit!): https://github.com/dnaaun/heimisch
My current _incremental_ compilation time swings anywhere between 15 seconds and 3 minutes (no, I'm not kidding). And I work on an M3 Max MacBook Pro.
---
Things that I suspect are making my compile times worse:
1. The fact that I am doing SSR, which means my frontend code is included in the backend code as well.
2. I _think_ Rust is unnecessarily recompiling dependencies on incremental builds? (I don't understand how incremental compilation times can be so bad otherwise). But I'm clueless about how to go about debugging that.
Facebook published a technical paper describing their training cluster, and the sheer amount of complexity is staggering. That said, the original poster was interested in inference, not training.
I finally read Nabokov's Pale Fire. It is far and away the best book I have ever read. I think about it multiple times a week unprompted and I'm sad because I am certain that I will never find another book like it.
In that respect, sure; however, there's also a lot of lawlessness without repercussions there, like years-long long cons where people climb the corporate ladder to eventually empty out said corporation's wallets and assets.
Sure, the player character involved will be marked for life, but they can just move the assets to a new character and nobody will be the wiser.
That said, the game was designed (or seemed to be designed) so that the bigger alliances would be fighting over territory all the time, but as it turns out they recognize it's mutually beneficial not to wage war. I remember reading about when they introduced titans, and that they were stupid expensive and involved to make (took like a month or so? plus all the raw materials). But the alliances scaled up, stocked up, and now there are hundreds if not thousands of the things, and they rarely, if ever, get used and lost in combat. So there's no "sink" for those things. I haven't heard of any major conflict involving titans since B-R5RB [0], but looking at that article there was apparently another one in 2020 that was a bit more costly ($378K in real money).
The problem that Eve has is also its unique selling point: it's all one big universe, so there can be battles involving over 6,000 people in one system. But the game's internal clock slows way down, meaning that inside that system time moves at 1/10th of what it is outside, which also means reinforcements can feel like they arrive within seconds or minutes instead of how long it would usually take. And they still do a once-per-day server shutdown for maintenance, but I don't know if these battles continue after that, or if they suspend the shutdown during these big fights nowadays.
• "Laws of Form" by George Spencer-Brown, a little book that describes how to bootstrap the Universe from nothing. Louis Kauffman [1] has a lot of papers/writeups on it, from knot theory to quantum physics. If you ever wanted to make a pancake truly from scratch, this is a place to start.
• "The Unconscious as Infinite Sets" by Ignacio Matte Blanco. Reformulates Freud in logico-mathematical terms and establishes a formal system (bi-logic) to describe unconsciousness phenomena: in case you ever wanted to apply category theory to study yourself.
• "The Protracted Game" by Scott Boorman. Interprets Maoist's revolutionary strategies during 1927 - 1949 period as a game of Go. Interesting both from historical, military, and game-theoretical perspective; raised an appreciation of Eastern wisdom and 'board games as a tool of thought' [2] for me.
• "The Unwritten Laws of Engineering" by W. J. King. Written in 1944, but the advice is still relevant, more so to the software engineering field. Should be at least skimmed at any part of your career.
• "The Myth of Sisyphus" by Alber Camus. Unequivocally answers the most important question there is — does life have meaning, and if not, should you kill yourself over it? I read it in my teens while wrestling with existential dread, and lived a somewhat happy and interesting life ever after.
Ahh. "Gross", meaning, the author thinks they should do a better job selling their products from a practical standpoint.
At a push, if we wanted to try to maintain a more essential meaning of "gross", I guess we could imagine that the author is saying that not doing an optimal job of selling your products is morally repugnant to them. Maybe that's true, I suppose.
An example of Apple actually being gross in their marketing is using the word "privacy" to mislead people into thinking they have privacy when they use Apple's devices. It's security they are attempting to offer to users, not privacy.
This intentional and somewhat subtle conflation of two terms in order to give people a false impression of Apple products is, in my opinion, properly gross.
I cannot speak to the CIA training, but I've received some concealed-weapon scenario training, mostly developed from lessons learned at penitentiaries. It is a terrible thing to be impressed by, but inmate ingenuity staggers the mind. You've likely had several items in your carry-on that could do much worse than the pen to the eye (which I don't think would be fatal). In the attack tree, a shiv smuggled past the checkpoint would need to have the potential to coerce the cockpit before it would factor into a risk matrix for the plane.
I served in Special Forces before and after 9/11. The 'security theater' points that many of you make are valid. I don't, however, believe the reactionary measures were to calm the fears of the American people. The severe restrictions placed on travelers are similar to a trend of restrictions placed upon soldiers following 9/11. The reaction is CYA for senior leadership/command.
Accountability became a tremendous focus in the early campaigns following 9/11. A single casualty was regarded as a devastating loss. Clearing buildings early in the bloodshed of Iraq taught many commanders that the peacetime tactics largely learned from SWAT were not as effective in combat. The procedure was too slow for such a dynamic and hostile environment. Too many soldiers died because the common procedure for clearing a building broke down in structures of irregular layout and in cities crawling with hostiles. Before commanders and NCOs were prepared to blame the procedures, however, they were taking accountability for the loss.
An after action review (AAR) follows every mission, and leaders are encouraged to highlight their mistakes before someone else must do it for them. An atmosphere of blame settled in while civilians back in CONUS were tiring of the involvement. Casualties were frequent enough that many ODAs had suffered through a few. For most, it was their first time facing a grieving widow with a young child hugging her leg. Those stories, coupled with the blame, changed the landscape of command. CONOPS that were once routinely approved were rejected for increasingly vague reasons. Ultimately, the tone was that the risk was too great compared with the operational gain - almost like the soldier was too valuable to put in harm's way. But we signed up for that. The truth, I suspect, was that the appetite for risk-taking at the senior levels was shrinking. If an ODA lost a man, the mission's CONOP would be scrutinized for evidence that all of the risks were accounted for, that the courses of action reflected sound decision making when assuming risk, that the operational gain justified the risk, and that good faith efforts were made to mitigate perceived risks. The AAR became a trial. While I was working through these challenges during deployments, I believe something similar was happening with security measures and leadership back home.
Creating an illusion of safety seems less likely to have been the hope than creating an exemption from accountability. Negligence would too likely be the charge if tight restrictions were not put in place.
So come on man, let’s be honest here. I got serious sacred masterpiece vibes from this story.
This reminds me of some Hindu parable about people who let go of possessions and head out to become ascetics. So there is this wealthy man and wife and the wife is all upset because her brother keeps insinuating that he’s gonna go ascetic and cut loose. The husband tells her to stop her crying and don’t worry about it, he ain’t going to do it. The wife asks him: ‘but how can you be so sure?’ Because, the husband says, this is how you do it, and then and there he rips open his shirt, tells her “you’re my mother” and heads out to the woods.
I describe all the resources I read outside my day job's material. There are a lot of useful engineering blogs you can use, and with 1-2 books you can go very very far in learning the principles.
Once you have the theory, you just need to practice brainstorming designs for any kind of app you want. Then you can verify them against the real implementations; you can pretty much find information about any big-scale, famous system online. Or you can do mock interviews with others, but I am not a fan of those personally.
In my experience, reading, and then trying to come up with designs on your own for well-known products, will make you ace these interviews after a few months.
Also, during the interview there is nowhere near enough time to verify that you have actually done all the things you mention. So, if you understand what you are saying, and your design makes sense, you will pass the interview, even if you haven't practiced these things in your day job.