
I think it's a balance. In some cases, the act of giving means much more to the giver than to the receiver, especially when they want to be a part of something larger than themselves.

Perhaps you are miserable because you are reinforcing your brain to only look for the flaws in others?

> Why do you look at the speck of sawdust in your brother’s eye and pay no attention to the plank in your own eye?


I think the author’s point is that by exposing yourself to feedback, you are on the receiving end of the growth when you make an error. If you hand off all of your tasks to ChatGPT to solve, your brain will not grow and you will not learn.


I’m in this position now. The longer I’ve been in it, the more I’ve come to realize it can be summarized as:

You experience some of the benefits of being a manager but bear all the responsibilities of managing others. It becomes challenging to make sound judgments when you must consider two different perspectives of a problem. Essentially, you’re taking on the duties of two jobs. I’ve found it incredibly difficult to step back and allow the team to make decisions without my input. My technical bias compels me to intervene when I perceive a decision as clearly incorrect. However, this approach hinders growth and may be perceived as micromanagement. While it’s a challenging position, it’s an excellent opportunity to explore management and determine if it’s a long-term career path you’re interested in.


Obsidian's plug-in ecosystem is fantastic. I've used Obsidian for three years as a replacement for Notion, and I have never used the Graph mode. My Obsidian plugins enable automatic task synchronization with TickTick (where I manage my tasks) and allow me to set up features like templates. I strongly recommend giving it a spin.

The only downside for me is the inability to use it from a web browser. This isn't a major issue for my workflows.


What are you using to sync with TickTick?


You're missing the bigger picture. It isn't free to put content on the Internet. At a bare minimum, you have infrastructure and bandwidth costs. In many cases, someone publishes content on the internet hoping it will attract people who come back for more of what they produce. Google acted as a broker, helping facilitate interactions between producers and consumers. Consumers would supply a query they wanted answered, and a producer would provide an answer or a space where answers could be found (in the recent era, replace "answer" with "product" or "storefront").

There was a mostly healthy interaction between the producers and consumers (I won't die on this hill; I understand the challenges of SEO and an advertisement-laden internet). With AI, Google is taking on the roles of both broker and provider. It aims to collect everyone's data and use it as its own authoritative answer, without any attribution to the source (or any traffic back to the original source at all!).

In this new model, I am not incentivized to produce content on the internet; I am incentivized to simply sell my data to Google (or another centralized AI company), and that's it.

A clearer picture to help you understand what's going on: the internet of the past few decades was a bazaar. Every corner featured different shops with distinct artistic styles, showcasing a great deal of diversity. It was teeming with life. If you managed your storefront well, people would come back and you could grow. In this new era, we are moving to a centralized, top-down enterprise. Diversity of content and so many other important qualities (ethos, innovation, aesthetics) go out the window.


> You're missing the bigger picture. It isn't free to put content on the Internet. At a bare minimum, you have infrastructure and bandwidth costs.

While it technically isn't free, the cost is virtually zero for text and low-volume images these days. I run a few different websites for literally $0.

(Video and high-volume images are another story of course)


> A clearer picture to help you understand what's going on: the internet of the past few decades was a bazaar marketplace.

That internet died almost two decades ago. Not sure what you're talking about.


The web died. The internet is still a functional global IP network. For now.


For those who aren't in the video game development field and like technical content, I cannot recommend watching GDC's content enough (even better, attending!). Although this blog post isn't as detailed as some of the GDC talks I've seen, I appreciate that game developers have the opportunity to share wins like this externally.

If you enjoy technical content and video games, the following videos are hugely entertaining deep dives into ludicrously complicated bugs and the performance optimizations done to deliver the gaming experiences I take for granted:

- I Shot You First: Networking the Gameplay of Halo: Reach: https://www.youtube.com/watch?v=h47zZrqjgLc
- 8 Frames in 16ms: Rollback Networking in Mortal Kombat and Injustice 2: https://www.youtube.com/watch?v=7jb0FOcImdg
- It IS Rocket Science! The Physics of Rocket League Detailed: https://www.youtube.com/watch?v=ueEmiDM94IE
- https://gdcvault.com/play/1023220/Fighting-Latency-on-Call-o...


> When attached to an EBS–optimized instance, General Purpose SSD (gp2 and gp3) volumes are designed to deliver at least 90 percent of their provisioned IOPS performance 99 percent of the time in a given year. This means a volume is expected to experience under 90% of its provisioned performance 1% of the time. That’s 14 minutes of every day or 86 hours out of the year of potential impact. This rate of degradation far exceeds that of a single disk drive or SSD.

> This is not a secret, it's from the documentation. AWS doesn’t describe how failure is distributed for gp3 volumes, but in our experience it tends to last 1-10 minutes at a time. This is likely the time needed for a failover in a network or compute component. Let's assume the following: Each degradation event is random, meaning the level of reduced performance is somewhere between 1% and 89% of provisioned, and your application is designed to withstand losing 50% of its expected throughput before erroring. If each individual failure event lasts 10 minutes, every volume would experience about 43 events per month, with at least 21 of them causing downtime!
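For reference, here is how the quoted numbers fall out if you take the post's assumptions at face value. This is just a quick Python sketch of the arithmetic; reading "at least 21" as "events that drop below the 50% threshold" is my guess at their reasoning, not something they state.

    # Back-of-the-envelope reproduction of the quoted arithmetic, using the
    # post's own assumptions (uniform degradation severity, 10-minute events).
    minutes_per_day = 24 * 60
    minutes_per_month = 30 * minutes_per_day
    hours_per_year = 365 * 24

    degraded_fraction = 0.01  # the 1% of the time the SLA does not cover

    print(degraded_fraction * minutes_per_day)   # ~14.4 minutes per day
    print(degraded_fraction * hours_per_year)    # ~87.6 hours per year ("86 hours")

    event_minutes = 10                           # assumed worst-case duration
    events_per_month = degraded_fraction * minutes_per_month / event_minutes
    print(events_per_month)                      # ~43.2 events per month

    # If delivered performance is uniform on [1%, 89%] of provisioned and the app
    # errors below 50%, roughly half of the events would cause downtime.
    print(events_per_month * (50 - 1) / (89 - 1))  # ~24, i.e. "at least 21"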

These are some seriously heavy-handed assumptions that completely disregard the data they collect. First, the author assumes these failure events are distributed randomly and happen on a daily basis, ignoring that Amazon states its failure rate over a full year ("99 percent of the time in a given year"). Second, they say that in practice failures last between 1 and 10 minutes, yet then assert we should assume every failure lasts 10 minutes, completely ignoring the duration range they just introduced.

Imagine your favorite pizza company claiming to deliver on time "99% of the time throughout a year." The author's logic is like saying, "The delivery driver knocks precisely 14 minutes late every day -- and each delay is 10 minutes exactly, no exceptions!" It completely ignores reality: sometimes your pizza is delivered a minute late, sometimes 10 minutes late, sometimes exactly on time for four months.

From a company with useful real-world data, I expect cold, hard numbers rather than arguments built on exaggerated assumptions. For transparency, my organization has seen 51 degraded EBS volume events in the past 3 years across ~10,000 EBS volumes. Of those events, 41 lasted less than one minute, nine lasted two minutes, and one lasted three minutes.


They are expanding on what the guarantee from AWS means, and their statement is correct. They did not say the pizza place does this; they said the pizza place's guarantee allows for it. I don't see a problem.


Imagine you're studying for a test where you are given an image and need to answer the correct class. To prepare, you're given a deck of flashcards with an image on the front and the class on the back.

(Random) You shuffle the deck every time you go through it. You're forced to learn the images and their classifications without relying on any specific sequence, as the data has no signal from sequence order.

(Fixed order) Every time you go through the deck, the images appear in the exact same order. Over time you may start to unconsciously memorize the sequence of flashcards, rather than the actual classification of each image.

When it comes to actually training a model, if the batches are sampled sequentially from the dataset, the model risks learning correlations caused by the ordering of the data, resulting in poor generalization. In contrast, when you sample the batches randomly, the model is pushed to learn features from the data itself rather than from signals that are artifacts of the ordering.
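A minimal sketch of the difference, assuming a PyTorch-style setup with made-up stand-in tensors (the only change between the two loaders is the shuffle flag):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Stand-in data: 10,000 "images" with class labels.
    images = torch.randn(10_000, 3, 32, 32)
    labels = torch.randint(0, 10, (10_000,))
    dataset = TensorDataset(images, labels)

    # Fixed order: batches arrive in the same sequence every epoch, so the model
    # can pick up spurious signal from the ordering (e.g. classes grouped on disk).
    sequential_loader = DataLoader(dataset, batch_size=256, shuffle=False)

    # Random: a fresh permutation every epoch, so the only consistent signal left
    # is the image-to-label mapping itself.
    shuffled_loader = DataLoader(dataset, batch_size=256, shuffle=True)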


Why, then, are so many successful models trained on multiple passes through sequential data? Note, I'm not naive in this field, but as an infra person, random reads make my life very difficult; if they're not necessary, I'd rather not give up readahead and the other strategies for high-throughput IO.


Why did they take time to invent an entire FS?


The engineers at DeepSeek seem extremely smart and well-funded, so my guess is they looked at the alternatives and concluded none of them would allow them to make their models work well enough.


They did this back in their trading firm days, and...

Imagine that you have a sequence of numbers. You want to randomly select windows of, say, 1024 consecutive numbers as inputs to your model. Now say you have n items in this sequence and want to sample n/c windows in total (where c is a constant and c << 1024). How do you do a fixed shuffle?

The key is that the windows we want to read overlap. If we brute-force the fixed shuffle and expand every window, we have to store roughly 1024/c times more than the original data.
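A rough sketch of that access pattern (the file name and the value of c are made up for illustration), which also shows why materializing it is so expensive:

    import numpy as np

    window = 1024
    c = 64                                              # illustrative; the point is c << 1024
    data = np.memmap("tokens.bin", dtype=np.int32, mode="r")  # the original sequence on disk
    n = len(data)

    rng = np.random.default_rng(seed=0)                 # fixed seed -> the "fixed shuffle"
    starts = rng.integers(0, n - window, size=n // c)   # n/c windows, heavily overlapping

    for start in starts:
        sample = data[start:start + window]             # random read of 1024 consecutive items
        # ... feed `sample` to the model

    # Materializing these windows instead would store (n/c) * window items,
    # i.e. roughly 1024/c (16x here) times the original data.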

This isn't useful for LLMs, but hey, wonder how it started?


I guess I'm much more of the "materialize the shuffle asynchronously from the training loop" kind of person. I agree, the materialization storage cost is very high, but that's normally been a cost I've been willing to accept.
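For what it's worth, the version I have in mind looks something like the sketch below (file names and the non-overlapping window layout are my own simplification): the random reads happen once in an offline job, and the training loop only ever streams sequentially.

    import numpy as np

    window = 1024
    data = np.memmap("tokens.bin", dtype=np.int32, mode="r")
    num_windows = len(data) // window                   # non-overlapping windows here

    rng = np.random.default_rng(seed=0)
    order = rng.permutation(num_windows)

    # Offline/async job: pay the random-read cost once while writing a shuffled copy.
    with open("tokens_shuffled.bin", "wb") as out:
        for i in order:
            out.write(data[i * window:(i + 1) * window].tobytes())

    # The training job then reads tokens_shuffled.bin front to back: sequential and
    # readahead-friendly.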

As an ML infra guy I have had to debug a lot of failing jobs over the years, and randomizing data pipelines are among the hardest to debug. Sometimes there will be a "record-of-death" that randomly gets shuffled into a batch, but only causes problems when it is (extremely rarely) coupled with a few other records.

I guess I'll just have to update my priors and accept that inline synchronous randomization with random reads is a useful enough access pattern in HPC that it's worth optimizing for. It's certainly a lot more work and complexity, hence my question of just how necessary it is.


Yeah, I don't want to do this either. This is a super special case; after exploring alternatives with our researchers, it's unfortunately needed. As for the record-of-death issue, we made sure to serialize all RNG state and keep our data pipeline perfectly reproducible, even when starting from a checkpoint.
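A minimal sketch of the RNG-state part, assuming a NumPy Generator drives the sampling (the checkpoint helpers here are placeholders for illustration, not our actual system):

    import pickle
    import numpy as np

    rng = np.random.default_rng(seed=1234)

    def save_checkpoint(path, step):
        # bit_generator.state is a plain dict, so it serializes cleanly.
        with open(path, "wb") as f:
            pickle.dump({"step": step, "rng_state": rng.bit_generator.state}, f)

    def load_checkpoint(path):
        with open(path, "rb") as f:
            ckpt = pickle.load(f)
        rng.bit_generator.state = ckpt["rng_state"]  # resume the exact same sample stream
        return ckpt["step"]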

Building a system for serving read-only data at NVMe SSD speed (as in IOPS) took surprisingly little effort, and it's mostly enough for training data. Kudos to DeepSeek for deciding to spend the extra effort to build a full PFS and share it.


> “We once again need to raise more capital than we’d imagined. Investors want to back us but, at this scale of capital, need conventional equity and less structural bespokeness.”

Translation: They’re ditching the complex “capped-profit” approach so they can raise billions more and still talk about “benefiting humanity.” The nonprofit side remains as PR cover, but the real play is becoming a for-profit PBC that investors recognize. Essentially: “We started out philanthropic, but to fund monstrous GPU clusters and beat rivals, we need standard venture cash. Don’t worry, we’ll keep trumpeting our do-gooder angle so nobody panics about our profit motives.”

Literally a wolf in sheep’s clothing. Sam, you can’t serve two masters.

