
In the most charitable case it's some "AI" companies with an X/Y problem. They want training data so they vibe code some naive scraper (requests is all you need!) and don't ever think to ask if maybe there's some sort of common repository of web crawls, a CommonCrawl if you will.

They don't really need to scrape anything, since CommonCrawl or other content archives would work fine as training data. They just don't think, or know, to ask for the thing they actually want: training data, not a scraper.
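For what it's worth, the CommonCrawl index is trivially queryable; a rough sketch (the crawl label is just an example, and the offset/length/filename variables are placeholders you'd pull from the JSON response):

    # query the CommonCrawl CDX index for captures of a domain
    curl "https://index.commoncrawl.org/CC-MAIN-2024-33-index?url=example.com/*&output=json"

    # each JSON line gives a WARC filename, offset, and length; fetch just that record
    # with an HTTP range request instead of hammering the live site
    curl -s -r "${offset}-$((offset + length - 1))" \
      "https://data.commoncrawl.org/${filename}" -o record.warc.gz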

In the least charitable interpretation it's anti-social assholes who neither understand nor care about negative externalities and who write awful naive scrapers.


I use Spotlight all the time to search for the contents of files. I don't memorize the contents and names of every file on my system; that's what my computer is for.

I want spotlight to open applications and system settings. But full disk indexing makes spotlight basically useless for that, because its index is filled with crap. Instead of opening zed I accidentally open some random header file that’s sitting around in Xcode. It’s worse than useless. And that says nothing of the grotesque amount of cpu time spotlight wants to spend indexing all my files.

A feature I never wanted has ruined a feature I do want. It’s a complete own goal. In frustration I turned spotlight off completely a few months ago. Rip.


I think it's been said in this thread already, but it sounds like what you want is Alfred (https://www.alfredapp.com/). It's a great app; I use it every few minutes every day.

also, for opening apps, https://charmstone.app/ is pretty great.


I am also in OP's boat and, even though these are great suggestions, personally I would like to be able to do a basic thing such as opening an app with something built in rather than having to download yet another app for it. Every major macOS update I have to worry about Spotlight reindexing things.

What I find really annoying with macOS is that with stock/default settings it has the worst UX. You have to download an app to launch apps, an app to move and resize windows, an app to reverse the mouse's wheel direction so it's the opposite of the trackpad, an app to manage the menu bar (especially to decrease the spacing so that you can fit items up to the notch). Then you also need to spend an hour tweaking settings and running custom commands (such as `defaults write -g ApplePressAndHoldEnabled -bool false` so that you can actually type stuff like aaaaaaaaaaaaaaaaaaaaa). All of this is just to make using macOS bearable, and doesn't include any "power user" stuff.
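For reference, these are the kind of one-liners I mean (the exact keys can drift between macOS versions, so treat them as examples; some need a logout or restart to take effect):

    # allow key repeat instead of the press-and-hold accent popup
    defaults write -g ApplePressAndHoldEnabled -bool false
    # faster key repeat (lower is faster)
    defaults write -g InitialKeyRepeat -int 15
    defaults write -g KeyRepeat -int 2
    # turn off "natural" scrolling globally (there's no built-in way to split mouse vs trackpad)
    defaults write -g com.apple.swipescrolldirection -bool false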

I used to hate macOS before getting my own Mac, because I had to use them at work with their default settings and it was just a horrible experience.


this is what grep is for. Why do I need a service constantly indexing my system and wasting resources for the few times a month I might need to run grep <string>?

what problem was really solved here?


Does grep work on anything other than plain text?

And then you'll have to wait for grep to trawl every folder to find the right file rather than consult an optimised index.
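And for the few-times-a-month case you can still query that same index from the shell instead of grepping; roughly (the search term and folder are just placeholders):

    # ask the pre-built Spotlight index; returns almost instantly
    mdfind -onlyin ~/Documents "invoice"

    # versus walking the whole tree and reading every file
    grep -ril "invoice" ~/Documents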


Spotlight search relevancy is a complete joke. If only they did some embedding-based search across the system and paid attention to basic precision/recall numbers. This has gone from bad to worse quickly.

Couriers and USB flash drives can be pretty effective. They're high latency but can be very high bandwidth. Look at the El Paquete network in Cuba[0] as inspiration. Self-contained HTML/JavaScript SPAs can provide navigation and the likes of TiddlyWiki[1] can allow for collaboration. A network of couriers can move as fast as road traffic and distribute stuff pretty widely.

Contents can be re-shared locally over ad-hoc or mesh WiFi networks even without Internet access.

Encryption and steganography can obscure the contents of drives from casual inspection. You can stuff a lot of extraneous data in Office XML documents, which are just zip files, and they still look innocuous when opened.
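As a rough illustration (filenames made up; Word generally ignores archive members it doesn't recognize, though that's not guaranteed for every version):

    # a .docx is just a zip archive; append an extra member to it
    zip quarterly-report.docx payload.enc
    # confirm the extra entry rides along inside the document
    unzip -l quarterly-report.docx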

1. For current events content add descriptions, locations, and timestamps to everything. The recipients need that context.

2. Even unencrypted files can be verified with cryptographic signatures (a quick GPG sketch follows below the list). These can be distributed on separate channels, including Bluetooth file transfers.

3. Include offline installers for browsers like Dillo or Firefox. Favor plain text formats where possible. FAT32 has the broadest file system support for the flash drives. Batch, PowerShell, and bash scripts can also be effective for doing more complex things without needing invasive installations on people's computers.
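To make point 2 concrete, a minimal sketch with GPG (filenames are invented; it assumes keys were generated and exchanged ahead of time):

    # producer: create a detached signature alongside the file
    gpg --armor --detach-sign paquete-2024-06.zip   # writes paquete-2024-06.zip.asc

    # recipient: verify the file even though it traveled unencrypted
    gpg --verify paquete-2024-06.zip.asc paquete-2024-06.zip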

[0] https://en.wikipedia.org/wiki/El_Paquete_Semanal

[1] https://en.wikipedia.org/wiki/TiddlyWiki


Do we need to come up with more internet protocols/services that don't require a negotiation process? So that it would work better with very high latency sneaker-net flash-drive networks? Especially for the already asynchronous ones like email? I could envision a user with a messenger/email-like client who "sends" (encrypted) messages that get put on a flash drive. This is carried around the neighborhood, etc, where others do the same. Eventually someone takes it to a machine with regular internet access, where the messages get delivered to their intended recipients. And then replies to these messages (coming hours, days, weeks later) also get put on a flash drive, and maybe hopefully get back to the original receivers. And if the internet-down situation has been resolved, the recipients will already have their messages, but if not, they'll get them when the flash drive arrives.

I suppose this isn't complete without mentioning RFC 1149 (IP Datagrams on Avian Carriers).

https://www.rfc-editor.org/rfc/rfc1149


In this case NNCP (Node-to-Node Copy)[0] would be useful. It fully supports sneakernet/floppynet distribution but also has an online mode that could be used by nodes with active Internet connections.
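A rough sketch of the sneakernet flow (it assumes both nodes have already generated configs with nncp-cfgnew and exchanged their public node blocks):

    # sending node: queue a file for a remote node
    nncp-file message-bundle.zip remotenode:

    # copy outbound packets onto the flash drive (and collect any waiting for us)
    nncp-xfer /mnt/usbdrive

    # receiving node, once the drive arrives: pull packets off it and process them
    nncp-xfer /mnt/usbdrive
    nncp-toss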

[0] http://www.nncpgo.org/index.html


Reverse that. You'll be shot and then retroactively declared a terrorist.

Apple's Neural Engine is used a lot by non-LLM ML tasks all over the system, like facial recognition in photos. The point of it isn't to be some beefy AI co-processor but to be a low-power accelerator for background ML workloads.

The same workloads could use the GPU, but it's more general purpose and thus uses more power for the same task. It's the same reason macOS uses hardware acceleration for video codecs and even JPEG: the work could be done on the CPU but would cost more in terms of power. Using hardware acceleration helps hit the 10+ hour battery life.


Yes of course, but it's basically a waste of silicon (which is very valuable) imo - you save a handful of watts to do very few tasks. I would be surprised if, over the lifetime of my MacBook, the NPU has been utilised more than 1% of the time the system is being used.

You still need a GPU regardless of whether you can do JPEG and H.264 decode on it - for games, animations, etc.


Do you use Apple's Photos app? Ever see those generated "memories," or search for photos by facial recognition? Where do you think that processing is being done?

Your MacBook's NPU is probably active every moment that your computer is on, and you just didn't know about it.


How often is the device generating memories, and how often am I searching for photos? I don't use Apple Photos fwiw, but even if I did I doubt I'd be in that app for 1% of my total computer time, and of that only a fraction would be spent doing stuff on the ANE. I don't think searching for photos requires it anyway; if they're already indexed it's just a vector search.

You can use asitop to see how often it's actually being used.

I'm not saying it's not ever used, I'm saying it's used so infrequently that any (tiny) efficiency gains do not trade off vs running it on the GPU.
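For anyone who wants to check, asitop is a third-party, pip-installable tool that wraps Apple's powermetrics (which is why it wants sudo):

    pip install asitop
    sudo asitop   # live CPU/GPU/ANE utilization and power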


Continuously in the background. There's basically a nonstop demand for ML things being queued up to run on this energy-efficient processor, and you see the results as they come in. That indexing operation is slow, and runs continuously!

You also have Safari running OCR on every image and video on every webpage you load to let you select and copy text.

> Last time I benchmarked a VPS it was about the performance of an Ivy Bridge generation laptop.

I have a number of Intel N95 systems around the house for various things. I've found them to be a pretty accurate analog for small VPS instances. The N95's cores are Intel E-cores, which are effectively Sandy Bridge/Ivy Bridge-class cores.

Stuff can fly on my MacBook but then drag on a small VPS instance, so validating against an N95 (which I already have) is helpful. YMMV.
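If you want a number rather than a feel, running the same quick CPU microbenchmark on the MacBook, the N95 box, and the VPS works; sysbench is just one option (assuming it's installed on all three):

    # single-threaded CPU benchmark; compare the events/second figure across machines
    sysbench cpu --cpu-max-prime=20000 --threads=1 run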


Mining tracking data is a megaFLOP- to gigaFLOP-scale problem, while even a simple LLM response is a teraFLOP-scale problem. It also tends to be embarrassingly parallel because the tracks of multiple users aren't usually interdependent. The tracking data processing also doesn't need to be calculated fresh for every single user with every interaction.

LLMs need to burn significant amounts of power for every inference. They're orders of magnitude more power hungry than searches, database lookups, or even loads from disk.
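Back-of-the-envelope, using the common ~2 FLOPs-per-parameter-per-token rule of thumb (the model size and reply length below are made-up examples):

    # 7B-parameter model generating a 200-token reply
    echo $((2 * 7000000000 * 200))   # 2800000000000, i.e. ~2.8 teraFLOPs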


The MacBook Air has a standard size keyboard.


> This game only works on browsers with reasonable compatibility

Just say Chrome. This is what you mean.


Nope, I do all my frontend dev for Firefox and then it works on Chrome too.


And they for some reason need a 60fps stream to...watch a computer type. No one stopped for a second and asked "maybe we don't know anything about the problem domain". They seem to have given a vague description to an LLM and assumed it knew what it was talking about.

