It seems to let you access head tracking data, so now I'm really curious whether it would be accurate enough to use with games (e.g. Microsoft Flight Sim/Arma 3/Euro Truck Simulator 2 head tracking). There are probably a lot of other interesting use cases for it too, but I'm stuck with Windows for now so :(
the head gestures are something i couldn't quite figure out, so the gesture logic is completely AI generated. i don't know how to get the actual values from the sensors. but there sure is a use in gaming.
I've really wanted to fine tune an inline code completion model to see if I could get at all close to cursor (I can't, but it would be fun), but as far as I know there are no open diffusion models to use as a base, and especially not any that would be good as a base. Hopefully something comes out soon that is viable for it
I digitised our family photos, but a lot of them had damage (shifted colours, spills, fingerprints on the film, spots) that is difficult to correct across so many images. I've been waiting for image gen to catch up enough to be able to repair them all in bulk without changing details, especially faces. This looks very good at restoring images without altering details or adding them where they are missing, so it might finally be time.
All of the defects you have listed can be fixed automatically by using a film scanner with ICE and software that performs the scan and restoration automatically, like Vuescan. Feeding hundreds (thousands?) of photos to an experimental proprietary cloud AI that will give you back subpar compressed pictures with who knows how many strange artifacts seems unnecessary.
I scanned everything into 48-bit RAW and treat those as the originals, including the IR scan for ICE and a lower quality scan of the metadata. The problem is sharing them: important images I repair manually and export as JPEG, which is time consuming (15-30 minutes per image, and there are about 14000 total), so if it's "generic family gathering picture #8228" I would rather let AI repair it, assuming it doesn't butcher faces and other important details. Until then I made a script that exports the raws with basic cropping and colour correction, but it can't fix the colours, which is the biggest issue.
this reminds me of a joke we used to tell as kids when there was a new Photoshop version coming out - "this one will remove the cow from the picture and we'll finally see what great-grandpa looked like!"
Vuescan is terrible. SilverFast has better defaults. But nothing beats the original Nikon Scan software when using ICE: it does a great job of removing dust, fingerprints, etc., even when you zoom in, versus what iSRD does in SilverFast. If you zoom in and compare the two, iSRD kind of smooshes/blurs the infrared-detected defects, whereas Nikon Scan clones the surrounding parts, which usually looks very good when zoomed in.
Both the SilverFast and Nikon Scan methods look great when zoomed out.
I never tried Vuescan's infrared option. I just felt the positive colors it produced looked wrong/"dead".
> I've been waiting for image gen to catch up enough to be able to repair them all in bulk without changing details, especially faces.
I've been waiting for that, too. But I'm also not interested in feeding my entire extended family's visual history into Google for it to monetize. It's wrong for me to violate their privacy that way, and it's also creepy to me.
Am I correct to worry that any pictures I send into this system will be used for "training"? Is my concern overblown, or should I keep waiting for AI on local hardware to get better?
You're looking for Flux Kontext, a model you can run yourself offline on a high end consumer GPU. Performance and accuracy are okay, not groundbreaking, but probably enough for many needs.
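If it helps, the rough shape of a local restoration pass looks like this. This is a sketch from memory of the diffusers integration (the pipeline class, model id, and settings should be double-checked against the current docs), and the filename and prompt are just placeholders:

```python
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

# Load the Kontext editing pipeline (needs a high end consumer GPU,
# roughly 24 GB VRAM at bf16 without offloading).
pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

damaged = load_image("scan_0042.jpg")  # placeholder filename
restored = pipe(
    image=damaged,
    prompt=(
        "Restore this photo: remove dust, scratches and colour shifts. "
        "Do not change faces or any scene content."
    ),
    guidance_scale=2.5,
).images[0]
restored.save("scan_0042_restored.jpg")
```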
I don't really understand the point of this use case. Like, can't you also imagine what the photos might look like without the damage? Same with AI upscaling in phone cameras... if I want a hypothetical idea of what something in the distance might look like, I can just... imagine it?
I think we will eventually have AI-based tools that just do what a skilled human user would do in Photoshop, via tool use. That would make sense to me. But having AI generate a new image with imagined details just seems like a waste of time.
Well, that goes to the heart of my point. I take pictures because I value how literal they are. I enjoy the fact that they directly capture the arrangement of light in the moment I took them. That literalness is the whole appeal.
So yeah, if I'm gonna then upscale them or "repair" them using generative AI, then it's a bit pointless to take them in the first place.
Do you happen to know some software to repair/improve video files? I'm in the process of digitizing a couple of Video 2000 and VHS cassettes of childhood memories for my mom, who is starting to suffer from dementia. I have a pretty streamlined setup for digitizing the videos, but I'd like to improve the quality a bit.
Topaz is probably the SOTA in video restoration, but it can definitely fuck shit up. Use carefully and sparingly and check all the output for weird AI glitches.
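It's also worth trying a non-AI baseline first. Assuming an interlaced analog capture, a plain ffmpeg pass (bwdif deinterlacing plus hqdn3d temporal denoising; the filter strengths here are just a starting point) already cleans up tape footage a lot without inventing detail:

```
ffmpeg -i capture.mkv \
  -vf "bwdif=mode=send_field,hqdn3d=4:3:6:4" \
  -c:v libx264 -crf 18 -c:a copy cleaned.mkv
```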
I tried a dozen or so images. For some it definitely failed (altering details, leaving damage behind, needing a second attempt to get a better result) but on others it did great. With a human in the loop approving the AI version or marking it for manual correction I think it would save a lot of time.
Sure, I could manually correct that quite easily and would do a better job, but that image is not important to us, it would just be nicer to have it than not.
I'll probably wait for the next version of this model before committing to doing it, but it's exciting that we're almost there.
Being pragmatic, the after is a good restoration. Nothing is really lost (except some sharpness that could be put back). The main failing of AI is on faces, because our brains are so hardwired to notice any changes or weirdness. This is the sort of image that is perfect for AI, because the subject's face is already occluded.
Another question/concern for me: if I restore an old picture of my Gramma, will my Gramma (or a Gramma that looks strikingly similar) ever pop up on other people's "give me a random Gramma" prompts?
It might show her for prompts of “show me the world’s best grandma” :)
On the free tier, I'd essentially believe that to be the default behavior. In reality they might simply use your feedback and your text prompts instead. We certainly know that free Google/OpenAI LLM usage entails prompts being used for research.
Edit: there's a decent chance it would NOT directly integrate grandma into its training, but I would still try hard to use an offline model for anything with privacy concerns.
If CV cheats are good enough that people are using them (and then getting banned), and other people are willing to pay >$1000 for "undetected" cheats (that still get them banned)... wouldn't custom hardware work: just a capture card and a USB keyboard+mouse emulator running one of those CV models, sending the inputs back through a "real" keyboard and mouse?
If it uses a 2nd input device, that's just obvious.
If it properly mixes its input into your main device, there will still be hints.
A real mouse has a limited range of motion. It can't keep moving left or right indefinitely.
Real players don't immediately gravitate towards the geometric center of the head of every enemy.
Real players don't try to move the mouse to shoot at enemies on the loading screen.
Real players have coordinated and stereotyped mouse and keyboard movements. They don't react instantly with the mouse while lagging behind on the keyboard, for instance.
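To illustrate that last point, here's a toy version of the kind of timing heuristic an anti-cheat could run server-side. The thresholds are invented for illustration; a real system would combine many weak signals before acting:

```python
import statistics

def inhuman_reactions(reaction_ms: list[float]) -> bool:
    """Flag a player whose aim reactions (enemy appears -> crosshair on
    target) are inhumanly fast AND inhumanly consistent."""
    if len(reaction_ms) < 30:      # too few samples to judge fairly
        return False
    mean = statistics.mean(reaction_ms)
    stdev = statistics.stdev(reaction_ms)
    # Human visual reaction time rarely dips below ~150 ms, and human
    # timing is noisy; near-zero variance is as suspicious as raw speed.
    return mean < 100 or stdev < 10
```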
In my experience you can't tell a good aimbot from a normal player when you play ranked at a high competitive level, at least not with any degree of certainty that I think is worth banning people over. The cheaters you lose to at that level are the ones calling out your positions to their team because they can see you on the minimap when they aren't supposed to, things like that.
Trigger bots (they shoot when CV detects an enemy AND you are holding some key/pedal) are much harder to detect, almost impossible if effort is taken to make the timing distribution believable.
This article reads like it was written by an LLM, and it doesn't mention how these "undetected" DMA cheats are actually caught. The anti-cheat teams join the Discords of cheat vendors to get access to the cheats and flag users based on the heuristics they observe from the vendor's firmware (that DMA card / hardware has to show up as _something_). So yeah, your setup can work (as long as you're sticking to the drivers and input methods they tolerate), and the same goes for private DMA cheats.
The post doesn't seem to elaborate on how Riot detects and bans people using that sort of cheat, but you can detect some % of those people by analyzing their inputs. Humans don't play the way an aimbot does.
The DMA cheaters are caught when Riot gets access to the vendor's firmware and bans the people that are using it, not by detecting the cheats themselves. Colorbots run on the same PC, so those can be caught in various ways.
I have tried fully switching to bun repeatedly since it came out, and every time I got 90% of the way there only to hit a problem that couldn't be worked around. Last time I tried, I was still stuck on some libraries requiring napi functions that weren't implemented in bun yet, as well as an issue I forget, but it was vaguely something like `opendir` silently ignoring the `recursive` option, causing a huge headache.
I'm waiting patiently for bun to catch up because I would love to switch, but I don't think it's ready for production use in larger projects yet. Even when things work, a lot of the bun-specific functionality sounds nice at first but feels like an afterthought in practice, and the documentation is far from the quality of node.js's.
You already need very high end hardware to run useful local LLMs, so I don't know if a 200GB vector database will be the dealbreaker in that scenario. But I wonder how small you could get it with compression and quantization on top.
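For intuition on the quantization side: plain per-vector int8 scalar quantization already cuts a float32 index to a quarter of its size with modest recall loss, before any fancier scheme like product quantization. A rough numpy sketch (the function names are mine):

```python
import numpy as np

def quantize_int8(vecs: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Scalar-quantize float32 embeddings to int8: 4x smaller."""
    scale = np.abs(vecs).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    return np.round(vecs / scale).astype(np.int8), scale.astype(np.float32)

def approx_dot(qa, sa, qb, sb) -> float:
    # Approximate the original float dot product from the int8 codes.
    return float((qa.astype(np.int32) @ qb.astype(np.int32)) * sa * sb)
```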
I've worked in other domains my whole career, so I was astonished this week when we put a million 768-len embeddings into a vector db and it was only a few GB. Napkin math said ~25 GB, and intuition said a long list of widely distributed floats would be fairly incompressible. HNSW is pretty cool.
You can already do A LOT with an SLM running on commodity consumer hardware. Also it's important to consider that the bigger an embedding is, the more bandwidth you need to use it at any reasonable speed. And while storage may be "cheap", memory bandwidth absolutely is not.
> You already need very high end hardware to run useful local LLMs
A basic MacBook can run gpt-oss-20b, and it's quite useful for many tasks. And fast. Of course, Macs have a huge advantage for local LLM inference due to their unified memory architecture.
The mid-spec 2025 iPhone can run “useful local LLMs” yet has 256GB of total storage.
(Sure, this is a spec distortion due to Apple’s market-segmentation tactics, but due to the sheer install-base, it’s still a configuration you might want to take into consideration when talking about the potential deployment-targets for this sort of local-first tech.)
It's fun. I think it needs queues for different game modes, because with 150 players you almost always get swarmed by neighbours. Being able to queue for a team game would make it a bit easier to learn, I think.
I've had this too, especially it getting stuck at the very end and just... never finishing. Once the usage-based billing comes into effect, I think I'll try cursor again.
What local models are you using? The local models I tried for autocomplete were unusable, though based on aider's benchmark I never really tried larger models for chat. If I could, I would love to go local-only instead.
I've been digitising family photos using this. I scanned the photo itself and the text on it, then passed that to an LLM for OCR and used tool calls to get the caption verbatim, the location mentioned, and the date in a standard format. That was going to be the end of it, but the OpenAI docs https://platform.openai.com/docs/guides/function-calling?lan... suggest letting the model guess coordinates instead of just grabbing names, so I did both and it was impressive. My favourite was it taking a picture looking out to sea from a pier and pinpointing the exact pier.
I showed the model a picture and any text written on that picture and asked it to guess a latitude/longitude using the tool use API for structured outputs.
That was in addition to having it transcribe the handwritten text and extract location names, which was my original goal until I saw how good it was at guessing exact coordinates. It would guess within ~200km on average, even on pictures with no information written on them.
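The shape of the call was roughly this. It's a sketch, not my exact code: the tool schema, filenames, and model name here are reconstructions:

```python
import base64, json
from openai import OpenAI

client = OpenAI()

# One "tool" whose arguments are the structured fields we want back.
tools = [{
    "type": "function",
    "function": {
        "name": "record_photo_metadata",
        "description": "Record metadata recovered from a scanned photo",
        "parameters": {
            "type": "object",
            "properties": {
                "caption": {"type": "string",
                            "description": "Verbatim transcription of any handwriting"},
                "location_name": {"type": "string"},
                "date": {"type": "string", "description": "ISO 8601 if determinable"},
                "latitude": {"type": "number"},
                "longitude": {"type": "number"},
            },
            "required": ["caption", "latitude", "longitude"],
        },
    },
}]

with open("pier.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": [
        {"type": "text",
         "text": "Transcribe any handwritten caption and guess where this photo was taken."},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
    ]}],
    tools=tools,
    # Force the model to answer via the tool so the output is structured.
    tool_choice={"type": "function", "function": {"name": "record_photo_metadata"}},
)

print(json.loads(resp.choices[0].message.tool_calls[0].function.arguments))
```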
The ACLs might look a bit scary at first, but they're actually quite intuitive once you've coded up a rule or two.
It basically works by tagging machines (especially those deployed with an API key) and grouping users. Then you set up rules defining which groups and tags can communicate with each other on specific ports. Since the default rule is DENY, you only need to specify rules for the communication you actually want to allow.
For instance, you would create a tag `servers` and a group `sre`. Then you set up an ACL rule like this to allow SRE to SSH into the servers (a minimal sketch; the user emails are placeholders):
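```
{
  "groups": {
    // Placeholder users.
    "group:sre": ["alice@example.com", "bob@example.com"]
  },
  "tagOwners": {
    "tag:servers": ["group:sre"]
  },
  "acls": [
    // SRE may SSH into tagged servers; everything else stays denied.
    {"action": "accept", "src": ["group:sre"], "dst": ["tag:servers:22"]}
  ]
}
```

(Tailscale policy files are HuJSON, so the comments are allowed.)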