Has anyone built something like this using accessibility APIs instead of (or in addition to) OCR? It seems like a waste to OCR everything when you could just get the text directly from the accessibility APIs. Also seems like potentially a good way to connect LLMs to UIs, and something like this would be the way to collect the training data.
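On macOS at least, the AX* APIs in ApplicationServices get you a long way toward exactly that. A rough sketch of walking an app's accessibility tree and pulling out the visible text (just a sketch: the function names are mine, the attribute strings are the standard AX ones, and it assumes the Accessibility permission has already been granted):

    import ApplicationServices

    // Sketch: walk an app's accessibility tree and collect its visible text.
    // Assumes AXIsProcessTrusted() is true (Accessibility permission granted)
    // and that `pid` is the target app's process id.
    func collectText(from element: AXUIElement, into out: inout [String]) {
        func attribute(_ name: String) -> CFTypeRef? {
            var value: CFTypeRef?
            let err = AXUIElementCopyAttributeValue(element, name as CFString, &value)
            return err == .success ? value : nil
        }

        // Standard AX attribute names; title/value cover most labels,
        // buttons, and text fields.
        for name in ["AXTitle", "AXValue", "AXDescription"] {
            if let text = attribute(name) as? String, !text.isEmpty {
                out.append(text)
            }
        }

        // Recurse into children; apps with poor accessibility support often
        // return an empty or incomplete child list here.
        if let children = attribute("AXChildren") as? [AXUIElement] {
            for child in children {
                collectText(from: child, into: &out)
            }
        }
    }

    func textForApp(pid: pid_t) -> [String] {
        var out: [String] = []
        collectText(from: AXUIElementCreateApplication(pid), into: &out)
        return out
    }

Windows has an analogous tree via UI Automation, but as others note in this thread, the two don't map onto each other cleanly.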
Dragon NaturallySpeaking supports voice commands like "click OK" and responds accordingly. Its solution to the problem of Microsoft Office doing its own custom widget rendering was to OCR the text on widgets and buttons to determine their labels. You need something like this far, far more often than you think you do. Developers will flummox you; they will NOT use the provided APIs.
We've done a bit of both for our searchable, Loom-like screen recorder. The problem is that the accessibility APIs differ greatly between Mac and Windows if you want to be OS agnostic, and even on Windows apps all tend to do things a little differently, which makes it hard to say what you actually "saw"; some apps are missing key data or implement it incorrectly. OCR ends up being easier a lot of the time, despite our expectation that accessibility would be.
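For the OCR side on macOS specifically, Apple's Vision framework does most of the work. A minimal sketch, assuming you already have a CGImage of the captured frame (e.g. from ScreenCaptureKit); something like this is what you fall back to when an app's accessibility tree comes back empty or obviously wrong:

    import CoreGraphics
    import Vision

    // Sketch of the OCR path, assuming `image` is a CGImage of a captured frame.
    func recognizeText(in image: CGImage) -> [String] {
        let request = VNRecognizeTextRequest()
        request.recognitionLevel = .accurate  // slower, but better for small UI text

        let handler = VNImageRequestHandler(cgImage: image, options: [:])
        guard (try? handler.perform([request])) != nil else { return [] }

        // Keep the top candidate string for each detected text region.
        return (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
    }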
For sure, we made a privacy tradeoff to do it server-side (given some screen-change delta) because of this. Accessibility is a good "in addition to," but there are just so many apps that don't handle it well.
I've built a workflow recorder with a screen history (MVP).
I concluded that if this turns out to be a viable approach, Microsoft or Apple will build it into their OS natively, as part of a copilot that remembers everything and assists the user with that knowledge.
My screen history was not as advanced as the app mentioned here though. And I didn't use it myself.
This is what I added to my macOS app recently: foreground app metadata. It's displayed on the timeline; you can see it in the pictures on my website (screenmemory.app). For my use case it was night and day in UX.
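On macOS the basic metadata comes from NSWorkspace; roughly this shape (the names here are illustrative, not the app's actual model, and pulling the focused window title on top of this needs the Accessibility permission):

    import AppKit

    // Sketch of capturing foreground-app metadata next to each frame.
    struct FrameMetadata {
        let appName: String?
        let bundleID: String?
        let capturedAt: Date
    }

    func frontmostAppMetadata() -> FrameMetadata {
        let app = NSWorkspace.shared.frontmostApplication
        return FrameMetadata(appName: app?.localizedName,
                             bundleID: app?.bundleIdentifier,
                             capturedAt: Date())
    }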