I agree with other comments that this research treads a fine ethical line. Did the authors responsibly disclose this, as is standard practice in the security research community? I cannot find any mention of it in the paper. The researchers do seem to be involved in security-related work (the first author is doing a PhD, the last author holds one).
At the very least, arXiv could have run the cleaner [1] before publishing this pre-print (lol). If there was no disclosure, then I think putting this pre-print up becomes unethical.
> leading to the identification of nearly 1,200 images containing sensitive metadata. The types of data represented vary significantly. While device information (e.g., the camera used) or software details (such as the exact version of Photoshop) may already raise concerns, in over 600 cases the metadata contained GPS coordinates, potentially revealing the precise location where a photo was taken. In some instances, this could expose a researcher’s home address (when tied to a profile picture) or the location of research facilities (when images capture experimental equipment)
Having arXiv run the cleaner automatically would definitely be cool. Although I've found it non-trivial to get working consistently for my own papers. That said, it would be nice if this was at least an option.
Leaked read/write credentials for documents, GitHub, Dropbox, etc. are certainly worrying, but location and author/photographer details in photo metadata? That's quite a stretch, and it seems like the authors are just trying to boost the numbers.
The locations of the vast majority (I would wager >(100 - 1e-4)%) of research institutions are public knowledge and can be found by simply googling the institution's address (I am not aware of a single research institution whose location is confidential).
They responsibly disclosed it in their research paper. An unethical use would be to use those coordinates to gain state secrets about, say, research facilities.
This is a wonderful write-up and a very enjoyable read. Although my knowledge about systems programming on ARM is limited, I know that it isn't easy to read hardware-based time counters; at the very least, it's not as simple as the x86 rdtsc [1]. This is probably why the author writes:
> This code is more complicated than what I expected to see. I was thinking it would just be a simple register read. Instead, it has to write a 1 to the register, and then delay for a while, and then read back the same register. There was also a very noticeable FIXME in the comment for the function, which definitely raised a red flag in my mind.
Regardless, this was a very nice read and I'm glad they got to the bottom of the issue and got the problem fixed.
Bear in mind that the blog post is about a 32-bit SoC that's over a decade old, and the timer it is reading is specific to that CPU implementation. In the intervening time, both timers and performance counters have been architecturally standardised, so on a modern CPU there is a register roughly equivalent to the one x86 rdtsc uses, which you can just read; and kernels can use the generic timer code for timers and don't need board-specific functions for it.
But yeah, nice writeup of the kinds of problem you can run into in embedded systems programming.
> This was in snapshots for more than 2 months, and only spotted one other program depending on the behaviour (and that test program did not observe that it was therefore depending on incorrect behaviour!!)
Fascinating. I wonder what that program is, and why it depends on the NUL character.
Zooming out, panning around, and seeing the Milky Way is... jaw-dropping in a way. I know it's silly because we've seen SO many photos of the universe, but I still get goosebumps every time I think about it. And the detail too! You can really zoom in.
> SearXNG protects the privacy of its users in multiple ways regardless of the type of the instance (private, public). Removal of private data from search requests comes in three forms:
> 1. removal of private data from requests going to search services
> 2. not forwarding anything from third party services through search services (e.g. advertisement)
> 3. removal of private data from requests going to the result pages
The docs mention a caveat below at "What are the consequences of using public instances?":
> If someone uses a public instance, they have to trust the administrator of that instance. This means that the user of the public instance does not know whether their requests are logged, aggregated and sent or sold to a third party.
All of that is fine, but simply by having your IP, Google can continue to profile you using the data they collect in countless other ways, and it wouldn't be expensive for them at all.
i think since 'IP address' has become part of the baseline non-technical understanding of one of the critical components of networking, it is increasingly difficult for non-netpeeps to fully grasp the many uses and non-uses of addressing.
a proxy (or proxies) and how they can shield but one or many of 'your' IP addresses throughout an egress packet's many hops (and from whom or what destination those addresses can be shielded) is a pretty advanced concept when you think about it.
not to mention that, at this point, bare source IP address is a pretty dilute tracker compared to other current methods of identity profiling or traffic fingerprinting.
a few examples of a self-hosted design that would not expose it include policy-based routing over a VPN with one or multiple tunneled hops, or going through another external proxy. (and then there's also that 'onion' routing 'protocol', but i'm not clear if/how that integrates with clearnet destinations like publicly accessible search engines, if at all.)
It's incredibly annoying when this isn't taken care of. Alt + Left Arrow becomes impossible to use and I have to resort to using my mouse.
Don't Microsoft's support pages [1] do this as well? I just checked with a random support page [2] and it looks like the middle redirection url is the same. Is that a different thing?
Oof, that's not too great.
[1] https://github.com/google-research/arxiv-latex-cleaner