This is also very true of the events reported in Wikipedia; see this animated timeline of (a hopefully representative set of) historical events from Wikipedia. It really is "Europe meets the world":
https://www.reddit.com/r/MapPorn/comments/1l3xl8x/events_fro...
I agree with others in this thread that this is more probably "information-biased" than "eurocentric" on the part of the Atlas creator. Pretty sure they wish non-European history was easier to find and aggregate, as it would make the project much more compelling (I certainly had this problem with https://landnotes.org/).
I am hoping LLMs will do a lot of good at bridging gaps and surfacing world historical information that hasn't yet made it to centralized projects like Wikipedia.
Total plug, but this year I scraped 400,000 Wikipedia pages with Gemini to create landnotes.org, an atlas where you can ask "what happened in Japan in 1923".
My plan has been to overlay historical map borders on top of it, like the Geacron one from this post, but they all seem to be protected by copyright - and understandably so, given the amount of work involved.
Very cool! I made something with a similar idea, but using timelines instead of maps. I wonder if the two could be combined in some way:
https://timeline-of-everything.milst.dev/
Nice! How does your timeline work under the hood? Does it read from Wikipedia? Something that could be interesting for your project is the ability to compare timelines. See for instance this website specifically for comparing composers' works (with timelines pre-extracted from Wikipedia):
Under the hood, messages are sent to Tambo, which knows how to use the timeline component (sort of like an LLM tool) and can fill the component's props with whatever data it decides.
The actual data at this point is completely generated by the LLM (which seems to be OK for historical data on popular topics). I should add a tool to allow Tambo to fetch data from Wikipedia before trying to generate timeline data.
Comparing timelines is an awesome idea, understanding when certain events happened in relation to others is really interesting. Maybe even overlaid in different colors or something instead of separate timelines.
This is very very cool! I went right to the month and year of my birth; kind of the same vibe as finding a newspaper published on the day you were born but all over the world. Thanks for sharing!
Yeah, it would be nice if Wikipedia would host it, but that would probably require some more serious groundwork so the project fits in the Wikipedia ecosystem. Could be a pipeline Wikipedia -> Wikidata -> Atlas.
There are many projects that could be done with Wikipedia and LLMs, for instance "equalizing" all languages by translating all pages into all the other languages where they are missing. Or, more surgically, finding which facts are reported in some languages of a page but not others, and adding these facts to all languages.
For now, it seems that Wikipedia doesn't want to use generative AI to produce Wikipedia pages, and that's understandable, but there may be a point where model quality will be too good to ignore.
Understandable for not using it to write net-new content from outside sources. But agreed that at some point the translation becomes good enough to bridge all language gaps, to where it's simply an obvious call that a translation of the more fully written English article beats relying on a local writer.
What if the designer/writer sought to draw my attention by making me hyperfocus on the text?
That is, musicians do it with dynamics, architects do it with compression and expansion, writers do it with words, but do designers do it with the most dynamic, infinitely extensible medium on the planet that combines all of our senses and perceptions at once?
Maybe that was the point here, even if it was unintentional, bordering on magic?
Fascinating. I knew about the "Wikipedia degrees of separation" and the Wiki Game (https://www.thewikigame.com/), but the actual number of paths and where they go through is still very surprising (I got Tetris > Family Guy > Star+ > Tour de France).
You should try the words from previous days (clicking on the date below the title). So far it’s been pretty random how many tries someone will need to find a word, the same person who needed 2 tries one day might need 10 tries another time. Just like wordle, you might get lucky or unlucky on your first guesses.
- Pianola: https://zulko.github.io/pianola/ - upload a piano roll MIDI file, and it plays it back with a piano roll and keyboard animation (you can zoom in on some parts, slow down, etc.).
What makes them interesting to you? Does the music sound different?
I've seen pianola rolls and even played one as a child. But I have wondered as an adult what the 'listening quality' of the music is / would be. What got you into them and could you share -- if you want to nerd out please do, I'm genuinely interested! -- what interested you about them?
When I was about 10 I picked my first ever CD at a music shop, and it was a recording of the Gershwin piano rolls, because the cover photo caught my eye [1]. I didn't really understand what I was listening to, I assumed "piano roll" was a musical genre, like "rock'n'roll", until years later when my English became good enough to read the CD's booklet.
It was also a time when all these MIDI files started becoming available, like the 6000 rolls from Terry Smythe [2], and I figured transcribing these could be a good way to learn old-school jazz, which is otherwise difficult to find as sheet music.
Does a piano roll sound different (I assume it does)? I.e., is or was there a specific market for a CD of a piano roll specifically, as opposed to a recording of someone playing the piano?
In terms of the music being played, piano rolls can be different from "normal piano music" because it's not played live by a real human, so it can have complex parts with full chords, additional voices, all with perfect rhythm and no wrong notes. This can be very compelling when well executed on the right songs (and it can also sound "mechanical" on others).
There isn't a huge market for piano roll recordings, and these recordings are rare. It's a niche topic that can attract:
- Older people who knew the era of piano rolls (say, up to the 1950s)
- People nostalgic for old times in general (in particular the 1910s-1940s), the age of early jazz with stride piano and early Broadway.
- Music scholars, because some of these rolls are of historical/musical importance, in particular those "recorded" by George Gershwin or Fats Waller and other big names. A lot of material exists only as piano rolls.
For the example of the Gershwin CD I posted above, it was produced by musicologist Artis Wodehouse [1] in partnership with the Yamaha Disklavier pianos IIRC [2], so my guess is this was a passion project above all, with a bit of Yamaha marketing.
In his 1976 essay on (or against) genetic engineering [1] Erwin Chargaff wrote "But screams and empty promises fill the air: Don't you want cheap insulin? (...) And how about a green man synthesizing his nourishment: 10 minutes in the sun for breakfast, 30 minutes for lunch, and 1 hour for dinner?" Nice to see that scientists are actually trying.
It also comes up at least once in John Varley's sci-fi books. People can get themselves turned into space-floating plants and just sort of hang around Saturn. I can't remember if they get genetically modified to photosynthesize or they wear a suit that does it.
Insulin today is produced cheaply in genetically modified microbes; this is the technology Chargaff was alluding to, which was first successful two years after his letter, in 1978.
Same experience here. I've been building a classical music database [1] where historical and composer life events are scraped off Wikipedia by asking ChatGPT to extract lists of `[{event, year, location}, ...]` from biographies. A few learnings (with a rough sketch of the pipeline after this list):
- Using chatgpt-mini was the only cheap option; it worked well (although I have a feeling it's been dumbing down these days) and made the whole thing virtually free.
- Just extracting the webpage text from the HTML with `BeautifulSoup(html).text` slashes the number of tokens (but can be risky when dealing with complex tables).
- At some point I needed to scrape ~10,000 pages that share the same format, and it was much more efficient speed-wise and price-wise to provide ChatGPT with the HTML once and say "write some Python code that extracts the data", then apply that code to the 10,000 pages. I'm thinking a very smart GPT-based web parser could do that, with dynamically generated scraping methods.
- Finally, because this article mentions tables: Pandas has a very nice feature, `pandas.read_html("https://the-website.com")`, that will detect and parse all the `<table>` elements on a page. But the article does a good job pointing at websites where the method would fail because the tables don't use `<table>`.
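For the curious, here's a minimal sketch of the pipeline. The model name, prompt wording, and JSON schema are illustrative stand-ins (not necessarily what I use), and it assumes the stripped page text fits in the model's context window:

```python
import json

import requests
from bs4 import BeautifulSoup
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_events(url: str) -> list[dict]:
    html = requests.get(url, headers={"User-Agent": "events-scraper/0.1"}).text
    # Stripping the HTML down to plain text slashes the token count
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in: use whatever cheap model works for you
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract the biographical events from the text and reply "
                    'with JSON: {"events": [{"event": str, "year": int, '
                    '"location": str}, ...]}'
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    return json.loads(response.choices[0].message.content)["events"]

events = extract_events("https://en.wikipedia.org/wiki/Maurice_Ravel")
```

The `response_format={"type": "json_object"}` option keeps the output machine-parseable; without it the model occasionally wraps the JSON in prose.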
If you haven't considered it, you can also use the direct wikitext markup, from which the HTML is derived.
Depending on how you use it, the wikitext may or may not be more ingestible if you're passing it through to an LLM anyway. You may also be able to pare it down a bit by heading/section so that you reduce it to only sections that are likely to be relevant (e.g. "Life and career").
You can also download full dumps [0] from Wikipedia and query them via SQL to make your life easier if you're processing them.
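If it helps, here's a small sketch of grabbing a page's raw wikitext through the standard MediaWiki `action=parse` endpoint (the page title and User-Agent string are just examples, and error handling is omitted):

```python
import requests

def get_wikitext(title: str) -> str:
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "parse",
            "page": title,
            "prop": "wikitext",
            "format": "json",
            "formatversion": "2",  # v2 returns the wikitext as a plain string
        },
        headers={"User-Agent": "wikitext-fetcher/0.1"},
    )
    return resp.json()["parse"]["wikitext"]

print(get_wikitext("Maurice Ravel")[:500])
```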
> reduce it to only sections that are likely to be relevant (e.g. "Life and career")
True, but I also managed to do this from the HTML. I tried getting pages' wikitext through the API but couldn't figure out how.
Just querying the HTML page was less friction and fast enough that I didn't need a dump (although when AI becomes cheap enough, there are probably a lot of things to do with a Wikipedia dump!).
One advantage of using online Wikipedia instead of a dump is that I have a pipeline on GitHub Actions where I just enter a composer name and it automagically scrapes the web and adds the composer to the database (takes exactly one minute from the click of the button!).
This doesn't directly address your issue, but since this caused me some pain I'll share that if you want to parse structured information from Wikipedia infoboxes, the npm module wtf_wikipedia works.