This is also very true of the events reported in Wikipedia; see this animated timeline of (a hopefully representative set of) historical events from Wikipedia. It really is "Europe meets the world":
https://www.reddit.com/r/MapPorn/comments/1l3xl8x/events_fro...
I agree with others in this thread that this is more probably "information-biased" than "eurocentric" on the part of the Atlas creator. Pretty sure they wish non-European history was easier to find and aggregate, as it would make the project much more compelling (I certainly had this problem with https://landnotes.org/).
I am hoping LLMs will do a lot of good at bridging gaps and surfacing world historical information that hasn't yet made it to centralized projects like Wikipedia.
Total plug, but this year I scraped 400,000 Wikipedia pages with Gemini to create landnotes.org, an atlas where you can ask "what happened in Japan in 1923".
My plan has been to overlay historical map borders on top of it, like the Geacron one from this post, but they all seem to be protected by copyright - and understandably so, given the amount of work involved.
Very cool! I made something with a similar idea, but using timelines instead of maps. I wonder if the two could be combined in some way:
https://timeline-of-everything.milst.dev/
Nice! How does your timeline work under the hood? Does it read from Wikipedia? Something that could be interesting for your project is the ability to compare timelines. See for instance this website specifically for comparing composers' works (with timelines pre-extracted from Wikipedia):
Under the hood, messages are sent to Tambo, which knows how to use the timeline component (sort of like an LLM tool) and can fill the component's props with whatever data it decides.
The actual data at this point is completely generated by the LLM (which seems to be OK for historical data on popular topics). I should add a tool to allow Tambo to fetch data from Wikipedia before trying to generate timeline data.
Comparing timelines is an awesome idea, understanding when certain events happened in relation to others is really interesting. Maybe even overlaid in different colors or something instead of separate timelines.
This is very very cool! I went right to the month and year of my birth; kind of the same vibe as finding a newspaper published on the day you were born but all over the world. Thanks for sharing!
Yeah, it would be nice if Wikipedia would host it, but that would probably require some more serious groundwork so the project fits in the Wikipedia ecosystem. Could be a pipeline Wikipedia -> Wikidata -> Atlas.
There are many projects that could be done with Wikipedia and LLMs, for instance "equalizing" all languages by translating all pages into all the other languages where they are missing. Or, more surgically, finding which facts are reported in some languages of a page but not others, and adding these facts to all languages.
For now, it seems that Wikipedia doesn't want to use generative AI to produce Wikipedia pages, and that's understandable, but there may be a point where model quality will be too good to ignore.
Understandable for not using it to write net-new content from outside sources. But agreed that at some point the translation becomes good enough to bridge all language gaps, to where it's simply an obvious call that a translation of the more fully written English article beats relying on a local writer.
What if the designer/writer sought to draw my attention by making me hyperfocus on the text?
That is, musicians do it with dynamics, architects do it with compression and expansion, writers do it with words, but do designers do it with the most dynamic, infinitely extensible medium on the planet that combines all of our senses and perceptions at once?
Maybe that was the point here, even if it was unintentional, bordering on magic?
Fascinating. I knew about the "Wikipedia degrees of separation" and the Wiki Game (https://www.thewikigame.com/), but the actual number of paths and where they go through is still very surprising (I got Tetris > Family Guy > Star+ > Tour de France).
You should try the words from previous days (clicking on the date below the title). So far it’s been pretty random how many tries someone will need to find a word, the same person who needed 2 tries one day might need 10 tries another time. Just like wordle, you might get lucky or unlucky on your first guesses.
- Pianola: https://zulko.github.io/pianola/ - upload a piano roll MIDI file, and it plays it back with a piano roll and keyboard animation (you can zoom in on some parts, slow down, etc.).
What makes them interesting to you? Does the music sound different?
I've seen pianola rolls and even played one as a child. But I have wondered as an adult what the 'listening quality' of the music is / would be. What got you into them and could you share -- if you want to nerd out please do, I'm genuinely interested! -- what interested you about them?
When I was about 10 I picked my first ever CD at a music shop, and it was a recording of the Gershwin piano rolls, because the cover photo caught my eye [1]. I didn't really understand what I was listening to, I assumed "piano roll" was a musical genre, like "rock'n'roll", until years later when my English became good enough to read the CD's booklet.
It was also a time when all these MIDI files started becoming available, like the 6000 rolls from Terry Smythe [2], and I figured transcribing these could be a good way to learn old-school jazz, which is otherwise difficult to find as sheet music.
Does a piano roll sound different (I assume it does)? I.e., is or was there a specific market for a CD of a piano roll specifically, as opposed to a recording of someone playing the piano?
In terms of the music being played, piano rolls can be different from "normal piano music" because it's not played live by a real human, so it can have complex parts with full chords, additional voices, all with perfect rhythm and no wrong notes. This can be very compelling when well executed on the right songs (and it can also sound "mechanical" on others).
There isn't a huge market for piano roll recordings, and these recordings are rare. It's a niche topic that can attract:
- Older people who knew the era of piano rolls (say, up to the 1950s)
- People nostalgic for old times in general (in particular the 1910s-1940s), the age of early jazz with stride piano and early Broadway.
- Music scholars, because some of these rolls are of historical/musical importance, in particular those "recorded" by George Gershwin or Fats Waller and other big names. A lot of material exists only as piano rolls.
For the example of the Gershwin CD I posted above, it was produced by musicologist Artis Wodehouse [1] in partnership with the Yamaha Disklavier pianos IIRC [2], so my guess is this was a passion project above all, with a bit of Yamaha marketing.
In his 1976 essay on (or against) genetic engineering [1] Erwin Chargaff wrote "But screams and empty promises fill the air: Don't you want cheap insulin? (...) And how about a green man synthesizing his nourishment: 10 minutes in the sun for breakfast, 30 minutes for lunch, and 1 hour for dinner?" Nice to see that scientists are actually trying.
It also comes up at least once in John Varley's sci-fi books. People can get themselves turned into space-floating plants and just sort of hang around Saturn. I can't remember if they get genetically modified to photosynthesize or they wear a suit that does it.
Insulin today is produced cheaply in genetically modified microbes; this is the technology Chargaff was alluding to, which was first successful two years after his letter, in 1978.
Same experience here. I've been building a classical music database [1] where historical and composer life events are scraped off Wikipedia by asking ChatGPT to extract lists of `[{event, year, location}, ...]` from biographies. A few learnings (with a rough sketch of the pipeline after this list):
- Using chatgpt-mini was the only cheap option; it worked well (although I have a feeling it's been dumbing down these days) and made the whole thing virtually free.
- Just extracting the webpage text from the HTML with `BeautifulSoup(html).text` slashes the number of tokens (but can be risky when dealing with complex tables).
- At some point I needed to scrape ~10,000 pages that share the same format, and it was much more efficient speed-wise and price-wise to provide ChatGPT with the HTML once and say "write some Python code that extracts the data", then apply that code to the 10,000 pages. I'm thinking a very smart GPT-based web parser could do that, with dynamically generated scraping methods.
- Finally, because this article mentions tables: Pandas has a very nice feature, `pandas.read_html("https://the-website.com")`, that will detect and parse all the `<table>` elements on a page. But the article does a good job pointing at websites where the method would fail because the tables don't use `<table>`.
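For the curious, here's a minimal sketch of the pipeline. The model name, prompt wording, and JSON schema are illustrative stand-ins (not necessarily what I use), and it assumes the stripped page text fits in the model's context window:

```python
import json

import requests
from bs4 import BeautifulSoup
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_events(url: str) -> list[dict]:
    html = requests.get(url, headers={"User-Agent": "events-scraper/0.1"}).text
    # Stripping the HTML down to plain text slashes the token count
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in: use whatever cheap model works for you
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract the biographical events from the text and reply "
                    'with JSON: {"events": [{"event": str, "year": int, '
                    '"location": str}, ...]}'
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    return json.loads(response.choices[0].message.content)["events"]

events = extract_events("https://en.wikipedia.org/wiki/Maurice_Ravel")
```

The `response_format={"type": "json_object"}` option keeps the output machine-parseable; without it the model occasionally wraps the JSON in prose.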
If you haven't considered it, you can also use the direct wikitext markup, from which the HTML is derived.
Depending on how you use it, the wikitext may or may not be more ingestible if you're passing it through to an LLM anyway. You may also be able to pare it down a bit by heading/section so that you reduce it to only sections that are likely to be relevant (e.g. "Life and career").
You can also download full dumps [0] from Wikipedia and query them via SQL to make your life easier if you're processing them.
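If it helps, here's a small sketch of grabbing a page's raw wikitext through the standard MediaWiki `action=parse` endpoint (the page title and User-Agent string are just examples, and error handling is omitted):

```python
import requests

def get_wikitext(title: str) -> str:
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "parse",
            "page": title,
            "prop": "wikitext",
            "format": "json",
            "formatversion": "2",  # v2 returns the wikitext as a plain string
        },
        headers={"User-Agent": "wikitext-fetcher/0.1"},
    )
    return resp.json()["parse"]["wikitext"]

print(get_wikitext("Maurice Ravel")[:500])
```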
> reduce it to only sections that are likely to be relevant (e.g. "Life and career")
True, but I also managed to do this from the HTML. I tried getting pages' wikitext through the API but couldn't figure out how.
Just querying the HTML page was less friction and fast enough that I didn't need a dump (although when AI becomes cheap enough, there are probably a lot of things to do with a Wikipedia dump!).
One advantage of using online Wikipedia instead of a dump is that I have a pipeline on GitHub Actions where I just enter a composer name and it automagically scrapes the web and adds the composer to the database (takes exactly one minute from the click of the button!).
This doesn't directly address your issue, but since this caused me some pain I'll share that if you want to parse structured information from Wikipedia infoboxes, the npm module wtf_wikipedia works.