
I am thoroughly unimpressed by GPT-5. It still can't compose iambic trimeters in ancient Greek with a proper penthemimeral cæsura, and it insists on providing totally incorrect scansion of the flawed lines it does compose. I corrected its metrical sins twice, which sent it into "thinking" mode until it finally returned a "Reasoning failed" error.

There is no intelligence here: it's still just giving plausible output. That's why it can't metrically scan its own lines or put a cæsura in the right place.



It once again completely fails on an extremely simple test: look at a screenshot of sheet music, and tell me what the notes are. Producing a MIDI file for it (unsurprisingly) was far beyond its capabilities.

https://chatgpt.com/share/68954c9e-2f70-8000-99b9-b4abd69d1a...

This is not anywhere remotely close to general intelligence.


Interpreting sheet music images is very complex, and I’m not surprised general-purpose LLMs totally fail at it. It’s orders of magnitude harder than text OCR, due to its two-dimensionality.

For much better results, use a custom trained model like the one at Soundslice: https://www.soundslice.com/sheet-music-scanner/


> I am thoroughly unimpressed by GPT-5. It still can't compose iambic trimeters in ancient Greek with a proper penthemimeral cæsura, and it insists on providing totally incorrect scansion of the flawed lines it does compose

This would be a hilarious take to read in 2020


I'm as AI-skeptical as the next guy, but as a person with no understanding of the context here, the parent comment is funny as fuck.


It's well-known at this point that LLMs don't handle spelling, syllables, rhythm, meter, or other word-form-based questions well due to tokenization -- sometimes sheer scale (or leaning on code) can get the right answer if they're lucky, but they're literally blind to the individual letters.
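
For a concrete picture of what the model actually "sees", here's a rough sketch using OpenAI's tiktoken library (my own illustration, not anything specific to GPT-5's vocabulary; the exact splits depend on the tokenizer):

  # Rough illustration: a BPE tokenizer fragments a Greek word into opaque chunks.
  # Requires `pip install tiktoken`; splits vary by vocabulary.
  import tiktoken

  enc = tiktoken.get_encoding("cl100k_base")
  word = "πενθημιμερής"  # "penthemimeral"
  tokens = enc.encode(word)
  pieces = [enc.decode_single_token_bytes(t) for t in tokens]
  print(pieces)  # a few multi-byte chunks -- not letters, not syllables

Counting letters or syllables from those chunks isn't directly available to the model; it has to be inferred, which is exactly where meter questions fall apart.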

(Incidentally, go back in time even five years and this specific expectation of AI capability sounds comically overblown. "Everything's amazing and nobody's happy.")


This is a great test because it’s something you could teach an elementary school kid in an hour.


is this a joke


No, it’s easy if the kid already knows the alphabet. Latin scansion was standard grade school material up until the twentieth century. Greek less so, but the rules for it are very clear-cut and well understood. An LLM will regurgitate the rules to you in any language you want, but it cannot actually apply the rules properly.


Is ancient Greek similar enough to modern-day Greek that an elementary school kid could learn to compose anything non-boilerplate in an hour? Also, do you know that if you fed the LLM the same training material you'd need to train the kid in an hour, it couldn't do it?


To outperform GPT-5 in this case, all the kid needs to do is correctly recognize the syllable stress constraint. Even if they can't quickly compose many such poems, they would still be able to tell when something they've written doesn't match the constraints.


I can't tell whether you're serious or not. Your criterion for an "impressive" AI tool is that it be able to write and scan poetry in ancient Greek?


AI looks like it understands things because it generates text that sounds plausible. Poetry requires the application of certain rules to that text, and the rules for Latin and Greek poetry are very simple and well understood. Scansion is especially easy once you understand the concept, and you actually can, as someone else suggested, train a child to scan poetry by applying these rules.

An LLM will spit out what looks like poetry, but will violate certain rules. It will generate some hexameters but fail harder on trimeter, presumably because it is trained on more hexametric data (epic poetry: think Homer) than trimetric (iambic and tragedy, where it’s mixed with other meters). It is trained on text containing the rules for poetry too, so it can regurgitate rules like defining a penthemimeral cæsura. But, LLMs do not understand those rules and thus cannot apply them as a child could. That makes ancient poetry a great way to show how far LLMs are from actually performing simple, rules-based analysis and how badly they hide that lack of understanding by BS-ing.
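
To make the "simple, rules-based" part concrete, here is a toy sketch (mine, and deliberately simplified: it ignores resolution and the other licences tragedians allow). Once each syllable is marked long or short, checking a line against the trimeter scheme and the caesura is plain pattern matching:

  # Toy check: does a line of 12 syllable quantities fit the basic iambic
  # trimeter scheme (x - u - | x - u - | x - u -), and does a word end
  # right after the fifth syllable (penthemimeral caesura)?
  # 'L' = long required, 'S' = short required, 'A' = anceps (either)
  SCHEME = "ALSL" "ALSL" "ALSA"  # final element treated as anceps

  def fits_scheme(quantities: str) -> bool:
      """quantities: 12-char string of 'L'/'S', one per syllable."""
      if len(quantities) != len(SCHEME):
          return False
      return all(rule == "A" or rule == q
                 for rule, q in zip(SCHEME, quantities))

  def has_penthemimeral_caesura(word_syllable_counts: list) -> bool:
      """True if some word boundary falls right after the 5th syllable."""
      total = 0
      for n in word_syllable_counts:
          total += n
          if total == 5:
              return True
          if total > 5:
              return False
      return False

  print(fits_scheme("SLSLLLSLSLSL"))              # True
  print(has_penthemimeral_caesura([2, 3, 4, 3]))  # True: break after syllable 5

That's the whole game, and it's exactly the kind of rule application the models fumble.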


This is not a useful diversion; it's like arguing about whether a submarine swims.

LLMs are simple: it doesn't take much more than high school math to explain their building blocks.

What's interesting is that they can remix tasks they've been trained on very flexibly, creating new combinations they weren't directly trained on; compare this to earlier, smaller models like T5, which had a few set prefixes per task.

They have underlying flaws. Your example is more about the limitations of tokens than "understanding", for example. But those don't keep them from being useful.


> those don't keep them from being useful.

They do stop it from being intelligent, though. Being able to spit out cool and useful stuff is a great achievement. Actual understanding is required for AGI, and this demonstrably isn't that, right?


I don't care if people want to debate over the semantics of intelligence to be honest.

Similarly, most AGI discussions are just people talking past each other and taking pot shots at predicting the future.

I've come to accept some topics in this space just don't invite useful or meaningful discussion.


Pure failure:

"You’ve given:

Moon in the 10th house (from the natal Ascendant)

Venus in the 1st house (from the natal Ascendant)

Step-by-step: From the natal Ascendant’s perspective

Moon = 10th house

Venus = 1st house

Set Moon as the 1st house (Chandra Lagna)

The natal 10th house becomes the 1st house in the Chandra chart.

Therefore, the natal 1st house is 3rd house from the Moon:

10th → 1st (Moon)

11th → 2nd

12th → 3rd (which is the natal 1st)

Locate Venus from the Moon’s perspective

Since Venus is in the natal 1st, and natal 1st is 3rd from Moon,

Venus is in the 3rd house from Chandra Lagna.

Answer: From Chandra Lagna, Venus is in the 3rd house."
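
Assuming the standard inclusive counting of houses from the Moon (as I understand the Chandra Lagna convention), the arithmetic comes out differently from the chain above, which also contradicts itself by calling the natal 12th house "the natal 1st":

  # Inclusive counting from the Moon's natal house (my understanding of the
  # convention -- not the model's method):
  moon_house, venus_house = 10, 1
  print((venus_house - moon_house) % 12 + 1)  # 4 -> Venus is 4th from the Moon, not 3rd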


I too can't compose iambic trimeters in ancient Greek, but am normally regarded as of average+ intelligence. I think it's a bit of an unfair test, as that sort of thing is based on the rhythm of spoken speech and GPT-5 doesn't really deal with audio in a deep way.


Most classicists today can’t actually speak Latin or Greek, especially observing vowel quantities and rhythm properly, but you’d be hard pressed to find one who can’t scan poetry with pen and paper. It’s a very simple application of rules to written characters on a page, but it is application, and AI still doesn’t apply concepts well.



