
As embedding models become more performant and context windows increase, “ideal chunking” becomes less relevant.

Cost isn’t as important to us, so we use small chunks and then just pull in the page before and after each hit (rough sketch below). If you do this on 20+ matches (since you’re decomposing the query into multiple sub-queries), you’re very likely to find the relevant content.

Queries can get more expensive, but you’re also building a corpus of “great answers” to test against as you refine your approach. Model costs are plummeting too, which makes brute-forcing it more and more viable.
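
A minimal sketch of that approach, assuming a generic vector index with a search(query, k) method and a page store keyed by (doc_id, page number); the names are placeholders, not any particular library:

    # Hypothetical sketch of "small chunks, then pull in the neighboring pages".
    # vector_index.search() and the pages dict are assumed interfaces, not a real API.
    def retrieve_with_neighbor_pages(vector_index, pages, sub_queries, k_per_query=5):
        seen, context_blocks = set(), []
        for q in sub_queries:                       # one pass per decomposed sub-query
            for hit in vector_index.search(q, k=k_per_query):
                # expand each small-chunk hit to the page before and after it
                for page_no in (hit.page - 1, hit.page, hit.page + 1):
                    key = (hit.doc_id, page_no)
                    if key not in seen and key in pages:
                        seen.add(key)
                        context_blocks.append(pages[key])
        return context_blocks

With four-plus sub-queries and k=5 you’re in the “20+ matches” range mentioned above, and the dedup keeps the expanded pages from blowing up the prompt too badly.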



Bigger context windows don't work as well as advertised. The "needle in a haystack" problem is not solved yet.

Bigger context windows also mean a lot more time waiting for an answer. Given the quadratic cost of attention in context length, we are stuck using transformers on smaller chunks. Other architectures like Mamba may solve that, but even then, long-context accuracy hasn't improved anywhere near the 1000x that window sizes have.
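
A back-of-the-envelope illustration of that quadratic scaling (the numbers are illustrative token counts, not benchmarks of any particular model):

    # Self-attention compares every token with every other token,
    # so pairwise work grows with the square of the context length.
    for n_tokens in (8_000, 32_000, 128_000):
        pairwise = n_tokens ** 2
        print(f"{n_tokens:>7} tokens -> {pairwise:.1e} attention scores per layer/head")

Going from 8K to 128K tokens is 16x more input but roughly 256x more attention work, which is where the extra waiting comes from.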



