VeejayRampay's comments

thanks for this post

it should be repeated ad nauseam that he is a crook, a shame for the country and its values, and that the whole discourse about the injustice of the sentencing has heavy anti-liberal vibes


you're not in the minority, there's just intense fanboyism on Hacker News to promote OpenAI, because it serves the whole "LLM revolution" schtick better

Gemini has been dominating the field for about a year now, but I suppose Google is a bit boring cause they just do things well


due to the nature of PDF, none of the tools mentioned here can do things as simple as detecting tables on pages with high accuracy

PDF is absolutely mint for display but it really suffers when parsing is involved


Yeah, I've been expecting someone to work up a system where:

- source file is .md

- file is compiled to .pdf _and_ the .md source file is included as an attachment

- when working with the file beyond viewing as a .pdf the .md is extracted and used instead of the .pdf

The LaTeX folks had a similar system ages ago, where the .tex source would be included in a .pdf made from a .tex file for embedding in documents, so that it could be sent in, say, an e-mail and then edited by the recipient. Absolutely awesome for discussing math via e-mail.


That's a good concept but I don't think Markdown is expressive enough for all the layouts & formatting that people typically want in PDFs. More likely that the source format would be something like HTML or SVG or .docx.


reStructuredText has a mostly 1:1 correspondence with DocBook. I use an XSLT transform to convert its XML schema into DocBook, and get PDF from there via XSL-FO.


python will be the last man standing with basically no functional goodies

people will keep on trucking with their "pythonic" for loops containing appends to a list that was initialized right before, their disgust for recursion, their absence of lambdas, the lack of monadic return types or containers


> with basically no functional goodies

Python has had `map` and friends for well over 20 years. Also see the built-in `functools` module.
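For anyone who hasn't used them, here is a rough sketch of what that looks like with only the standard library. The sample list and variable names are made up for illustration.

```python
from functools import partial, reduce
from operator import add, mul

nums = [1, 2, 3, 4, 5]  # made-up sample data

# map/filter return lazy iterators, so wrap them in list() to materialize
squares = list(map(lambda n: n * n, nums))
evens = list(filter(lambda n: n % 2 == 0, nums))

# reduce folds the sequence into one value, starting from 0
total = reduce(add, nums, 0)

# partial pre-fills the first argument of mul, giving a "double" function
double = partial(mul, 2)
doubled = list(map(double, nums))

print(squares, evens, total, doubled)
```

Whether this counts as "functional goodies" is a matter of taste, but none of it needs a third-party package.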


List, dict and set comprehensions, generators are used quite a lot in Python and feel very functional to me.
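A small sketch of what I mean, next to the "initialize a list, then append" loop mentioned upthread. The word list and the `naturals` helper are made up for the example.

```python
from itertools import islice

words = ["map", "filter", "reduce", "map"]  # made-up sample data

# the imperative version: initialize a list, then append in a loop
lengths_loop = []
for w in words:
    lengths_loop.append(len(w))

# the same result as a list comprehension
lengths = [len(w) for w in words]

# set and dict comprehensions follow the same shape
unique = {w for w in words}
by_length = {w: len(w) for w in words}


def naturals():
    """Hypothetical helper: an infinite generator of 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1


# generator expressions are lazy, so an infinite source is fine
# as long as only a finite prefix is consumed
even_squares = (n * n for n in naturals() if n % 2 == 0)
first_even_squares = list(islice(even_squares, 3))
```

The lazy generator pipeline at the end is the part that feels closest to what other languages call streams or lazy sequences.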


remotely related, but I have yet to find a reliable solution for classifying the pages of a document by whether they contain tables, i.e. a classifier that returns the indices of the pages that contain tables

solutions using things like img2table or pymupdf are really bad (pymupdf is not even reliable for text pdfs)


In my experience, this task is incredibly difficult to solve in the general case.

Handcrafting based on the dataset is the only way to get high performance.


for context, the author is Aaron Patterson of Ruby and Rails fame, a proficient C programmer and overall hacker, he knows his stuff


it's funny to observe how picky and cynical the HN crowd suddenly becomes when the disruptive technology is from china


What part of this is disruptive? It kind of has to work well to be disruptive, doesn't it?


You can't be critical anymore?


deepseek is from china and all their papers have been very well received


160 comments on this, OpenAI is definitely not the hype anymore


it is pedantic, everyone knows what "node" means in this context


Apparently not, because I first assumed that he was talking about TypeScript, considering that JavaScript doesn't have much of a type system to compare to.


and speed, it's way way faster

