

Thanks so much for these thorough comments.

You suggested some directions for more complex analysis that could be done on this data - I would be very curious to see what you get if you take the time to run data-to-paper as a co-pilot on your own. You can give it directions and feedback on where to go - it will be fascinating to see where you take it!

We also must look ahead: complexity and novelty will rapidly increase as ChatGPT5, ChatGPT6, etc. are rolled out. The key with data-to-paper is to build a platform that harnesses these tools in a structured way that creates transparent and well-traceable papers. Your ability to read, understand and follow all the analysis in these manuscripts so quickly speaks to your talent of course, but also to the way these papers are structured. Speaking from experience, it is much harder to review human-created papers at such speed and accuracy...

As for your comment that “it's certainly not close to something I could submit to a journal” - please take a look at the examples where we reproduce peer-reviewed publications (published in a perfectly reasonable Q1 journal, PLOS ONE). See this original paper by Saint-Fleur et al.: https://journals.plos.org/plosone/article?id=10.1371/journal...

and here are 10 different independent data-to-paper runs in which we gave it the raw data and the research goal of the original publication and asked it to do the analysis, reach conclusions, and write the paper: https://github.com/rkishony/data-to-paper-supplementary/tree... (look for the 10 manuscripts designated “manuscriptC1.pdf” - “manuscriptC10.pdf”)

See our own analysis of these manuscripts and their reliability in our arXiv preprint: https://arxiv.org/abs/2404.17605

Note that the original paper was published after the training horizon of the LLM that we used, and that we programmatically removed the original paper from the results of the literature search that data-to-paper performs, so that it cannot see it.

Thanks so much again and good luck for the exam tomorrow!


Wow - thank you for the meticulous check - these are issues we should certainly fix!


Thanks! Indeed, currently we only provide the LLM with a short tl;dr created by Semantic Scholar for each paper. Reading the full text and extracting and connecting specific findings and results would be amazing to do, especially as it could start creating a network of logical links between statements across the vast scientific literature. txtai indeed looks extremely helpful for this.
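
For example, here is a minimal sketch of how such tl;drs could be indexed with txtai for semantic retrieval (just an illustration - the paper summaries below are made up and the exact txtai constructor arguments may vary by version):

    # Toy sketch: a semantic index over short paper summaries (tl;drs) using txtai.
    # The summaries are invented; in practice they would come from the
    # Semantic Scholar tl;dr field for each retrieved paper.
    from txtai import Embeddings

    papers = [
        ("paper1", "Antibiotic cycling reduces resistance evolution in E. coli.", None),
        ("paper2", "Machine learning predicts treatment outcomes from health records.", None),
        ("paper3", "Sequential drug exposure selects for collateral sensitivity.", None),
    ]

    # Build an embeddings index; content=True stores the text alongside the vectors.
    embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2", content=True)
    embeddings.index(papers)

    # A literature-search step could then ground a claim in semantically related findings:
    for hit in embeddings.search("evolution of drug resistance under alternating antibiotics", limit=2):
        print(hit["id"], round(hit["score"], 3), hit["text"])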


Excellent! I’m glad my input was interesting.

txtai has some demos of automated semantic graph building that might be relevant.

I noticed you didn’t really use any existing agent frameworks, which I find very understandable, as their added value over DIY approaches can be questionable. However, txtai might fit well with your overall technology style and philosophy.

Has your team studied the latest CoT, OPA, or cognitive-architecture research?


Thanks - we will certainly look deeper into txtai. Our project is now open and you are more than welcome to give a hand if you can! Yes, you are right - it is built completely from scratch. It does have some similarities to other agent packages, but we have some unique aspects, especially in terms of tracing information flow between many steps and thereby creating the idea of "data-chained" manuscripts (where you can click each result and go back all the way to the specific code lines that produced it). Also, we have a special code-running environment that catches many common improper uses of imported statistical packages.
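
To give a flavor of the "data-chained" idea, here is a toy sketch of the concept (this is not our actual implementation - just the idea in miniature): every reported number carries a record of the upstream data and code that produced it, so a result can be traced all the way back.

    # Toy illustration of "data-chaining": each reported value keeps a provenance
    # record pointing at the upstream values, data file and code that produced it.
    # Not data-to-paper's actual implementation - just the concept in miniature.
    from dataclasses import dataclass, field

    @dataclass
    class ChainedValue:
        value: float
        label: str
        source_file: str                              # raw data the value came from
        code_ref: str                                 # e.g. "regression.py:55"
        inputs: list = field(default_factory=list)    # upstream ChainedValues

        def trace(self, depth=0):
            """Print the chain from this result back to the raw data."""
            indent = "  " * depth
            print(f"{indent}{self.label} = {self.value}  [{self.code_ref}; data: {self.source_file}]")
            for upstream in self.inputs:
                upstream.trace(depth + 1)

    # A number quoted in the manuscript links back to the intermediate it was built from:
    cleaned_n = ChainedValue(412, "n_patients_after_exclusion", "patients.csv", "clean_data.py:23")
    odds_ratio = ChainedValue(1.38, "odds_ratio_table2", "patients.csv", "regression.py:55",
                              inputs=[cleaned_n])
    odds_ratio.trace()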


“Data-chained” manuscripts will be very valuable, especially for the system to evaluate itself and verify the work it has performed.

This is obviously just my initial impression on a distracted Sunday, but I’m very encouraged by your project and I will absolutely be following it and looking at your source code.

The detractors don’t understand LLMs and probably haven’t used them in the way you and I have. They don’t understand that with CoT and OPA, LLMs can be used to reason and think for themselves.

I’ve used them for fully automated script writing, performing the job of a software developer. I’ve also used them to create study guides and practice tests, and then grade those tests. Implementing automated systems first-hand with agent frameworks and the APIs gives a deeper understanding of their power compared to the basic chat usage most people are familiar with.

The people arguing that your system can’t do real science are being silly - as if the tedious process and logical thinking were something so complex and so uniquely human that LLMs couldn’t do it when used within a cognitive framework. Of course they can!

Anyway, I’m very excited by your project. I hope this summer to spend at least a week dedicated to setting it up and exploring potential integrations with txtai, for use on private knowledge bases in addition to the public scholarly papers.


And yes, we are implementing CoT and OPA - but there is surely a ton of room for improvement!


Thanks everyone for the engagement and discussion. Following the range of comments, just a few thoughts:

1. Traceability, transparency and verifiability. I think the key question for me is not whether AI can accelerate science, but rather how we can use AI to accelerate science while at the same time enhancing key scientific values like transparency, traceability and verifiability.

More and more these days, when I read scientific papers, published either in high-impact journals or in more specialized journals, I find it so hard, and sometimes even frustratingly impossible, to understand and check what exactly was done to analyze the raw data and get to the key results - what was the specific chain of analysis steps, what parameters were used, and so on. The data is often not there or is poorly annotated, the analysis is explained poorly, the code is missing or is impossible to track. All in all, it has become practically impossible to repeat and check the analysis and results of many peer-reviewed publications.

Why are papers so hard to follow and trace? Because writing clear, fully traceable and transparent papers is very hard, we don’t have powerful tools for doing it, and it requires the scientific process itself (or at least the data analysis part) to be done in an organized and fully traceable way.

Our data-to-paper approach is designed to use AI powerfully, not only to speed up science (by a lot!), but also to enhance transparency, traceability and verifiability. Data-to-paper sets a standard for traceability and verifiability which imo exceeds the current level of human-created manuscripts. In particular:

a. “Data-chaining": by tracing information flow through the research steps, data-to-paper creates what we call “data-chained” manuscripts, where results, methodology and data are programmatically linked. See this video (https://youtu.be/mHd7VOj7Q-g). You can also try click-tracing results in this example manuscript: https://raw.githubusercontent.com/rkishony/data-to-paper-sup...

See more about this and more examples in our preprint: https://arxiv.org/abs/2404.17605

b. Human in the loop. We are looking at different ways to create a co-piloted environment where human scientists can direct and oversee the process. We currently have a co-pilot app that allows users to follow the process, to set and change prompts, and to provide review comments at the end of each step (https://youtu.be/Nt_460MmM8k). It will be great to get feedback (and help!) on ways in which this could be enhanced.

2. P-value hacking. Data-to-paper is designed to raise a hypothesis (autonomously, or by user input) and then go through the research steps to test it. If the hypothesis test is negative, it is perfectly fine and suitable to write a negative-result manuscript. In fact, in one of our tests we gave it data from a peer-reviewed publication that reports both a positive and a negative result, and data-to-paper created manuscripts that correctly report both.

So data-to-paper on its own is not doing multiple hypothesis searches. In fact, it can help you realize just how many hypotheses you have actually tested - something very hard for human researchers even when done honestly (see the small numerical illustration at the end of this comment). Can people ask data-to-paper to create 1000 papers, read them all, and choose only the single one in which a positive result is found? Yes - people can always cheat, and science is built on trust - but it is not going to be particularly easier than any of the many other ways available for people to cheat if they want.

3. Final note: LLMs are here to stay and are already used extensively in doing science (sadly, sometimes undisclosed: https://retractionwatch.com/papers-and-peer-reviews-with-evi...). The new models - ChatGPT5, ChatGPT6, ... - will likely write a whole manuscript for you in a single prompt. So the question is not whether AI will go into science (it already has), but rather how to use it in ways that foster, not jeopardize, accountability, transparency, verifiability and other important scientific values. This is what we are trying to do with data-to-paper. We hope our project stimulates further discussion on how to harness AI in science while preserving and enhancing key scientific values.
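
As a small numerical illustration of why the count of tested hypotheses matters (generic statistics, not data-to-paper code): testing many hypotheses on pure noise routinely produces "significant" p-values, and a correction can only remove them if the full number of tests is known.

    # Testing 100 hypotheses on pure noise at alpha=0.05 yields ~5 spurious "hits";
    # a multiple-testing correction (here Benjamini-Hochberg FDR) needs the full
    # count of tests to remove them.
    import numpy as np
    from scipy.stats import ttest_ind
    from statsmodels.stats.multitest import multipletests

    rng = np.random.default_rng(0)
    pvals = [ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue for _ in range(100)]

    print("uncorrected 'significant':", sum(p < 0.05 for p in pvals))   # expect ~5 false positives
    reject, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
    print("after FDR correction:", int(reject.sum()))                   # typically 0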


Hi,

Thanks for the honest and thoughtful discussion you are conducting here. Comments tend to be simplistic, and it's great to see that you raise the bar by addressing criticism and questions in earnest!

That said, I think the fundamental problem of such tools is unsolvable: out of all possible analytical designs, they create boring, already-known results at best and wrong results (missing confounders, misunderstood context, ...) at worst. They also pollute science with harmful findings that lack meaning in the context of a field.

These issues have been well known for about ten years and are explained excellently in papers such as [1].

There is really only one way to guard against bad science today, and that is true pre-registration. And that is something which LLMs fundamentally cannot do.

So while tools such as data-to-paper may be helpful, they can only be so in the context of pre-registered hypotheses where they follow a path pre-defined by humans before collecting data.

[1] http://www.stat.columbia.edu/~gelman/research/unpublished/p_...


Thanks so much for these thoughtful comments and ideas.

I can’t help but fully agree: pre-registered hypotheses are the only way to fully guard against bad science. This, in essence, is what the FDA does for clinical trials too. And btw, lowering the traditional and outdated 0.05 cutoff is also critical imo.

Now, say we are in a utopian world where all science is pre-registered. Why can’t we imagine AI being part of the process that creates the hypotheses to be registered? And why can’t we imagine it also being part of the process that analyzes the data once it is collected? In fact, maybe it can even be part of the process that helps collect the data itself?

To me, whether we are in such a utopian world or in the far-from-utopian current scientific world, there is ultimately no fundamental tradeoff between using AI in science and adhering to fundamental scientific values. Our purpose with data-to-paper is to demonstrate, and to provide tools for, harnessing AI to speed up scientific discovery while enhancing traceability and transparency, making our scientific output much more traceable, understandable and verifiable.

As for the question of novelty: indeed, research on the existing public datasets that we have used so far cannot be too novel. But scientists can also use data-to-paper with their own original data. It might help with some aspects of the analysis, and certainly help them keep track of what they are doing and how to report it transparently. Ultimately, I hope that such co-piloted deployment will allow us to delegate the more straightforward tasks to the AI, letting us human scientists engage in higher-level thinking and conceptualization.


True, we seem to have a pretty similar perspective after all.

My concern is an ecological one within science, and your argument addresses the frontier of scientific methods.

I am sure both are compatible. One interesting question is what instruments are suitable to reduce negative externalities from bad actors. Pre-registration works, but is limited to the few fields where the stakes are high. We will probably see a similarly staggered approach, with more restrictive methods in some fields and less restrictive ones in others.

That said, there remain many problems to think about: e.g. what happens to meta-analyses if the majority of findings come from the same mechanism? Will humans be able to resist the pull of easy AI suggestions and instead think hard where they should? Are there sensible mechanisms for enforcing transparency? Will these trends bring us back to a world in which trust is based only on the prestige of known names?

Interesting times, certainly.


> That said, I think the fundamental problem of such tools is unsolvable: out of all possible analytical designs, they create boring, already-known results at best and wrong results (missing confounders, misunderstood context, ...) at worst. They also pollute science with harmful findings that lack meaning in the context of a field.

This doesn't seem correct to me at all. If new data is provided and the LLM is simply an advanced tool that applies known analysis techniques to that data, then why would it create “boring existing results”?

I don’t see why systems using an advanced methodology should not produce novel results when provided with new data.

There are a lot of reactionary, or even luddite, responses to the direction we are headed with LLMs.


Sorry but I think we have very different perspectives here.

I assume you mean that LLMs can generate new insights, either in the sense of producing plausible results from new data or in the sense of producing plausible but previously unknown results from old data.

Both these things are definitely possible, but they are not necessarily (and in fact often not) good science.

Insights in science are not rare. There are trillions of plausible insights, and all can be backed by data. The real problem is the reverse: finding the meaningful and useful finding in a sea of a billion others.

LLMs learn from past data, which means they will have more support for "boring", i.e. conventional, hypotheses that have precedent in the training material. So I assume that while they can come up with novel hypotheses and results, these will probably tend to conform to a (statistically defined) paradigm of past findings.

When they do produce novel hypotheses or findings, it is unlikely that these will be genuinely meaningful AND true insights, because if you randomly generate new ideas, almost all of them are wrong (see the paper I linked).

So in essence, LLMs should have a hard time doing real science, because real science is the complex task of finding unlikely, true, and interesting things.


Have you personally used LLMs within agent frameworks that apply CoT and OPA patterns, or other patterns from cognitive-architecture theories?

I’d be surprised if you had used LLMs beyond the classic linear chat interface most people use and still held the opinions you do.

In my opinion, once you combine RAG and agent frameworks with raw observational input data, they can absolutely do real reasoning and analysis and create new insights that are meaningful and will be considered genuine new science. The project we are discussing has practically proven this with its replication examples. This is possible because the LLM is not just taught to repeat information; it can actually reason and analyze at a human level and beyond when its capabilities are used within a well-designed cognitive architecture using agents.


Yes - LLMs tuned on data-science publications would be great; we would need a dataset of papers with reliable and well-performed analysis. Notably, though, it works quite well even with general-purpose LLMs. The key was to break the complex process into smaller steps where results from upstream steps are used downstream. That also creates papers where every downstream result is programmatically linked to upstream data.
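
Schematically, the chaining looks something like this (a generic sketch with placeholder functions, not our actual prompts or code):

    # Generic sketch of breaking the research process into chained steps,
    # where each step's output is fed into the next. `call_llm` and
    # `run_in_sandbox` are placeholders for an LLM client and a code runner.

    def call_llm(prompt: str) -> str:
        """Placeholder: send a prompt to an LLM and return its reply."""
        raise NotImplementedError

    def run_in_sandbox(code: str) -> str:
        """Placeholder: execute generated analysis code and capture its output."""
        raise NotImplementedError

    def run_pipeline(data_description: str) -> dict:
        out = {}
        out["plan"] = call_llm(
            "Given this dataset description, propose a testable hypothesis "
            "and an analysis plan:\n" + data_description)
        out["code"] = call_llm(
            "Write Python code implementing this analysis plan:\n" + out["plan"])
        out["results"] = run_in_sandbox(out["code"])
        out["paper"] = call_llm(
            "Write a manuscript. Plan:\n" + out["plan"] +
            "\nResults (copy all numbers verbatim):\n" + out["results"])
        return out   # keeping every intermediate preserves the upstream/downstream links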


Yes, that sounds like the type of data that would be fun to try out with data-to-paper! The repo is now open - you're welcome to give it a try, and I'm happy to hear suggestions for improvements and development directions. data-to-treatment, data-to-insights, data-to-prevention, data-to-???


Evaluate the quality of the generated papers on 10-20 samples with peer review.


Excited to launch "Quibbler", an open-source Python package for interactive data analysis. Fun to use, nothing to learn - your standard code effortlessly comes to life! Built with the amazing Maor Kern and Maor Kleinberger.

https://github.com/Technion-Kishony-lab/quibbler

See also our QUIBBLE COMPETITION: https://kishony.technion.ac.il/best-quibble-award/
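
Here is a minimal taste of the idea (see the repo docs for the exact API): once inputs are "quibs", ordinary numpy code becomes reactive.

    # Minimal Quibbler sketch: standard numpy code recalculates automatically
    # when an upstream input quib changes. Check the repo docs for exact usage.
    import numpy as np
    from pyquibbler import initialize_quibbler, iquib

    initialize_quibbler()

    n = iquib(5)            # an input quib
    x = np.arange(n ** 2)   # a regular numpy call -> becomes a dependent (function) quib

    print(x.get_value())    # evaluated with n = 5
    n.assign(3)             # change the input...
    print(x.get_value())    # ...and the downstream result updates automatically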


Looks great! Saw it earlier on Twitter when someone quote-tweeted https://twitter.com/RoyKishony/status/1602311320073805824.

Suggestion: if you are able to edit the post title, add a "Show HN" prefix; see https://news.ycombinator.com/showhn.html for more details.

Edit: Just realized why the tool's name seemed familiar ;)

