
When you think about a data science pipeline, you really have three separate steps:

[Data Preparation] --> [Data Analysis] --> [Result Preparation]

Neither Python nor R does a good job at all three.

The original article seems to focus on challenges in using Python for data preparation/processing, mostly pointing out challenges with Pandas and "raw" Python code for data processing.

This could be solved by switching to something like duckdb and SQL to process data.
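
If it helps, here is a minimal sketch of that approach using the duckdb Python package; the file name and column names are hypothetical, just to show the shape of pushing the cleanup into SQL:

  import duckdb  # pip install duckdb

  # Hypothetical cleanup step: aggregate a raw CSV into an analysis-ready
  # Parquet file with SQL instead of hand-written Pandas transforms.
  con = duckdb.connect()  # in-memory database
  con.execute("""
      COPY (
          SELECT user_id,
                 CAST(event_time AS DATE)    AS event_date,
                 COUNT(*)                    AS n_events,
                 AVG(session_length_seconds) AS avg_session_length
          FROM read_csv_auto('raw_events.csv')
          WHERE session_length_seconds IS NOT NULL
          GROUP BY user_id, CAST(event_time AS DATE)
      ) TO 'events_clean.parquet' (FORMAT PARQUET)
  """)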

As far as data analysis, both Python and R have their own niches, depending on field. Similarly, there are other specialized languages (e.g., SAS, Matlab) that are still used for domain-specific applications.

I personally find result preparation somewhat difficult in both Python and R. Stargazer is OK for exporting regression tables, but it's not really that great. Graphing is probably better in R within the ggplot universe (I'm aware of the Python port).


I set up something similar at work. But it was before the DuckLake format was available, so it just uses manually generated Parquet files saved to a bucket and a light DuckDB catalog that uses views to expose the parquet files. This lets us update the Parquet files using our ETL process and just refresh the catalog when there is a schema change.
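
A rough sketch of that pattern (the bucket path, credentials setup, and table names are hypothetical):

  import duckdb  # pip install duckdb

  # Lightweight catalog: each table is just a view over Parquet files in the
  # bucket, so the ETL job can overwrite the files without touching the catalog.
  con = duckdb.connect("catalog.duckdb")
  con.execute("INSTALL httpfs; LOAD httpfs;")  # assumes S3 credentials are configured separately
  con.execute("""
      CREATE OR REPLACE VIEW sales AS
      SELECT * FROM read_parquet('s3://my-bucket/warehouse/sales/*.parquet')
  """)
  con.execute("""
      CREATE OR REPLACE VIEW customers AS
      SELECT * FROM read_parquet('s3://my-bucket/warehouse/customers/*.parquet')
  """)
  con.close()
  # "Refreshing the catalog" after a schema change is just re-running this script.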

We didn't find the frozen DuckLake setup useful for our use case. Mostly because the frozen catalog kind of doesn't make sense with the DuckLake philosophy and the cost-benefit wasn't there over a regular duckdb catalog. It also made making updates cumbersome because you need to pull the DuckLake catalog, commit the changes, and re-upload the catalog (instead of just directly updating the Parquet files). I get that we are missing the time travel part of the DuckLake, but that's not critical for us and if it becomes important, we would just roll out a PostgreSQL database to manage the catalog.


Because applying for a visa takes money, time, and a visit to the embassy.

ESTA/ETIAS gets automatically approved within a few minutes of paying the fee (I guess this is true for 99.999% of applicants).

Very few countries allow people to just show up and cross the border. US citizens had that privilege in a lot of places, but it looks like it’s changing now.


I have never visited an embassy to get a visa--though I did cancel a couple of business trips when it became too much of an effort because of timing relative to other trips. I've travelled to a bunch of countries where I could just go through immigration with a US passport or maybe pay for a visa on arrival.


Not true - there are plenty of countries that have e-visa (online application).


The hunter-gatherers in the study lived in the "Late Holocene (~4000 to 250 BP)", meaning between roughly 2050 BCE and 1700 CE. These people are separated from us by fewer than 150 generations. I don't believe that humans evolve that fast, so the way you think, feel, ache, and so on also applies to them. Would you leave behind your injured and disabled in their situation (which is speculated to be the result of hunting accidents)?


Anthropology started at a time when people thought civilizations evolved in a straight line from savages to England. But it's hard to pretend that the natives sat around a rock grunting at each other when their e.g. bone-setting techniques were essentially modern, so there's a tradition of "not as benighted as you might have thought" articles.

WHY that point of view still exists is a question every anthro novice asks, and it turns out that cultural evolution is too attractive an idea for some people to let go of.


> sat around a rock grunting at each other

Seems crazy to me. Anyone with children who are exposed to multiple languages can easily imagine how complex the language scene must have been among humans who did not write, given how easily and naturally little ones pick up the different languages they speak with different people.


Most likely even Heidelbergensis had "complex grunting" and hand signs, so humans in the Neolithic are effectively identical to us in language capability.


But then why did they go so long without writing, or with the same level of tech?


Go ahead. Invent some new tech that absolutely no one knows about or knows how to make, and that isn't based on any known tech. I'm waiting. What's taking so long?

Discovering stuff is hard, and harder if you don't think you need it. People kept fire going before they knew how to start fires. If you don't know about the concept of flint or lighting dry stuff with sparks, it is really hard to invent fire starting. Writing isn't as useful if you can just learn what you need to know while growing up. A more complicated world later, as discoveries slowly started to build up, probably created the need.

But again, those discoveries are hard and they took time. A really long time, apparently.


I think there is a tendency to project the modern era's speed of technological progress back in time, which isn't reasonable. We went from the Wright Brothers to Apollo 11 in 66 years. The first transistor to the iPhone in ~60 years. That rate of development is...new.


My thinking is that they didn’t have any time to invent new things. They did chores and then died.


Hunter gatherers had a ton of free time. It's almost impossible to describe how thick on the ground resources were pre-industrialization.


Check your privilege. You Anglos think you are the best culture; you have it so ingrained you don't even notice it.


I believe they were joking. Maybe you should check your sense of humour?


Why is it written BP? These archaeology people / Phys.org really need to cease with that confusing nonsense. BP supposedly means "Before Present" (or, jokingly, "Before Physics"), referring to practical radiocarbon dating with a cutoff date of January 1, 1950. [1] Way too easy to mix up BCE / BC / BP.

[1] WP, Before Present: https://en.wikipedia.org/wiki/Before_Present

It's written like these people were supposedly cave people, yet based on this story's confusing usage, these people were caring for each other after the Spanish and Portuguese colonization of South America, up to the 1700s. 4000 BP is 2050 BCE, the "really Late Holocene", and 250 BP is 1700 AD. Also, the "late Holocene" goes all the way to Y2K (2000 AD). [2] The Meghalayan is "the current age or latest geologic age." [3]

[2] WP, Holocene Era: https://en.wikipedia.org/wiki/Holocene

[3] WP, Meghalayan: https://en.wikipedia.org/wiki/Meghalayan

Really does make me wonder if these people know what they're doing / writing.


> I don't believe that humans evolve that fast

Evidence of animals doing this exists. Unsure why anyone would be surprised there's evidence of humans doing this.

It's really wild to me how many humans believe their feelings are so different from animals. Most animals have similar incentives and desires, humans just have "better" tools to achieve them.


Grieving orcas have been found to move their dead babies around for many weeks. Chimps will fetch plants like Scutia myrtina, which is toxic in large amounts but acts as an anthelmintic (anti-worm drug), for fellow members of their group when they're sick. Elephants will defend their wounded and even bring food or water to them or help them stand up when they're struggling to.

Not sure why you're being downvoted. You're absolutely right. These types of behaviors can be seen all throughout the animal world. Especially for animals showing degrees of eusociality.


These feelings are also extremely important for the preservation of one's species. No wonder evolution took this approach on multiple occasions: animals tend to get lonely if they aren't kind to one another, leaving themselves open to being killed. The incentive is there to have strength in numbers, and being "emotional" in some ways contributes to further that.


The people they talk about are contemporary with the Babylonians, who had already absorbed the urban Uruk civilization that started to peak a millennium prior. The difference isn't biology but resource density and climate favorability leading to higher social organization.


The costs and benefits faced by ancient humans were very, very different. Maybe a different way to frame the question would be "At what probability of additional death, injury, or suffering (to you or other tribe members) would you abandon your injured/disabled?" Humans of that era did not have anything even remotely approaching modern medicine, and most lived at subsistence levels with starvation always at their doorstep. A huge portion of ancient peoples' energy and time was dedicated to obtaining calories. That means caring for the injured/disabled imposes a huge cost and risk. We can just as easily find examples of ancient peoples murdering or abandoning their injured, disabled, and weak. I don't think it would be right or fair to judge them through a modern lens. Of course they cared for their loved ones and mourned their deaths. But they were faced with much harsher circumstances to which their cultures and beliefs were suited.


It would be helpful to provide some citations and evidence around the claim “most lived at subsistence levels with starvation always at their doorstep”. There is an increasing amount of evidence that this was not the case.

https://medium.com/sapere-aude-incipe/our-distorted-image-of...


> most lived at subsistence levels with starvation always at their doorstep

Genuine question: is this something we know from evidence, or an assumption? I vaguely recall having read that comparison between skeletal remains of early farmers and hunter-gatherers indicated that the latter had a better diet, but I'm not sure if I'm remembering correctly or how much that observation generalizes.


We actually have a ton of evidence refuting this. The two things anthropologists spend their whole time rejecting in popular science are the barter myth and the idea that hunter-gatherer lives were "nasty, brutish and short".

The "nasty, brutish and short" idea might have been true of many medieval European peasants, but the rest of the world wasn't cramped up with livestock and poverty conditions with poor sanitation. Other people simply didn't face as much disease. There was actually some really interesting work in bioarcheology in 2018 that showed that even extremely long lifespans were not actually that rare.[0] And those who made it to adulthood could generally expect a long life (obviously tons of variation here). In the city of Cholula, Mexico, between 900 and 1531, most people who made it to adulthood lived past the age of 50.[1]

[0] https://sc.edu/uofsc/posts/2022/08/conversation-old-age-is-n...

[1] https://onlinelibrary.wiley.com/doi/10.1002/ajpa.22329

Not to mention the famous "Man the Hunter" symposium where Marshall Sahlins introduced the Original Affluent Society Thesis which has since been largely upheld and reinforced.


Both early farmers and hunter-gatherers regularly endured calorie scarcity. The difference between them along this dimension is minor compared to the difference between either group and us and our calorie security.


> most lived at subsistence levels with starvation always at their doorstep

I find this hilarious. Modern civilization has starvation at our doorstep. If the modern supply chains fail, so very many would starve.

Didn't toilet paper become scarce about 5 years ago? I don't see what protects the population from the same thing happening with food and water.


You have a point, actually. Non-agricultural people had much more varied diets, and we have almost zero archeological examples of famines leading to mass deaths among non-agricultural peoples, but we have plenty of examples of that happening to agricultural people. Agriculture was, especially initially, a huge step back in food security.

Obviously things have changed a lot since then, but some of the risks remain. Cuba is a fascinating case study in what happens when a modern agricultural supply chain collapses (due to US sanctions). Many, many died. But since then there's been a massive focus on locally grown food and even wild tending. I know many people who are into permaculture and alternatives to industrial agriculture who have traveled there to study it.


This feels like video game analysis. Unit is likely to die, therefore do not spend resources on unit. Leave unit behind.

There is no world in which I would leave a family member or close friend to die in the woods alone, especially if I have no idea what germs are, why people die when they bleed, and am listening to a voice I have heard my whole life cry out in pain. Even if I knew for sure they were going to die, I would sit with them, or move them, or something.

Thought experiment: Would you visit your mother or father in the hospital knowing they were going to die that day? I mean there's nothing you can do, why bother??


It's not about writing off the injured due to their low odds of survival, it's about your willingness to lower those odds for your other loved ones, or yourself. How does your thought experiment change when caring for your mother/father means your children might starve?


Look, man, modern people die trying to save strangers from drowning. We can just look at actual behavior; we don't need bloodless thought experiments.


OK, but for every person who tries to save a stranger from drowning, how many other people choose not to? Probably not zero. If I saw a stranger drowning and they were larger than child-sized, I probably wouldn't attempt it; apparently it's pretty common for the drowning person to panic and use their savior as a raft, drowning them in the process.


Why do volunteer firefighters rush into a burning building to try to save children from some family they have never met before? Every day we are afforded examples of people sacrificing their personal interests for the benefit of others.

But also, biologists usually use a definition of "altruism" that does not include close kin. Richard Dawkins was explicit about this in his 1976 book "The Selfish Gene." Helping someone you are directly related to is not considered altruism.


It's literally a skill issue. The correct way to help a drowning person is to get behind them and then hook your weaker arm around their neck & head while doing backstroke with the other. Having them on their back facing up (and out of the water) dispels the panic reflex. But this obviously requires you to be comfortable in the water and have some prior rescue training.


I think in the premodern era, you never saw strangers (not like we do). You probably had a pretty good idea who everyone was, and probably knew most people pretty well. If that's even partially true, then although nowadays you might drive past a person on the highway, if your cousin or a lifelong trusted acquaintance asked for help you'd give it. It seems that everyone you saw, especially saw injured or sick, was probably someone you'd known your whole life.

You're also heavily discounting the fact that you had to live not only with yourself if you did nothing, but the shame/angst of their family who you definitely lived next door to. TFA is about taking care of "their own", not strangers.


Good way to look at it. More broadly, there must have been different groups that practiced different policies with regard to the ill and injured. Some of the groups fared better than others. Since most modern societies do care about their ill and injured, it appears that this policy proved more advantageous, even if only slightly so.


Can you conceive of how caring for the injured might have a benefit in an evolutionary / game theoretical sense?


This is the right question to ask. You can reason your way around things, but Occam's razor reigns supreme. Injured people can still do lots of work, as our most important tools were our brains, not our bodies. It's not hard to watch for predators near camp while sitting at the campfire, or to keep an eye on children, even if you can't resolve issues yourself. You could sit around making crafts for the tribe, repairing clothes, and more.

There's just way too much benefit to keeping the injured around. We don't need everyone working at top physical condition... ever.


Without knowing what happened, it's difficult to make the comparison between the Italian Years of Lead and what happened earlier today at Utah Valley University.

My understanding of the Italian political climate of the 60s, 70s, and 80s is that there were political groups/cells (on both the far right and far left) that organized around violent acts to further their political goals (which involved the eventual authoritarian takeover of the Italian government by either the far right or the far left). For example, you can think of the Red Brigades as akin to the Black Panthers, but with actual terrorism.

In contrast, most political violence in America has been less organized and more individual-driven (e.g., see the Oklahoma City Bombing). For better or worse, the police state in the US has been quite successful in addressing and dispersing political groups that advocate for violence as a viable means for societal change.


This was an intentional adoption of leaderless resistance[0] in response to the vulnerabilities in centrally administered organisations of the 60-80s.

Resistance orgs across the ideological spectrum were systematically dismantled after decades of violence because their hierarchical command structures made them vulnerable to infiltration, decapitation and RICO-style prosecutions.

The Weather Underground, Red Army Faction, European Fascist groups and many white supremacist groups all fell to the same structural weaknesses.

Lessons were codified in the USA in the early 90s by Louis Beam[1] of the KKK and Aryan Nations movements, who wrote about distributed organisational models.

This was so successful it cross-pollinated to other groups globally. Other movements adopted variations of this structure, from modern far-right and far-left groups to jihadist organisations.[2]

This is probably the most significant adaptation in ideological warfare since guerrilla doctrine. There has been a large-scale failure in adapting to it.

The internet and social media have just accelerated its effectiveness.

"Inspired by" vs "carried out by" ideological violence today is the norm.

[0] https://en.wikipedia.org/wiki/Leaderless_resistance

[1] https://www.splcenter.org/resources/extremist-files/louis-be...

[2] https://www.memri.org/reports/al-qaeda-military-strategist-a...


The KKK has been a distributed movement from the beginning, though, starting as isolated remnants of Confederate forces acting as terrorist cells in tandem with local officials and businessmen (e.g., plantation owners), and resurgent in the 20s and 30s (obviously sans the direct Confederate connections, replaced with local law enforcement).

It's not so much that we haven't been able to adapt to it as we've simply refrained from doing so. Their violence was in line with the interests of local elites.


Timothy McVeigh got his start watching Waco burn, hanging out with groups around the US "militia movement", and reading The Turner Diaries, and had like 3 accomplices.

He wasn't a "lone wolf".


But he also wasn't actually acting as a part of anything like the Red Brigades either, so the GP's point still stands.



Actually, it has been proven that at least two of the major terrorist attacks that happened in Italy during the Years of Lead were false-flag attacks organized by rogue elements of the secret services (politically aligned with the far right), funded and supported by the US, in order to politically isolate the Brigate Rosse movement and stop any advance of communism in Italy.



You are inexcusably wrong, since the comment you are replying to has a Wikipedia link with further links to the work of historians.

You really try hard to see "bad commies", huh?


This was my take as well. At least microeconomics has moved away from large-scale observational studies and toward experimental and quasi-experimental studies.

While the methods alone cannot fix it all ("You can’t fix by analysis what you bungled by design" [1] after all), it gets somewhat closer to unbiased results.

[1]: https://www.degruyterbrill.com/document/doi/10.4159/97806740...


Not all null results are created equal.

There are interesting null results that get published and are well known. For example, Card & Krueger (1994) was a null result paper showing that increasing the minimum wage has a null effect on employment rates. This result went against the common assumption that increasing wages will decrease employment at the time.

Other null results are either dirty (e.g., big standard errors) or due to process problems (e.g., experimental failure). These are more difficult to publish because it's difficult to learn anything new from these results.

The challenge is that researchers do not know if they are going to get a "good" null or a "bad" one. Most of the time, you have to invest significant effort and time into a project, only to get a null result at the end. These results are difficult to publish in most cases and can lead to the end of careers if someone is pre-tenure or lead to funding problems for anyone.


Is that actually a null result though? That sounds like a standard positive result: "We managed to show that minimum wage has no effect on employment rate".

A null result would have been: "We tried to apply Famous Theory to showing that minimum wage has no effect on employment rate but we failed because this and that".


Looking at the definition in the article, it definitely is a null result. However, the example does illustrate that 'a null result' probably isn't a very interesting thing to talk about because it covers too many types of result. I think what people on HN actually want to track is something more like 'a boring null result'. The real question is whether there is a process that is reliably being followed and leading to research that matches reality (where a statistically significant result suggests something is real), or whether scientific publishing is highly biased towards odd results (where studies that muck up or get lucky with statistical noise are over-represented).

In this case we would expect some studies of the minimum wage to show it increases employment regardless of what the effect of wage rises is in the general case - eg, some official raised the minimum wage while a sector went into a boom for unrelated and coincidental reasons.


From a related nature article (https://www.nature.com/articles/d41586-024-02383-9), "null or negative results — those that fail to find a relationship between variables or groups, or that go against the preconceived hypothesis." According to this definition, I think both examples you provided are null results. Particularly here, where the context is the file drawer problem.


No, because in theory a minimum wage increase could decrease the unemployment rate. If it does neither, that’s a null result.


That's sort of like saying "I measured acceleration due to gravity by dropping a bowling ball off the Tower of Pisa, and got a null result: the ball simply hovered in midair."

If the result is very surprising and contradicts established theory, I wouldn't consider it a null result, even if some parameter numerically measures as zero.


The question being asked is, "what is the correlation between these two variables": is it positive, negative, or zero (null). The null hypothesis is the baseline you start from because the overwhelming majority of variables from physical observations are uncorrelated (e.g. what is the correlation between "how many people in the treatment group of this clinical trial made a full recovery" and "the price of eggs in Slovakia").

Measurements of some physical quantity are a different kind of experiment, you cannot phrase it as a question about the correlation between two variables. Instead you take measurements and put error bars on them (unless what you're measuring is an indirect proxy for the actual quantity, in which case the null hypothesis and p-value testing does become relevant again).


In fact, you can express measurement of g as a linear correlation between y-y0 and (t-t0)^2.
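
A small worked example of that framing, with simulated drop-from-rest data (all numbers made up):

  import numpy as np

  # Free fall from rest: y - y0 = (g/2) * (t - t0)^2, so the fall distance is
  # linear in (t - t0)^2 and the slope of that line is g/2.
  rng = np.random.default_rng(0)
  t = np.linspace(0.1, 1.0, 20)                        # seconds after release
  y = 0.5 * 9.81 * t**2 + rng.normal(0, 0.01, t.size)  # metres fallen, with noise

  slope, intercept = np.polyfit(t**2, y, deg=1)
  print(f"estimated g = {2 * slope:.2f} m/s^2")        # ~9.8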


Null means nothing, zero. In the context of scientific articles a null result means that the difference is zero. What difference? Well it depends. It could be difference between doing something and not doing it. The difference between before and after some intervention. Or perhaps the difference between two different actions.

In this case the difference between before and after raising the minimum wage.

Furthermore, the thing with a null result is that it's always dependent on how sensitive your study is. A null result is always of the form "we can't rule out that it's 0". If the study is powerful then a null result will rule out a large difference, but there is always the possibility that there is a difference too small to detect.
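
As a toy illustration of that last point (all numbers simulated), the useful summary of a null is the confidence interval, which says how large an effect the study can still rule out:

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(1)
  control = rng.normal(loc=0.00, scale=1.0, size=200)
  treated = rng.normal(loc=0.05, scale=1.0, size=200)  # tiny true effect

  diff = treated.mean() - control.mean()
  se = np.sqrt(control.var(ddof=1) / control.size + treated.var(ddof=1) / treated.size)
  ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se

  _, p_value = stats.ttest_ind(treated, control)
  print(f"p = {p_value:.2f}, 95% CI for the difference: ({ci_low:.2f}, {ci_high:.2f})")
  # A sensitive study gives a tight interval around zero ("large effects are
  # ruled out"); an underpowered one gives a wide interval ("we can't say much").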


Null means something doesn't exist, and proving something doesn't exist can at times, especially in certain sciences, be as valuable as proving something does exist. Especially if it stops the rest of the field chasing the pot of gold at the end of the rainbow.


> Is that actually a null result though?

The above is a good point but I would extend it further. I mean, philosophically, you get a positive result from a negative (null) result by merely changing your hypotheses (e.g., something should not cause something else).


If you are p-testing, this isn't the case. A positive result is a much stronger assertion.


Sure and of course you shouldn't change your hypotheses. That's kind of the whole purpose of pre-registrations at a meta-level of things.


> Card & Krueger (1994) was a null result paper showing that increasing the minimum wage has a null effect on employment rates. This result went against the common assumption that increasing wages will decrease employment at the time.

I had to look that up, because more precisely, it showed that a particular minimum wage increase in NJ from $4.25 to $5.05 didn't increase unemployment in 410 particular fast food joints in 1992 - https://davidcard.berkeley.edu/papers/njmin-aer.pdf - not that "increasing the minimum wage has a null effect on employment rates" at all, ever, no matter what. It's not as if increasing the minimum wage to something impossible to afford like $100 trillion wouldn't force everyone to get laid off, but nobody generally cares about a limiting case like that, as it's relatively unlikely.

The interesting part is non-linearity in the response, seeing where and how much employment rates might change given the magnitude of a particular minimum wage increase, what sort of profit margins the affected industries have, elasticity of the demand for labor and other adaptations by businesses in response to increases, not whether it's a thing that can happen at all.

And we're seeing a lot more such adaptations these days. There are McDonald's around here that have computers in the lobby where you input your order yourself, and I've gotten drones to deliver my food instead of DoorDash drivers. That kind of thing was not yet practical back in 1992, when I remember using an Apple ][ GS, and it's not clear that findings like this should be relied upon too heavily, given that some of the adaptations available to businesses now were not practical back then, especially when technology may change that even more in the future.


What matters for publication is a surprising result, not whether it confirms the main hypothesis or the null one.

The "psychological null hypothesis" is that which follows the common assumption, whether that assumption states that there is a relationship between the variables or that there is not.


One of my most celebrated recent "null results" was published in Biological Psychiatry, a journal with an impact factor of about 13 last I checked. Even though it was a null result, it was a decently powered longitudinal study that provided evidence inconsistent with the prevailing view in autism research. Even if there exists a difference that we failed to detect, it is probably much smaller than what others had expected on the basis of second-grade evidence (cross-sectional group differences).


I agree. More succinctly, it confounds two types of results: a null result due to statistical noise (big error bars, experiment failure) and a null result where the null model is more likely (actually, the effect doesn't exist).

Like many things in statistics, this is solved by Bayesian analysis: instead of asking if we can reject the null hypothesis, the question should be which model is more likely, the null model or the alternate model.
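
A very rough sketch of that comparison, using the BIC approximation to the Bayes factor on simulated two-group data (a real analysis would use a proper Bayesian tool such as PyMC; everything here is illustrative):

  import numpy as np
  from scipy import stats

  # Simulated two groups with no true difference in means.
  rng = np.random.default_rng(2)
  a = rng.normal(0.0, 1.0, 100)
  b = rng.normal(0.0, 1.0, 100)
  pooled = np.concatenate([a, b])
  n = pooled.size

  def loglik(x, mu, sigma):
      return stats.norm.logpdf(x, loc=mu, scale=sigma).sum()

  # Null model: one shared mean (2 parameters: mu, sigma).
  ll_null = loglik(pooled, pooled.mean(), pooled.std())
  # Alternative model: separate means, shared sigma (3 parameters).
  sigma_alt = np.sqrt((((a - a.mean())**2).sum() + ((b - b.mean())**2).sum()) / n)
  ll_alt = loglik(a, a.mean(), sigma_alt) + loglik(b, b.mean(), sigma_alt)

  bic_null = 2 * np.log(n) - 2 * ll_null
  bic_alt = 3 * np.log(n) - 2 * ll_alt
  # BF_01 ~ exp((BIC_alt - BIC_null) / 2); values > 1 favour the null model.
  print("approximate Bayes factor for the null:", np.exp((bic_alt - bic_null) / 2))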


Also, doesn't a null result mean "we weren't able to measure an effect" rather than "there is no effect"? That's a pretty big difference in how important something is.


Sure. But publishing this result with your experimental methodology may help others to refine the experiment and get a better answer.

It is absolutely shameful that negative results are almost never published. I am sure that a lot of money and effort is wasted by many research groups repeating the same dead-end experiments just because there is no paper that says: “we tried it this way, it didn't work”.


1. Papers are written to invite replication — the most important part of the scientific process. It is already difficult to compel replication even when you only put the most promising research in people's faces. Now you want them to have to also sift through thousands of entrants that have almost no reason for replication attempts?

2. It's easy to call it shameful when it isn't you who has to do the work. If you are like most other normally functioning people, you no doubt perform little experiments every day that end up going nowhere. How many have you written papers for? I can confidently say zero. I've got better things to do.


The lack of null result publication is a real issue for research and scientific knowledge in general, including for accurate positive results (what if you see a positive result by statistical chance, but the experiment was actually conducted by 10 other people before you and never published because their results were null?).

> Papers are written to invite replication

They are written to share knowledge, new discoveries. We hope that they are replicated.

> It's easy to call it shameful when it isn't you who has to do the work.

We are not judging anyone, we are qualifying the situation. And we are speaking of publishing results for experiments that have already been conducted, not voluntarily making up null cases and doing the related experiments. That wouldn't make sense and would be harmful. But if you did an experiment that produced a null result, not publishing is a loss of knowledge. Again, we are not judging anybody; it's a failure of the publishing system. But this cannot change if nobody points it out.

I'd have trouble understanding a researcher today not acknowledging that the lack of null result publication is an issue. It would show a lack of perspective, IMHO. And for someone acknowledging that there's an issue, pushing back is not the right stance.


> They are written to share knowledge, new discoveries. We hope that they are replicated.

If they aren't replicated, no new knowledge is gained. Not really.

> it's a failure of the publishing system.

What failure? Again, the "publishing system" exists to submit the "best of the best" research to the world in order to invite replication. While nothing is perfect, we don't want people to have to sift through papers that have effectively no reason to be replicated in a quest to find something that does. That would make things infinitely worse.

The internet was created for the minor leagues. If you are willing to put in the effort to document your "less than the best" research, put it up on a website for all to see already. There is nothing stopping you. But who wants to put in the effort?

> We are not judging anyone

How do you define "shameful" if it isn't related to judgement of someone?


You assume that null results are not worth it and are not the best of science. You seem to be ignoring the issue where researchers collectively waste time redoing "failing" studies again and again because the null results are not published, and other issues not publishing them causes.

We fundamentally disagree here and I'm not willing to put in the effort needed to try to convince you otherwise, many other comments are here are better than what I could write.

I do hope you take the time to take a step back (you seem to be in defensive mode; if so, you need to get out of it) and reconsider.

> How do you define "shameful" if it isn't related to judgement of someone?

Do you know the expression "it's a shame"?

> used when you wish a situation was different, and you feel sad or disappointed

https://www.ldoceonline.com/dictionary/it-s-a-shame-what-a-s...


> You assume that null results are not worth it and are not the best of science.

No. I make no such assumption. That may often be true, but there is nothing to stop a useful null result from being published. We also get things wrong from time to time. It is very possible that the best baseball player in the world has been overlooked by Major League Baseball. We're pretty good at scoping out the best, but nothing in life is perfect.

If the best baseball player in the world ends up in the minor leagues instead, oh well? Does it really matter? You can still watch them there. Same goes for research. If something great doesn't make it into the formal publication system, you can still read it on the studier's website (if they put in the effort to publish it).

> You seem to be ignoring the issue where researchers collectively waste time redoing "failing" studies again and again because the null results are not published

I may be ignoring it, but if that's the case that's because it is irrelevant. There is no reason to not publish your "failing" studies. That's literally why the public internet was created (the original private internet was created for military, but next in line was university adoption — to be used for exactly that purpose!).

> We fundamentally disagree here and I'm not willing to put in the effort needed to try to convince you otherwise

Makes sense. There is nothing to convince me of. I was never not convinced. But it remains: Who wants to put in the effort? Unless you are going to start putting guns to people's backs, what are you expecting?


> I was never not convinced

Ok.

> There is no reason to not publish your "failing" studies. That's literally why the public internet was created

You are suggesting researchers should blog about their null results? It seems to me the null results deserve the same route as any other paper, with peer reviews, etc.

It matters, because this route is what other researchers trust. They wouldn't base their work on some non-reviewed blog article that can barely be cited. You don't even base good science on some random article on arXiv that was not published in some recognized venue. If you are using some existing work to skip an experiment because it tells you "we've already tried this, it didn't show any effect", you want to be able to trust it like any other work. Hell, as a random citizen in a random discussion, especially as one with a PhD, I don't want to be citing a blog article as established scientific knowledge.

And yes, getting published in a proper, peer-reviewed venue is work, but we all need to deeply internalize that it's not lesser work if the result is null.

> Unless you are going to start putting guns to people's backs, what are you expecting?

If researchers collectively decide it's worth pursuing, it's all about creating the incentives at the right place. Like any other research, you could be rewarded, recognized and all. High impact journals and conferences could encourage researchers to publish / present their null results.

Of course, we are not speaking about things like "what two unrelated things could I try to measure to find some absence of correlation"; we are speaking about "I think those two things are linked, let's run an experiment. Oh no, they are not correlated after all!" The experiment is done either way; the results just deserve to be published either way. And the experiment should only be published if it doesn't exhibit a fatal flaw or something; we are not talking about flawed experiments either.


> You are suggesting researchers should blog about their null results?

If they want to. Especially if it doesn't meet the standard for the publication system, why not?

> It seems to me the null results deserve the same route as any other paper, with peer reviews, etc.

If it ranks with the best of them, it is deserving. There isn't room for everything, though, just as there isn't room for everyone who has ever played baseball to join the MLB. That would defeat the entire purpose of what these venues offer.

But that doesn't mean you can't play. Anyone who wants to play baseball can do so, just as anyone who wants to publish research can do so.

> If researchers collectively decide it's worth pursuing

It only takes an individual. Unlike baseball, you can actually play publishing research all by yourself!

1. Where do we read your failed research? Given your stance, it would look very foolish to find out that you haven't published it.

2. Do you draw a line? Like, if you add a pinch more salt to your dinner and found that it doesn't taste any better, do you publish that research?


> There isn't room for everything

I get your point, but this is not specific to null results.

> It only takes an individual

No no no. The desirability of null results need to be recognized and somewhat consensual, and high impact journals and conferences needs to accept them. Otherwise, there's no reason researchers will work to publish them.

1. I don't publish anymore: I'm not a researcher anymore. I didn't encounter the case during the short time I was one (I could have, though. Now I know, years later. I suspect it would have been difficult to convince my advisors to do it). I hope this doesn't matter for my points to stand on their own. Note that I think null results ARE NOT failed research. This is key.

2. Ideally, null or positive result alike, the experiments and the studies need to be solid and convincing enough. Like, there needs to be enough salt and not too much, the dinner needs to be tasty in both cases. If the dinner doesn't taste good, of course you don't publish it. There is something wrong with what you've done (the protocol was not well followed, there's statistical bias, not enough data points, I don't know)

It feels like we are talking past each other; you think I'm talking about failed research, but I'm talking about a hypothesis you believed could be true, an experiment you built to test it, and no correlation found in the end. This result is interesting and should be published; it's not failed research.

As it happens, I attended a PhD defense less than a month ago where the thesis led to null results… The student was able to publish; these null results felt somewhat surprising and counterintuitive, so it's not like it's impossible, it just needs to be widely seen as not failed research.


> The desirability of null results need to be recognized and somewhat consensual

If it is interesting you should also find it interesting when you read it 30 years in the future. You don't need other people. It's a nice feeling when other people want to look at what you are doing, sure, but don't put the cart before the horse here. Publish first and prove to others that there is something of value there. They are not going to magically see the value beforehand. That is not how the human typically functions.

It's not like you have to invent the printing press to do it. Putting your work up on a website for the entire world to see is easy peasy. Just do it!

> Ideally, null or positive result alike, the experiments and the studies need to be solid and convincing enough.

No need to let perfect become the enemy of good. Publishing your haphazard salting experiment isn't apt to be terribly convincing, but it gets you into the habit of publishing. Eventually you'll come around to something that actually is interesting and convincing. It's telling if someone isn't willing to do this.

> The student was able to publish, these null results felt somewhat surprising and counter intuitive, so it's not like it's impossible

Exactly. Anything worthy of the major leagues will have no trouble getting formally published. But not everything is. And that's okay. You can still publish it yourself. If you want to play baseball, there is no need to wait around for the MLB to call, so to speak... Just do it!

> you are thinking I'm talking about failed research [...] it just needs to be widely seen as not failed research.

Yes, I am talking about what is widely seen as failed research. It may not actually be failed research in a practical sense, but the moniker is still apt, especially given that you even call it that yourself. I guess I don't understand what you are trying to say here.


> I guess I don't understand what you are trying to say here

I guess I failed to get my point across to you and I doubt I will suddenly manage to do it this far in the discussion. Nature failed this too apparently, so I guess that doesn't tell much about me.

> Putting your work up on a website for the entire world to see is easy peasy. Just do it!

I have already said why this is not an option. I'm terribly confused as to why you are even suggesting it. Recognized research doesn't currently happen in blog posts.

> it gets you into the habit of publishing. Eventually you'll come around to something that actually is interesting and convincing.

Irrelevant? Patronizing?

The issue is not at the individual level anyway. It is a systemic issue. We are discussing a post from Nature called "Researchers value null results, but struggle to publish them". Here's your systemic issue, raised in one of the highest-impact journals.

> It's telling if someone isn't willing to do this.

What does it tell you? That research is not the person's current job maybe?

> I am talking about what is widely seen as failed research

And this is my point. It shouldn't be.

> you even call it that yourself

Nope. At best I used quotes around the word "failing".

I don't see this discussion progressing or surfacing interesting points anymore. You are disrespectful. Your points are subtly moving targets. You are sharing irrelevant advice with someone who doesn't need it. You are sharing irrelevant baseball and food comparisons (that I tried to adopt anyway). You are misrepresenting what I wrote. You don't really engage with the actual topic. The whole discussion is looping.

I believe you are trolling me. I tried to assume my feeling about this was wrong and gave you too much attention as a result. I should have stopped earlier. That'll teach me.

If Nature can't convince you there's an issue that has absolutely nothing to do with me, I won't either.

I'm done.

Bye!


> I have already said why this is not an option.

All I can see is that you said people can't find a compelling reason to do it. But that was already said long before you ever showed up and is specifically the point made in the comment you originally responded to... What do you think you are adding by just repeating that original comment over and over?

> Recognized research doesn't currently happen in blog posts.

Stands to reason. Who is doing it? Nobody is going to recognize something that doesn't exist! You have to demonstrate the value first. That is true in everything. Research is not somehow magically different.

> What does it tell you?

That nobody wants to do it. But what do you want to tell us? We already knew that nobody wants to do it.

> I don't see this discussion progressing and surfacing interesting points anymore.

It was never interesting, only humorous. Where did you find interest?

> You are disrespectful.

Ad hominem is a logical fallacy.

> Your points are subtly moving targets.

There is no apparent shift from my original comment as far as I can see. It is possible that you have misunderstood something, I suppose. I'm happy to keep trying to aid in your understanding.

> You are sharing irrelevant advice to someone who doesn't need them.

HN purportedly has 5 million monthly users. What makes you the expert on what they do and don't need? Get real.

> You are misrepresenting what I wrote.

It is possible, even likely, that I misunderstood what you wrote. But usually when you recognize that someone misunderstood you try to work with them in good faith to find an understanding, not run away crying that your precious words weren't written well enough to be understood, so I'm not sure you have really thought this through.

> I believe you are trolling me.

No you don't. The minute you legitimately thought I was a troll, you would have immediately cut off contact. Instead, you wrote a lengthy reply to give me your heartfelt goodbye. You can say this, but actions tell the true tale. Why make shit up?

> If Nature can't convince you

Said article in Nature effectively says the same things I have. What would it need to convince me of? It is on the very same page.

I'm not sure why you keep thinking you aren't (even though you clearly are). Perhaps you've confused HN with Reddit and are trying to "win" some stupid "argument" nobody cares about — most especially me? That would explain why you keep repeating my comments in what appears to be some kind of "combative" way.

> I'm done.

Got it. This pretty much proves that is exactly what you were trying to do. What motivates this?


> It's easy to call it shameful when it isn't you who has to do the work.

What work? The work of writing the paper?

It seems to me that it's better for research group A to spend 10 days writing the paper about their dead-end experiment than for 20 other research groups to do the same series of experiments, wasting a lot of time, money, and energy, just because there was no paper that said “it doesn’t work”. Perhaps it would be better if those 20 research groups instead tried 20 different ways to fix the faulty method in hopes of getting somewhere, rather than doing the same thing.

I do not understand how it’s even a question for debate.


Even if Career Technical Education (CTE) classes are offered, there is a large variation in their quality. For me, the question would be whether a graduate from a CTE program is more likely to be hired and to receive higher wages (initially) than a non-CTE program completer. My 2-minute Google Scholar search hasn't found anything on the topic.

At the end of the day, a 3-course sequence in a CTE pathway (which is the requirement for a high school CTE certificate in California) doesn't prepare you for a career any more than being in journalism class prepares you to be a journalist or being in theater prepares you to be an actor. Students will most likely need to pursue some form of post-secondary training (either through a community college or on the job) to become somewhat competent in their field.


The most likely explanation for this phenomenon is that there isn't a change in the population average for variable X, but that the decrease in college students' average X is due to an increase in population college-going rates.

Looking at the statistics[1], the US went from a 23.2% college completion rate in 1990 to a 39.2% completion rate in 2022, a 67% increase in college degree completions. If you assume that X in the population is constant over time, mechanically you will need to enroll and graduate students from lower percentiles of X in order to increase the overall college completion rate in the whole population.
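
A back-of-the-envelope version of that argument, assuming (purely for illustration) that X is normally distributed and that colleges draw from the top of the distribution:

  from scipy.stats import norm

  def mean_of_top_fraction(p):
      """Average of the top p fraction of a standard-normal trait X."""
      cutoff = norm.ppf(1 - p)      # threshold implied by completion rate p
      return norm.pdf(cutoff) / p   # E[X | X > cutoff] = phi(cutoff) / p

  print(mean_of_top_fraction(0.232))  # ~1.3 SD above the population mean (1990)
  print(mean_of_top_fraction(0.392))  # ~1.0 SD above the population mean (2022)
  # The average completer drops by about a third of a standard deviation even
  # though the population's X hasn't changed at all.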

This process might be particularly acute at "lower tier" institutions that cannot compete with "top tier" institutions for top students.

[1]: https://nces.ed.gov/programs/digest/d20/tables/dt20_104.20.a...


I don't think the increase is big enough. A 67% increase means the "new" students are about 41% of today's graduates. But these reports are coming from all over the place and describing the majority of their class.

You can also see it in the whole pipeline. Everything he described is true (age adjusted) for K-12 as well.


This particular professor has been teaching for 30 years. I'm not sure I find your explanation all that convincing in light of that, especially since this isn't an isolated opinion.

I'm much more interested in how much of the average student's lifetime a phone has been there to distract them. For the incoming 2025 class of 18-year-olds, the iPhone came out the year they were born, so potentially 100%. I expect that, plus the availability of LLMs, is a deadly combo for an engaged student body.


Based on the intro of the article, the university where this professor works is likely below the median. Each year the typical student at his/her university is worse, because the best students go to better schools.


That most likely explains the slow creep of grade inflation, remedial courses, etc. which has been going on for decades. This article touches on that but mostly describes an entirely different phenomenon.


Apache Iceberg builds an additional layer on top of Parquet files that lets you do ACID transactions, rollbacks, and schema evolution.

A Parquet file is a static file that holds the whole dataset associated with a table. You can't insert, update, delete, etc. It just is what it is. It works OK if you have small tables, but it becomes unwieldy if you need to do whole-table replacements each time your data changes.

Apache Iceberg fixes this problem by adding a metadata layer on top of smaller Parquet files (at a 300,000 ft overview).
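
To make the contrast concrete, a hypothetical sketch (the file name and columns are invented; the pyiceberg calls in the comments are only approximate):

  import pyarrow as pa
  import pyarrow.parquet as pq

  # With a single plain Parquet file, a one-row "update" means rewriting everything.
  table = pq.read_table("orders.parquet")
  df = table.to_pandas()
  df.loc[df["order_id"] == 42, "status"] = "shipped"
  pq.write_table(pa.Table.from_pandas(df), "orders.parquet")  # full rewrite

  # With Iceberg, the same change becomes a metadata-tracked operation over many
  # small Parquet data files; with the pyiceberg library it looks roughly like
  #   tbl = catalog.load_table("db.orders")
  #   tbl.delete("order_id = 42")  # plus an append of the corrected row
  # and earlier snapshots stay readable for rollback / time travel.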


I know you're not OP, and while this explanation is good, it doesn't make sense to frame all this as a "problem" for Parquet. It's just a file format; it isn't intended to have this sort of scope.


The problem is that the "Parquet is beautiful" idea is extended all the time to pointless things: Parquet doesn't support appending updates, so let's merge thousands of files together to simulate a real table. Totally good and fine.


Well… when Parquet came out, it was the first necessary evolutionary step toward solving the lack-of-metadata problem in CSV extracts.

So, it is CSV++ so to speak, or CSV + metadata + compact data storage in a singular file, but not a database table gone astray to wander the world on its own as a file.


> Apache Iceberg builds an additional layer on top of Parquet files that let's you do ACID transactions, rollbacks, and schema evolution.

Delta format also supports this, correct?


Correct. They have feature parity, basically.

