ScottWRobinson's comments

IIRC there are ways to give an authentic-looking fingerprint to browsers used for scraping. But as you said, it's a cat-and-mouse game.


Can't tell if someone is unchecking/checking my boxes or if it's lag and my own previous (mistaken) checks coming back from the server lol


Perhaps the server unchecks them automatically after a few minutes!


> However, as a father, I also don't want broken duplo pieces, so I wanted to make sure the track is not too much under tension.

The asker severely underestimates the amount of force it takes to break a Duplo piece.


I can confirm that even a 1-by-1 Lego brick can withstand the full weight of an adult human male at 2 in the morning.

...my foot, on the other hand...


Your logic is off: smaller pieces are generally harder to break than larger ones.


Yup, the 2x2 can hold 950lbs: https://news.ycombinator.com/item?id=4870283

We can also observe this (to a lesser degree) when they build two-story Lego statues, like at the Mall of America.

I'll admit I've never seen a huge Duplo statue, but I assume the load limits are similar.


Does lego piece strength vary throughout the day?


If we're getting technical, the weight of a human does vary throughout the day. Generally, while asleep, your mass decreases. You're always gradually losing mass as you inhale O2 and exhale CO2. You're also losing mass as you exhale moisture, and you may also sweat.

Thus an adult human male (who sleeps, say, 10pm to 6am) is less likely to break a lego brick at 2am than at midnight and more likely than at 4am.


When I weigh myself, I make sure to do it in the morning. Too depressing otherwise.


"We can say there is at least one cow in Scotland, of which at least one side appears to be brown."

https://stepinmath.wordpress.com/2016/08/27/logic-with-the-c...


Strength I don't know. Pointiness does for sure.


Perhaps. Plastic structural rigidity varies with temperature. Temperatures fluctuate throughout the day. This natural variation is probably insignificant in most cases though.


It was a joke about stepping on one of my kids' legos in the middle of the night while half asleep.


I managed to bend a Duplo track as a child, specifically the puzzle-piece part that connects them.

A quick but incomplete check is to ensure an even number of curves and straights. With both counts even, a track under tension has to be bent so badly that it's immediately obvious.
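
A rough sketch of that check, if a layout is just a list of piece types (the names and representation are made up for illustration, not from any actual layout tool):

    # Rough sketch of the parity check (piece names and the list format are
    # illustrative only; this is not from any real Duplo layout tool).
    from collections import Counter

    def passes_parity_check(pieces):
        """Quick but incomplete: require an even number of curves and an even
        number of straights. With both counts even, a layout that still doesn't
        close tends to be so bent that it's immediately obvious."""
        counts = Counter(pieces)
        return counts["curve"] % 2 == 0 and counts["straight"] % 2 == 0

    print(passes_parity_check(["curve"] * 12 + ["straight"] * 4))  # True
    print(passes_parity_check(["curve"] * 12 + ["straight"] * 3))  # False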


In the picture in the story, the light gray pieces seem like Duplo ones and the dark ones are the "Duplo compatible" set from Amazon.


Not really. I have similar, or probably even the same, sets (and the same 'topics' to think about, with various bridges and tunnels, track splits, etc). I also have these straight and curved pieces in light and darker gray. Cheap non-original stuff is easy to spot - it simply doesn't fit or hold as well. It doesn't matter whether it's bricks or other pieces.

Due to economies of scale, Lego can manufacture those at consistently high quality and relatively reasonable prices. Competition aiming for the same quality would be at least similarly priced. Also, it's incredibly sturdy. So far I haven't seen a single piece crack or break in the past 2 years. My kids are not psychos, but they for sure have no idea yet about treating their toys with care.


I made a "bot" server for myself, which is really just a server and app framework to host a bunch of scripts. The framework handles:

- Running bots periodically
- Receiving webhooks
- Handling OAuth
- Providing a shared DB
- Posting updates to and receiving commands from Slack
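
To give a flavor, the scheduling part is conceptually something like this bare-bones sketch (the function names and the SQLite file here are just illustrative, not my actual code):

    # Bare-bones sketch of a periodic bot runner (illustrative only; the
    # names and the SQLite file are not the actual framework).
    import sqlite3
    import time

    BOTS = []  # each entry: {"interval": seconds, "fn": callable, "next_run": timestamp}

    def register_bot(interval_seconds):
        """Decorator that registers a bot to run every interval_seconds."""
        def wrap(fn):
            BOTS.append({"interval": interval_seconds, "fn": fn, "next_run": 0})
            return fn
        return wrap

    def shared_db():
        """Shared SQLite database available to every bot."""
        return sqlite3.connect("bots.db")

    @register_bot(interval_seconds=3600)
    def check_articles():
        # e.g. scan a batch of articles and record findings for later review
        with shared_db() as db:
            db.execute("CREATE TABLE IF NOT EXISTS findings (msg TEXT)")
            db.execute("INSERT INTO findings VALUES (?)", ("checked a batch of articles",))

    def run_forever():
        while True:
            now = time.time()
            for bot in BOTS:
                if now >= bot["next_run"]:
                    bot["fn"]()
                    bot["next_run"] = now + bot["interval"]
            time.sleep(1)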

It's not very innovative, but super helpful. I love that I can deploy a new script so easily and already have all the tools I need, so I can just focus on the logic. A few bots I have running:

- I run a site with thousands of articles, so one bot checks 10-15 articles per day for spelling mistakes, broken links, broken images, poor formatting, etc. Tasks to fix these are then posted to Notion.
- Monitor Hacker News and Reddit for mentions of the sites/apps that I run so I can respond.
- Sync calendars between apps without having to make them public.
- Gather financials and reports from various sources for bookkeeping.
- Monitor all of the servers we run and sync their status to Notion.

Probably at least half of the automations could work on something like Zapier, but this is more fun and I get a lot more control over them.


We're running a deal for 38% off ($99 vs $159) for a yearly membership to StackAbuse.com. We have a number of data visualization courses, guided projects, and a pretty extensive course on Deep Learning for Computer Vision.

https://stackabuse.com/


For years now I've run a programming site (stackabuse.com) and have closely followed the state of Google SERPs when it comes to programming content. A few thoughts/ramblings:

- The search results for programming content have been very volatile for the last year or so. Google has released a lot of core algorithm updates in the last year, which has caused a lot of high-quality sites to either lose traffic or stagnate.

- These low-quality code snippet sites have always been around, but their traffic has exploded this year after the algorithm changes. Just look at traffic estimates for one of the worst offenders - they get an estimated 18M views each month now, which has grown almost 10x in 12 months. Compare that to SO, which has stayed flat or even dropped in the same time frame.

- The new algorithm updates seem to actually hurt a lot of high-quality sites as it seemingly favors code snippets, exact-match phrases, and lots of internal linking. Great sites with well-written long-form content, like RealPython.com, don't get as much attention as they deserve, IMO. We try to publish useful content, but consistently have our traffic slashed by Google's updates, which end up favoring copy-pasted code from SO, GitHub, and even our own articles.

- The programming content "industry" is highly fragmented (outside of SO) and difficult to monetize, which is why so many sites are covered in ads. Because of this, it's a land grab for traffic and increasing RPMs with more ads, hence these low-quality snippet sites. Admittedly, we monetize with ads but are actively trying to move away from it with paid content. It's a difficult task as it's hard to convince programmers to pay for anything, so the barrier to entry is high unless you monetize with ads.

- I'll admit that this is likely a difficult problem because of how programmers use Google. My guess is that because we often search for obscure errors/problems/code, their algorithm favors exact-match phrases to better find the solution. They might then give higher priority to pages that seem like they're dedicated to whatever you searched for (i.e. the low-quality snippet sites) over a GitHub repo that contains that snippet _and_ a bunch of other unrelated code.

Just my two cents. Interested to hear your thoughts :)


I wonder if we'll see a comeback of hand-curated directories of content? I feel like the "awesome list" trend is maybe the start of something there.

I would be willing to pay an annual fee to have access to well-curated search results with all the clickbait, blogspam, etc. filtered out.

Until then, I recommend uBlacklist[0], which allows you to hide sites by domain in the search results page for common search engines.

0 - https://github.com/iorate/uBlacklist


> hide sites by domain

This gives me the idea to build a search engine that only contains content from domains that have been vouched for. Basically, you'd have an upvote/downvote system for the domains, perhaps with some walls to make sure only trusted users can upvote/downvote. It seems like in practice, many people do this anyway. This could be the best of both worlds between directories and search engines.


I don't think this would change a lot; you would probably just raise big sites (Pinterest, Facebook) a lot higher in the rankings, as the 99% of users who aren't programmers would vouch for them.

You could counter that somewhat by having a "people who liked X also like Y" mechanism, but that quickly brings you back to search bubbles.

In that sense Google probably should/could do a better job by profiling you: if you never click through to a page, lower it in the rankings. Same with preferences - if I am mainly using a specific programming language and search for “how to do X”, they could give me results only for that language.

In the end that will probably make my search results worse, as I am not only using one language ... and sometimes I actually click on Pinterest :-(


You don't need an upvote/downvote system. If someone searches for X and clicks on results, you just record when they stop trying new sites or new search terms, since you can assume the query has been answered. Reward that site. Most of them are already doing this in some form.
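
Roughly something like this toy sketch (the event format and scoring are just for illustration, not a real ranking pipeline):

    # Toy sketch of "reward the result where the user stopped searching".
    # The event format and scoring are illustrative, not a real ranking system.
    def reward_session(events):
        """events: ordered ("query", text) / ("click", url) tuples for one session.
        Credit the last clicked result, assuming the user stopped searching
        because that page answered the query."""
        last_click = None
        for kind, value in events:
            if kind == "click":
                last_click = value
        return {last_click: 1} if last_click else {}

    session = [
        ("query", "how to parse json in go"),
        ("click", "https://spammy-snippets.example/answer"),
        ("query", "go json.Unmarshal example"),
        ("click", "https://stackoverflow.com/q/12345"),
    ]
    print(reward_session(session))  # only the final result gets the credit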


This is what Google already does, does it not? Why else would they be recording outbound clicks?

Unfortunately, this doesn't entirely solve the problem. Counting clicks doesn't work because you don't know if the result was truly worthwhile or if the user was just duped by clickbait.

As you say, tracking when they stop trying sites is better, but I don't know how good that signal-to-noise ratio is. A low-quality site might answer my question, sort of, perhaps in the minimal way that gets me to stop searching. But perhaps it wouldn't answer it nearly as well as a high-quality site that would really illuminate and explain things and lead to more useful information for me to explore. Both those scenarios would look identical to the click-tracking code of the search engine.


If I click on link 1 then click on link 2 several minutes later, 1 probably sucked. The difficulty is if I click on 1 and then 2 quickly, it just means I’m opening a bunch of tabs proactively.


Often you don’t know if a site is legit or not without first visiting it.

And new clone sites launch all the time, so I’m always clicking on search results to new clone sites that I’ve never seen before so can’t avoid them in results.


Yeah, when I get caught by these SEO spam sites it's because they didn't rank near the SO thread they ripped off, so it wasn't immediately apparent.


> This gives me the idea to build a search engine that only contains content from domains that have been vouched for.

Just giving us personal blocklists would help a lot.

Then if search engines realize most people block certain websites they could also let it affect ranking.


You can access one without paying a dime.

http://teclis.com

Problem is people usually want one general search engine, not a collection of niche ones.


It’s not a directory. Hand-crafted descriptions, rather than random citations and/or marketing from the site itself, are what make something a directory. This one is a search engine. Maybe it’s a good one for its purpose, but who knows - without an ability to navigate you can’t tell.

> Problem is people usually want one general search engine, not a collection of niche ones.

In my opinion, the reason they want a general search engine is that they think in their box (search -> general search). What they really want is a way to discover things and quick summaries about them: “$section: $what $what_it_does $see_also”. Search engines abuse this necessity and suggest deceitfully summarized ads instead.


The trouble is, how do you prevent Sybil attacks? The spammers might vote for their own sites

https://en.wikipedia.org/wiki/Sybil_attack


It would be better if my votes only affected my own search results.


There's an interesting dilemma here - if the algorithm were to favor "pages with lots of content alongside the exact-match phrase you're looking for" then it would incentivize content farms that combine lots of StackOverflow answers together. And if you favor the opposite, where there's less content alongside the exact-match phrase, you incentivize content farms that strip away content from StackOverflow before displaying it. Ideally, of course, you'd rely on site-level reputation - is Google just having trouble recognizing that these are using stolen content?


> is Google just having trouble recognizing that these are using stolen content?

It's very possible. In general Google will penalize you for duplicate content, but that might not apply to code snippets since code is often re-used within projects or between projects.

The code snippet sites also typically have 2 or more snippets on the same page. When combined, it might then look unique to Google since their algorithm probably can't understand code as well as it understands natural text. Just a guess


Also, while I imagine Google probably has (or at least easily could have) code-specific heuristics at play, it seems like it may be harder to reliably apply duplicate content penalties to source code listings, especially short code snippets.

Between the constrained syntax, literal keywords, standard APIs and common coding conventions it seems like even independently-authored source code listings will be superficially similar. Certainly the basic/naive markov-chain-style logic that works really well for detecting even small examples of plagiarism in natural language content isn't going to be as effective on `public static void main(String[] args)` or whatever.

Obviously there are strategies that could distinguish superficial/boilerplate stuff from truly duplicated code, but between the volume of actual (and often legitimate) duplication on the internet and the (maybe?) low ROI for duplicate content penalties for code relative to other value metrics/signals, maybe this just isn't terribly important as a quality factor?
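
To make the "superficially similar" point concrete, here's a toy shingle-overlap check - a naive stand-in for this kind of duplicate detection, certainly not Google's actual system:

    # Toy near-duplicate check using word shingles and Jaccard similarity.
    # A naive stand-in for duplicate-content detection, not Google's algorithm.
    def shingles(text, n=3):
        words = text.split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def jaccard(a, b):
        sa, sb = shingles(a), shingles(b)
        return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

    prose_a = "the quick brown fox jumps over the lazy dog near the river"
    prose_b = "a slow grey wolf walks under the tall tree beside the lake"
    code_a = "public static void main ( String [ ] args ) { System . out . println ( x ) ; }"
    code_b = "public static void main ( String [ ] args ) { System . out . println ( y ) ; }"

    # Independently written prose shares almost no shingles...
    print(jaccard(prose_a, prose_b))   # ~0.0
    # ...while independently written Java boilerplate looks nearly identical.
    print(jaccard(code_a, code_b))     # ~0.7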


Wikipedia is a good example. Google considers it a high-value website (less so than before, but still), while it only has a single article about each topic. Other projects have entire websites dedicated to a single topic, with products, experts on staff, and live support. As an example, I showed some Google engineers the WP page on suicide, which explains popular ways of doing it, vs dedicated prevention projects. Today (for me) the topic ranks: 1-3) news articles, 4) World Health Organization statistics, 5) Wikipedia, 6) suicide prevention.

ABC News, the NYT, WP and the WHO are considered high profile, but the topic is not their area of expertise. None of them would consider themselves the go-to place for people searching for it.


> as it seemingly favors code snippets, exact-match phrases

If only... it seems like the search results have gotten far worse for exact-matching. I regularly search for part numbers and markings - a case where exact matches are pretty much the only thing I want - and I can clearly see the decline over the years as it starts including more and more results that don't even have the string I'm looking for.


Funny, it was the "lots of internal linking" bit that felt wrong to me. Not that these low-quality sites don't do that, but I'm surprised to hear that the new algorithm rewards internal links. I'm certainly not a full-time SEO guy, but I happen to have or work with a few sites - some fairly established - that make extensive use of internal links for design/editorial reasons. As far as I can tell they are helpful for (a) user navigation and (b) getting search engines to discover/index the pages but I don't think I've seen any notable advantage in or even impact on ranking based on those internal links (whether in the body copy or in head/foot/side "navigation" areas).

Searching just now I do see some other sources making a similar claim, so maybe I'm just out of the loop. But in my cursory scan I haven't found much detail or real evidence beyond "google says they matter" either. I mean, that's not why those internal links were created in the first place, but it sure would be nice to get a ranking boost out of them too. I wonder what I'm doing wrong :)


...which even applies if you enclose the specific term with quotes. These used to help against unrelated results, but not so much anymore. I don't know why. Same thing with DDG and Bing.


> I'll admit that this is likely a difficult problem because of how programmer's use Google

It's beyond simple for Google to fix. Just drop those sites from the search index. But Google won't do that because it's in their interests to send you to those shit holes because they're littered with Google ads.


Recently I've just punted and begun searching SO and Github directly.

One thing Google has gotten really good at lately is knowing that when I search for "John Smith" I mean John Smith the baseball player not the video game character or vice-versa.


I've always just searched 'John Smith baseball' and that works well in DDG too.


you can also add 'site:github.com' or 'site:stackoverflow.com' to your search


It's not as good, especially for Github.

GH also breaks down the results into types which is very helpful when you only want code or are looking for documentation or discussion.


> it's hard to convince programmers to pay for anything

Offtopic but I'm curious why this is the case? Is the Free Software movement responsible for this mindset?


It's been a hard habit for me to break, but when you know that you could do something and theoretically do it better, it can bias you against paying for something. None of that is meant to sound arrogant - every company could do a better job given more time and resources, just as I could. But my time isn't infinite and I've found that paying for solutions in my life is a good thing. Sure, I could run my own email, but I'll just pay for it. Sure, I could create an app that does exactly what I want, but this one is close enough and it's $10.

With knowledge, the problem can be worse. You can't even evaluate whether it's any good until it's been given to you, and at that point you don't need to pay for it because you already have it. A lot of the paid knowledge products I see don't really have good information and avoid all the real problems they promised to solve for you.

I think sites can build trust with users, but it can mean creating a lot of free content in the hopes of converting some people to paid users. Of course, if that model is too successful, then there will be an army of people dumping free content hoping to do the same, but then your paid content is competing with the free content available from many trying to do the same business model. If Bob creates a business doing this, do Pam, Sam, Bill, James, and Jess try to do that a few months later which then means that the amount of free content is now 5x what it was and there's no need to pay for it because it'll be a freebie on one of those sites?


I train programmers, and strongly recommend they buy books or do other types of money/time investment to make themselves better (and more highly-paid programmers).

They won't do it.

I've had multiple programmers literally shocked and avoid my outstretched book. Once I got a question, said "I just read the answer to this in this book right here"... and the programmer refused to read the book to answer his question.

I don't get it.

This, coupled with companies' lack of investment in their expensive engineers, is mystifying.

None of the above has anything to do with the FSF.


Maybe books are too low-density. Like the “quantity of information” per word is lower than in a blog post, for example.

I dunno, I’m not really the reading type, but I do own programming-related books. It’s the only type of book I own. I learned a lot from books like The Clean Coder, The Pragmatic Programmer, and some 90’s book about Design Patterns with C++ examples, and I don’t even write C++.


If anything, a good book (or video course) will be a high-density, concentrated pill of everything you need to know about the subject to know what everything is and how it interacts. By comparison, reading blog posts and random YouTube videos is more akin to "grazing" - sure you can learn, but not as fast and you'll be missing context until the lightbulb goes off.


Good programming books are some of the highest information density of any writing you can find. College professors tend to prefer the lower density versions which might be biasing people.


Is that true? Or even your actual experience?

The recommendations I got from lecturers at university were thick academic textbooks. The 'good programming books' popular on HN etc. - without taking anything away from them - tend to be more waffley blog-turned-book type things.

Aside from classics from the former category that are also popular in the latter. Sipser, the dragon book, Spivak, etc.


This comes down to personal preference: I generally prefer reference books vs educational books. Find a good fit between what you’re working on and what the book is trying to do, and they can be extremely helpful.

The best example that comes to mind is the first edition of SQL in a Nutshell. It was a thin reference covering SQL and the major differences between the different SQL databases, which I picked up back in 2002-ish. Quite useful if you’re working with several at the same time, but not something to teach you SQL. I haven’t looked at a recent edition because I am not in that situation any more, but I presume it’s still useful for that purpose.

Granted most books are significantly more verbose, but the physical limitations of books can promote extremely precise writing with minimal extraneous information.


It's my experience that this is often the case. I don't like high-theory books, so that means that things like the dragon book aren't likely to be in my sample.


Some of the highest, but nothing like quantum mechanics, where my head explodes at page 2.


Do you have some book recommendations for web development?


The content I want already exists. It’s provided for no charge on stackoverflow, reddit, Wikipedia, cppreference, and a handful of other high quality sources. All of which do not charge users a fee, most of which obtain the content from users in the first place.

So as far as I see it, the problem is not that the content is uneconomical to produce. The problem is that searching doesn’t discover the content I want. It brings up geeksforgeeks or worse.


Precisely. For me, the value in paying for a course or a book is not that they produce this knowledge, but that they collate it, filter through subject matter experience they already have to remove misleading additions, and add in any missing parts that may take a long time to grasp without a "Eureka!"-inducing explanation.


The service worth paying for is turning the collection of facts into usable information.


Exactly. Filtering information from misinformation is also a good selling point.


I think a lot of it has to do with the sheer friction of paying. In a corporate context it's going to be a days-long battle, at best, to get a new piece of software approved - just from an expense POV; we aren't even talking about the IT and infosec teams. If it's technical content on a website, sure, maybe I can pay out of pocket, but it's actually a lot less friction to just move on to the next site than to open up my wallet, get charged, and maybe get what I am looking for that justifies the price.


Most developers would rather spend 2 days building something than pay $60 for something better.


They're not wrong. When I build something, it's mine and I control it. I get to learn about all sorts of interesting stuff, making me a better programmer in the process. If I publish it on GitHub, it becomes résumé material, helps other people and might even start a community if enough people start contributing. I get to contribute to the state of the art in free computing, advancing it not just for myself but also for everyone else.

If I pay for something else, I don't get source code, I get borderline malware that spies on me constantly, I'm subjected to idiotic non-negotiable terms and conditions and I'm at the mercy of some gigantic corporation that can kill the product at any time.

We don't pay for "something better" because it's not actually better at all. We're not laymen, we actually understand the fact it's a shitty deal. We're not some mindless consumer companies can just dump these products on. We have the power to create. The more we use this power, the better off we are.


I suspect the reason is that not all developers live in the Bay Area; $60 is good money for them, and could be worth more than 2 days.

Also if you code for fun anyways, you might as well build what you need, and get chance to use that shiny new technology while you do it. You save money, have fun, improve your resume, share projects with your friends and communities for kudos, all at the same time.


If I can do it in a way that's better suited to my use case, learn something from it and/or entertain myself, that may not be as bad a tradeoff as it may otherwise seem.

Reinventing wheels out of plain curiosity has exposed me to a variety of problems and their solutions, and I believe it exercises my general problem solving skills.


A big part of programming is learning. You need to learn. It's not only about learning a programming language. Programming language is just a tool. What you need to learn is:

1. How to use that tool effectively

2. How to build better products with it.

You are never done learning those. And the best way to learn is to (at least try) to build it yourself. Therefore I think it makes sense for programmers to try to build it.


Teenagers are likely to do just that. Their money is expensive, while their time is cheap. If they get to learn in the meantime, even better.


The Free Software Movement is for free as in free speech and not free as in free beer.

It's about freedom to access, modify and distribute the source code.


It's hard to convince anyone to pay for anything.


I've seen programming newsgroups, those things from the 90's, with what can best be described as MITM attacks having taken place when coders have been looking for solutions to problems and the solutions have not been correct. Most newsgroups were never secure, so they were vulnerable to MITM from day 1, and what is being reported today is just the latest variation on that attack.

I've also seen Bing & Google citing StackOverflow where the replies on SO awarding or agreeing on a solution come straight from this "textbook", The Gentleman's Guide To Forum Spies: https://cryptome.org/2012/07/gent-forum-spies.htm

Perhaps it would be useful to dig into a poster's history on a site and then decide who you trust, instead of just trusting a random on the internet?

How many people have downloaded code from SO into VS and found it doesn't even do what it's purported to do? I've seen plenty of that.

Resource-burning the population, in this case programmers, is a perfectly valid technique for a variety of reasons, the main one being that you are stuck in front of a computer, which means you can't get into mischief away from work. Religions have been using that technique for hundreds of years; colloquially it's known as "the devil makes work for idle hands", or something to that effect.

Choose carefully what you want to believe and trust.


> I've seen programming newsgroups, those things from the 90's, with what can best be described as MITM attacks having taken place when coders have been looking for solutions to problems and the solutions have not been correct. Most newsgroups were never secure, so they were vulnerable to MITM from day 1, and what is being reported today is just the latest variation on that attack.

Well, that's also the side effect of taking Cunningham's Law to heart, which says "the best way to get the right answer on the Internet is not to ask a question, it's to post the wrong answer."


> It's a difficult task as it's hard to convince programmers to pay for anything

I wonder if there's an opportunity for a paid bundle of programming-related sites. I indeed will not pay for a single site (nor for a news site for that matter), but a $10-20/month subscription that covers quality programming websites could be interesting.


> The programming content "industry" is highly fragmented (outside of SO) and difficult to monetize

I think part of the problem is that the content producers are trying to cater to everyone: the Google algo (by artificially inflating word count and using specific keywords), beginner programmers, advanced programmers, potentially paid users, and ephemeral users who arrive on the site via a referral or via googling. In the end you end up catering to no one.

As a side note, RealPython.com is going to go down even more if they're going to keep their "register to view" policy I've started to see recently.


I guess that explains why I've been seeing so many results that are nothing more than scraped content lately :-(


I don't know what is going on with Google but I thought their main idea is to give points to sites which other sites link to. Are these code-sites linking to each other perhaps?


They’re called PBNs or “private blog networks”. A bunch of low quality sites all backlinking to each other.

I don’t like the SEO industry very much.


I pay for Interview Cake.

If you could create a site with both interview prep and quality code tips I'd pay for it.


Who is the offender?


Please consider releasing any paywalled content after some deadline, e.g., after a year. It makes content practically inaccessible to a lot of people, including non-adults and people in developing countries.


Really curious how this compares to Cloudflare's analytics. I run a site for developers with decent traffic, and according to Cloudflare only about 25% of our readers block Google Analytics. I had always thought it would be a higher percentage, so this seems to make sense.

It's also odd that the clicks from Google Search Console line up very closely with what we see from Google Analytics. I had always thought this data would be more accurate since Google SERPs uses (used to use?) forwarding URLs to track this stuff.


Just came here to say that I bought one of Cliff's Klein bottles maybe 5 years back, and it's still my only decorative contribution to our house.

Cliff seems like a great guy, and I hope this gets resolved for him.


Maybe GPT-3 can just do our fact-checking for us...


All of my web apps limit email addresses to 128 characters. This domain alone is 67. If you exceed the 128 character limit then I'm not sure I want you to sign up anyway...

