erekp's comments on Hacker News

same. good luck finding us out there - we can replicate all the patterns you point out. been in this industry for 10 years now :)


Just don't cause problems on purpose and almost nobody will care about blocking you. Don't be an asshole.


Thanks! Glad we're helping


Hopefully one day you get impacted by similar tech and your business fails. What goes around comes around, and I'm sure it will come around to you too. Since you don't have any details on your website about who you are or where you are located (which says a lot about the type of company and people you are), I look forward to the day your servers and data are leaked to something like Distributed Denial of Secrets (love what happened to TMSignal today).


I don't understand the animosity. Many legit companies (the only type of enterprise we deal with) have legit automation use cases and require solutions that work against walled gardens and information gatekeepers. We believe whatever a human can do with a computer on public sites should be automatable by a machine. There's no difference. A browser like that does not DDoS anyone.


If you have such good intentions, then proudly display who you are and where you are located on your website. Why are you hiding behind a veil of secrecy? Your social links on the website go nowhere. Speaks volumes about the 'legit' use cases you are targeting.


You're not helping. You're doing something that the host or owner of the website does not want you to do, circumventing their protections. This means they'll end up strengthening their protections, which only makes things worse for everybody.


You can probably implement that using our self-hosted solution. You'll have full access to the browser and can save sessions.


Didn't see a self-hosted version on the website? Sounds awesome.


Yeah, our solution is a self-hosted one. You control everything. Send us a message, sounds like we can help you build something!


We're looking to deal with companies that have serious automation and scraping challenges. There are other solutions out there that are less robust and more transparent on how they bypass protections. We chose to keep ours tight to protect our clients and provide the best enterprise support we can.


We have a similar solution at metalsecurity.io :) handling large-scale automation for enterprise use cases, bypassing antibots


That's super cool, thank you for sharing! It's based on Playwright though, right? Can you verify whether the approach you're using is also subject to the bug in TFA?

My original point was not so much about bypassing anti-bot protections as about offering a different branch of browser automation, independent of incumbent solutions such as Puppeteer, Selenium and others, which we believe are not made for this purpose and have many limitations, as TFA mentions, requiring far too many workarounds, as your solution illustrates.


We fix leaks and bugs in automation frameworks, so we don't have that problem. The downside of using the user's browser, as you do, is that you will burn the user's fingerprint depending on scale.


Thanks for sharing your experience! I'm quite opinionated on this topic so buckle up :D

We avoided the fork & patch route because it's both labor intensive for a limited return on investment and a game of catching up. Updating the forked framework is challenging in its own right, let alone porting existing customer payloads to newer versions, locking you de facto to older versions. I did maintain a custom fork at a previous workplace that was similar in scope to Browserless[0], and I can tell you it was a real pain.

Developing your own framework (besides satisfying the obvious NIH itch) allows you to precisely control your exposure (reducing the attack surface) from a security perspective, and protects your customers from upstream decisions such as deprecations or major changes that might not be aligned with your customers' requirements. I also have enough experience in this space to know exactly what we need to implement and the capabilities we want to enable. No bloat (yet).

> you will burn the user's fingerprint depending on scale

It's relative to your activity. See my other comment about scale and use cases; for personal device usage this is not an issue in practice, and users can automate several websites[1] using their personal agents without worrying about this. For more involved scenarios we have appropriate strategies that avoid this issue.

> we fix leaks and bugs of automation frameworks

Sounds interesting! I'd love to read a write up, or PRs if you have contributed something upstream.

0: https://www.browserless.io/

1: https://herd.garden/trails


Sounds good. As you can probably imagine, I also come from a lot of experience in the space :) But fair enough, everyone has their own opinion on what is more or less painful to implement and maintain, and the associated pros and cons. We're tailored to very specific use cases that require scale and speed, so the route we took makes the most sense. Obviously I can't share details of our implementation, as it'd expose our evasions. And this is exactly the problem with open-source alternatives like Camoufox and the now-defunct puppeteer-stealth.


Do people still need social media APIs? In my experience, most use cases require accounts and a login.


How exactly do you fall back to Common Crawl? Isn't the cost of even holding and querying Common Crawl insane?


With AWS Athena, you can query the contents of someone else’s public S3 bucket. You pay per read, but if you craft your query the right way then it’s very inexpensive. Each query I run only scans about 1MB of data.


Since I happened to be looking at this just now, here are some examples of how to query at roughly a cent per query (just examples, but quite illustrative): https://commoncrawl.org/blog/index-to-warc-files-and-urls-in...
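For context, besides Athena there's also Common Crawl's public CDX index server at index.commoncrawl.org, which lets you look up captures for a URL pattern over plain HTTP, no AWS account needed. A minimal sketch, assuming the `CC-MAIN-2024-33` crawl ID as an example (the current list of crawls is at https://index.commoncrawl.org/collinfo.json):

```python
# Query the Common Crawl CDX index over HTTP (stdlib only).
import json
import urllib.parse
import urllib.request

# Example crawl ID; check collinfo.json for currently available crawls.
INDEX = "https://index.commoncrawl.org/CC-MAIN-2024-33-index"

def build_query(url_pattern: str) -> str:
    """Build a CDX index query URL; output=json returns one JSON object per line."""
    params = urllib.parse.urlencode({"url": url_pattern, "output": "json"})
    return f"{INDEX}?{params}"

def parse_records(body: str) -> list[dict]:
    """Each line names the WARC file plus the byte offset and length needed
    to range-request just that capture from the public S3 bucket."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]

if __name__ == "__main__":
    with urllib.request.urlopen(build_query("example.com/*")) as resp:
        for rec in parse_records(resp.read().decode())[:3]:
            print(rec["filename"], rec["offset"], rec["length"])
```

Each record's `filename`/`offset`/`length` can then be fed into an HTTP `Range` request against the `commoncrawl` S3 bucket to fetch only the single gzipped WARC record, which is what keeps per-lookup costs near zero.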


With your account? You never get blocked? That's impressive.

