hydragit's comments


Web pages use a data source behind the scenes, but web pages are NOT a data source. A rendered Qt app is not a data source either. weboob builds plain-old objects (or JSON), so it is a data source. The Qt applications are just example front-ends that use weboob as their data source. Furthermore, this data source is made for aggregation and standardization: weboob returns results in the same format whichever site you choose, so it's still better than each site offering its own site-specific API.


You probably don't use a lot of things, right? Did you do a background check on the thousands of developers who contributed to all the software you use? Did you check that the smartphone you use wasn't built in modern slavery facilities? Did you check under what conditions all the raw materials needed for it were extracted, in which countries and in what political situations? Did you check the ethics of the whole chain behind the food you eat?


The fact that I'm a bad person doesn't change the argument, which is the same one even you are making - it's a murky world out there.


Given the long list of contributors, maybe not all of them adhere to the same standards.


> Do you think women would be comfortable contributing to a project called "boobsize", which is part of a larger project called "weboob"? Not to mention "QHandJoob".

Looking at the authors, several women did contribute to the project.


You seem to be pointing at a specific revision and the file doesn't seem to exist anymore in the current version of the repository. I confess I didn't check more than this though.


Does that really make it any better? It's in the commit history and now it's there for all of us to see. Why on earth anybody would write something like that is beyond me, but doing it in public is mind-bogglingly fucked up and stupid.


It's also legal, at least in some countries.


[flagged]


This isn't a dump, this is a forum specifically meant for sharing code publicly.


> Python 3, AFAIK, doesn't have anything as handy as Ruby/Perl's Mechanize. But using the web developer tools you can usually figure out the requests made by the browser and then use the Session object in the Requests library to deal with stateful requests

You could also use the WebOOB (http://weboob.org) framework. It's built on requests+lxml and provides a Browser class that can be used like mechanize's (ability to access the doc, select HTML forms, etc.).

It also has nice companion features, like associating URL patterns with custom Page classes in which you write what data to retrieve when a page matching that URL pattern is browsed.
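To make the URL-pattern idea concrete, here is a minimal stdlib-only sketch of the general technique. It is not WebOOB's API: `Page`, `ArticlePage`, `SearchPage`, `ROUTES`, `build_page` and the example.org URLs are all hypothetical names made up for illustration, and the regex-based extraction stands in for what a real setup would do with lxml.

```python
import re


class Page:
    """Hypothetical base class: wraps the text fetched for one URL."""

    def __init__(self, url, text, params):
        self.url = url
        self.text = text
        self.params = params  # named groups captured from the URL pattern


class ArticlePage(Page):
    def get_title(self):
        # Crude regex extraction, good enough for a sketch.
        match = re.search(r"<title>(.*?)</title>", self.text, re.S)
        return match.group(1).strip() if match else None


class SearchPage(Page):
    def get_result_links(self):
        return re.findall(r'href="(/article/\d+)"', self.text)


# (compiled URL pattern, page class) pairs; the first full match wins.
ROUTES = [
    (re.compile(r"https://example\.org/article/(?P<id>\d+)"), ArticlePage),
    (re.compile(r"https://example\.org/search\?q=(?P<query>.+)"), SearchPage),
]


def build_page(url, text):
    """Pick the page class registered for this URL and instantiate it."""
    for pattern, page_class in ROUTES:
        match = pattern.fullmatch(url)
        if match:
            return page_class(url, text, match.groupdict())
    raise LookupError("no page class registered for %s" % url)
```

With this in place, the code that fetches a page never needs to know which parser applies: `build_page(url, text)` hands back an object whose methods only make sense for that kind of page.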


WebOOB [0] is a good Python framework for scraping websites. It's mostly used to aggregate data from multiple websites by having each site backend implement an abstract interface (for example the CapBank abstract interface for parsing banking sites), but it can be used without that part.
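As a rough illustration of the "one abstract interface, many site backends" idea, here is a small sketch. Everything in it is an assumption made for the example: `BankCapability`, `Account`, the two backends and their hard-coded raw data are hypothetical and are not WebOOB's actual CapBank classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Account:
    """Common result format shared by every backend (hypothetical)."""
    id: str
    label: str
    balance: Decimal


class BankCapability(ABC):
    """Hypothetical abstract interface every banking backend implements."""

    @abstractmethod
    def iter_accounts(self):
        """Yield Account objects."""


class FirstBankBackend(BankCapability):
    # Stand-in for data scraped from a site that exposes amounts in cents.
    RAW = [{"num": "001", "name": "Checking", "amount_cents": 123456}]

    def iter_accounts(self):
        for row in self.RAW:
            yield Account(row["num"], row["name"], Decimal(row["amount_cents"]) / 100)


class SecondBankBackend(BankCapability):
    # Stand-in for a site that shows "1 500,25"-style amounts.
    RAW = "A-9;Savings;1 500,25"

    def iter_accounts(self):
        for line in self.RAW.splitlines():
            ident, label, amount = line.split(";")
            normalized = amount.replace(" ", "").replace(",", ".")
            yield Account(ident, label, Decimal(normalized))


def total_balance(backends):
    """Aggregate across sites without caring how each one is scraped."""
    return sum(
        (account.balance for backend in backends for account in backend.iter_accounts()),
        Decimal(0),
    )
```

The point of the design is that `total_balance` (or any front-end) only depends on the shared `Account` shape, so adding a new site means writing one more backend and nothing else.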

On the pure scraping side, it offers "declarative parsing" to avoid painful plain-old procedural code [1]. You can parse pages by simply specifying a bunch of XPaths and indicating a few filters from the library to apply to the elements they select, for example CleanText to remove whitespace nonsense, Lower (to lower-case), Regexp, CleanDecimal (to parse as a number) and a lot more. URL patterns can be associated with a Page class containing such declarative parsing. If the declarative style becomes too verbose, it can always be replaced locally by a plain-old Python method.
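For readers who haven't seen the style, here is a stdlib-only sketch of what "XPath plus a chain of filters" can look like. It is an approximation of the idea, not the library's code: `Field`, `parse_items`, `PRODUCT_FIELDS`, the sample markup and the lower-case filter functions are hypothetical stand-ins for the filters named above, and it uses the limited XPath subset of `xml.etree.ElementTree` on well-formed markup rather than lxml.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal


# Small filters, each taking a string and returning a transformed value.
def clean_text(value):
    return " ".join(value.split())  # collapse runs of whitespace


def lower(value):
    return value.lower()


def regexp(pattern):
    def apply(value):
        return re.search(pattern, value).group(1)
    return apply


def clean_decimal(value):
    # Keep digits, dot and minus sign only, then parse as a number.
    return Decimal(re.sub(r"[^\d.-]", "", value))


class Field:
    """One declared attribute: where to look, then which filters to chain."""

    def __init__(self, xpath, *filters):
        self.xpath = xpath
        self.filters = filters

    def extract(self, element):
        value = "".join(element.find(self.xpath).itertext())
        for apply_filter in self.filters:
            value = apply_filter(value)
        return value


# The "declarative" part: no loops or conditionals, just a description.
PRODUCT_FIELDS = {
    "name": Field(".//h2", clean_text),
    "category": Field(".//span[@class='cat']", clean_text, lower),
    "sku": Field(".//p[@class='ref']", clean_text, regexp(r"Ref: (\w+)")),
    "price": Field(".//span[@class='price']", clean_decimal),
}


def parse_items(markup, item_xpath, fields):
    root = ET.fromstring(markup)
    return [
        {name: field.extract(item) for name, field in fields.items()}
        for item in root.findall(item_xpath)
    ]


SAMPLE = """
<div>
  <div class="product">
    <h2>  Blue
       Widget </h2>
    <span class="cat"> TOOLS </span>
    <p class="ref">Ref: BW100 (in stock)</p>
    <span class="price"> 1,299.50 EUR </span>
  </div>
</div>
"""

items = parse_items(SAMPLE, ".//div[@class='product']", PRODUCT_FIELDS)
```

Adding a new attribute is then a one-line change to `PRODUCT_FIELDS`, and an awkward case can still be handled by subclassing `Field` and overriding `extract` with ordinary procedural code.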

A set of applications is provided to visualize the extracted data, along with other niceties that ease debugging. Simply put: « Wonderful, Efficient, Beautiful, Outshining, Omnipotent, Brilliant: meet WebOOB ».

[0] http://weboob.org/

[1] http://dev.weboob.org/guides/module.html#parsing-of-pages


Actually, even if you're paying, the service operator may be mining your data. This is the case for example with cell phone operators: you pay a monthly fee, and yet they use your geolocation to send you custom advertising, sell market studies, etc.


Degooglisons-internet [1] is an initiative (broader than just email) to promote free services that don't sell your data. It aims to provide easily deployable apps so that more hosting providers can appear.

[1] https://degooglisons-internet.org/

