Hi all! Aymeric (m-ric) here, maintainer of smolagents and part of the team who ...

transpute · 2025-02-04T23:58:36 1738713516

> smolagents does code execution, which means "danger for your machine" if run locally. We've railguarded that a bit with our custom Python interpreter, but it will never be 100% safe, so we're enabling remote execution with E2B and soon Docker.

Those remote interfaces may also work with local VMs for isolation.

Paul-Craft · 2025-02-05T15:35:32 1738769732

Yeah, that's what I was thinking: just throw the whole lot inside a Docker container and call it a day. Unless you're dealing with potentially malicious code that could break out of a container, that should isolate the rest of your machine sufficiently.

Alternatively, PyPy is actually fully sandboxable.

On Linux, you can also use `seccomp`. See, for instance, https://healeycodes.com/running-untrusted-python-code
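
For the Docker route, here is a rough sketch of what "throw it in a container" could look like from Python. This is not how smolagents does it. The function names, the `python:3.12-slim` image choice, the `/sandbox/script.py` mount point, and the specific limits are all assumptions for illustration. The flags themselves are standard `docker run` options.

```python
import os
import subprocess


def build_docker_command(script_path, image="python:3.12-slim"):
    """Build a `docker run` argv that executes one script with tight limits.

    Hypothetical helper: the image, mount point, and limits are illustrative.
    """
    host_path = os.path.abspath(script_path)  # bind mounts need an absolute path
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--memory", "256m",                     # cap memory
        "--pids-limit", "64",                   # cap process count
        "-v", f"{host_path}:/sandbox/script.py:ro",
        image,
        "python", "/sandbox/script.py",
    ]


def run_untrusted(script_path, timeout=30):
    """Run the script in a throwaway container and capture its output.

    Note: `timeout` only bounds the docker CLI call on the host side.
    """
    return subprocess.run(
        build_docker_command(script_path),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

Usage would be something like `run_untrusted("agent_code.py").stdout`, with Docker installed and the image available. As noted above, this limits accidents and casual misuse, not a determined container escape.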

davidsojevic · 2025-02-04T23:42:34 1738712554

Great work on this, Aymeric and team! In terms of improving browsing and/or data sources, do you think it might be worth integrating things like Google Scholar search capability to increase the depth of some of the research that can be done?

It's something I'd be happy to explore a bit if it's of interest.

aubanel · 2025-02-04T23:56:28 1738713388

That's a good idea! Could be a very nice tool to add to the lib!

swyx · 2025-02-05T00:53:36 1738716816

> for now we've used a text browser developed by the Microsoft autogen team, congrats to them

oh super cool! i've usually heard it the other way - people develop LLM-friendly web scrapers. i wrote one for myself, and for others there's firecrawl and expand.ai. a full "text browser" (i guess with rendering?) run locally seems like a better solution for local agents.

ComputerGuru · 2025-02-05T16:31:21 1738773081

It’s basically a cli controlling selenium/webdriver driving Chrome + a few functions.

jsemrau · 2025-02-05T01:09:55 1738717795

I think using vision models for browsing is the wrong approach. It is the same as using OCR for scanning PDFs. The underlying text is already in digital form. So it would make more sense to establish a standard similar to meta-tags that enable the agentic web.

webmaven · 2025-02-05T03:01:19 1738724479

If you're working from the markup rather than the appearance of the page, you're probably increasing the incentives for metacrap, "invisible text spam" and similar tactics.
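
A small illustration of the invisible-text problem, using only Python's standard `html.parser`. The page snippet and the `TextCollector` / `extract_text` names are made up for this example. The point is that a naive markup-based extractor hands the agent text that a human visitor would never see.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects every text node, ignoring CSS visibility entirely."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(html):
    """Return all text in the markup, joined with single spaces."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the second paragraph is hidden from human readers.
PAGE = (
    "<p>Our prices start at $10.</p>"
    '<p style="display:none">best cheap deals best cheap deals</p>'
)

print(extract_text(PAGE))
```

The hidden paragraph ends up in the extracted text alongside the visible one. An approach that works from the rendered page would not pick it up, which is the trade-off being pointed at here.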

taneq · 2025-02-05T06:21:12 1738736472

PDFs are more akin to SVG than to a Word document, and the text is often very far from “available”. OCR can be the only way to reconstruct the document as it appears on screen.

jejeyyy77 · 2025-02-05T21:42:31 1738791751

no, websites/pdfs were designed and laid out visually by humans for humans.

if you are just parsing the text you’ve lost a ton of information encoded in the layout/formatting.

that doesn’t even yet consider actual visual assets like graphs/images, etc