Powdered sugar is sucrose, i.e. 50/50 fructose and glucose, a close-to-perfect ratio for absorption. Add a dash of Gatorade powder for taste and you've got yourself a drink that's inexpensive and gives you all the carbs you need.
Tour de France teams don't care about the cost savings from cheaper sugar; they do care about the time and weight penalty of drinking the extra water needed to absorb the glucose.
Since we at MotherDuck are running a serverless cloud data warehouse on top of DuckDB, we've observed the massive leaps in performance over time ourselves. Less exciting, but perhaps more important, are the improvements in stability and semantics that recent DuckDB versions have brought.
Shameless plug - if you love DuckDB but don't want to manage it yourself, or if you need a supported third-party ecosystem, sharing/IAM, a fully-fledged, beautiful web UI - all the things beyond DuckDB that are important to adopt DuckDB as a solution - I encourage you to look at MotherDuck. You can sign up in seconds (about 17 seconds, I timed it), and you get a 30-day Free Trial without having to put in a credit card. We also offer a perpetual Free Tier and a Startup Program for qualifying startups.
Co-founder and head of produck at MotherDuck here - would love to chat. We're running DuckDB in a serverless fashion, so you're only paying for what you consume.
Shameless plug - MotherDuck[0] is a serverless managed DuckDB data warehouse with some interesting properties:
- Managed storage with zero-copy clone (and upcoming time travel)
- Secure Sharing
- Hybrid Mode that allows folks to combine client-side compute (WASM, UI, CLI, etc.) with cloud data
- Approved and supported ecosystem of third-party vendors
- Since we're running DuckDB in production, we're working very closely with the DuckDB team to improve both our service and open-source DuckDB in terms of reliability, semantics, and capabilities
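For a feel of what this looks like from the client, here's a minimal sketch using the regular DuckDB Python client and the `md:` connection string; the database and table names are hypothetical, and details may differ from MotherDuck's current docs.

```python
import duckdb

# Connect to MotherDuck through the standard DuckDB client.
# Assumes a MOTHERDUCK_TOKEN environment variable holds your service token;
# "my_db" is a hypothetical cloud database name.
con = duckdb.connect("md:my_db")

# Queries run against managed cloud storage much like a local DuckDB file.
con.sql("SELECT count(*) AS n FROM my_db.main.events").show()
```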
I agree. I think full text-to-results (even bypassing text-to-SQL) is akin to L5 self-driving cars. It's easy to get to some reasonable level, say 90%. But getting to the point where folks can fully trust the system and don't need a steering wheel (or to know SQL) may take decades.
We at MotherDuck took an incremental approach. We launched something more akin to lane-assist. We're calling it FixIt - an in-flow visual aid to help you identify and fix errors in your SQL [0].
I think there are gobs of opportunities to improve the analytics experience without resorting to "L5 self-driving" (e.g. full text-to-results).
Do y'all have a reference architecture or templates for local development with CI/CD pipelines?
My last data engineering team struggled to get something like that working with BigQuery, so I'm super excited about the possibility of better data warehouse developer tooling.
1. First of all, thanks for outlining how you trained the model here in the repo: https://github.com/NumbersStationAI/DuckDB-NSQL?tab=readme-o...! I did not know about `sqlglot`, that's a pretty cool lib (there's a small usage sketch after these questions). Which part of the project was the most challenging or time-consuming: generating the training data, the actual training, or testing? How did you iterate, improve, and test the model?
2. How would you suggest using this model effectively if we have custom data in our DBs? For example, we might have a column called `purpose` that's a custom-defined enum (i.e. not a very well-known concept outside of our business). Currently, we've fed it in as context by defining all the possible values it can have. Do you have any other recs on how to tune our prompts so that this model is just as effective with our own custom data?
3. Similar to the above, do you know whether the same model can work effectively on tens or even hundreds of tables? I've used multiple question-SQL example pairs as context, but I've found that I need 15-20 for it to be effective for even one table, let alone tens of tables.
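Since `sqlglot` comes up above, here's a tiny illustrative sketch of what the library does (parsing and transpiling SQL across dialects); this is not necessarily how the DuckDB-NSQL pipeline uses it, and the queries are made up.

```python
import sqlglot

# Parse a DuckDB-flavored query into an AST, then render it back out.
ast = sqlglot.parse_one("SELECT max(price) FROM orders WHERE shipped", read="duckdb")
print(ast.sql(dialect="duckdb"))

# Transpile a query written in the DuckDB dialect into another dialect's syntax.
print(sqlglot.transpile("SELECT name FROM users WHERE name ILIKE '%ann%'",
                        read="duckdb", write="spark")[0])
```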
Hi, Till here; I worked on the DuckDB-NSQL model on the MotherDuck side.
1. Definitely training data (for me); we explored about 10 different directions before settling on the current approach. It's easy to underestimate the effect of training data on the quality of the model. The starting point was the benchmark dataset, though, which we assembled manually (to avoid data pollution, and also because there was simply no text-to-SQL benchmark that covers anything other than plain old SQL SELECT statements with a handful of aggregate functions). And training is also not a one-off thing: with large datasets it is hard to evaluate the quality of the dataset without actually training a few epochs on it and running the benchmark.
3. No way - I see a common stack emerging (take a look at companies like https://vanna.ai/, https://www.dataherald.com/, or https://www.waii.ai) that is mainly centered around foundation models like GPT-4 with strong in-context learning capabilities (that's kind of a must to make these approaches work, and it comes with long inference times and higher costs). These solutions wrap things like embedding-based schema filtering, options for users to enrich metadata about tables and columns, inclusion of previous related queries in the context, etc. around the model. I'd say it's a bit of a different problem from the one we aimed at solving.
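To make one piece of that stack concrete, here's a minimal, hedged sketch of embedding-based schema filtering; sentence-transformers is used purely as an example embedding library, and the model name, table schemas, and top-k cutoff are all illustrative assumptions, not how any of the vendors above actually implement it.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative only: any embedding model would do; schemas are made up.
model = SentenceTransformer("all-MiniLM-L6-v2")

table_schemas = {
    "orders": "orders(order_id, customer_id, order_date, total_amount)",
    "customers": "customers(customer_id, name, country, signup_date)",
    "web_events": "web_events(event_id, session_id, page, ts)",
}

question = "Which countries generated the most revenue last quarter?"

# Embed the question and each table schema, then rank tables by cosine similarity.
names = list(table_schemas)
q_emb = model.encode(question, convert_to_tensor=True)
t_embs = model.encode([table_schemas[n] for n in names], convert_to_tensor=True)
scores = util.cos_sim(q_emb, t_embs)[0]

# Keep only the top-k most relevant schemas for the LLM prompt.
top_k = 2
ranked = sorted(zip(names, scores.tolist()), key=lambda x: x[1], reverse=True)[:top_k]
prompt_context = "\n".join(table_schemas[name] for name, _ in ranked)
print(prompt_context)
```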
I didn't see this in the blog post, but did you train this from scratch or fine-tune an existing base model?
If from scratch, it's quite impressive that the model is capable of understanding natural language prompts (English, presumably) from such a small, targeted training set.
Love these! We do want to deliver more features like FixIt! [0]
What's really exciting is what you can do with DuckDB, MotherDuck, and WASM. A powerful in-browser storage and execution engine tethered to a central serverless data warehouse using hybrid mode [1] opens the door to unprecedented experiences. Imagine the possibilities if you have metadata, data, query logic, or even LLMs in the client, 0 ms away from the user and on the user's own hardware.
So we're doing this in our UI, of course, but we also released a WASM SDK so that developers can take advantage of this new architecture in their own apps! [2]
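The WASM SDK itself is a browser-side JavaScript/TypeScript client, but the hybrid idea is easiest to sketch with the regular DuckDB Python client; the cloud database name and local file below are hypothetical, and the exact behavior may differ from MotherDuck's docs.

```python
import duckdb

# Hybrid sketch: one connection, one query, mixing local and cloud data.
# Assumes MOTHERDUCK_TOKEN is set; "analytics" is a hypothetical cloud database.
con = duckdb.connect("md:analytics")

con.sql("""
    SELECT u.country, count(*) AS local_events
    FROM 'local_events.parquet' AS e       -- file sitting on the client machine
    JOIN analytics.main.users AS u         -- table in MotherDuck's managed storage
      ON e.user_id = u.user_id
    GROUP BY u.country
    ORDER BY local_events DESC
""").show()
```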
Preservation of the global rule of law. If Ukraine goes, we go back to the 19th century. Any two-bit dictator wannabe will want to invade their neighbor for some sweet, sweet territory.
The global economy relies on this stability, and the US benefits the most from the stability of the global economy.
We also didn't "spend 75 billion dollars". Very little actual cash went to Ukraine from the US. We gave them old stockpiles, which we were going to decommission anyway and had been paying to maintain. We also paid US companies to increase production (of ammunition, for example), and given that the conflict with Russia is increasing demand for ammunition, increasing supply is not a bad thing.
This is such a lazy talking point permeating conservative spheres. And it's quite likely coming straight from Russian psy-ops.
Look at a breakdown of the funds to date. Even if you believe the military hardware is actual junk, it is less than 1/3 of the total. More than 40% is straight-up cash, and 25% is services.
What is the source of your contrary talking points, and why do you trust it when it is wrong about simple facts like these?