// Select all orders for users registered in last year, and compute average earnings per user
SELECT ...
input_data = duckdb.sql("SELECT * FROM read_parquet('...')")
step_1 = duckdb.sql("SELECT ... FROM input_data JOIN ...")
step_2 = duckdb.sql("SELECT ... FROM step_1")
final = duckdb.sql("SELECT ... FROM step_2;")
(129304 rows affected)
Is mother duck editor features available on-prem? My understanding is that mother duck is a data warehouse sass.
-Customer software engineer at MotherDuck
Kinda like regex101, draw.io or excalidraw.
It doesn't currently - the UI assets are loaded at runtime - but we do have an offline mode planned. See https://github.com/duckdb/duckdb-ui/issues/62.
SELECT json_serialize_sql('SELECT 2');
[
{
"json_serialize_sql('SELECT 2')": {
"error": false,
"statements": [
{
"node": {
"type": "SELECT_NODE",
"modifiers": [],
"cte_map": {
"map": []
},
"select_list": [
{
"class": "CONSTANT",
"type": "VALUE_CONSTANT",
"alias": "",
"query_location": 7,
"value": {
"type": {
"id": "INTEGER",
"type_info": null
},
"is_null": false,
"value": 2
}
}
],
"from_table": {
"type": "EMPTY",
"alias": "",
"sample": null,
"query_location": 18446744073709551615
},
"where_clause": null,
"group_expressions": [],
"group_sets": [],
"aggregate_handling": "STANDARD_HANDLING",
"having": null,
"sample": null,
"qualify": null
},
"named_param_map": []
}
]
}
}
]
I've used sqlglot to do this in the past, but doing it natively would be nice
I'm assuming it's more of a user preference like commas in front of the field instead of after field?
SQL is a declarative language. The ordering of the statements was carefully thought through.
I will say it's harmless though, the clauses don't have any dependency in terms of meaning so it's fine to just allow them to be reordered in terms of the meaning of the query, but that's true of lots and lots of things in programming and just having a convention is usually better than allowing anything.
For example, you could totally allow this to be legal:
def
for x in whatever:
print(x)
print_whatever(whatever):
There's nothing ambiguous about it, but why? Like if you are used to seeing it one way it just makes it more confusing to read, and if you aren't used to seeing it the normal way you should at least somewhat master something before you try to improve it through cosmetic tweaks.I think you see this all the time, people try to impose their own comfort onto things for no actual improvement.
LINQ popularized it and others followed. It does what it says.
Btw: saying that "people try to impose their own comfort" is uncalled for.
If you are right about why they did it its even dumber than my reason, they are changing a language grammar to let them make a much worse solution to the same problem.
First, repeat data analyst queries are a usage driver in SQL DBs. Think iterating the code and executing again.
Another huge factor in the same vein is running dev pipelines with limited data to validate a change works when modelling complex data.
This is currently a FE feature, but underneath lies effective caching.
The underlying tech is driving down usage cost which is a big thing for data practitioners.
> Getting the AST is a big step forward, but we still need a way to take your cursor position in the editor and map it to a path through this AST. Otherwise, we can’t know which part of the query you're interested in previewing. So we built some simple tools that pair DuckDB’s parser with its tokenizer to enrich the parse tree, which we then use to pinpoint the start and end of all nodes, clauses, and select statements. This cursor-to-AST mapping enables us to show you a preview of exactly the SELECT statement you're working on, no matter where it appears in a complex query.
But that's exactly what they show in the blog post??
Example:
FROM orders |> WHERE order_date >= '2024-01-01' |> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id |> WHERE total_spent > 1000 |> INNER JOIN customers USING(customer_id) |> CALL ENRICH.APOLLO(EMAIL > customers.email) |> AGGREGATE COUNT(*) high_value_customer GROUP BY company.country
https://cloud.google.com/blog/products/data-analytics/simpli...
It's pretty neat:
FROM mydataset.Produce
|> WHERE sales > 0
|> AGGREGATE SUM(sales) AS total_sales, COUNT(\*) AS num_sales
GROUP BY item;
Edit: formattingFROM table SELECT foo, bar WHERE zoo=“goo”
This is one of the reasons I'm excited about DuckDB's upcoming [2]PEG parser. If they can pull it off, we could have alternative dialects that run on DuckDB.
[1] https://www.malloydata.dev/ [2] https://duckdb.org/2024/11/22/runtime-extensible-parsers.htm...
I added a similar "get results as you type" feature to the SQLite integration in the Logfile Navigator (lnav)[2]. When entering PRQL queries, the preview will show the results for the current and previous stages of the pipeline. When you move the cursor around, the previews update accordingly. I was waiting years for something like PRQL to implement this since doing it with regular SQL requires more knowledge of the syntax and I didn't want to go down that path.
[1] - https://prql-lang.org [2] - https://lnav.org/2024/03/29/prql-support.html
https://community-extensions.duckdb.org/extensions/prql.html
I spent the first two quarters of 2024 working on observability for a build-the-plane-as-you-fly-it style project. I can’t express how useful the cte preview would have been for debugging.
Why is it a killer feature? First of all, LLMs complete text from left to right. That alone is a killer feature.
But for us meatboxes with less compute power, pipe syntax allow (much better) code completion.
Pipe syntax is delightful to work with and makes going back to SQL a real bummer moment (please insert meme of Kate Perry kissing the earth here).
ryguyrg•8h ago
Awesome video of feature: https://youtu.be/aFDUlyeMBc8
Disclaimer: I’m a co-founder at MotherDuck.
rancar2•5h ago
Feature request: enable the tuning of when Instant SQL is run and displayed. The erroring out with flashing updates at nearly every keystoke while expanding on a query is distracting for me personally (my brain goes into troubleshooting vs thinking mode). I like the feature (so I will keep it on by default), but I’d like to have a few modes for it depending on my working context (specifically tuning of update frequency at separation characters [space, comma], end of statement [semicolon/newline], and injections [paste/autocomplete]).
hamilton•5h ago
theLiminator•3h ago
hamilton•2h ago
rastignack•1h ago
Good to know a totally offline tool is being considered.
Thanks for the great tool BTW.