You Should Add Debug Views to Your DB

https://chrispenner.ca/posts/views-for-debugging

89•ezekg•5mo ago

Comments

vincekerrazzi•5mo ago

I agree. So I did. And over the span of a year I was the only one that used them. ¯\_(ツ)_/¯

QuantumSeed•5mo ago

I address that problem by scheduling a brown bag lunch and inviting devs from related teams to join in so I can present my cool new tools and techniques to everyone. Sometimes it's wasted effort, but sometimes it does result in wider usage

jerf•5mo ago

"Obviously querying over project_shorthand = '@unison/cloud' isn't going to be able to use an index, so isn't going to be the most performant query"

If you know you're going to be querying on username and project shorthand a lot you're just a handful more lines away from instantiating that as a database function which can take your shorthand apart with string functions and get back to a high-performance query. Functions can return rowsets which you can then further filter with WHERE clauses.

Although in that case I think I'd just take the first and second parameters separately anyhow, since

    debug_contribution("sophie", "@unison/cloud")

and

    debug_contributions("sophie", "unison", "cloud")

isn't a big enough difference to be worth writing string-parsing code. But do as you like.

ibejoeb•5mo ago

If you're using it regularly, you can make it a derived column and index it in most RDBSs. Then it'll work with predicate push-down and you can do other fancy things like store it on alternative partitions to optimize block reads when you're not using it.

The other obvious benefit is that it is no different in semantics than any other column, so there's no advance knowledge necessary for other users vs using a function.

ndriscoll•5mo ago

The next logical step is to realize that using these views to drive your actual application has great potential to both vastly simplify the logic and make it super high performance. You can even take this so far as to make a view that spits out XML (or I guess json): you basically get SSR for free with easy to inspect/compose queries.

Ozzie_osman•5mo ago

Or write the query as a function in code, that way, everyone can know it exists, call it easily, and change it along with any other changes to your logic or schema.

sgarland•5mo ago

So now you have to launch a special debug instance of your app that doesn't accept prod traffic, but can still connect to the prod DB, so you can shell into it and call the function?

You can and should store SQL queries in VCS. Everyone can still know that they're there.

physicles•5mo ago

Are there good tools that can dump a database schema to files that are human- and AI-readable?

We use migrations as the source of truth for the prod database schema, but they’re also a pain when you want to just see the latest version of a table or a stored procedure.

sgarland•5mo ago

mysqldump --nodata

pg_dump --schema-only

Read tool docs for other options, like restricting to a specific DB or table, but that’s the basic gist of it.

al3rez•5mo ago

I implemented something similar for a 3M/day ad tech platform. We created a few materialized views for my boss, myself, and the DevOps team to monitor, instead of querying over 100 tables. I used stored procedures in PostgreSQL, which made it fast, efficient, and non-blocking also not to mention we avoided setting up complex Grafana/UI in admin dashboard, boss: i want to know x, y ,z, okay -> tableplus, -> export csv boom!

morkalork•5mo ago

This plus some automated assertion checks on whatever invariants you can think of makes for a very cheap and scalable canary.

See an alert? Just query the view and you should have all the client/job/whatever IDs you need to trace the problem.

imACalorie12345•5mo ago

I do the same but it typically only stays my personal sphere or documented until you get to something like retool/admin ui as a common place.

noisy_boy•5mo ago

I was very confused about the term "Debug" view. We just called them views and used them normally. The main challenge was predicate pushdown which, though usually worked, was somewhat dependent on how smart the optimizer was. Also, things could get tricky when you start building views on top of views. Soon you are not certain why a simple looking query selecting from a simple looking view became slow until you see that it is views all the way down. As with most things, balance is the key.

ericHosick•5mo ago

To help understand why a complex query with many joins comes back empty, I like to provide helper debug functions that builds the query step by step. Each block checks one prerequisite and tells the user what’s missing:

```sql FUNCTION debug_user(user_id):

  IF NOT (SELECT FROM user WHERE user.user_id = user_id) THEN
    -- user doesn’t exist
  END IF;

  IF NOT (SELECT FROM user
           JOIN user_addr ON user_addr.user_id = user.user_id
           WHERE user.user_id = user_id) THEN
    -- user has no address
  END IF;

  -- keep adding joins until you find the break

```

because when you have a query that involves 6 relations, and you don't get results, it is nice to know why.

FrancoisBosun•5mo ago

I do the "same thing", but using PostgreSQL `EXPLAIN ANALYZE`. EXPLAIN ANALYZE has information about the number of rows returned, which means I know exactly which node in the query plan failed to return expected values.

ericHosick•5mo ago

don't mean to push the goalpost (didn't mention the following in the above post)

these functions can also be exposed in admin ui's making it easier for the general admin users/support users.

another very useful use case is when row level security (RLS) is enabled.

polishdude20•5mo ago

Not for function though unless you specify it as a debug flag inside the function?

iTokio•5mo ago

You can can also run a detailed explain plan and with a suitable visualizer it’s trivial to see where the query fell apart. You get to see the actual data distribution this way.

fifilura•5mo ago

If you build your queries with CTEs, it is easy to pull them out in a console and query each step separately to find the issue.

codeulike•5mo ago

now we're talking

AnishLaddha•5mo ago

i've seen some queries that look like they were written by an engineer with CTE

jameshart•5mo ago

Counterpoint: you should not be connecting to your production database, you should not be running non-critical queries on production database servers, and you probably shouldn’t have permission to see all this data about your users.

Obviously your mileage may vary, your scale is your own, your trade offs are your own trade offs.

But be aware that there comes an operational scale where this is not an acceptable way - operationally, legally, privacy-wise - to investigate customer issues, and you’ll need different tricks.

mrkeen•5mo ago

What are the tricks for investigating customers' data that don't violate privacy?

lillecarl•5mo ago

https://postgresql-anonymizer.readthedocs.io/en/stable/ can be helpful, though I don't know if it scales to the parents standards.

hdjrudni•5mo ago

IANAL but two ideas come to mind:

1) What I do for my small app is make a copy of the prod database and randomize nearly all the data. All the PII, phone numbers, email addresses, names, etc. All the relationships between the data are preserved so I can usually still repro whatever issue. I don't know if this would satisfy the lawyercats but I think it's a decent start.

2) If I had more time/money I'd build a specialized "Customer Support" app that gives limited access to customer data. Customer would have to provide consent before support worker could access their data, and this would be logged/audited. No one would have direct access to the prod DB.

mrkeen•5mo ago

1) We regularly get given bad data that requires us to debug. Financial transactions whose description and/or dates change, unique Ids are sometimes present and sometimes missing.

2) We have those - support app for the support team, and when they kick it up to us backend devs, we also have our own tools to try to debug. No idea if they're correct or not. I'd need to compare the output of the tool to the prod db to verify. Furthermore we can do all the spying/privacy violating we want with the debug tools. We just can't debug when things go wrong.

lillecarl•5mo ago

https://www.xkcd.com/1737

If you're OBVIOUSLY not the target audience you don't have to dismiss it because it doesn't fit your usecase. There's probably a thousand "apps" where this is just fine for every one "Sry we work with the government or are planet scale apps" you're talking about.

It's exhausting to read dismissive online dick-measuring comments, if you have the issues you're explaining you already know this doesn't apply to you. It's on the same level as "Bro I asked a question to an LLM and it gave an interesting answer and I'm unique because nobody but me can ask questions to LLMs like I can" style posts.

jameshart•5mo ago

I don’t think I was being dismissive, I was just pointing out the lack of universal applicability of this suggestion.

It is my experience that many people do not realize that it is possible not to have developers just connect to prod databases with admin privs.

Pointing out that there comes a point where this sort of approach isn’t the norm is part of how people who reach that level of scale learn that. https://xkcd.com/1053/

And that level of concern isn’t reserved for planet-scale - once you have a couple of million dollar contracts on your B2B SaaS platform you should be taking production data ops seriously enough that this sort of approach is unlikely to make sense.

And I shouldn’t need to say that user privacy ought to be a concern even for small operations.

lillecarl•5mo ago

> It is my experience that many people do not realize that it is possible not to have developers just connect to prod databases with admin privs.

Dismissive, everyone knows this but they probably can't be arsed/don't care

> Pointing out that there comes a point where this sort of approach isn’t the norm is part of how people who reach that level of scale learn that. https://xkcd.com/1053/

Not everyone has these ambitions

> And that level of concern isn’t reserved for planet-scale - once you have a couple of million dollar contracts on your B2B SaaS platform you should be taking production data ops seriously enough that this sort of approach is unlikely to make sense.

Sure, but you're talking about "seriousness" with the same dismissive "I'm better" tone here again, your usecase and the business you work for doesn't reflect what everyone else is doing

> And I shouldn’t need to say that user privacy ought to be a concern even for small operations.

Depends a lot on what PII you're collecting. But rather than stating "You shouldn't collect PII you don't need" since I don't know your usecase I'll say "I try to minimize the PII I collect so I don't have to deal with these issues yet".

kerblang•5mo ago

This is not unreasonable, and will be standard practice at many large companies. Judging from downvotes some of us are just too cool for that.

I connect to a production replica read-only. Many coworkers aren't even allowed that. Any DDL change has to go thru reviews & approvals etc. and is too much trouble, so I just keep a set of queries in git.

Also, any defined view in the db becomes a dependency that people are scared of breaking because who is using it and for what? This becomes especially true when random things connect to your db (bad but too late to do anything about it...) Now you can't change it without everyone yelling at you, and worse yet, a necessary data migration that affects such views means you have to fold them into your data migration project.

Working on giant corporate legacy systems is painful but it pays the bills...

cwillu•5mo ago

Counter-counter point: for most databases that “operational scale” will never come.

jameshart•5mo ago

Scale isn’t just measured in transactions per second - it’s also measured in dollars, and compliance risk, and legal exposure.

codeulike•5mo ago

I'm trying to work out why 'views can help you see whats going on in your database' has got to the front page of HN. Is it that we've had a whole generation grow up with ORMs and so only people over 50 know how to actually use a RDBMS?

sgarland•5mo ago

Not over 50, but yes. People are deathly afraid of using RDBMS features, like triggers, events, etc.

physicles•5mo ago

I’m finally waking up to how great views are for building abstractions within the database. They’re not as versatile as actual code, but still worth thinking about.

Triggers, however… as the years have gone by, I like triggers less and less. Triggers for complex validation are ok, but triggers that mutate data are a code smell. Our database contains some triggers that denormalize data to make some queries faster, and they’ve proven to be an endless source of problems.

codeulike•5mo ago

I use sql, views, sps and sql functions a lot but wouldnt touch database triggers with a bargepole I totally agree there

yogorenapan•5mo ago

This is one thing I liked about my uni course back in the day. One of the assignments was to build an entire application within SQL (using triggers, stored procedures, etc) and very minimal bindings to a skeleton written by the professor. It's impressive how much you can do with just Postgres or MySQL

The Rise of Spec Driven Development

The first good Raspberry Pi Laptop

Seas to Rise Around the World – But Not in Greenland

Will Future Generations Think We're Gross?

State Department will delete Xitter posts from before Trump returned to office

Show HN: Verifiable server roundtrip demo for a decision interruption system

Impl Rust – Avro IDL Tool in Rust via Antlr

Stories from 25 Years of Software Development

minikeyvalue

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

How I grow my X presence?

What's the cost of the most expensive Super Bowl ad slot?

What if you just did a startup instead?

Hacking up your own shell completion (2020)

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

GLM-OCR: Accurate × Fast × Comprehensive

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

Show HN: AboutMyProject – A public log for developer proof-of-work

Expertise, AI and Work of Future [video]

So Long to Cheap Books You Could Fit in Your Pocket

PID Controller

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

Kubernetes MCP Server

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

What were the first animals? The fierce sponge–jelly battle that just won't end

Sidestepping Evaluation Awareness and Anticipating Misalignment

OldMapsOnline

What It's Like to Be a Worm

Don't go to physics grad school and other cautionary tales

The Rise of Spec Driven Development

The first good Raspberry Pi Laptop

Seas to Rise Around the World – But Not in Greenland

Will Future Generations Think We're Gross?

State Department will delete Xitter posts from before Trump returned to office

Show HN: Verifiable server roundtrip demo for a decision interruption system

Impl Rust – Avro IDL Tool in Rust via Antlr

Stories from 25 Years of Software Development

minikeyvalue

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

How I grow my X presence?

What's the cost of the most expensive Super Bowl ad slot?

What if you just did a startup instead?

Hacking up your own shell completion (2020)

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

GLM-OCR: Accurate × Fast × Comprehensive

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

Show HN: AboutMyProject – A public log for developer proof-of-work

Expertise, AI and Work of Future [video]

So Long to Cheap Books You Could Fit in Your Pocket

PID Controller

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

Kubernetes MCP Server

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

What were the first animals? The fierce sponge–jelly battle that just won't end

Sidestepping Evaluation Awareness and Anticipating Misalignment

OldMapsOnline

What It's Like to Be a Worm

Don't go to physics grad school and other cautionary tales

You Should Add Debug Views to Your DB

Comments