Ask HN: Is our data warehouse setup normal or over-complicated?

4•ealready_value•1h ago
I've been pulled onto a new feature for replacing some of our existing customer-facing reports with reports from the data warehouse. This isn't the first data report from the data team we've integrated into the product, but since it involves existing reports that I'm the local expert on, I'm getting pulled into the process. The current reports don't have any performance issues, but the decision to change has been made anyway.

From what I've been able to gather, the data goes from the production MySQL database to a secondary MySQL database using DMS. Then come the Glue jobs that ship the data out to a data lake in S3. After that there are several transformation jobs that I've been told convert the data into a "canonical" form, smoothing out all the differences between verticals. I think they said that next the data goes into a second data lake and has additional transformations performed. Finally the entire process gets the data to its final resting place in Redshift where QuickSight is used to create reports. I'm fairly certain I missed a couple steps because I just couldn't figure out the purpose of each step as they were describing the process.
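As a rough way to see how many hand-offs that description implies, the stages can be written down as plain data. This is only a sketch of the path as described in the post: the stage labels are paraphrases, the step that loads Redshift is not named in the post, and the poster says a couple of steps may be missing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str      # where the data sits at this point
    moved_by: str  # what moves or transforms the data into this stage


# Stages as described in the post; labels are paraphrased, not official names.
PIPELINE = [
    Stage("production MySQL", "(source)"),
    Stage("secondary MySQL", "DMS"),
    Stage("first S3 data lake", "Glue jobs"),
    Stage("canonical form", "transformation jobs"),
    Stage("second data lake", "additional transformations"),
    Stage("Redshift", "load step not specified in the post"),
    Stage("QuickSight report", "report authoring"),
]


def hops(pipeline):
    """Number of hand-offs between the source and the final report."""
    return len(pipeline) - 1


def describe(pipeline):
    """Render the path as a single arrow-separated line."""
    return " -> ".join(stage.name for stage in pipeline)


if __name__ == "__main__":
    print(describe(PIPELINE))
    print(f"{hops(PIPELINE)} hops between production and the report")
```

Each hop is a place where data can be delayed or reshaped, which is why the purpose of every stage is worth asking about.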

Getting reports out of that process seems painful. Showing a report for an internal customer (sales or customer support for instance) means they need a QuickSight account and access to the specific report. Getting access to that for myself was not straightforward, which makes me think it is hand-managed by a dev.

For showing a report in product it feels worse. First the data team are about the only people that can create these reports because not only do the product devs not know this "canonical" form, but getting the development environment running consistently for product devs has been like pulling teeth. Once someone has written the report, they have to promote the report by copying it exactly, including an identical report id, to another region. Finally the report id is given to the product team to put into the product. Adding the report id to the product is the easiest part, but the data journey doesn't stop there. The product has to pass that report id and user information to a lambda the data team maintains that generates a URL for the product to embed with an iframe. And after all of that, the report doesn't come close to matching the look of the site.
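The last two steps of that flow (report id and user information in, embeddable URL out, iframe in the page) can be sketched as below. Everything here is hypothetical: `generate_embed_url` is only a stand-in for the data team's lambda, and `EMBED_BASE` is a made-up address. In a real QuickSight setup the URL would be issued by AWS rather than assembled from a string.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical base URL; the real one would come from the data team's lambda.
EMBED_BASE = "https://reports.example.com/embed"


def generate_embed_url(report_id: str, user_id: str) -> str:
    """Stand-in for the lambda: map a report id plus user info to a URL."""
    query = urlencode({"report": report_id, "user": user_id})
    return f"{EMBED_BASE}?{query}"


def render_iframe(embed_url: str, title: str) -> str:
    """What the product does with the URL: wrap it in an iframe."""
    src = escape(embed_url, quote=True)
    return f'<iframe src="{src}" title="{escape(title, quote=True)}"></iframe>'


if __name__ == "__main__":
    url = generate_embed_url("sales-summary", "user 42")
    print(render_iframe(url, "Sales summary"))
```

Because the report renders inside an iframe served from elsewhere, the product's own styles do not reach it, which fits the complaint that the report does not match the look of the site.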

Is this data warehouse setup normal? Is this a common way to handle in-product reports after a company invests in a data warehouse? There are a lot of what seem like redundant steps, as well as a lot of custom code for what I would expect to be built into these products.

Comments

icedchai•43m ago
Without understanding the differences between the "source" and "canonical" forms, it is tough to say. Also, how much data are we actually talking about? The pipeline you describe may be entirely reasonable, or it may be an over-engineered, convoluted contraption that could be replaced with a single DB replica and a few views to simplify queries.

My experience with QuickSight has been pretty negative. The overall UI/UX is pretty meh. If you're embedding it in your product you may be better off generating your own reports, in app.
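A minimal sketch of the "single DB replica and a few views" idea, using an in-memory SQLite database as a stand-in for a read replica. The table, columns, and view name are invented for illustration and are not taken from the thread.

```python
import sqlite3

# In-memory SQLite stands in for a read replica of the production database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE report_rows (
        customer_id INTEGER NOT NULL,
        vertical    TEXT    NOT NULL,
        occurred_on TEXT    NOT NULL,  -- ISO date
        amount      REAL    NOT NULL
    );

    -- The view hides the aggregation so report queries stay short.
    CREATE VIEW customer_yearly_totals AS
    SELECT customer_id,
           substr(occurred_on, 1, 4) AS year,
           SUM(amount)               AS total
    FROM report_rows
    GROUP BY customer_id, substr(occurred_on, 1, 4);
    """
)

conn.executemany(
    "INSERT INTO report_rows VALUES (?, ?, ?, ?)",
    [
        (1, "retail", "2024-03-01", 100.0),
        (1, "retail", "2024-07-15", 50.0),
        (1, "retail", "2025-01-10", 25.0),
        (2, "dining", "2024-05-20", 80.0),
    ],
)


def yearly_totals(customer_id):
    """Report query against the view, scoped to one customer."""
    cur = conn.execute(
        "SELECT year, total FROM customer_yearly_totals "
        "WHERE customer_id = ? ORDER BY year",
        (customer_id,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(yearly_totals(1))
```

The in-app report then only needs the short query in `yearly_totals`; the shape of the underlying table stays hidden behind the view.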

ealready_value•26m ago
The source form is the production database, which is what the current reports pull from. The canonical form is the form that, in theory, all of the verticals get rolled into, but many of the nuances that our customers are used to having end up getting replaced with something similar but not quite the same. Right now that's my biggest concern: that customers are not going to get the data they need because of this canonical form.

We're talking about a few hundred megabytes of data in total for these reports, across all of our customers, and that covers the past 15 years. We have about 25k customers, which shrinks how much any one customer can pull even further. One last point is that we already de-normalize the report data into its own table specifically for these reports, so that's not something the data warehouse is doing for us.
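To put those numbers in per-customer terms, here is a back-of-the-envelope calculation. It takes "a few hundred megabytes" to mean 300 MB and assumes data is spread evenly across customers and years. Both are assumptions for illustration, not figures from the thread.

```python
# Assumed figure: "a few hundred megabytes" is taken as 300 MB for illustration.
TOTAL_BYTES = 300 * 1024 * 1024
CUSTOMERS = 25_000  # "like 25k customers"
YEARS = 15          # "the past 15 years"

# Assumes an even spread; real customers will vary widely.
per_customer = TOTAL_BYTES / CUSTOMERS
per_customer_year = per_customer / YEARS

print(f"{per_customer / 1024:.1f} KiB per customer")
print(f"{per_customer_year:.0f} bytes per customer per year")
```

Under those assumptions the average customer accounts for roughly 12 KiB of report data in total, or under 1 KB per year.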

I agree about QuickSight; that is exactly my experience. My preference is to continue using the reports we generate in the app, but I'm trying to wrap my head around cases where this ends up being the better direction.