From what I've been able to gather, the data goes from the production MySQL database to a secondary MySQL database via DMS. Glue jobs then ship the data out to a data lake in S3. After that, several transformation jobs convert the data into what I've been told is a "canonical" form, smoothing out the differences between verticals. I think they said the data then goes into a second data lake, where additional transformations are performed. Finally, the data lands in Redshift, where QuickSight is used to create reports. I'm fairly certain I missed a couple of steps, because I couldn't figure out the purpose of each step as they were describing the process.
Getting reports out of that process seems painful. Showing a report to an internal customer (sales or customer support, for instance) means they need a QuickSight account and access to the specific report. Getting that access for myself was not straightforward, which makes me think it is hand-managed by a dev.
Showing a report in the product feels worse. First, the data team are about the only people who can create these reports: not only do the product devs not know this "canonical" form, but getting the development environment running consistently for product devs has been like pulling teeth. Once someone has written the report, they have to promote it by copying it exactly, including an identical report id, to another region. Then the report id is given to the product team to put into the product. Adding the report id to the product is the easiest part, but the data journey doesn't stop there. The product has to pass that report id and user information to a Lambda the data team maintains, which generates a URL for the product to embed in an iframe. And after all of that, the report doesn't come close to matching the look of the site.
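For anyone who hasn't seen this flow, here is a minimal sketch of what I understand the embed step to look like. It is not the data team's actual code. I'm assuming the Lambda uses boto3's `generate_embed_url_for_registered_user` and that the "report id" maps to a QuickSight dashboard id; the event fields (`reportId`, `userArn`) and the environment variable names are made up for illustration.

```python
import os
from typing import Any


def build_embed_url(
    client: Any,
    account_id: str,
    user_arn: str,
    dashboard_id: str,
    allowed_domain: str,
    lifetime_minutes: int = 60,
) -> str:
    """Ask QuickSight for a short-lived URL the product can put in an iframe."""
    response = client.generate_embed_url_for_registered_user(
        AwsAccountId=account_id,
        UserArn=user_arn,
        SessionLifetimeInMinutes=lifetime_minutes,
        ExperienceConfiguration={
            "Dashboard": {"InitialDashboardId": dashboard_id}
        },
        # Only this origin is allowed to host the iframe.
        AllowedDomains=[allowed_domain],
    )
    return response["EmbedUrl"]


def handler(event: dict, context: Any) -> dict:
    """Hypothetical Lambda entry point; the event shape is an assumption."""
    import boto3  # imported lazily so build_embed_url can be tested without AWS

    client = boto3.client("quicksight")
    url = build_embed_url(
        client,
        account_id=os.environ["AWS_ACCOUNT_ID"],    # assumed variable name
        user_arn=event["userArn"],                  # assumed event field
        dashboard_id=event["reportId"],             # assumed event field
        allowed_domain=os.environ["PRODUCT_ORIGIN"],  # assumed variable name
    )
    return {"embedUrl": url}
```

The product then drops `embedUrl` into an iframe `src`. Because the URL is short-lived and tied to a QuickSight user, it has to be requested fresh for each viewer, which is why the product can't just store a link.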
Is this data warehouse setup normal? Is this a common way to handle in-product reports after a company invests in a data warehouse? There are a lot of what seem like redundant steps, as well as a lot of custom code for what I would expect to be built into these products.
icedchai•43m ago
My experience with QuickSight has been pretty negative. The overall UI/UX is pretty meh. If you're embedding it in your product you may be better off generating your own reports, in app.
ealready_value•26m ago
We're talking about a few hundred megabytes of data that these reports pull for all customers combined, and that's across the past 15 years. We have about 25k customers, which shrinks how much any one customer can pull in even further. One last point: we already denormalize the report data into its own table specifically for these reports, so that's not something the data warehouse is doing for us.
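To put rough numbers on that, here is a back-of-the-envelope calculation. The 300 MB figure is my stand-in for "a few hundred megabytes", and spreading it evenly across customers and years is a simplification, so treat the result as an order of magnitude only.

```python
# Assumed inputs: 300 MB is a stand-in for "a few hundred megabytes".
TOTAL_BYTES = 300 * 1024 ** 2
CUSTOMERS = 25_000
YEARS = 15


def average_kib_per_customer(total_bytes: int, customers: int) -> float:
    """Average report data per customer, in KiB, assuming an even spread."""
    return total_bytes / customers / 1024


per_customer = average_kib_per_customer(TOTAL_BYTES, CUSTOMERS)
per_customer_per_year = per_customer / YEARS

print(f"{per_customer:.1f} KiB per customer")                    # 12.3 KiB per customer
print(f"{per_customer_per_year:.2f} KiB per customer per year")  # 0.82 KiB per customer per year
```

So under those assumptions a typical customer's report data is on the order of tens of kilobytes in total.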
I agree about QuickSight; your experience matches mine exactly. My preference is to keep using the reports we generate in the app, but I'm trying to wrap my head around the cases where this setup ends up being the better direction.