Ask HN: Data engineers, what sucks when working on exploratory data-related tasks?

4•robz75•4h ago
Hey guys,

Founder here. I’m working on building my next project and I don’t want to waste time solving fake problems.

Right now, what's extremely painful and annoying to do in your job? (You can be brutally honest.)

More specifically, I'm interested in how you handle exploratory data-related tasks from your team.

Very curious to get your current workflows, issues and frustrations :)

Comments

squircle•4h ago
Conversations and interviews > Jupyter notebook
robz75•4h ago
Why? What's currently annoying about notebooks that you have to deal with compared to just directly going to users?
squircle•3h ago
Ah, well, rereading your original post, I realize now this isn't necessarily painful for me. Perhaps, though, the annoying aspect is seeing others use proprietary Excel spreadsheets without a data lake. Conway's Law?

Does VS here mean Visual Studio? I would not call myself a data engineer, I just play one at work sometimes. Many hats, yknow?

robz75•3h ago
"the annoying aspect is seeing others use proprietary Excel spreadsheets without a data lake" => what's painful about that?

VS = compared to, versus

squircle•17m ago
Hah, okay. I read VS differently from vs. The pain, in part, is hidden functions, rarely any inline documentation, difficulty reusing or repurposing the work, etc.
clejack•3h ago
The main issues for problems like this fall into three categories:

- Things that prevent you from starting the job. Org silos, security, and permissions

- Things that prevent you from doing the job. This is primarily data cleaning.

- Things that make the job more difficult. This involves poor tooling, and you'll struggle to break the stranglehold that SQL and python-pandas have in this area. I'll also add plotting libraries to this. Many of them suck in a seemingly unavoidable way.

On the second and third points, LLMs will most likely own these soon enough, though maybe there's room to build something small and local that's more efficient if the scope of the agent is reduced?

The first point is generally organizational, and it's very difficult to solve other than by integrating your system into an environment, which is the strategy pursued by companies like Snowflake and Databricks.

robz75•1h ago
What are the pain points you are facing with data cleaning? How do you handle it for now?