frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Trying to make an Automated Ecologist: A first pass through the Biotime dataset

https://chillphysicsenjoyer.substack.com/p/trying-to-make-an-automated-ecologist
1•crescit_eundo•1m ago•0 comments

Watch Ukraine's Minigun-Firing, Drone-Hunting Turboprop in Action

https://www.twz.com/air/watch-ukraines-minigun-firing-drone-hunting-turboprop-in-action
1•breve•2m ago•0 comments

Free Trial: AI Interviewer

https://ai-interviewer.nuvoice.ai/
1•sijain2•2m ago•0 comments

FDA Intends to Take Action Against Non-FDA-Approved GLP-1 Drugs

https://www.fda.gov/news-events/press-announcements/fda-intends-take-action-against-non-fda-appro...
1•randycupertino•3m ago•0 comments

Supernote e-ink devices for writing like paper

https://supernote.eu/choose-your-product/
1•janandonly•5m ago•0 comments

We are QA Engineers now

https://serce.me/posts/2026-02-05-we-are-qa-engineers-now
1•SerCe•6m ago•0 comments

Show HN: Measuring how AI agent teams improve issue resolution on SWE-Verified

https://arxiv.org/abs/2602.01465
2•NBenkovich•6m ago•0 comments

Adversarial Reasoning: Multiagent World Models for Closing the Simulation Gap

https://www.latent.space/p/adversarial-reasoning
1•swyx•6m ago•0 comments

Show HN: Poddley.com – Follow people, not podcasts

https://poddley.com/guests/ana-kasparian/episodes
1•onesandofgrain•14m ago•0 comments

Layoffs Surge 118% in January – The Highest Since 2009

https://www.cnbc.com/2026/02/05/layoff-and-hiring-announcements-hit-their-worst-january-levels-si...
7•karakoram•14m ago•0 comments

Papyrus 114: Homer's Iliad

https://p114.homemade.systems/
1•mwenge•15m ago•1 comments

DicePit – Real-time multiplayer Knucklebones in the browser

https://dicepit.pages.dev/
1•r1z4•15m ago•1 comments

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

https://arxiv.org/abs/2601.14340
2•PaulHoule•16m ago•0 comments

Show HN: AI Agent Tool That Keeps You in the Loop

https://github.com/dshearer/misatay
2•dshearer•18m ago•0 comments

Why Every R Package Wrapping External Tools Needs a Sitrep() Function

https://drmowinckels.io/blog/2026/sitrep-functions/
1•todsacerdoti•18m ago•0 comments

Achieving Ultra-Fast AI Chat Widgets

https://www.cjroth.com/blog/2026-02-06-chat-widgets
1•thoughtfulchris•20m ago•0 comments

Show HN: Runtime Fence – Kill switch for AI agents

https://github.com/RunTimeAdmin/ai-agent-killswitch
1•ccie14019•22m ago•1 comments

Researchers surprised by the brain benefits of cannabis usage in adults over 40

https://nypost.com/2026/02/07/health/cannabis-may-benefit-aging-brains-study-finds/
1•SirLJ•24m ago•0 comments

Peter Thiel warns the Antichrist, apocalypse linked to the 'end of modernity'

https://fortune.com/2026/02/04/peter-thiel-antichrist-greta-thunberg-end-of-modernity-billionaires/
3•randycupertino•25m ago•2 comments

USS Preble Used Helios Laser to Zap Four Drones in Expanding Testing

https://www.twz.com/sea/uss-preble-used-helios-laser-to-zap-four-drones-in-expanding-testing
3•breve•30m ago•0 comments

Show HN: Animated beach scene, made with CSS

https://ahmed-machine.github.io/beach-scene/
1•ahmedoo•31m ago•0 comments

An update on unredacting select Epstein files – DBC12.pdf liberated

https://neosmart.net/blog/efta00400459-has-been-cracked-dbc12-pdf-liberated/
3•ks2048•31m ago•0 comments

Was going to share my work

1•hiddenarchitect•34m ago•0 comments

Pitchfork: A devilishly good process manager for developers

https://pitchfork.jdx.dev/
1•ahamez•34m ago•0 comments

You Are Here

https://brooker.co.za/blog/2026/02/07/you-are-here.html
3•mltvc•39m ago•1 comments

Why social apps need to become proactive, not reactive

https://www.heyflare.app/blog/from-reactive-to-proactive-how-ai-agents-will-reshape-social-apps
1•JoanMDuarte•39m ago•1 comments

How patient are AI scrapers, anyway? – Random Thoughts

https://lars.ingebrigtsen.no/2026/02/07/how-patient-are-ai-scrapers-anyway/
1•samtrack2019•40m ago•0 comments

Vouch: A contributor trust management system

https://github.com/mitchellh/vouch
3•SchwKatze•40m ago•0 comments

I built a terminal monitoring app and custom firmware for a clock with Claude

https://duggan.ie/posts/i-built-a-terminal-monitoring-app-and-custom-firmware-for-a-desktop-clock...
1•duggan•41m ago•0 comments

Tiny C Compiler

https://bellard.org/tcc/
7•guerrilla•42m ago•1 comments
Open in hackernews

Show HN: DataFlow – Open Tool for LLM data prep 10k synthetic > 1M generic data

https://github.com/OpenDCAI/DataFlow
1•Mey0320•1mo ago
Hi HN, We are the OpenDCAI team from Peking University. We just released DataFlow, an open-source framework designed to make LLM data preparation as programmable and modular as model training.

The Problem: While model architectures are standardized (PyTorch/JAX), data preparation is still dominated by ad-hoc scripts and loosely defined workflows. Most existing tools focus on "cleaning" or "filtering" existing large datasets, but modern LLM training increasingly relies on complex synthetic data generation and iterative refinement.

Our Solution: DataFlow DataFlow treats data processing like constructing a neural network. It provides a PyTorch-like programming interface where you compose Operators into Pipelines.

Key Technical Features: - Modular Abstraction: Just like torch.nn.Module, we provide standard interfaces for Operators, Prompt Templates, and Pipelines. - Rich Operator Zoo: Nearly 200 pre-built operators covering Text, Math, Code, Text-to-SQL, and RAG. - DataFlow-Agent: An agentic layer (built on LangGraph) that translates natural language requirements directly into executable pipelines.

Results: We found that data quality matters more than scale. - 10k > 1M: A unified 10k-sample dataset produced by DataFlow enables base models (Qwen2/2.5) to surpass counterparts trained on 1M generic instruction samples (Infinity-Instruct). - Code & SQL: Our pipelines achieved +7% improvement on code benchmarks and +3% execution accuracy in Text-to-SQL using significantly less data.

Links: - Paper: https://arxiv.org/abs/2512.16676 - Repo: https://github.com/OpenDCAI/DataFlow - Docs:https://opendcai.github.io/DataFlow-Doc/

We believe data engineering deserves the same level of rigorous abstraction as model architecture. We hope DataFlow can serve as a foundational substrate for future data-centric AI development.