frontpage.
Made with ♥ by @iamnishanth

Show HN: An unstructured data workspace for data transformations with LLMs

https://www.usefolio.ai/
3•nibab•1h ago
hi HN!

A couple of months ago, I had to analyze a few thousand audio recordings to help identify customer support issues. I was able to get some raw, high-level initial results with Python scripts invoking LLM APIs, but they were too general to be helpful. Writing basic prompts is easy, but tuning them and making them specific enough that no faint signal is missed is hard. You need to iterate through the data with an initial prompt, segment the data into buckets, chain another prompt for each bucket, and so on. Then you need to keep reviewing the raw data and tweaking the prompts until you get the results you want.
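The loop described above (initial prompt, bucketing, then a chained prompt per bucket) can be sketched roughly like this. It is a minimal illustration, not Folio's code: `call_llm` is a hypothetical stand-in for a real LLM API call, stubbed with keyword rules so the sketch runs offline, and the bucket names and prompts are made up.

```python
from collections import defaultdict

# Hypothetical stand-in for an LLM API call. A real pipeline would send
# `prompt` to a model provider; this stub uses keyword rules so the
# sketch runs offline and deterministically.
def call_llm(prompt: str) -> str:
    text = prompt.lower()
    if prompt.startswith("CLASSIFY"):
        if "refund" in text or "charge" in text:
            return "billing"
        if "crash" in text or "error" in text:
            return "bug"
        return "other"
    # For chained prompts, the stub just echoes the (truncated) last line.
    return "summary: " + prompt.splitlines()[-1][:40]

# Hypothetical bucket-specific prompts chained after the first pass.
FOLLOW_UP_PROMPTS = {
    "billing": "Which charge is the customer disputing, and why?",
    "bug": "Which feature failed, and what did the customer see?",
}

def run_pipeline(transcripts: list[str]):
    # Pass 1: classify every transcript with the initial prompt.
    buckets: dict[str, list[str]] = defaultdict(list)
    for t in transcripts:
        buckets[call_llm(f"CLASSIFY as billing, bug, or other:\n{t}")].append(t)
    # Pass 2: chain a bucket-specific prompt onto each item in that bucket.
    results: dict[str, list[str]] = {}
    for label, items in buckets.items():
        follow_up = FOLLOW_UP_PROMPTS.get(label)
        if follow_up is None:
            continue  # no chained prompt defined for this bucket
        results[label] = [call_llm(f"{follow_up}\n{t}") for t in items]
    return dict(buckets), results
```

Comparing `buckets` and `results` against the raw transcripts, then editing the prompts and rerunning, is the manual tuning loop the post describes.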

There are no good user-facing tools for scaling LLM-based analysis to thousands of rows of unstructured data. Claude Cowork and other agents with filesystem access are scratching the surface, but working through a text-only UI is challenging, especially when you want to go back and adjust your research pipeline, deterministically narrow down to a specific subset of your data with SQL-like filters, or manage costs. Scaling past 100 files is not well supported, and deep research is difficult to steer and verify.

I needed a mini data warehouse that could help me get insights out of my data, optimize the cost of bulk LLM operations (through cost estimation and model choice), and let me browse and verify the data in a user-friendly way, without requiring me to set up something like Databricks. So I built Folio.
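Cost estimation for a bulk operation comes down to tokens times price, summed over rows. A minimal sketch, assuming a rough 4-characters-per-token heuristic and made-up model names and prices (real numbers come from the provider's tokenizer and price list):

```python
from dataclasses import dataclass

@dataclass
class ModelPrice:
    name: str
    input_per_mtok: float   # USD per 1M input tokens (hypothetical figures)
    output_per_mtok: float  # USD per 1M output tokens (hypothetical figures)

def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token for English text.
    # A real estimate would use the provider's tokenizer.
    return max(1, len(text) // 4)

def estimate_cost(rows: list[str], prompt: str, output_tokens_per_row: int,
                  price: ModelPrice) -> float:
    # The prompt is sent once per row, together with that row's content.
    input_tokens = sum(estimate_tokens(prompt + row) for row in rows)
    output_tokens = output_tokens_per_row * len(rows)
    return (input_tokens * price.input_per_mtok
            + output_tokens * price.output_per_mtok) / 1_000_000

def cheapest(rows: list[str], prompt: str, output_tokens_per_row: int,
             prices: list[ModelPrice]) -> ModelPrice:
    # Pick the model with the lowest estimated cost for this bulk operation.
    return min(prices, key=lambda p: estimate_cost(rows, prompt, output_tokens_per_row, p))
```

For 1,000 rows of about 4,000 characters each and 200 output tokens per row (ignoring the prompt's own tokens), a hypothetical model priced at $0.50/$1.50 per million tokens comes to about $0.80, versus $6.00 for one priced at $3/$15.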

Folio is a free, local macOS app for analyzing unstructured data with LLMs. It's a UI wrapper around a minimal data warehouse that lets users (and agents) run LLM-based transformations on large unstructured datasets. All you need to get started is an AI API key and a modal.com account.

Users bring their files into Folio, which loads them into a table where each row contains a markdown representation of one file's contents. Users can then run LLM operations in bulk on those rows and use SQL filters to create views that narrow the scope of each transformation. Agents are first-class citizens: they can plug into Folio and do most of the work for you. To take load off the desktop, both for OCR and audio transcription and for the thousands of HTTP requests to AI APIs, we integrate with modal.com as the execution engine. A local orchestrator fans jobs out to Modal and fans the results back in once they complete. Data is never stored anywhere; it only moves in transit through the AI API provider and the user's own Modal infrastructure.
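A rough sketch of that flow (table of markdown rows, a SQL view to narrow scope, then fan-out and fan-in of a bulk operation), assuming SQLite as the table store and a local `ThreadPoolExecutor` standing in for Modal's remote workers. Folio's actual storage engine and orchestrator may work differently; the table, view, and `summarize` operation are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-row operation. In Folio this would be an LLM call executed
# remotely; here it returns the first markdown heading so the sketch runs offline.
def summarize(markdown: str) -> str:
    lines = markdown.splitlines()
    return lines[0].lstrip("# ").strip() if lines else ""

def build_workspace(files: list[tuple[str, str, str]]) -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    # One row per file; content_md holds the markdown rendering of the file.
    con.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, kind TEXT, content_md TEXT, summary TEXT)"
    )
    con.executemany("INSERT INTO files (path, kind, content_md) VALUES (?, ?, ?)", files)
    # A SQL view narrows the scope of the next bulk transformation.
    con.execute("CREATE VIEW tickets AS SELECT path, content_md FROM files WHERE kind = 'ticket'")
    return con

def run_bulk_op(con: sqlite3.Connection, view: str, fn, max_workers: int = 8) -> int:
    # `view` is interpolated into SQL, so it must be a trusted identifier.
    rows = con.execute(f"SELECT path, content_md FROM {view}").fetchall()
    # Fan out: a local thread pool stands in for remote workers (e.g. Modal).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(fn, [md for _, md in rows]))
    # Fan in: write each result back onto the base table.
    con.executemany(
        "UPDATE files SET summary = ? WHERE path = ?",
        [(out, path) for (path, _), out in zip(rows, outputs)],
    )
    con.commit()
    return len(rows)
```

Only rows visible through the `tickets` view are processed; rows outside it keep a NULL `summary`, which is what makes views useful for keeping bulk LLM spend scoped.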

Folio workspaces are multi-modal (you can load different data types into the same workspace and move them through the same analysis pipeline), and they can support thousands of files.
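One way to picture a multi-modal workspace: each file type gets its own converter, but every converter emits markdown into the same row schema, so a single pipeline can process all rows. A minimal sketch; the extension map and the `transcribe_audio` and `ocr_document` stubs are hypothetical placeholders for real speech-to-text and OCR calls.

```python
from pathlib import Path

# Hypothetical converters. Real ones would call a speech-to-text model,
# an OCR model, and so on; these stubs return placeholders.
def transcribe_audio(path: Path) -> str:
    return f"[transcript of {path.name}]"

def ocr_document(path: Path) -> str:
    return f"[OCR text of {path.name}]"

def read_text(path: Path) -> str:
    return path.read_text(encoding="utf-8")

# Map file extensions to converters; every converter returns markdown text.
CONVERTERS = {
    ".mp3": transcribe_audio,
    ".wav": transcribe_audio,
    ".pdf": ocr_document,
    ".png": ocr_document,
    ".md": read_text,
    ".txt": read_text,
}

def to_row(path_str: str) -> dict:
    path = Path(path_str)
    suffix = path.suffix.lower()
    convert = CONVERTERS.get(suffix)
    if convert is None:
        raise ValueError(f"unsupported file type: {path.suffix!r}")
    # Same schema for every data type, so one pipeline can process all rows.
    return {"path": str(path), "kind": suffix.lstrip("."), "content_md": convert(path)}
```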

People use Folio today to:
- review customer support tickets/emails: bucket issues into categories, narrow in on the categories of interest, and then act on that data by generating responses.
- extract detailed data from financial documents: load all the data available on a particular company and extract structured data such as revenue numbers and projections.
- do literature reviews: plenty of agents can help you load papers from research repositories. Once that data is loaded into Folio, users can run a steerable deep research pass over those files.
- perform criteria-based search: generate yes/no criteria such as "document contains data on XYZ", "document mentions ABC", or "document cites XYZ".
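Criteria-based search can be pictured as one boolean column per criterion, followed by a filter. In this minimal sketch, `judge` is a hypothetical stand-in for an LLM yes/no call; the stub simply treats the criterion's last word as a keyword and does a substring match.

```python
# Hypothetical yes/no judge. A real version would ask an LLM something like
# "Answer yes or no: <criterion>?" for each document and parse the reply; this
# stub treats the criterion's last word as a keyword and does a substring match.
def judge(document: str, criterion: str) -> bool:
    keyword = criterion.split()[-1].lower()
    return keyword in document.lower()

def criteria_search(docs: dict[str, str], criteria: list[str]):
    # One boolean column per criterion, one row per document.
    table = {
        name: {c: judge(text, c) for c in criteria}
        for name, text in docs.items()
    }
    # Documents that satisfy every criterion.
    matches = [name for name, cols in table.items() if all(cols.values())]
    return table, matches
```

Because each criterion becomes its own column, any combination of criteria can be filtered afterward without rerunning the LLM calls.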

Companies like V7 Labs, Hebbia, Legora, and Harvey have similar "Tabular Document Review" features, but they are neither scalable nor compatible with outside agents like Claude Code. They also require expensive enterprise contracts.

I see Folio moving beyond data analysis to become the ideal companion for agentic tasks that require a human-facing UI/UX, cost management, and acting on data in bulk.

Website: https://www.usefolio.ai
GitHub: https://github.com/usefolio/folio
X: https://x.com/usefolio_ai

Looking forward to hearing what people think!

Modular hardcoded circuits for computer vision plus autonomous production tools

https://github.com/SwuduSusuwu/SusuLib/discussions/52
1•SwuduSusuwu•1m ago•1 comments

Anthropic's Buffa: a Rust implementation of protobuf

https://github.com/anthropics/buffa
1•harporoeder•2m ago•0 comments

We need to talk about Elixir

https://blog.gjmveloso.dev/posts/2026/03/25/we-need-to-talk-about-elixir/
1•gjmveloso•3m ago•0 comments

Nasdaq Flirts with Correction Territory

https://www.wsj.com/livecoverage/stock-market-today-dow-sp-500-nasdaq-03-26-2026/card/nasdaq-flir...
1•pera•3m ago•0 comments

JPMorgan now monitoring investment banker screen time to prevent burnout

https://finance.yahoo.com/sectors/healthcare/articles/jpmorgan-started-monitoring-keystrokes-vide...
1•s3p•4m ago•0 comments

Artisanal software in the age of AI codegen

https://kinarey.com/when-craft-relocates/
1•indynz•5m ago•2 comments

China Learned to Love the Classics

https://www.newyorker.com/news/annals-of-education/how-china-learned-to-love-the-classics
3•tzury•6m ago•0 comments

Basecamp CLI and Agent Skill

https://basecamp.com/agents
1•SunshineTheCat•7m ago•0 comments

All code is sorcery, until it isn't

https://andreafortuna.org/2026/03/26/programming_as_magic/
1•speckx•8m ago•0 comments

Taming LLMs: Using Executable Oracles to Prevent Bad Code

https://john.regehr.org/writing/zero_dof_programming.html
2•mad44•10m ago•0 comments

What's coming to our GitHub Actions 2026 security roadmap

https://github.blog/news-insights/product-news/whats-coming-to-our-github-actions-2026-security-r...
1•abraham•12m ago•0 comments

A 54KB client-side HNSW vector search engine in WASM

https://github.com/Altor-lab/altor-vec
4•anshulbasia27•12m ago•1 comments

Testing Neglected VHS Tapes and CDs

https://hackaday.com/2026/03/25/testing-severely-neglected-vhs-tapes-and-cds/
1•toomuchtodo•12m ago•0 comments

One Pipe, Two Sandboxes, Zero Prompt Injection

https://multikernel.io/2026/03/26/sandlock-pipeline-xoa/
1•wang_cong•13m ago•0 comments

Show HN: Switch Country – Get News, Stalk what's happening in other countries?

https://play.google.com/store/apps/details?id=com.kocial.news&hl=en_US
1•kocialnews•13m ago•0 comments

Key Disclosure Law

https://en.wikipedia.org/wiki/Key_disclosure_law
1•basilikum•14m ago•0 comments

SpecKit: Not Impressed

https://jaksa.me/blog/2026-03-26-speckit-not-impressed
1•jaksa•14m ago•0 comments

Show HN: Content Addressable Storage for ML Checkpoints

https://olamyy.github.io/posts/tensorcas/
1•TotallyNotOla•14m ago•1 comments

Ethics.md – distributed AI Ethics Framework (co-created with AIs)

https://github.com/davyvalekestrel/ethics.md
1•davyvalekestrel•16m ago•1 comments

Cops Use AI to Jail Innocent Grandmother for 6 Months [video]

https://www.youtube.com/watch?v=4ifXObNvTaA
2•ghastmaster•18m ago•1 comments

What did your strengths cost you? (Interactive)

https://secondorder-469bce2c03ac.herokuapp.com/simulations/tradeoff-atlas
1•icyou780•19m ago•0 comments

Edera spent years calling KVM less secure. Here's why it changed its mind

https://thenewstack.io/edera-adds-kvm-support/
1•CrankyBear•23m ago•0 comments

Zero Days: Electric Motorcycles Are a Security Nightmare

https://persephonekarnstein.github.io/post/zero-days/
3•Ivoah•24m ago•0 comments

Native Instant Space Switching on macOS

https://arhan.sh/blog/native-instant-space-switching-on-macos/
2•birdculture•24m ago•0 comments

Mitochondrial Ca2+ efflux controls neuronal metabolism and long-term memory

https://www.nature.com/articles/s42255-026-01451-w
2•PaulHoule•28m ago•0 comments

Sinclair Microvision (1977)

https://r-type.org/articles/art-452.htm
2•joebig•28m ago•0 comments

Android Canary blesses the Linux Terminal with a modern UI, new features

https://www.androidauthority.com/android-canary-linux-terminal-upgrades-3651830/
1•thunderbong•30m ago•0 comments

Open-source startups should do more embedded/OEM deals

https://getlago.com/blog/embedded-software
1•FinnLobsien•30m ago•0 comments

Red Lobster's Last Gasp

https://www.bloomberg.com/news/features/2026-03-24/red-lobster-turnaround-in-question-as-restaura...
1•herbertl•30m ago•2 comments

$500 GPU outperforms Claude Sonnet on coding benchmarks

https://github.com/itigges22/ATLAS
1•yogthos•30m ago•0 comments