
OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
529•klaussilveira•9h ago•146 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
859•xnx•15h ago•518 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
72•matheusalmeida•1d ago•13 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
180•isitcontent•9h ago•21 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
182•dmpetrov•10h ago•79 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
294•vecti•11h ago•130 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
69•quibono•4d ago•12 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
343•aktau•16h ago•168 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
338•ostacke•15h ago•90 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
434•todsacerdoti•17h ago•226 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
237•eljojo•12h ago•147 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
13•romes•4d ago•2 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
373•lstoll•16h ago•252 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
6•videotopia•3d ago•0 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
41•kmm•4d ago•3 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
14•denuoweb•1d ago•2 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
220•i5heu•12h ago•162 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
91•SerCe•5h ago•75 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
62•phreda4•9h ago•11 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
162•limoce•3d ago•82 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
38•gfortaine•7h ago•10 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
127•vmatsiiako•14h ago•53 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
18•gmays•4h ago•2 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
261•surprisetalk•3d ago•35 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1029•cdrnsf•19h ago•428 comments

FORTH? Really!?

https://rescrv.net/w/2026/02/06/associative
55•rescrv•17h ago•18 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
83•antves•1d ago•60 comments

WebView performance significantly slower than PWA

https://issues.chromium.org/issues/40817676
18•denysonique•6h ago•2 comments

Zlob.h: 100% POSIX- and glibc-compatible globbing lib that is faster and better

https://github.com/dmtrKovalenko/zlob
5•neogoose•2h ago•1 comments

I'm going to cure my girlfriend's brain tumor

https://andrewjrod.substack.com/p/im-going-to-cure-my-girlfriends-brain
109•ray__•6h ago•54 comments

We collected 10k hours of neuro-language data in our basement

https://condu.it/thought/10k-hours
117•nee1r•2mo ago

Comments

ArjunPanicksser•2mo ago
Makes sense that CL ends up being the best for recruiting first-time participants. Curious what other things you tried for recruitment and how useful they were?
n7ck•2mo ago
The second most useful by far is Indeed, where we post an internship opportunity for participants interested in doing 10 sessions over 10 weeks. Other things that work pretty well are asking professors to send out emails to students at local universities, putting up ~300-500 fliers (mostly around universities and public transit), and posting on Nextdoor. We also texted a lot of group chats, posted on LinkedIn, and gave out fliers and the signup link to kind of everyone we talked to in cafes and similar. We take on some participants as ambassadors as well, and pay them to refer their friends.

We tried Google/Facebook/Instagram ads, and we tried paying for some video placements. Basically none of the explicit advertising worked at all, and it wasn't worth the money. Though for what it's worth, none of us are experts in advertising, so we might have been going about it wrong -- we didn't put loads of effort into iterating once we realized it wasn't working.

mishajw•2mo ago
Interesting dataset! I'm curious what kind of results you would get with just EEG, compared to multiple modalities? Why do multiple modalities end up being important?
n7ck•2mo ago
EEG has very good temporal resolution but quite bad spatial resolution, and other modalities have different tradeoffs.
g413n•2mo ago
what's the basis for converting between hours of neural data and number of tokens? is that counting the paired text tokens?
rio-popper•2mo ago
edit: oops sorry misread - the neural data is tokenised by our embedding model. the number of tokens per second of neural data varies and depends on the information content.
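
A rough sketch of what variable-rate neural tokenization could look like (hypothetical code, not the actual pipeline -- the real embedding model is learned end-to-end):

    import numpy as np

    # Hypothetical: emit more tokens per second when a window carries more
    # signal, fewer when it is quiet. A learned embedding model would replace
    # the crude variance proxy and the mean-pooled "embedding" below.
    def tokenize_neural(eeg, sfreq=256, win_s=0.5, base_tokens=1, max_tokens=4):
        """eeg: (channels, samples) array -> list of token vectors."""
        win = int(sfreq * win_s)
        tokens = []
        for start in range(0, eeg.shape[1] - win + 1, win):
            chunk = eeg[:, start:start + win]
            # crude "information content" proxy: normalized per-window variance
            info = np.clip(chunk.var() / (eeg.var() + 1e-8), 0.0, 1.0)
            n = base_tokens + int(info * (max_tokens - base_tokens))
            for piece in np.array_split(chunk, n, axis=1):
                tokens.append(piece.mean(axis=1))  # stand-in for a learned embedding
        return tokens

    rng = np.random.default_rng(0)
    print(len(tokenize_neural(rng.standard_normal((64, 256 * 10)))), "tokens for 10s")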
n7ck•2mo ago
Hey I'm Nick, and I originally came to Conduit as a data participant! After my session, I started asking questions about the setup to the people working there, and apparently I asked good questions, so they hired me.

Since I joined, we've gone from <1k hours to >10k hours, and I've been really excited by how much our whole setup has changed. I've been implementing lots of improvements to the whole data pipeline and the operations side. Now that we train lots of models on the data, the model results also inform how we collect data (e.g. we care a lot less about noise now that we have more data).

We're definitely still improving the whole system, but at this point, we've learned a lot that I wish someone had told us when we started, so we thought we'd share it in case any of you are doing human data collection. We're all also very curious to get any feedback from the community!

internet_points•2mo ago
I thought that kind of career change only happened in The Sims :-)
n7ck•2mo ago
hahahah tell me about it!
SubiculumCode•2mo ago
The article seems to indicate fMRI as a modality, but I know that is not generally an affordable resource. Was it an ultra-low-field set?
paparicio•1mo ago
Good story!

I have dreamed many times about the same story, but with Apple or Epic Games. But they have millions of human beings testing their products FOR FREE in every part of the world, hahahaha

Gormisdomai•2mo ago
The example sentences generated “only from neural data” at the top of this article seem surprisingly accurate to me -- like, not exact matches, but much better than what I would expect even from 10k hours:

“the room seemed colder” -> “there was a breeze even a gentle gust”

ninapanickssery•2mo ago
Yeah, agreed
jcims•2mo ago
Exactly. And honestly both this example and the one about the woman seemed to be what I would actually think/feel vs what I say.

Very interesting!

CobrastanJorji•2mo ago
Tangential to your point, if you collect 10,000 hours of brain scanning in exactly one damp basement, I wonder if perhaps the model would become very, very specialized for all of the flavors of "this room seems colder."
rio-popper•2mo ago
For the record, it was two basements -- we moved offices in the middle -- and a bigger issue was actually overheating. But your point is basically right! The model is a lot better at certain kinds of ideas than others. Particularly concerning was that the first cluster I noticed getting good was all the different variations of 'the headset is uncomfortable/heavy', etc. But this makes sense -- what participants talk about has a lot to do with what kinds of ideas the model can pick up, and this was more or less what we expected.
ag8•2mo ago
This is a cool setup, but naively it feels like it would require hundreds of thousands of hours of data to train a decent generalizable model that would be useful for consumers. Are there plans to scale this up, or is there reason to believe that tens of thousands of hours are enough?
n7ck•2mo ago
Yeah, the way we trained the embedding model focused a lot on making it as data-efficient as possible, since this is such a data-limited regime. So based on (early) scaling results, I think it'll be closer to 50-70k hours, which we should be able to get in the next few months now that we've already scaled up a lot.

That said, the way to 10-20x data collection would be to open a couple of other data collection centers outside SF, in high-population cities. Right now, there's a big advantage in having the data collection totally in-house: because we're so small, it's much easier to debug and improve. But now that we've mostly worked out the process, it should be straightforward to replicate the entire ops/data pipeline in 3-4 parallel data collection centers.
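
For a sense of where a "50-70k hours" number could come from: fit a saturating power law to decoding error vs. dataset size and extrapolate. A sketch with invented numbers (the thread doesn't publish the real curve):

    import numpy as np
    from scipy.optimize import curve_fit

    def power_law(h, a, b, c):
        return a * h ** (-b) + c  # c = irreducible error floor, b = data exponent

    hours = np.array([500.0, 1000, 2500, 5000, 10000])
    error = np.array([0.72, 0.65, 0.55, 0.48, 0.41])  # made-up eval errors

    (a, b, c), _ = curve_fit(power_law, hours, error, p0=(5.0, 0.3, 0.2))
    for h in (20_000, 50_000, 70_000):
        print(f"{h:>6} hours -> predicted error {power_law(h, a, b, c):.3f}")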

nullbyte808•2mo ago
I live nearish to Seattle, in Tacoma. I would be willing to set up a center.
richardfeynman•2mo ago
This is an interesting dataset to collect, and I wonder whether there will be applications for it beyond what you're currently thinking.

A couple of questions: What's the relationship between the number of hours of neurodata you collect and the quality of your predictions? Does it help to get less data from more people, or more data from fewer people?

n7ck•2mo ago
1. The predictions get better with more data - and we don't seem to be anywhere near diminishing returns. 2. The thing we care about is generalization between people. For this, less data from more people is much better.
richardfeynman•2mo ago
I noticed you tracked sessions per person, implying a subset of people have many hours of data collected on them. Are predictions for this subset better than the median?

For a given amount of data, is it better to have more people with less data per person or fewer people with more data per person?

clemvonstengel•2mo ago
Yes, the predictions are much better for people with more hours of data in the training set. Usually, we just totally separate the train and val set, so no individual with any sessions in the train set is ever used for evals. When we instead evaluate on someone with 10+ hours in the train set, predictions get ~20-25% better.

For a given amount of data, whether you want more or less data per person really depends on what you're trying to do. The thing we want is for it to be good at zero-shot, that is, for it to decode well on people who have zero hours in the train set. So for that, we want less data per person. If instead we wanted to make it do as well as possible on one individual, then we'd want way more data from that one person. (So, e.g., when we make it into a product at first, we'll probably finetune on each user for a while)
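
For anyone replicating the eval protocol, a minimal sketch of a subject-disjoint split of the kind described above (hypothetical code; IDs and features are invented):

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    rng = np.random.default_rng(0)
    participant_ids = rng.integers(0, 600, size=1000)  # ~600 people, 1000 sessions
    X = rng.standard_normal((1000, 32))                # per-session features

    splitter = GroupShuffleSplit(n_splits=1, test_size=0.1, random_state=0)
    train_idx, val_idx = next(splitter.split(X, groups=participant_ids))

    # No participant appears on both sides, so val measures zero-shot decoding.
    assert not set(participant_ids[train_idx]) & set(participant_ids[val_idx])
    print(len(train_idx), "train sessions,", len(val_idx), "val sessions")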

richardfeynman•2mo ago
Makes a ton of sense, thanks.

I wonder if there will be medical applications for this tech, for example identifying people with brain or neurological disorders based on how different their "neural imaging" looks from normal.

wiwillia•2mo ago
Really interested in how accuracy improves with the scale of the data set. Non-invasive thought-to-action would be a whole new interaction paradigm.
devanshp•2mo ago
Cool post! I'm somewhat curious whether the data quality scoring has actually translated into better data; do you have numbers on how much more of your data is useful for training vs in May?
rio-popper•2mo ago
The real-time neural-quality checking was the most important thing here. Before we rewrote the backend, 58-64% of participant hours were actually usable data. Now, it's 90-95%.

If you mean the text quality scoring system: when we added that, it improved the amount of text we got per hour of neural data by 30-35%. (That includes the fact that we filter which participants we have return based on their text quality scores.)
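
A minimal sketch of what a real-time usability gate of this kind could look like (thresholds, shapes, and units are invented; the actual backend isn't described):

    import numpy as np

    # Hypothetical: flag windows whose channels look flat (lost contact) or
    # railed/noisy (motion, mains hum), so operators can fix the headset
    # mid-session instead of discarding the hour afterwards.
    def usable(window, flat_uv=0.5, noisy_uv=150.0, max_bad_frac=0.2):
        """window: (channels, samples) in microvolts -> bool."""
        std = window.std(axis=1)
        bad = (std < flat_uv) | (std > noisy_uv)
        return bad.mean() <= max_bad_frac

    rng = np.random.default_rng(0)
    session = [rng.standard_normal((64, 256)) * 20 for _ in range(100)]
    session[10][:40] *= 0.001  # simulate 40 channels losing contact
    print(f"{np.mean([usable(w) for w in session]):.0%} of windows usable")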

rajlego•2mo ago
Did you consider trying to collect data in a much poorer country that still has high quality English? e.g. the Philippines
rio-popper•2mo ago
Yeah we did consider this. For now, there's an advantage to having the data collection in the same building as the whole eng team, but once we hire a couple more engs, I expect we'll just replicate the collection setup in other countries as well
estitesc•2mo ago
Loved watching this unfold in our basement. : )
dang•2mo ago
[under-the-rug stub]

[see https://news.ycombinator.com/item?id=45988611 for explanation]

ClaireBookworm•2mo ago
Yoo this is sick!! Sometimes it might actually just be a data game, so huge props to them for actually collecting all that high-quality data.
ninapanickssery•2mo ago
This is very cool, thanks for writing about your setup in such detail! It’s impressive that you can predict stuff from this noninvasive data. Are there similar existing datasets, or is this the first of its kind?
cpeterson42•2mo ago
Wild world we live in
titzer•2mo ago
I lol'd at the hardware "patch" that kept the software from crashing--removing all but the alpha-numeric keys (!?). Holy cow, you had time to collect thousands of hours of neurotraces but couldn't sanitize the inputs to remove a stray [? That sounds...funky.
NoraCodes•2mo ago
Presumably it's more like an errant Ctrl-C.
clemvonstengel•2mo ago
Yup exactly this. Also Ctrl-W, alt tab, etc.
ricudis•1mo ago
All these issues have already been solved in kiosk setups.
in-silico•2mo ago
It's interesting that the model generalizes to unseen participants. I was under the impression that everyone's brain patterns were different enough that the model would need to be retrained for new users.

Though, I suppose if the model had LLM-like context where it kept track of brain data and speech/typing from earlier in the conversation then it could perform in-context learning to adapt to the user.

clemvonstengel•2mo ago
Basically correct intuition: the model does much better when we give it, e.g., 30 secs of neural data in the lead-up instead of, e.g., 5 secs. My sense is also that it's learning in context: people's neural patterns are quite different, but there's a higher-level generator that lets the model learn in context (or probably multiple higher-level patterns, each of which the model can learn from in context).

We only got any generalization to new users after we had >500 individuals in the dataset, fwiw. There are some interesting MRI studies finding a similar thing: once you have enough individuals in the dataset, you start seeing generalization.
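
A toy sketch of the in-context framing (hypothetical): each training example pairs a long neural lead-up with a short target window, so a sequence model can adapt to the individual on the fly.

    import numpy as np

    def make_examples(tokens, context_len=30, target_len=5):
        """tokens: (T, d) per-second neural tokens -> list of (ctx, tgt) pairs."""
        examples = []
        for t in range(context_len, tokens.shape[0] - target_len + 1):
            ctx = tokens[t - context_len:t]   # e.g. 30s lead-up for adaptation
            tgt = tokens[t:t + target_len]    # window the model must decode
            examples.append((ctx, tgt))
        return examples

    rng = np.random.default_rng(0)
    ex = make_examples(rng.standard_normal((120, 64)))
    print(len(ex), "examples; ctx", ex[0][0].shape, "tgt", ex[0][1].shape)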

asgraham•2mo ago
Really cool dataset! Love seeing people actually doing the hard work of generating data rather than just trying to analyze what exists (I say this as someone who’s gone out of his way to avoid data collection).

Have you played at all with thought-to-voice? Intuitively I’d think EEG readout would be more reliable for spoken rather than typed words, especially if you’re not controlling for keyboard fluency.

clemvonstengel•2mo ago
Yeah we do both text and voice (roughly 70% of data collection is typed, 30% spoken). Partly this is to make sure the model is learning to decode semantic intent (rather than just planned motor movements). Right now, it's doing better on the typed part, but I expect that's just because we have more data of that kind.

It does generalize between typed and spoken, i.e. it does much better on spoken decoding if we've also trained on the typing data, which is what we were hoping to see.

asgraham•2mo ago
Interesting! I imagine speech-related motor artifacts don't help matters either, even if noise starts mattering less at scale.
n7ck•2mo ago
Yeah -- we have the participants use chinrests as well, which reduces head-motion artifacts for typing but less so for speaking (because they have to move their heads for that, of course). So a lot of the data is with them keeping their heads quite still, although the model is becoming much more robust to this over time.
Terretta•2mo ago
> we do both text and voice (roughly 70% of data collection is typed, 30% spoken). Partly this is to make sure the model is learning to decode semantic intent (rather than just planned motor movements)

Both of these modes are incredibly slow thinking. Consciously shifting from thinking in concepts to thinking in words is like slamming on the brakes for a school zone on an autobahn.

I've gathered most people think in words they can "hear in their head", most people can "picture a red triangle" and literally see one, and so on. Many folks who are multi-lingual say they think in a language, or dream in that language, and know which one it is.

Meanwhile, some people think less verbally or less visually, perhaps not verbally or visually at all, and there is no language (words).

A blog post shared here last month discussed a person trying to access this conceptual mode, which he thinks is like "shower thoughts" or physicists solving things in their heads while staring into space, except "under executive function". He described most of his thoughts as words he can hear in his head, with these concepts more like vectors. I agree with that characterization.

I'm curious what % of folks you've scanned may be in this non-word mode, or if the text and voice requirement forces everyone into words.

clemvonstengel•2mo ago
I agree that thinking in words is much slower than thinking in concepts would be -- that's the point of training models like this, so that ideally people can always just think in concepts. That said, we do need to get some kind of ground truth of what they're thinking in order to train the model, so we do need them to communicate that (in words).

One thing that's particularly exciting here is that the model often gets the high-level idea correct, without getting any words correct (as in some of the examples above), which suggests that it is picking up the idea rather than the particular words.

Terretta•2mo ago
> ideally people can always just think in concepts

Are you pursuing an idea of how to help people like this author* access this mode that some of us are always in unless kicked out of it by the need for words?

Very needed right now — the opposite of the YouTube-ization of idea transfer.

It doesn't seem clear this is accessible without other changes in wiring? The inability to "picture" things as visuals seems to swap out for "conceptualizing" things in -- well, I don't have words for this.

An attempt from that essay:

This is not what Hadamard is talking about when he describes the wordless thought of the mathematicians and researchers he has surveyed. Instead, what they seem to be doing is something similar to this subconscious, parallelized search, except they do it in a “tensely” focused way.

The impression I get is that Hadamard loads a question into his mind (either in a non-verbal way, or by reading a mathematical problem that has been written by himself or someone else), and then he holds the problem effortfully centered in his mind. Effortfully, but wordlessly, and without clear visualizations. Describing the mental image that filled his mind while working on a problem concerning infinite series for his thesis, Hadamard writes that his mind was occupied by an image of a ribbon which was thicker in certain places (corresponding to possibly important terms). He also saw something that looked like equations, but as if seen from a distance, without glasses on: he was unable to make out what they said.

I’m not sure what is going on here.

* https://www.henrikkarlsson.xyz/p/wordless-thought

A couple of this author's speculations aren't how I'd say it works when this is one's default mode, but most are in the neighborhood. He comes the closest of anything I've read by people who think the way the author thinks — which seems to be most people.

whatshisface•2mo ago
What's the plan for after this mind reading helmet works reliably?
brovonov•2mo ago
Sell it to an ad agency.
clemvonstengel•2mo ago
We build headsets that let you control your computer directly with your mind. Initially I expect we can get increased bandwidth/efficiency on common tasks (including coding), but I think it gets really exciting when people start designing new software and interaction paradigms with this in mind.
whatshisface•2mo ago
If you want it to be remembered as a revolutionary computer interface, you will have to make sure it is not used in interrogations.
xg15•2mo ago
It's an enormously cool project (and also feels like the next logical thing to do after all the existing modalities)

But it feels eerie to read a detailed story of how they built and improved their setup and what obstacles they encountered, complete with photos, without any mention of who is doing the things we are reading about. There is no mention of the staff or even the founders on the whole website.

I had a hard time judging how large this project even is. The homebuilt booths and trial-and-error workflow sound like a "three-person garage startup", but the bookings schedule suggests a larger team.

(At least there is an author line on that blog post. I had to google the names to get some background on this company.)

You should consider an "about us" page :)

rio-popper•2mo ago
Good point. We're a team of 7 right now (3 engineering, 4 running data collection across shifts). We've been spending ~all our time on the data and model side, so the “About us” page lagged behind, but we’ll add one this week. Appreciate the feedback!
xg15•2mo ago
No question these are the more important things to spend time on. Good luck!
accrual•2mo ago
Very cool project! I had a couple ideas during the read:

* A ceiling-based pulley system could help take the physical load off the users and may allow for increased sensor density. Some large/public VR setups do this.

* I'm sure you considered it, but a double-converting UPS might reduce the noise floor of your sensors and could potentially support multiple booths. Expensive though, and it's already mentioned that data quantity > quality at this stage. Maybe a future fine-tuning step could leverage this.

Cool write up and hope to see more in the future!

rio-popper•2mo ago
We actually did try making a pulley system to reduce weight at one point. The issue was that it moved the headset kind of oddly with respect to the person's head, which reduced sensor contact. One thing we intend to try is a pulley system + a good chinstrap, which might make the headset stay still and let us reduce the weight at the same time. Good ideas!
moffkalast•2mo ago
Your engineers were so preoccupied with whether or not they could, they didn't stop to think if they should.

Those predictions sound good enough to get you CIA funding.

nullbyte808•2mo ago
How well does it work when trained for 100 hours on just one participant? As in a model trained from the ground up for just one person?
n7ck•2mo ago
Currently, there are no participants in our dataset with >100 hrs -- intentionally so; we've been optimizing heavily for diversity of the dataset up to this point. We've explored the idea of fine-tuning on a particular participant's data, and we expect that this will be pretty impactful.
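
A hedged sketch of what that per-participant fine-tuning could look like: start from the population model, freeze the shared trunk, and adapt only a small head on one person's sessions (all names, shapes, and hyperparameters invented):

    import torch
    from torch import nn

    population_model = nn.Sequential(
        nn.Linear(64, 256), nn.ReLU(),  # pretrained trunk (stand-in)
        nn.Linear(256, 512),            # head over a toy 512-token text vocab
    )
    for p in population_model[:2].parameters():
        p.requires_grad = False         # keep the population trunk fixed

    opt = torch.optim.AdamW(population_model[2].parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    neural = torch.randn(32, 64)             # one participant's neural windows
    text_ids = torch.randint(0, 512, (32,))  # paired text-token targets

    for _ in range(100):                     # brief adaptation pass
        opt.zero_grad()
        loss = loss_fn(population_model(neural), text_ids)
        loss.backward()
        opt.step()
    print("fine-tuned loss:", loss.item())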
paparicio•1mo ago
I was one of the individuals who gave his neuro-language data for the mission. Super sick experience!

What you are trying to do is BIG, and I love it. I hope you get to more than 1M in a few months!

Keep pushing team!!!