Show HN: TheorIA – An Open Curated Physics Dataset (Equations,Explanations,JSON)

https://theoria-dataset.github.io/theoria-dataset/
9•ManuelSH•7mo ago
We’re building TheorIA, an open, high-quality dataset of theoretical physics results (equations, derivations, definitions, and explanations), all in structured, machine- and human-readable JSON.

Why? Physics is rich with beautiful, formal results — but most of them are trapped in PDFs, LaTeX, or lecture notes. That makes it hard to:

- train symbolic/physics-aware ML models,

- build derivation-checking tools,

- or even just teach physics interactively.

TheorIA fills that gap. Each entry includes:

- A result name (e.g., Lorentz transformations)

- Clean equations (AsciiMath)

- A straightforward step-by-step derivation with reasoning

- Symbol definitions and assumptions

- Programmatic validation using sympy

- References, arXiv-style domain tags, and contributor metadata

Everything is in open, self-contained JSON files. No scraping, no PDFs, just clear structured data for physics learners, teachers, and ML devs.
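To make the entry structure above concrete, here is a minimal sketch of what one entry and a validation step might look like. The field names (`result_name`, `equations`, `definitions`, `assumptions`, `domain`) and the helper functions are assumptions made for illustration, not the project's actual schema. The post says validation is done with sympy; this sketch swaps in a plain numeric spot check using only the standard library, so it shows the idea rather than the project's method.

```python
import json
import math

# Hypothetical entry layout: the field names are assumptions, not the real schema.
ENTRY_JSON = """
{
  "result_name": "Lorentz transformations",
  "equations": [
    "x' = gamma (x - v t)",
    "t' = gamma (t - v x / c^2)"
  ],
  "definitions": {
    "gamma": "1 / sqrt(1 - v^2 / c^2)",
    "v": "relative velocity of the two frames",
    "c": "speed of light"
  },
  "assumptions": ["inertial frames", "motion along the x axis"],
  "domain": "physics.class-ph"
}
"""

REQUIRED_FIELDS = {"result_name", "equations", "definitions", "assumptions", "domain"}


def missing_fields(entry: dict) -> set:
    """Return the required fields that the entry does not provide."""
    return REQUIRED_FIELDS - set(entry)


def lorentz(x: float, t: float, v: float, c: float = 1.0) -> tuple:
    """Apply the transformation written in the entry's equations."""
    gamma = 1.0 / math.sqrt(1.0 - v**2 / c**2)
    return gamma * (x - v * t), gamma * (t - v * x / c**2)


def interval(x: float, t: float, c: float = 1.0) -> float:
    """Spacetime interval c^2 t^2 - x^2, which the transformation should preserve."""
    return c**2 * t**2 - x**2


entry = json.loads(ENTRY_JSON)

# Numeric spot check at one sample point (units with c = 1, v = 0.6 c).
x_new, t_new = lorentz(x=2.0, t=3.0, v=0.6)
is_invariant = math.isclose(interval(x_new, t_new), interval(2.0, 3.0))

print(missing_fields(entry), is_invariant)
```

A symbolic check would cover all values of `x`, `t`, and `v` at once, which is presumably why the project uses sympy; a single sample point only catches gross errors such as a wrong sign.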

Contributors Wanted: We’re tiny right now and trying to grow. If you’re into physics or symbolic ML:

- Add an entry (any result you love)

- Review others' derivations

- Build tools on top of the dataset

GitHub: https://github.com/theoria-dataset/theoria-dataset/

Licensed under CC-BY 4.0, and we welcome educators, students, ML people, or just anyone who thinks physics deserves better data.

Comments

somethingsome•7mo ago
There are only 3 entries, am I correct?
ManuelSH•7mo ago
Yes, we are at a very early stage. We are looking for other physics experts to help grow it.
somethingsome•7mo ago
I like the idea of having a dataset for physics, but those entries are very basic. Most of physics involves very complicated maths, and it will be difficult to write an entry for much of it.

For example, imagine the entry for the standard equation: should the whole derivation and symbolic implementation be done as a single entry? It will be difficult to split it into logical entries that reference each other, and many physical ideas are fundamentally different, leading to divergences.

I have the impression that it should be easier to just parse reference books and format each paragraph/section as an entry, and maybe build a graph. (considering the reference book as authoritative on the subject)
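As a rough illustration of entries that reference each other and form a graph, here is a small sketch. The entry identifiers and the `DEPENDS_ON` mapping are made up for the example; nothing here reflects how the dataset is actually organized. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+) to order entries so that each one comes after the results it builds on.

```python
from graphlib import TopologicalSorter

# Hypothetical entry identifiers, each mapped to the entries it depends on.
DEPENDS_ON = {
    "lorentz_transformations": set(),
    "time_dilation": {"lorentz_transformations"},
    "length_contraction": {"lorentz_transformations"},
    "relativistic_velocity_addition": {"lorentz_transformations"},
}


def derivation_order(depends_on: dict) -> list:
    """Order entries so each one appears after everything it builds on."""
    return list(TopologicalSorter(depends_on).static_order())


order = derivation_order(DEPENDS_ON)
print(order)
```

If two entries referenced each other in a loop, `static_order()` would raise `graphlib.CycleError`, which is one way such a graph could flag circular derivations.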

ManuelSH•7mo ago
I guess you mean the Lagrangian of the Standard Model… which, I agree, would be daunting… although there is no limit in a JSON file…

The idea of automatically parsing books is very nice and possibly faster, but note that:

- there are already various datasets of physics papers and similar content

- the result would be quite different from what we intend here, which is to have a high quality dataset of physics results with clear derivations (whenever a derivation exists)

Maybe we can still use your idea to achieve the last point in some way… maybe there is a book that is already formatted as the dataset and we could use it as a starting point. But I don’t know any.

BrandiATMuhkuh•7mo ago
This is some cool work.

Not sure if it fits, but I still have ~20k curated step-by-step solutions for mathematics (pedagogical math) "lying" around from my previous startup. They are all hand curated and could even be used for fine-tuning or so.

Here are some details: The dataset has 20,600 Abstract Exercises, which turn into 1,193,958 Concrete Exercises.

An Abstract Exercise looks like this: a + b = c

A Concrete Exercise looks like this: 2 + 3 = 5

Total compiled file size (JSONL): 11.6 GB

And here is an explorer to see some of the data https://curriculum.amy.app/ToM
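For readers wondering how one Abstract Exercise can turn into many Concrete Exercises, here is a minimal sketch. It assumes the expansion works by substituting numbers for the variables in a template; the comment does not say how the real dataset was generated, and the function names `concretize` and `addition_exercises` are invented for this example.

```python
import itertools
import re


def concretize(template: str, values: dict) -> str:
    """Replace each single-letter variable in the template with its value."""
    return re.sub(r"\b[a-z]\b", lambda m: str(values[m.group(0)]), template)


def addition_exercises(template: str, operands: range) -> list:
    """Expand one abstract addition exercise into every concrete one over the operands."""
    return [
        concretize(template, {"a": a, "b": b, "c": a + b})
        for a, b in itertools.product(operands, repeat=2)
    ]


exercises = addition_exercises("a + b = c", range(1, 4))
print(len(exercises), exercises[:3])
```

With three possible operands the single template already yields nine concrete exercises, which gives a feel for how roughly 20k templates could grow past a million.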

ManuelSH•7mo ago
Very nice! Maybe you could put this dataset in a repository like GitHub, Kaggle, or Hugging Face if you are not doing anything with it. It could be helpful for training models.
