
Starting from scratch: Training a 30M Topological Transformer

https://www.tuned.org.uk/posts/013_the_topological_transformer_training_tauformer
44•tuned•2h ago

Comments

lostmsu•1h ago
Comparison with a vanilla transformer of the same size/FLOPs budget?
Lerc•1h ago
I'm not sure that is the right calculation.

Provided the FLOPs are not prohibitive, output quality per model byte might be better. In general people run the largest model they can.

I certainly think trading speed for quality at the same size is worth looking at, especially if it uses methods that can benefit from others' efforts to improve speed in general.

That said, the performance difference at 30M may not be representative of the performance difference at 30B.

There are probably a lot of really good ideas out there waiting for someone to drop a few million in training to reveal how good they are on large sizes.

lostmsu•1h ago
So no comparison?
keyle•1h ago
Does this make any sense to anyone?
kannanvijayan•59m ago
I think this is an attempt to try to enrich the locality model in transformers.

One of the weird things you do in transformers is add a position vector that captures the distance between the token being attended to and some other token.

This is obviously not powerful enough to express non-linear relationships - like graph relationships.

This person seems to be experimenting with pre-processing the input token set to linearly reorder it by some other heuristic that might map more closely to the actual underlying relationships between tokens.
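One way to picture that reordering guess, purely as an illustration (the chain graph and the use of the Laplacian's Fiedler vector here are my assumptions, not the post's actual method): sort tokens by a spectral coordinate so graph-adjacent tokens end up near each other in the 1D order that a position encoding sees.

```python
import numpy as np

def spectral_reorder(adj):
    """Reorder tokens by sorting the Fiedler vector (second-smallest
    eigenvector) of the graph Laplacian, so tokens adjacent in the
    graph land near each other in the linear sequence. Illustrative
    sketch only; not taken from the post."""
    lap = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(lap)
    return np.argsort(vecs[:, 1])

# a 4-token chain graph: 0-1-2-3
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
order = spectral_reorder(adj)  # recovers the chain order, possibly reversed
```

For a chain the Fiedler vector is monotone along the path, so the sort recovers the chain order up to reversal; on a richer graph it gives a 1D layout that roughly respects graph locality.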

liteclient•59m ago
it makes sense architecturally

they replace dot-product attention with topology-based scalar distances derived from a Laplacian embedding - that effectively reduces attention scoring to a 1D energy comparison, which can save memory and compute
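A minimal numpy sketch of the mechanism as this comment describes it, not the post's actual code (the choice of graph, the Fiedler eigenvector, and all names here are assumptions): each token gets one scalar coordinate from a Laplacian eigenvector, and the attention score is the negative 1D distance between coordinates instead of a d-dimensional dot product.

```python
import numpy as np

def fiedler_coords(adj):
    """One scalar 'topological' coordinate per token, taken from the
    graph Laplacian's Fiedler vector. Hypothetical stand-in for the
    post's Laplacian embedding."""
    lap = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(lap)
    return vecs[:, 1]

def distance_attention(tau, values):
    """Attention where the score is a 1D energy comparison: negative
    absolute distance between scalar coordinates, rather than a
    query/key dot product."""
    scores = -np.abs(tau[:, None] - tau[None, :])    # (n, n)
    scores -= scores.max(axis=-1, keepdims=True)     # softmax stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ values

# 4-token chain graph: 0-1-2-3
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
out = distance_attention(fiedler_coords(adj), np.eye(4))
```

The score matrix costs one scalar subtraction per pair instead of a length-d dot product, which is where the claimed memory/compute saving would come from.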

that said, i’d treat the results with a grain of salt given there is no peer review, and benchmarks are only on a 30M-parameter model so far

reactordev•6m ago
Yup, keyword here is “under the right conditions”.

This may work well for their use case but fail horribly in others without further peer review and testing.

ashirviskas•34m ago
I wonder what would happen if we just crammed more into the "tokens"? I am running an experiment replacing discrete tokens with embeddings + a small byte encoder/decoder. That way you can use the embedding space much more efficiently and have it carry much more nuance.

Experiments I want to build on top of it:

1. Adding LSP context to the embeddings - that way the model could _see_ the syntax better, closer to how we use IDEs, and would not need to read/grep 25k lines just to find where something is used.

2. Experiments with different "compression" ratios. Each embedding could encode a different number of bytes, and we would not rely on a huge static token dictionary.

I'm aware that papers exist that explore these ideas, but so far no popular/good open source models employ this. Unless someone can prove me wrong.
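A toy sketch of the embeddings-plus-byte-codec idea. Everything here is made up for illustration (the chunk size, embedding dim, and linear encoder/decoder standing in for the "small byte encoder/decoder"); a real version would train small neural nets for both directions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 4, 32   # bytes per embedding and embedding dim (made-up sizes)

# linear maps stand in for the small byte encoder/decoder
enc = rng.normal(0.0, 0.02, (K * 256, D))   # one-hot bytes -> embedding
dec = rng.normal(0.0, 0.02, (D, K * 256))   # embedding -> byte logits

def encode(chunk):
    """K raw bytes -> one continuous embedding (replacing a token id)."""
    onehot = np.zeros(K * 256)
    for i, b in enumerate(chunk):
        onehot[i * 256 + b] = 1.0
    return onehot @ enc

def decode(emb):
    """Embedding -> K predicted bytes via per-position argmax."""
    logits = (emb @ dec).reshape(K, 256)
    return bytes(int(i) for i in logits.argmax(axis=1))

emb = encode(b"code")    # one embedding covering 4 bytes
restored = decode(emb)   # untrained weights, so not yet a round-trip
```

The point of the sketch: the model's sequence length shrinks by K, and the embedding is free to carry extra channels (e.g. LSP-derived context) that a fixed token dictionary can't.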

geoffbp•27m ago
I dug into this a bit (with AI ofc) and it spat this out. I found it an easy way to visualise and start to understand:

> Standard AI models (like GPT-4) treat data using Global Geometry. They imagine every word as a point floating in a massive, flat, high-dimensional room. To see how two words relate, they draw a straight line between them.

> Local Topology changes the "room" into a landscape (a manifold). Instead of a flat void, the data exists on a curved surface that has hills, valleys, and paths.

xtiansimon•2m ago
What is a "high-dimensional room"? A "room" is by definition three-dimensional insofar as we're using metaphor for description. Adding the "high-dimensional" modifier does little for me, since the only visualizable higher-dimensional cube is the tesseract, which still leaves you at 4D.

The presented counterpoint to this metaphor has the "room" change into a "landscape". The room is a "flat void" compared to a landscape with "hills, valleys, and paths". None of these landscape features evoke higher dimensionality in my imagination. Certainly not in the way, say, the metaphor of the "coastline" of Great Britain does when discussing the unusual properties of a fractal.

These moves don't shift my railroad mind from one track onto another. So I wonder, if a metaphoric usage is not in some way universal, how can it be instructive?

Starting from scratch: Training a 30M Topological Transformer

https://www.tuned.org.uk/posts/013_the_topological_transformer_training_tauformer
44•tuned•2h ago•11 comments

What is Plan 9?

https://fqa.9front.org/fqa0.html#0.1
40•AlexeyBrin•44m ago•4 comments

The guide to real-world EV battery health

https://www.geotab.com/blog/ev-battery-health/
22•giuliomagnifico•1h ago•9 comments

Command-line Tools can be 235x Faster than your Hadoop Cluster (2014)

https://adamdrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html
84•tosh•5h ago•49 comments

ThinkNext Design

https://thinknextdesign.com/home.html
146•__patchbit__•7h ago•63 comments

Show HN: Figma-use – CLI to control Figma for AI agents

https://github.com/dannote/figma-use
14•dannote•8h ago•5 comments

Iconify: Library of Open Source Icons

https://icon-sets.iconify.design/
354•sea-gold•7h ago•37 comments

Erdős 281 solved with ChatGPT 5.2 Pro

https://twitter.com/neelsomani/status/2012695714187325745
228•nl•10h ago•184 comments

Milk-V Titan: A $329 8-Core 64-bit RISC-V mini-ITX board with PCIe Gen4x16

https://www.cnx-software.com/2026/01/12/milk-v-titan-a-329-octa-core-64-bit-risc-v-mini-itx-mothe...
36•fork-bomber•6d ago•15 comments

Profession by Isaac Asimov

https://www.abelard.org/asimov.php
120•bkudria•11h ago•21 comments

Keystone (YC S25) Is Hiring

1•pablo24602•2h ago

ASCII characters are not pixels: a deep dive into ASCII rendering

https://alexharri.com/blog/ascii-rendering
1063•alexharri•1d ago•123 comments

A free and open-source rootkit for Linux

https://lwn.net/SubscriberLink/1053099/19c2e8180aeb0438/
23•jwilk•4h ago•4 comments

jQuery 4

https://blog.jquery.com/2026/01/17/jquery-4-0-0/
432•OuterVale•9h ago•134 comments

Show HN: GibRAM an in-memory ephemeral GraphRAG runtime for retrieval

https://github.com/gibram-io/gibram
35•ktyptorio•7h ago•4 comments

The recurring dream of replacing developers

https://www.caimito.net/en/blog/2025/12/07/the-recurring-dream-of-replacing-developers.html
519•glimshe•23h ago•402 comments

The longest Greek word

https://en.wikipedia.org/wiki/Lopado%C2%ADtemacho%C2%ADselacho%C2%ADgaleo%C2%ADkranio%C2%ADleipsa...
146•firloop•10h ago•65 comments

Consent-O-Matic

https://github.com/cavi-au/Consent-O-Matic
116•throawayonthe•4h ago•69 comments

We put Claude Code in Rollercoaster Tycoon

https://labs.ramp.com/rct
485•iamwil•5d ago•268 comments

Kip: A programming language based on grammatical cases of Turkish

https://github.com/kip-dili/kip
207•nhatcher•17h ago•62 comments

The grab list: how museums decide what to save in a disaster

https://www.economist.com/1843/2025/11/21/the-grab-list-how-museums-decide-what-to-save-in-a-disa...
30•surprisetalk•4d ago•2 comments

No knives, only cook knives

https://kellykozakandjoshdonald.substack.com/p/no-knives-only-cook-knives
80•firloop•14h ago•37 comments

How London cracked mobile phone coverage on the Underground

https://www.ianvisits.co.uk/articles/how-london-finally-cracked-mobile-phone-coverage-on-the-unde...
101•beardyw•5d ago•94 comments

Play chess via Slack DMs or SMS using an ASCII board

https://github.com/dvelton/dm-chess
17•dustfinger•6d ago•5 comments

When Sysadmins Ruled the Earth (2006)

https://craphound.com/overclocked/Cory_Doctorow_-_Overclocked_-_When_Sysadmins_Ruled_the_Earth.html
22•b112•2h ago•1 comment

Poking holes into bytecode with peephole optimisations

https://xnacly.me/posts/2026/purple-garden-first-optimisations/
5•ibobev•2d ago•0 comments

Raising money fucked me up

https://blog.yakkomajuri.com/blog/raising-money-fucked-me-up
297•yakkomajuri•19h ago•104 comments

If you put Apple icons in reverse it looks like someone getting good at design

https://mastodon.social/@heliographe_studio/115890819509545391
624•lateforwork•14h ago•234 comments

Five Practical Lessons for Serving Models with Triton Inference Server

https://talperry.com/en/posts/genai/triton-inference-server/
17•talolard•4d ago•2 comments

Xous Operating System

https://xous.dev/
148•eustoria•3d ago•57 comments