Where are you going with this?
This makes it hard to tell what it's actually about from a technical/AI-researcher perspective.
Looking for meaning where there is none.
I don't know!
Adding code now.
Is this clearer, at least for the initial 'word set' generation? I can add it to the repo if so:
Concept:
The system predicts and generates words based on Fibonacci distances - instead of looking at the next word or previous word, it looks at words that are 2, 3, 5, 8, 13, 21, etc. positions away (following the Fibonacci sequence).
Key Components
1. Training Phase
Takes a text file and extracts all words, then builds two prediction models (sketched in code below):
Forward model: "If I see word X at position N, what word appears at position N+2, N+3, N+5, N+8, etc.?"
Backward model: "If I see word X at position N, what word appeared at position N-2, N-3, N-5, N-8, etc.?"
2. Generation Phase
Starts with seed words (user input). For each seed word, it predicts what should come before and after using Fibonacci distances.
Uses bidirectional validation: a word is only chosen if it's probable in BOTH forward and backward directions
This attempts to create a more coherent, contextually consistent text.
It then runs multiple passes where generated words become new starting points for further generation, creating richer, more developed output. The words with the strongest association values become the final generation set of available words.
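In rough, illustrative Python, the training phase might look something like this (the names are mine for this sketch, not necessarily what's in the repo):

    from collections import Counter, defaultdict

    FIB_DISTANCES = [2, 3, 5, 8, 13, 21]

    def train(words):
        """Build forward/backward co-occurrence counts at Fibonacci offsets."""
        # forward[word][d] counts the words seen d positions after `word`;
        # backward[word][d] counts the words seen d positions before it.
        forward = defaultdict(lambda: defaultdict(Counter))
        backward = defaultdict(lambda: defaultdict(Counter))
        for i, word in enumerate(words):
            for d in FIB_DISTANCES:
                if i + d < len(words):
                    forward[word][d][words[i + d]] += 1
                if i - d >= 0:
                    backward[word][d][words[i - d]] += 1
        return forward, backward

So forward["the"][2] ends up as a count table of the words seen two positions after "the".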
So for example forward[the][2] is a list of words that can come 2 places after "the"? Either with duplicates for more likely words, or with probabilities that you can sample, like a Markov model.
Or is the "prediction model" some sort of neural network, or something else?
When you say a word is only chosen if it's probable in both the forward and backward direction, what does that mean?
I still can't see any code in your repo.
1. Yes — that’s exactly right. It counts. If "Hacker" appeared 2 places before "News" multiple times, the counts would reflect that.
Later, when generating, these counts are turned into probabilities by normalising (dividing by the total).
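In sketch form (again illustrative, not the exact repo code):

    def to_probabilities(counter):
        """Turn raw co-occurrence counts into a probability distribution."""
        total = sum(counter.values())
        return {word: count / total for word, count in counter.items()}

    # e.g. if forward["the"][2] has counts {"quick": 3, "lazy": 1},
    # this gives {"quick": 0.75, "lazy": 0.25}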
2. So I think this part is a Fibonacci-structured Markov-like model (not a neural network, as far as I can tell).
3.
> When you say a word is only chosen if it's probable in both the forward and backward direction, what does that mean?
This is the key part, potentially.
When generating, the script does this:
Forward model: “Given seed word A, what words appear fib_distance ahead?” → gives P_forward(B | A)
Backward model: “Given candidate word B, what words appear fib_distance behind?” → gives P_backward(A | B)
It then checks both directions.
If word B is predicted forward and backward, it multiplies the probabilities. If a word only shows up in the forward direction but never appears in backward training data (or vice versa), it gets discarded.
It’s a kind of bidirectional filter to avoid predictions that only hold in one direction.
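As a rough sketch of that check (illustrative names; the real script may differ):

    def score_candidates(p_forward, p_backward_by_candidate):
        """Combine forward and backward probabilities for each candidate word.

        p_forward: candidate -> P_forward(candidate | seed)
        p_backward_by_candidate: candidate -> P_backward(seed | candidate)
        """
        scores = {}
        for candidate, p_fwd in p_forward.items():
            p_bwd = p_backward_by_candidate.get(candidate, 0.0)
            if p_bwd > 0.0:  # discard words that only hold in one direction
                scores[candidate] = p_fwd * p_bwd
        return scores

    # Example: "news" is kept (seen in both directions), "quick" is discarded:
    # score_candidates({"news": 0.6, "quick": 0.4}, {"news": 0.5})
    # -> {"news": 0.3}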
I'm learning a lot of these words as I go, so questions like these are really helpful for me - thanks.
>Output: "the text you posted is nonsense about sentences that syntactically are not lexical"
This looks like just a Markov chain.
Seems like the idea is to sample words at Fibonacci intervals and use those for generating new text. Obviously these words will be roughly related. If the author is really just Monte Carlo generating words based on these frequency tables, then you would obviously get roughly related words as output, but with worse coherence due to the Fibonacci gaps. The author also doesn't seem to have any clue about what is happening, or any way of evaluating the results.
There is also no code? Why did this make the front page?
For creative writing, it's true that the "most common word based on your personal corpus for this input" could be useful as a hint input in an LLM-powered creative text UI like loom: https://github.com/socketteer/loom
https://github.com/henrygabriels/FMLLM/blob/main/improving_l...
This compares the impact on creative writing tasks of appending Fib-interval 'word clouds' to prompts, versus appending randomly-selected 'word clouds', versus appending no words at all.
Checking semantic coherence between Fib intervals vs random intervals is a super interesting idea, thank you. I'll make this priority 1, because you're right - that changes everything.
And yes, I think the sentence generation is the least interesting part; as other commenters have pointed out, it's basically a Markov chain (a term I'd not heard before) + sentencetransformers. Generating those sentences didn't work with a large volume of available words, however; the word set was capped at 50 available words plus stop words. I'm not sure if this changes anything.
I didn't want to oversell the sentence-generation aspect, but it seems I may have. I did state twice in the doc "I do think sometimes the human mind sees what it wants to see, especially with language", but I should have definitely made this more clear!
On 'how does the punctuation model handle longer sequences', I depunctuated and repunctuated your comment:
>i think the coherence might be coming from the filtering not the fibonacci intervals. my thinking is that the process looks like this your fibonacci script, finds a big list of candidate words. your grammar rules, arrange them. sentencetransformers aggressively filters this list and picks the single most semantically coherent sentence, so is it possible that sentencetransformers is doing all the heavy lifting for creating coherence have you ever looked at the raw unfiltered candidate wordssentences to see if they have any coherence on their own on the french, toast. example, could this be a case of language pareidoliaeg seeing faces, in clouds. the model selects, piney connubial produit because its math is the closest to the input your brain. being a great pattern matcher creates a story, to connect the words. so is the meaning, actually being generated, by the fibonacci part or is it just being found by the filter and then interpreted by you with the punctuation model im guessing its just learning simple. surfacelevel patterns right like it learns text. contains but place. comma before but how does it handle complex sentences, that dont follow. the simple. patterns in your 4mb of training data. does that break down the comparison, to bert seems off because theyre solving the problem. in fundamentally different ways, tofigure out if the fibonacci part is actually meaningful have you thought about running some controls. for example, what happens if you replace the fibonacci intervals. with random. intervals. does the quality of the final filtered output get worse what if you just looked at the raw word. lists themselves. is a list generated, by your fibonacci model measurably more coherent than a list of words. just pulled randmly from the text.
It doesn't have any punctuation characters more 'complex' than what you see in that output, hence its smashing 'wordssentences' together. I guess a next step is to add those characters and see the results.
I'm going to come back to your comment, and respond to more of your points, but I also want to reply to some of the others. I really appreciate the extensive input!
gabriel666smith•2h ago
I'm not sure if this is meaningful, and - if anyone on here is interested - I could use some help figuring out what's going on.
I was invited to repost this by the mods' (thank you!) second-chance system.
In the meantime, I've added to the repo a small study I did. This study seems to initially indicate that appending Fib words generated by the model to a prompt quite drastically improves LLM output on creative writing tasks.
Again, I'd love to know if anyone could take this thing further.
vessenes•1h ago
If that’s all you did, then I think you’ll probably benefit more from just publishing the code than holding it back. Check out, for instance, lucidrains' (Phil Wang) repositories on GitHub to see the speed at which a full academic paper is turned into a Python codebase for replication.
Anyway, I suggest you add a little code snippet illustrating the key point, or just confirm on my q. I think it would be fun to train a larger model!
gabriel666smith•55m ago
> Are you saying the training task is to ask for the (fib_i)th token rather than the next token?
Yes, functionally - I explained in more detail in another comment.
I'm not sure which is the key point (sort of what I'm trying to work out), but I'll get the model-generation code into the repo. Is that the best thing for you?