palmfacehn•21h ago
Is this a typo? Maybe 5e-4 for pretraining? Otherwise this goes against all the intuition I have around learning rates and catastrophic forgetting (a smaller learning rate causing knowledge degradation).
pact_inference•21h ago
Your intuition is sound, but my fingers are not.
palmfacehn•20h ago
The model might excel at creating character sheets once you define a schema. From there you can validate the generated sheets against known lore. You could combine the storytelling from the LLM with the formalized character schema to create campaigns. I'm not an expert here, but I suspect you might try asking the model to translate an existing fantasy story dataset into a series of narration/dialogue blocks and character sheets.
Without training, I've experimented with similar approaches for item generation using EBNF.
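A minimal sketch of the schema-validation idea: the field names, types, and stat ranges below are purely illustrative, not from any real ruleset or the commenter's actual setup.

```python
# Validate an LLM-generated character sheet against a hand-written schema.
# Field names and the 3-18 stat range are illustrative assumptions.

REQUIRED_FIELDS = {"name": str, "cls": str, "level": int, "stats": dict}
STAT_RANGE = (3, 18)  # classic 3d6 bounds, as an example constraint

def validate_sheet(sheet: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field, typ in REQUIRED_FIELDS.items():
        if field not in sheet:
            problems.append(f"missing field: {field}")
        elif not isinstance(sheet[field], typ):
            problems.append(f"{field} should be {typ.__name__}")
    for stat, value in sheet.get("stats", {}).items():
        if not (STAT_RANGE[0] <= value <= STAT_RANGE[1]):
            problems.append(f"{stat}={value} outside {STAT_RANGE}")
    return problems

sheet = {"name": "Tharn", "cls": "ranger", "level": 3,
         "stats": {"str": 14, "dex": 21}}
print(validate_sheet(sheet))  # dex=21 is flagged as out of range
```

The same check could feed back into generation: reject and re-sample sheets that fail, or surface the problem list to the model for a repair pass.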
pact_inference•20h ago
Definitely! I'm going to start with instruction tuning it for basic question answering, and then add tools to allow it to search the markdown source to cite answers to rules questions. I think adding some dice tooling for proper character sheet creation would be an awesome task to test as well. I'm actually thinking a lot about what tasks I could try that are "trivially" programmatically verifiable in their correctness for stuff like GRPO, so I'm definitely going to use that idea.
> You could combine the storytelling from the LLM with the formalized character schema to create campaigns. I'm not an expert here, but I suspect you might try asking the model to translate an existing fantasy story dataset into a series of narration/dialogue blocks and character sheets.
I think probably late this year I'll be able to work on that sort of thing. There's a really interesting approach to story generation here: https://arxiv.org/abs/2503.22828, but working out how to translate it into campaign-relevant structured objects and a "reward" will take some experimentation.