Domain Adaptation of Base Models + ShadowdarkQA Bench
Published on: 2025-06-15 15:59:17
Investigating the effects of continued pre-training for learning precise mechanical rules of TTRPGs.
There’s a fast way and a slow way to go about developing an autonomous LLM Game Master. The fast way, the agentic way, would probably be to build some MCP servers to act as a harness, providing access to maps and rulesets, then plug the results into a frontier model and see what happens.
I was intending to go that way, developing some environments I could run GRPO on immediately. The idea was to provide an MCP server that allowed rule search, and then come up with combat scenarios that had verifiable outcomes. Those would work both as evals for frontier models and as external rewards for any model being trained, and would certainly get to the final desired result a lot faster.
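To make "verifiable outcomes" concrete, here is a minimal sketch of what such a reward could look like. Everything in it is an assumption for illustration: the `CombatScenario` class, the `outcome_reward` function, and the simplified hit rule (roll plus bonus meets or beats armor class) are hypothetical stand-ins, not the environments I actually had in mind and not a statement of any particular ruleset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CombatScenario:
    """Hypothetical scenario with a single checkable outcome."""
    d20_roll: int
    attack_bonus: int
    armor_class: int

    def expected_outcome(self) -> str:
        # Assumed rule, for illustration only:
        # an attack hits when roll + bonus meets or beats the armor class.
        total = self.d20_roll + self.attack_bonus
        return "hit" if total >= self.armor_class else "miss"


def outcome_reward(scenario: CombatScenario, model_answer: str) -> float:
    """Return 1.0 if the model's final answer matches the outcome, else 0.0."""
    answer = model_answer.strip().lower()
    return 1.0 if answer == scenario.expected_outcome() else 0.0


# 12 + 2 = 14, which is below an armor class of 15, so the attack misses.
scenario = CombatScenario(d20_roll=12, attack_bonus=2, armor_class=15)
print(outcome_reward(scenario, "Miss"))
```

Because the outcome is computed rather than judged, the same function could score a frontier model's answer in an eval or serve as the external reward signal during training.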
However, that would only make sense if my goal was primarily or exclusively to get to the end product: an LLM that can run TTRPGs. It’s not. The goal is to get a better understandi