
Alignment whack-a-mole: Finetuning activates recall of copyrighted books in LLMs

Why This Matters

This research highlights the risk that large language models memorize copyrighted material, such as verbatim excerpts from books, and that finetuning can surface it. Understanding this phenomenon is crucial for developing AI systems that respect intellectual property rights and avoid legal exposure. It also underscores the need for improved training techniques that mitigate unintended memorization.


Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models

The paper is now on arXiv; check out our demo!

This repository contains the data preprocessing pipeline, finetuning scripts, memorization evaluation code, and analysis scripts for our paper.

We provide partial example files in data/ containing a small subset of excerpts and generations from The Road by Cormac McCarthy. Full book content and model generations are not included because the books are copyrighted and the generations contain large portions of verbatim text.
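To give a sense of the kind of verbatim-recall check a memorization evaluation performs, here is a minimal sketch that scores the longest run of consecutive tokens shared between a model generation and a source excerpt. This is an illustrative assumption, not the repository's actual evaluation code; the file names under data/ and the helper name longest_verbatim_run are hypothetical:

# Minimal sketch (assumed, not the repo's actual code) of a verbatim-recall
# metric: the length of the longest contiguous token span that a model
# generation shares with a source excerpt.

def longest_verbatim_run(reference: str, generation: str) -> int:
    """Length, in tokens, of the longest common contiguous token span."""
    # Simple whitespace tokenization; the dependency list includes nltk,
    # which could be swapped in for finer-grained tokenization.
    ref = reference.lower().split()
    gen = generation.lower().split()
    best = 0
    # Dynamic program for longest common substring over token sequences,
    # keeping only the previous row to stay O(len(gen)) in memory.
    prev = [0] * (len(gen) + 1)
    for r in ref:
        cur = [0] * (len(gen) + 1)
        for j, g in enumerate(gen, start=1):
            if r == g:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    # Hypothetical example paths; the actual file names in data/ may differ.
    with open("data/excerpt_example.txt") as f:
        reference = f.read()
    with open("data/generation_example.txt") as f:
        generation = f.read()
    print("longest shared token run:", longest_verbatim_run(reference, generation))

A long shared run (say, dozens of tokens) indicates verbatim reproduction rather than paraphrase, which is why this style of metric suits copyrighted-text memorization studies.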

Setup

We use uv for dependency management. Install uv if you haven't already:

curl -LsSf https://astral.sh/uv/install.sh | sh

Create a virtual environment and install all dependencies:

uv venv --python 3.11
source .venv/bin/activate
uv pip install html2text natsort ftfy openai tqdm nltk numpy

For Gemini finetuning and generation, also install:

... continue reading