Speculative Sampling Explained

Speculative Sampling

The idea of speculative sampling is to draw samples from a draft distribution while still producing samples that follow the target distribution exactly.

Let $p(x)$ be the target distribution and $q(x)$ the draft distribution. Each token $x_i$ has two probabilities, $p(x_i)$ and $q(x_i)$, and exactly one of the following holds:

either $p(x_i) > q(x_i)$,

or $p(x_i) \leq q(x_i)$.

If we sample directly from $q(x)$, the samples follow $q(x)$ rather than the target distribution $p(x)$. Relative to the target, a token $x_i$ is

over-sampled if $q(x_i) > p(x_i)$,

under-sampled if $q(x_i) < p(x_i)$.
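To make the two cases concrete, here is a minimal sketch on a toy three-token vocabulary. The probability values and the helper name `classify` are illustrative assumptions, not taken from the article. It labels each token and checks that the total excess mass on over-sampled tokens equals the total deficit on under-sampled ones, which must hold because both distributions sum to 1.

```python
# Toy distributions over a 3-token vocabulary (hypothetical values).
p = {"a": 0.5, "b": 0.3, "c": 0.2}  # target distribution
q = {"a": 0.2, "b": 0.3, "c": 0.5}  # draft distribution


def classify(p, q):
    """Label each token by how sampling from q misrepresents it relative to p."""
    labels = {}
    for tok in p:
        if q[tok] > p[tok]:
            labels[tok] = "over-sampled"
        elif q[tok] < p[tok]:
            labels[tok] = "under-sampled"
        else:
            labels[tok] = "matched"
    return labels


labels = classify(p, q)

# Mass that q puts on tokens beyond what p allows, and mass that q is missing.
excess = sum(max(0.0, q[t] - p[t]) for t in p)
deficit = sum(max(0.0, p[t] - q[t]) for t in p)
```

Because the excess and the deficit balance, any probability mass removed from over-sampled tokens can be redistributed to the under-sampled ones.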

The core trick of speculative sampling is a rejection scheme that down-samples the over-sampled tokens and up-samples the under-sampled ones, so that the final samples follow the target distribution $p(x)$ exactly.
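One common way to realize such a scheme, sketched here as an assumption rather than as this article's own derivation, is to accept a draft token $x$ with probability $\min(1, p(x)/q(x))$ and, on rejection, to resample from the normalized residual $\max(0, p(x) - q(x))$. The function name `speculative_sample` and the toy distributions are hypothetical.

```python
import random


def speculative_sample(p, q, rng):
    """Draw one token distributed according to p, using proposals from q.

    p, q: dicts mapping token -> probability over the same vocabulary.
    """
    tokens = list(q)
    # 1. Propose a token from the draft distribution q.
    x = rng.choices(tokens, weights=[q[t] for t in tokens])[0]
    # 2. Accept with probability min(1, p(x) / q(x)).
    #    Over-sampled tokens (q > p) are sometimes rejected; others always pass.
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    # 3. On rejection, resample from the residual max(0, p - q).
    #    Only under-sampled tokens carry residual mass, so they get up-sampled.
    residual = [max(0.0, p[t] - q[t]) for t in tokens]
    return rng.choices(tokens, weights=residual)[0]


# Empirical check on toy distributions (hypothetical values).
rng = random.Random(0)
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}
n = 100_000
counts = {t: 0 for t in p}
for _ in range(n):
    counts[speculative_sample(p, q, rng)] += 1
freqs = {t: counts[t] / n for t in p}
```

With enough draws, the empirical frequencies in `freqs` should approach `p` even though every proposal came from `q`.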

How to down-sample the over-sampled tokens?
