Understanding Rejection Sampling: From Basics to DeepSeek-R1
When dealing with complex probability distributions, directly sampling from them can be challenging. Rejection sampling is a clever statistical technique that allows us to overcome this hurdle. This method is not only foundational in computational statistics but also a crucial component in advanced models like DeepSeek-R1. In this blog, we’ll start with the basics of rejection sampling, progress to its mathematical framework, and ultimately delve into its application in DeepSeek-R1.
What is Rejection Sampling?
Imagine trying to throw darts at a dartboard with an intricate shape. Instead of directly aiming at the complex outline, you aim at a rectangular board that completely covers it. Each dart that lands within the shape is accepted, while others are rejected. This is the essence of rejection sampling.
In mathematical terms, rejection sampling is a method to generate samples from a complex probability distribution, referred to as the target distribution $p(x)$, using a simpler proposal distribution $q(x)$ that is easy to sample from. The algorithm works as follows:
- Sample from the Proposal Distribution: Generate a candidate sample $x$ from $q(x)$.
- Generate a Random Threshold: Draw a uniform random number $u$ from $\mathcal{U}(0, 1)$.
- Acceptance Probability: Compare $u$ to the ratio $\frac{p(x)}{M q(x)}$, where $M$ is a scaling constant ensuring $p(x) \le M q(x)$ for all $x$.
  - If $u \le \frac{p(x)}{M q(x)}$, accept $x$.
  - Otherwise, reject $x$ and repeat.
This simple algorithm ensures that the accepted samples follow the target distribution, even though they are initially drawn from the proposal distribution.
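The steps above can be sketched in a few lines of Python. This is a minimal illustration under assumed choices, not a reference implementation: the target is a Beta(2, 5) density, $p(x) = 30x(1-x)^4$ on $[0, 1]$, the proposal is Uniform(0, 1), and $M$ is the maximum of $p(x)/q(x)$ (2.4576, reached at $x = 0.2$). The function names are illustrative.

```python
import random


def target_pdf(x: float) -> float:
    """Illustrative target p(x): the Beta(2, 5) density 30 x (1 - x)^4 on [0, 1]."""
    if 0.0 <= x <= 1.0:
        return 30.0 * x * (1.0 - x) ** 4
    return 0.0


def proposal_pdf(x: float) -> float:
    """Proposal density q(x) for Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


# Smallest valid scaling constant: max of p(x) / q(x), reached at x = 0.2.
M = target_pdf(0.2) / proposal_pdf(0.2)


def rejection_sample(n: int, seed: int = 0) -> list[float]:
    """Return n draws from p(x) using Uniform(0, 1) proposals."""
    rng = random.Random(seed)
    samples: list[float] = []
    while len(samples) < n:
        x = rng.random()  # step 1: candidate x ~ q(x)
        u = rng.random()  # step 2: threshold u ~ U(0, 1)
        if u <= target_pdf(x) / (M * proposal_pdf(x)):  # step 3: accept test
            samples.append(x)
        # otherwise x is rejected and the loop draws a new candidate
    return samples


samples = rejection_sample(20_000)
print(sum(samples) / len(samples))  # should be close to the Beta(2, 5) mean, 2/7
```

Because the mean of Beta(2, 5) is $2/7$ (about 0.286), the printed sample mean offers a quick sanity check that the accepted draws follow $p(x)$ rather than the uniform proposal, whose mean is 0.5.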
Key Components of Rejection Sampling
- Target Distribution $p(x)$: The complex distribution we want to sample from.
- Proposal Distribution $q(x)$: A simpler distribution that approximates the target and is easy to sample from.
- Scaling Factor $M$: A constant such that $M q(x)$ covers $p(x)$ everywhere, i.e., $p(x) \le M q(x)$ for all $x$.
- Acceptance Probability $\frac{p(x)}{M q(x)}$: Determines whether a sample is kept based on how well it aligns with the target distribution.
Why Rejection Sampling Works
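The magic lies in the acceptance probability. A candidate $x$ is kept with probability $\frac{p(x)}{M q(x)}$, so regions where the target density is high relative to the proposal keep more of their candidates and regions where it is low keep fewer. This reweighting exactly cancels the shape of $q(x)$, so the accepted samples are distributed according to $p(x)$.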
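One way to see why this works is to compute the density of an accepted sample. Assuming $p$ and $q$ are both normalized densities, a candidate lands near $x$ with probability $q(x)\,dx$ and is then accepted with probability $p(x)/(M q(x))$:

```latex
\Pr(\text{accept}) = \int q(x)\,\frac{p(x)}{M\,q(x)}\,dx = \frac{1}{M}\int p(x)\,dx = \frac{1}{M},
\qquad
f(x \mid \text{accept}) = \frac{q(x)\,\dfrac{p(x)}{M\,q(x)}}{1/M} = p(x).
```

The $q(x)$ terms cancel, which is why any valid proposal yields samples from $p(x)$; the bound $p(x) \le M q(x)$ is what keeps the acceptance probability at most 1.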
Efficiency Considerations
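The efficiency of rejection sampling depends on how closely the proposal distribution matches the target distribution. A poor choice of $q(x)$, or a scaling factor $M$ larger than necessary, causes most candidates to be rejected: when $p$ and $q$ are normalized, the overall acceptance rate is $1/M$, so on average $M$ proposals are needed for each accepted sample.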
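A quick experiment makes the $1/M$ acceptance rate concrete. The sketch below uses an assumed Beta(2, 5) target with a Uniform(0, 1) proposal and compares the tightest valid $M$ (2.4576) with an unnecessarily loose $M = 10$. Both settings produce correctly distributed samples; the loose bound simply discards far more candidates.

```python
import random


def target_pdf(x: float) -> float:
    """Illustrative target: the Beta(2, 5) density on [0, 1]."""
    return 30.0 * x * (1.0 - x) ** 4 if 0.0 <= x <= 1.0 else 0.0


def acceptance_rate(M: float, trials: int = 100_000, seed: int = 1) -> float:
    """Fraction of Uniform(0, 1) proposals accepted for a given scaling constant M."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(trials):
        x = rng.random()  # proposal q = Uniform(0, 1), so q(x) = 1
        if rng.random() <= target_pdf(x) / M:
            accepted += 1
    return accepted / trials


tight = acceptance_rate(2.4576)  # smallest valid M for this target/proposal pair
loose = acceptance_rate(10.0)    # still valid, but far from tight
print(f"M = 2.4576: rate {tight:.3f} (theory {1 / 2.4576:.3f})")
print(f"M = 10:     rate {loose:.3f} (theory {1 / 10:.3f})")
```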
Bridging to DeepSeek-R1
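Rejection sampling plays a pivotal role in training DeepSeek-R1, a reasoning-focused language model: after reinforcement learning converges, the DeepSeek-R1 pipeline applies rejection sampling to the RL checkpoint, sampling multiple responses per prompt and retaining only the correct ones as data for a further round of supervised fine-tuning. Before diving into specifics, let’s restate the basics in machine learning terms.
The goal is to align generated responses with a reward-maximizing optimal policy. The formulation below follows the statistical rejection sampling framework of Liu et al. (2023). Here’s how rejection sampling is adapted to achieve this: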
- Target Distribution $\pi_r(y \mid x)$: Represents the optimal policy derived from a reward function $r(x, y)$, which scores how well a response $y$ to a prompt $x$ aligns with desired outcomes.
- Proposal Distribution $\pi_{\text{sft}}(y \mid x)$: Represents a simpler, supervised fine-tuned (SFT) policy that serves as the starting point.
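In the KL-regularized setting used by Liu et al. (2023), the target policy and the acceptance ratio have a closed form. Assuming $\beta > 0$ is the regularization strength, $Z(x)$ normalizes the target, and $r_{\max}(x)$ is the largest reward for prompt $x$, choosing $M$ as below turns the acceptance ratio into a function of rewards alone:

```latex
\begin{aligned}
\pi_r(y \mid x) &= \frac{1}{Z(x)}\,\pi_{\text{sft}}(y \mid x)\,\exp\!\left(\frac{r(x, y)}{\beta}\right),
\qquad Z(x) = \sum_{y} \pi_{\text{sft}}(y \mid x)\,\exp\!\left(\frac{r(x, y)}{\beta}\right),\\
M &= \frac{\exp\left(r_{\max}(x)/\beta\right)}{Z(x)}
\;\Longrightarrow\;
\frac{\pi_r(y \mid x)}{M\,\pi_{\text{sft}}(y \mid x)} = \exp\!\left(\frac{r(x, y) - r_{\max}(x)}{\beta}\right) \le 1.
\end{aligned}
```

In this formulation, candidates can be accepted or rejected from reward scores alone, without evaluating $Z(x)$ or the policy's probabilities.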
The Rejection Sampling Pipeline in DeepSeek-R1
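The process starts by generating several response candidates per prompt from the SFT policy, scores each one with a reward model, and keeps only the candidates that pass a reward-based acceptance test, so that the kept responses approximate samples from the optimal policy.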
Mathematical Steps
- Sampling: Generate response candidates $y_1, \dots, y_N$ from $\pi_{\text{sft}}(y \mid x)$.
- Reward Scoring: Use a reward function $r(x, y)$ to assign a score to each candidate.
- Acceptance Probability: Compute the ratio $\frac{\pi_r(y \mid x)}{M\,\pi_{\text{sft}}(y \mid x)}$, where $M$ ensures the proposal distribution sufficiently covers the target.
- Acceptance Decision: Accept each candidate with this probability, so candidates that align better with the target policy are more likely to be kept.
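The four steps can be sketched as a single pass over a batch of candidates. This assumes the KL-regularized target of Liu et al. (2023), under which the acceptance probability reduces to $\exp((r(x, y) - r_{\max})/\beta)$ with $\beta > 0$ the regularization strength, and it approximates $r_{\max}$ by the best reward in the batch. The function name, candidate strings, and reward values are hypothetical placeholders for SFT samples and reward-model scores.

```python
import math
import random


def accept_candidates(candidates, rewards, beta, rng):
    """One pass of reward-guided rejection sampling over a batch (illustrative sketch).

    Each candidate is accepted with probability exp((r - r_max) / beta), where r_max
    is the best reward in the batch (an approximation of the true maximum).
    """
    r_max = max(rewards)
    accepted = []
    for y, r in zip(candidates, rewards):
        p_accept = math.exp((r - r_max) / beta)  # in [0, 1]; exactly 1 for the best candidate
        if rng.random() < p_accept:
            accepted.append(y)
    return accepted


# Hypothetical candidates sampled from an SFT policy, with reward scores.
candidates = ["response A", "response B", "response C", "response D"]
rewards = [0.2, 1.5, 0.9, -0.4]

print(accept_candidates(candidates, rewards, beta=0.5, rng=random.Random(0)))
```

The highest-reward candidate is always accepted because its acceptance probability is exactly 1, while the others survive with a probability that shrinks as their reward gap grows relative to $\beta$.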
Algorithmic Enhancements in DeepSeek-R1
The refined version of rejection sampling described here, from the statistical rejection sampling work of Liu et al. (2023), improves efficiency and applicability:
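- Pairwise Reward-Ranking Model: Instead of directly calculating a pointwise reward $r(x, y)$, a ranking model evaluates pairs of responses to derive relative preferences.
- Iterative Sampling: Samples are drawn iteratively, and accepted candidates are excluded from subsequent iterations, so the accepted set contains distinct responses.
- Hyperparameter Tuning $\beta$: A parameter controlling the trade-off between exploiting high-reward samples and exploring diverse responses. A smaller $\beta$ concentrates acceptance on the highest-reward candidates, while a larger $\beta$ accepts more of them and keeps the result closer to the SFT policy.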
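The iterative, without-replacement variant can be sketched as follows. It is modeled loosely on the procedure described by Liu et al. (2023), not taken from DeepSeek-R1's code: each round recomputes $r_{\max}$ over the candidates not yet accepted, accepts each with probability $\exp((r - r_{\max})/\beta)$, and removes accepted candidates. The function name and toy rewards are hypothetical, and candidates are assumed to be distinct strings.

```python
import math
import random


def rso_style_sampling(candidates, rewards, num_samples, beta, seed=0):
    """Reward-guided rejection sampling without replacement (illustrative sketch).

    Each round sets r_max to the best reward among candidates not yet accepted and
    accepts each remaining candidate with probability exp((r - r_max) / beta).
    Accepted candidates are removed, so the result contains distinct candidates.
    """
    if num_samples > len(candidates):
        raise ValueError("num_samples cannot exceed the number of candidates")
    rng = random.Random(seed)
    remaining = dict(zip(candidates, rewards))  # assumes distinct candidates
    accepted = []
    while len(accepted) < num_samples:
        r_max = max(remaining.values())
        for y, r in list(remaining.items()):  # iterate over a snapshot so deletion is safe
            if rng.random() < math.exp((r - r_max) / beta):
                accepted.append(y)
                del remaining[y]
                if len(accepted) == num_samples:
                    break
    return accepted


# Hypothetical candidates and reward scores.
candidates = ["A", "B", "C", "D", "E"]
rewards = [0.1, 2.0, 1.2, -0.5, 0.8]

print(rso_style_sampling(candidates, rewards, num_samples=3, beta=1e-4))  # ['B', 'C', 'E']
print(rso_style_sampling(candidates, rewards, num_samples=3, beta=10.0))
```

With a tiny $\beta$ the procedure becomes greedy and returns the three highest-reward candidates in order. With a large $\beta$ almost every candidate passes, so the output stays close to the original SFT samples. Each round is guaranteed to accept at least its best remaining candidate, so the loop always terminates.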
Theoretical Foundation
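The method remains statistically correct for the same reason as in the general case: $M$ bounds the ratio $\pi_r / \pi_{\text{sft}}$, so accepted responses follow the target policy. The choice of $\beta$ and $M$ then governs efficiency, measured by the expected acceptance rate:
$$\mathbb{E}_{y \sim \pi_{\text{sft}}(y \mid x)}\left[\exp\left(\frac{r(x, y) - r_{\max}(x)}{\beta}\right)\right] = \frac{1}{M},$$
where $r_{\max}(x)$ is the largest reward attainable for prompt $x$, $M = \exp(r_{\max}(x)/\beta)/Z(x)$, and $Z(x) = \sum_y \pi_{\text{sft}}(y \mid x)\exp(r(x, y)/\beta)$ is the normalizing constant of the optimal policy.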
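The identity can be checked numerically on a toy problem. In the sketch below, all probabilities and rewards are made-up illustrative values for one prompt with four possible responses. The code computes the expected acceptance rate in closed form, compares it with $1/M$, and then simulates the sampler to confirm that accepted responses follow $\pi_r$.

```python
import math
import random

# Toy setting (hypothetical numbers): one prompt x with four possible responses.
pi_sft = [0.4, 0.3, 0.2, 0.1]   # proposal probabilities pi_sft(y | x)
rewards = [0.0, 1.0, 2.0, 0.5]  # reward r(x, y) for each response
beta = 1.0

# Target: pi_r(y | x) proportional to pi_sft(y | x) * exp(r(x, y) / beta).
weights = [p * math.exp(r / beta) for p, r in zip(pi_sft, rewards)]
Z = sum(weights)
pi_r = [w / Z for w in weights]

r_max = max(rewards)
M = math.exp(r_max / beta) / Z

# Closed-form expected acceptance rate: E_{y ~ pi_sft}[exp((r - r_max) / beta)].
expected_rate = sum(p * math.exp((r - r_max) / beta) for p, r in zip(pi_sft, rewards))
print(f"expected acceptance rate = {expected_rate:.4f}, 1/M = {1 / M:.4f}")

# Monte Carlo check: propose from pi_sft, accept with exp((r - r_max) / beta).
rng = random.Random(0)
counts = [0] * len(pi_sft)
proposals = 200_000
for _ in range(proposals):
    y = rng.choices(range(len(pi_sft)), weights=pi_sft)[0]
    if rng.random() < math.exp((rewards[y] - r_max) / beta):
        counts[y] += 1
accepted = sum(counts)
empirical = [c / accepted for c in counts]
print("target pi_r:   ", [round(p, 3) for p in pi_r])
print("empirical freq:", [round(p, 3) for p in empirical])
```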
Why It Matters
Rejection sampling in DeepSeek-R1 enhances the model’s ability to generate responses that are not only accurate but also aligned with human preferences and rewards. Because only responses that pass a reward-based acceptance test are kept, the resulting training data tends to be of higher quality than responses drawn directly from the starting policy without filtering.
Final Thoughts
Rejection sampling is a deceptively simple yet powerful tool in the arsenal of computational statistics and machine learning. From sampling complex distributions to optimizing response generation in models like DeepSeek-R1, its applications are vast and impactful. Whether you’re a beginner trying to grasp the basics or a technical expert delving into advanced implementations, understanding rejection sampling is a valuable skill.
FAQs
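- Can rejection sampling be used for any distribution? In principle, yes, as long as you can evaluate the target density (up to a normalizing constant) and find a proposal distribution $q(x)$ and a finite constant $M$ such that $M q(x)$ covers $p(x)$ everywhere.
- What are the limitations of rejection sampling? Its efficiency heavily depends on the choice of proposal distribution and scaling factor, and it degrades quickly in high dimensions, where the smallest valid $M$ typically becomes very large.
- How does DeepSeek-R1 improve upon standard rejection sampling? The variant described in this post, following Liu et al. (2023), adds iterative sampling, reward ranking, and a tunable hyperparameter $\beta$, tailoring the method to language model optimization.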
References:
- Liu, T., Zhao, Y., Joshi, R., Khalman, M., Saleh, M., Liu, P. J., & Liu, J. (2023). Statistical Rejection Sampling Improves Preference Optimization. ArXiv
- Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., Zhu, Q., Ma, S., Wang, P., Bi, X., Zhang, X., Yu, X., Wu, Y., Wu, Z. F., Gou, Z., Shao, Z., Li, Z., Gao, Z., Liu, A., . . . Zhang, Z. (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. ArXiv