
The Algorithm Divide: Why FSRS Is Making SM-2 Obsolete and What It Means for Choosing a Flashcard App
Most flashcard app comparisons ignore the single most important technical differentiator: the spaced repetition algorithm. This guide explains how FSRS, SM-2, confidence-based, and Leitner systems work, why FSRS can reduce daily reviews by 20–30%, and how to choose an app based on its underlying SRS engine.
Why the Algorithm Matters More Than UI or Price
When students compare flashcard apps, the conversation usually orbits around interface polish, platform availability, or monthly subscription fees. These are surface-level concerns. Beneath every flashcard app lies a scheduling engine — the spaced repetition algorithm — that determines when you see each card again. That engine is the single largest variable in how efficiently you learn, yet most comparison guides treat it as an invisible implementation detail.
The gap between the best and worst algorithms is not marginal. Simulation data from the open-spaced-repetition project suggests that switching from the decades-old SM-2 algorithm to the newer FSRS can reduce daily review volume by 20–30% while holding retention constant. For a medical student reviewing 500 cards per day, that translates to 100–150 fewer reviews, or roughly 30–45 minutes of reclaimed time at about 18 seconds per card. Over a semester, the cumulative difference is measured in days, not minutes.
This article is written for students who already know that spaced repetition works and want to make a technically informed choice about which implementation to trust with their study time. We will examine the four major algorithmic families — SM-2, FSRS, confidence-based systems, and Leitner boxes — map them to the apps that use them, and give you a framework for evaluating any app's SRS quality on your own.

How SM-2 Works: The 1987 Standard and Its 'Ease Hell' Problem
SM-2 was created in 1987 by Piotr Woźniak as part of a Turbo Pascal program. It became the default scheduling engine in Anki for 17 years and remains the algorithmic foundation for countless other flashcard applications. Its design is elegantly simple: each card carries a single ease factor — a multiplier that determines how much the next interval grows after a successful recall.
When you review a card and press "Good," SM-2 multiplies the current interval by the ease factor. If you press "Again" (failed recall), the card drops back to a short relearning step and the ease factor falls by 20 percentage points. The problem emerges after repeated failures. Once the ease factor hits Anki's floor of 130%, every subsequent success produces only a 1.3× interval increase. A card sitting at a 10-day interval with its ease at the floor climbs to only 13 days, then 17 days, then 22 days: a slow, grinding recovery that feels like the card is stuck in a low-interval rut. Users call this "ease hell."
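The ease-hell arithmetic above can be reproduced with a minimal sketch. This is a simplified Anki-style update, not Anki's actual code: it ignores relearning steps, the "Hard" and "Easy" buttons, and interval fuzz. The 250% starting ease is an assumption taken from Anki's usual default, and the function names are hypothetical.

```python
# Simplified Anki-style SM-2 update (illustrative only, not Anki's code).
# Ease is stored as an integer percentage to avoid floating-point drift.
EASE_FLOOR = 130     # minimum ease, as described in the text
LAPSE_PENALTY = 20   # percentage points lost per "Again"
DEFAULT_EASE = 250   # assumed starting ease (Anki's usual default)


def after_lapse(ease_pct: int) -> int:
    """Return the ease factor after one failed review."""
    return max(EASE_FLOOR, ease_pct - LAPSE_PENALTY)


def after_good(interval_days: int, ease_pct: int) -> int:
    """Return the next interval after pressing "Good" (rounds half up)."""
    grown = (interval_days * ease_pct + 50) // 100
    return max(interval_days + 1, grown)


def recovery(start_days: int, ease_pct: int, reviews: int) -> list[int]:
    """Intervals produced by a run of consecutive "Good" answers."""
    intervals = []
    current = start_days
    for _ in range(reviews):
        current = after_good(current, ease_pct)
        intervals.append(current)
    return intervals


# Six lapses take a card from the assumed default down to the floor.
ease = DEFAULT_EASE
for _ in range(6):
    ease = after_lapse(ease)

print(ease)                           # 130
print(recovery(10, ease, 3))          # [13, 17, 22]
print(recovery(10, DEFAULT_EASE, 3))  # [25, 63, 158]
```

Under these assumptions, three successful reviews at the floor move a 10-day card to 22 days, while the same three reviews at the starting ease move it past 150 days.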
The ease factor is also a coarse signal. Two cards that have settled on the same ease factor are scheduled with the same multiplier, even if you find one trivially easy and barely remember the other. SM-2 does not estimate the probability of recall, so it cannot tell a card that needs a 60-day interval from one that needs a 6-day interval. It only knows that both were answered correctly.

How FSRS Works: A Three-Variable Memory Model Trained on 700M+ Reviews
FSRS — the Free Spaced Repetition Scheduler — was created in 2022 by Jarrett Ye and integrated natively into Anki with version 23.10 in November 2023. Rather than tracking a single ease factor per card, FSRS models memory using three distinct variables:
- Difficulty (D): A value on a 1–10 scale that represents how inherently hard a card is to remember. Crucially, difficulty exhibits mean reversion: after a series of successes, a difficult card's difficulty drifts downward toward the average, and after failures, an easy card's difficulty drifts upward. This prevents the ease-hell trap.
- Stability (S): The estimated time in days for the probability of recall to drop from 100% to 90%. This is the algorithm's prediction of how long a card will stay in memory before it needs review.
- Retrievability (R): A continuously decaying probability, not a binary remembered/forgotten state, that represents the likelihood you will recall the card at any given moment. FSRS models this decay with a power-law function rather than an exponential one.
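To make the link between stability and retrievability concrete, here is a minimal sketch of a power-law forgetting curve. It uses the form and constants published for FSRS-4.5, which is an assumption on my part: other versions, including the FSRS-6 discussed later, may use a different or per-user decay. The exponential curve is included only as a point of comparison.

```python
# Power-law forgetting curve in the form published for FSRS-4.5.
# DECAY and FACTOR are that version's constants (an assumption here).
DECAY = -0.5
FACTOR = 19 / 81  # chosen so that R is 0.9 when elapsed == stability


def retrievability(elapsed_days: float, stability_days: float) -> float:
    """Estimated probability of recall after `elapsed_days`."""
    return (1 + FACTOR * elapsed_days / stability_days) ** DECAY


def exponential_retrievability(elapsed_days: float, stability_days: float) -> float:
    """Exponential curve with the same 90% point, for comparison only."""
    return 0.9 ** (elapsed_days / stability_days)


# A card with a stability of 10 days, checked at several delays.
for t in (0, 10, 30, 90):
    power = retrievability(t, 10)
    expo = exponential_retrievability(t, 10)
    print(f"day {t:>2}: power-law {power:.3f}, exponential {expo:.3f}")
```

Both curves pass through 90% when the elapsed time equals the stability, but the power-law curve falls more slowly afterward, so long-overdue cards are predicted to be less forgotten than an exponential model would suggest.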
These three variables interact dynamically. When you press "Again" on a card, FSRS increases the card's difficulty (with mean reversion) and sharply reduces its stability. When you press "Easy," it decreases difficulty and increases stability. The algorithm was trained on a dataset of over 700 million reviews from roughly 10,000 Anki users, giving it a statistical foundation that SM-2, designed by hand in the 1980s, never had.
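The difficulty update with mean reversion can be sketched as follows. This shows only the general shape of the idea: the three constants are hypothetical placeholders rather than FSRS's fitted parameters, and the real update differs in detail between FSRS versions.

```python
# Illustrative difficulty update with mean reversion.
# All three constants are hypothetical, not FSRS's fitted parameters.
MEAN_DIFFICULTY = 5.0  # hypothetical value that difficulty reverts toward
GRADE_STEP = 1.0       # hypothetical change per grade away from "Good"
REVERSION = 0.1        # hypothetical share pulled back toward the mean

AGAIN, HARD, GOOD, EASY = 1, 2, 3, 4


def update_difficulty(difficulty: float, grade: int) -> float:
    """Shift difficulty by the grade, then pull it toward the mean."""
    shifted = difficulty - GRADE_STEP * (grade - GOOD)
    reverted = REVERSION * MEAN_DIFFICULTY + (1 - REVERSION) * shifted
    return min(10.0, max(1.0, reverted))  # keep D on the 1-10 scale


# A very hard card answered "Good" five times in a row.
d = 9.0
for _ in range(5):
    d = update_difficulty(d, GOOD)
    print(round(d, 2))
```

In this sketch a card at difficulty 9 relaxes toward the middle of the scale with each successful review, so a few early failures do not lock it at an extreme value the way SM-2's ease floor can.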
The practical consequence is that FSRS can schedule cards with far more granularity. A card that you consistently find easy will rapidly reach intervals of months or years, while a card that you struggle with will stabilize at shorter intervals without getting stuck in ease hell. The algorithm adapts to your personal forgetting curve rather than applying a one-size-fits-all multiplier.

Benchmark Results: FSRS Beats SM-2 for 99.6% of Users
The open-spaced-repetition benchmark is the most comprehensive public comparison of SRS algorithms available. It evaluated 9,999 Anki collections containing approximately 350 million filtered reviews. The metric used is mean log loss — a measure of how accurately the algorithm predicts whether you will recall a card on any given day. Lower log loss means better prediction.
| Algorithm | Mean log loss (lower is better) | Share of users for whom it beats SM-2 |
|---|---|---|
| FSRS-6 (per-user optimization) | 0.344 | 99.6% |
| FSRS-5 (previous version) | ~0.35 | ~99% |
| SM-2 (default Anki) | Higher than 0.344 | Baseline |
FSRS-6 with per-user parameter optimization achieves a mean log loss of 0.344, outperforming SM-2 for 99.6% of users. The remaining 0.4% of users — those whose review patterns happen to align with SM-2's assumptions — see negligible difference. For the vast majority, FSRS provides a measurably better prediction of when cards will be forgotten.
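For readers unfamiliar with the metric, log loss can be computed as in the sketch below. The review data is invented purely for illustration, and the function is a plain binary cross-entropy rather than the benchmark's own code.

```python
import math


def mean_log_loss(predicted: list[float], recalled: list[bool]) -> float:
    """Average binary cross-entropy between predictions and outcomes."""
    eps = 1e-15  # clip predictions so log(0) never occurs
    total = 0.0
    for p, was_recalled in zip(predicted, recalled):
        p = min(1 - eps, max(eps, p))
        total += -math.log(p) if was_recalled else -math.log(1 - p)
    return total / len(predicted)


# Hypothetical review log: predicted recall probability vs. actual outcome.
outcomes = [True, True, True, False, True]
sharp = [0.95, 0.90, 0.85, 0.40, 0.90]  # confident and mostly right
vague = [0.70, 0.70, 0.70, 0.70, 0.70]  # same guess for every card

print(round(mean_log_loss(sharp, outcomes), 3))
print(round(mean_log_loss(vague, outcomes), 3))
```

A scheduler that assigns a low recall probability to the card you actually forgot is rewarded with a lower loss, which is why the metric works as a proxy for scheduling quality.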
The 20–30% reduction in daily reviews is a simulation-based projection derived from these benchmark results. The logic is straightforward: if FSRS can predict forgetting more accurately, it can schedule reviews closer to the moment of forgetting without risking a drop in retention. Fewer early reviews mean fewer total reviews over time. This is not a controlled experiment with human subjects — it is an algorithmic simulation — but the magnitude of the projected improvement is consistent across multiple independent analyses.

Which Apps Use Which Algorithm?
The algorithm an app uses determines the ceiling of its scheduling intelligence. Here is how the major flashcard apps map to the four algorithmic families:
| App | Algorithm | Notes |
|---|---|---|
| Anki | FSRS + SM-2 (fallback) | FSRS native since Anki 23.10 (Nov 2023). Users can switch between FSRS and SM-2. Per-user parameter optimization is built into the deck options; the FSRS Helper add-on is optional. |
| RemNote | FSRS + SM-2 (fallback) | Added FSRS support in 2024. Similar dual-algorithm architecture to Anki. |
| Brainscape | Proprietary confidence-based | Users rate confidence on a 1–5 scale per card. The algorithm adjusts intervals based on confidence, not binary recall. No public benchmark data. |
| Quizlet | Basic Leitner-like | Uses a simplified box system with fixed intervals. No per-card difficulty adjustment. Suitable for casual review, not optimized for long-term retention. |
| Mindomax | Adaptive Leitner | Modifies Leitner intervals at the deck or subject level rather than per card. An incremental improvement over fixed-box systems but still lacks per-card difficulty modeling. |
| SuperMemo | SM-19 / SM-20 (proprietary) | Woźniak's current algorithms. Not available in any third-party app. Requires SuperMemo software. No public benchmark comparison with FSRS is available. |
The critical distinction is between apps that model per-card difficulty (FSRS, SM-2, Brainscape's confidence system) and apps that treat all cards in a given box or deck identically (Leitner, basic Quizlet). Per-card modeling is the prerequisite for efficient scheduling. Without it, easy cards are over-reviewed and hard cards are under-reviewed.
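A minimal Leitner scheduler shows why box-based systems cannot separate easy cards from hard ones. The box intervals here are hypothetical, and the sketch does not represent any specific app's implementation.

```python
# Minimal Leitner scheduler. The box intervals are hypothetical;
# real apps choose their own values.
BOX_INTERVALS_DAYS = [1, 2, 4, 8, 16]


def next_box(box: int, recalled: bool) -> int:
    """Promote one box on success; send the card back to box 0 on failure."""
    if not recalled:
        return 0
    return min(box + 1, len(BOX_INTERVALS_DAYS) - 1)


def interval_for(box: int) -> int:
    """Every card in a box waits the same number of days."""
    return BOX_INTERVALS_DAYS[box]


# Two cards with very different histories end up on the same schedule
# as soon as they share a box.
easy_card_box = 0
for recalled in (True, True, True):
    easy_card_box = next_box(easy_card_box, recalled)

hard_card_box = 0
for recalled in (True, False, True, False, True, True, True):
    hard_card_box = next_box(hard_card_box, recalled)

print(interval_for(easy_card_box), interval_for(hard_card_box))  # 8 8
```

The second card failed twice on the way up, yet once it reaches the same box it waits exactly as long as the card that never failed. The only state the system keeps is the box number.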

Practical Impact: What 20–30% Fewer Reviews Means for a Med Student
In a 2022 study at the UCF College of Medicine, approximately 70% of first-year medical students used Anki. A typical pre-clinical med student maintains a review load of 400–600 cards per day during dedicated study periods. At 500 cards per day, a 25% reduction saves 125 reviews, or roughly 30–40 minutes depending on card complexity.
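The arithmetic behind these estimates is easy to rerun with your own numbers. In the sketch below, the 18 seconds per card is an assumed pace chosen to match the time range quoted in this article, not a figure from the cited study, and the function name is hypothetical.

```python
def review_savings(cards_per_day: int, reduction: float,
                   seconds_per_card: float) -> tuple[float, float]:
    """Return (reviews saved per day, minutes saved per day)."""
    saved_reviews = cards_per_day * reduction
    saved_minutes = saved_reviews * seconds_per_card / 60
    return saved_reviews, saved_minutes


# 18 seconds per card is an assumed pace, not a measured one.
for reduction in (0.20, 0.25, 0.30):
    reviews, minutes = review_savings(500, reduction, 18)
    print(f"{reduction:.0%}: {reviews:.0f} fewer reviews, {minutes:.1f} minutes")
```

Swapping in your own daily load and average answer time gives a quick estimate of what a given reduction would mean for you.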
The evidence that spaced repetition itself works is robust. A 2025 study in Academic Medicine involving over 26,000 physicians found that spaced repetition groups retained knowledge at a rate of 58% compared to 43% in control groups, a 15 percentage point advantage. A 2025 meta-analysis in The Clinical Teacher covering more than 21,000 learners reported a large effect size of d = 0.78 for long-term retention. Dunlosky et al. (2013) reviewed ten learning techniques and rated only two, distributed practice and practice testing, as "high utility," the strongest rating they assigned.
The question is not whether spaced repetition works — that has been settled for over a decade. The question is which implementation of spaced repetition extracts the most retention per unit of study time. FSRS's advantage over SM-2 is that it achieves the same or better retention with fewer reviews. For a medical student who will spend thousands of hours reviewing flashcards across four years of medical school, the cumulative time savings are substantial.

Decision Guide: Choosing an App by Algorithm Preference
Your choice of algorithm should match your willingness to configure settings and your tolerance for suboptimal scheduling. Here is a structured framework:
- Choose FSRS (Anki or RemNote) if: You are willing to spend 15–30 minutes on initial setup — enabling FSRS, running the optimizer on your review history, and setting a desired retention target. You want the most efficient scheduling available in a mainstream app. You are comfortable with a steeper learning curve in exchange for long-term time savings.
- Choose Brainscape if: You prefer a confidence-based rating system (1–5 scale) over binary pass/fail. You find the act of rating your confidence to be metacognitively useful. You are willing to pay a subscription for a polished interface and do not need per-card difficulty optimization backed by public benchmarks.
- Choose a Leitner-based app (Quizlet, basic flashcard apps) if: You only need casual review for low-stakes material. You do not want to configure any settings. You are not optimizing for long-term retention efficiency. Be aware that fixed-box systems over-review easy cards and under-review hard cards.
- Choose SuperMemo if: You want Woźniak's latest algorithms (SM-19/SM-20) and are willing to use a standalone application with a smaller community. Note that no independent benchmark comparing SM-19/20 against FSRS is publicly available as of Q2 2026.
For most power users, the recommendation is straightforward: use Anki or RemNote with FSRS enabled. The benchmark data is clear, the algorithm is free and open-source, and the time investment for setup is recouped within the first few weeks of reduced review load.

What to Look for in a Flashcard App's SRS Quality
Not all implementations of the same algorithm are equal. An app that claims to use "spaced repetition" may implement it poorly. Here is a checklist for evaluating any flashcard app's SRS quality:
- Does it model per-card difficulty? If the app uses fixed intervals for all cards in a deck, it cannot optimize for individual card difficulty. This is the single most important question.
- Can you set a desired retention target? FSRS allows you to target a specific retention percentage (e.g., 90%). The algorithm then schedules reviews to hit that target. Apps without this feature use a fixed schedule that may over- or under-review.
- Does it support per-user parameter optimization? FSRS can optimize its parameters based on your personal review history. Apps that use default parameters for all users cannot adapt to your forgetting curve.
- Is the algorithm documented and benchmarked? Public benchmark data (like the open-spaced-repetition project) allows you to evaluate claims. Proprietary algorithms with no published benchmarks should be treated with skepticism.
- Does it handle failed cards intelligently? Watch for ease-hell-like behavior. If failing a card causes it to get stuck in a low-interval cycle, the algorithm lacks mean reversion or a similar mechanism.
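The desired-retention item in the checklist above is easier to evaluate once you see how a target maps to an interval. The sketch below inverts the power-law forgetting curve in the form published for FSRS-4.5. The constants are an assumption here, since other FSRS versions differ, and the 100-day stability is an arbitrary example value.

```python
# Interval that lets recall probability fall to a chosen target, using
# the forgetting-curve form published for FSRS-4.5 (an assumption here).
DECAY = -0.5
FACTOR = 19 / 81


def interval_for_retention(stability_days: float,
                           desired_retention: float) -> float:
    """Days to wait before recall probability drops to the target."""
    if not 0 < desired_retention < 1:
        raise ValueError("desired_retention must be between 0 and 1")
    return stability_days / FACTOR * (desired_retention ** (1 / DECAY) - 1)


# Same card (stability of 100 days), three different targets.
for target in (0.95, 0.90, 0.80):
    days = interval_for_retention(100, target)
    print(f"target {target:.0%}: review after about {days:.0f} days")
```

Under these assumptions a 90% target schedules the card at exactly its stability, a stricter 95% target cuts the interval to less than half, and a looser 80% target more than doubles it. An app that exposes this setting lets you trade review load against retention directly.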
The algorithm divide is not a niche technical concern. For a student who currently spends 45 minutes per day on reviews, a 20–30% reduction frees 9 to 13.5 minutes daily at the same retention. Over a year of consistent study, that gap adds up to roughly 55 to 82 hours. For students preparing for high-stakes exams where every hour counts, choosing the right algorithm is one of the highest-leverage decisions they can make.