
The Science Behind Anki Flashcards: What Peer-Reviewed Research Actually Shows About Spaced Repetition and Exam Scores

This article examines the peer-reviewed evidence behind Anki flashcards, covering how spaced repetition and active recall translate into measurable exam score improvements for medical and graduate students, and what the research reveals about Anki's limitations for conceptual learning.

Introduction: Why the Research on Anki Matters

Walk into any medical school library in the United States, and you will see a familiar pattern: students hunched over laptops, pressing the spacebar, cycling through digital flashcards. The tool behind this ritual is Anki, an open-source flashcard application that has become a de facto standard in high-stakes exam preparation. But the question that serious students and educators ask is not whether Anki is popular — it is whether the tool actually delivers measurable academic results.

This article examines the peer-reviewed evidence behind Anki with a narrow focus: named studies with published sample sizes, effect sizes, and statistical significance. We will trace the cognitive science foundations that make spaced repetition and active recall effective, then walk through the specific research that links Anki usage to exam score improvements. The goal is not to sell you on Anki — it is to give you an honest, evidence-based assessment of what the research actually supports and where the gaps remain.

The Cognitive Foundations: Forgetting Curve and Testing Effect

Anki does not invent new learning principles — it operationalizes two well-established phenomena from cognitive psychology: the forgetting curve and the testing effect.

Hermann Ebbinghaus first described the forgetting curve in 1885, demonstrating that memory decays exponentially over time unless it is actively reinforced. His experiments showed that within 24 hours of initial learning, roughly 50–60% of new information is lost. The critical insight was not that forgetting happens — it was that the rate of forgetting slows dramatically after each successful recall attempt. This is the theoretical justification for spaced repetition: review material just before you are about to forget it, and each review strengthens the memory trace more efficiently than the last.

Figure: The forgetting curve resets with each spaced review session, and the rate of forgetting slows over time.
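The idea of reviewing just before you forget can be made concrete with a simple exponential forgetting model, R = e^(-t/S), where t is the number of days since the last review and S is a "stability" value in days. This is a common textbook simplification chosen for illustration. It is not the formula Ebbinghaus fitted to his own data, and the stability values below are hypothetical. The sketch shows how a larger S, which is the effect attributed to each successful review, flattens the curve and pushes back the point at which recall drops to a chosen threshold.

```python
import math


def retention(days_elapsed: float, stability_days: float) -> float:
    """Probability of recall under a simple exponential model: R = exp(-t / S)."""
    return math.exp(-days_elapsed / stability_days)


def days_until(threshold: float, stability_days: float) -> float:
    """Days until recall probability falls to `threshold` (solves exp(-t / S) = threshold)."""
    return -stability_days * math.log(threshold)


# Hypothetical stabilities: each successful review is assumed to raise S.
for stability in (1.0, 3.0, 10.0):
    print(
        f"S = {stability:4.1f} days: recall after 1 day = {retention(1.0, stability):.0%}, "
        f"drops to 90% after {days_until(0.9, stability):.1f} days"
    )
```

Under this model, the "right moment" to review is simply the time at which R reaches your chosen threshold, and that moment moves further out as S grows.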

The testing effect, rigorously documented by Roediger and Karpicke in 2006, adds the second pillar. Their research demonstrated that the act of retrieving information from memory — taking a test or answering a question — produces significantly better long-term retention than re-reading the same material. In one experiment, students who took a recall test after studying retained roughly 50% more information after one week compared to students who simply re-studied the material. This effect holds across multiple subjects and testing formats.

For readers interested in the practical side of card design — how to write questions that maximize the testing effect — our guide to effective flashcards covers writing rules and common mistakes. This article, by contrast, focuses on the research evidence connecting these principles specifically to Anki.

How Anki Operationalizes Spaced Repetition and Active Recall

Anki translates the forgetting curve and testing effect into a practical review system through its scheduling algorithm. When you review a card, you rate your recall with one of four buttons: Again (complete failure), Hard, Good, or Easy (instant recall). Based on your rating, Anki calculates when to show you that card next. Cards you struggle with reappear within minutes; cards you recall easily may not appear for days or weeks.

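Before version 23.10, Anki scheduled every review with a modified version of SM-2, an algorithm originally developed for the SuperMemo program in the late 1980s; version 23.10 added FSRS as an optional alternative. SM-2 is rule-based: it multiplies each card's interval by that card's ease factor and nudges the ease factor up or down by fixed amounts according to your rating. It works well, but its formulas and constants are the same for every user, and it has no mechanism to learn your individual memory patterns from your review history.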
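For reference, here is a minimal sketch of classic SM-2 as SuperMemo commonly published it: reviews are graded 0 to 5, the first two successful intervals are fixed at 1 and 6 days, later intervals are the previous interval times an ease factor (rounded up), and the ease factor changes by a fixed formula with a floor of 1.3. Anki's own scheduler is a modified variant (four buttons instead of six grades, and different constants), so treat this as an illustration of the rule-based approach rather than Anki's code. The names `CardState` and `sm2_review` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CardState:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 "E-Factor"; every new card starts at 2.5


def sm2_review(state: CardState, quality: int) -> CardState:
    """Apply one review graded 0-5 (5 = perfect recall) using classic SM-2 rules."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be an integer from 0 to 5")
    if quality >= 3:  # successful recall: extend the interval
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = math.ceil(state.interval_days * state.ease)  # rounded up
        repetitions = state.repetitions + 1
    else:  # failed recall: restart the interval sequence
        repetitions, interval = 0, 1
    # Fixed adjustment formula, identical for every user; the ease never drops below 1.3.
    ease = state.ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return CardState(repetitions, interval, max(ease, 1.3))


card = CardState()
intervals = []
for grade in (4, 4, 4, 4):
    card = sm2_review(card, grade)
    intervals.append(card.interval_days)
print(intervals, card.ease)
```

With four consecutive grades of 4, the ease factor stays at 2.5 and the intervals grow 1, 6, 15, 38 days. Any learner who gives the same grades gets exactly the same schedule, which is the limitation a personalized model is meant to address.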

That changed with the introduction of FSRS, the Free Spaced Repetition Scheduler. FSRS is built on the Three Component Model of Memory, which describes each card with three variables: Retrievability (the probability you will recall it today), Stability (how long the memory lasts after a successful review), and Difficulty (how inherently hard the card is). FSRS uses an optimizer to fit its model parameters to your own review history, then tracks these three values for every card in your collection.

The practical consequence is significant: FSRS can reduce your daily review load by 20–30% while maintaining the same retention rate, or improve retention with the same time investment. For a medical student reviewing 300 cards per day, that means 60–90 fewer reviews daily; at about 10 seconds per review, that adds up to roughly 70–105 minutes saved each week.
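The weekly figure depends on how long each review takes. A minimal sketch of the arithmetic, assuming 300 reviews a day and 10 seconds per review (an assumed pace, not a measured one); `weekly_minutes_saved` is a hypothetical helper:

```python
def weekly_minutes_saved(daily_reviews: int, reduction: float, seconds_per_card: float) -> float:
    """Minutes saved per week if daily reviews drop by `reduction` (a fraction from 0 to 1)."""
    reviews_saved_per_day = daily_reviews * reduction
    return reviews_saved_per_day * seconds_per_card * 7 / 60


# 300 reviews/day is the article's example; 10 seconds per card is an assumption.
for reduction in (0.20, 0.30):
    minutes = weekly_minutes_saved(300, reduction, seconds_per_card=10)
    print(f"{reduction:.0%} fewer reviews: about {minutes:.0f} minutes saved per week")
```

The result scales linearly with `seconds_per_card`, so a student who reviews twice as fast saves half as many minutes.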

Gilbert et al. (2023): Anki Users Score 6–11% Higher on Medical School Exams

The most directly relevant peer-reviewed study on Anki's academic impact was published in 2023 by Gilbert and colleagues at Wright State University Boonshoft School of Medicine. The study followed 130 first-year medical students across four NBME-style exams: three course-specific exams and the Comprehensive Basic Science Examination (CBSE).

Of the 130 students, 78 (60.0%) reported using Anki for at least one exam, while 52 did not use Anki at all. After controlling for MCAT percentiles — a critical adjustment, since higher-performing students might be more likely to use Anki — the results showed statistically significant score differences across all four exams.

Score differences between Anki users and non-users after controlling for MCAT percentiles (Gilbert et al., 2023).
Exam	Score Difference (Anki Users vs. Non-Users)	p-value
Course I	6.4% higher	<0.001
Course II	6.2% higher	0.002
Course III	7.0% higher	0.002
CBSE	10.7% higher	0.011

The 10.7% difference on the CBSE, a cumulative exam covering the entire first-year curriculum, is particularly noteworthy. One plausible interpretation is that the benefit of spaced repetition compounds over time, producing larger effects on comprehensive assessments than on individual course exams, although an observational study cannot confirm that mechanism.

The 60% adoption rate in this cohort underscores how widespread Anki has become in medical education. High dependency on Anki, defined as using it for all four exams, was a significant predictor of scores on Course I and the CBSE.

Deng et al. (2015) and Wothe et al. (2023): Anki Use and USMLE Step 1 Performance

Two additional studies extend the evidence base from course-level exams to the USMLE Step 1, the high-stakes licensing exam for medical students in the United States.

Deng et al. (2015), published in Perspectives on Medical Education, examined the relationship between student-directed retrieval practice using Anki and USMLE Step 1 performance. The study found that each additional 1,500 Anki cards a student reviewed was associated with approximately a 1-point increase on the Step 1 exam. This dose-response relationship (more cards, higher scores) is consistent with Anki usage itself, not just the characteristics of students who use it, contributing to the improvement. It is not proof, however: in observational data, more motivated students may simply review more cards.
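To get a feel for the size of that slope, the sketch below converts card counts into the Step 1 difference that a literal, linear reading of "about 1 point per 1,500 cards" implies. It assumes the relationship stays linear across the whole range, which the study summary does not establish, and it describes an association between students rather than a gain any individual is guaranteed. `implied_step1_difference` is a hypothetical helper.

```python
CARDS_PER_POINT = 1_500  # Deng et al. (2015) slope, as summarized in this article


def implied_step1_difference(extra_cards: int) -> float:
    """Step 1 score difference implied by a linear slope of ~1 point per 1,500 cards."""
    return extra_cards / CARDS_PER_POINT


for extra_cards in (1_500, 7_500, 15_000):
    print(f"{extra_cards:,} more cards -> ~{implied_step1_difference(extra_cards):.0f} points")
```

Read this way, even 15,000 additional cards corresponds to only about 10 points, which puts the per-card effect in perspective.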

Wothe et al. (2023), published in the Journal of Medical Education and Curricular Development, surveyed 165 medical students about their Anki usage, Step 1 scores, and sleep quality. The results showed that daily Anki use correlated with higher Step 1 scores. Students who used Anki daily also reported better sleep quality during their dedicated Step 1 preparation period. This suggests that heavy Anki use does not necessarily come at the cost of sleep, although the correlational design cannot show which way the relationship runs.

Summary of Anki-related USMLE Step 1 studies.
Study	Sample Size	Key Finding
Deng et al. (2015)	Not specified in available sources	Each additional 1,500 Anki cards associated with ~1-point USMLE Step 1 increase
Wothe et al. (2023)	165 students	Daily Anki use correlated with higher Step 1 scores and improved sleep quality

The FSRS Algorithm: How Anki's Scheduling Compares to State-of-the-Art Research

The FSRS algorithm represents a significant departure from Anki's legacy SM-2 scheduler. SM-2 applies fixed rules: a card rated "Good" has its interval multiplied by that card's ease factor, which starts at 250% for every card and changes only by fixed steps when you press Again, Hard, or Easy. FSRS instead builds a personalized model of your memory.

FSRS works by tracking three values for each card: Retrievability (R), Stability (S), and Difficulty (D). Retrievability is the probability that you will recall the card today. Stability is the time, in days, it takes for retrievability to fall from 100% to 90%; the higher it is, the more slowly the card is forgotten. Difficulty captures how inherently hard the card is: a card about the Krebs cycle might have high difficulty, while a card about a simple vocabulary word might have low difficulty.

After each review, FSRS updates the card's stability and difficulty using formulas whose weights are fitted to your review history by a machine-learning optimizer. Over time, this lets it learn that you, for example, struggle with biochemistry cards but breeze through pharmacology. It adjusts intervals accordingly, showing biochemistry cards more frequently and pharmacology cards less frequently, while aiming for your target retention rate.
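The link between stability, retrievability, and your target retention can be sketched with the power-law forgetting curve that the open-spaced-repetition project published for FSRS-4.5: R(t, S) = (1 + (19/81) · t/S)^(-0.5), with constants chosen so that R equals 90% when t equals S. Solving for t gives the day on which a card should come due. This is a minimal sketch under those assumptions: later FSRS versions may parameterize the curve differently, the stability and difficulty update rules (which use the fitted weights) are omitted, and the function names are hypothetical.

```python
DECAY = -0.5
FACTOR = 19 / 81  # chosen so that (1 + FACTOR) ** DECAY is 0.9, i.e. R = 90% when t = S


def retrievability(elapsed_days: float, stability: float) -> float:
    """Probability of recall `elapsed_days` after the last review, for stability S (days)."""
    return (1 + FACTOR * elapsed_days / stability) ** DECAY


def next_interval(stability: float, desired_retention: float) -> float:
    """Days until retrievability falls to `desired_retention` (the curve above, solved for t)."""
    if not 0 < desired_retention < 1:
        raise ValueError("desired_retention must be between 0 and 1")
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)


stability = 20.0  # hypothetical card state
for target in (0.95, 0.90, 0.85):
    print(f"target {target:.0%}: review after {next_interval(stability, target):.1f} days")
```

For a card with a stability of 20 days, a 90% target schedules the review at day 20, while a 95% target pulls it in to about 9 days and an 85% target pushes it out to about 33 days. That is the trade-off between review load and retention in miniature.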

The key benchmark for FSRS is SuperMemo SM-17, the proprietary algorithm developed by Piotr Wozniak. SM-17 has been the gold standard in spaced repetition research for years, but it is not available for use outside SuperMemo. According to Anki's official FAQ, preliminary tests indicate that FSRS is roughly on par with SM-17 in terms of scheduling efficiency and retention accuracy.

Limitations of the Research: What the Studies Don't Tell You

The evidence supporting Anki is real, but it comes with important caveats that any evidence-minded student should understand before building their entire study system around the tool.

Cohort design, not randomized. The Gilbert et al. study is the strongest direct evidence, but it is a cohort-control study at a single institution. Students self-selected into Anki use, and despite MCAT controls, unmeasured confounders — motivation, prior study habits, access to shared decks — could influence the results. A randomized controlled trial would provide stronger causal evidence, but none has been published.
Weaker effect for conceptual courses. The Gilbert study noted that Anki's benefit was smaller for courses requiring conceptual application versus memorization. This aligns with the cognitive science: spaced repetition is optimized for declarative memory (facts, definitions, associations), not for procedural or conceptual understanding. Anki will help you memorize the steps of glycolysis, but it will not teach you to reason through a novel biochemistry problem.
Sample bias. Students who use Anki are likely more organized, more motivated, and more engaged with their studies than students who do not. These traits independently predict higher exam scores. The studies attempt to control for this, but no observational study can fully eliminate selection bias.
Not a replacement for understanding. Anki is a memorization tool, not a comprehension tool. Students who rely exclusively on Anki without engaging in deeper learning activities — problem-solving, discussion, application — may develop a false sense of mastery. The research supports Anki as a supplement to, not a substitute for, active learning.

For students weighing how to create their cards, the debate between AI-generated and handmade flashcards is directly relevant. Our comparison of AI-generated vs. handmade flashcards examines the research on each approach, which matters because the studies discussed here primarily rely on hand-crafted or shared decks — not AI-generated cards.

Practical Takeaways for Evidence-Minded Students

After reviewing the peer-reviewed evidence, here is what the research actually supports — and what it does not.

Anki is effective for memorization-heavy content. In the Gilbert et al. cohort, medical students using Anki scored 6–11% higher on NBME-style exams after adjusting for MCAT percentiles. The benefit appears largest for material that depends on recall of facts, definitions, and associations, such as anatomy, pharmacology, and microbiology. For conceptually demanding subjects, Anki should be paired with problem-solving practice.
The dose-response relationship matters. Deng et al. found that each additional 1,500 cards correlated with a ~1-point Step 1 increase, and Wothe et al. linked daily use to higher Step 1 scores. Consistency (daily review, not cramming) is what spaced repetition is designed to reward.
FSRS improves efficiency. Upgrading to the FSRS algorithm can reduce review time by 20–30% while maintaining the same retention rate. For students reviewing hundreds of cards daily, this is a meaningful time savings.
Anki is not a complete study system. The research supports Anki as a memorization tool, not a replacement for understanding. Combining it with active learning strategies, such as practice questions, group discussion, and teaching others, builds the conceptual skills that flashcards alone do not.

For students considering alternatives to Anki, our RemNote vs. Anki comparison examines how modern tools address different workflow needs. The research evidence presented here supports Anki specifically, but other tools may offer advantages in note-taking integration, AI features, or collaborative study.

Related Resources

How to Make Effective Flashcards: Writing Rules, Review Systems, and Common Mistakes to Avoid →
Most students make flashcards but don't see results because they skip two essential steps: writing cards that force genuine retrieval and reviewing with a system built on active recall and spaced practice. This guide covers the science-backed rules for both — with annotated card examples, format comparisons, and review methods that actually build lasting memory.
How to Make Mandarin Flashcards That Actually Stick: Card Design, Pacing, and the Input Loop →
Most Mandarin learners struggle with flashcards not because they lack discipline, but because their card design is wrong. This guide prescribes a complete system for Chinese-specific card types, research-backed daily pacing, leech card triage, and the input loop that connects flashcard study to real-world fluency.
The Ultimate Guide to Truly Free Flashcard Apps in 2026: What You Actually Get Without Paying →
Not all free flashcard apps are equal. This guide compares 8-10 apps using a consistent 'truly free' framework — evaluating spaced repetition, card limits, ads, offline access, and sign-up requirements — to help budget-conscious students find the best free option for their study needs.
