Reduce Pull Request Review Time Without Cutting Corners
The fastest way to reduce pull request review time is to cap PRs at roughly 400 changed lines, set a pickup SLA of four working hours, and automate the first review pass so humans start from a triaged diff instead of a cold one. Teams that do those three things typically cut median turnaround from 2-3 days to under one business day, without approving anything more carelessly. The catch is that most of the delay isn't review at all: it's waiting, and you fix waiting with process, not with faster readers.
Where does PR review time actually go?
Before changing anything, split "review time" into its two components, because they have completely different fixes:
- Pickup time: the gap between "review requested" and a reviewer's first interaction. Industry benchmarks (LinearB's analysis of thousands of teams, for example) put median pickup at well over a day for typical teams. Elite teams keep it under an hour.
- Active review time: the time a reviewer actually spends reading. For a reasonably sized PR this is 15-30 minutes. It is almost never the bottleneck.
If your PRs sit for two days, your problem is queueing, prioritization, and PR size, not slow reviewers. Measure both numbers for your last 50 PRs before you touch anything. Most teams discover that 80% or more of total turnaround is pure wait time, often split across multiple wait periods because the first review round triggers changes that wait again.
That re-review loop is the silent killer. A PR that needs three rounds with a one-day pickup each takes most of a week even if every human involved worked fast.
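Here is a minimal Python sketch of that measurement. It assumes you have exported, for each PR, the review-request time, the reviewer's first interaction, the merge time, and a rough estimate of active reading time. The `PullRequest` fields are hypothetical placeholders for whatever your Git host's API returns. Everything is in wall-clock hours for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class PullRequest:
    # Hypothetical fields: populate them from your Git host's API export.
    review_requested: datetime
    first_interaction: datetime
    merged: datetime
    active_review: timedelta  # estimated time reviewers actually spent reading


def pickup_time(pr: PullRequest) -> timedelta:
    """Gap between 'review requested' and the reviewer's first interaction."""
    return pr.first_interaction - pr.review_requested


def wait_share(pr: PullRequest) -> float:
    """Fraction of request-to-merge time that was NOT active review."""
    total = pr.merged - pr.review_requested
    return 1 - pr.active_review / total  # timedelta / timedelta -> float


def summarize(prs: list[PullRequest]) -> dict[str, float]:
    """Median pickup (hours) and median wait share across a batch of PRs."""
    return {
        "median_pickup_hours": median(pickup_time(p).total_seconds() / 3600 for p in prs),
        "median_wait_share": median(wait_share(p) for p in prs),
    }
```

Consider a PR that is picked up after 24 hours and merged after 48, with 30 minutes of actual reading. Its wait share is about 99%, which is the pattern described above.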
How small should a pull request be?
Under 400 changed lines, with 200 as the comfortable target. This isn't an arbitrary preference. The SmartBear study of Cisco's reviews found defect detection drops sharply beyond 400 lines per session, and Google's internal research on its review process found the median change there is about 24 lines, with small changes getting reviewed in hours rather than days.
Size affects every stage of the pipeline:
| PR size (lines changed) | Typical pickup behavior | Review quality | Rounds to merge |
|---|---|---|---|
| Under 100 | Reviewed between tasks, often same hour | Line-by-line reading | 1 |
| 100-400 | Scheduled into the day | Thorough with effort | 1-2 |
| 400-1,000 | Procrastinated, often 1-2 days | Skimmed, "LGTM" risk rises | 2-3 |
| Over 1,000 | Avoided until someone is forced | Rubber stamp | Many, or none |
Big PRs are slow twice: nobody wants to start them, and once started, nobody can hold the whole change in their head, so feedback comes in fragments across multiple rounds.
Enforce the cap mechanically, not socially. A CI check or a Danger rule that flags PRs over 400 lines (with an explicit override label for genuinely atomic changes like generated code or large renames) works better than a guideline in a doc. Expect grumbling for two weeks, then expect people to defend the rule when someone tries to remove it.
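Here is a sketch of such a check as a small Python script. It assumes the PR branch is checked out in CI with `origin/main` as its base, and that labels are passed in from the PR event. The `size-override` label name is a hypothetical choice. The script counts added plus deleted lines from `git diff --numstat` and skips binary files, which numstat reports as `-`.

```python
import subprocess

MAX_CHANGED_LINES = 400           # the cap recommended above
OVERRIDE_LABEL = "size-override"  # hypothetical label name; pick your own


def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output.

    Each line is 'added<TAB>deleted<TAB>path'; binary files show '-' counts
    and are skipped.
    """
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] == "-":
            continue
        total += int(parts[0]) + int(parts[1])
    return total


def check(numstat: str, labels: set[str]) -> tuple[bool, str]:
    """Pass if under the cap, or if the explicit override label is present."""
    n = changed_lines(numstat)
    if n <= MAX_CHANGED_LINES:
        return True, f"{n} changed lines: within the {MAX_CHANGED_LINES}-line cap"
    if OVERRIDE_LABEL in labels:
        return True, f"{n} changed lines: over the cap, allowed by '{OVERRIDE_LABEL}'"
    return False, (f"{n} changed lines: over the {MAX_CHANGED_LINES}-line cap; "
                   f"split the PR or add '{OVERRIDE_LABEL}'")


def main(labels: set[str]) -> int:
    """CI entry point; returns a process exit code (0 = pass)."""
    # Three-dot diff: changes on HEAD since it diverged from origin/main.
    diff = subprocess.run(
        ["git", "diff", "--numstat", "origin/main...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    ok, message = check(diff, labels)
    print(message)
    return 0 if ok else 1
```

In a CI step, end the script with `sys.exit(main(labels))`, where `labels` comes from your CI system's PR metadata.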
What does a workable review SLA look like?
A review SLA sets an expectation for pickup, not for approval. The version that holds up in practice:
- First response within 4 working hours of review request. First response can be a full review, a partial pass, or an honest "I can get to this at 2pm."
- Re-reviews within 2 working hours, because the author is now blocked on you specifically and the context is already loaded.
- Reviewer is on the hook until handed off. If you can't review it, reassign it; don't let it rot in your queue.
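Checking those SLAs means counting working time, not wall-clock time. Here is a sketch that assumes a single timezone, a 09:00-17:00 Monday-to-Friday day, and no holidays. All of these are adjustable constants.

```python
from datetime import datetime, time, timedelta

DAY_START = time(9)   # assumed working day: 09:00-17:00, Monday to Friday
DAY_END = time(17)
PICKUP_SLA = timedelta(hours=4)
REREVIEW_SLA = timedelta(hours=2)


def working_time_between(start: datetime, end: datetime) -> timedelta:
    """Working time elapsed between two naive local datetimes (no holidays)."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            window_start = datetime.combine(day, DAY_START)
            window_end = datetime.combine(day, DAY_END)
            # Clip this day's working window to the [start, end] interval.
            lo = max(start, window_start)
            hi = min(end, window_end)
            if hi > lo:
                total += hi - lo
        day += timedelta(days=1)
    return total


def breaches_sla(requested: datetime, responded: datetime, rereview: bool = False) -> bool:
    """True if the first response took longer than the applicable SLA."""
    limit = REREVIEW_SLA if rereview else PICKUP_SLA
    return working_time_between(requested, responded) > limit
```

Take a request at 15:00 on Friday that is answered at 11:00 on Monday. It used exactly four working hours, so it meets the pickup SLA even though three calendar days passed.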
Two implementation details decide whether the SLA survives contact with reality.
First, make the queue visible. A Slack reminder bot that posts each engineer's pending reviews at 10am and 3pm costs nothing and routinely halves pickup time on its own. GitHub's scheduled reminders feature does this out of the box.
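The grouping behind such a reminder is simple. This sketch assumes you can list open review requests with the reviewer, PR title, and request time. The `PendingReview` shape is hypothetical, and posting the result at 10am and 3pm is left to your scheduler or chat integration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PendingReview:
    # Hypothetical shape: fill it from your Git host's requested-reviewers data.
    reviewer: str
    pr_title: str
    requested_at: datetime


def build_digest(pending: list[PendingReview], now: datetime) -> dict[str, list[str]]:
    """Group pending reviews per reviewer (alphabetical), oldest request first."""
    by_reviewer: dict[str, list[PendingReview]] = defaultdict(list)
    for item in pending:
        by_reviewer[item.reviewer].append(item)
    digest = {}
    for reviewer, items in sorted(by_reviewer.items()):
        items.sort(key=lambda i: i.requested_at)
        digest[reviewer] = [
            f"{i.pr_title} (waiting {(now - i.requested_at).total_seconds() / 3600:.1f}h)"
            for i in items
        ]
    return digest
```

Sorting oldest-first puts the PR whose author has been blocked longest at the top of each person's list.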
Second, give the SLA teeth via norms, not punishment. The norm that works: review requests outrank starting new work. Finishing your current focused block is fine; starting a fresh task while two PRs wait on you is not. Code waiting for review is the most expensive inventory a team holds, because the author's context is evaporating the entire time.
A team of six doing 25 PRs a week will find this costs each engineer about 45-60 minutes of review per day. That's not overhead on the real work. At that volume, it is the real work.
Do stacked PRs actually speed things up?
Yes, and they're the answer to the most common objection to size limits: "my feature can't fit in 400 lines." It can, as a stack.
Stacked PRs split one feature into a chain of dependent PRs: the schema migration, then the data layer, then the API endpoint, then the UI. Each PR targets the branch below it instead of main. Reviewers get four 150-line reviews instead of one 600-line review, and you keep writing PR 3 while PR 1 is in review instead of sitting blocked.
The math is what sells it. One 600-line PR with a one-day pickup and two review rounds takes roughly three days of calendar time, during which the author is context-switching or blocked. A four-PR stack reviewed in parallel, where each small PR gets picked up in a few hours and approved in one round, lands the same feature in about a day, and each individual review was more thorough.
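The same arithmetic works as a toy model. All inputs are assumptions, in working days, chosen to mirror the scenario above:

- The big PR: a 600-line PR with one-day pickup, a quarter-day of reading per round, and a half-day of author fixes between rounds.
- The stack: four small PRs, each picked up in half a day and approved in one round.

The model ignores the time spent writing the later PRs in the stack and merging them in order, so treat the stack figure as a lower bound.

```python
def pr_calendar_days(rounds: int, pickup: float, review: float, fix: float) -> float:
    """Working days from first review request to approval.

    Every round waits for pickup and then a review; every round except the
    last is followed by the author pushing fixes.
    """
    return rounds * (pickup + review) + (rounds - 1) * fix


def stack_calendar_days(prs: list[tuple[int, float, float, float]]) -> float:
    """If all PRs in a stack are reviewed in parallel, the slowest one dominates."""
    return max(pr_calendar_days(*pr) for pr in prs)


# Assumed inputs (working days) mirroring the scenario above.
big_pr = pr_calendar_days(rounds=2, pickup=1.0, review=0.25, fix=0.5)  # 3.0 days
stack = stack_calendar_days([(1, 0.5, 0.1, 0.0)] * 4)                  # ~0.6 days
```

The same function also reproduces the re-review loop from earlier. `pr_calendar_days(3, 1.0, 0.0, 0.5)` comes to four working days, most of a week, with essentially no time spent reading.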
Tooling makes or breaks this. Manually rebasing a four-deep stack after feedback on PR 1 is miserable. Use Graphite, git-machete, ghstack, or spr to automate the restacking. GitHub's native support for retargeting PRs when a base branch merges has improved enough that smaller teams can get by with just careful branch discipline, but past two levels of depth you want a tool.
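At its core, what those tools automate is a chain of `git rebase --onto` calls. After a branch low in the stack is rewritten, each branch above it is replayed onto its updated parent. The parent's pre-rewrite tip is used to isolate the child's own commits. Here is a sketch that generates those commands, with hypothetical branch names and SHAs.

```python
def restack_commands(stack: list[str], old_tips: dict[str, str], changed: str) -> list[str]:
    """Commands that replay every branch above `changed` onto its rewritten parent.

    `stack` lists branches bottom-up, e.g. ["main", "feat-1", "feat-2"].
    `old_tips` maps each parent branch to the commit it pointed at BEFORE the
    rewrite, so `--onto` moves only the child's own commits.
    """
    start = stack.index(changed)
    commands = []
    for parent, child in zip(stack[start:], stack[start + 1:]):
        commands.append(f"git rebase --onto {parent} {old_tips[parent]} {child}")
    return commands


# Hypothetical stack: feat-1 was amended after review feedback.
cmds = restack_commands(
    ["main", "feat-1", "feat-2", "feat-3"],
    {"feat-1": "a1b2c3", "feat-2": "d4e5f6"},
    changed="feat-1",
)
```

Record `old_tips` before amending, because each child's rebase needs the commit its parent used to point at. Git 2.38 and later also offer `git rebase --update-refs`, which moves the intermediate branch pointers when you rebase the top of a stack.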
Start small: pick one feature next sprint and ship it as a 3-PR stack. The first stack is awkward. The third one is just how you work.
What should be automated before a human ever reviews?
Every minute a human spends on something a machine could have flagged is a minute added to turnaround, and worse, it trains reviewers to do shallow scanning. Automate in three layers.
Layer 1: kill style review completely. Formatter (Prettier, gofmt, Black, rustfmt) enforced in CI, linter rules for naming and imports, conventional commit checks if you use them. Target: zero human comments about formatting, ever. If one appears, fix the tooling, not the author.
Layer 2: CI answers "does it work" before review starts. Tests, type checks, and build must be green before a review request fires. Reviewing red PRs wastes the most expensive resource you have. Auto-request review only on green builds.
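The gate itself is a small predicate. This sketch assumes a bot or workflow that has fetched the check runs for the PR's head commit, each carrying the `status` and `conclusion` fields that GitHub's Checks API returns. Treating `neutral` and `skipped` as passing is a policy choice you may want to tighten.

```python
PASSING = {"success", "neutral", "skipped"}  # policy choice: what counts as green


def ready_for_review(check_runs: list[dict]) -> bool:
    """True only when every check run has completed with a passing conclusion."""
    if not check_runs:
        return False  # no CI signal yet: don't page a human
    return all(
        run.get("status") == "completed" and run.get("conclusion") in PASSING
        for run in check_runs
    )
```

The bot requests reviewers only when this returns `True`. An empty list counts as not ready, so a PR whose CI hasn't started yet doesn't page anyone.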
Layer 3: AI first pass answers "does it have known problems." This is the newest layer and the one with the most leverage on turnaround. An AI reviewer like Diffwise runs 40+ specialist agents in parallel on every PR (security, performance, conventions, plus language-specific agents) and posts findings inline within minutes of the push. The human reviewer then opens a diff where injection risks, N+1 queries, missing timeouts, and convention violations are already pinned to specific lines, and spends their 20 minutes on architecture and product correctness instead of pattern-hunting.
The feature that matters most for turnaround specifically is incremental review. The expensive loop in slow teams is re-review: the author pushes fixes, and the reviewer re-reads the whole diff to figure out what changed and whether the issues are gone. On each push, Diffwise classifies findings as Fixed, Still Open, or New, so the re-review round collapses from "re-read everything" to "confirm the Still Open list is empty." Paired with GitHub Check Runs that block merge while critical findings remain open, humans no longer have to gatekeep the mechanical findings, and re-review rounds drop from hours to minutes. The free tier covers 50 reviews a month across 3 repos, which is enough to run an honest two-week trial before deciding anything.
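Diffwise's own matching logic isn't described here, so the following is a generic sketch of Fixed / Still Open / New classification, not its implementation. It assumes each finding can be reduced to a stable key (rule, file, offending snippet, all hypothetical fields) that survives unrelated lines shifting. The three buckets are then plain set operations.

```python
from typing import NamedTuple


class FindingKey(NamedTuple):
    # Hypothetical identity for a finding. Keying on rule + file + snippet,
    # rather than a line number, keeps it stable when surrounding code moves.
    rule: str
    path: str
    snippet: str


def classify(previous: set[FindingKey], current: set[FindingKey]) -> dict[str, set[FindingKey]]:
    """Compare the findings from the last push with those from this push."""
    return {
        "fixed": previous - current,       # reported before, gone now
        "still_open": previous & current,  # reported before and still present
        "new": current - previous,         # first reported on this push
    }
```

The re-review check from the paragraph above then becomes two questions: is `still_open` empty, and is anything in `new` worth a comment?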
One rule keeps this layer healthy: tune out the noise aggressively. An AI reviewer that cries wolf gets ignored within a month. Use config (Diffwise does this via a .diffwise.yml file: severity thresholds, ignored paths, disabled categories) to keep the signal rate high enough that a posted finding means something.
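The filtering such a config drives can be sketched independently of any tool. The `NoiseConfig` fields and severity names below are hypothetical and are not the `.diffwise.yml` schema. They illustrate the three knobs named above: a severity threshold, ignored path globs, and disabled categories.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Finding:
    category: str
    severity: str
    path: str


@dataclass
class NoiseConfig:
    # Hypothetical knobs, NOT the .diffwise.yml schema.
    min_severity: str = "medium"
    ignored_paths: tuple[str, ...] = ()
    disabled_categories: frozenset[str] = frozenset()


def keep(finding: Finding, config: NoiseConfig) -> bool:
    """Drop disabled categories, ignored paths, and anything below the threshold."""
    if finding.category in config.disabled_categories:
        return False
    # fnmatch's '*' also matches '/', so "vendor/*" covers nested files.
    if any(fnmatch(finding.path, pattern) for pattern in config.ignored_paths):
        return False
    return SEVERITY_RANK[finding.severity] >= SEVERITY_RANK[config.min_severity]


def filter_findings(findings: list[Finding], config: NoiseConfig) -> list[Finding]:
    return [f for f in findings if keep(f, config)]
```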
Which async norms keep reviews moving?
Process and tooling get you most of the way. A few communication norms close the gap, especially across timezones.
Label comment severity. Prefix every comment with blocking:, nit:, or question:. Without labels, authors treat all 14 comments as required changes and burn a day addressing nits. With labels, they fix the two blockers, batch the nits or ticket them, and re-request review the same afternoon.
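Labels also make comments machine-sortable. Here is a sketch of how a bot, or the author, might bucket them. It assumes the three prefixes above and case-insensitive matching.

```python
LABELS = ("blocking:", "nit:", "question:")


def label_of(comment: str) -> str:
    """Return the severity label a comment starts with, or 'unlabeled'."""
    text = comment.strip().lower()
    for label in LABELS:
        if text.startswith(label):
            return label.rstrip(":")
    return "unlabeled"


def triage(comments: list[str]) -> dict[str, list[str]]:
    """Bucket review comments so the author can address blockers first."""
    buckets: dict[str, list[str]] = {"blocking": [], "nit": [], "question": [], "unlabeled": []}
    for comment in comments:
        buckets[label_of(comment)].append(comment)
    return buckets
```

The author fixes the `blocking` bucket first and can batch or ticket the `nit` bucket, as described above.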
Author merges after approval, nits notwithstanding. "Approve with comments" should be the default verdict when nothing is blocking. Requiring a fresh approval round for renamed variables adds a full pickup cycle for zero risk reduction.
One round of comments, not a trickle. Reviewers finish the whole diff before submitting, as a single batched review. Five separate comment notifications over three hours means five context switches for the author.
Escalate to synchronous after two rounds. If a thread hits its third back-and-forth, it's a 10-minute call or a pairing session, not a fourth comment. Async is the default, not a religion.
Timezone handoffs need explicit owners. Distributed teams should assign reviews to someone whose working hours overlap the author's by at least 2-3 hours, or accept a deliberate follow-the-sun pattern where end-of-day review requests are part of the handoff ritual.
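To apply the overlap rule when assigning reviewers, convert each person's working window to UTC and intersect the windows on a 24-hour clock. This sketch assumes fixed UTC offsets (no daylight saving) and an eight-hour day starting at 09:00 local time.

```python
def utc_window(local_start: float, hours: float, utc_offset: float) -> tuple[float, float]:
    """Working window as (start, length) in UTC hours, with start in [0, 24)."""
    return ((local_start - utc_offset) % 24, hours)


def overlap_hours(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Shared working hours of two daily windows on a 24-hour clock.

    Shifting b by a day either way handles windows that wrap past midnight UTC.
    """
    a_start, a_len = a
    b_start, b_len = b
    total = 0.0
    for shift in (-24, 0, 24):
        lo = max(a_start, b_start + shift)
        hi = min(a_start + a_len, b_start + shift + b_len)
        total += max(0.0, hi - lo)
    return total


berlin = utc_window(9, 8, 1)      # standard time, UTC+1
new_york = utc_window(9, 8, -5)   # standard time, UTC-5
sf = utc_window(9, 8, -8)         # standard time, UTC-8
tokyo = utc_window(9, 8, 9)       # UTC+9
```

Berlin and New York share two hours, which just meets the lower bound above. San Francisco and Tokyo share one, which argues for the follow-the-sun pattern instead.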
How do you know it's working?
Track four numbers weekly, per team, trended over time. Don't rank individuals on them; that turns every metric into a target and every target into rubber-stamping.
- Pickup time (median): target under 4 working hours. This moves first and fastest.
- Total turnaround, open-to-merge (median and p90): target under 24 working hours median. Watch p90, because the median hides the monster PRs that hurt the most.
- Review rounds per PR: target under 2. If this falls toward 1.0 while defect escapes rise, you've started rubber-stamping; if it sits at 3+, your PRs are too big or your feedback arrives in trickles.
- Defects escaped to production: the integrity check. Speed gains that show up alongside rising escapes aren't gains.
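Here is a sketch of the weekly roll-up. It assumes you already collect, for the team's merged PRs that week, the pickup and open-to-merge times (in working hours) and the round counts. It uses only the standard library. On many Python versions `statistics.quantiles` raises an error on fewer than two data points, so skip weeks with a single PR. The `rubber_stamp_warning` heuristic encodes the warning sign from the rounds bullet as a simple week-over-week comparison, which is an assumption about how you'd operationalize it.

```python
from statistics import median, quantiles


def p90(values: list[float]) -> float:
    """90th percentile via the standard library (default 'exclusive' method)."""
    return quantiles(values, n=10)[-1]  # last of the 9 decile cut points


def weekly_report(pickup_hours: list[float], turnaround_hours: list[float],
                  rounds: list[int], escaped_defects: int) -> dict[str, float]:
    """Team-level numbers for one week; never broken down per individual."""
    return {
        "pickup_median_h": median(pickup_hours),
        "turnaround_median_h": median(turnaround_hours),
        "turnaround_p90_h": p90(turnaround_hours),
        "rounds_per_pr": sum(rounds) / len(rounds),
        "escaped_defects": escaped_defects,
    }


def rubber_stamp_warning(prev: dict[str, float], curr: dict[str, float]) -> bool:
    """Rounds falling while escaped defects rise suggests rubber-stamping."""
    return (curr["rounds_per_pr"] < prev["rounds_per_pr"]
            and curr["escaped_defects"] > prev["escaped_defects"])
```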
Realistic trajectory for a team starting at a 2-3 day median: size caps and the SLA get you under 1.5 days within a month. Automation and severity labels get you under one day by month two. The last gap to a few hours usually requires stacking plus the re-review loop fix, because by then the only remaining wait is round-trips.
Frequently asked questions
What's a good average PR review turnaround time?
Under 24 working hours from open to merge is good; elite teams sustain 4-8 hours. Pickup specifically should be under 4 working hours. If your median is over two days, start with PR size limits and a visible review queue before anything else.
How many reviewers should each PR have?
One required reviewer for most changes. Microsoft's research on code review found the second reviewer adds little defect detection while roughly doubling coordination delay. Reserve two reviewers for genuinely risky surfaces: auth, payments, migrations, public APIs.
Why are my PRs sitting unreviewed for days?
Almost always three causes stacking up: PRs too large to review in one sitting, no explicit expectation about pickup time, and reviews losing to feature work in prioritization. Fix in that order: cap size, set a 4-hour pickup SLA, make the queue visible twice a day.
Does AI code review replace human reviewers?
No. It replaces the mechanical 60% of review (known bug patterns, security smells, convention checks) so the human 40% (design, product correctness, judgment) happens faster and with more attention. Teams using tools like Diffwise typically keep one human approval required but cut their re-review rounds dramatically.
What's the single highest-impact change if we can only do one thing?
Cap PR size at 400 changed lines, enforced by CI. It shortens pickup, deepens review quality, and reduces rounds all at once, and every other tactic on this page works better once PRs are small.