{"site":{"name":"Koji","description":"AI-native customer research platform that helps teams conduct, analyze, and synthesize customer interviews at scale.","url":"https://www.koji.so","contentTypes":["blog","documentation"],"lastUpdated":"2026-05-05T09:50:12.056Z"},"content":[{"type":"documentation","id":"c7b508eb-c27e-40e4-b418-9697f82552ee","slug":"system-usability-scale-guide","title":"System Usability Scale (SUS): Complete Guide with Calculator, Benchmarks & Examples","url":"https://www.koji.so/docs/system-usability-scale-guide","summary":"A definitive reference on the System Usability Scale (SUS): the 10 standard questions, the 0–100 scoring formula, Sauro–Lewis grade benchmarks, sample-size guidance, and how Koji turns a 2-week SUS study into an afternoon.","content":"## What is the System Usability Scale (SUS)?\n\nThe **System Usability Scale (SUS)** is a 10-item Likert questionnaire that produces a single 0–100 score representing the perceived usability of any product, app, or system. It was developed by John Brooke at Digital Equipment Corporation in 1986 as a \"quick and dirty\" measure of usability, and has since become the most widely cited usability questionnaire in the world — used in more than 1,300 published research articles and behind tens of thousands of commercial usability evaluations.\n\nIf you only have time to capture one number that summarizes how usable your product feels to a real user, it should be a SUS score. It is short (10 questions), validated, comparable across products, and reliable even with small samples (Brooke's original validation work showed SUS produces consistent scores with as few as 8 to 12 participants per condition).\n\n> **The bottom line: a SUS score above 68 is \"above average.\" Above 80.3 is in the top 10% of products tested. 
Anything below 51 is in the bottom 15%.** These cut-points come from Jeff Sauro and James Lewis's analysis of more than 5,000 SUS scores across 500+ studies — the de-facto benchmark database.\n\n## The 10 SUS Questions (verbatim)\n\nThe SUS uses ten statements, alternating between positive and negative phrasing, and a 5-point Likert scale (\"Strongly disagree\" → \"Strongly agree\"):\n\n1. I think that I would like to use this system frequently.\n2. I found the system unnecessarily complex.\n3. I thought the system was easy to use.\n4. I think that I would need the support of a technical person to be able to use this system.\n5. I found the various functions in this system were well integrated.\n6. I thought there was too much inconsistency in this system.\n7. I would imagine that most people would learn to use this system very quickly.\n8. I found the system very cumbersome to use.\n9. I felt very confident using the system.\n10. I needed to learn a lot of things before I could get going with this system.\n\nThe alternating polarity is intentional. It forces respondents to read each item rather than satisficing down a column of \"Strongly agree\" answers, and it yields a more discriminating score.\n\n## How to Calculate a SUS Score (the Formula)\n\nSUS scoring confuses people the first time they see it because the math is non-obvious. The reason is the alternating polarity above — odd items measure positive sentiment, even items measure negative sentiment, and both have to be normalized in opposite directions before they can be summed.\n\nThe full formula:\n\n```\nSUS = 2.5 × ( 20 + (Q1 + Q3 + Q5 + Q7 + Q9) − (Q2 + Q4 + Q6 + Q8 + Q10) )\n```\n\nStep by step, for each respondent:\n\n1. **Convert each Likert response to a 1–5 number** (\"Strongly disagree\" = 1, \"Strongly agree\" = 5).\n2. **For odd-numbered items (1, 3, 5, 7, 9):** subtract 1 from the score.\n3. **For even-numbered items (2, 4, 6, 8, 10):** subtract the score from 5.\n4. 
**Sum all 10 adjusted scores.** You will get a number between 0 and 40.\n5. **Multiply by 2.5** to convert to the final 0–100 scale.\n\nThat gives you a per-respondent SUS score. **To produce the SUS score for the system,** average the per-respondent scores.\n\nA worked example: a respondent rates Q1 = 4, Q2 = 2, Q3 = 5, Q4 = 1, Q5 = 4, Q6 = 1, Q7 = 5, Q8 = 2, Q9 = 4, Q10 = 1. Adjusted scores: 3, 3, 4, 4, 3, 4, 4, 3, 3, 4 → sum = 35 → × 2.5 = **87.5**, an \"A+\" grade.\n\nA common mistake: SUS is **not** a percentage. A score of 75 does **not** mean \"75% of users found it usable.\" It is a normalized index, and the only meaningful comparison is to the benchmark distribution.\n\n## SUS Benchmarks: What Counts as \"Good\"?\n\nJeff Sauro's 2011 analysis of 5,000+ SUS scores produced a now-canonical benchmark distribution. Sauro and Lewis later refined it into a curved letter-grade scale (Sauro & Lewis, 2016).\n\n| SUS Score | Grade | Percentile | Interpretation |\n|-----------|-------|------------|----------------|\n| ≥ 84.1 | A+ | 96–100 | Best imaginable |\n| 80.8 – 84.0 | A | 90–95 | Excellent |\n| 78.9 – 80.7 | A− | 85–89 | Excellent |\n| 77.2 – 78.8 | B+ | 80–84 | Good |\n| 74.1 – 77.1 | B | 70–79 | Good |\n| 72.6 – 74.0 | B− | 65–69 | Good |\n| 71.1 – 72.5 | C+ | 60–64 | OK |\n| 65.0 – 71.0 | C | 41–59 | Average (the 68 mid-point) |\n| 62.7 – 64.9 | C− | 35–40 | OK / borderline |\n| 51.7 – 62.6 | D | 15–34 | Poor — fix it |\n| < 51.7 | F | 0–14 | Unusable |\n\nThe single number to remember is **68** — the population mean across all SUS studies. If your score is below 68, your product is below average. If it is above 80, you are in the top 10–15%.\n\n> \"A SUS score above a 68 would be considered above average, and anything below 68 is below average. 
The best way to interpret your results involves normalizing the scores to produce a percentile ranking.\" — *Jeff Sauro, MeasuringU*\n\n## When to Use SUS (and When Not To)\n\n**Use SUS when you want to:**\n\n- Track the perceived usability of a product over time (release-over-release).\n- Compare two design alternatives in an A/B usability test.\n- Benchmark your product against competitors using a shared yardstick.\n- Compare across very different product categories — SUS is technology-agnostic and works for desktop apps, mobile apps, websites, voice interfaces, hardware, and even physical products.\n- Provide a single, defensible number for stakeholders who want a usability KPI.\n\n**SUS is the wrong tool when:**\n\n- You need diagnostic insight into *why* something is hard. SUS gives a score, not reasons. Pair it with open-ended interview questions to find the root cause.\n- You are testing a prototype with major missing functionality — respondents will rate the gaps, not the design.\n- You are measuring task-level success or efficiency. Use task completion rates, time-on-task, and error counts for those.\n- You only have one or two responses. 
SUS is reliable at small samples, but a sample of 1 is still a sample of 1.\n\nA widely cited Nielsen Norman Group rule of thumb: **complement quantitative usability metrics like SUS with at least 5 qualitative usability sessions** — quant gives you the score, qual gives you the why.\n\n## Sample Size: How Many Respondents Do You Need?\n\nSUS is unusually robust at small samples — it produces stable scores with as few as 8 respondents per condition, and Tullis & Stetson (2004) found that 12–14 respondents were enough to reach the same conclusions as much larger samples.\n\nPractical guidance:\n\n- **Formative usability test (one design):** 8–12 respondents are sufficient.\n- **Comparative test (A vs B):** 14–20 per condition for adequate statistical power.\n- **Benchmark / tracking study:** 30–50 respondents per release for tighter confidence intervals.\n- **Public benchmark claims (\"our SUS is 82\"):** 50+ respondents, ideally weighted to your real user mix.\n\nThe width of the 95% confidence interval shrinks rapidly between n=5 and n=20, then much more slowly. There is rarely a payoff to going beyond 100 unless you need sub-segment estimates.\n\n## How to Run a SUS Study with Koji (the Modern Approach)\n\nTraditional SUS studies require a survey tool, an email invite list, manual scoring in a spreadsheet, and a separate qualitative session to understand the *why* behind the score. That is the workflow MeasuringU and Sauro have refined for two decades — and it works, but it takes days of wall-clock time.\n\nWith Koji, a SUS study runs end-to-end in a single AI-moderated study, and the score plus the diagnostic insights come back together.\n\n**Step 1 — Create the study.** In Koji, start a new project and pick the **Discovery** or **Exploratory** methodology. 
Tell the AI consultant your goal: *\"I want to measure the System Usability Scale for our checkout flow and understand the top usability frictions.\"* The AI consultant drafts a research brief in seconds.\n\n**Step 2 — Add the 10 SUS items as `scale` questions.** Koji supports six [structured question types](/docs/structured-questions-guide), one of which is `scale`. Configure each SUS item with `scaleMin: 1`, `scaleMax: 5`, and `scaleLabels: [\"Strongly disagree\", \"Strongly agree\"]`. Because SUS items have deterministic numeric responses, Koji's **ground-truth override** locks in the click-based answer at high confidence — the LLM never re-interprets a deterministic widget click.\n\n**Step 3 — Add 2–3 open-ended probes.** Tell the AI moderator to follow each \"Strongly disagree\" or \"Strongly agree\" response with a probe: *\"What made it feel that way?\"* Koji's adaptive interviewer (configured via [`maxFollowUps`](/docs/structured-questions-guide)) handles the probing dynamically — you do not script every branch.\n\n**Step 4 — Recruit and launch.** Share a personalized link, embed the interview widget, or import a CSV of contacts. Koji supports voice (3 credits per interview) or text (1 credit per interview) modalities — voice produces 3.4× longer responses on average, but text often yields higher completion rates for longer SUS surveys.\n\n**Step 5 — Read the report.** Koji's [`aggregateScaleResponses`](/docs/insights-dashboard) function rolls up every respondent's 1–5 answers into per-question distributions, and the [`aggregateThemes`](/docs/understanding-themes-patterns) function clusters the open-ended probes into top friction themes. 
The aggregate SUS score appears at the top of the report; the *why* sits underneath as themed quotes with citations back to the source interview.\n\n**What used to take 2 weeks (recruit → field → score in Excel → write up) collapses to a single afternoon.** Teams using AI-moderated research report 60–80% reductions in time-to-insight versus traditional manual workflows.\n\n## Common SUS Pitfalls and How to Avoid Them\n\n1. **Modifying the question wording.** Researchers love to \"improve\" SUS by swapping \"system\" for \"app\" or \"website.\" Brooke's original validation work was done on the 1986 wording — modifying it invalidates comparison to the benchmark database. Use the items verbatim, even if the language sounds dated.\n2. **Reporting SUS as a percentage.** Stakeholders see \"75\" and assume \"75% of users approve.\" Always present the score next to the benchmark grade so the meaning is clear.\n3. **Comparing SUS to NPS or CSAT directly.** They measure different constructs. SUS measures perceived usability; NPS measures loyalty/recommendation; CSAT measures task-level satisfaction. They are complementary, not interchangeable.\n4. **Running SUS on a broken prototype.** If 40% of respondents rate \"I found the various functions well integrated\" with \"Strongly disagree\" because the demo had a bug, you have measured the bug, not the design.\n5. **Ignoring the open-ended why.** A SUS score with no qualitative follow-up is just a vanity metric. 
Always pair it with open-ended probes.\n\n## Beyond SUS: Adjacent Usability Instruments\n\nIf SUS does not quite fit your context, consider:\n\n- **UMUX-Lite** — 2 items, designed to correlate with SUS; useful for ultra-short pulse surveys.\n- **SUPR-Q** — 8 items, optimized for websites (includes loyalty, trust, appearance, usability sub-scales).\n- **CSUQ (Computer System Usability Questionnaire)** — 19 items, more diagnostic.\n- **QUIS** — longer (50+ items), heavyweight diagnostic instrument for academic studies.\n\nFor most product teams, SUS plus open-ended probes is the right starting point. You can always layer in a more diagnostic instrument once you know which screens or flows to investigate.\n\n## Related Resources\n\n- [Usability Testing Guide](/docs/usability-testing-guide) — the broader methodology that SUS fits inside\n- [Structured Questions Guide](/docs/structured-questions-guide) — how Koji's 6 question types (scale, single_choice, multiple_choice, ranking, yes_no, open_ended) work\n- [Scale Questions in AI Interviews](/docs/scale-questions-guide) — how to deploy SUS items in Koji\n- [Likert Scale Research Guide](/docs/likert-scale-research-guide) — the underlying scale type SUS relies on\n- [Customer Effort Score (CES) Guide](/docs/customer-effort-score-guide) — a complementary task-level metric\n- [How to Analyze Qualitative Data](/docs/how-to-analyze-qualitative-data) — for the open-ended probes that pair with SUS\n\n---\n\n**Sources & further reading:** Brooke, J. (1996). *SUS: A \"Quick and Dirty\" Usability Scale*; Sauro, J. & Lewis, J. R. (2016). *Quantifying the User Experience*; Tullis, T. & Stetson, J. (2004). 
*A Comparison of Questionnaires for Assessing Website Usability*; MeasuringU SUS benchmark database (5,000+ scores across 500+ studies).","category":"Research Methods","lastModified":"2026-05-05T03:18:29.981446+00:00","metaTitle":"System Usability Scale (SUS): 2026 Calculator, Formula & Benchmarks Guide","metaDescription":"The complete 2026 guide to the System Usability Scale (SUS): 10 questions verbatim, the scoring formula, Sauro–Lewis grade benchmarks, sample-size rules, and how to deploy SUS in minutes on Koji.","keywords":["system usability scale","SUS","SUS calculator","SUS benchmark","SUS score","usability questionnaire","Brooke 1986","Sauro Lewis benchmark","SUS formula","usability testing"],"aiSummary":"A definitive reference on the System Usability Scale (SUS): the 10 standard questions, the 0–100 scoring formula, Sauro–Lewis grade benchmarks, sample-size guidance, and how Koji turns a 2-week SUS study into an afternoon.","aiPrerequisites":["Familiarity with usability testing concepts","Basic understanding of Likert scales","A product or prototype to evaluate"],"aiLearningOutcomes":["Calculate a SUS score correctly using the 2.5× formula","Interpret a SUS score against the Sauro–Lewis benchmark grades","Choose the right sample size for SUS studies","Avoid the five most common SUS pitfalls","Deploy a SUS study in Koji using scale questions plus open-ended probes"],"aiDifficulty":"beginner","aiEstimatedTime":"12 min read"}],"pagination":{"total":1,"returned":1,"offset":0}}