Open, Axial, and Selective Coding: The Complete Guide to Qualitative Coding Phases
A complete guide to the three coding phases of grounded theory — open, axial, and selective coding. Examples, decision points, and how AI-native research with Koji compresses weeks of coding into minutes.
What are open, axial, and selective coding?
Open, axial, and selective coding are the three sequential phases of qualitative analysis in grounded theory — each one moves your data from raw text toward an explanatory theory. Open coding breaks data into discrete concepts. Axial coding finds the relationships between those concepts. Selective coding chooses one core category and ties everything else to it.
Developed by Anselm Strauss and Juliet Corbin in the 1990s as a refinement of Glaser and Strauss's original grounded theory (Springer, 2019), the three-phase approach is now among the most widely used coding frameworks in qualitative research: taught in PhD programs, used by UX researchers, and applied in product discovery worldwide.
This guide walks through each phase with concrete examples, shows where teams typically get stuck, and explains how AI-native research platforms like Koji compress what used to be weeks of manual coding into automated thematic analysis you can validate and refine in hours.
Why grounded theory coding still matters
In an era of AI-assisted analysis, why bother understanding manual coding phases? Three reasons:
- AI thematic analysis is a coding accelerator, not a replacement. A 2025 tutorial in the Journal of Medical Internet Research found that ChatGPT-assisted coding "enhanced the efficiency and diversity of coding" but showed shortcomings in depth and context compared to manual work (JMIR, 2025). Researchers who understand the underlying phases can validate and refine AI output; those who do not, cannot.
- Grounded theory is how you build theory from data, not impose theory on data. Open coding in particular is the discipline of staying open to what the data is telling you, rather than coding to confirm a hypothesis you already hold.
- The three phases map onto how most product teams already think: observation → relationships → core insight. Knowing the formal vocabulary makes synthesis conversations sharper.
Phase 1: Open coding
Open coding is the first pass through your data, where you read line-by-line (or segment-by-segment) and label each meaningful unit with a short code. The goal is breadth, not precision — capture every concept, action, or meaning you see, without pre-committing to categories.
How it works
- Read (or listen to) all of your data once without coding: interview transcripts, field notes, open-ended survey responses. Get a feel for the whole.
- Make a second pass, line by line. For each meaningful chunk, write a short label (a "code") that captures what is happening. Codes can be:
  - Descriptive: "first-time login confusion"
  - In-vivo, using participants' exact words: "the dashboard is overwhelming"
  - Process: "deciding-whether-to-upgrade"
- Keep codes provisional and resist the urge to consolidate them yet. You will likely end up with hundreds of codes; that is normal and correct.
- Write memos. When a code feels significant or ambiguous, write a paragraph explaining what you saw. These memos become the connective tissue for axial coding.
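If you keep open codes in a script or notebook rather than a spreadsheet, a minimal Python sketch of the structure might look like the following. The names `Segment`, `Code`, `CodeKind`, and `memo` are hypothetical, chosen for illustration; they do not correspond to the data model of NVivo, ATLAS.ti, Koji, or any other tool, and the interview text is invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


# Hypothetical data model for illustration only; not any tool's real schema.
class CodeKind(Enum):
    DESCRIPTIVE = "descriptive"  # researcher's own label for what is happening
    IN_VIVO = "in-vivo"          # the participant's exact words
    PROCESS = "process"          # an action unfolding over time


@dataclass
class Code:
    label: str
    kind: CodeKind


@dataclass
class Segment:
    """One meaningful chunk of data and the provisional codes attached to it."""
    source: str                                   # e.g. a transcript ID
    text: str
    codes: list[Code] = field(default_factory=list)
    memo: str | None = None                       # optional analytic note


def all_labels(segments: list[Segment]) -> list[str]:
    """List every code label in order, without merging or deduplicating."""
    return [code.label for seg in segments for code in seg.codes]


# Invented example segment using two of the code kinds described above.
segment = Segment(
    source="interview-03",
    text="I couldn't find anything at first. The dashboard is overwhelming.",
    codes=[
        Code("first-time login confusion", CodeKind.DESCRIPTIVE),
        Code("the dashboard is overwhelming", CodeKind.IN_VIVO),
    ],
    memo="Confusion may be specific to the first session; check later interviews.",
)

print(all_labels([segment]))
```

Keeping `all_labels` free of any merging step mirrors the advice above: consolidation is deferred to axial coding.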
Open coding example
Suppose you ran six interviews about how product managers prepare for stakeholder reviews. Consider this passage:
"I usually rebuild my deck from scratch the morning of, even though I have a template, because I never feel like the template covers what this particular VP cares about."
It might produce open codes like:
- last-minute deck rebuilding
- template-not-fitting-audience
- VP-specific preferences
- avoiding-template-reuse
- morning-of-review preparation
Notice the codes are granular and overlapping. That is fine in open coding.
Common open coding mistakes
- Coding too sparsely. If you have one code per page, you are summarizing, not coding. Aim for roughly one code every 1–3 sentences.
- Importing pre-existing categories. If you already "know" the themes, you will see only those themes. Open coding requires deliberate openness.
- Conflating codes too early. "Confusion" and "frustration" feel similar but may have different antecedents. Keep them separate until axial coding tells you whether to merge.
- Skipping memos. Memos are where the meaning lives. A spreadsheet of codes without memos loses the analytical reasoning behind each label.
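The "one code every 1–3 sentences" rule of thumb can be turned into a rough self-check. This sketch assumes a naive sentence splitter (splitting on `.`, `!`, and `?`), so abbreviations and quoted dialogue will be miscounted; the function names are hypothetical, and a flag is a prompt to reread the segment, not a verdict.

```python
import re


def sentence_count(text: str) -> int:
    """Naively count sentences by splitting on runs of ., ! and ?."""
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])


def too_sparse(text: str, n_codes: int, max_sentences_per_code: float = 3.0) -> bool:
    """Flag a segment coded more sparsely than one code per ~3 sentences."""
    sentences = sentence_count(text)
    if n_codes == 0:
        return sentences > 0  # any non-empty uncoded text is flagged by this check
    return sentences / n_codes > max_sentences_per_code


# The example passage from this guide: one sentence, five open codes.
passage = (
    "I usually rebuild my deck from scratch the morning of, even though I have "
    "a template, because I never feel like the template covers what this "
    "particular VP cares about."
)
print(too_sparse(passage, n_codes=5))                 # False
print(too_sparse("A. B. C. D. E. F. G.", n_codes=1))  # True
```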
Phase 2: Axial coding
Axial coding takes the codes from open coding and finds the relationships between them — what causes what, what conditions affect what, what consequences flow from what. The output of axial coding is a smaller set of categories, each with subcategories that explain how the underlying codes connect.
Strauss and Corbin's coding paradigm
The canonical axial coding model uses a six-element framework — sometimes called the "coding paradigm" (Springer, 2019):
- Phenomenon — the central concept being explained.
- Causal conditions — what gives rise to it.
- Context — the setting in which it occurs.
- Intervening conditions — broader structural factors that shape it.
- Action/interaction strategies — how people respond.
- Consequences — what happens as a result.
Not every study uses all six elements. The framework is a scaffold, not a checklist.
Axial coding example
Using our open codes from earlier, axial coding might cluster them into:
Category: Audience-fit anxiety in stakeholder presentations
- Causal conditions: generic templates, varied stakeholder priorities
- Context: high-stakes review meetings, limited prep time
- Intervening conditions: lack of stakeholder profile data, no shared template library
- Action strategies: last-minute rebuilding, asking peers for the latest VP version, padding decks with backup slides
- Consequences: late nights, slide redundancy, decreased confidence
Now you have a category with structure, not just a list of codes. A reader can see the causal logic.
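One way to keep axial categories honest is to record each one against the coding paradigm and list which elements are still empty. The sketch below encodes the stakeholder-presentation example above; the class and field names are hypothetical choices of ours, not a standard schema. An empty element is reported rather than treated as an error, since the paradigm is a scaffold, not a checklist.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class AxialCategory:
    """A category recorded against Strauss and Corbin's coding paradigm."""
    phenomenon: str
    causal_conditions: list[str] = field(default_factory=list)
    context: list[str] = field(default_factory=list)
    intervening_conditions: list[str] = field(default_factory=list)
    strategies: list[str] = field(default_factory=list)  # action/interaction strategies
    consequences: list[str] = field(default_factory=list)

    def missing_elements(self) -> list[str]:
        """Paradigm elements left empty; reported, not rejected."""
        return [
            f.name
            for f in fields(self)
            if f.name != "phenomenon" and not getattr(self, f.name)
        ]


audience_fit = AxialCategory(
    phenomenon="Audience-fit anxiety in stakeholder presentations",
    causal_conditions=["generic templates", "varied stakeholder priorities"],
    context=["high-stakes review meetings", "limited prep time"],
    intervening_conditions=[
        "lack of stakeholder profile data",
        "no shared template library",
    ],
    strategies=[
        "last-minute rebuilding",
        "asking peers for the latest VP version",
        "padding decks with backup slides",
    ],
    consequences=["late nights", "slide redundancy", "decreased confidence"],
)

print(audience_fit.missing_elements())  # []
```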
When to move from open to axial
Move when:
- You stop seeing meaningfully new open codes (an early sign of saturation — see Data Saturation in Qualitative Research).
- Open codes are starting to cluster around recognizable concepts.
- You have written enough memos to have a sense of the relationships at stake.
In practice, open and axial coding overlap. Most researchers iterate between the two before settling categories.
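A rough, code-level proxy for "you stop seeing meaningfully new open codes" is to count how many previously unseen codes each successive interview adds. The sketch below assumes each interview's codes are available as a set of labels, in the order the interviews were coded; the function names, the `window` and `max_new` thresholds, and the toy data are illustrative assumptions, not published cut-offs. A low count of new labels also says nothing about whether the codes are meaningfully new, which remains a judgment call.

```python
from __future__ import annotations


def new_codes_per_interview(coded: list[set[str]]) -> list[int]:
    """For each interview, in coding order, count codes not seen in earlier ones."""
    seen: set[str] = set()
    counts: list[int] = []
    for codes in coded:
        counts.append(len(codes - seen))
        seen |= codes
    return counts


def looks_saturated(coded: list[set[str]], window: int = 3, max_new: int = 1) -> bool:
    """True if each of the last `window` interviews added at most `max_new` new codes."""
    counts = new_codes_per_interview(coded)
    if len(counts) < window:
        return False
    return all(c <= max_new for c in counts[-window:])


# Toy data: code labels per interview, in the order they were coded.
interviews = [
    {"deck rebuilding", "template misfit", "VP preferences"},
    {"template misfit", "peer check-ins"},
    {"deck rebuilding", "backup slides"},
    {"VP preferences"},
    {"deck rebuilding", "template misfit"},
]
print(new_codes_per_interview(interviews))     # [3, 1, 1, 0, 0]
print(looks_saturated(interviews))             # True
print(looks_saturated(interviews, max_new=0))  # False
```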
Phase 3: Selective coding
Selective coding is the final phase, where you choose one core category (the unifying concept that explains the most variation in your data) and integrate every other category around it. This is where grounded theory earns its name: the output is a theory grounded in the data.
How to identify the core category
Strauss and Corbin proposed several criteria. The core category should:
- Appear frequently in the data
- Connect to all (or most) other categories
- Explain variation across cases
- Be abstract enough to apply broadly, concrete enough to feel grounded
- Have explanatory power — using it, can you make sense of why participants did what they did?
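Two of the criteria above, frequency and connectivity, can be counted once categories are tagged per case and related categories are linked; the others (level of abstraction, explanatory power) cannot. The sketch below assumes a hypothetical input format: a map from each candidate category to the set of cases (for example, interview IDs) where it appears, and a set of category pairs judged to be related during axial coding. The category names and data are invented, and the ranking only shortlists candidates for a closer, human look.

```python
from __future__ import annotations


def score_candidates(
    occurrences: dict[str, set[str]],
    links: set[frozenset[str]],
) -> list[tuple[str, float, float]]:
    """Rank candidate core categories by connectivity, then frequency.

    occurrences: category -> IDs of the cases (e.g. interviews) it appears in.
    links: unordered pairs of categories judged related during axial coding.
    Returns (category, frequency, connectivity), both as fractions in [0, 1].
    """
    all_cases = set().union(*occurrences.values())
    n_others = len(occurrences) - 1
    scored = []
    for category, cases in occurrences.items():
        frequency = len(cases) / len(all_cases) if all_cases else 0.0
        degree = sum(1 for pair in links if category in pair)
        connectivity = degree / n_others if n_others > 0 else 0.0
        scored.append((category, frequency, connectivity))
    # Connectivity first: a core category should relate to (almost) all others.
    return sorted(scored, key=lambda s: (s[2], s[1]), reverse=True)


# Invented toy data.
occurrences = {
    "candidate A": {"p1", "p2", "p3", "p4"},
    "candidate B": {"p1", "p2"},
    "candidate C": {"p3"},
}
links = {
    frozenset({"candidate A", "candidate B"}),
    frozenset({"candidate A", "candidate C"}),
}
for row in score_candidates(occurrences, links):
    print(row)
```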
Selective coding example
From the stakeholder presentation study, we might have several axial categories:
- Audience-fit anxiety
- Template-tool mismatch
- Time-pressure cascading
- Cross-team learning gaps
The selective coding question: is there one core category that ties these together?
A candidate: "the gap between generic preparation tools and audience-specific expectations." Each axial category becomes a facet of this core. Audience-fit anxiety is the felt experience; template-tool mismatch is the artifact-level cause; time-pressure cascading is the temporal consequence; cross-team learning gaps is the structural condition.
Now you have a theory. You can write a research narrative around it. You can design product interventions against it.
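Integration can be made explicit by recording the role each axial category plays relative to the core category and checking that none is left over. This is a minimal sketch: the `facets` mapping and the `unintegrated` helper are hypothetical names, the roles are taken from the example above, and "Tool licensing costs" is an invented category used only to show what a gap looks like.

```python
from __future__ import annotations

CORE = "the gap between generic preparation tools and audience-specific expectations"

# Role each axial category plays relative to the core, as in the example above.
facets: dict[str, str] = {
    "Audience-fit anxiety": "felt experience",
    "Template-tool mismatch": "artifact-level cause",
    "Time-pressure cascading": "temporal consequence",
    "Cross-team learning gaps": "structural condition",
}


def unintegrated(axial_categories: list[str], roles: dict[str, str]) -> list[str]:
    """Axial categories that have not yet been assigned a role relative to the core."""
    return [c for c in axial_categories if c not in roles]


axial = list(facets)
print(unintegrated(axial, facets))  # []
# A hypothetical fifth category with no role yet would be flagged:
print(unintegrated(axial + ["Tool licensing costs"], facets))  # ['Tool licensing costs']

for category, role in facets.items():
    print(f"{category}: the {role} of {CORE}")
```

An axial category that cannot be assigned a role is a signal to revisit either that category or the choice of core category.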
When selective coding is too ambitious
Not every qualitative study needs to produce a grounded theory. For most product and UX research, axial coding output is sufficient: a set of well-structured categories with relationships explained. Selective coding is the right move when:
- You are doing dissertation-level or publication-targeted research.
- The decision your insights inform is large enough to warrant a unifying explanation.
- You have enough data (typically 20+ rich interviews) to support theoretical claims.
For a sprint-level study, stop at axial coding. Forcing selective coding onto thin data produces theories that overreach.
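The three conditions above can be written as a small decision helper. This sketch reads them as "enough data, plus at least one reason to need a unifying explanation"; that reading and the parameter names are assumptions, the default of 20 interviews comes from the text, and a team might reasonably weigh the conditions differently.

```python
def selective_coding_warranted(
    publication_targeted: bool,
    high_stakes_decision: bool,
    rich_interviews: int,
    min_interviews: int = 20,
) -> bool:
    """One reading of the criteria: enough data AND a reason to need a unifying theory."""
    enough_data = rich_interviews >= min_interviews
    return enough_data and (publication_targeted or high_stakes_decision)


print(selective_coding_warranted(False, False, 8))  # False: sprint-level study
print(selective_coding_warranted(True, False, 24))  # True
print(selective_coding_warranted(False, True, 12))  # False: too little data
```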
Open vs. axial vs. selective: at-a-glance
| Phase | Goal | Output | Typical duration (manual) |
|---|---|---|---|
| Open | Identify concepts | 100s of codes + memos | Days to weeks |
| Axial | Find relationships | 5–15 categories with subcategory structure | Days |
| Selective | Build a unifying theory | One core category integrating all others | Days |
Manual grounded theory coding for a 20-interview study typically takes 80–120 hours of analyst time across the three phases — which is why so few teams do it rigorously.
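The 80–120 hour figure follows from the rule of thumb given in the FAQ below (4–6 hours of coding per hour of interview), assuming each interview runs about an hour so that 20 interviews yield roughly 20 hours of audio. A quick sketch of the arithmetic, with the rates as adjustable assumptions and a hypothetical function name:

```python
from __future__ import annotations


def manual_coding_hours(
    audio_hours: float, low_rate: float = 4.0, high_rate: float = 6.0
) -> tuple[float, float]:
    """Estimate (low, high) analyst hours from hours of interview audio."""
    return audio_hours * low_rate, audio_hours * high_rate


# 20 interviews of about an hour each is roughly 20 hours of audio.
print(manual_coding_hours(20))  # (80.0, 120.0)
```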
How AI-native research changes the workflow
This is where Koji enters the picture. Rather than spend 80–120 hours coding by hand, modern teams use AI to accelerate the heavy lift while preserving researcher judgment for the strategic moves.
With Koji:
- AI-moderated interviews generate clean transcripts automatically — no transcription cost, no waiting.
- Automatic thematic analysis runs the equivalent of open + axial coding the moment interviews complete: clustering quotes, surfacing recurring themes, and labeling them in plain language.
- Quality scoring (1–5 scale) identifies the richest interviews — the ones most likely to yield axial categories with explanatory depth.
- AI consultants can be configured per-study to encode your research focus and coding priorities, so the AI's clustering reflects what you care about, not just what is statistically most frequent.
- 6 structured question types (open_ended, scale, single_choice, multiple_choice, ranking, yes_no) capture both the qualitative material that powers grounded coding and the quantitative anchors that ground theoretical claims. See the Structured Questions Guide.
- Real-time reports let you see emerging categories as interviews land, so axial coding happens during fieldwork instead of after.
The critical principle: AI accelerates open and axial coding; selective coding remains a human judgment call. Koji surfaces candidate themes faster than any manual workflow can, but choosing the core category — the unifying explanation that earns the right to be called a theory — is editorial work that belongs to you.
The 2025 JMIR tutorial on ChatGPT for grounded theory captured the principle well: AI is strongest at the breadth-and-clustering end of coding, weakest at the depth-and-judgment end (JMIR, 2025). Use Koji for the former; bring researcher expertise to the latter.
A practical workflow combining manual and AI coding
- Run interviews via Koji (AI-moderated voice or text — see Setting Up Voice Interviews).
- Let Koji's automatic thematic analysis surface initial themes. Treat these as candidate axial categories, not finished conclusions.
- Read 3–5 full transcripts manually. Even with AI clustering, deep reading is irreplaceable for catching nuance, contradictions, and emergent codes the AI missed.
- Refine categories. Merge AI-generated themes that overlap, split themes that conflate distinct concepts, add categories the AI missed.
- Apply the coding paradigm to top categories. For each major axial category, ask: what causes it, what consequences flow from it, what conditions shape it.
- Identify the core category (selective coding) — only if the study scope warrants it.
- Write the research narrative using the core category as the spine. See Research Storytelling.
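For the "refine categories" step, one simple way to spot AI-generated themes that may overlap is to compare the sets of quotes supporting each theme. The sketch below uses Jaccard similarity, a general-purpose overlap measure swapped in for illustration; it is not how Koji clusters themes, the input format (theme name mapped to quote IDs) is a hypothetical export, and the 0.5 threshold is arbitrary. It only proposes merges; deciding whether two themes describe the same concept, or whether one should be split, stays with the researcher.

```python
from __future__ import annotations

from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of quotes two themes have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_candidates(
    themes: dict[str, set[str]], threshold: float = 0.5
) -> list[tuple[str, str, float]]:
    """Pairs of themes whose supporting quotes overlap at or above `threshold`."""
    pairs = []
    for (name_a, quotes_a), (name_b, quotes_b) in combinations(themes.items(), 2):
        score = jaccard(quotes_a, quotes_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical export: theme name -> IDs of the quotes clustered under it.
themes = {
    "Deck rework": {"q1", "q2", "q3"},
    "Template frustration": {"q2", "q3", "q4"},
    "Meeting scheduling": {"q9"},
}
print(merge_candidates(themes))  # [('Deck rework', 'Template frustration', 0.5)]
```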
Frequently asked questions
Do I need to do all three coding phases for every study? No. Open and axial coding are sufficient for most product and UX research. Selective coding is appropriate for academic work or large strategic studies where a unifying theory is the deliverable.
What is the difference between open coding and thematic analysis? Thematic analysis is a broader, more flexible methodology popularized by Braun and Clarke. It overlaps significantly with open + axial coding from grounded theory but is methodologically less prescriptive. See the Thematic Analysis Guide for a full comparison.
How long does manual coding take? Roughly 4–6 hours of coding per hour of interview, across all three phases. A 20-interview study (20 hours of audio) typically requires 80–120 hours of analyst time. AI-native platforms like Koji can compress the open/axial portion by 70–90%.
Can AI do grounded theory coding for me? AI accelerates open and axial coding effectively, but selective coding requires human judgment. The 2025 JMIR tutorial found AI useful for breadth and clustering, weaker for depth and theoretical integration. Best practice is a hybrid workflow: AI for the heavy lift, human researcher for the strategic synthesis.
What is in-vivo coding? In-vivo coding uses participants' exact words as codes — for example, "the dashboard is overwhelming" instead of "interface complexity." It is a sub-technique within open coding that preserves participant voice and is especially valuable when terminology itself is part of the finding.
Do I need a software tool for grounded theory coding? For studies under 5 interviews, a spreadsheet works. For 5–20 interviews, dedicated tools like NVivo, ATLAS.ti, or Delve speed manual coding. For ongoing or large-scale research, AI-native platforms like Koji handle the open/axial work automatically while you direct the strategic synthesis.
Related resources
- Structured Questions Guide — Combine open-ended and quantitative question types in interviews to power richer grounded theory analysis.
- Coding Qualitative Data — Broader overview of qualitative coding approaches beyond grounded theory.
- Grounded Theory Qualitative Research — The full methodology of which open/axial/selective coding is the analytical engine.
- Thematic Analysis Guide — A flexible alternative to grounded theory coding.
- Data Saturation in Qualitative Research — Knowing when to stop collecting and start coding.
- Research Synthesis Guide — The broader synthesis workflow that selective coding feeds into.