{"site":{"name":"Koji","description":"AI-native customer research platform that helps teams conduct, analyze, and synthesize customer interviews at scale.","url":"https://www.koji.so","contentTypes":["blog","documentation"],"lastUpdated":"2026-05-31T06:25:22.757Z"},"content":[{"type":"documentation","id":"3f66af18-afcf-442b-913b-6041681a9a08","slug":"search-interview-transcripts","title":"How to Search Across All Customer Interview Transcripts (Semantic + Keyword)","url":"https://www.koji.so/docs/search-interview-transcripts","summary":"Koji repository search combines semantic vector search (meaning) with keyword search (exact words) across every customer interview transcript in your account. Filter by study, theme, sentiment, quality score, segment, date range, language, or interaction mode. Every match returns a deep link into the source transcript with proper attribution. Search is exposed in the web UI, via REST API (GET /api/v1/search), and through MCP tools so AI assistants (Claude, Cursor, Windsurf) can query the repository. Replaces ad-hoc Notion databases and manual Dovetail tagging — works from the first interview because themes, sentiment, and embeddings are AI-generated at analysis time.","content":"**TL;DR:** Searching across customer interview transcripts is the single biggest unlock when your repository moves from \"10 interviews per project\" to \"300+ interviews across 12 studies.\" Koji combines semantic vector search (find the meaning, not just the word) with classic keyword search and filterable facets — theme, sentiment, quality score, segment, study, date range — and returns every match as a deep link straight into the original transcript. Most teams replace ad-hoc Notion databases and Dovetail tag taxonomies with this in their first week.\n\n---\n\n## Why Transcript Search Is a Repository Superpower\n\nThe dirty secret of customer research is that 80% of the value of any past interview is locked away because no one can find it again. 
A PM asks \"did anyone mention onboarding friction in the last quarter?\" and the only honest answer is \"maybe — let me re-read 40 transcripts.\"\n\nSearch across transcripts fixes that. It turns a passive archive into an active layer of evidence your team queries the way they query analytics dashboards. The shift in behavior is dramatic: teams that adopt repository search go from commissioning a new study before every product decision to *consulting* the research they already have. For questions the repository can answer, cycle time drops from weeks to minutes.\n\nTraditional research tools either skip search entirely (SurveyMonkey, Typeform) or bolt it on as keyword-only (older Dovetail) — which means \"checkout flow problems\" only matches the exact phrase, not the participant who said \"I couldn't figure out how to pay.\" Koji uses AI-native search from day one: vectors built at analysis time, keyword indexing in parallel, and faceted filtering on the structured analysis the AI moderator already produced.\n\n---\n\n## Two Modes of Search\n\n### Semantic Search (Meaning)\n\nType a natural-language query. Koji embeds your query into the same vector space as every participant utterance, then returns the top matches by cosine similarity. Examples:\n\n- Query: \"users frustrated with pricing transparency\" → matches a respondent who said \"I had no idea what I was actually paying for until the invoice arrived.\"\n- Query: \"first-time onboarding confusion\" → matches \"I clicked around for ten minutes trying to find the button.\"\n- Query: \"willingness to recommend us\" → matches \"I've told three friends about this already\" and the participant who chose 9 on the NPS scale.\n\nSemantic search is the right tool when you do not know the exact words a participant might have used. It is also the only mode that works well across languages — a French respondent's quote can match an English query if the meanings align. 
See [Multi-Language User Research](/docs/multilingual-research-guide) for how Koji handles that.\n\n### Keyword Search (Exact Words)\n\nSometimes you do want the literal word. Searching for `Stripe` should not match \"payment processor\" — you want the participant who named the integration. Koji keyword search supports:\n\n- Exact phrase: `\"checkout button\"` (quoted)\n- Boolean: `mobile AND slow NOT android`\n- Wildcards: `onboard*` to match onboarding, onboarded, onboards\n- Field-scoped: `respondent:p_jane123 cancel*` to search only one participant's history\n\nKeyword and semantic results can be merged into a single ranked list (hybrid mode), which is the default. You can flip to pure-semantic or pure-keyword from the search bar if a query is misbehaving.\n\n---\n\n## Filters That Actually Matter\n\nSearch alone is rarely enough — you almost always want to scope by something. Koji's facet rail on the left of the search results page exposes:\n\n- **Study** — limit to one or several studies.\n- **Theme** — every transcript is tagged with themes by the analysis pipeline. Pick one (e.g. \"Pricing Confusion\") and only matching quotes appear.\n- **Sentiment** — positive, neutral, negative, mixed.\n- **Quality score** — Koji rates every interview 1–5; filter to ≥3 to remove low-signal conversations (the same threshold the [credit gate](/docs/understanding-quality-scores) uses).\n- **Question type** — show only [scale](/docs/structured-questions-guide), single_choice, multiple_choice, ranking, yes_no, or open_ended answers.\n- **Segment / persona** — if your study uses lead-form fields or imported respondent metadata, every value becomes a filter (industry, role, plan tier, etc.).\n- **Date range** — last 7 days, last 30, last quarter, or a custom range.\n- **Interaction mode** — voice or text interviews.\n- **Language** — if your studies span multiple languages.\n\nFilters compose. 
\"Negative sentiment + Pricing Confusion theme + last 30 days + Enterprise segment\" is a single click rather than a SQL query.\n\n---\n\n## What a Match Looks Like\n\nEach result card shows:\n\n1. **The matched quote** — highlighted inline so you can see why it matched.\n2. **The surrounding context** — the previous question and the follow-up the AI moderator asked, so the quote does not look stranded.\n3. **The participant** — display name, segment, and any metadata the lead form captured.\n4. **Quality score and themes** — at-a-glance signal of how much to trust this interview.\n5. **A jump-to-quote deep link** — clicking opens the full transcript scrolled to the exact message, with the quote highlighted. This is the workflow that makes search actually replace re-reading.\n\nHover over any card to copy the quote to clipboard with proper attribution (`— Jane D., Enterprise Plan, Q3 Pricing Study`). This is how teams populate PRDs and pitch decks in minutes instead of hours.\n\n---\n\n## Common Workflows\n\n### \"Did anyone ever mention X?\" Repository Query\n\nThe classic question. Type a semantic query, scan the top 10 matches, and you have the answer in seconds. If the answer is yes, you have the exact quote and source ready to paste into Linear or Notion. If the answer is no, you know to plan a fresh study.\n\n### Theme Validation Across Studies\n\nYou think \"pricing transparency\" is a theme — but is it? Filter by theme = \"Pricing\", scroll the participant list, and count how often each segment shows up. Real themes have density across studies; ghost themes only show up in one. Koji's analysis pipeline tags themes per-interview automatically, so the filter populates on its own. See [Research Synthesis Guide](/docs/research-synthesis-guide) for how to turn search results into a synthesized theme.\n\n### Drafting a PRD With Real Voice\n\nPRDs grounded in actual customer quotes get more stakeholder buy-in than PRDs full of paraphrase. 
Search for the problem you're solving (\"manual export workflows\", \"data sync friction\"), pick three to five quotes from distinct participants and segments, and lead the PRD's \"Why Now\" section with those quotes. The \"Generate Quote Block\" button packages them with attribution.\n\n### Bug Triage From Support Tickets\n\nA support ticket arrives describing a confusing error in checkout. Before triaging engineering effort, search transcripts for \"checkout\" filtered to negative sentiment in the last 90 days. If three other participants mentioned the same friction, that is a top-of-queue bug. If no one did, it may be a single-customer edge case.\n\n### Pre-Launch Risk Check\n\nBefore shipping a feature, search for everything respondents said about that area of the product across all past studies. You'll often find a forgotten concern from six months ago that the team should address before launch.\n\n---\n\n## Search From the MCP and the API\n\nRepository search is not just a web UI feature. It is exposed in three programmatic surfaces:\n\n- **MCP tools.** Connect Koji to [Claude Desktop](/docs/mcp-setup-claude), [Claude Code](/docs/mcp-setup-claude-code), [Cursor](/docs/mcp-setup-cursor), [VS Code](/docs/mcp-setup-vscode), or Windsurf and ask \"search every transcript for participants who mentioned pricing confusion in the last 30 days.\" The agent calls the search tool, returns ranked quotes, and you can ask follow-up questions like \"summarize the top three themes.\"\n- **REST API.** `GET /api/v1/search?q=...&study_id=...&theme=...&sentiment=...&limit=50` returns the same ranked results as the UI. Use it from a backend service, a Slack bot, or a research-ops pipeline. 
See [User Research API](/docs/user-research-api-guide) for the full endpoint reference.\n- **Webhook trigger search.** Subscribe to `interview.analysis_ready` ([webhook setup](/docs/webhook-setup)) and run a search query each time a new interview comes in — useful for automatically tagging new participants who match an existing theme.\n\nA common production pattern: a nightly job calls the search API for \"negative sentiment + Enterprise segment + last 7 days,\" posts the matches into a Slack channel, and the CS lead reaches out to those customers before the week starts.\n\n---\n\n## Search Quality Tips\n\nSearch results are only as good as the underlying interviews. A few habits dramatically improve hit rate:\n\n1. **Use structured questions where possible.** [Structured questions](/docs/structured-questions-guide) (scale, single_choice, etc.) produce typed values that filter deterministically. Open-ended questions produce free text that semantic search is excellent at — but if you have a quantifiable question, make it structured.\n2. **Let the AI follow up.** Koji's moderator asks probing follow-ups when a participant gives a vague answer. The follow-up exchange often contains the most search-valuable signal.\n3. **Tag themes consistently.** When the analysis pipeline suggests a theme name, accept the canonical version rather than creating a near-duplicate. \"Pricing Confusion\" and \"Pricing Unclear\" should be merged.\n4. **Set quality thresholds.** Filter out interviews scored below 3 unless you specifically want to see why a conversation underperformed. The default credit gate already excludes them from billing, but they still live in the repository.\n5. **Use segments aggressively.** Even a free-text \"company size\" field on the lead form becomes a powerful filter facet over time. Capture metadata at intake.\n\n---\n\n## How Koji Search Compares\n\nMost legacy research tools cannot search across all transcripts at all — they treat each study as a silo. 
Dovetail and Marvin pioneered repository search but rely heavily on manual tagging. Koji's advantage is the AI-native pipeline: themes, sentiment, structured answers, and embeddings are all generated automatically when the analysis runs, so search works out of the box from your very first interview. You do not need a researcher to spend a week tagging a corpus before search becomes useful.\n\nFor teams comparing Koji against legacy research repositories, see [Best UX Research Repository Tools 2026](/docs/best-ux-research-repository-tools-2026).\n\n---\n\n## Related Resources\n\n- [Viewing Interview Transcripts](/docs/viewing-interview-transcripts) — single-transcript view\n- [Chat With Interview Transcripts (AI)](/docs/chat-with-interview-transcripts-ai) — ask questions across transcripts\n- [Structured Questions Guide](/docs/structured-questions-guide) — the 6 question types Koji supports\n- [Research Synthesis Guide](/docs/research-synthesis-guide) — turn search results into themes\n- [Research Repository Guide](/docs/research-repository-guide) — how to structure your repository\n- [Understanding Quality Scores](/docs/understanding-quality-scores) — the score search filters on\n- [User Research API](/docs/user-research-api-guide) — the headless API behind search","category":"Analysis & Synthesis","lastModified":"2026-05-31T03:24:04.464214+00:00","metaTitle":"Search Across All Customer Interview Transcripts — Semantic + Keyword | Koji","metaDescription":"Find the exact moment a customer said the thing. 
Semantic vector search, keyword search, theme + sentiment + segment filters, and jump-to-quote links across every Koji interview.","keywords":["search interview transcripts","transcript search","semantic search interviews","search across customer interviews","research repository search","user research search"],"aiSummary":"Koji repository search combines semantic vector search (meaning) with keyword search (exact words) across every customer interview transcript in your account. Filter by study, theme, sentiment, quality score, segment, date range, language, or interaction mode. Every match returns a deep link into the source transcript with proper attribution. Search is exposed in the web UI, via REST API (GET /api/v1/search), and through MCP tools so AI assistants (Claude, Cursor, Windsurf) can query the repository. Replaces ad-hoc Notion databases and manual Dovetail tagging — works from the first interview because themes, sentiment, and embeddings are AI-generated at analysis time.","aiPrerequisites":["Koji account with at least one completed interview","Familiarity with research themes and sentiment tagging"],"aiLearningOutcomes":["Run semantic and keyword search across all transcripts","Filter by study, theme, sentiment, quality, segment, and date","Use deep links to jump into source transcripts","Query the search API from code","Drive search from Claude or Cursor via MCP"],"aiDifficulty":"beginner","aiEstimatedTime":"7 min"}],"pagination":{"total":1,"returned":1,"offset":0}}