Query Understanding

Query expansion, rewriting, and the asymmetry between queries and documents

Queries and documents are fundamentally different. Queries are short, informal, sometimes misspelled. Documents are long, detailed, carefully written. Bridging this gap—understanding what users mean from what they type—is essential for effective retrieval.

The Asymmetry Problem

Consider a query and the document it should match:

Query: "how to make code faster" (5 words, informal, phrased as a question)

Matching document: "Performance optimization involves identifying bottlenecks through profiling, implementing caching strategies, reducing algorithmic complexity, and minimizing I/O operations. Common techniques include memoization, lazy evaluation, and parallel processing." (26 words, formal, technical)

The gap: the query uses casual language ("how to make...") while the document uses technical terms. Different vocabulary, different style, yet they should match.

Queries are:

  • Short (3-7 words typically)
  • Questions or fragments
  • Informal, conversational
  • May use different vocabulary than answers

Documents are:

  • Long (100s to 1000s of words)
  • Declarative statements
  • Formal, complete
  • Use technical vocabulary

Embedding a query therefore produces a different kind of vector than embedding a document: the two can land in different regions of the semantic space even when the document answers the query.

Query Expansion

Add related terms to the query to improve recall.

For example, expanding the query "fast code":

  • fast code (original)
  • quick code
  • rapid execution
  • performance optimization
  • speed improvement

Expansion adds four related terms to improve recall.

Synonym expansion: Add synonyms of query terms. "fast" → "fast, quick, rapid"

Concept expansion: Add related concepts. "machine learning" → "machine learning, neural networks, deep learning"

Entity expansion: Expand abbreviations and aliases. "NYC" → "NYC, New York City, New York"
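A minimal sketch of dictionary-based expansion, assuming a hand-built thesaurus (the mapping below is illustrative; production systems use WordNet, domain ontologies, or embedding nearest neighbors):

```python
# Toy thesaurus mapping query terms to synonyms, related concepts,
# and entity aliases. Purely illustrative.
THESAURUS = {
    "fast": ["quick", "rapid"],
    "code": ["program", "software"],
    "nyc": ["new york city", "new york"],
}

def expand_query(query: str) -> list[str]:
    """Return the original terms plus any thesaurus expansions."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for alt in THESAURUS.get(term, []):
            if alt not in expanded:
                expanded.append(alt)
    return expanded

print(expand_query("fast code"))
# ['fast', 'code', 'quick', 'rapid', 'program', 'software']
```

The same lookup table handles synonyms, concepts, and entity aliases; the differences lie in how the table is built, not how it is applied.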

Pseudo-relevance feedback: Retrieve initial results, extract terms from top documents, add to query, re-retrieve.
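The feedback loop can be sketched with simple term counting; the initial retrieval step is assumed to have already produced a ranked list of documents:

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}

def prf_expand(query: str, ranked_docs: list[str],
               top_k: int = 2, n_terms: int = 3) -> str:
    """Pseudo-relevance feedback: assume the top_k retrieved docs are
    relevant, pull their most frequent non-stopword terms, and append
    any terms the query does not already contain."""
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        counts.update(w for w in doc.lower().split() if w not in STOPWORDS)
    query_terms = set(query.lower().split())
    new_terms = [t for t, _ in counts.most_common() if t not in query_terms]
    return query + " " + " ".join(new_terms[:n_terms])

ranked = [
    "profiling finds the slow parts profiling helps",
    "caching avoids repeated work caching is cheap",
    "unrelated gardening tips",
]
print(prf_expand("make code fast", ranked))
# make code fast profiling caching finds
```

Real systems weight feedback terms (e.g., Rocchio) rather than counting them raw, but the shape of the loop is the same: retrieve, extract, append, re-retrieve.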

Expansion improves recall but may hurt precision. Balance carefully.

Hypothetical Document Embeddings (HyDE)

A clever technique: instead of embedding the query directly, ask an LLM to generate a hypothetical document that would answer the query, then embed that.

The process:

  1. User asks: "What causes rain?"
  2. LLM generates: "Rain is caused by the condensation of water vapor in the atmosphere. When warm, moist air rises and cools, the water vapor condenses into droplets that form clouds. When droplets become heavy enough, they fall as precipitation."
  3. Embed the generated text (not the query)
  4. Search for similar documents

Why it works: The hypothetical document is in the same "style" as real documents. Its embedding is closer in the semantic space to actual answers than a short query embedding.

Trade-offs:

  • Requires LLM call (cost, latency)
  • LLM may hallucinate incorrect hypothetical
  • Works best when LLM has relevant knowledge
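A structural sketch of HyDE, where `generate_hypothetical` stands in for a real LLM call and a bag-of-words vector stands in for a real embedding model (both are placeholders, not real APIs):

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words term counts."""
    return Counter(text.lower().replace(".", "").replace(",", "").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def generate_hypothetical(query: str) -> str:
    # Placeholder for an LLM call, e.g.:
    #   llm.complete(f"Write a short passage answering: {query}")
    return ("Rain is caused by the condensation of water vapor in the "
            "atmosphere. Warm moist air rises and cools, and the vapor "
            "condenses into droplets that fall as precipitation.")

def hyde_search(query: str, docs: list[str]) -> str:
    """Rank documents against the hypothetical answer, not the raw query."""
    hypo_vec = embed(generate_hypothetical(query))
    return max(docs, key=lambda d: cosine(hypo_vec, embed(d)))

docs = [
    "Precipitation forms when water vapor condenses into droplets in clouds.",
    "A rain check is a promise to fulfill an offer later.",
]
print(hyde_search("What causes rain?", docs))
```

Even with this crude vector, the hypothetical answer shares far more vocabulary with the meteorology document than the three-word query does, which is the effect HyDE exploits.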

Query Rewriting

Transform the query to better match document vocabulary.

For example, the query "what's wrong with my wifi" can be rewritten several ways:

  • Technical: "wireless network connectivity troubleshooting"
  • Spelling fixed: "what is wrong with my Wi-Fi"
  • Statement: "WiFi connection problems and solutions"

Multiple rewrites can be used for multi-query retrieval.

Conversational to keyword: "What's the best way to make my code run faster?" → "code performance optimization techniques"

Question to statement: "How do neural networks learn?" → "Neural networks learn through..."

Disambiguation: "Apple stock" → "Apple Inc. stock price" (not apple fruit)

Spelling correction: "recieving error mesage" → "receiving error message"
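Spelling correction can be sketched with the standard library's fuzzy matcher, assuming a vocabulary drawn from the indexed corpus (the word list below is illustrative):

```python
import difflib

# Vocabulary drawn from the indexed corpus; a few words for illustration.
VOCAB = ["receiving", "error", "message", "network", "connection"]

def correct_spelling(query: str) -> str:
    """Replace each out-of-vocabulary term with its closest vocabulary
    word, leaving terms alone when nothing is close enough."""
    fixed = []
    for word in query.lower().split():
        matches = difflib.get_close_matches(word, VOCAB, n=1, cutoff=0.75)
        fixed.append(matches[0] if matches else word)
    return " ".join(fixed)

print(correct_spelling("recieving error mesage"))
# receiving error message
```

The cutoff matters: too low and correct rare words get "fixed" into something else; too high and genuine typos slip through.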

Modern approaches use LLMs for rewriting, with a prompt like:

    Given the query: "{user_query}"
    Rewrite it to better match technical documentation.
    Output only the rewritten query.

Multi-Query Retrieval

Generate multiple query variations and retrieve for each:

  1. Original: "best practices for API design"
  2. Variation 1: "REST API design guidelines"
  3. Variation 2: "how to design good APIs"
  4. Variation 3: "API architecture patterns"

Merge results using reciprocal rank fusion or union with deduplication.
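The merge step can be sketched as reciprocal rank fusion, where each document earns 1/(k + rank) from every result list it appears in:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked result lists: each document scores
    1 / (k + rank) per list it appears in; highest total wins.
    k = 60 is the common default from the original RRF paper."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

runs = [
    ["doc_a", "doc_b", "doc_c"],   # "best practices for API design"
    ["doc_b", "doc_a", "doc_d"],   # "REST API design guidelines"
    ["doc_b", "doc_c", "doc_a"],   # "how to design good APIs"
]
print(reciprocal_rank_fusion(runs))
# ['doc_b', 'doc_a', 'doc_c', 'doc_d']
```

doc_b wins because it ranks highly in every list; RRF rewards this consistency without needing comparable scores across the individual retrievals.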

Benefits:

  • Different phrasings match different documents
  • Improves recall significantly
  • Robust to query ambiguity

Costs:

  • Multiple embedding computations
  • Multiple retrieval operations
  • Result merging complexity

Query Intent Classification

Not all queries need the same treatment:

Navigational: User wants a specific page. "OpenAI documentation" → direct link.

Informational: User wants to learn. "how do transformers work" → educational content.

Transactional: User wants to do something. "sign up for API access" → action pages.

Classifying intent enables routing to appropriate retrieval strategies.
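A toy rule-based classifier for the three intents above (real systems typically train a classifier rather than matching keywords):

```python
def classify_intent(query: str) -> str:
    """Toy rule-based intent classifier. Keyword rules are illustrative;
    production systems use a trained model."""
    q = query.lower()
    if any(w in q for w in ("sign up", "buy", "download", "subscribe", "register")):
        return "transactional"
    if q.startswith(("how", "what", "why", "when", "who")) or "?" in q:
        return "informational"
    return "navigational"

print(classify_intent("sign up for API access"))    # transactional
print(classify_intent("how do transformers work"))  # informational
print(classify_intent("OpenAI documentation"))      # navigational
```

The labels can then drive routing: navigational queries go to exact-match lookup, informational ones to semantic retrieval, transactional ones to action pages.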

Practical Query Processing Pipeline

A production system might:

  1. Normalize: Lowercase, remove punctuation, fix encoding
  2. Spell correct: Fix typos
  3. Classify intent: Route appropriately
  4. Expand: Add synonyms and related terms
  5. Rewrite (optional): Convert to document style
  6. Generate variations: Create multiple query forms
  7. Embed: Produce query vector(s)
  8. Retrieve: Search across variations
  9. Merge: Combine and deduplicate results

Not every query needs every step. Balance thoroughness with latency.
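The pipeline's shape can be sketched end to end; each step below is a deliberately simplified stand-in for a real component (spell checker, intent model, expansion service, embedding model):

```python
# Toy synonym table standing in for a real expansion service.
SYNONYMS = {"fast": ["quick"], "wifi": ["wireless network"]}

def normalize(q: str) -> str:
    """Step 1: lowercase and collapse whitespace."""
    return " ".join(q.lower().split())

def classify_intent(q: str) -> str:
    """Step 3 (simplified): question words signal informational intent."""
    return "informational" if q.split()[0] in {"how", "what", "why"} else "other"

def expand(q: str) -> list[str]:
    """Steps 4-6 (simplified): generate query variations via synonyms."""
    variants = [q]
    for term, alts in SYNONYMS.items():
        if term in q:
            variants += [q.replace(term, alt) for alt in alts]
    return variants

def process(query: str) -> dict:
    """Chain the steps; embedding and retrieval would consume the output."""
    q = normalize(query)
    return {"intent": classify_intent(q), "variants": expand(q)}

print(process("How to make WiFi fast"))
```

The output (an intent label plus a list of query variants) is exactly what the embed, retrieve, and merge stages need as input.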

Key Takeaways

  • Queries and documents differ fundamentally in length, style, and vocabulary—this asymmetry hurts naive embedding
  • Query expansion adds related terms to improve recall, at some cost to precision
  • HyDE generates a hypothetical answer document, then searches with that embedding—closer to document space
  • Query rewriting transforms informal questions into document-style text
  • Multi-query retrieval generates variations and merges results for robust recall
  • Intent classification enables routing to appropriate retrieval strategies
  • A full pipeline normalizes, corrects, expands, rewrites, and generates variations before embedding