The Brute Force Baseline

Exact nearest neighbors: when it works and when it doesn't

Before exploring sophisticated indexing algorithms, we should understand the baseline: brute force search. Compute the similarity between the query and every vector in the database. Return the top k. Simple, exact, and often surprisingly viable.

The Algorithm

Brute force nearest neighbor search is straightforward:

function search(query, vectors, k):
    scores = []
    for each vector in vectors:
        score = similarity(query, vector)
        scores.append((score, vector))
    sort scores descending
    return top k scores

No preprocessing. No index structure. Just compute and compare.
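As a concrete sketch, the pseudocode above translates directly into NumPy (assuming dot-product similarity; the function name and shapes are illustrative):

```python
import numpy as np

def brute_force_search(query, vectors, k):
    # Score every stored vector against the query in one matmul: O(n * d).
    scores = vectors @ query
    # argpartition selects the k largest in O(n); sorting only those k is O(k log k).
    top = np.argpartition(scores, -k)[-k:]
    top = top[np.argsort(scores[top])[::-1]]
    return top, scores[top]

rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 768)).astype(np.float32)
query = rng.standard_normal(768).astype(np.float32)
ids, top_scores = brute_force_search(query, vectors, k=10)
```

Using argpartition instead of a full sort avoids the O(n log n) cost when only the top k of n scores matter.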

Interactive: Brute force complexity

[Interactive calculator. At n = 10,000 vectors and d = 768 dimensions: roughly 7.7M operations per query, estimated latency 0.02 ms, memory 31 MB. Time: O(n × d) = O(10,000 × 768) = O(7,680,000). Space: O(n × d) = 30.7 MB in float32.]

Complexity Analysis

Time complexity: O(n × d) per query

  • n = number of vectors
  • d = embedding dimension

For 1 million vectors of 768 dimensions, that is 768 million floating-point operations per query. With optimized SIMD instructions, a modern CPU can do billions of operations per second. Rough estimate: tens to hundreds of milliseconds per query.

Space complexity: O(n × d)

  • Just storing the vectors
  • No additional index overhead
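These figures fall straight out of n and d; a back-of-envelope helper (illustrative, not from the text):

```python
def brute_force_cost(n, d, bytes_per_value=4):
    # One similarity term per dimension per stored vector.
    ops_per_query = n * d
    # Storage for the raw vectors; float32 is 4 bytes per value.
    memory_mb = n * d * bytes_per_value / 1e6
    return ops_per_query, memory_mb

ops, mb = brute_force_cost(10_000, 768)  # the 10,000-vector example
```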

When Brute Force Works

Brute force is the right choice more often than you might think:

When is brute force viable?

[Interactive calculator. Example inputs: 10,000 vectors, target 10 QPS, latency requirement 100 ms; estimated latency 15 ms. Verdict: viable. Brute force meets requirements: 15 ms latency with 1 CPU for 10 QPS.]

Small collections (< 10K vectors) At 10,000 vectors, brute force takes single-digit milliseconds. Index overhead is not worth it.

High accuracy requirements Approximate algorithms trade accuracy for speed. When 100% recall is mandatory, brute force is the only guarantee.

Infrequent queries If you run one query per minute, 100ms latency is fine. Index building time would never pay off.

Rapid prototyping Building an index requires choosing parameters, tuning, and evaluating trade-offs. Brute force gets you started immediately.

Filtered queries with small result sets If metadata filters reduce candidates to 1,000 vectors, brute force on that subset is fast.
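The filtered case can be sketched as filter-then-score; the metadata layout and field names here are invented for illustration:

```python
import numpy as np

def filtered_search(query, vectors, metadata, predicate, k):
    # Filter first: only ids whose metadata passes the predicate get scored.
    candidate_ids = [i for i, meta in enumerate(metadata) if predicate(meta)]
    scores = vectors[candidate_ids] @ query
    order = np.argsort(scores)[::-1][:k]
    # Map subset positions back to original vector ids.
    return [candidate_ids[i] for i in order]

vectors = np.eye(4, dtype=np.float32)  # toy 4-vector collection
metadata = [{"lang": "en"}, {"lang": "fr"}, {"lang": "en"}, {"lang": "fr"}]
query = np.array([0.0, 1.0, 2.0, 3.0], dtype=np.float32)
hits = filtered_search(query, vectors, metadata, lambda m: m["lang"] == "en", k=1)
```

If the predicate keeps only 1,000 of a million vectors, the scoring loop shrinks by three orders of magnitude.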

When Brute Force Fails

Scaling behavior

[Chart: query latency at scale. Brute force ~154 ms vs. HNSW ~8.3 ms vs. IVF ~2.5 ms.]

Brute force scales O(n); HNSW scales O(log n). The gap widens rapidly as the collection grows.

Large collections (> 100K vectors) At 1 million vectors, queries take hundreds of milliseconds. At 100 million, they take seconds. User experience suffers.

High query throughput 100 queries per second with 100ms each means you need 10 CPUs just for search. Approximate methods are 10-100x faster.

Real-time requirements If P99 latency must be < 20ms, brute force fails at moderate scale.

Cost constraints CPU time costs money. Approximate search on the same hardware handles 10-100x more QPS.

Exact vs Approximate

Exact vs approximate trade-off

[Chart: at 95% recall, 5 of every 100 truly relevant results are missed. Usually acceptable.]

Key insight: The items missed are typically those ranked just outside the top-k. If you need the top 10 and the 11th best item replaces the 10th, the difference in relevance is usually negligible.

The fundamental trade-off:

Approach       Recall    Query Latency   Build Time
Brute force    100%      O(n)            0
HNSW           95-99%    O(log n)        O(n log n)
IVF            90-98%    O(n/k)          O(n)

Approximate algorithms miss some true neighbors. But if the top-10 by exact search includes items ranked 1, 2, 3, 4, 5, 6, 7, 8, 10, 12 by approximate search, does it matter? For most applications, no.
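Recall against the exact top-k is simple to measure; a minimal sketch using the example from this paragraph:

```python
def recall_at_k(exact_ids, approx_ids):
    # Fraction of the true top-k that the approximate result actually contains.
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

# The example from the text: the approximate top-10 holds the items the exact
# search ranks 1-8, 10, and 12, so 9 of the 10 true neighbors are present.
recall = recall_at_k(list(range(1, 11)), [1, 2, 3, 4, 5, 6, 7, 8, 10, 12])
```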

Optimizing Brute Force

If you need brute force at scale, optimization helps:

SIMD vectorization: Process multiple dimensions simultaneously. Libraries like FAISS use AVX-512 for 16x throughput on compatible CPUs.

GPU acceleration: GPUs excel at parallel dot products. A single GPU can brute-force millions of vectors in milliseconds.

Batching: Process multiple queries together. Better cache utilization, amortized overhead.

Quantization: Reduce precision from float32 to int8. 4x less memory, faster computation. Small accuracy loss.

Filtering first: Apply metadata filters before distance computation. Fewer vectors to compare.
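Batching in particular reduces to a single matrix multiply; a minimal sketch:

```python
import numpy as np

def batched_search(queries, vectors, k):
    # (q, d) @ (d, n) -> (q, n): every query/vector score in one matmul,
    # so `vectors` streams through the cache once for the whole batch.
    scores = queries @ vectors.T
    return np.argsort(scores, axis=1)[:, ::-1][:, :k]

vectors = np.eye(3, dtype=np.float32)                     # toy collection
queries = np.array([[1, 0, 0], [0, 0, 1]], dtype=np.float32)
top = batched_search(queries, vectors, k=1)
```

The same total work is done, but a tuned BLAS matmul exploits SIMD and cache reuse far better than one query at a time.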

Brute Force as Reranking

A common pattern: use approximate search to get 100 candidates, then brute force rerank to get the final top 10.

This combines:

  • Speed of approximate search for the initial filter
  • Accuracy of exact scoring for the final ranking

Approximate methods have highest error for items near the decision boundary. Reranking with exact scores recovers precision where it matters most.
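A sketch of the rerank step (the candidate ids would come from any approximate index; here they are simply an input):

```python
import numpy as np

def rerank(query, vectors, candidate_ids, k):
    # Exact scores for the shortlist only, e.g. 100 ANN candidates -> top 10.
    scores = vectors[candidate_ids] @ query
    order = np.argsort(scores)[::-1][:k]
    return [candidate_ids[i] for i in order]

vectors = np.eye(4, dtype=np.float32)  # toy collection
query = np.array([0.0, 1.0, 2.0, 3.0], dtype=np.float32)
final = rerank(query, vectors, candidate_ids=[1, 3, 0], k=2)
```

Exact scoring of 100 candidates is trivially cheap next to scanning the full collection, so the pattern keeps approximate-search speed while fixing its ordering near the top.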

Establishing Your Baseline

Before implementing any indexing, establish a brute force baseline:

  1. Measure latency: How long do queries take?
  2. Measure throughput: How many QPS can you handle?
  3. Define requirements: What latency and recall do you need?
  4. Evaluate need: Is brute force actually insufficient?
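Steps 1 and 2 need nothing more than a timer; a minimal sketch, with sizes chosen arbitrarily:

```python
import time
import numpy as np

def median_query_latency(vectors, queries, k=10):
    # Wall-clock time of single-query brute-force search, median over trials.
    times = []
    for q in queries:
        start = time.perf_counter()
        _ = np.argsort(vectors @ q)[::-1][:k]
        times.append(time.perf_counter() - start)
    return sorted(times)[len(times) // 2]

rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 768)).astype(np.float32)
queries = rng.standard_normal((20, 768)).astype(np.float32)
latency_ms = median_query_latency(vectors, queries) * 1000
```

Throughput on one core is then roughly 1000 / latency_ms queries per second, which you can compare directly against your QPS requirement.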

Many projects implement complex indexing that was never necessary. Start simple. Optimize when measurements demand it.

Key Takeaways

  • Brute force search computes similarity between the query and every vector—O(n × d) per query
  • It is the right choice for small collections, high accuracy needs, or rapid prototyping
  • It fails at scale: 100K+ vectors with latency or throughput requirements
  • Approximate methods trade small recall losses for 10-100x speedups
  • Optimize brute force with SIMD, GPU, batching, or quantization if needed
  • Use brute force as a reranking step after approximate retrieval
  • Always establish a brute force baseline before building complex indices