The Brute Force Baseline

Exact nearest neighbors: when it works and when it doesn't

Before exploring sophisticated indexing algorithms, we should understand the baseline: brute force search. Compute the similarity between the query and every vector in the database. Return the top k. Simple, exact, and often surprisingly viable.

The Algorithm

Brute force nearest neighbor search is straightforward:

function search(query, vectors, k):
    scores = []
    for each vector in vectors:
        score = similarity(query, vector)
        scores.append((score, vector))
    sort scores descending
    return top k scores

No preprocessing. No index structure. Just compute and compare.
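As a concrete sketch, the pseudocode above translates directly into NumPy (assuming dot-product similarity; the function name and shapes are illustrative):

```python
import numpy as np

def brute_force_search(query, vectors, k):
    # Score every stored vector against the query in one matmul: O(n * d).
    scores = vectors @ query
    # argpartition selects the k largest in O(n); sorting only those k is O(k log k).
    top = np.argpartition(scores, -k)[-k:]
    top = top[np.argsort(scores[top])[::-1]]
    return top, scores[top]

rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 768)).astype(np.float32)
query = rng.standard_normal(768).astype(np.float32)
ids, top_scores = brute_force_search(query, vectors, k=10)
```

Using argpartition instead of a full sort avoids the O(n log n) cost when only the top k of n scores matter.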

Interactive: Brute force complexity

[Interactive calculator. At n = 10,000 vectors and d = 768 dimensions: roughly 7.7M operations per query, estimated latency 0.02 ms, memory 31 MB. Time: O(n × d) = O(10,000 × 768) = O(7,680,000). Space: O(n × d) = 30.7 MB in float32.]

Complexity Analysis

Time complexity: O(n × d) per query

  • n = number of vectors
  • d = embedding dimension

For 1 million vectors of 768 dimensions, that is 768 million floating-point operations per query. With optimized SIMD instructions, a modern CPU can do billions of operations per second. Rough estimate: tens to hundreds of milliseconds per query.

Space complexity: O(n × d)

  • Just storing the vectors
  • No additional index overhead
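These figures fall straight out of n and d; a back-of-envelope helper (illustrative, not from the text):

```python
def brute_force_cost(n, d, bytes_per_value=4):
    # One similarity term per dimension per stored vector.
    ops_per_query = n * d
    # Storage for the raw vectors; float32 is 4 bytes per value.
    memory_mb = n * d * bytes_per_value / 1e6
    return ops_per_query, memory_mb

ops, mb = brute_force_cost(10_000, 768)  # the 10,000-vector example
```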

When Brute Force Works

Brute force is the right choice more often than you might think:

When is brute force viable?

[Interactive calculator. Example inputs: 10,000 vectors, target 10 QPS, latency requirement 100 ms; estimated latency 15 ms. Verdict: viable. Brute force meets requirements: 15 ms latency with 1 CPU for 10 QPS.]

Small collections (< 10K vectors) At 10,000 vectors, brute force takes single-digit milliseconds. Index overhead is not worth it.

High accuracy requirements Approximate algorithms trade accuracy for speed. When 100% recall is mandatory, brute force is the only guarantee.

Infrequent queries If you run one query per minute, 100ms latency is fine. Index building time would never pay off.

Rapid prototyping Building an index requires choosing parameters, tuning, and evaluating trade-offs. Brute force gets you started immediately.

Filtered queries with small result sets If metadata filters reduce candidates to 1,000 vectors, brute force on that subset is fast.
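The filtered case can be sketched as filter-then-score; the metadata layout and field names here are invented for illustration:

```python
import numpy as np

def filtered_search(query, vectors, metadata, predicate, k):
    # Filter first: only ids whose metadata passes the predicate get scored.
    candidate_ids = [i for i, meta in enumerate(metadata) if predicate(meta)]
    scores = vectors[candidate_ids] @ query
    order = np.argsort(scores)[::-1][:k]
    # Map subset positions back to original vector ids.
    return [candidate_ids[i] for i in order]

vectors = np.eye(4, dtype=np.float32)  # toy 4-vector collection
metadata = [{"lang": "en"}, {"lang": "fr"}, {"lang": "en"}, {"lang": "fr"}]
query = np.array([0.0, 1.0, 2.0, 3.0], dtype=np.float32)
hits = filtered_search(query, vectors, metadata, lambda m: m["lang"] == "en", k=1)
```

If the predicate keeps only 1,000 of a million vectors, the scoring loop shrinks by three orders of magnitude.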

When Brute Force Fails

Scaling behavior

[Chart: query latency at scale. Brute force ~154 ms vs. HNSW ~8.3 ms vs. IVF ~2.5 ms.]

Brute force scales O(n); HNSW scales O(log n). The gap widens rapidly as the collection grows.

Large collections (> 100K vectors) At 1 million vectors, queries take hundreds of milliseconds. At 100 million, they take seconds. User experience suffers.

High query throughput 100 queries per second with 100ms each means you need 10 CPUs just for search. Approximate methods are 10-100x faster.

Real-time requirements If P99 latency must be < 20ms, brute force fails at moderate scale.

Cost constraints CPU time costs money. Approximate search on the same hardware handles 10-100x more QPS.

Exact vs Approximate

Exact vs approximate trade-off

[Chart: at 95% recall, 5 of every 100 truly relevant results are missed. Usually acceptable.]

Key insight: The items missed are typically those ranked just outside the top-k. If you need the top 10 and the 11th best item replaces the 10th, the difference in relevance is usually negligible.

The fundamental trade-off:

Approach       Recall    Query Latency   Build Time
Brute force    100%      O(n)            0
HNSW           95-99%    O(log n)        O(n log n)
IVF            90-98%    O(n/k)          O(n)

Approximate algorithms miss some true neighbors. But if the top-10 by exact search includes items ranked 1, 2, 3, 4, 5, 6, 7, 8, 10, 12 by approximate search, does it matter? For most applications, no.
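Recall against the exact top-k is simple to measure; a minimal sketch using the example from this paragraph:

```python
def recall_at_k(exact_ids, approx_ids):
    # Fraction of the true top-k that the approximate result actually contains.
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

# The example from the text: the approximate top-10 holds the items the exact
# search ranks 1-8, 10, and 12, so 9 of the 10 true neighbors are present.
recall = recall_at_k(list(range(1, 11)), [1, 2, 3, 4, 5, 6, 7, 8, 10, 12])
```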

Optimizing Brute Force

If you need brute force at scale, optimization helps:

SIMD vectorization: Process multiple dimensions simultaneously. Libraries like FAISS use AVX-512 for 16x throughput on compatible CPUs.

GPU acceleration: GPUs excel at parallel dot products. A single GPU can brute-force millions of vectors in milliseconds.

Batching: Process multiple queries together. Better cache utilization, amortized overhead.

Quantization: Reduce precision from float32 to int8. 4x less memory, faster computation. Small accuracy loss.

Filtering first: Apply metadata filters before distance computation. Fewer vectors to compare.
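Batching in particular reduces to a single matrix multiply; a minimal sketch:

```python
import numpy as np

def batched_search(queries, vectors, k):
    # (q, d) @ (d, n) -> (q, n): every query/vector score in one matmul,
    # so `vectors` streams through the cache once for the whole batch.
    scores = queries @ vectors.T
    return np.argsort(scores, axis=1)[:, ::-1][:, :k]

vectors = np.eye(3, dtype=np.float32)                     # toy collection
queries = np.array([[1, 0, 0], [0, 0, 1]], dtype=np.float32)
top = batched_search(queries, vectors, k=1)
```

The same total work is done, but a tuned BLAS matmul exploits SIMD and cache reuse far better than one query at a time.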

Brute Force as Reranking

A common pattern: use approximate search to get 100 candidates, then brute force rerank to get the final top 10.

This combines:

  • Speed of approximate search for the initial filter
  • Accuracy of exact scoring for the final ranking

Approximate methods have highest error for items near the decision boundary. Reranking with exact scores recovers precision where it matters most.
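A sketch of the rerank step (the candidate ids would come from any approximate index; here they are simply an input):

```python
import numpy as np

def rerank(query, vectors, candidate_ids, k):
    # Exact scores for the shortlist only, e.g. 100 ANN candidates -> top 10.
    scores = vectors[candidate_ids] @ query
    order = np.argsort(scores)[::-1][:k]
    return [candidate_ids[i] for i in order]

vectors = np.eye(4, dtype=np.float32)  # toy collection
query = np.array([0.0, 1.0, 2.0, 3.0], dtype=np.float32)
final = rerank(query, vectors, candidate_ids=[1, 3, 0], k=2)
```

Exact scoring of 100 candidates is trivially cheap next to scanning the full collection, so the pattern keeps approximate-search speed while fixing its ordering near the top.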

Establishing Your Baseline

Before implementing any indexing, establish a brute force baseline:

  1. Measure latency: How long do queries take?
  2. Measure throughput: How many QPS can you handle?
  3. Define requirements: What latency and recall do you need?
  4. Evaluate need: Is brute force actually insufficient?
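Steps 1 and 2 need nothing more than a timer; a minimal sketch, with sizes chosen arbitrarily:

```python
import time
import numpy as np

def median_query_latency(vectors, queries, k=10):
    # Wall-clock time of single-query brute-force search, median over trials.
    times = []
    for q in queries:
        start = time.perf_counter()
        _ = np.argsort(vectors @ q)[::-1][:k]
        times.append(time.perf_counter() - start)
    return sorted(times)[len(times) // 2]

rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 768)).astype(np.float32)
queries = rng.standard_normal((20, 768)).astype(np.float32)
latency_ms = median_query_latency(vectors, queries) * 1000
```

Throughput on one core is then roughly 1000 / latency_ms queries per second, which you can compare directly against your QPS requirement.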

Many projects implement complex indexing that was never necessary. Start simple. Optimize when measurements demand it.

Key Takeaways

  • Brute force search computes similarity between the query and every vector—O(n × d) per query
  • It is the right choice for small collections, high accuracy needs, or rapid prototyping
  • It fails at scale: 100K+ vectors with latency or throughput requirements
  • Approximate methods trade small recall losses for 10-100x speedups
  • Optimize brute force with SIMD, GPU, batching, or quantization if needed
  • Use brute force as a reranking step after approximate retrieval
  • Always establish a brute force baseline before building complex indices