Dot Product and Cosine Similarity

Two sides of the same coin: when magnitude matters and when it doesn't

Finding similar vectors requires measuring similarity. Two operations dominate: the dot product and cosine similarity. They are closely related, yet subtly different. Understanding when to use each is essential for effective retrieval.

The Dot Product

The dot product of two vectors sums the element-wise products:

\vec{a} \cdot \vec{b} = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \ldots + a_n b_n

Geometrically, it measures how much two vectors "agree" in each dimension.
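As a sketch in pure Python (the `dot` helper is our own, not from any library):

```python
def dot(a, b):
    # Sum of element-wise products: a1*b1 + a2*b2 + ... + an*bn
    return sum(x * y for x, y in zip(a, b))

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

print(dot(a, b))  # 1*4 + 2*5 + 3*6 = 32.0
```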

Interactive: Explore the dot product

a · b = |a| × |b| × cos(θ). The dot product depends on both magnitudes and the angle.

The dot product has a geometric interpretation:

\vec{a} \cdot \vec{b} = |\vec{a}| \cdot |\vec{b}| \cdot \cos(\theta)

where θ is the angle between the vectors. This formula reveals that the dot product depends on three things: the magnitude of a, the magnitude of b, and the angle between them.

This is both useful and problematic. Longer vectors naturally produce larger dot products, regardless of direction.
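A quick sketch of the problem: doubling a vector doubles its dot product with anything, even though its direction is unchanged (pure Python, helper names are ours):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = [1.0, 2.0]
b = [3.0, 4.0]
a_doubled = [2 * x for x in a]  # same direction, twice the magnitude

print(dot(a, b))          # 11.0
print(dot(a_doubled, b))  # 22.0 -- the score doubled, the direction did not change
```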

Cosine Similarity

Cosine similarity isolates the directional component by normalizing:

\cos(\theta) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}| \cdot |\vec{b}|}

The result ranges from -1 (opposite directions) through 0 (perpendicular) to +1 (same direction).

Interactive: Cosine similarity

By dividing out the magnitudes, cosine similarity measures only how aligned two vectors are. A short vector pointing in the same direction as a long vector has cosine similarity 1.

This is usually what we want for semantic search: two texts should be similar if they point in the same semantic direction, regardless of how "much" of that direction they express.
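The scale invariance can be checked directly (a pure-Python sketch; the helper names are ours):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    # Dot product divided by the product of the magnitudes
    return dot(a, b) / (norm(a) * norm(b))

a = [1.0, 2.0]
a_scaled = [10.0, 20.0]  # same direction, 10x the magnitude
b = [2.0, 1.0]

print(cosine_similarity(a, b))
print(cosine_similarity(a_scaled, b))  # identical: magnitude is divided out
```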

When Magnitude Matters

Sometimes magnitude carries meaning. Consider two document embeddings: a long, detailed article about machine learning and a short tweet about machine learning. Both point in similar directions—they are "about" the same topic. But the long document has more content, more information, potentially more relevance to a detailed query.

If embeddings encode magnitude as "amount of information," dot product preserves this signal. Cosine similarity discards it.

Normalization effects

Document         Dot Product   Cosine Sim
Doc A (long)     5.63          1.000
Doc B (short)    2.25          1.000
Doc C (medium)   3.75          1.000

All vectors point in similar directions (same cosine). But dot products differ due to magnitude.

In practice, most embedding models output normalized vectors (length 1). This makes dot product and cosine similarity equivalent:

\text{If } |\vec{a}| = |\vec{b}| = 1, \text{ then } \vec{a} \cdot \vec{b} = \cos(\theta)
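This identity is easy to check numerically (a sketch; `normalize` is our own helper):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    # Scale to unit length
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

a = normalize([3.0, 4.0])  # becomes [0.6, 0.8], length 1
b = normalize([1.0, 7.0])

cos_theta = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
print(dot(a, b), cos_theta)  # equal: for unit vectors, the dot product is cos(theta)
```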

Computing Similarity at Scale

The choice of metric affects more than semantics—it affects computation.

Dot product is faster. It is a single SIMD-friendly operation: multiply and add. Vector databases optimize heavily for this.

Cosine similarity requires computing norms and dividing. If vectors are pre-normalized, this cost is paid once at indexing time, not at query time.

Best practice: Normalize vectors when inserting into the database. Then use dot product for search, getting cosine similarity semantics with dot product speed.
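A minimal sketch of that workflow, with a plain Python list standing in for the vector database (all names are ours):

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

index = []  # stand-in for a vector database

def insert(vec):
    # Pay the normalization cost once, at indexing time
    index.append(normalize(vec))

def search(query, top_k=2):
    q = normalize(query)
    # A plain dot product at query time now yields cosine similarity
    scored = [(sum(a * b for a, b in zip(q, v)), i) for i, v in enumerate(index)]
    scored.sort(reverse=True)
    return scored[:top_k]

insert([3.0, 4.0])
insert([1.0, 0.0])
insert([-2.0, 1.0])
print(search([6.0, 8.0]))  # the first stored vector ranks highest (same direction as the query)
```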

Inner Product vs Cosine in Vector Databases

Vector databases typically offer multiple metrics. The "cosine" metric normalizes vectors then computes the dot product. The "dot product" or "inner product" metric uses the raw dot product without normalization. The "euclidean" or "L2" metric measures distance rather than similarity.

Comparing similarity metrics

vs Query   Dot Product   Cosine   Euclidean
Doc 1       0.990         0.991   0.132
Doc 2       0.360         0.356   1.140
Doc 3      -0.860        -0.922   1.895

Dot Product Rank

  1. Doc 1
  2. Doc 2
  3. Doc 3

Cosine Rank

  1. Doc 1
  2. Doc 2
  3. Doc 3

Euclidean Rank

  1. Doc 1
  2. Doc 2
  3. Doc 3

Different metrics can produce different rankings, though in this example all three happen to agree. For normalized vectors, cosine and Euclidean always give the same order.

For retrieval with normalized embeddings, use "cosine" if the database normalizes for you, or "dot product" if you pre-normalize yourself; the latter is faster but gives the same results. For retrieval with unnormalized embeddings, "cosine" handles varying magnitudes gracefully, while "dot product" may give unexpected results if magnitudes vary widely.

Similarity vs Distance

Cosine similarity and dot product measure similarity—higher is more similar.

Some algorithms require distance, where lower means more similar. Cosine distance is simply 1 − cos(θ), ranging from 0 for identical vectors to 2 for opposite vectors. Angular distance is arccos(cos(θ)) / π, ranging from 0 to 1; it is more intuitive but slower to compute due to the arc cosine.
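Both distances are one-liners on top of a cosine similarity value (a sketch; the clamp guards against floating-point values drifting just outside [-1, 1]):

```python
import math

def cosine_distance(cos_theta):
    # 0 for identical direction, 2 for opposite direction
    return 1.0 - cos_theta

def angular_distance(cos_theta):
    # 0 to 1; clamp protects acos from floating-point drift outside [-1, 1]
    return math.acos(max(-1.0, min(1.0, cos_theta))) / math.pi

print(cosine_distance(1.0), angular_distance(1.0))    # 0.0 0.0
print(cosine_distance(-1.0), angular_distance(-1.0))  # 2.0 1.0
```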

For normalized vectors, Euclidean distance relates directly to cosine similarity:

|\vec{a} - \vec{b}| = \sqrt{2 - 2\cos(\theta)}

This means Euclidean distance is monotonically related to cosine similarity for unit vectors. Ranking by either gives the same order.
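A numerical check of this identity (pure-Python sketch with arbitrary vectors):

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a = normalize([1.0, 3.0])
b = normalize([4.0, 1.0])

cos_theta = sum(x * y for x, y in zip(a, b))
euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean, math.sqrt(2 - 2 * cos_theta))  # identical for unit vectors
```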

Practical Recommendations

Start with normalized embeddings. Most models output them by default; check that yours does. With normalized vectors, use dot product search—it gives cosine semantics at lower computational cost.

Understand your specific model. Some models like ColBERT use non-normalized vectors intentionally to encode additional information in magnitude. Check the documentation rather than assuming.

Think carefully about thresholds. A cosine similarity of 0.8 means different things for different models. One model might rarely exceed 0.7 for any pair; another might cluster similar documents around 0.95. Calibrate empirically on real data, not intuitively.

Consider the distribution of values. Some embedding spaces use the full [-1, 1] range, including negative similarities. Others cluster all vectors in a narrow band like [0.3, 0.9]. Knowing your embedding distribution helps set meaningful thresholds and detect anomalies.
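One way to inspect the distribution empirically, assuming you have sampled similarity scores from your own model on real query/document pairs (the scores below are made up for illustration):

```python
import statistics

# Hypothetical similarity scores sampled from real query/document pairs
scores = [0.31, 0.42, 0.55, 0.61, 0.64, 0.70, 0.72, 0.74, 0.81, 0.88]

# Inspect the distribution before picking a threshold
print("min:", min(scores), "max:", max(scores))
print("median:", statistics.median(scores))

# With n=10, statistics.quantiles returns 9 cut points; the last is the 90th percentile
deciles = statistics.quantiles(scores, n=10)
print("90th percentile:", deciles[-1])
```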

Key Takeaways

  • Dot product sums element-wise products; it depends on both direction and magnitude
  • Cosine similarity normalizes out magnitude, measuring only directional alignment
  • For normalized vectors, dot product equals cosine similarity
  • Pre-normalize vectors and use dot product for the best of both: cosine semantics, maximum speed
  • Vector databases offer multiple metrics; choose based on whether your vectors are normalized
  • Similarity thresholds are model-specific; calibrate on real data