Tail latencies and percentiles — what are they and why do they matter?

How fast is fast enough?

Average vs. outliers

Importance of tail latencies

  • “At Amazon, every 100 milliseconds of latency causes a 1% decrease in sales. And at Bing, a two-second slowdown was found to reduce revenue per user by 4.3%. At Shopzilla, reducing latency from seven seconds to two seconds increased page views by 25%, and revenue by 7%.” source
  • For every 100 ms of latency, Google estimated a drop in search traffic of ~0.20% source
  • “Keeping tail latencies under control doesn’t just make your users happy, but also significantly improves your service’s resilience while reducing operational costs.” source.
  • “A 99th percentile latency of 30ms means that every 1 in 100 requests experiences 30ms of delay. For a high-traffic website like LinkedIn, this could mean that for a page with 1 million page views per day, 10,000 of those page views experience (noticeable) delay.” source
  • The fastest rate at which humans can process incoming visual information is about 13 ms. Increasing latency above 13 ms has an increasingly negative impact on human performance for a given task source.
  • Data from Akamai shows that a 100-millisecond delay in website load time can hurt conversion rates by 7 percent, a two-second delay in web page load time increases bounce rates by 103 percent, and within ~3 seconds, more than half (53%) will lose patience and leave the page. source
  • At Walmart.com and Staples.com, every 1 second of load-time improvement yields a 2% and a 10% increase in conversion rates, respectively. [source](https://medium.com/@vikigreen/impact-of-slow-page-load-time-on-website-performance-40d5c9ce568a)

Root causes of tail latencies

  • Long-tail queries are difficult to cache: either the memory consumption is prohibitive or the cache hit rate is negligible.
  • Queries with frequent terms (e.g. “The Who”) require long posting lists to be loaded from disk and intersected.
  • High-load peaks with many parallel queries may cause a bottleneck in processor load, IO, and memory bandwidth.
  • Parallel indexing for real-time search may cause a bottleneck in processor load, IO, and memory bandwidth.
  • Garbage collection may cause occasional delays.
  • Commits and compaction may cause occasional delays.
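To make the posting-list cost above concrete, here is a minimal merge-intersection sketch over two sorted lists of document IDs (an illustration, not any particular engine's implementation). For frequent terms, each list can hold millions of entries, so even this linear scan dominates query time:

```python
def intersect_postings(a, b):
    """Merge-intersect two sorted posting lists of doc IDs in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])     # doc contains both terms
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1               # advance the list with the smaller doc ID
        else:
            j += 1
    return out

# Docs matching "the" intersected with docs matching "who":
print(intersect_postings([1, 3, 5, 7], [2, 3, 7, 9]))  # [3, 7]
```

Real engines additionally use skip pointers or galloping search to skip ahead in the longer list, but the fundamental cost of loading and walking long posting lists remains.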

Percentiles, the way tail latencies are measured

  • Arithmetic mean latency: The sum of all latency measurements divided by the number of latency measurements.
  • 50th percentile latency = median latency: The maximum latency for the fastest 50% of all requests. For example, if the 50th percentile latency is 0.5 seconds, then 50% of requests are processed in less than 0.5 seconds. The median is the value separating the higher half from the lower half of a data sample.
  • 75th percentile latency: The maximum latency for the fastest 75% of requests.
  • 95th percentile latency: The maximum latency for the fastest 95% of requests.
  • 99th percentile latency: The maximum latency for the fastest 99% of requests.
  • 99.9th percentile latency: The maximum latency for the fastest 99.9% of requests.
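The definitions above can be sketched in a few lines of Python (a minimal nearest-rank implementation; production monitoring typically uses streaming estimators such as HDR histograms or t-digest instead of sorting raw samples). Note how two slow outliers barely move the median but dominate both the mean and the 99th percentile:

```python
import math
import statistics

def percentile(latencies_ms, p):
    """p-th percentile by the nearest-rank method:
    the latency within which the fastest p% of all requests complete."""
    s = sorted(latencies_ms)
    rank = math.ceil(p / 100 * len(s))  # 1-based nearest-rank index
    return s[max(rank, 1) - 1]

# Ten request latencies in ms; two slow outliers form the "tail".
samples = [12, 15, 14, 13, 250, 16, 14, 13, 15, 900]

print(statistics.mean(samples))  # 126.2 -- the mean is skewed by the outliers
print(percentile(samples, 50))   # 14    -- half of all requests finish within 14 ms
print(percentile(samples, 99))   # 900   -- the slowest 1% dominate p99
```

This is why tail latencies are reported as percentiles rather than averages: the arithmetic mean hides whether slowness comes from uniformly slow requests or from a few pathological ones.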

Requirements for low tail latencies under load

  • Buffer reuse instead of per-request allocation
  • Full processor utilization by saturating all cores
  • Efficient locking or a lock-free architecture decides whether 100% core utilization can actually be achieved
  • Scalable memory consumption: otherwise the service crashes under load
  • Efficient use of memory bandwidth: otherwise it becomes a bottleneck under load
  • Stability, throughput, and tail latencies under load
  • Maximum sustainable load
  • Support for multiple indexes and multi-tenancy
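The first point, buffer reuse, can be illustrated with a hypothetical object pool (a sketch, not SeekStorm's implementation): instead of allocating a fresh buffer per request, buffers are recycled, which reduces allocation pressure and the garbage-collection pauses that show up as tail-latency spikes:

```python
from collections import deque

class BufferPool:
    """Recycle fixed-size buffers across requests instead of allocating
    per request, reducing allocation and GC-induced latency spikes."""

    def __init__(self, buffer_size: int, capacity: int):
        self._buffer_size = buffer_size
        self._free = deque(bytearray(buffer_size) for _ in range(capacity))

    def acquire(self) -> bytearray:
        # Fall back to a fresh allocation only when the pool is exhausted.
        return self._free.popleft() if self._free else bytearray(self._buffer_size)

    def release(self, buf: bytearray) -> None:
        self._free.append(buf)

pool = BufferPool(buffer_size=4096, capacity=8)
buf = pool.acquire()
# ... fill buf with posting-list data, serve the request ...
pool.release(buf)  # returned to the pool for the next request
```

The same idea applies in GC-free languages too: reusing buffers keeps the allocator out of the hot path and memory consumption bounded under load.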

Founder SeekStorm (Search-as-a-Service), FAROO (P2P Search) https://seekstorm.com https://github.com/wolfgarbe https://www.quora.com/profile/Wolf-Garbe
