Category “Capacity Planning”

Capacity Planning: Query Throughput

This is the second article in my series on capacity planning in ElasticSearch. In the last article, we explored indexing throughput as one increases the number of shards and replicas in a cluster.

In this article, we are going to give the same treatment to query throughput.

Index Data and Mapping

For a refresher on cluster layout and hardware specs, check the first article in this series.

We used the same mapping as seen in the first article. The documents held three fields (timestamp and two strings). Each string field was 100 words long and around 500-600 characters each, totaling a doc size of around 1kb. The documents were then shingled at index-time.

The indices were created with the desired number of shards and then pre-filled with 10 million documents, roughly 40GB per node.

Query Throughput

The purpose of this test was to provide a baseline evaluation of a fairly simple query, to see how it would perform across various shard/replica setups. The queries were pre-generated using the same word-list that created the documents, with each query using 1-4 words in a match query. Some examples:

      "field1":"still bread bought say"

      "field1":"score slow music"


And the results from this experiment:

As before, you can safely ignore the actual queries/second value – that is highly dependent on hardware, mapping and document configuration. So what do we see? Well, it appears that as you add shards or replicas, query throughput increases.

But based on the last article, increasing shard count is the primary mechanism to increase indexing throughput…why is it also increasing query throughput?

Zero Replicas

Let’s dive into the Zero Replica situation first. In the [1s 0R] situation, we have 10 million documents living in a single shard. The query rate of roughly 80 queries/second is the base query rate for a single node given my index/doc/mapping configuration.

We are going to take a moment and do some basic math, converting requests-per-second into seconds-per-doc. What we want to determine is the amount of time Elastic spends looking at an individual document.

  • 80 RPS = 12.5ms per request
  • 10 million documents/request * 12.5ms/request = 0.00000125 ms per doc

So that value, 1.25E-6 ms/doc, is just another way to say that our shard does 80 queries/second. But it’s useful because it helps us to understand what happens when we add a second shard.

With the second shard, the index is now split in half: 5 million docs for each shard. When ElasticSearch executes a search request, it doesn’t know where your document lives. It has to broadcast the query to all available shards.

In [2s 0R], a query request is broadcast to both shards (one on each node). The shards maintain their base query rate of 1.25E-6 ms/doc, but now only have 5 million docs to search.

  • 5 million docs/request * 1.25E-6 ms/doc = 6.25 ms/request

As you can see, the 5 million doc shard completes in half the time as 10million docs, but since both shards are chugging away at the same time, you effectively search 10 million docs in half the time as well. By involving another node, you double your query throughput. Cool!

The same logic applies to the [3S 0R] configuration. Base rate of 80 rps, three nodes involved, ~math~, end result of 250 rps.

One Replica

The [1S 1R] configuration is basically identical to the [2S 0R] configuration in terms of query throughput, and the reason should be obvious. In both situations, two machines are involved in the query process, so you get a combined throughput of roughly 150 queries/second.

Sidebar: Admittedly, the situation is slightly different. In [2S 0R], you are broadcasting your search to two shards with 5 million docs each. In [1S 1R], ElasticSearch is picking one shard to query (either the primary or a replica), and both shards contain all 10 million docs.

However, the net result is the same. You are either querying two shards with half the docs all the time, or you are choosing one shard with all the docs…but only choosing that shard half as often).

But how about [2S 1R]? That breaks the trend of nodes * 80 = queries/sec. This one is best understood with a diagram of the physical shard layout in the cluster:


ElasticSearch attempts to balance the shards as evenly as possible. We have four shards to distribute across three nodes which means one node will have to double up.

At search time, our cluster needs to broadcast our query since two shards are involved. But at the same time, the shards are replicated, which means Elastic can choose which set of shards to broadcast to (the primaries or the replicas).

So a query will be broadcast to either [S1 + S2] or [S1 + S3] or [S2 + S3]. This situation involves three nodes in the query process, but node S2 tends to see more traffic than the other two since it has two sets of shards. ElasticSearch round-robins requests and doesn’t take load into account, which means that S2 will tend to be the bottleneck.

This is why the combined query throughput is slightly less than the theoretical “three machine” throughput of around 250 rps.

When the configuration switches to [3S 1R], you are back to an even distribution involving all three machines, so your query performance is back to the theoretical max.

Two replicas (aka fully replicated)

No surprises here: fully replicated setups evenly distribute all the shards to all the nodes. Regardless of how many shards you have, all three machines are involved and you hit that magical 250 rps number.

In theory, the [3S 2R] setup should be the fastest. Each shard only contains 3.33 million records which should increase the query search time slightly, and the setup is fully replicated so there are plenty of broadcasting choices.

However, in this test the results are basically identical across the board. This performance increase may manifest in larger indices, or those with different query constraints, but for this test it was a non-issue.


Once you boil down the query process into the number of participating nodes, query throughput is easily estimated. For the most part, query performance scales linearly – for more throughput, add more nodes. Choosing replica count increases availability as well as query throughput, although similar speeds can be obtained with fewer replicas at the expense of adding more nodes/shards.


These first two articles are, frankly, just the tip of the iceberg. Elastic is so immensely flexible in arrangement that these two tests are only vaguely representative to cluster performance. In the next few articles, we are going to focus on all the various components that can affect throughput and capacity planning. As you’ll see, the numbers can change drastically depending on how your cluster and data is configured.

Like this article?

Capacity Planning: Indexing Throughput

This is the first post in a small series about capacity planning. Of all the questions that surface about ElasticSearch, arguably the most frequent is “How large can ElasticSearch scale?” or “Given this hardware, how many QPS can I hit?

Invariably, the answer is: “It depends“. Ugh.

Unfortunately, it’s true, it really does depend. ElasticSearch provides a remarkable level of flexibility in data organization and query variety. The QPS of a cluster handling small docs with simple analyzers will be vastly different than one handling large docs with complex, multi-field mappings.

All the factors in a cluster (hardware, doc size, doc fields, analyzers, indexing load, query load, query variety, etc etc etc) all contribute heavily to your final yield. However, “It Depends” is an intensely unsatisfying answer even if it is true.

In this series, we are going to look at the various factors that contribute to performance differences in a cluster. The “It Depends” mantra still applies – the results from my cluster will not be the same as from your cluster. But the relationships between values should hold relatively constant between clusters, at least enough to make generalities.

In short: don’t take these numbers to your boss as proof. Rather, use these tests as a framework for performing your own capacity planning and benchmarking.

Cluster Setup

Our cluster consists of four servers with the following specs:

  • Intel Core i7-2600 Quadcore
  • 32 GB RAM
  • 2 x 3 TB SATA, 6 Gb/s 7200 HDD (Software RAID 1)
  • Minimal Debian installation

Cluster setup

Three servers live in the cluster as proper data nodes (S1, S2, S3), while the fourth server (C1) is acting as the benchmarking server. This server runs JMeter and a client node.

The client node doesn’t index any data, it simply routes requests to the appropriate data node. JMeter generates HTTP requests and sends them to the local client node (, and then the client node routes that request to a node using ElasticSearch’s binary protocol on port 9300.

(If a client node isn’t used, JMeter would have to round-robin requests between the various data nodes, and each node would be responsible for re-routing the request to the appropriate node. I wanted to avoid the overhead of this re-routing, so a local client node made the most sense.)


I tried several mappings for this benchmark test and settled on a relatively “heavy” mapping (shingling over moderately sized fields). The reason is simple: my benchmarking server fell over before the cluster on “lighter” mappings. When testing a three node setup, my benchmarking server was having difficulty saturating the cluster before it was saturated itself.

By using a relatively intensive mapping, this guarantees that the benchmark values are accurate and due to latency in ElasticSearch, not artificial latency introduced by JMeter.

The mapping used in this test can be found at this gist.

Documents consisted of a timestamp and two 100-word fields. Each field held around 500-600 characters, so total doc size was roughly 1kb. JMeter cycled through a pre-compiled CSV of field contents, which was was randomly generated ahead of time from a common english word-list. While not entirely “real life”, the fields are less artificial than spewing random characters.

Indexing Throughput

In this test, I wanted to look at the throughput of “realtime indexing”, where one does not have the luxury of using the (much!) more performant bulk API method.

You could envision this happening in situations where indexing may be dynamic and variable, such as indexing tweets. There are plenty of ways to batch new documents into a bulk request, but for the sake of this test, let’s assume this is impossible (we’ll look at bulk indexing in a later article).

Jmeter was configured to pump as many requests through the cluster as possible. Tests were run for an hour. Indexing throughput didn’t noticeably decrease as the test progressed (e.g. performance is not related to index size, at sub 200Gb index sizes), so I simply averaged the QPS into a single value.

Indexing throughput

As we increase the number of shards, the indexing throughput increases. ElasticSearch automatically distributes data evenly across available shards, so it is unsurprising that performance increases as you add more shards (nodes) to the cluster. More nodes == more parallel indexing operations, which translates to higher QPS.

Replicas decrease throughput

It get’s a little more interesting when you add replicas. The [1S, 1R] and [1S, 2R] configurations suffer a slight performance decrease, but not huge. In both configurations, a node still only has one shard to manage. All server resources are dedicated to that single shard, so indexing speed remains relatively constant.

Once you get into configurations where a node holds more than one shard (primary or replica), you start to see some serious performance loss. In these setups, a node is responsible for both indexing primary shard requests as well as replicating docs from other shards elsewhere in the cluster. Unsurprisingly, this crushes cluster QPS.

If you normalize the values, you can see that in most cases adding replicas will drop your indexing throughput by 20-60%. For example, the “fully replicated” [3S, 2R] configuration operates at 40% the throughput of an unreplicated setup.

Normalized indexing throughput

Capacity Planning

You run benchmarks so you can make predictions about situations in a production environment. So what kind of predictions can we draw from this simple test?

Well let’s assume that your product environment has an estimated indexing load of 2000 documents/second and you are unable to batch them (and that they are similar in size/complexity used in this test). You also want one replica, for both query performance and higher availability.

Based on this benchmark, you can see that a two-node cluster with the given hardware specs can accommodate your indexing load…but just barely. A three or four node cluster would be more appropriate. We’ll leave it at that for now, but suffice to say it get’s a lot more complicated when you consider simultaneous querying.


This was a fairly straightforward test, and I don’t think the results are surprising to anyone. More shards == higher indexing throughput. More replicas == less indexing throughput.

It should be noted that we aren’t saying replica’s are bad. Replicas increase query speed and availability. In the next article, we’ll look at query speed in relation to shard/replica count, and it should be fairly obvious why a cluster administrator must choose the shard/replica ratio that is appropriate to their expected load.

Enjoy this article? Leave a comment or share with your friends