When you run your own Elasticsearch instances

So, Meltwater decided to run their own ES setup (on AWS, but still, their own instances).
Why?
In their own words: “AWS Elasticsearch Service allows us too little control, and Elastic Cloud would cost us 2–3 times more than running directly on EC2.”
Go read the entire post; it is well worth it (especially the bit about sharding!). That said, the following performance tips are worth calling out:
  1. Limit searches to relevant data. Obvious, but lords, it’s amazing how much this one gets ignored.
  2. Invest in Observability. Do you know where your resources are going? GC stats, io-wait, memory hogs, CPU usage, etc. etc. In particular, get really close and intimate with JVM performance tuning.
  3. Memory, not GC. Unless you’re really good, or have a real edge case, odds are that you’re not going to do much with the GC subsystem. Instead, focus on reducing memory allocation.
  4. Azul Zing for Memory. Basically, outsourcing your JVM management. It’s an expensive product, but we saw a 2x speedup in throughput just by using their JVM. We ended up not using it, though, since we could not justify the cost.
  5. Manage your Shards. Learn all about shard allocation filtering, or do it yourself using cluster rerouting.
  6. Profilers. Use one on the ES process. Flight Recorder (through Java Mission Control) and VisualVM are good bets. Use these, and your newfound JVM knowledge, to figure out what’s going on and where you can optimize by changing your requests. If you prefer to go down the “change the code” road, be aware that whatever you change will be heavily geared towards your current usage patterns…
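Tips 1 and 5 map pretty directly onto request bodies, so here is a rough sketch of what they can look like. This is my illustration, not Meltwater’s code: it only builds the JSON you would send, and it assumes a hypothetical index-per-day naming scheme (`news-YYYY.MM.DD`) plus hypothetical field and node names (`title`, `published_at`, `es-data-07`). The `cluster.routing.allocation.exclude._name` setting and the `move` command of `_cluster/reroute` are standard Elasticsearch, but check the exact request shapes against the docs for your version.

```python
import json
from datetime import date, timedelta


def daily_indices(prefix, days, today):
    """Tip 1: only target indices that can hold matching documents.

    Assumes a hypothetical `<prefix>-YYYY.MM.DD` index-per-day scheme and
    returns the last `days` index names as a comma-separated string.
    """
    names = []
    for offset in range(days):
        day = today - timedelta(days=offset)
        names.append(f"{prefix}-{day:%Y.%m.%d}")
    return ",".join(names)


def relevant_search_body(text, text_field, time_field, days, today, fields):
    """Tip 1, continued: keep the query itself small.

    - the time window sits in `filter` context, so it is not scored
    - `_source` filtering returns only the fields the caller needs
    """
    since = today - timedelta(days=days - 1)
    return {
        "_source": fields,
        "query": {
            "bool": {
                "must": [{"match": {text_field: text}}],
                "filter": [{"range": {time_field: {"gte": since.isoformat()}}}],
            }
        },
    }


def exclude_node_settings(node_name):
    """Tip 5: body for PUT _cluster/settings that drains shards off a node
    via cluster-level shard allocation filtering."""
    return {"persistent": {"cluster.routing.allocation.exclude._name": node_name}}


def move_shard_command(index, shard, from_node, to_node):
    """Tip 5, do-it-yourself: body for POST _cluster/reroute moving one shard."""
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
        ]
    }


if __name__ == "__main__":
    today = date(2024, 3, 1)  # fixed date so the output is reproducible
    print("GET " + daily_indices("news", 3, today) + "/_search")
    body = relevant_search_body(
        "elasticsearch", "title", "published_at", 3, today, ["title", "url"]
    )
    print(json.dumps(body, indent=2))
    print("PUT _cluster/settings")
    print(json.dumps(exclude_node_settings("es-data-07"), indent=2))
    print("POST _cluster/reroute")
    print(json.dumps(move_shard_command("news-2024.03.01", 0, "es-data-07", "es-data-08"), indent=2))
```

Note that the index list and the range filter describe the same window; narrowing the indices stops Elasticsearch from even touching shards that can’t match, while the filter trims what is left inside them.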
