A story of scalability with OroCommerce

When I load tested OroCommerce for my last blog post (Challenging OroCommerce: a performance test), I confirmed that it provides very decent web performance in its "out of the box" configuration. But I also had the feeling that this platform would scale very well.

And as I am a pragmatic and curious person, this seemed like a good opportunity for a new blog post full of testing, measuring and comparisons.

So I built a large setup on Amazon to see how far I could go with the scalability of OroCommerce. My ultimate goal was to identify the hard limit that can't easily be crossed just by adding more servers to the pool.

So let's get right to it!

 

First test

For this first test, the setup I used was the following:

  • 12 web server nodes

  • An ELB for load balancing across the web server instances

  • 1 node for the database and Redis

  • A basic test scenario (no logged-in users)

For more information about the complete setup, see the Appendix at the end of the blog post.

With this setup I was expecting 8-10x more hits/s than in my previous test. The reason I wasn't expecting 12x more traffic is that connecting to services like Redis and the database over the network is simply less efficient than connecting locally or through a Unix socket. Also, the impact of the shared state (database and Redis cache) grows with the number of PHP-FPM workers and therefore with concurrent access, which might lead to locks and/or cache slamming issues.

Fig 1 - Total time and page views during the 1st load test

I was quite surprised by the first results, as I only reached 1211 hits/mn (~20 hits/s). That's only about 69% higher than the 717 hits/mn I achieved on a single instance.

You know me by now (and if you don't, drop me an email and we'll solve this :))… I wanted to understand what the bottleneck was.

When I checked the www instances, the load average was quite low, yet the CPU time that was being consumed was spent in PHP processes. So why the heck was the application not able to use all the CPU resources?

Fig 2 - Time spent in PHP processing

My conclusion was that the issue came from something outside the application code itself, and the possibilities were quite limited if you only consider the most probable suspects: MySQL, Redis, or disk I/O.

I profiled code execution during my load test (using the wonderful Blackfire.io, made by our friends) and noticed that Redis queries were taking more time than usual. I noticed a difference in MySQL query execution times too. Finally, I had a look at the network traffic chart and saw that bandwidth usage was quite high.
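
For reference, this kind of profile can be captured with the Blackfire CLI while the load test is running. A minimal sketch (it assumes the Blackfire agent and PHP probe are installed and configured; shop.example.com stands in for my demo store):

    # Profile one storefront page and send the result to Blackfire,
    # where the time spent in Redis calls, SQL and plain PHP shows up clearly.
    blackfire curl https://shop.example.com/new-arrivals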

So the deduction was simple.

Fig 3 - Network traffic on the database instance during the 1st load test

If you look at the network traffic graph above, you can observe a peak at 91.08 MiB/s, roughly 730 Mbit/s. After googling for network bandwidth benchmarks on AWS, I found that large instance types (like the m4.xlarge I used) max out at around 750 Mbit/s. So in this case, the network bandwidth limitation seemed like a good culprit, which is quite unusual in my experience.
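
As a side note, if you want to sanity-check that kind of number directly on the instance rather than on a monitoring chart, the sysstat tools are enough. A quick sketch (this is not part of the tooling I actually used):

    # Print per-interface network throughput every second;
    # rxkB/s and txkB/s multiplied by 8/1000 give the rate in Mbit/s.
    sar -n DEV 1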

To validate my hypothesis, I upgraded the database instance to a larger one: an m4.10xlarge, which is rated for up to 10 Gbit/s of network traffic. :)

As my bank account was directly linked to my new and shiny m4.10xlarge database instance, I immediately ran a load test to see if I was right (spoiler: yup ;) ).

 

How about with a 10 Gbit/s network? (a.k.a. Test #2)

I kept the same setup as for the first test, except for the database, which was now an m4.10xlarge instance.

Fig 4 - Total time and page views during a load test with a m4.10xlarge DB instance

This time we reached 7486 hits/mn (~125 hits/s) before the load times hit the ELB health-check limit and my www nodes started getting pulled in and out of the pool repeatedly, causing a lot of 502 errors.

For those interested, the network bandwidth reached 4.41 Gbit/s on the database instance. That's pretty impressive if you think about the number, but still uncommon: for those who don't want to do the math, 4.41 Gbit/s at ~125 hits/s is about 35 Mbit per HTTP request.

Fig 5 - Network traffic on the DB instance with a 10Gb/s network

The results were interesting and promising, as I reached the 10x capacity increase I was expecting in the first run, but this setup was definitely neither suitable nor affordable for most real-world uses.

I first looked through the source code and documentation to see whether there was a way to enable compression at the application level to reduce bandwidth usage, but unfortunately it didn't seem possible yet (I've seen a pull request on SncRedisBundle, https://github.com/snc/SncRedisBundle/pull/297, that seems to allow compression by choosing a different data serializer, but it isn't merged yet).

So I decided to experiment with a different approach to cache management, which also yielded interesting results, as you'll see below.

 

Another caching approach

I thought a bit about the options and chose what seemed to be the best one for a more realistic load test: make the cache local to each web host, keeping only the sessions in the shared Redis instance. I initially wanted to go with a regular disk cache, which seemed to outperform Redis in my previous performance tests, but the oro/redis-config bundle doesn't let you use Redis for sessions while keeping the caches elsewhere. So I installed Redis locally on each web server host to handle the cache, while the Redis address used for session storage still pointed to my shared instance.
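
Concretely, that boils down to pointing the cache-related Redis DSNs at localhost and only the session DSN at the shared node. Here is a rough sketch of what it can look like in parameters.yml, assuming the redis_dsn_* parameters exposed by the oro/redis-config bundle (names and layout may differ between OroCommerce versions, so check the bundle's README; the shared hostname is made up):

    # config/parameters.yml (excerpt)
    parameters:
        session_handler: 'snc_redis.session.handler'
        redis_dsn_session: 'redis://redis-shared.internal:6379/0'   # shared instance, sessions only
        redis_dsn_cache: 'redis://127.0.0.1:6379/1'                 # local Redis on each www node
        redis_dsn_doctrine: 'redis://127.0.0.1:6379/2'              # local Redis on each www node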

Please note that this setup could cause some cache synchronization and warm-up issues. For example, when you clear the cache using the console, Oro puts a task in the message queue to mass-recalculate some caches; but with this setup, they will be recalculated only on the server running the message queue consumer, leading to a performance decrease on all the other servers.

I'm pretty sure there are other similar issues to take care of, but periodically rsyncing a Redis dump to each node and loading it there is probably a harmless workaround (see the rough sketch below). Another option would be to store the cache on a shared NFS mount point, but make sure you have solid NFS storage behind it, as it can easily lead to a massive outage if the NFS server goes down. You could also use any other shared filesystem.
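
To give an idea, such a periodic sync could look like the rough, untested sketch below (hostnames are made up, /var/lib/redis/dump.rdb is the Debian default dump location, and since the sessions live on the shared instance, briefly stopping a local cache Redis only costs some cache warmth):

    # Run on a "reference" node, e.g. the one hosting the MQ consumer:
    redis-cli SAVE                                  # write a fresh dump.rdb (blocking, acceptable for a cache)
    for node in www-02 www-03 www-04; do
        ssh "$node" 'redis-cli SHUTDOWN NOSAVE'     # stop Redis without overwriting its own dump
        rsync -a /var/lib/redis/dump.rdb "$node":/var/lib/redis/dump.rdb
        ssh "$node" 'systemctl start redis-server'  # Redis reloads dump.rdb at startup
    done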

I'm sure those are just some of the many options, so if you have any other ideas, I'll be glad to hear from you in the comments section (and tell me if you think this could be a good topic for a future blog post).

Back to business now! I bumped the web instance count to 20, all configured to use the new local cache mechanism. I fine-tuned the ELB health checks to avoid the www instances getting marked as unhealthy before I had exhausted their physical resources, and I had to raise MySQL's max_connections above the default (150) so it could handle all the PHP-FPM worker connections simultaneously. Then I downgraded the database instance to an m4.2xlarge (still a bit larger than for the first try, to make sure the database server would handle the load, since I had increased the number of www instances) and re-ran a load test.
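
For the max_connections part, something along these lines does the trick (500 is just an example value comfortably above 20 nodes × 8 PHP-FPM workers; add it under [mysqld] in my.cnf as well so it survives a restart):

    # Raise the connection limit on the running MySQL server:
    mysql -e "SET GLOBAL max_connections = 500;"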

Fig 6 - Total time and page views during a load test with local cache on www nodes

The platform handled 13633 hits/mn at peak time (~227 hits/s), which is ~19x what I reached on my single instance setup.

The biggest bottleneck was still the CPU of the www instances, but I noticed that the time spent in PHP increased too, more than before; it's still quite subtle in the chart below, but you can see the PHP time go from ~250 ms to more than 1 s at peak. That means the application started to slow down and thus limit request throughput. A Blackfire profile showed that SQL queries were slowed down quite a bit, so I had probably reached the database server's maximum throughput.

Fig 7 - Time spent in PHP during a load test with local cache

That means that with a few more web server hosts, I would probably have hit the database server's limit (although I could easily have scaled it vertically by choosing a larger instance type). I wasn't interested in going further, because I already had 20 web server nodes and a large database instance in place, which is quite a lot of resources for any real-world scenario.

Now that I had settled on my final setup, I wanted to see how it behaved with logged-in users, which is a more complicated but more realistic test for B2B stores.

 

Logged-in load test

I used exactly the same infrastructure setup, but I configured a new scenario that made the load test probes log in to the store with a test account. This is more realistic for B2B stores because users need to be logged in to access pricing.

Fig 8 - Load test on a logged-in scenario

The good news is that the behaviour was exactly the same as with the non-logged-in scenario, which means that the application works well when visitors are logged in to the store.

The test showed a slightly lower throughput (13055 hits/mn, i.e. ~217 hits/s on average) than the same test with the non-logged-in scenario, which was expected given the overhead of handling logged-in users.

 

Conclusion

The results are exciting because they firmly establish that OroCommerce scales very well. In fact, it scales so well that I wasn't able to reach the scalability limit even with a definitely-not-realistic number of www nodes (I'm pretty sure most store owners won't spawn more than a few web server hosts and definitely don't need to handle that much traffic). It means you can scale out horizontally with peace of mind, without fearing that MySQL will collapse under high load (you might need a fairly large MySQL server at some point, but nothing so exotic that it becomes impractical).

Please note that I didn't install any community module on top of OroCommerce. Since I believe most stores depend on external modules to run, I'd strongly suggest continuous performance and scalability testing when you add extra modules, before going to production.

Still, keep an eye on the cache system. As I've observed, the bandwidth needs are quite high, and you probably don't want to need a 10 Gbit/s connection between your hosts just to reach your Redis instance. In the final setup, I used a non-shared cache (each host has its own local copy of the cache), which is not a suitable option for production environments without prior work to guarantee cache consistency.

Although they provide different feature sets (because of the different needs of B2B and B2C stores), I'd like to compare OroCommerce's performance to Magento's (versions 1 and 2), as it could yield some interesting results. I'll definitely explore that in a future blog post.

Finally, there is a more personal thought I'd like to share: I'm pretty happy to have spent some time on these tests, because it was an opportunity to sharpen my AWS deployment skills, and I came away with interesting results too.

I'd call that "productive practicing" and I definitely enjoy that. ;)

Please ping me in the comments if you have any feedback or thoughts about OroCommerce scalability, or if you just want to say hi!

 

Appendix

For those who want to reproduce the performance tests described here, here are the exact setup and tools I used.

As usual, I conducted all my load tests using QUANTA's monitoring tool. It's a SaaS app designed to monitor and manage the performance of ecommerce websites, by simulating the behavior of users going through the classic sales funnel. The load testing feature of QUANTA adds more and more virtual users until the breaking point is reached.

The setup consisted of a few different machines:

  • An m4.xlarge instance to host the MySQL database and a Redis instance
  • An m4.xlarge instance to run the cron jobs and the message queue consumer
  • A bunch of m4.xlarge instances (www nodes) to serve the HTTP requests; I adjusted their number from 12 to 20 during my experiments
  • An AWS Application Load Balancer to spread the traffic across the www nodes

I was running the demo website on Debian Jessie with:

  • PHP 7 (7.0.19-1~dotdeb+8.1) installed from DotDeb repositories
  • PHP-FPM, configured with a static pool of 8 workers (pm = static, pm.max_children = 8)
  • MySQL 5.6 from the official MySQL repository
  • Nginx 1.10.3 from the standard Debian repositories
  • Redis 3.2.9 from Dotdeb repositories

I followed the installation guidelines from https://github.com/orocommerce/orocommerce-application.

I ran "composer dump-autoload -oa" to generate an optimized, authoritative class map for autoloading. Both the OroCommerce crontab and the message queue consumer were running during the tests (on a separate instance).

I used the following OpCache configuration:

  • opcache.memory_consumption=512
  • opcache.validate_timestamps=0
  • opcache.interned_strings_buffer=16
  • opcache.max_accelerated_files=30000

I also tweaked MySQL a bit with the options below (I have not benchmarked these changes; they basically allow MySQL to use more RAM and be a bit less aggressive on disk flushes):

  • skip-name-resolve
  • query_cache_size=256M
  • innodb_buffer_pool_size=8G
  • innodb_log_file_size=2G
  • innodb_flush_log_at_trx_commit=2

Finally, here are the URLs I configured for the load test non-logged-in scenario:

  • Home: /
  • Category: /new-arrivals
  • Sub Category: /new-arrivals/lighting-products
  • Product: /new-arrivals/lighting-products/_item/500-watt-work-light

And for the logged-in scenario:

  • Home: /
  • Account: /customer/user/login
  • Login: /customer/user/login-check (POST)
  • Logged-in: /?_rand=<random number from the JSON response of the Login step>
  • Category: /new-arrivals
  • Sub Category: /new-arrivals/lighting-products
  • Product: /new-arrivals/lighting-products/_item/500-watt-work-light