Tag Archives: scaling

SHOW TABLES LIKE === BAD, Sharding saves DBs, and Open Source debugging

Just a quick post on why sharding makes large WordPress installs happy, with a real-life example, and on how being able to see the code makes debugging and lifting a service out of downtime much easier.

We sharded the database of UBC Blogs about a year into the project, in 2010, after reading about the potential bottlenecks a large WPMU install can hit; it generally just makes the database happy. Why it makes a database happy is illustrated by what happened last week. We had a plugin on UBC Blogs running happily since May, churning out xAPI statements by the thousand with no problems. We needed to move it to another WordPress multisite install with far fewer sites and users than UBC Blogs (still a fair amount, 2000+ sites), but one whose database was not partitioned / sharded. It ran for a week with no problems noticed… once the students arrived, though, bam: the CPU on the database server was devoured. Watching the idle quickly drop to 20%, it was clear a crash was coming.

Our initial thoughts:
1) Did the devs push anything new? Confirmed that was a no.
2) Was it just more traffic? The analytics showed we were nowhere near peaks we had handled with no problem in the past.

Thinking it might be more traffic plus some bad code, we asked IT for more CPU / RAM, which they delivered. That brought the idle up, but we were not out of the woods: it still sat around 25% and eventually tanked briefly.

I quickly started digging through the web logs: nothing out of the ordinary. Then on to the database logs: no MySQL errors. Then to the database console, using the old SHOW PROCESSLIST;. It revealed a nasty SHOW TABLES LIKE query running often, which is absolutely devastating on a multisite install with an unpartitioned database: the query had to traverse 50,000+ tables, and just running it raw took 4 seconds to execute. On UBC Blogs, because the database is partitioned and the table it looks for lives in a global database, the same query is pretty much instant, with zero performance impact. Once we realized this was the issue, I asked Richard to kill the plugin that was doing it and we were back to normal. The fix was very easy: since the plugin only had to run that query once, on installation, Richard patched it and put it back into production quickly.
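To see why that one query hurts so much, here is a tiny illustrative sketch (not the actual plugin code; the table names, blog count, and pattern are all made up for the demo) of what SHOW TABLES LIKE forces the server to do on an unsharded multisite, versus a lookup against a small global schema on a sharded one:

```python
# SHOW TABLES LIKE has to pattern-match against every table name in the
# schema, so its cost grows with the table count. fnmatch stands in for
# the LIKE pattern match here.
from fnmatch import fnmatch

# Unsharded install: one schema holding every blog's tables.
unsharded = [f"wp_{blog}_{t}" for blog in range(5000)
             for t in ("posts", "comments", "options", "postmeta",
                       "terms", "term_taxonomy", "term_relationships",
                       "commentmeta", "links", "term_meta")]  # 50,000 tables

# SHOW TABLES LIKE 'wp_42_post%' ~ a linear scan over all 50,000 names.
matches = [t for t in unsharded if fnmatch(t, "wp_42_post*")]

# Sharded install: the query only ever sees one small global schema.
global_schema = {f"wp_{t}" for t in ("users", "usermeta", "site",
                                     "sitemeta", "blogs")}
hit = "wp_usermeta" in global_schema  # effectively instant

print(len(unsharded), len(matches), hit)  # → 50000 2 True
```

The scan versus the set lookup is the whole story: same query text, wildly different cost depending on how many tables live in the schema it runs against.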

Takeaways?

  • Partitioning is good.
  • Having an application performance monitor would help locate bad code / issues more efficiently (time for New Relic?).
  • Having the ability to see, fix, and re-deploy code quickly in an open source environment makes getting out of bad situations manageable, and in some instances relatively painless. (I actually have no idea how we would get out of this if it were a black-box closed-source app. Call the vendor???)

Time for Memcache. Qcache off?

Seems like we constantly hover around 60% for Qcache hit rates, based on Qcache_hits / (Qcache_hits + Qcache_inserts).

MariaDB [(none)]> show status Like '%Qcache%';
| Variable_name           | Value    |
| Qcache_free_blocks      | 8618     |
| Qcache_free_memory      | 27146520 |
| Qcache_hits             | 14722398 |
| Qcache_inserts          | 8665581  |
| Qcache_lowmem_prunes    | 3151341  |
| Qcache_not_cached       | 708833   |
| Qcache_queries_in_cache | 25966    |
| Qcache_total_blocks     | 71756    |
8 rows in set (0.03 sec)
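Plugging the counters above into that formula confirms it, and the lowmem prunes are worth a look too:

```python
# Counters copied from the SHOW STATUS output above.
qcache_hits = 14_722_398
qcache_inserts = 8_665_581
qcache_lowmem_prunes = 3_151_341

hit_rate = qcache_hits / (qcache_hits + qcache_inserts)
print(f"hit rate: {hit_rate:.1%}")        # → hit rate: 62.9%

# Prunes per insert hints at how often entries are evicted for lack of
# memory before they can ever be reused.
prune_ratio = qcache_lowmem_prunes / qcache_inserts
print(f"prune ratio: {prune_ratio:.1%}")  # → prune ratio: 36.4%
```

So roughly a third of the entries we insert get pruned out of the cache, which lines up with the suspicion that we are paying for a lot of caching work we never benefit from.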

The global wp_usermeta table in WordPress, which is huge on a site like UBC Blogs, is constantly being updated, so its query cache entries are constantly invalidated. Wondering if we would be better off with the query cache off? Does invalidation cost more than the savings… deep thoughts for a DBA…

Pretty sure that if we committed to implementing Memcache for the object cache we would see gains.

WP Super Cache vs Batcache / Memcache

Furthering our attempts to make our WordPress install handle real-time classroom backchannel usage, I started testing Batcache/Memcache in our verf environment as a replacement for the good old reliable WP Super Cache. WP Super Cache is great on a single-server install serving cached pages to traditional traffic, but once you have a fair number of users logging in across a multi-server install, it may be time to move to a persistent backend for the WordPress object cache.
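What a persistent object cache buys you is a simple look-aside pattern: check the shared cache first, and only hit the database on a miss. The real thing is PHP (WordPress's wp_cache_get / wp_cache_set) backed by memcached; this Python sketch fakes the cache with a dict, and the lookup function is invented, just to show the shape of the pattern:

```python
cache = {}      # pretend this dict is memcached, shared across web servers
db_queries = 0  # count how often we actually hit the database

def expensive_db_lookup(key):
    global db_queries
    db_queries += 1
    return f"value-for-{key}"        # stand-in for a SELECT result

def wp_cache_get(key, group="default"):
    return cache.get((group, key))   # None on a miss

def wp_cache_set(key, value, group="default"):
    cache[(group, key)] = value

def get_option(name):
    value = wp_cache_get(name, "options")
    if value is None:                # miss: query once, then store
        value = expensive_db_lookup(name)
        wp_cache_set(name, value, "options")
    return value

for _ in range(1000):                # 1000 page views...
    get_option("siteurl")
print(db_queries)                    # → 1 ...but only 1 database query
```

Because the backend is shared and persistent, that one miss is paid once across all web servers and all requests, instead of once per page load the way the default in-memory object cache works.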

Memcached installed on a 1 CPU / 4 GB RAM VM running RHEL 6.3 (Santiago).

On the web servers, pecl memcache-3.0.7 was installed (this was key: the default, 2.2.7, was very buggy). Each web server had 4 GB of RAM and 2x CPU.

memcache settings:


Test Results comparing WP-Super-Cache vs Batcache/Memcache

using ab -n 600 -c 100 against a site running PulsePress

wpsupercache vs batcache

WP Super Cache

Concurrency Level:      100
Requests per second:    49.32 [#/sec] (mean)


Batcache / Memcache

Concurrency Level:      100
Requests per second:    1007.86 [#/sec] (mean)

That is a huge jump in requests per second.

Trying an even bigger load, 250 concurrent connections making 5000 requests, still gave a pretty solid result:

Requests per second:    333.19 [#/sec] (mean)
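Putting the ab numbers side by side (values copied from the runs above):

```python
# Requests per second from the ab runs above.
wp_super_cache_rps = 49.32
batcache_rps = 1007.86
batcache_heavy_rps = 333.19   # the 250-concurrent / 5000-request run

print(f"{batcache_rps / wp_super_cache_rps:.1f}x")        # → 20.4x
print(f"{batcache_heavy_rps / wp_super_cache_rps:.1f}x")  # → 6.8x
```

Even under 2.5x the concurrency, Batcache/Memcache was still serving several times more requests per second than WP Super Cache managed at 100 concurrent.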

Tuned Apache vs Nginx

WordPress is great when used as a general CMS/blog platform; it can scale quite easily using some basic caching methods. But when you start using it as an app with a fairly large number of logged-in users, as in a live classroom backchannel/response system, some performance issues start to pop up.

We are going through all the components to see how we can improve performance. One change is moving away from Apache and switching to Nginx. I did a very quick test comparing a tuned Apache with PHP and APC enabled against an Nginx + PHP-FPM + APC install; results are below. I should note I did not do any tuning on Nginx (that will be the next test), but as you can see, Nginx performs much better under high load.

using: ab -n 600 -c 100



Tuned Apache (PHP + APC)

Concurrency Level: 100
Time taken for tests: 54.546 seconds
Complete requests: 600
Failed requests: 0
Write errors: 0
Total transferred: 9122400 bytes
HTML transferred: 8999400 bytes
Requests per second: 11.00 [#/sec] (mean)
Time per request: 9090.968 [ms] (mean)
Time per request: 90.910 [ms] (mean, across all concurrent requests)
Transfer rate: 163.32 [Kbytes/sec] received


Nginx (PHP-FPM + APC)

Concurrency Level: 100
Time taken for tests: 35.677 seconds
Complete requests: 600
Failed requests: 0
Write errors: 0
Total transferred: 9125400 bytes
HTML transferred: 8999400 bytes
Requests per second: 16.82 [#/sec] (mean)
Time per request: 5946.115 [ms] (mean)
Time per request: 59.461 [ms] (mean, across all concurrent requests)
Transfer rate: 249.79 [Kbytes/sec] received
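The ratios from the two runs above (numbers copied from the ab output) show untuned Nginx winning across the board:

```python
# Key metrics copied from the two ab result blocks above.
apache = {"rps": 11.00, "mean_ms": 9090.968, "kbps": 163.32}
nginx  = {"rps": 16.82, "mean_ms": 5946.115, "kbps": 249.79}

print(f"requests/sec: {nginx['rps'] / apache['rps']:.2f}x")       # → 1.53x
print(f"transfer rate: {nginx['kbps'] / apache['kbps']:.2f}x")    # → 1.53x
print(f"mean latency: {apache['mean_ms'] / nginx['mean_ms']:.2f}x lower")
```

Roughly a 1.5x improvement on every metric, before any Nginx tuning at all.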