Tag Archives: MySQL

How to re-sync a MySQL slave when duplicate entries exist.

So your slave is out of sync, you have already run RESET SLAVE, but you still see this error when doing a SHOW SLAVE STATUS\G:

Last_Errno: 1062

How to remedy.


1) Stop the slave.

2) Update my.cnf with:

slave-skip-errors = 1062

3) Restart MySQL.

4) Start the slave.
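
Putting the steps together, here is a minimal sketch of the full sequence (assuming the slave's config lives in /etc/my.cnf under the [mysqld] section and MySQL is managed with the usual init script):

mysql> STOP SLAVE;

# In /etc/my.cnf under [mysqld], add:
slave-skip-errors = 1062

# Restart MySQL, e.g.:
# service mysql restart

mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G
# Slave_SQL_Running should be back to Yes and Last_Errno back to 0.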

Bonus, other errors:

Delete_rows no record error (1032), table already exists (1050)

Add multiple error codes like:
slave-skip-errors = 1062, 1032, 1050


SHOW TABLES LIKE === BAD, sharding saves DBs, and open source debugging

Just a quick post on why sharding makes large WordPress installs happy, with a real-life example, and on how seeing the code makes debugging and lifting a service out of downtime much easier.

We sharded the db of UBC Blogs about a year into the project in 2010, after reading about the potential bottlenecks a large WPMU install can hit; generally it just makes the db happy. Why it makes a db happy comes down to situations like what happened last week. We had a plugin on UBC Blogs running happily since May with no problems, churning out xAPI statements by the thousand. We needed to move this to another WordPress multisite install with far fewer sites and users than UBC Blogs (still a fair amount, 2000+ sites) but not partitioned / sharded. It ran for a week with no problems noticed… once the students arrived, though, bam: the CPU was devoured on the database server. Watching the idle drop to 20%, it was clear a crash was going to occur.

Initial thoughts were:
1) Did the devs push anything new? Confirmed that was a no.
2) Was it just more traffic? Looking at the analytics, it was nowhere near a peak we had handled with no problem in the past.

Thinking it might have been more traffic plus some bad code, we asked IT for more CPU / RAM, which they delivered. That brought the idle up, but we were not out of the woods: it hovered around 25% and eventually tanked briefly.

We quickly started digging through the web logs: nothing out of the ordinary. Then on to the db logs: no MySQL errors. Then we moved to the db console and used the old SHOW PROCESSLIST; it revealed a nasty SHOW TABLES LIKE query running often, which is absolutely devastating on a multisite install with an unpartitioned database. That query had to traverse 50,000+ tables; just running it raw took 4 seconds to execute. On UBC Blogs, because the db is partitioned and the table lives in a global db, the same query is pretty much instant with zero performance impact. Once we realized this was the issue, I quickly asked Richard to kill the plugin that was doing this and we were back to normal. The plugin fix was very easy, since the plugin only had to run that query once on installation; Richard patched it and put it back into production quickly.
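
For the record, the diagnosis and the offending pattern looked roughly like this (the plugin's table name here is a made-up example):

mysql> SHOW FULL PROCESSLIST;
-- ...the same statement kept showing up:

mysql> SHOW TABLES LIKE 'wp_123_plugin_table';
-- On an unsharded multisite with 50,000+ tables in one database this has to
-- walk the whole table list on every call (~4 seconds here); on the sharded
-- UBC Blogs setup the table sits in a small global db, so the same lookup
-- is effectively instant.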

Takeaways?

  • Partitioning is good.
  • Having an application performance monitor would help us locate bad code / issues more efficiently (time for New Relic?).
  • Having the ability to see, fix, and re-deploy code quickly in an open source environment makes getting out of bad situations manageable, and in some instances relatively painless (I actually have no idea how we would get out of this if it were a black-box closed source app. Call the vendor???).

Time for Memcache. Qcache off?

Seems like we constantly hover around 60% for Qcache hit rates, based on Qcache_hits / (Qcache_hits + Qcache_inserts):

MariaDB [(none)]> show status Like '%Qcache%';
| Variable_name           | Value    |
| Qcache_free_blocks      | 8618     |
| Qcache_free_memory      | 27146520 |
| Qcache_hits             | 14722398 |
| Qcache_inserts          | 8665581  |
| Qcache_lowmem_prunes    | 3151341  |
| Qcache_not_cached       | 708833   |
| Qcache_queries_in_cache | 25966    |
| Qcache_total_blocks     | 71756    |
8 rows in set (0.03 sec)
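
Plugging the counters above into that formula: 14,722,398 / (14,722,398 + 8,665,581) ≈ 0.63, so roughly a 63% hit rate, and the 3-million-plus lowmem prunes are eating into it.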

The global wp_usermeta table in WordPress, which is huge on a site like UBC Blogs, is constantly being updated, so the Qcache entries that touch it are constantly being invalidated. Wondering if we would be better off with it off? Does invalidation cost more than the savings… deep thoughts for a DBA…
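
If we wanted to test that theory, turning the query cache off is easy enough; a rough sketch (these are the standard MySQL/MariaDB variables, and trying it live before touching my.cnf is just a suggestion):

# Try it live first; new queries stop being cached:
mysql> SET GLOBAL query_cache_size = 0;
mysql> SET GLOBAL query_cache_type = OFF;

# Then make it permanent in my.cnf under [mysqld]:
query_cache_type = 0
query_cache_size = 0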

Pretty sure if we committed to implementing Memcache for the object cache we would see gains.

I love UBC IT VMs. Also: caching, because you’re not caching everything.

Students are definitely back on campus! The Sub is packed, there are lineups for coffee at all the Starbucks, and some of our web servers have seen traffic increase 10x!

10x? This is related to the CMS WordPress install we help maintain/develop with Public Affairs. One site on that service is the reason for the traffic spike: basically all logins/logouts to the campus LMS are directed through that site. Otherwise it would have been more or less business as usual for that service.

We did plan for this and load tested, and we thought it could handle it, but when we were hit with a huge flood of traffic because of a Vista outage we were crushed: CPU was maxed out. *This is the reason why I love UBC IT VMs: imagine being in this scenario with a locked-in piece of hardware; you would be hooped big time. We were able to contact UBC IT Infrastructure and they added a CPU literally on the fly! Do you know how cool that is? It’s very cool. Chris dropped the CPU in and the load came down; the site was still sluggish but functional. VMs rock!

BUT… it was still very surprising to me to see it crumble on Tuesday. I have had a site hit by front-page Digg and Reddit traffic, which was considerably more traffic than this, on a 7-dollar-per-month webhost account back in 2006, and it did not crash! I was simply using wp-cache 2 (before Super-Cache), a Cutline theme, and one plugin! No APC, no Super-Cache, no MySQL query cache! It held up on a shared host with hundreds of other users. I was, for lack of a better word, PO’d at our fail whale. To remedy this we looked at all our code, disabled all plugins that were not being used, cleaned up some code, and made some tweaks to PHP, and it seemed to help. The following night, during the next big spike (which had even more users), it responded OK, but still not as fast as I would like it to be.

Why did this fail? We did load testing, right? I believe it is because our CLF theme/plugins are considerably heavier than when we tested (there were a lot of code changes over the month). During the post-crash testing we tried disabling and enabling widgets and showed you can 2x the number of page requests your page can handle, which is significant (I am not sure what widgets were enabled during the initial test; the site was still being actively developed). Another culprit is Domain Mapping, which is used on most of the websites on the service. Domain Mapping runs on every page hit, even when the page is cached via Super-Cache. Luckily we use the MySQL query cache, so most of those queries would have been served from memory and it doesn’t change page requests/second by that much. But this really shows how much of a resource hog WordPress can be if you let it, and why you should do all the little things:

mysql> SHOW STATUS LIKE 'Qcache%';
| Variable_name           | Value    |
| Qcache_free_blocks      | 11524    | 
| Qcache_free_memory      | 36261544 | 
| Qcache_hits             | 50840927 | 
| Qcache_inserts          | 2040382  | 
| Qcache_lowmem_prunes    | 113918   | 
| Qcache_not_cached       | 1857166  | 
| Qcache_queries_in_cache | 24673    | 
| Qcache_total_blocks     | 64344    | 
8 rows in set (0.00 sec)
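
That works out to 50,840,927 / (50,840,927 + 2,040,382) ≈ 0.96, about a 96% query cache hit rate, which is a big part of why Domain Mapping’s per-request queries didn’t hurt more.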

I hope we think more critically about the features we add to higher-traffic pages in the future and think more new school in web development. New school as in: performance matters, and little changes at the app level, in front end / back end code, can help more than the old school move of adding a whack of hardware (ironic because we are doing/did this… but I would argue that is so we can do upgrades with less downtime, not so much for performance).

*I think FB, Google, and Yahoo! really promote the new school style well; I would say they started it. FB basically re-tooled PHP and contributed to APC, which probably saves them tons of $cash by not adding a boatload of new servers, as well as making their users happy with faster response times. Yahoo! does a great job promoting front end performance, the same with Google’s Let’s Make the Web Faster. I like the way they think.

Database Makeover

My brain is adjusting to being back at work after a computer-less vacation, so I will take this time to do a blog post on the database setup of UBC Blogs.

Up until a few weeks ago we had one server and things were OK’ish, but we planned on adding the BP for September along with a shiny CWL login button, which meant a lot more users might jump on board… so we added a couple more servers, a database Master/Slave setup, with a few more to come (another web and a cache).

But… before the Master/Slave we did database “partitioning”. What is this, besides something that made me more than nervous for a fair amount of time? Well, because WPMU creates around 8 database tables per blog, over time this grows to a very large number of tables, and things start to get slow during backups (very slow actually; it would take 10-30 minutes to do a backup, because it would have to lock a large number of tables, 21k+). Performance also reportedly begins to lag once you move past the 1000+ blog range, so we needed to move to a “multi-database” setup to be exact, which basically involves blowing apart your database and enabling a multi-DB plugin. *There are a few multi-DB plugins: HyperDB, used by WordPress.com, which is quite complicated and provides little to no help documentation or support; SharDB, developed by Ron Rennick, which is based on HyperDB; and then there is Multi-DB, which is developed by WPMU Premium Dev and used by Edublogs… I decided to go with Multi-DB *for now*. First choice would have been SharDB, because it’s based on HyperDB and it is a community plugin vs the “premium” one, but there were issues once the database moved to its own server (connection/timeout errors); it worked beautifully on the same box, but on its own, no go. Ron mentioned this will be fixed in the future, at which time I think we would switch.

It should be noted that doing this step, although nerve-racking, is relatively straightforward because of the great setup scripts that accompany both SharDB and Multi-DB; with some great how-to posts by Jim Groom I was able to figure it out and test it during a Sakai presentation… implementation on the live site took a fair amount of time, approx 30 minutes (breaking 1 db into 256 dbs is slow). Once this was completed, I have to say it felt faster (no empirical evidence), but things that were not so snappy before, like listing all users as an admin, returned results much faster. I implemented the Master/Slave setup the following AM, which is probably the coolest part because we now have zero downtime for backups and essentially a spare server… (if the master died we could flip to it quite easily). Here is how that was done. *I am not a DBA, like I said in the past, so there may be better ways, but this has seemed to work well so far.*

On the Master:
the my.cnf contains these settings related to master/slave:

# for master/slave

# Enable binary logging
#Expire the logs every three days
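
The actual directives got trimmed out of the post; on the master that section usually looks something like this (the log basename matches the blogsdb1-bin file shown further down, the three-day expiry comes from the comment above, and the server-id value is just an assumption):

# for master/slave
server-id        = 1              # any id, as long as it differs from the slave's
log-bin          = blogsdb1-bin   # enable binary logging
expire_logs_days = 3              # expire the logs every three days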

Create a slave user on the master:

 GRANT REPLICATION SLAVE ON *.* TO slave_user@'SLAVE_IP_ADDRESS'
 IDENTIFIED BY 'slave_pw';

Dump the master db and import into the slave.
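
Roughly something like this; --master-data writes the binlog coordinates into the dump so you know where the slave should start (file name and user are just examples):

# On the master:
mysqldump -u root -p --all-databases --master-data > blogsdb.sql

# Copy blogsdb.sql over to the slave, then:
mysql -u root -p < blogsdb.sql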

On the slave, include this in your my.cnf:

# for master/slave
# changes made to do slave
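
The exact lines were trimmed here too; a sketch of the usual slave-side settings, plus the CHANGE MASTER TO you run before starting the slave (the id, relay log name, and the FILE/POS placeholders are examples; the real file and position come from the --master-data header in the dump):

# for master/slave
server-id = 2                    # must differ from the master's
relay-log = blogsdb2-relay-bin   # example relay log basename
read_only = 1                    # optional: keeps apps from writing to the slave

mysql> CHANGE MASTER TO
    ->   MASTER_HOST='MASTER_IP_ADDRESS',
    ->   MASTER_USER='slave_user',
    ->   MASTER_PASSWORD='slave_pw',
    ->   MASTER_LOG_FILE='FILE_FROM_DUMP',
    ->   MASTER_LOG_POS=POS_FROM_DUMP;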

Start the slave:

mysql> start slave;

Check the status of the slave:

mysql> show slave status\G

If things worked out you should see:

Slave_IO_Running: Yes
Slave_SQL_Running: Yes

Try running SHOW MASTER STATUS on the master and SHOW SLAVE STATUS on the slave (quickly, master first) and you will see the slave’s Exec_Master_Log_Pos is pretty much the same as the master’s Position:

On master:

mysql> show master status\G
*************************** 1. row ***************************
File: blogsdb1-bin.000009
Position: 579567003
1 row in set (0.00 sec)

Exec_Master_Log_Pos should be close to the Position on master.

Now for backups, just have your backup script stop the slave, run the dump, then start the slave again: no downtime for backups on production! Bloggers can post at 3:00 am now and not make passive-aggressive tweets about blog DT.
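
A minimal sketch of what that backup script can look like (the path and the assumption that credentials live in ~/.my.cnf are mine):

#!/bin/bash
# Runs on the slave so the master is never locked for backups.
mysql -e "STOP SLAVE;"
mysqldump --all-databases > /backups/blogsdb-$(date +%F).sql
mysql -e "START SLAVE;"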