Behind the preprint: Declaring a tuberculosis outbreak over with genomic epidemiology

I’ve got some time, it’s Friday at 2:30 and I don’t want to start anything new or look at my to-do list, so, inspired by Nick Loman’s behind-the-paper blog post about his Ebola nanopore sequencing work, here’s the behind-the-scenes story of a preprint we (myself, Caroline Colijn, her student Hollie-Ann Hatherell, and Xavier Didelot, on behalf of a larger author group) recently posted to bioRxiv.

But first, the paper…

Declaring a tuberculosis outbreak over with genomic epidemiology

We report an updated method for inferring the time at which an infectious disease was transmitted between persons from a time-labelled pathogen genome phylogeny. We applied the method to 48 Mycobacterium tuberculosis genomes as part of a real-time public health outbreak investigation, demonstrating that although active tuberculosis (TB) cases were diagnosed through 2013, no transmission events took place beyond mid-2012. Subsequent cases were the result of progression from latent TB infection to active disease and not recent transmission. This evolutionary genomic approach was used to declare the outbreak over in January 2015.

Alright, story time. Back in December 2010, my co-worker (and cat aficionado, which I can say here because I bet he won’t ever read this post) Jay Johnston and I gave a BCCDC Grand Rounds talk on our work using genomics to unravel a tuberculosis outbreak (see the NEJM article for more on that story). These things are webcast throughout BC and by the time I had packed up my computer and walked the two flights of stairs back to my office, I already had a voicemail on my phone from Rob Parker, the then-Medical Health Officer in Kelowna, BC. He had been dealing with a large outbreak of TB in the region and wanted to know if we could use our genomics approach to figure out whether their outbreak management strategy was on the right track.

In late 2011, we sequenced the first 33 cases of the outbreak, along with 7 cases from elsewhere in BC with the same MIRU-VNTR genotype but no epi-link to the outbreak. We got the data back in early 2012, and noticed something odd. Three SNVs separated cases #1 and #2, even though we knew #1 had to have infected #2. Three SNVs is around six years’ worth of evolution in TB, so we were a bit confused. Then we realized we’d not accounted for within-host diversity – variants that arose within a host and were then passed on at transmission. This was also around the time that a MRSA outbreak paper came out and Twitter realized that this diversity is an issue – see Ed Feil’s guest post at Nick Loman’s blog for more on that.
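For anyone who wants the back-of-the-envelope arithmetic behind “three SNVs is around six years”, here is a minimal sketch. It assumes a constant rate of about 0.5 SNVs per genome per year, which is simply the rate implied by the sentence above rather than a value taken from the analysis, and the function and constant names are made up for illustration.

```python
# ASSUMPTION: ~0.5 SNVs per genome per year, the rate implied by
# "three SNVs is around six years' worth of evolution", not a fitted value.
ASSUMED_SNVS_PER_GENOME_PER_YEAR = 0.5


def years_of_evolution(snv_count, rate=ASSUMED_SNVS_PER_GENOME_PER_YEAR):
    """Expected years needed to accumulate snv_count SNVs at a constant rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return snv_count / rate


# Three SNVs between cases #1 and #2 under the assumed rate.
print(years_of_evolution(3))  # 6.0
```

This treats mutation as a steady clock along a single transmission link, which is exactly the naive reading that within-host diversity breaks.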

I had been working with Caroline Colijn at Imperial College London on a project investigating how different patterns of outbreak spread lead to different structures within a phylogeny, and when she saw this diversity issue, she teamed up with Xavier Didelot, also at ICL, to tackle it. The result was TransPhylo, which we published in MBE in 2014 and which is currently undergoing an update and a port to R. Expect a preprint on that soon. TransPhylo can infer potential person-to-person transmissions from pathogen genome data, and can also infer when those transmissions occurred. This will become important later.

Anyway, many years after they first asked, we were able to give the Kelowna public health team a nice outbreak reconstruction and tell them “yes, your management strategy is working”.

A few more years passed, and on one of our regular outbreak management team conference calls, the new Medical Health Officer for Kelowna, Sue Pollock, brought up the idea of declaring the outbreak over – incident cases had been declining, and only one case was diagnosed in 2014. There’s no great definition for a TB outbreak being “over”, but the generally accepted wisdom is that if you don’t have transmissions occurring for two years, you can stand down (though you’ll always get a long tail of cases that continue to activate over many years). The problem is that because TB can go latent and then wake up years after the actual transmission, it’s hard to tell whether a case you’ve just diagnosed is someone who was just infected or whether they were infected years earlier and are just progressing to active disease now. If the former, boo – your outbreak isn’t over. If the latter, congratulations.

Sue was familiar with the TransPhylo method and how it could date the time at which a person had likely become infected, and she raised the idea of using genomics to determine whether the outbreak had truly ended – were the cases we saw in 2013/14 the result of recent transmission or had they been infected earlier? Patrick Tang, who was leading BCCDC’s mycobacteriology laboratory at the time, extracted DNA from the 15 cases we had diagnosed since our 2011 genomics study and got them onto our in-house MiSeq in late 2014. The data arrived right before Christmas, and by early January 2015, Xavier and Caroline had reported some preliminary TransPhylo data back suggesting the last transmission occurred in mid-2012. We shared this with Sue and her team on a conference call in late January, and the outbreak was declared over shortly thereafter.
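To make the two-year rule of thumb concrete, here is a small sketch of the check once you have a date for the last inferred transmission. The exact days are placeholders (“mid-2012” is taken as 1 July 2012 and the late-January call as 31 January 2015), the function names are hypothetical, and none of this is part of TransPhylo; it only compares two dates.

```python
from datetime import date


def transmission_free_years(last_transmission, as_of):
    """Years elapsed between the last inferred transmission and as_of."""
    return (as_of - last_transmission).days / 365.25


def can_stand_down(last_transmission, as_of, threshold_years=2.0):
    """True if the transmission-free interval meets the rule-of-thumb threshold."""
    return transmission_free_years(last_transmission, as_of) >= threshold_years


# ASSUMPTION: placeholder days standing in for "mid-2012" and "late January 2015".
last_inferred_transmission = date(2012, 7, 1)
conference_call = date(2015, 1, 31)

print(can_stand_down(last_inferred_transmission, conference_call))  # True
```

The hard part is the input, not the comparison: dating the last transmission rather than the last diagnosis is what the genomic inference provides.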

Caroline’s student Hollie had an idea for improving the accuracy of our timing inference by replacing TransPhylo’s SIR model with a branching model, so she set to work on that task. When we revisited the data with her updated model later in 2015, we again got the happy result that transmission ended mid-2012. Hollie presented the work as a poster at Epidemics, and as a group, we wrote it up in January 2016. That’s the preprint that’s up on bioRxiv now – a brief writeup of the analysis and details on the branching model.

The genomic data for this paper is (mostly) up at ENA – I say “mostly” because all the genome files are on their server but I keep getting an MD5 checksum error for many of the files that I haven’t had a chance to fix yet. As soon as I get that sorted (next few weeks, hopefully), I’ll write up a little bit about the dataset and post it here.
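If you download the files and want to check them yourself, this is a minimal sketch of comparing a file’s MD5 against a published checksum using Python’s standard `hashlib`. The file name and expected digest in the usage comment are placeholders, not real values from the ENA submission.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_md5):
    """Compare a file's MD5 with an expected hex digest, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Hypothetical usage; substitute a real file and its published checksum:
# checksum_matches("sample_1.fastq.gz", "<md5 from the archive record>")
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte sequencing files.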