Ten Simple Rules – genomic epidemiology edition

Monday morning status update: attempting to clear out most of the inbox backlog before leaving this afternoon for IMMEM XI in sunny Portugal. And I suppose I should write my IMMEM keynote talk at some point before Wednesday afternoon too. My assigned topic is integrating genomic data with epidemiological data for outbreak investigation, and I’m thinking of doing a PLoS Comp Bio “Ten Simple Rules” sort of thing, with anecdotes from our work and other key studies in the genomic epidemiology space.

What have I got so far? Here are a few; feel free to comment with more (if I use them in the talk, you’ll get a shout-out on the slide):

Clinical bits to remember:

  1. Know your pathogen. Different bugs have different quirks that will affect an outbreak reconstruction, like latency, asymptomatic carriage, and varying levels of infectiousness.
  2. Think SIIR – Susceptible, Infected, Infectious, Removed. Each patient will move through these stages in different ways. For each individual, try to plot their potential trajectory. For some bugs with well-defined stages, like measles, this is easy. For others, mehhhhh, not so much.
  3. Not all hosts are created equal. Within-host genetic diversity already bit us in the @$$ (see this post for a bit of background on that), and both recent and older work has shown that in diseases like TB, the within-host mutation rate fluctuates rather wildly. Stringent thresholds for person-to-person transmission might miss events involving a mutational-hotspot host.
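For the trajectory-plotting idea in rule 2, here is a minimal Python sketch of a timing check between two cases. Everything in it is an assumption for illustration: the `Case` fields, the `could_have_infected` helper, and the incubation and infectious-period numbers are placeholders, not clinical reference values for any real pathogen.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Case:
    """Hypothetical per-patient timeline; field names are illustrative."""
    case_id: str
    symptom_onset: date
    infectious_from: date  # first day the case could plausibly transmit
    removed: date          # isolated, treated, or otherwise no longer transmitting


def could_have_infected(source, recipient, min_incubation_days, max_incubation_days):
    """Return True if the timing allows source -> recipient transmission.

    The recipient must have been exposed between (onset - max incubation)
    and (onset - min incubation); that exposure window has to overlap the
    source's infectious window.
    """
    earliest_exposure = recipient.symptom_onset - timedelta(days=max_incubation_days)
    latest_exposure = recipient.symptom_onset - timedelta(days=min_incubation_days)
    return source.infectious_from <= latest_exposure and earliest_exposure <= source.removed


# Placeholder dates and a placeholder 7-21 day incubation range.
a = Case("A", date(2015, 5, 10), date(2015, 5, 6), date(2015, 5, 14))
b = Case("B", date(2015, 5, 24), date(2015, 5, 6), date(2015, 5, 30))
c = Case("C", date(2015, 5, 12), date(2015, 5, 8), date(2015, 5, 16))

print(could_have_infected(a, b, 7, 21))  # timing is compatible
print(could_have_infected(a, c, 7, 21))  # C was exposed before A was infectious
```

A check like this only says whether the dates are compatible. For bugs with fuzzy stages, the windows would need to be wide and the answer is correspondingly weak.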

Epidemiological bits to remember:

  1. People, places, food/water, and things all need to be considered. In any given outbreak, at least two of these things are probably in play, if not more. Your reconstruction diagram shouldn’t be just a bunch of people connected by arrows – there’s always more to the story.
  2. When the genomic data says “maybe” and the field epis say “definitely”, believe the epis. Genomics is great for ruling out transmission events, but in complex outbreaks, the ruling-in part is harder. I’d trust a public health nurse over a model any day, especially for outbreaks like TB.
  3. The line list is only half the story. Talk to the epis and nurses who collected the data about each case, especially if you’re trying to link two cases and the line list isn’t giving you much to work with. There’s always way more to the story than what ends up in the Excel spreadsheet.
  4. Following on from #3, learn to love, or at least tolerate, Excel. Epidemiologists love a good spreadsheet. Tools like Microreact (David Aanensen) and NextFlu (Trevor Bedford) are helping move them into the next generation, but for now, expect to be sent about a million Excel sheets.
  5. Spread the ontology gospel. Groups like IRIDA (van Domselaar/Brinkman/Hsiao) are working towards developing a shared vocabulary for genomic epidemiology – try repackaging what you dig up out of those Excel files into these standardized vocabularies.
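Rule 2 in this list (genomics rules out, epis rule in) can be written down as a tiny decision function. This is a rough sketch, not anyone's actual pipeline: `assess_link` and its labels are made up for illustration, and the default `rule_out_above` SNP cutoff is an arbitrary placeholder that would need to be chosen per pathogen and loosened if a mutational-hotspot host might be involved.

```python
def assess_link(snp_distance, epi_link, rule_out_above=12):
    """Combine a pairwise SNP distance with field epidemiology.

    snp_distance   -- SNPs separating the two isolates
    epi_link       -- True if the field epis have confirmed a link
    rule_out_above -- hypothetical cutoff; distances above it rule out
                      direct transmission
    """
    if snp_distance > rule_out_above:
        # Genomics is at its best here: too divergent to be a direct link.
        return "ruled out by genomics"
    if epi_link:
        # Genomics only says "maybe"; the epis supply the "definitely".
        return "supported by epi, compatible with genomics"
    # Close genomes alone do not prove transmission.
    return "genomically compatible, needs epi follow-up"


print(assess_link(40, True))
print(assess_link(2, True))
print(assess_link(2, False))
```

Note that a small SNP distance never returns "confirmed" on its own. The only way to a supported link is through the epi data.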

Big picture bits to remember:

  1. Genomics and phylogenetics are a strange and foreign landscape for most epidemiologists, but they’re enthusiastic to learn. We need to make our work accessible and open, and provide more opportunities to train our colleagues in understanding and interpreting genomic data.
  2. Public health agencies change slowly. Don’t expect to march in with a new ontology and a new platform for collecting and displaying data and revamp the whole system overnight. Public health changes in response to evidence – when you can show that something works better, cheaper, and faster, then you can start making changes. Moving genomics to routine clinical use takes years of relationship-building.

So, uh, that’s ten. But my “Ten Simple Rules” can totally turn into 12, 15, 18, 36, 47, etc. simple rules, so feel free to chime in below with new ones, or with comments, feedback, and examples on the ones above. I should probably add the caveat that for each of these, your experience may vary depending on your pathogen and the team you’re working with.