statistical genetics – The Modern Forest

Want to know about the history of the populations you’re studying? Joane Elleouet and Sally Aitken see how far Approximate Bayesian Computation (ABC) and your sequencing method of choice can take you in a new paper in Molecular Ecology Resources.

Twenty years ago, Tavaré et al. used ABC to estimate the time to coalescence of human Y chromosomes at about 157,300 years (but!), and evolutionary biologists were off to the races. ABC is now a common tool in the field, with many software implementations to choose from. A lot else has happened in the last 20 years, like huge advances in genotyping technology. Even for non-model species, we can now get lots of genomic data more cheaply than ever using reduced representation library (RRL) sequencing methods such as genotyping-by-sequencing (GBS).

But how well does ABC perform with RRL data and different demographic models? What are its limitations? How do you make the best choices for your sequencing effort? To answer these questions, Jo simulated datasets under four kinds of demographic models and five types of sequencing effort, then performed ABC on those datasets. She compared:

phased and unphased data (phasing doesn't help),
data from lots of short reads vs. fewer, longer sequences (lots of short reads are just as good),
different times since colonization (results depend on the parameter value and the demographic model),
trade-offs between number of individuals and sequencing depth at different sequencing error rates (go for more individuals over depth),
and ABC with a site frequency spectrum (SFS) method (similar results).
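To make the ABC procedure concrete, here is a minimal sketch of rejection ABC, its simplest variant: draw a parameter from the prior, simulate data, and keep the draw if a summary statistic of the simulated data lands close to the observed one. It assumes a toy demographic model (one constant-size population under the standard neutral coalescent with infinite-sites mutation), a single summary statistic (the number of segregating sites, S), and a uniform prior on θ = 4Nμ. The function names, prior bound, tolerance, and "observed" value are hypothetical illustration choices, not the models, statistics, or software used in the paper.

```python
import random
import statistics


def simulate_segregating_sites(theta, n_samples, rng):
    """Toy model (assumption): standard neutral coalescent, infinite sites.

    Time is in units of 2N generations and theta = 4*N*mu, so mutations fall
    on the genealogy as a Poisson process with rate theta/2 per unit length.
    """
    total_length = 0.0
    for k in range(n_samples, 1, -1):
        # With k lineages, the waiting time to the next coalescence is
        # exponential with rate k*(k-1)/2; all k lineages accumulate length.
        total_length += k * rng.expovariate(k * (k - 1) / 2)
    rate = theta / 2
    if rate <= 0:
        return 0
    # Count Poisson-process mutations along the total branch length by
    # stepping through exponential gaps between successive mutations.
    count, position = 0, rng.expovariate(rate)
    while position < total_length:
        count += 1
        position += rng.expovariate(rate)
    return count


def abc_rejection(s_observed, n_samples, n_draws=10000, tolerance=1,
                  prior_max=20.0, seed=1):
    """Rejection ABC for theta; all defaults are hypothetical choices."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, prior_max)                        # 1. draw from the prior
        s_sim = simulate_segregating_sites(theta, n_samples, rng)  # 2. simulate data
        if abs(s_sim - s_observed) <= tolerance:                   # 3. compare summaries
            accepted.append(theta)                                 # 4. keep close matches
    return accepted


# Hypothetical "observed" data: 20 segregating sites in 10 sequences.
posterior = abc_rejection(s_observed=20, n_samples=10)
print(len(posterior), round(statistics.mean(posterior), 2))
```

Each retained θ is a draw from an approximate posterior. A smaller tolerance or more informative summary statistics tighten the approximation, at the cost of running more simulations; richer demographic models simply mean more parameters to draw and a more elaborate simulator in step 2.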

As for the demographic models considered, the authors find that ABC can be used with data from RRL sequencing methods to precisely infer very simple demographic models, but not complex ones.

Here’s what Aitken Lab members had to say after reading the paper:

What’s your takeaway from this paper?

Reader 1: This paper provides several rules of thumb for inferring demographic events from incomplete, fragmented genomic data. Demographic models should be kept as simple as possible, and numerous short sequences from many individuals are preferable to fewer long sequences from a small sample of the population.

Reader 2: You’d better know what kind of demographic history your population has before you start trying to estimate parameters!

Reader 3: Estimating demographic parameters with ABC has limitations, even with very simple demographic models.

What’s the coolest thing about this paper?

R1: Improving techniques to infer the ancient demographic history of any species you like, not just model species.

R2: I didn’t realize how hard inferring demographic history is, even with so much genomic data. The extensive simulations are really impressive and convincing.

R3: Adding realism by testing the effects of sequencing depth and sequencing error.

What questions are you left with after reading this paper?

R2: How often do researchers know the “right” model of demographic history to try to infer? Whether or not to include migration or how many populations there have been? How do people figure this out?

R3: What if summary statistics discard too much of the data? Is there a better way to summarize the data while keeping the crucial information?
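To see what a summary actually throws away, here is a minimal Python sketch that collapses a small sample of haplotypes into three common population-genetic summaries: the number of segregating sites (S), nucleotide diversity (π), and the unfolded site frequency spectrum (SFS). It assumes phased haplotypes with alleles polarized as ancestral ("0") or derived ("1"); the function name and the four-haplotype input are hypothetical examples, not data or code from the paper.

```python
def summary_statistics(haplotypes):
    """Collapse haplotypes into S, pi, and the unfolded SFS.

    haplotypes: equal-length strings, one per sampled chromosome, where each
    character is '0' (ancestral allele) or '1' (derived allele) at one site.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    # Per-site count of chromosomes carrying the derived allele.
    derived_counts = [sum(h[j] == "1" for h in haplotypes) for j in range(n_sites)]

    # Unfolded SFS: entry i-1 counts sites where exactly i of the n
    # chromosomes carry the derived allele (i = 1 .. n-1).
    sfs = [0] * (n - 1)
    for c in derived_counts:
        if 0 < c < n:
            sfs[c - 1] += 1

    # Segregating (polymorphic) sites.
    s = sum(sfs)

    # Nucleotide diversity: mean number of pairwise differences. A site with
    # derived count c differs in c * (n - c) of the n*(n-1)/2 pairs.
    n_pairs = n * (n - 1) / 2
    pi = sum(c * (n - c) for c in derived_counts) / n_pairs

    return {"S": s, "pi": pi, "sfs": sfs}


# Hypothetical sample: four chromosomes, four sites.
stats = summary_statistics(["0011", "0101", "1100", "0001"])
print(stats)  # S = 4, sfs = [2, 1, 1], pi = 13/6 (about 2.17)
```

All three statistics are computed from per-site derived-allele counts alone, so which alleles sit together on the same chromosome is lost: shuffling alleles among chromosomes at any one site, while keeping that site's count, leaves every statistic unchanged. That is one concrete form of the information loss R3 is asking about.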


The Homebrew Series: Inferring demographic history with ABC, by Joane Elleouet and Sally Aitken