Detecting recent selective sweeps while controlling for mutation rate and background selection

From AcaWiki

Citation: Christian D. Huber, Michael DeGiorgio, Ines Hellmann, Rasmus Nielsen (2015/08/20) Detecting recent selective sweeps while controlling for mutation rate and background selection. Molecular Ecology
DOI (original publisher): 10.1111/mec.13351
Download: https://onlinelibrary.wiley.com/doi/full/10.1111/mec.13351
Tagged: Biology, molecular population genetics

Summary

The paper sets out to improve on the existing composite likelihood model of SweepFinder for detecting hard sweeps (i.e., selection on rare beneficial mutations that reduces variation in the surrounding region). The authors first trace the models previously used to detect candidate sweep sites:

➢ Using the entire genomic background as the null model, with the Site Frequency Spectrum (SFS) under selection as the alternative hypothesis. Although these models were robust and computationally quick, they produced more false positives, especially in populations that had gone through recent bottlenecks. In addition, using the entire genomic background brings in invariable sites that could have arisen from selective constraint or reduced mutation rates, which adds further false positives.

➢ Nielsen’s suggestion of including only polymorphic sites and leaving out the invariable ones. While this reduces false positives from the sources mentioned above, background selection still causes a local reduction in neutral variation even at polymorphic sites. Nielsen’s model would therefore require extensive modeling of background selection, which is still a work in progress.

Assuming a single population (the data are the complete genomes of 9 unrelated European individuals) and beneficial mutations that have only recently reached fixation, the authors attempt to improve robustness while controlling false positives by:

i. Including fixed differences relative to an outgroup alongside the polymorphic sites (under the infinite sites model)
ii. Including all invariant sites alongside the polymorphic sites

When the invariable sites that differ from an outgroup (chimpanzee) are included, a change in the local mutation rate affects every class of site proportionally and therefore does not alter the shape of the SFS.
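The proportionality argument can be illustrated with a minimal sketch. It assumes the standard neutral expectation that the number of sites with derived-allele count i is theta / i, and it treats the expected number of fixed differences as theta times a divergence factor. The function name, the divergence value of 6.0 and the theta values are hypothetical choices for illustration, not quantities from the paper.

```python
def expected_site_classes(n, theta, divergence):
    """Normalized expected proportions of site classes for a sample of n chromosomes.

    Classes 1..n-1: polymorphic sites with derived-allele count i (expected count theta / i).
    Class n: fixed differences relative to the outgroup (assumed to be theta * divergence).
    """
    counts = [theta / i for i in range(1, n)]
    counts.append(theta * divergence)  # assumption: divergence scales linearly with mutation rate
    total = sum(counts)
    return [c / total for c in counts]


# 9 diploid individuals give 18 sampled chromosomes.
slow = expected_site_classes(n=18, theta=0.5, divergence=6.0)
fast = expected_site_classes(n=18, theta=5.0, divergence=6.0)

# A tenfold change in mutation rate leaves the normalized spectrum unchanged.
max_diff = max(abs(a - b) for a, b in zip(slow, fast))
print(max_diff)
```

Because theta cancels out during normalization, a region with a low mutation rate has the same expected spectrum as a region with a high one. A spectrum built from all invariant sites would not behave this way, since the count of invariant sites does not scale with theta.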
Another key component is the effect of background selection, which the authors model using the B value (a factor that specifies the effective population size after background selection, relative to its value without it). The authors are quick to point out the limitations of this approach: B value estimates are available only for certain organisms, and B values capture only the effect of background selection on effective population size, not other effects such as those on the allele frequency distribution. Coalescent simulations indicate that, as a function of the age of the mutation, the new models have higher detection power than the older ones. The new models were also shown to be more robust to mutation rate variation and to population bottlenecks. Finally, regarding the false positive rates (FPRs) of sweep detection, the model including all sites generates higher FPRs than the model including only fixed differences relative to the outgroup. The authors therefore suggest using the latter model where outgroup information is available.
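A rough sketch of why the B value has to enter the null model follows. It assumes, as a simplification, that B rescales only the polymorphic classes (expected count B * theta / i) while fixed differences relative to the outgroup are unaffected. The function name and all parameter values are hypothetical, and this is not the authors' implementation.

```python
def spectrum_with_b(n, theta, divergence, b):
    """Normalized expected site classes when background selection scales Ne by b.

    Polymorphic classes 1..n-1 are scaled by b; the fixed-difference class is
    left unscaled (an assumption made for this illustration).
    """
    poly = [b * theta / i for i in range(1, n)]
    fixed = theta * divergence
    total = sum(poly) + fixed
    return [c / total for c in poly] + [fixed / total]


no_bgs = spectrum_with_b(n=18, theta=1.0, divergence=6.0, b=1.0)
strong_bgs = spectrum_with_b(n=18, theta=1.0, divergence=6.0, b=0.6)

# The share of fixed differences grows when b < 1, i.e. polymorphism drops
# relative to divergence even though no sweep has occurred.
fixed_share_no_bgs = no_bgs[-1]
fixed_share_strong_bgs = strong_bgs[-1]
print(fixed_share_no_bgs, fixed_share_strong_bgs)
```

In this toy setting a region with B below 1 shows less polymorphism relative to divergence than the genome-wide background, which an uncorrected null model could mistake for a sweep signal. Adjusting the null spectrum by the local B value is one way to absorb that shift.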

Theoretical and Practical Relevance

Perspectives and Limitations

While the authors show that the new model is more robust and has lower FPRs, this holds only under certain conditions. The appropriate model has to be chosen according to the availability of outgroup information, the strength and timing of bottlenecks, the mutation rate and background selection. The authors also put forward a new method of correcting for background selection using B value maps, which helps to differentiate between reduced diversity caused by background selection and reduced diversity caused by sweeps. This also leads to lower FPRs compared to the HKA test. It should be noted, as mentioned above, that B value maps are available only for certain organisms and account only for the effect on effective population size. More accurate models of background selection could lead to even lower FPRs.