Single nucleotide polymorphism (SNP) set tests are a powerful method for analyzing next-generation sequencing (NGS) data. The popular sequence kernel association test (SKAT) treats a set of variants as random effects in a linear mixed model. Its P-value is calculated from asymptotic theory, which requires a large sample size, and SKAT is known to lose power at small or moderate sample sizes. Given the current cost of sequencing technology, the scale of NGS studies is still limited.
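For orientation, the variance component model behind SKAT-type tests for a continuous trait is commonly written as below. This is a sketch in standard notation, not the report's own: the symbols (y for the trait, X for covariates, G for the genotype matrix of the SNP set, W for a diagonal matrix of variant weights) are assumptions introduced here.

```latex
% Linear mixed model: the SNP-set effects \gamma are random
y = X\beta + G\gamma + \varepsilon, \qquad
\gamma \sim N(0,\ \sigma_g^2 W), \qquad
\varepsilon \sim N(0,\ \sigma_e^2 I_n)

% Testing the SNP set means testing a single variance component
H_0 : \sigma_g^2 = 0 \qquad \text{versus} \qquad H_1 : \sigma_g^2 > 0

% SKAT-type score statistic, with \hat\beta estimated under H_0
Q = (y - X\hat\beta)^{\top} G W G^{\top} (y - X\hat\beta)
```

Under the null hypothesis, the distribution of Q is approximated by a mixture of chi-square variables. That approximation is the asymptotic step that becomes unreliable when the sample size is small.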
[Photo: Dr. Jin Zhou]
In this report, Dr. Jin Zhou, assistant professor of biostatistics at the University of Arizona Mel and Enid Zuckerman College of Public Health, and colleagues performed simulation studies under various genetic scenarios. The investigators derived and implemented computationally efficient, exact (non-asymptotic) score (eScore), likelihood ratio (eLRT), and restricted likelihood ratio (eRLRT) tests, collectively named ExactVCTest, that can achieve high power even when sample sizes are small.
The investigators’ ExactVCTest (i.e., eScore, eLRT, and eRLRT) exhibits well-controlled type I error. Under the alternative model, eScore P-values are universally smaller than those from SKAT. eLRT and eRLRT demonstrate significantly higher power than eScore, SKAT, and SKAT optimal (SKAT-o) across all scenarios and various sample sizes.
Dr. Zhou says the team applied these tests to an exome sequencing study. The findings replicate previous results and shed light on rare variant effects within genes. The software package is implemented in Julia, an open-source, high-performance technical computing language, and is freely available on GitHub. Analyzing each trait in the exome sequencing data set, which covers 399 individuals and 16,619 genes, takes about 1 minute on a desktop computer.
Boosting Gene Mapping Power and Efficiency with Efficient Exact Variance Component Tests of SNP Sets
Genetics. November 2016