“Count data are increasingly ubiquitous in genetic association studies, where it is possible to observe excess zero counts as compared to what is expected based on standard assumptions. For instance, in rheumatology, data are usually collected in multiple joints within a person or multiple sub-regions of a joint, and it is not uncommon that the phenotypes contain enormous number of zeroes due to the presence of excessive zero counts in majority of patients. Most existing statistical methods assume that the count phenotypes follow one of these four distributions with appropriate dispersion-handling mechanisms: Poisson, Zero-inflated Poisson (ZIP), Negative Binomial, and Zero-inflated Negative Binomial (ZINB),” observe Dr. Himel Mallick, a University of Alabama at Birmingham alumnus and former graduate research assistant in UAB’s department of biostatistics who is now a postdoctoral fellow at Harvard University and the Broad Institute, and Dr. Hemant Tiwari, head of UAB’s section on statistical genetics. “However, little is known about their implications in genetic association studies. Also, there is a relative paucity of literature on their usefulness with respect to model misspecification and variable selection.”
[Photos: Dr. Himel Mallick (left) and Dr. Hemant Tiwari]
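To make the idea of "excess zeros" concrete, the following is a minimal Python sketch, not taken from the study, that simulates zero-inflated Poisson counts and compares the observed share of zeros with what an ordinary Poisson model with the same mean would predict. The function names and the parameter values (a Poisson rate of 3.0 and a 60 percent chance of a structural zero) are illustrative assumptions only.

```python
import numpy as np


def simulate_zip(n, lam, pi, seed=0):
    """Draw n zero-inflated Poisson counts (hypothetical helper).

    With probability pi an observation is a structural zero;
    otherwise it is an ordinary Poisson(lam) draw.
    """
    rng = np.random.default_rng(seed)
    structural_zero = rng.random(n) < pi
    counts = rng.poisson(lam, size=n)
    counts[structural_zero] = 0
    return counts


def zip_zero_probability(lam, pi):
    """Probability of a zero under the ZIP model: pi + (1 - pi) * exp(-lam)."""
    return pi + (1.0 - pi) * np.exp(-lam)


def poisson_zero_probability(mean):
    """Probability of a zero under a plain Poisson model with the given mean."""
    return np.exp(-mean)


# Illustrative (assumed) settings: most "patients" contribute structural zeros.
counts = simulate_zip(n=10_000, lam=3.0, pi=0.6)

observed_zero_share = np.mean(counts == 0)
poisson_expected_share = poisson_zero_probability(counts.mean())

print("observed share of zeros:        ", observed_zero_share)
print("share expected by plain Poisson:", poisson_expected_share)
print("share expected by ZIP model:    ", zip_zero_probability(3.0, 0.6))
```

In a simulation like this one, the plain Poisson model substantially underestimates the share of zeros, which is the kind of mismatch that zero-inflated models such as ZIP and ZINB are designed to absorb.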
In their recent study, the researchers compared how well several traditional approaches handle zero-inflated count data alongside “a novel penalized regression approach with an adaptive LASSO penalty,” by simulating data under various disease models and linkage disequilibrium patterns. Their proposed variable selection method offers greater flexibility in multi-single nucleotide polymorphism (SNP) modeling of zero-inflated count phenotypes by incorporating data-adaptive weights into a computationally efficient expectation maximization (EM) algorithm.
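For readers unfamiliar with the term, an adaptive LASSO penalty can be written in its general textbook form as shown here. This is a sketch of the standard formulation, not the exact objective used in the study; the symbols (the log-likelihood \ell, the tuning parameters \lambda and \gamma, and the initial estimates \tilde{\beta}_j) are notation assumed for illustration.

```latex
\hat{\beta} = \arg\min_{\beta} \left\{ -\ell(\beta) + \lambda \sum_{j=1}^{p} w_j \, |\beta_j| \right\},
\qquad
w_j = \frac{1}{|\tilde{\beta}_j|^{\gamma}}, \quad \gamma > 0
```

Because each weight w_j is computed from an initial estimate of the corresponding coefficient, SNPs with weak initial effects are penalized more heavily and are more likely to be dropped from the model, while SNPs with strong initial effects are shrunk less. These are the "data-adaptive weights" referred to above.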
Results indicate that this method performs better than the competing approaches when two or more variables are highly correlated, an advantage that becomes more pronounced as the sample size increases. Moreover, note Drs. Mallick and Tiwari, “The Type I error rates become more or less uncontrollable for the competing methods when a model is misspecified, a phenomenon routinely encountered in practice.”
“The need for more flexible analysis tools became apparent and motivated our development of EM – Adaptive LASSO,” said first author Dr. Mallick. “EM – Adaptive LASSO arose, in part, because the standard analysis approaches did not easily accommodate aspects of zero-inflated phenotypes in genetic association studies. Our work makes a significant effort to close this gap and to help researchers better analyze their data and obtain more reliable and meaningful results.”
“EM Adaptive LASSO — A Multilocus Modeling Strategy for Detecting SNPs Associated with Zero-inflated Count Phenotypes” was published March 30 in the journal Frontiers in Genetics.
Journal article: http://journal.frontiersin.org/article/10.3389/fgene.2016.00032/full