Incorporating prior knowledge about genetic variants into the analysis of genetic association data: An empirical Bayes approach

Description
Title: Incorporating prior knowledge about genetic variants into the analysis of genetic association data: An empirical Bayes approach
Authors: Karimnezhad, Ali
Bickel, David R.
Date: 11-Jun-2016
Abstract: The probability that a single nucleotide polymorphism (SNP) is associated with a disease may be assessed by estimating its local false discovery rate (LFDR). Since the LFDR for each SNP is relative to a reference class, a set of other SNPs, the selection of the reference class has a large impact on which SNPs are considered disease-associated. For example, the LFDR of an exonic SNP can vary widely depending on whether it is considered relative to the reference class of other exonic SNPs, considered separately from the non-exonic SNPs, or relative to the combined reference class of all SNPs in the data set. As a result, an analysis based on the combined reference class might determine that a specific exonic SNP is associated with the disease while an analysis based on the separate reference class indicates that it is not, or vice versa. To solve this reference class problem, we introduce novel empirical Bayes methods of discovering SNPs associated with the disease based on prior knowledge available in the form of biological annotations such as "exonic". The proposed methods simultaneously consider the combined reference class and the separate reference class. Our simulation studies indicate that the proposed methods lead to improved performance. The new maximum entropy method avoids choosing the worst reference class by depending on the separate class when it has enough SNPs for reliable LFDR estimation and depending solely on the combined class otherwise. Among our faster and simpler estimation methods that consider separate and combined reference classes without depending on the reliability of estimation, our game-theoretic rule also performs well. We also analyze a coronary artery disease (CAD) data set from a case-control study of 2,000 cases and 3,000 controls. To estimate the LFDR of each of the SNPs annotated as exonic, we consider the separate reference class consisting only of exonic SNPs and the combined reference class consisting of all ncRNA SNPs.
We observe that while the analysis of the data using the separate reference class identifies 7 SNPs associated with the disease, using the combined reference class suggests that only one of these SNPs is associated with the disease. The maximum entropy method predicts that of those 7 exonic SNPs, only two are actually associated with the disease.
URL: http://www.davidbickel.com
http://hdl.handle.net/10393/34889
Collection: Mathématiques et statistiques // Mathematics and Statistics