An Investigation of the Use of Linear Mixed Models Under an Extreme Phenotype Sampling (EPS) Design

Publisher

Université d'Ottawa / University of Ottawa

Creative Commons

Attribution-NonCommercial-NoDerivatives 4.0 International

Abstract

Mixed models have been used in genome-wide association studies to correct for confounding by population stratification and other forms of hidden relatedness. This class of models includes linear mixed models (LMMs) and generalized linear mixed models (GLMMs). This thesis presents an investigation into the use and application of LMMs within the context of extreme phenotype sampling (EPS) designs, where genetic covariates are missing for some participants because genotypes are collected only on samples with extreme response variable values. We begin by exploring whether existing mixed model approaches correct for population stratification under an EPS design. These methods have previously been investigated with both continuous and case/control response variables, but not in the context of EPS designs. We assess the performance of three mixed model approaches suitable for binary traits (GMMAT, LEAP and CARAT) and one linear mixed model approach for continuous traits (GEMMA). Our investigation includes an overview of mixed model methodology applicable to binary response variables. We assess type 1 error rates and power using simulation studies with both common and rare variant scenarios. As a practical application of these mixed model techniques, we also compare the methods on a prostate cancer dataset known to have population substructure, collected as part of the PROtEUs study conducted in Québec, Canada. Our simulation results show that, for a common candidate variant, both LEAP and GMMAT had type 1 error rates close to the nominal value and similar power. Similar type 1 error control was observed in the analysis of the PROtEUs dataset. However, for rare variants the false positive rate remains inflated even after correction with mixed model approaches.
Next, we present an Expectation Maximization (EM) algorithm, motivated by EPS designs, for fitting linear mixed models with missing genetic covariates. We use the method of weights, adapted for linear mixed models, to handle the missing genotypes. We derive two hypothesis tests for genetic association: a likelihood ratio test using importance sampling and a Monte Carlo based Wald test. We then assess the performance of the algorithm, using simulation studies to estimate type 1 error and power. We observed type 1 error rates below the nominal value of 0.05, indicating a conservative test, and low power for all missing data scenarios considered. Moreover, some point estimates appear biased. We applied our algorithm to the PROtEUs dataset and, although it correctly estimated most of the model parameters, the genetic effect estimated using the EM approach was larger than the values obtained by other approaches. The false positive rate also seemed inflated based on the p-value distribution across 5000 genetic markers. More investigation is needed to ensure the EM-based procedure is a valid approach to handle missing genotype data, particularly from an EPS study.
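As an illustration of the missing-data pattern that an EPS design produces, the following minimal Python sketch simulates a cohort and keeps genotypes only for participants whose response falls in either tail. It is not taken from the thesis: the function names `simulate_cohort` and `eps_mask`, the `tail_fraction` parameter, and the normal response and independent additive genotype coding are all assumptions made for this example.

```python
import random


def simulate_cohort(n, maf, seed=0):
    """Simulate n participants under the null (genotype independent of response).

    Hypothetical helper: the response is standard normal and the genotype is
    the count of minor alleles (0, 1 or 2) at minor allele frequency `maf`.
    """
    rng = random.Random(seed)
    phenotypes = [rng.gauss(0.0, 1.0) for _ in range(n)]
    genotypes = [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]
    return phenotypes, genotypes


def eps_mask(phenotypes, genotypes, tail_fraction=0.25):
    """Keep genotypes only for the lowest and highest `tail_fraction` of responses.

    Participants in the middle of the response distribution get None,
    mimicking genotypes that were never collected.
    """
    n = len(phenotypes)
    k = int(n * tail_fraction)  # number of participants kept in each tail
    order = sorted(range(n), key=lambda i: phenotypes[i])
    extreme = set(order[:k]) | set(order[n - k:])
    return [genotypes[i] if i in extreme else None for i in range(n)]


if __name__ == "__main__":
    y, g = simulate_cohort(n=1000, maf=0.3, seed=1)
    g_obs = eps_mask(y, g, tail_fraction=0.2)
    n_missing = sum(value is None for value in g_obs)
    print(f"{n_missing} of {len(g_obs)} genotypes are missing by design")
```

In this sketch the response is observed for everyone while the genotype is missing for the central part of the distribution, which is the setting that the EM-based approach described above is intended to handle.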

Keywords

Extreme Phenotype Sampling (EPS), Linear Mixed Model (LMM), Expectation maximization (EM) algorithm, Population Stratification, False positive rate
