Bioinformatics Tools for the Analysis of Gene-Phenotype Relationships Coupled with a Next Generation ChIP-Sequencing Data Analysis Pipeline

Title: Bioinformatics Tools for the Analysis of Gene-Phenotype Relationships Coupled with a Next Generation ChIP-Sequencing Data Analysis Pipeline
Authors: Pranckeviciene, Erinija
Date: 2015
Abstract: Rapidly advancing high-throughput and next generation sequencing technologies facilitate deeper insights into the molecular mechanisms underlying the expression of phenotypes in living organisms. Experimental data and scientific publications resulting from these technological advances have rapidly accumulated in public databases. Meaningful analysis of the data currently available in genomic databases requires sophisticated computational tools and algorithms, and presents considerable challenges to molecular biologists without specialized training in bioinformatics. To study a phenotype of interest, molecular biologists must prioritize large lists of poorly characterized genes generated in high-throughput experiments. To date, prioritization tools have primarily been designed to work with phenotypes of human diseases, as defined by the genes known to be associated with those diseases. There is therefore a need for prioritization tools that handle phenotypes not related to diseases, as well as diseases with which no genes have yet been associated. Chromatin immunoprecipitation followed by next generation sequencing (ChIP-Seq) is a method of choice for studying the gene regulation processes responsible for the expression of cellular phenotypes. Among publicly available computational pipelines for processing ChIP-Seq data, there is a lack of tools for the downstream analysis of composite motifs and preferred binding distances of DNA binding proteins. This thesis aims to address the gap in the tools available for processing high-throughput ChIP-Seq data, to provide rapid analysis and interpretation of large lists of poorly characterized genes. Additionally, programs for the analysis of preferred binding distances of transcription factors were integrated into the pipeline for expedited results. A gene prioritization algorithm linking genes to non-disease phenotypes described by meaningful keywords was developed. This algorithm can be used to process candidate genetic targets of a transcription factor produced by a computational pipeline for ChIP-Seq data analysis.
Collection: Thèses, 2011 - // Theses, 2011 -