Wide Scale Analysis of Transcription Factor Biases and Specificity

dc.contributor.author: Awdeh, Aseel R.
dc.contributor.supervisor: Perkins, Theodore
dc.contributor.supervisor: Turcotte, Marcel
dc.date.accessioned: 2022-11-23T14:57:06Z
dc.date.available: 2022-11-23T14:57:06Z
dc.date.issued: 2022-11-23 [en_US]
dc.description.abstract: There are approximately 30 trillion cells in the human body, and nearly every cell has the same genomic sequence. Yet, due to differential gene expression, we have around 200 distinct cell types, each with varying functionalities. The cell-type-specific states are maintained via the binding of multiple regulatory proteins to different locations along the genome in a process known as transcriptional regulation. Disruptions to the transcriptional regulation process may lead to the development of disease. Hence, uncovering the complex interplay of protein-DNA interactions along the genome is of critical importance. The advent of technologies probing the genomic sequence, as well as the development of powerful computational modeling techniques to relate DNA sequences to molecular phenotype, has enabled the understanding of many molecular processes genome-wide. However, these computational methods require significant adaptation to biological systems in order to accurately and fully account for the biology behind the molecular processes, as well as the biases associated with the data-generating systems and processes. In this thesis, we address three main issues that arise from the use of omics data, more specifically ChIP-seq data, when identifying regulatory proteins along the genome. The first part of the thesis involves the study of the biases and noise associated with ChIP-seq experiments. Each experiment is prone to noise and bias, and as such we propose the use of a customized set of weighted controls, instead of equally weighted controls, for each ChIP-seq experiment in the peak calling process to mitigate the noise and bias. To do this, we implement a peak calling algorithm, called Weighted Analysis of ChIP-seq (WACS), which is an extension of the well-known peak caller MACS2, to incorporate the weighted controls in the peak calling process. We show that our approach yields a better approximation of the noise distribution in controls, and fundamentally improves our understanding of ChIP-seq signals and their biases. Another aspect we explore in this thesis is the ability to uncover cell type specificity of transcription factor binding from the ChIP-seq data. A transcription factor may bind to various parts of the genome in different cell types, due to modifications in the DNA-binding preferences of the transcription factor, or other mechanisms, such as chromatin accessibility or cooperative binding, thus leading to a "DNA signature" of differential binding. We develop a deep learning approach, called SigTFB (Signatures of TF Binding), and conduct a wide scale analysis of hundreds of transcription factors to identify and quantify the varying degrees of cell-type-specific DNA signatures of various transcription factors across cell types. We also assess the consistency of cell type specificity for a specific transcription factor when assayed by different antibodies. We show that many transcription factors are indeed cell type specific, while others are more general with lower cell type specificity. Finally, to further explain the biology behind a transcription factor's cell type specificity, or lack thereof, we conduct a wide scale motif enrichment analysis of all transcription factors in question. We show that cell-type-specific transcription factors are typically associated with corresponding differences in motif enrichment and gene expression. Together, these contributions deepen our knowledge of transcription factor binding, and how experimental and cell-type-specific variations can be uncovered. [en_US]
dc.identifier.uri: http://hdl.handle.net/10393/44298
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-28511
dc.language.iso: en [en_US]
dc.publisher: Université d'Ottawa / University of Ottawa [en_US]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [*]
dc.subject: transcription factor [en_US]
dc.subject: DNA-binding [en_US]
dc.subject: deep learning [en_US]
dc.subject: machine learning [en_US]
dc.subject: differential binding [en_US]
dc.subject: cell type specificity [en_US]
dc.subject: noise [en_US]
dc.subject: bias [en_US]
dc.subject: ChIP-seq [en_US]
dc.subject: controls [en_US]
dc.title: Wide Scale Analysis of Transcription Factor Biases and Specificity [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Génie / Engineering [en_US]
thesis.degree.level: Doctoral [en_US]
thesis.degree.name: PhD [en_US]
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science [en_US]

Files

Original bundle

Name: Awdeh_Aseel_R_2022_thesis.pdf
Size: 8.68 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission