
Computational Methods for Inferring Transcription Factor Binding Sites

dc.contributor.author: Morozov, Vyacheslav
dc.contributor.supervisor: Aris-Brosou, Stéphane
dc.contributor.supervisor: Ioshikhes, Ilya
dc.date.accessioned: 2012-10-11T14:58:39Z
dc.date.available: 2012-10-11T14:58:39Z
dc.date.created: 2012
dc.date.issued: 2012
dc.degree.discipline: Sciences / Science
dc.degree.level: masters
dc.degree.name: MSc
dc.description.abstract: Position weight matrices (PWMs) have become a tool of choice for identifying transcription factor binding sites in DNA sequences. PWMs are compiled from experimentally verified, aligned binding sequences and are then used to computationally discover novel putative binding sites for a given protein. DNA-binding proteins often show degeneracy in their binding requirements, and the overall binding specificity of many proteins is unknown and remains an active area of research. Although PWMs are more reliable predictors than consensus string matching, they generally produce a high number of false positive hits. A previous study introduced a novel approach to PWM training that uses the known motifs to sample additional putative binding sites from the proximal promoter area. In this thesis, that core idea was further developed, implemented and tested in a large-scale application. Improved mono- and dinucleotide PWMs were computed for Drosophila melanogaster, with the Matthews correlation coefficient used as the optimization criterion in the PWM refinement algorithm. The new PWMs account for non-uniform background nucleotide distributions on the promoters and consider a larger number of new binding sites during the refinement steps. The optimized parameters included the PWM motif length, the position on the promoter, the threshold value and the binding site location. Predictions from the mono- and dinucleotide PWM versions were compared with those of the initial matrices and of conventional tools. The optimized PWMs predicted new binding sites with better accuracy than conventional PWMs.
dc.embargo.terms: immediate
dc.faculty.department: Mathématiques et statistique / Mathematics and Statistics
dc.identifier.uri: http://hdl.handle.net/10393/23382
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-6117
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.subject: machine learning
dc.subject: transcriptional regulatory sites
dc.subject: transcription factor
dc.subject: protein binding
dc.subject: PWM
dc.subject: PSSM
dc.subject: DNA sequence
dc.subject: optimization
dc.subject: statistics
dc.subject: binding site
dc.subject: prediction
dc.subject: computational methods
dc.subject: Matthews correlation
dc.subject: Drosophila melanogaster
dc.subject: binding motif
dc.subject: weight matrix
dc.subject: DNA sequence analysis
dc.title: Computational Methods for Inferring Transcription Factor Binding Sites
dc.type: Thesis
thesis.degree.discipline: Sciences / Science
thesis.degree.level: Masters
thesis.degree.name: MSc
uottawa.department: Mathématiques et statistique / Mathematics and Statistics
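
The abstract names three technical ingredients: a log-odds PWM scored against a non-uniform background nucleotide distribution, a score threshold for calling hits, and the Matthews correlation coefficient (MCC) as the criterion for choosing that threshold. The following is a minimal Python sketch of those ingredients only, assuming a mononucleotide PWM, aligned equal-length training sites, uppercase ACGT-only sequences, a labelled set of positive (site-containing) and negative sequences, and a simple exhaustive sweep over candidate thresholds. All function names are hypothetical. This is not the thesis's refinement algorithm, which additionally samples new sites from promoters, handles dinucleotide PWMs, and optimizes motif length and promoter position.

```python
import math
from collections import Counter

BASES = "ACGT"


def background_freqs(sequences):
    """Estimate a (possibly non-uniform) mononucleotide background.

    A pseudocount of 1 per base keeps every frequency strictly positive.
    """
    counts = Counter(b for seq in sequences for b in seq if b in BASES)
    total = sum(counts.values()) + len(BASES)
    return {b: (counts[b] + 1) / total for b in BASES}


def build_pwm(sites, background, pseudocount=0.5):
    """Log2-odds PWM from aligned binding sites of equal length.

    Each entry is log2(p(base at position i) / p_background(base)).
    """
    width = len(sites[0])
    n = len(sites) + pseudocount * len(BASES)
    pwm = []
    for i in range(width):
        column = Counter(site[i] for site in sites)
        pwm.append({b: math.log2((column[b] + pseudocount) / n / background[b])
                    for b in BASES})
    return pwm


def best_hit(pwm, seq):
    """Highest PWM score over all windows of seq (forward strand only).

    Assumes len(seq) >= len(pwm) and that seq contains only A, C, G, T.
    """
    width = len(pwm)
    return max(
        sum(col[b] for col, b in zip(pwm, seq[i:i + width]))
        for i in range(len(seq) - width + 1)
    )


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(pwm, positives, negatives):
    """Try every observed best-hit score as a threshold; return (best_mcc, threshold)."""
    pos = [best_hit(pwm, s) for s in positives]
    neg = [best_hit(pwm, s) for s in negatives]
    best = (-1.0, None)
    for t in sorted(set(pos + neg)):
        tp = sum(s >= t for s in pos)   # positives called as hits
        fp = sum(s >= t for s in neg)   # negatives wrongly called as hits
        m = mcc(tp, len(neg) - fp, fp, len(pos) - tp)
        if m > best[0]:
            best = (m, t)
    return best
```

For example, `build_pwm(["TATAAA", "TATAAT", "TATAAA", "TATATA"], background_freqs(seqs))` yields a six-column matrix whose threshold can then be chosen with `best_threshold(pwm, positives, negatives)`. Maximizing MCC rather than accuracy balances true and false positives even when negatives greatly outnumber positives. A dinucleotide PWM would instead index each column by the pair of adjacent bases, which this sketch does not attempt.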

Files

Original bundle

Name: Morozov_Vyacheslav_2012_thesis.pdf
Size: 8.01 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 4.21 KB
Format: Item-specific license agreed to upon submission