Implementation and evaluation of scoring schemes for the automated discovery of nucleic acid structures

Field | Value
dc.contributor.author | Anwar, Mohammad
dc.date.accessioned | 2013-11-07T18:13:25Z
dc.date.available | 2013-11-07T18:13:25Z
dc.date.created | 2006
dc.date.issued | 2006
dc.identifier.citation | Source: Masters Abstracts International, Volume: 45-02, page: 0902.
dc.identifier.uri | http://hdl.handle.net/10393/27220
dc.identifier.uri | http://dx.doi.org/10.20381/ruor-18600
dc.description.abstract | Recent experimental evidence has shown that RNA (ribonucleic acid) plays a greater role in various cellular functions than previously thought. With the increasing number of known RNA families, there is a need for computational techniques to analyze RNA sequences. A set of evolutionarily related RNA sequences, believed to contain signals at both the sequence and the structure level, can be exploited to detect motifs common to all or some of those sequences. Finding these shared structural features can indicate which parts of the sequences are functional. Recently, Nguyen (M.A.Sc. thesis, Electrical Engineering, University of Ottawa, 2004) introduced a novel approach for discovering consensus secondary structure motifs in a set of unaligned RNA sequences. The algorithm has been implemented in a software system called Seed. The aim of this thesis is to devise, implement and evaluate three scoring schemes for this system. The first scoring scheme is based on the sum of the thermodynamic free energies computed with the nearest neighbor model. The second is a general framework for evaluating RNA structures using statistical regression analysis. The third is based on the minimum description length (MDL) principle. We implemented and validated these scoring schemes on four data sets of varying complexity. The first two were derived from selected members of the UTRdb database, which covers the two untranslated regions (5' UTR and 3' UTR) flanking the coding region. The other two were assembled from a subset of the sequences of Masoumi and Turcotte (IJBRA, 1(2), 230-245, 2005). On three measures, positive predictive value (PPV), sensitivity and Matthews correlation coefficient (MCC), our methods performed well on these data sets and produced statistically significant rankings. In addition, our first method compares favorably with the state-of-the-art tool RNAprofile.
For small motifs, the scoring methods rank motifs with high PPV and sensitivity, often 100%. The top-ranked motifs were used as input constraints for MFOLD, a widely used tool for RNA secondary structure prediction, and improved both the PPV and the sensitivity of the resulting foldings.
dc.format.extent | 155 p.
dc.language.iso | en
dc.publisher | University of Ottawa (Canada)
dc.subject.classification | Computer Science.
dc.title | Implementation and evaluation of scoring schemes for the automated discovery of nucleic acid structures
dc.type | Thesis
dc.degree.name | M.C.S.
dc.degree.level | Masters
Collection | Thèses, 1910 - 2010 // Theses, 1910 - 2010

Files
MR18393.PDF | 4.91 MB | Adobe PDF
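The abstract reports results as PPV, sensitivity and MCC over predicted base pairs. The following is a minimal sketch of how these three measures can be computed when a predicted and a reference secondary structure are each given as a set of base pairs (i, j). It is not the thesis's code: the function names `normalize` and `structure_scores` are hypothetical, and counting true negatives as all remaining candidate pairs, n(n-1)/2 - TP - FP - FN, is one common convention assumed here, not necessarily the one used in the thesis.

```python
import math


def normalize(pairs):
    """Store each base pair as (i, j) with i < j so (20, 1) and (1, 20) match."""
    return {(min(i, j), max(i, j)) for i, j in pairs}


def structure_scores(predicted, reference, seq_len):
    """Return PPV, sensitivity and MCC for predicted vs. reference base pairs.

    Assumption: every unordered pair of positions in a sequence of length
    seq_len is a candidate pair, so TN = n(n-1)/2 - TP - FP - FN.
    """
    pred = normalize(predicted)
    ref = normalize(reference)

    tp = len(pred & ref)   # predicted pairs that are in the reference
    fp = len(pred - ref)   # predicted pairs absent from the reference
    fn = len(ref - pred)   # reference pairs that were missed
    tn = seq_len * (seq_len - 1) // 2 - tp - fp - fn

    # Guard against division by zero when a structure has no pairs.
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ppv": ppv, "sensitivity": sensitivity, "mcc": mcc}


if __name__ == "__main__":
    # Toy example: a 4-bp helix in a 20-nt sequence, with one mispredicted pair.
    reference = [(1, 20), (2, 19), (3, 18), (4, 17)]
    predicted = [(20, 1), (2, 19), (3, 18), (5, 16)]
    print(structure_scores(predicted, reference, 20))
```

In the toy example, three of the four predicted pairs are correct, so PPV and sensitivity are both 0.75. Because the number of true negatives grows quadratically with sequence length, the MCC computed this way stays close to the square root of PPV × sensitivity (about 0.745 here versus 0.75).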