Analysis and Generation of Formal and Informal Text

Description
Title: Analysis and Generation of Formal and Informal Text
Authors: Abu Sheikha, Fadi
Date: 2010
Abstract: In this thesis, we address an important issue in computational linguistics: distinguishing between formal and informal styles of text, both in document classification and in text generation. Formal and informal texts need to be identified automatically, and there is also a need for a computer system that can generate correct English text in either a formal or an informal style. We therefore propose two main techniques, one for each task. The first is a model that classifies any text or sentence as formal or informal in style. The second is based on natural language generation (NLG) and generates correct English sentences in a formal or informal style. To achieve these goals, we first study the main differences between formal and informal style and summarize their characteristics. We also manually collect, from several sources, parallel lists of formal versus informal words, phrases, and expressions for use in the proposed work. We then build the classification model, using a machine learning technique to classify texts and sentences as formal or informal. The evaluation results show that the model predicts the formal/informal class of any text or sentence with high accuracy. Next, we build a system that generates formal and informal sentences using NLG techniques. An evaluation on a sample of generated sentences shows that the NLG system can produce high-quality sentences in a formal or informal style. The main contribution of this work is the design of a set of features that led to good results on both tasks: text classification, and text generation at different formality levels.
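The two tasks in the abstract both lean on parallel formal/informal word lists. The sketch below is a minimal illustration under stated assumptions, not the thesis's method. It uses a tiny hypothetical lexicon (PARALLEL, EXPANSIONS) and toy training sentences (TOY_DATA), none of which come from the thesis. It turns lexicon hits and contraction counts into features, trains a plain perceptron (swapped in here as the simplest linear learner), and rewrites a sentence toward a target style by word substitution. Real NLG involves much more than substitution, and the thesis's actual features and classifier are not reproduced.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical parallel list (informal -> formal); not the thesis's lists.
PARALLEL: Dict[str, str] = {
    "buy": "purchase",
    "get": "obtain",
    "kids": "children",
    "help": "assist",
    "enough": "sufficient",
}
# Hypothetical contraction expansions, applied only when writing formally.
EXPANSIONS: Dict[str, str] = {
    "don't": "do not", "can't": "cannot", "it's": "it is", "we'll": "we will",
}
FORMAL_WORDS = set(PARALLEL.values())
INFORMAL_WORDS = set(PARALLEL.keys())
# Crude heuristic: also counts possessives such as "John's".
CONTRACTION = re.compile(r"\b\w+'(?:t|s|re|ve|ll|d|m)\b", re.IGNORECASE)


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens, keeping apostrophe forms like "don't" whole."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def features(text: str) -> List[float]:
    """Length-normalized formal-word, informal-word, and contraction rates, plus a bias."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)
    return [
        sum(t in FORMAL_WORDS for t in tokens) / n,
        sum(t in INFORMAL_WORDS for t in tokens) / n,
        len(CONTRACTION.findall(text)) / n,
        1.0,  # bias term
    ]


def train_perceptron(data: List[Tuple[str, int]], epochs: int = 20) -> List[float]:
    """Classic perceptron; labels are +1 (formal) and -1 (informal)."""
    w = [0.0] * 4
    for _ in range(epochs):
        for text, label in data:
            x = features(text)
            score = sum(wi * xi for wi, xi in zip(w, x))
            if label * score <= 0:  # misclassified: move weights toward the label
                w = [wi + label * xi for wi, xi in zip(w, x)]
    return w


def classify(w: List[float], text: str) -> str:
    score = sum(wi * xi for wi, xi in zip(w, features(text)))
    return "formal" if score > 0 else "informal"


def to_style(text: str, style: str) -> str:
    """Rewrite text toward a style by swapping words from the parallel lists."""
    if style == "formal":
        mapping = {**PARALLEL, **EXPANSIONS}
    elif style == "informal":
        mapping = {v: k for k, v in PARALLEL.items()}
    else:
        raise ValueError("style must be 'formal' or 'informal'")

    def swap(m: re.Match) -> str:
        word = m.group(0)
        repl = mapping.get(word.lower())
        if repl is None:
            return word
        # Preserve sentence-initial capitalization.
        return repl[0].upper() + repl[1:] if word[0].isupper() else repl

    return re.sub(r"[A-Za-z]+(?:'[A-Za-z]+)?", swap, text)


# Toy, hand-written examples for illustration only.
TOY_DATA = [
    ("We will purchase sufficient supplies for the children.", 1),
    ("Please assist us to obtain the documents.", 1),
    ("We'll buy stuff for the kids.", -1),
    ("Can't get enough of it, don't worry.", -1),
]
weights = train_perceptron(TOY_DATA)
```

For example, `to_style("We'll buy it for the kids.", "formal")` yields "We will purchase it for the children.", which `classify(weights, ...)` then labels "formal". A sentence containing no lexicon words scores zero and falls to "informal", which shows why a real system needs a richer feature set than this sketch.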
URL: http://hdl.handle.net/10393/28845
DOI: http://dx.doi.org/10.20381/ruor-19469
Collection: Thèses, 1910 - 2010 // Theses, 1910 - 2010
Files: MR74178.PDF (3.64 MB, Adobe PDF)