Title: Automated analysis of French-as-a-Second-Language student's free-text answers for computer assisted assessment
Authors: Hermet, Matthieu
Date: 2009
Abstract: This project is a proof of concept: it aims to demonstrate the feasibility of an approach to Computer-Assisted Assessment of free-text material in the domain of Computer-Assisted Language Learning (CALL). The underlying theory places the project within specific pedagogic constraints, as a Language Tutoring System should. These constraints require the project to address the intermediate-to-advanced student of French-as-a-Second-Language and to be oriented toward the autonomous enhancement of text-comprehension skills, following a previous CALL project at the University of Ottawa, DidaLect. This type of learning activity requires a student to answer questions about informative texts. The goal of this work has been to build a framework that assesses, on the one hand, the grammatical correctness of the student's sentences and, on the other hand, whether the answer's content matches the reference. To achieve this, the research went in two directions. First, we used Natural Language Processing (NLP) to find means of language error detection and correction (form), as well as means of semantic comparison (content). Content comparison amounts to performing a deep analysis of the student's answer in order to guarantee that none of its material is irrelevant to the expected answer. We used a symbolic approach to the problem, because statistical methods can only provide approximations of content similarity, which is dangerous in a CALL context where the student can be error-prone. The fact that this work is the first of its kind, at least with regard to Text Comprehension and French-as-a-Second-Language, was another reason to opt for symbolic processing: in the absence of any comparable system, a symbolic approach might constitute a better baseline, to be challenged in the future by statistical methods. Finally, error detection in language technologies is syntax-based.
A symbolic approach also permits the same structure to be used for assessing both form and content. Second, we worked in the direction of didactics, to frame the work within relevant theoretical grounds concerning the relation between questions and text, especially in order to limit the impact of the knowledge gap between machine and humans (students) -- a general consequence of the work-in-progress state of semantic analysis in NLP. The questions are controlled through a formal categorization that restricts their scope to answering material actually present in the text -- no world knowledge is needed a priori. The system was tested on a set of 273 student answers gathered in class. The evaluation showed that 62% of answers were correctly assessed as correct or incorrect. Owing to the conservativeness of the system, precision on the assessment of correct answers is 100%, which satisfies the requirements of Second-Language Learning and CALL. The implemented program is a contribution in itself, because no comparable system exists while there is real demand in the CALL community for such a tool. The project also contributes to NLP through a new approach to paraphrase recognition. It draws on previous work in computational linguistics (Meaning-Text Theory) to provide a rule-based model of syntactic paraphrase which, although limited to syntax, constitutes to our knowledge the only generic model of paraphrase, and should be easily adaptable to other languages.
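The reported figures can be related through the standard accuracy and precision formulas. The sketch below is an illustration only: the abstract states 273 answers, 62% overall accuracy, and 100% precision on answers assessed as correct; the per-class counts used here are hypothetical values chosen to be consistent with those figures and with a conservative assessor that never accepts an incorrect answer.

```python
def accuracy(tp, tn, total):
    """Fraction of all answers assessed correctly (as correct or incorrect)."""
    return (tp + tn) / total

def precision(tp, fp):
    """Fraction of answers assessed as correct that really are correct."""
    return tp / (tp + fp)

# Hypothetical confusion counts (not from the thesis), consistent with a
# conservative system: it never labels an incorrect answer as correct (fp == 0).
tp = 80     # correct answers the system accepted (hypothetical)
tn = 89     # incorrect answers the system rejected (hypothetical)
fp = 0      # incorrect answers wrongly accepted -- zero by conservativeness
total = 273 # answers gathered in class (from the abstract)

print(round(accuracy(tp, tn, total), 2))  # 0.62
print(precision(tp, fp))                  # 1.0
```

With fp held at zero, precision is 1.0 regardless of how the remaining 38% of answers split between rejected-correct and other assessment errors, which is why a conservative system can guarantee 100% precision while overall accuracy stays at 62%.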
Collection: Thèses, 1910 - 2010 // Theses, 1910 - 2010
File: NR61372.PDF (12.57 MB, Adobe PDF)