Supporting software maintenance by mining software update records

Publisher

University of Ottawa (Canada)

Abstract

It is well known that maintenance is the most expensive stage of the software life cycle. Most large real-world software systems consist of a very large number of source code files. Important knowledge about different aspects of a software system is embedded in a rich set of implicit relationships among these files. These relationships are partly reflected in system documentation at its different levels, but more often than not they are never made explicit and become part of the expertise of system maintainers. Finding existing relations between source code components is a difficult task, especially in the case of legacy systems. When a maintenance programmer is looking at a piece of code in a source file, one of the important questions that he or she needs to answer is: "which other files should I know about, i.e., what else might be relevant to this piece of code?" This is an example of a more general Relevance Relation that maps a set of entities in a software system into a relevance value. How can we discover and render explicit these relationships without looking over the shoulder of a programmer involved in a maintenance task? We turn to inductive methods that are capable of extracting structural patterns or models from data. They can learn concepts or models from experience observed in the past to predict outcomes of future unseen cases. This thesis lies at the intersection of two research fields, machine learning and software engineering, an intersection that has been widely ignored by researchers in both communities. It investigates the application of inductive methods to daily software maintenance at the source code level. We first lay out the general idea of relevance among different entities in a software system. Then, using inductive learning methods and a variety of data sources used by maintenance programmers, we extract (i.e., learn) what we call a maintenance relevance relation among files in a large legacy system.
In effect, we learn from past maintenance experience, in the form of problem reports and update records, to make predictions that are useful in future maintenance activities. This relation, which is called the Co-update relation, predicts whether updating one source file may require a change in another file. To learn the Co-update relation, we have performed a large number of experiments using syntactic features such as function calls or variable definitions. We have also performed experiments that use text-based features such as source code comments and problem reports, as well as combinations of these features. The results show that while syntactic features yield encouraging predictive power, text-based features yield highly accurate models, with precision and recall measures that make these models viable for use in a real-world setting. As part of the contribution of this thesis, we also report on challenges encountered in the process and the lessons learned.
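
As an illustration only, the following minimal Python sketch shows one way update records could be turned into labeled file pairs for a Co-update relation, and how precision and recall could be computed for a set of predicted pairs. It is not the method used in the thesis: the record layout, the file names, the `min_support` threshold, and the function names `co_update_counts`, `label_pairs`, and `precision_recall` are all assumptions made for this example, and the feature extraction and inductive learning steps described in the abstract are not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical update records: each update ID maps to the set of source
# files changed together in that update. All names are illustrative.
update_records = {
    "u1": {"parser.c", "lexer.c", "tokens.h"},
    "u2": {"parser.c", "lexer.c"},
    "u3": {"net.c", "socket.c"},
    "u4": {"parser.c", "tokens.h"},
}


def co_update_counts(records):
    """Count how many updates changed each unordered pair of files."""
    counts = Counter()
    for files in records.values():
        # Sort so each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(files), 2):
            counts[(a, b)] += 1
    return counts


def label_pairs(records, min_support=1):
    """Label every file pair True if it was co-updated at least
    min_support times (an assumed threshold), otherwise False."""
    counts = co_update_counts(records)
    all_files = sorted(set().union(*records.values()))
    return {
        pair: counts[pair] >= min_support
        for pair in combinations(all_files, 2)
    }


def precision_recall(predicted, actual):
    """Precision and recall of predicted positive pairs against actual ones."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall
```

In a setup like this, the labeled pairs would serve as training examples, with syntactic or text-based features computed per pair, and `precision_recall` would compare a learned model's predicted pairs against pairs observed in later updates.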

Citation

Source: Dissertation Abstracts International, Volume: 64-05, Section: B, page: 2271.
