Identification of attribute interactions and generation of globally relevant continuous features in machine learning

Title: Identification of attribute interactions and generation of globally relevant continuous features in machine learning
Authors: Letourneau, Sylvain
Date: 2003
Abstract: Datasets found in real world applications of machine learning are often characterized by low-level attributes with important interactions among them. Such interactions may increase the complexity of the learning task by limiting the usefulness of the attributes to dispersed regions of the representation space. In such cases, we say that the attributes are locally relevant. To obtain adequate performance with locally relevant attributes, the learning algorithm must be able to analyse the interacting attributes simultaneously and fit an appropriate model for the type of interactions observed. This is a complex task that surpasses the ability of most existing machine learning systems. This research proposes a solution to this problem by extending the initial representation with new globally relevant features. The new features make explicit the important information that was previously hidden by the initial interactions, thus reducing the complexity of the learning task. This dissertation first proposes an idealized study of the potential benefits of globally relevant features assuming perfect knowledge of the interactions among the initial attributes. This study involves synthetic data and a variety of machine learning systems. Recognizing that not all interactions produce a negative effect on performance, the dissertation introduces a novel technique named Relevance graphs to identify the interactions that negatively affect the performance of existing learning systems. The tool of interactive relevance graphs addresses another important need by providing the user with an opportunity to participate in the construction of a new representation that cancels the effects of the negative attribute interactions. The dissertation extends the concept of relevance graphs by introducing a series of algorithms for the automatic discovery of appropriate transformations. We use the named GLOREF (GLObally RElevant Features) to designate the approach that integrates these algorithms. The dissertation fully describes the GLOREF approach along with an extensive empirical evaluation with both synthetic and UCI datasets. This evaluation shows that the features produced by the GLOREF approach significantly improve the accuracy with both synthetic and real-world data.
CollectionTh├Ęses, 1910 - 2010 // Theses, 1910 - 2010
NQ85375.PDF11.47 MBAdobe PDFOpen