Component-Based Crawling of Complex Rich Internet Applications

Moosavi Byooki, Seyed Ali

Files

Main: Moosavi_Byooki_Seyed_Ali_2014_thesis.pdf (2.39 MB)

Date

2014

Authors

Moosavi Byooki, Seyed Ali

Publisher

Université d'Ottawa / University of Ottawa

Abstract

During the past decade, web applications have evolved substantially. Taking advantage of new technologies, Rich Internet Applications (RIAs) make heavy use of client-side code to present content. Web crawlers, however, face new challenges in crawling RIAs, such as how to explore and identify the different client states. Crawling RIAs has been a focus of research in recent years, and the proposed solutions construct a state-transition model with DOMs as states and JavaScript events as transitions. When applied to real-life RIAs, however, current solutions commonly suffer from state space explosion caused by the complexity of the applications. This problem makes automated crawlers unusable on complex RIAs, as they fail to produce useful results in a timely fashion. This research addresses the challenge of efficiently crawling complex RIAs with two main ideas: component-based crawling and similarity detection. Our experimental results show that these ideas drastically reduce the time required to produce results, enabling the crawler to explore RIAs previously too complex for automated crawling.

Keywords

AJAX, crawl, RIA

URI

http://hdl.handle.net/10393/30636
http://dx.doi.org/10.20381/ruor-3546

Collections

- Theses, 2011 -
