A Scalable P2P RIA Crawling System with Fault Tolerance

Description
Title: A Scalable P2P RIA Crawling System with Fault Tolerance
Authors: Ben Hafaiedh, Khaled
Date: 2016
Abstract: Rich Internet Applications (RIAs) have been widely used in the web over the last decade as they were found to be responsive and user-friendly compared to traditional web applications. RIAs use client-side scripting such as JavaScript which allows for asynchronous updates on the server-side using AJAX (Asynchronous JavaScript and XML). Due to the large size of RIAs and therefore the long time required for crawling, distributed RIA crawling has been introduced with the aim to decrease the crawling time. However, the current RIA crawling systems are not scalable, i.e. they are limited to a relatively low number of crawlers. Furthermore, they do not allow for fault tolerance in case that a failure occurs in one of their components. In this research, we address the scalability and resilience problems when crawling RIAs in a distributed environment and we explore the possibilities of designing an efficient RIA crawling system that is scalable and fault-tolerant. Our approach is to partition the search space among several storage devices (distributed databases) over a peer-to-peer (P2P) network where each database is responsible for storing only a portion of the RIA graph. This makes the distributed data structure invulnerable to a single point of failure. However, accessing the distributed data required by crawlers makes the crawling task challenging when the number of crawlers becomes high. We show by simulation results and analytical reasoning that our system is scalable and fault-tolerant. Furthermore, simulation results show that the crawling time using the P2P crawling system is significantly faster than the crawling time using both the non-distributed crawling system and the distributed crawling system using a single database.
URL: http://hdl.handle.net/10393/34646
http://dx.doi.org/10.20381/ruor-5854
CollectionThèses, 2011 - // Theses, 2011 -
Files