A Scalable P2P RIA Crawling System with Fault Tolerance

Field | Value
dc.contributor.author | Ben Hafaiedh, Khaled
dc.date.accessioned | 2016-05-12T17:01:58Z
dc.date.available | 2016-05-12T17:01:58Z
dc.date.issued | 2016
dc.identifier.uri | http://hdl.handle.net/10393/34646
dc.identifier.uri | http://dx.doi.org/10.20381/ruor-5854
dc.description.abstract | Rich Internet Applications (RIAs) have been widely used on the web over the last decade because they are more responsive and user-friendly than traditional web applications. RIAs use client-side scripting such as JavaScript, which allows for asynchronous updates on the server side using AJAX (Asynchronous JavaScript and XML). Because RIAs are large and therefore take a long time to crawl, distributed RIA crawling has been introduced to reduce the crawling time. However, current RIA crawling systems are not scalable, i.e., they are limited to a relatively small number of crawlers. Furthermore, they provide no fault tolerance when one of their components fails. In this research, we address the scalability and resilience problems of crawling RIAs in a distributed environment, and we explore the possibilities of designing an efficient RIA crawling system that is scalable and fault-tolerant. Our approach is to partition the search space among several storage devices (distributed databases) over a peer-to-peer (P2P) network, where each database is responsible for storing only a portion of the RIA graph. As a result, the distributed data structure has no single point of failure. However, accessing the distributed data required by crawlers makes the crawling task challenging when the number of crawlers becomes large. We show by simulation results and analytical reasoning that our system is scalable and fault-tolerant. Furthermore, simulation results show that crawling with the P2P crawling system is significantly faster than crawling with either the non-distributed crawling system or the distributed crawling system that uses a single database.
dc.language.iso | en
dc.publisher | Université d'Ottawa / University of Ottawa
dc.subject | Fault Tolerance
dc.subject | Data Recovery
dc.subject | Rich Internet Applications
dc.subject | Web Crawling
dc.subject | RIA Crawling
dc.subject | Distributed RIA Crawling
dc.subject | P2P Networks
dc.subject | Graph Exploration
dc.title | A Scalable P2P RIA Crawling System with Fault Tolerance
dc.type | Thesis
dc.contributor.supervisor | Bochmann, Gregor
dc.contributor.supervisor | Jourdan, Guy-Vincent
thesis.degree.name | PhD
thesis.degree.level | Doctoral
thesis.degree.discipline | Génie / Engineering
uottawa.department | Science informatique et génie électrique / Electrical Engineering and Computer Science
Collection | Thèses, 2011 - // Theses, 2011 -
