
A Scalable P2P RIA Crawling System with Fault Tolerance

dc.contributor.author: Ben Hafaiedh, Khaled
dc.contributor.supervisor: Bochmann, Gregor
dc.contributor.supervisor: Jourdan, Guy-Vincent
dc.date.accessioned: 2016-05-12T17:01:58Z
dc.date.available: 2016-05-12T17:01:58Z
dc.date.issued: 2016
dc.description.abstract: Rich Internet Applications (RIAs) have been widely used on the web over the last decade, as they were found to be more responsive and user-friendly than traditional web applications. RIAs use client-side scripting such as JavaScript, which allows for asynchronous updates on the server side using AJAX (Asynchronous JavaScript and XML). Because RIAs are large and therefore take a long time to crawl, distributed RIA crawling has been introduced with the aim of decreasing the crawling time. However, current RIA crawling systems are not scalable, i.e. they are limited to a relatively low number of crawlers. Furthermore, they do not provide fault tolerance if a failure occurs in one of their components. In this research, we address the scalability and resilience problems of crawling RIAs in a distributed environment, and we explore the possibilities of designing an efficient RIA crawling system that is scalable and fault-tolerant. Our approach is to partition the search space among several storage devices (distributed databases) over a peer-to-peer (P2P) network, where each database is responsible for storing only a portion of the RIA graph. This makes the distributed data structure free of a single point of failure. However, accessing the distributed data required by crawlers makes the crawling task challenging when the number of crawlers becomes high. We show by simulation results and analytical reasoning that our system is scalable and fault-tolerant. Furthermore, simulation results show that the crawling time using the P2P crawling system is significantly shorter than the crawling time using either the non-distributed crawling system or the distributed crawling system with a single database. [en]
dc.identifier.uri: http://hdl.handle.net/10393/34646
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-5854
dc.language.iso: en [en]
dc.publisher: Université d'Ottawa / University of Ottawa [en]
dc.subject: Fault Tolerance [en]
dc.subject: Data Recovery [en]
dc.subject: Rich Internet Applications [en]
dc.subject: Web Crawling [en]
dc.subject: RIA Crawling [en]
dc.subject: Distributed RIA Crawling [en]
dc.subject: P2P Networks [en]
dc.subject: Graph Exploration [en]
dc.title: A Scalable P2P RIA Crawling System with Fault Tolerance [en]
dc.type: Thesis [en]
thesis.degree.discipline: Génie / Engineering [en]
thesis.degree.level: Doctoral [en]
thesis.degree.name: PhD [en]
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science [en]
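
The abstract describes partitioning the RIA graph among several databases so that each one stores only a portion of it. The following is a minimal sketch of that idea only, not the thesis's actual design: it assumes that ownership of a state is decided by hashing a state identifier, and the names `owner_of`, `PartitionedGraphStore`, `add_transition`, and `transitions_from` are hypothetical. Peer-to-peer routing and failure recovery are left out.

```python
import hashlib
from collections import defaultdict


def owner_of(state_id: str, num_nodes: int) -> int:
    """Map a state identifier to the index of the node that stores it.

    Assumption: simple hash-modulo placement, chosen for illustration.
    """
    digest = hashlib.sha256(state_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class PartitionedGraphStore:
    """In-memory stand-in for several databases, each holding part of a graph."""

    def __init__(self, num_nodes: int) -> None:
        if num_nodes < 1:
            raise ValueError("num_nodes must be at least 1")
        self.num_nodes = num_nodes
        # One dictionary per simulated database: state -> set of (event, target).
        self.nodes = [defaultdict(set) for _ in range(num_nodes)]

    def add_transition(self, source: str, event: str, target: str) -> None:
        """Record a transition on the node that owns the source state."""
        node = self.nodes[owner_of(source, self.num_nodes)]
        node[source].add((event, target))

    def transitions_from(self, source: str) -> set:
        """Return the known transitions of a state, asking only its owner."""
        node = self.nodes[owner_of(source, self.num_nodes)]
        return set(node.get(source, set()))


store = PartitionedGraphStore(num_nodes=4)
store.add_transition("s0", "click#a", "s1")
store.add_transition("s0", "click#b", "s2")
store.add_transition("s1", "click#c", "s0")
```

In this sketch a crawler that needs the transitions of a state contacts exactly one node, which is what lets the storage load spread as nodes are added. Plain modulo placement reassigns most states when the number of nodes changes, so a real system would need a placement scheme that tolerates nodes joining and leaving.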

Files

Original bundle

Name: Ben_Hafaiedh_Khaled_2016_thesis.pdf
Size: 1.11 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission