Understanding the Phishing Ecosystem

dc.contributor.author: Le Page, Sophie
dc.contributor.supervisor: Jourdan, Guy-Vincent
dc.date.accessioned: 2019-07-08T15:12:34Z
dc.date.available: 2019-07-08T15:12:34Z
dc.date.issued: 2019-07-08 [en_US]
dc.description.abstract: In “phishing attacks”, phishing websites mimic trustworthy websites in order to steal sensitive information from end-users. Despite research by both academia and the industry focusing on development of anti-phishing detection techniques, phishing has increasingly become an online threat. Our inability to slow down phishing attacks shows that we need to go beyond detection and focus more on understanding the phishing ecosystem. In this thesis, we contribute in three ways to understand the phishing ecosystem and to offer insight for future anti-phishing efforts. First, we provide a new and comparative study on the life cycle of phishing and malware attacks. Specifically, we use public click-through statistics of the Bitly URL shortening service to analyze the click-through rate and timespan of phishing and malware attacks before (and after) they were reported. We find that the efforts against phishing attacks are stronger than those against malware attacks. We also find phishing activity indicating that mitigation strategies are not taking down phishing websites fast enough. Second, we develop a method that finds similarities between the DOMs of phishing attacks, since it is known that phishing attacks are variations of previous attacks. We find that existing methods do not capture the structure of the DOM, and question whether they are failing to catch some of the similar attacks. We accordingly evaluate the feasibility of applying Pawlik and Augsten’s recent implementation of Tree Edit Distance (AP-TED) calculations as a way to compare DOMs and identify similar phishing attack instances. Our method agrees with existing ones that 94% of our phishing database are replicas. It also better discriminates the similarities, but at a higher computational cost. The high agreement between methods strengthens the understanding that most phishing attacks are variations, which affects future anti-phishing strategies. Third, we develop a domain classifier exploiting the history and internet presence of a domain with machine learning techniques. It uses only publicly available information to determine whether a known phishing website is hosted on a legitimate but compromised domain, in which case the domain owner is also a victim, or whether the domain itself is maliciously registered. This is especially relevant due to the recent adoption of the General Data Protection Regulation (GDPR), which prevents certain registration information from being made publicly available. Our classifier achieves 94% accuracy on future malicious domains, while maintaining 88% and 92% accuracy on malicious and compromised datasets respectively from two other sources. Accurate domain classification offers insight with regard to different take-down strategies, and with regard to registrars’ prevention of fraudulent registrations. [en_US]
dc.identifier.uri: http://hdl.handle.net/10393/39385
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-23629
dc.language.iso: en [en_US]
dc.publisher: Université d'Ottawa / University of Ottawa [en_US]
dc.subject: Phishing attacks [en_US]
dc.subject: Machine learning [en_US]
dc.title: Understanding the Phishing Ecosystem [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Génie / Engineering [en_US]
thesis.degree.level: Masters [en_US]
thesis.degree.name: MSc [en_US]
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science [en_US]

Files

Original bundle

Name: Le_Page_Sophie_2019_thesis.pdf
Size: 1.56 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission