Deep Learning for Protein-Protein Interaction Prediction and Protein Design

dc.contributor.author: Wu, Junzheng
dc.contributor.supervisor: Viktor, Herna
dc.contributor.supervisor: Paquet, Eric
dc.contributor.supervisor: Michalowski, Wojtek
dc.date.accessioned: 2023-03-23T15:55:24Z
dc.date.available: 2023-03-23T15:55:24Z
dc.date.issued: 2023-03-23
dc.description.abstract: Protein–protein interactions (PPI) play a fundamental role in many biochemical functions such as signal transduction, cellular organization, and cell cycle progression. Laboratory experiments are expensive and time-consuming, limiting their applicability to a handful of cases, which severely impacts the speed of research innovation. Meanwhile, computational experiments can explore and screen a vast number of possibilities, leaving researchers with the most promising cases. For this reason, there is an urgent need to develop computational solutions for accurately determining whether proteins interact. Specifically, developing deep learning solutions to predict PPI constitutes an emerging area of research with many practical applications. This thesis focuses on advancing the research in this area. In our first contribution, the thesis introduces a novel deep learning algorithm for predicting PPI solely from amino acid sequences. The novelty of our work is that we consider self-binding and folding properties rather than focusing only on the sequences per se, as is commonplace in the state of the art. Our Siamese pyramid network (SPNet) architecture comprises a multilevel Siamese neural network with an attention mechanism and a trainable probability prediction network. Our experimental evaluation indicates that SPNet outperforms the state of the art on strict data sets, that is, data sets with no data leakage, leading to accurate and timely predictions. Our second contribution centers on the observation that the sizes and compositions of PPI data sets vary considerably, which may affect learnability for deep learning solutions. For instance, small data sets are commonplace for rare and new diseases, while current deep learning implementations usually rely on large PPI data sets to produce accurate predictions.
To this end, we introduce an adaptive and learnable pyramid network architecture in which the depth and the complexity of the network are learned directly from the data. Our experimental evaluation shows that these characteristics make our solution suitable for both small and large PPI data sets, varying in size from a few thousand to seventeen million pairs. In addition, negative (non-interacting) protein pairs are generally scarce, resulting in highly imbalanced data sets that may impede neural network training. Therefore, our third contribution centers on a novel methodology, based on graphs and geodesic distances, that extracts non-interacting proteins from the graph associated with their interactions and binding strength. A new balanced B-STRING data set is created that consists of 17,591,832 protein pairs. Our comparative experimental evaluation not only confirms the value of our learnable pyramid network architecture but also shows the value of providing a novel benchmark data set for future use by the community. In our final contribution, we introduce our architecture for designing protein binders based on primary sequences through a transformer deep learning model that adopts the mechanism of attention, differentially weighing the significance of each part of the input sequence. Predicting protein binders is a multivalued problem, which implies that there is more than one solution, since there may be more than one binder for a specific protein. To solve such a learning task, we utilize two types of constraints in our deep learning solution. These are based, respectively, on the binding score, which is related to the strength of interactions, and the Bayesian prior, where we assume that a small portion of the ligands’ amino acids are known. Our experimental evaluation confirms the strengths of this novel approach.
dc.identifier.uri: http://hdl.handle.net/10393/44730
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.rights: Attribution-NonCommercial 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: Deep Learning
dc.subject: Protein-protein Interaction
dc.subject: Protein Design
dc.title: Deep Learning for Protein-Protein Interaction Prediction and Protein Design
dc.type: Thesis
thesis.degree.discipline: Sciences / Science
thesis.degree.level: Doctoral
thesis.degree.name: PhD
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science

Files

Original bundle

Name: Wu_Junzheng_2023_thesis.pdf
Size: 8.06 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission