Enhancing mRNA Translation Efficiency Through Deep Learning
Publisher
Université d'Ottawa / University of Ottawa
Abstract
The 5' untranslated region (5' UTR) of mRNA plays a key role in regulating translation efficiency. Consequently, optimizing this region is important in synthetic biology and therapeutic mRNA design for increasing protein yield and functional potency. Despite advances in computational methods for modeling 5' UTR translation efficiency and sequence design, existing approaches do not account for mRNA secondary structure, provide limited control over sequence modification, and remain inefficient for bulk optimization. This thesis proposes a secondary structure-informed framework that combines accurate translation efficiency modeling with controllable, large-scale optimization of 5' UTR sequences.
First, a graph attention network (GAT) encoder that integrates nucleotide identity, positional information, and predicted mRNA secondary structure was trained to predict 5' UTR translation efficiency. The encoder was then extended into a multitask autoencoder by adding an autoregressive long short-term memory (LSTM) decoder. The autoencoder achieved near-perfect sequence reconstruction accuracy while retaining the encoder's translation efficiency prediction performance. Finally, the decoder was fine-tuned with reinforcement learning to generate 5' UTR variants with higher predicted translation efficiency. Fine-tuning with the REINFORCE algorithm substantially increased the number of generated sequences with improved predicted translation efficiency. DAP-regularized fine-tuning, by contrast, achieved its improvements through smaller, more controlled edits that preserved greater similarity to the original sequences. Adding curriculum learning to DAP-regularized fine-tuning substantially increased the proportion of improved sequences while causing only limited disruption to nucleotide composition and sequence entropy. Interpretability analyses confirmed that the framework captures biologically meaningful determinants of translation initiation and applies optimization strategies consistent with known regulatory mechanisms.
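Two ideas in the paragraph above can be illustrated in isolation. The sketch below rests on stated assumptions and is not the thesis's implementation. First, `utr_to_graph` (a hypothetical helper) turns a 5' UTR and a dot-bracket secondary structure into GNN input. Each nucleotide becomes a node with a one-hot identity plus a relative-position feature. Edges link backbone neighbours and base-paired positions. The dot-bracket string is assumed to come from a structure predictor, such as RNAfold from the ViennaRNA package. Second, `reinforce_toy` (also hypothetical) shows plain REINFORCE with a mean-reward baseline. It uses a simple position-wise categorical policy in place of the LSTM decoder, and a made-up reward (fraction of `A`) in place of the translation efficiency predictor. The GAT layers, the DAP regularizer, and curriculum learning are not sketched.

```python
import numpy as np

NUCLEOTIDES = "ACGU"


def utr_to_graph(seq: str, structure: str):
    """Hypothetical graph builder for a 5' UTR.

    Nodes are nucleotides. Node features are a one-hot nucleotide identity
    plus a relative position in [0, 1]. Edges join sequence neighbours
    (backbone) and base-paired positions taken from a dot-bracket string.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    n = len(seq)
    features = np.zeros((n, len(NUCLEOTIDES) + 1), dtype=np.float32)
    for i, base in enumerate(seq):
        features[i, NUCLEOTIDES.index(base)] = 1.0
        features[i, -1] = i / max(n - 1, 1)

    edges = []
    for i in range(n - 1):  # backbone edges in both directions
        edges += [(i, i + 1), (i + 1, i)]
    stack = []
    for j, symbol in enumerate(structure):  # base-pair edges
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            i = stack.pop()
            edges += [(i, j), (j, i)]
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return features, np.array(edges, dtype=np.int64).reshape(-1, 2)


def reinforce_toy(length=20, steps=300, batch=64, lr=1.0, seed=0):
    """Plain REINFORCE with a mean-reward baseline.

    The policy is a position-wise categorical distribution over ACGU (not an
    LSTM decoder). The reward is a hypothetical stand-in for a predicted
    translation efficiency score: the fraction of 'A' in each sampled sequence.
    Returns the final per-position probabilities, shape (length, 4).
    """
    rng = np.random.default_rng(seed)
    logits = np.zeros((length, len(NUCLEOTIDES)))
    target = NUCLEOTIDES.index("A")
    for _ in range(steps):
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        # Inverse-CDF sampling: one nucleotide per position per sequence.
        u = rng.random((batch, length, 1))
        tokens = np.minimum((u > probs.cumsum(axis=1)).sum(axis=2), 3)
        rewards = (tokens == target).mean(axis=1)
        advantages = rewards - rewards.mean()  # baseline reduces variance
        # Gradient of log-softmax w.r.t. logits: one_hot(token) - probs.
        score = np.eye(len(NUCLEOTIDES))[tokens] - probs[None]
        # Gradient ascent on the expected reward.
        logits += lr * (advantages[:, None, None] * score).mean(axis=0)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    return probs / probs.sum(axis=1, keepdims=True)
```

For example, `utr_to_graph("GGGAAACCC", "(((...)))")` returns a feature matrix of shape (9, 5) and 22 directed edges: 16 backbone edges and 6 base-pair edges. In the toy optimizer, subtracting the batch-mean reward leaves the expected gradient direction essentially unchanged while reducing its variance. This is the usual motivation for a baseline in REINFORCE.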
Overall, this framework offers a novel approach to RNA sequence optimization that can be extended to regulatory elements beyond the 5' UTR.
Keywords
5' UTR, mRNA translation efficiency, Mean Ribosome Load (MRL), Graph Neural Network (GNN), Graph Attention Network (GAT), Reinforcement Learning, RNA sequence optimization, RNA secondary structure, Deep learning, Sequence-to-sequence model, RNA design, Multitask autoencoder, REINFORCE algorithm
