Functional Alternative Splicing Analysis Using Long Read Technologies

UPV authors
Conference: Functional Alternative Splicing Analysis Using Long Read Technologies


Most eukaryotic genomes encode multiple transcript isoforms per gene through Alternative Splicing (AS), which has been credited as a major mechanism for expanding functional complexity in higher organisms. While there have been tremendous efforts to catalog new splice junctions and isoforms in many species using RNA-seq, the actual functional implications of alternative isoform expression (AIE) have only been studied in a handful of cases. A major reason for this is the lack of functional profiling tools that operate at the transcript level to predict the effects of alternative isoform usage on protein function and transcript regulation. Therefore, our aim is to develop a new methodology for transcriptome analysis, the Functional Alternative Isoform Expression Analysis, that will allow the study of the functional implications of alternative isoform expression at the genome-wide level. To reach this goal we need to obtain sequencing data optimized for the analysis of full-length transcriptomes, based on the emerging long-read sequencing technologies, and then develop bioinformatics methods to annotate these transcriptome data with a large diversity of functional labels covering different aspects of transcriptome functionality, including coding and non-coding (or regulatory) elements. As a proof of concept of this new methodology we used a mouse cell differentiation system from neural stem cells to oligodendrocytes. We applied the Smart-seq protocol to obtain full-length RNA-seq libraries. Then, we sequenced the full-length libraries using PacBio long-read technology. During the last year, this technology has been adapted to transcriptome analysis, allowing transcriptome sequencing with the Iso-Seq protocol.
However, the sequencing depth provided by PacBio is insufficient to obtain quantitative measurements of gene expression, so we sequenced our samples in parallel with Illumina short-read technology to quantify the expression of the transcripts sequenced by PacBio. Moreover, Illumina reads can also be used to correct the still high error rate (around 15%) of the PacBio technology. We obtained around half a million processed PacBio reads (Reads Of Insert) per sample and 80 million Illumina paired-end reads per sample. Using the PacBio SMRT software, we verified that the majority of the PacBio reads contained both the 3′ and 5′ sequencing primers and mapped to over 90% of the length of RefSeq transcripts, indicating full-length transcript sequencing. PacBio reads were corrected with Illumina reads using the LSC correction software, achieving an error-rate decrease of up to 5%. The full-length transcriptome results showed the detection of about 40,000 transcripts associated with 16,000 genes. An intensive functional annotation pipeline was applied to the transcript sequences to obtain rich functional labels: GO terms, InterPro domains, miRNA target sites, functional motifs in UTR regions, repetitive sequences and post-translational modifications. We found that between 51% and 64% of the genes that express more than one isoform show some difference in functional annotation at coding or non-coding elements. This suggests that comparable numbers of coding and non-coding properties are regulated by alternative isoform expression. Therefore, preliminary results indicate notable functional annotation differences between the alternative isoforms of the same gene expressed in different cell types. These results reveal the complexity of functional differences in alternative isoform expression and pave the way for the analysis of the genome-wide functional implications of alternative splicing.
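
The comparison of functional annotation between isoforms of the same gene can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the gene names, transcript identifiers, labels and the function `genes_with_differences` are all hypothetical, and the sketch assumes that each transcript carries a set of labels per category ("coding" or "noncoding") and that a gene counts as differing when at least two of its isoforms have different label sets.

```python
# Hypothetical toy data: gene -> expressed isoforms (not from the study).
ISOFORMS = {
    "GeneA": ["A.1", "A.2"],
    "GeneB": ["B.1", "B.2"],
    "GeneC": ["C.1"],  # single isoform, excluded from the comparison
}

# Hypothetical labels: (transcript, category) -> set of functional labels.
LABELS = {
    ("A.1", "coding"): {"domain_1"},
    ("A.2", "coding"): {"domain_1"},
    ("A.1", "noncoding"): {"miRNA_site_x"},
    ("B.1", "coding"): {"domain_2"},
    ("B.2", "coding"): {"domain_3"},
    ("B.1", "noncoding"): {"UTR_motif_y"},
    ("B.2", "noncoding"): {"UTR_motif_y"},
}


def genes_with_differences(isoforms, labels, category):
    """Return (differing_genes, n_multi_isoform_genes) for one category.

    A gene is reported when its isoforms do not all share the same set of
    labels in the given category; a missing entry is treated as an empty set.
    """
    multi = {g: ts for g, ts in isoforms.items() if len(ts) > 1}
    differing = set()
    for gene, transcripts in multi.items():
        label_sets = {
            frozenset(labels.get((t, category), ())) for t in transcripts
        }
        if len(label_sets) > 1:
            differing.add(gene)
    return differing, len(multi)


if __name__ == "__main__":
    for category in ("coding", "noncoding"):
        genes, total = genes_with_differences(ISOFORMS, LABELS, category)
        print(category, sorted(genes), f"{100 * len(genes) / total:.0f}%")
```

Comparing whole label sets is the simplest possible criterion; a real analysis would likely also take into account the position of each feature on the transcript and the expression level of each isoform.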