Machine learning methods for protein sorting prediction
Lecture
Prediction of "protein sorting", i.e. the subcellular location of proteins, has become a major task in bioinformatics. The problem is easy to formulate and understand from a biological point of view, yet the computational solutions are often complex and involve several machine learning methods. Protein sorting is therefore a well-suited case for introducing sequence-based machine learning methods to biologists.
Methods for predicting protein sorting from the amino acid sequence can roughly be divided into three types: homology-based methods, which rely on alignment to proteins with known location; signal-based methods, which attempt to recognize the actual sorting signals; and global property methods, which exploit the fact that proteins from different subcellular compartments differ in amino acid composition or other global properties of the sequence.
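As an illustration of the kind of input a global property method might use, the following minimal sketch computes the amino acid composition of a sequence. The function name `amino_acid_composition` and the restriction to the 20 standard residues are assumptions made for this example, not part of any particular published predictor.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (an assumption of this
# sketch; ambiguous or non-standard residues such as X or U are ignored).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper for illustration: the result is a dict with one
    entry per standard residue, suitable as a fixed-length feature vector.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No recognizable residues: return all zeros instead of dividing by 0.
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    composition = amino_acid_composition("MKKLLAAL")
    for aa, fraction in composition.items():
        if fraction > 0:
            print(aa, fraction)
```

A vector like this has the same length for every protein, which is what makes it convenient as input to a classifier regardless of sequence length.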
In my presentation, I will focus on two very important sorting signals, the signal peptide and the transmembrane helix, and show how two machine learning methods, artificial neural networks and hidden Markov models, have been successfully applied in their recognition. In addition, I will briefly mention issues of training set / test set division and overfitting, which apply to all types of machine learning and are important to understand even for the casual user of such methods.
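The training set / test set division mentioned above can be sketched in a few lines. This is a minimal example under stated assumptions: the function name `split_train_test`, the 20% test fraction, and the fixed seed are all chosen here for illustration. It performs a plain random split; in sequence-based prediction, closely related proteins ending up on both sides of the split can make test performance look better than it is, so a real study would typically also control for sequence similarity.

```python
import random


def split_train_test(examples, test_fraction=0.2, seed=0):
    """Randomly divide `examples` into a training list and a test list.

    Hypothetical helper for illustration. A fixed seed keeps the split
    reproducible, so the same test set is held out on every run.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    test = shuffled[:n_test]
    train = shuffled[n_test:]
    return train, test


if __name__ == "__main__":
    # Toy data: (sequence, label) pairs invented for this example.
    data = [("MKKLLAAL", "secreted"), ("MSTNPKPQ", "cytoplasm"),
            ("MALWMRLL", "secreted"), ("MEEPQSDP", "nucleus"),
            ("MGLSDGEW", "cytoplasm")]
    train, test = split_train_test(data, test_fraction=0.2)
    print(len(train), "training examples,", len(test), "test examples")
```

The model is fitted only on the training list; the test list is used once, at the end, to estimate how well the method generalizes. A large gap between training and test performance is the usual symptom of overfitting.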
Lecture information from the website of the RECOMB Satellite Conference on Bioinformatics Education