Bioinformatics
MA461: Probabilistic models for molecular biology (5 ECTS)
This course covers applications of probabilistic models and related techniques in genomics and systems biology. Beginning with a review of stochastic processes, the course will consider, among other models and applications, the use of hidden Markov models (HMMs) to predict genes and to identify genomic regions with shared epigenetic characteristics; the use of continuous-time Markov processes to model molecular evolution; and the application of Gibbs sampling to infer haplotypes from genotype data.
Taught in Semester(s) II. Examined in Semester(s) II.
Workload: 36 hours (24 Lecture hours, 12 Tutorial hours).
Module Learning Outcomes
On successful completion of this module the learner should be able to:
- derive key results that are applied in the course;
- decode sequences of symbols generated from an HMM using the Viterbi algorithm;
- calculate hidden state probabilities using the forward and backward algorithms;
- align a pair of DNA or amino acid sequences using a probabilistic model;
- apply probabilistic models to describe sequence evolution over a phylogenetic tree;
- infer haplotypes from a set of genotype data by hand;
- describe several problems in molecular biology/systems biology and explain the application of probabilistic models to solve these problems;
- construct a pair-HMM for sequence alignment.
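To illustrate the decoding and posterior-probability outcomes above, here is a minimal sketch in Python of Viterbi decoding and the forward/backward algorithms. It assumes a hypothetical two-state HMM over DNA (an AT-rich state "L" and a GC-rich state "H"); the state names and all probabilities are invented for illustration and are not taken from the course material.

```python
import math

# Hypothetical two-state HMM (illustrative parameters only).
STATES = ("L", "H")
START = {"L": 0.5, "H": 0.5}
TRANS = {"L": {"L": 0.9, "H": 0.1}, "H": {"L": 0.1, "H": 0.9}}
EMIT = {
    "L": {"A": 0.35, "T": 0.35, "C": 0.15, "G": 0.15},
    "H": {"A": 0.15, "T": 0.15, "C": 0.35, "G": 0.35},
}


def viterbi(seq):
    """Return the most probable hidden state path for seq and its log-probability."""
    # v[k] = log-probability of the best path ending in state k at the current position
    v = {k: math.log(START[k]) + math.log(EMIT[k][seq[0]]) for k in STATES}
    back = []  # back[i][k] = best predecessor of state k at position i + 1
    for x in seq[1:]:
        ptr, new_v = {}, {}
        for k in STATES:
            best = max(STATES, key=lambda j: v[j] + math.log(TRANS[j][k]))
            ptr[k] = best
            new_v[k] = v[best] + math.log(TRANS[best][k]) + math.log(EMIT[k][x])
        back.append(ptr)
        v = new_v
    # Traceback from the best final state.
    last = max(STATES, key=lambda k: v[k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), v[last]


def posteriors(seq):
    """Return P(state at position i = k | seq) for every position i."""
    n = len(seq)
    # Forward pass: f[i][k] = P(x_1..x_i, state_i = k)
    f = [{k: START[k] * EMIT[k][seq[0]] for k in STATES}]
    for x in seq[1:]:
        prev = f[-1]
        f.append({k: EMIT[k][x] * sum(prev[j] * TRANS[j][k] for j in STATES)
                  for k in STATES})
    # Backward pass: b[i][k] = P(x_{i+1}..x_n | state_i = k)
    b = [dict.fromkeys(STATES, 1.0) for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for k in STATES:
            b[i][k] = sum(TRANS[k][j] * EMIT[j][seq[i + 1]] * b[i + 1][j]
                          for j in STATES)
    total = sum(f[-1][k] for k in STATES)  # P(seq)
    return [{k: f[i][k] * b[i][k] / total for k in STATES} for i in range(n)]
```

For example, `viterbi("ATATGCGCGC")` returns a state path of length 10 together with its log-probability, and `posteriors("ATATGCGCGC")` gives a per-position distribution over the two states. The forward/backward sketch multiplies raw probabilities, which underflows on long sequences; practical implementations rescale at each position or work in log space, as the Viterbi function does here.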
Indicative Content
This course will cover the application of probabilistic modelling to several important problems in molecular biology and/or systems biology. We will begin with a review of Markov chains, including continuous-time chains and hidden Markov models. Applications of models such as these to key problems in molecular biology include the alignment of molecular sequences, the identification of genes in genomic sequences (gene-finding), finding genomic regions with shared epigenetic features, molecular phylogenetics, and the analysis of genome-wide genotype data (including the inference of population structure and the haplotype phasing problem). We will consider several such applications, moving from textbook examples to more recent developments from the current bioinformatics literature.
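As a small example of a continuous-time Markov chain applied to molecular evolution, the sketch below uses the Jukes-Cantor (JC69) model, chosen here only as the simplest such model; the course may use others. With every substitution occurring at rate alpha, the transition matrix P(t) = exp(Qt) has a closed form, and inverting the expected proportion of differing sites gives the usual JC69 distance estimate. The function names are hypothetical.

```python
import math


def jc69_rate_matrix(alpha):
    """Jukes-Cantor rate matrix Q: each of the 3 possible substitutions occurs at rate alpha."""
    return [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


def jc69_transition(alpha, t):
    """Closed-form P(t) = exp(Qt); rows and columns are ordered A, C, G, T."""
    decay = math.exp(-4 * alpha * t)
    same = 0.25 + 0.75 * decay  # probability the base is unchanged after time t
    diff = 0.25 - 0.25 * decay  # probability of ending in one specific other base
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def matmul(a, b):
    """Multiply two 4x4 matrices (used to check the Chapman-Kolmogorov equation)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def jc69_distance(p):
    """Expected substitutions per site, d = 3*alpha*t, from the observed proportion p of differing sites."""
    return -0.75 * math.log(1 - 4 * p / 3)
```

Because P(t) comes from a continuous-time Markov chain, it satisfies P(s + t) = P(s)P(t), and its derivative at t = 0 equals Q. The distance formula is valid only for p < 3/4, where the logarithm is defined.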
Module Resources
Durbin, Eddy, Krogh & Mitchison, Biological Sequence Analysis, Cambridge University Press