The text is based on the authors' courses given every semester at the University of Munich since Fall 1996. (Peter Clote previously held the Gerhard-Gentzen Chair of Theoretical Computer Science at the University of Munich.)
This text is used in Fall 2000-01 in the course 18.417 Computational Molecular Biology, given by Peter Clote, as Visiting Professor in the Dept of Mathematics, M.I.T.
Key features:
====================
Chapter 1 -- Molecular Biology
1.1 Some Organic Chemistry.................................3
1.2 Small Molecules........................................4
1.3 Sugars.................................................6
1.4 Nucleic Acids..........................................6
1.4.1 Nucleotides.......................................6
1.4.2 DNA...............................................8
1.4.3 RNA..............................................13
1.5 Proteins..............................................14
1.5.1 Amino Acids......................................14
1.5.2 Protein Structure................................15
1.6 From DNA to Proteins..................................17
1.6.1 Amino Acids and Proteins.........................17
1.6.2 Transcription and Translation....................19
1.7 Exercises.............................................21
Acknowledgments and References............................22
Chapter 2 -- Math Primer
2.1 Probability...........................................23
2.1.1 Random Variables.................................25
2.1.2 Some Important Probability Distributions.........27
Binomial Distribution
Geometric Distribution
Poisson Distribution
Normal Distribution
Hypergeometric Distribution
Exponential Distribution
Boltzmann Distribution
2.1.3 Markov Chains.................................. .38
2.1.4 Metropolis--Hastings Algorithm...................43
Annealing Schedule...............................46
2.1.5 Markov Random Fields and Gibbs Sampler...........47
2.1.6 Maximum Likelihood...............................52
2.2 Combinatorial Optimization............................53
2.2.1 Lagrange Multipliers.............................53
2.2.2 Gradient Descent.................................54
2.2.3 Heuristics Related to Simulated Annealing........54
2.2.4 Applications of Monte--Carlo.....................55
Optimality of the Genetic Code......................55
2.2.5 Genetic Algorithms...............................60
2.3 Entropy and Applications to Molecular Biology.........61
2.3.1 Information Theoretic Entropy....................62
2.3.2 Shannon Implies Boltzmann........................63
2.3.3 Simple Statistical Genomic Analysis..............66
2.3.4 Genomic Segmentation Algorithm...................69
2.4 Exercises.............................................72
2.5 Appendix: Modification of Bezout's Lemma..............77
Acknowledgments and References............................79
Chapter 3 -- Sequence Alignment
3.1 Motivating Example....................................83
3.2 Scoring Matrices......................................84
Score Matrices Based on a Statistical Model............85
PAM and Amino Acid Pair Probabilities..................86
3.3 Global Pairwise Sequence Alignment....................88
3.3.1 Distance Methods.................................88
3.3.2 Alignment with Tandem Duplication................99
3.3.3 Similarity Methods..............................110
3.4 Multiple Sequence Alignment..........................111
3.4.1 Dynamic Programming.............................112
3.4.2 Gibbs Sampler...................................112
3.4.3 Maximum-Weight Trace............................114
3.4.4 Hidden Markov Models............................117
3.4.5 Steiner Sequences...............................117
3.5 Genomic Rearrangements...............................118
3.6 Locating Cryptogenes and Guide RNA...................120
3.6.1 Anchor and Periodicity Rules....................122
3.6.2 Search for Cryptogenes..........................122
3.7 Expected Length of gRNA in Trypanosomes..............123
3.8 Exercises............................................128
3.9 Appendix: Maximum-Likelihood ........................132
Acknowledgements and References..........................133
Chapter 4 -- All About EVE
4.1 Introduction.........................................135
4.2 Rate of Evolutionary Change..........................137
4.2.1 Amino Acid Sequences............................137
4.2.2 Nucleotide Sequences............................139
4.3 Clustering Methods...................................144
4.3.1 Ultrametric Trees...............................147
4.3.2 Additive Metric.................................152
4.3.3 Estimating Branch Lengths.......................156
4.4 Maximum Likelihood...................................157
4.4.1 Likelihood of a Tree............................159
4.4.2 Recursive Definition for the Likelihood.........160
4.4.3 Optimal Branch Lengths for Fixed Topology.......162
4.4.4 Determining the Topology........................166
4.5 Quartet Puzzling.....................................166
4.5.1 Quartet Puzzling Step...........................169
4.5.2 Majority Consensus Tree.........................170
4.6 Exercises............................................171
Acknowledgments and References...........................173
Chapter 5 -- Hidden Markov Models
5.1 Likelihood and Scoring a Model.......................177
5.2 Re-estimation of Parameters..........................180
5.2.1 Baum--Welch Method..............................181
5.2.2 EM and Justification of the Baum--Welch Method..184
5.2.3 Baldi--Chauvin Gradient Descent.................187
5.2.4 Mamitsuka's MA Algorithm........................191
5.3 Applications.........................................193
5.3.1 Multiple Sequence Alignment.....................193
5.3.2 Protein Motifs..................................194
5.3.3 Eukaryotic DNA Promotor Regions.................195
5.4 Exercises............................................197
Acknowledgments and References...........................198
Chapter 6 -- Structure Prediction
6.1 RNA Secondary Structure..............................202
6.2 DNA Strand Separation................................213
6.3 Amino Acid Pair Potentials...........................223
6.4 Lattice Models of Proteins...........................228
6.4.1 Monte--Carlo and the Heteropolymer Protein Model231
Local Move Set.....................................231
Pivot Moves........................................232
6.4.2 Genetic Algorithm for Folding in the HP Model...233
6.5 Hart and Istrail's Approximation Algorithm...........234
6.5.1 Performance.....................................234
6.5.2 Lower Bound.....................................236
6.5.3 Block Structure, Folding Point, and Balanced Cut239
6.6 Constraint-Based Structure Prediction................243
6.7 Protein Threading....................................246
6.7.1 Definition......................................246
6.7.2 A Branch-and-Bound Algorithm....................249
6.7.3 NP-hardness.....................................258
6.8 Exercises............................................259
Acknowledgments and References...........................261
Appendix
A Mathematical Background................................263
A.1 Asymptotic complexity.............................263
A.2 Units of Measurement..............................263
A.3 Lagrange Multipliers..............................264
B Resources..............................................265
B.1 Web Sites.........................................265
B.2 The PDB Format....................................266
References...............................................269