Assignment2

ASSIGNMENT 2 cont.

Self study at S - star

i.e. Comparative Genomics

Page 7

Step 3: Closing the Gap – four classes
1.	Repeats
?	In their study, most repeats are tendem repeats. All their tendem repeats were adjacent to unique sequence.
?	Can be identified by MUMs overlapping each other
2.	SNPs
?	Simple cases: gap of one base between MUMs.
?	SNP adjacent to repeat sequences: use repeat processing
3.	Insertions
?	Simple insertion: large gap in alignment in one genome but not the other
?	Transposition: appear in MUM alignment out of sequence.
4.	Variable/ Polymorphic regions:
?	Appear as gap in Mum alignment
?	If short, use Smith – Waterman dynamic programming
?	If long, run MuM detection with reduced minimum length

Computation time vs. size and similarity
	                     LengthSequence similarity Step 1   Step 2  Step 3
                                                                         (# sec)  (# sec)  (# sec)
M. tuberculosis H37Rv        4 Mb
vs.                                               99% identical        5         45          5
M. tuberculosis CDC155   14 Mb

M. genitalium                  580 Kb   20 Kb in MUMs of
vs.                                      > 15 b; < 50 % id in      6.5    0.02       116
M. pneumonia	      816 Kb       gap regions

Subsequences of Human  223 Kb  14 Kb in MUMs of  
chromosome 12p13 vs.                 > 15 b;                 1.6      29        ?
Mouse chromosome 6	     228 Kb	  Large gaps