ASSIGNMENT 2
Self-study at S-star
i.e. Comparative Genomics
Comparative Genomics
Liping Wei, Ph.D.
Nexus Genomics, Inc.

Completely sequenced genomes now available
- 4 eukaryotes + (draft) human
- 9 archaea
- 32 bacteria
- > 600 viruses

What is Comparative Genomics?
Comparative genomics is the practice of analyzing and comparing the genetic material of different species in order to study evolution, the function of genes (what they do and why), and inherited diseases.

Why Comparative Genomics?
1. It tells us what is common and what is unique between different species at the genome level. For example, one application is to identify unique, crucial proteins in a pathogen to use as targets for products that are both safe and effective.
2. Genome comparison may be the surest and most reliable way to identify genes and predict their functions and interactions, for example by distinguishing orthologs from paralogs.
3. The functions of human genes and other DNA regions can be revealed by studying their counterparts in lower organisms.

There are three major research directions.
1. Genome comparison for the purpose of understanding the similarities and differences between genomes.
2. Genome comparison for the purpose of predicting gene function, exons, etc., of a new genome, and ultimately, the study of evolution.
3. Development of efficient algorithms for comparing large, genome-scale sequences.
Thinking at genome scale
Genome → Protein → Function → Organism → Population
- From a single gene to the whole genome, with an increase in both size and complexity.
- From traditional homology-based approaches to new non-homology-based approaches.
- The technologies are promising yet very new. Always question the assumptions.

Outline of the lecture, composed of three parts:
1. Comparison of the complete genome sequences of two strains of H. pylori to study strain-specific genetic diversity. What are the features to be compared?
2. Prediction of protein interaction maps for complete genomes based on gene fusion events. What can we do with genome comparison?
3. Relatively fast alignment of whole genome sequences using suffix trees. How do we align genome-scale sequences?

Part I: Helicobacter pylori
General information
- Colonizes the human gastric mucosa.
- Induces chronic gastric inflammation, which can progress to ulcer, gastric cancer or mucosa-associated lymphoma.
- Affects 30-40% of the population in the USA and 60-80% in Asia.
- H. pylori can cause different diseases or even be beneficial to the infected host. What causes the difference: strain-specific genetic diversity, or host diversity?

R. A. Alm et al. (1999, Nature, 397: 178-180) compared the genomes of two H. pylori strains, J99 and 26695, which are two independent isolates.

What to compare?
1. Statistics of the genome.
- Size of the genome: total number of base pairs.
- Overall (G + C) content: percentage of G and C bases in the whole genome.
- Regions of different (G + C) content: (G + C) content computed in sliding windows along the genome.
- Are they the corresponding regions in both genomes?
Genome feature                          H. pylori 26695   H. pylori J99
Size (base pairs)                       1,667,857         1,643,831
(G + C) content (%)                     39                39
Regions of different (G + C) content    8                 9*

* Four of the regions match those in 26695.
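
The genome statistics listed above (size, overall (G + C) content, and (G + C) content in sliding windows) can be sketched in a few lines of Python. This is only a minimal illustration, not the method used by Alm et al.: the function names, the window size, the step and the deviation threshold are all hypothetical choices made for this example, and the toy sequence is invented rather than taken from either H. pylori genome.

```python
def gc_content(seq):
    """Return the percentage of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def sliding_gc(seq, window, step):
    """Yield (start, gc_percent) for every full window of the given size."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_content(seq[start:start + window])


def deviating_windows(seq, window, step, threshold):
    """Return the windows whose (G + C) content differs from the overall
    value by at least `threshold` percentage points (an assumed criterion)."""
    overall = gc_content(seq)
    return [(start, gc) for start, gc in sliding_gc(seq, window, step)
            if abs(gc - overall) >= threshold]


if __name__ == "__main__":
    # Invented toy sequence: an AT-only stretch, a GC-only stretch, an AT-only stretch.
    toy = "ATATATAT" + "GCGCGCGC" + "ATATATAT"
    print("size (bp):", len(toy))
    print("overall G+C %:", round(gc_content(toy), 1))
    print("deviating windows:", deviating_windows(toy, window=8, step=8, threshold=50))
```

On a real genome the window would be far larger than 8 bases, and deciding whether a deviating region in one strain corresponds to one in the other strain would additionally require aligning the two genomes, which this sketch does not attempt.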