Hypercourses in Bioinformatics

Mr.Kulachat Sae - Jang

Ph.D. student in Medical Technology

4436692 MTMT/D

Group 17

Assignment three

s-star.org

Protein Structure prediction

By Betty Cheng

Protein structure prediction is used to obtain and compare protein structures by mathematical (computational) methods.

Structure and Function

Why is structure important?

- Structure often determines function.

- Structure is often better conserved than sequence.

- Rational drug design, protein engineering, and the detailed study of protein-biomolecule interactions all require structural knowledge.

How to Find Structure?

- Crystallography ; Must be able to crystallize protein.

- NMR spectroscopy ; Size of protein limited.

- Turn to predictive methods when there is insufficient material or experimental methods fail.

- Protein sequence determination has grown rapidly (Human Genome Project), while experimental structure determination has hardly been able to keep pace (>74,000 sequences versus >7,500 structures).

Protein Structure Determination Methods

- Experimental structure determination - X-ray crystallography, NMR, cryo-EM, CD.

- Structure prediction - secondary structure prediction, protein threading, ab-initio structure prediction, homology modeling.

- Docking - protein-protein, protein-DNA, protein-drug

Protein Structure Determination: Pros & Cons

- X-ray crystallography ; accurate, must have 20 mg material, must be able to crystallize protein.

- NMR ; limited to about 120 residues, protein must be soluble (~ 30 mg/ml), can locate flexible/rigid regions.

- Protein Structure prediction ; does not need material, complementary to Crystallography/NMR, more information = higher reliability.

Prediction

From least structural information to most (only sequence information is required):

- Secondary structure prediction - 1D only.

- Protein threading or fold family recognition - 3-fold information.

- Ab-initio structure prediction - 3-6 Å.

- Homology modeling - up to X-ray accuracy.

Protein Fold Recognition

- Goal - from the 500+ folds in the library of known protein structures, identify the fold.

- The sequence has less than 30% pairwise sequence identity to any known structure.

- When successful, structures from fold recognition are about 3-6 angstrom RMSD from the actual structure.
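
The RMSD figure quoted above can be illustrated with a minimal sketch. It assumes the two structures are already superimposed and that atoms are matched one-to-one; the coordinates and the function name `rmsd` are made up for illustration and are not taken from the lecture.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed (no optimal rotation is applied).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical coordinates in angstroms: every atom is displaced by 3 Å along z.
predicted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
observed = [(0.0, 0.0, 3.0), (3.8, 0.0, 3.0), (7.6, 0.0, 3.0)]
print(rmsd(predicted, observed))  # 3.0
```

In practice an optimal superposition is usually computed first; this sketch leaves that step out.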

Problems: Fold Recognition

- ~70% chance the top 10 predictions will contain the correct fold.

- To reduce the number requires more information (functional/structural information, motifs, position of exposed residues, putative contact points, etc…)

- ~30% chance that none of the top 10 predictions is correct.

- Quality of results heavily dependent on human expertise and amount of information from other methods which can be used to eliminate decoy folds.

Homology Modeling

- Structure is conserved - for homologous sequences, most likely the structure is very similar.

- Find a known structure whose pairwise sequence identity to the target sequence is > 40%.

- Use the backbone coordinates of the homologous structure as the template for the model.

- 70% or greater - very high quality models; side-chains can be placed with reasonable accuracy.

- 40% - 65% - medium quality, with significant errors in the backbone, especially in loop regions.

- Automated methods are available, such as GeneMine (UCSD) and Modeller (Rockefeller).
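
The identity thresholds above can be sketched in code. This is a minimal illustration, assuming the two sequences are already aligned, that gapped columns are excluded from the count, and that the sequences and function names are hypothetical. The notes do not say what to expect between 65% and 70%, so the sketch reports that range as unspecified.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def expected_model_quality(identity):
    """Map percent identity to the quality bands listed in the notes."""
    if identity >= 70:
        return "very high"
    if 40 <= identity <= 65:
        return "medium"
    if identity < 40:
        return "below the homology modeling range"
    return "unspecified in the notes"  # between 65% and 70%

# Hypothetical aligned target and template sequences.
target = "MKTAYIAKQR"
template = "MKTAHIAK-R"
identity = percent_identity(target, template)
print(round(identity, 1), expected_model_quality(identity))
```

Other definitions of percent identity use a different denominator (for example the length of the shorter sequence), so the cut-offs should be read with the chosen definition in mind.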

Ab-initio Protein Structure Prediction

- No information about the protein except the amino acid sequence.

- Simulate the physical forces and processes that drive proteins into native conformation on the computer.

- Thermodynamic stability: the protein must fulfill Anfinsen's hypothesis (the native conformation of the protein is the global free energy minimum) and must fold independently to a stable conformation. It is known that not all proteins fulfill this condition.
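
Why finding the global free energy minimum is hard can be sketched with a toy example. The one-dimensional "energy" below is an arbitrary polynomial chosen for illustration, not a protein energy function, and plain gradient descent stands in for whatever search a real ab-initio method would use.

```python
def energy(x):
    """Toy energy landscape with two minima (not a real force field)."""
    return x ** 4 - 3 * x ** 2 + x

def gradient(x):
    """Analytical derivative of the toy energy."""
    return 4 * x ** 3 - 6 * x + 1

def minimize(start, step=0.01, iterations=5000):
    """Plain gradient descent: always moves downhill from the starting point."""
    x = start
    for _ in range(iterations):
        x -= step * gradient(x)
    return x

# The minimum reached depends on where the search starts.
local_min = minimize(2.0)    # ends near x = 1.13, a local minimum
global_min = minimize(-2.0)  # ends near x = -1.30, the global minimum
print(round(local_min, 2), round(energy(local_min), 2))
print(round(global_min, 2), round(energy(global_min), 2))
```

A downhill search started on the wrong side of the barrier never reaches the lower minimum. A real protein has a vastly larger number of such minima.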

Docking and Drug Design

Accurate structure information required ;

---> Search for docking sites ; multiple minimum problem, resource-intensive, flexible protein?

---> Knowledge about binding site greatly reduces search time and increases accuracy.

---> Protein-protein docking is still primarily a research problem rather than a practical method to determine how two proteins pack against each other.

Protein Structure Design

- Has been shown to be possible - 1998 first designed protein with novel architecture produced with X-ray accuracy.

- Requires large computational resources

- Still major technical problems to be overcome before this becomes a versatile technique.

How to design a protein

- Design the backbone geometry - well understood for coiled-coil proteins.

- Select the side-chains - use side-chain packing to find a set of well-packed structures.

- Calculate stability using free energy perturbation.

- Technical problem - free energy perturbation is limited to small changes in the amino acids.
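
For reference, free energy perturbation is commonly written in the following standard form (the Zwanzig identity). The symbols are not from the lecture notes: $U_A$ and $U_B$ are assumed to be the potential energies of the original and the mutated system, $k_B$ is Boltzmann's constant, $T$ is the temperature, and the average runs over configurations $x$ sampled from state $A$.

```latex
\Delta F_{A \to B} = -k_B T \,\ln \left\langle \exp\!\left( -\frac{U_B(x) - U_A(x)}{k_B T} \right) \right\rangle_A
```

Because the average is taken over configurations of state $A$ only, it converges poorly when $B$ differs strongly from $A$. That is consistent with the restriction to small amino acid changes noted above.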

Tetrameric right-handed coiled-coil

- 0.2 Å RMSD calculated vs. observed (Harbury et al., Science, 282, 1462 (1998))

- Is it practical? 11 amino acid repeat, limited to subset of 6 amino acids; this method is not practical for biological problems.

This example showcases the accuracy of predictive methods when the limitations of the methods are taken into account.

De novo Design- Evaluation

- Undecatad (11-fold) repeat gives right-handed coiled-coil ; this problem was extremely small.

- Number of amino acids used was not the full set of 20 ; theoretical limitations prevent this method from being applied to the full set of 20.

- Within the technical limitations of the method, de novo protein design is possible but is it practical?

Roadmap for the Future

- Ab-initio ; multiple minimum problem (no practical solution yet for biologically reasonable problems), potential Energy functions (accurate functions are expensive). How can we increase accuracy with lower computational expense?

- Information based (homology, fold recognition, secondary structure) ; more information required - only a fraction of all protein folds have been determined for globular proteins in aqueous solution.

- Improvements and new algorithms for predictive methods needed!

Computational Molecular Biology

By Dr. Doug Brutlag

Professor of Biochemistry & Medicine (by courtesy)

Computational Molecular Biology (http://cmgm.stanford.edu/biochem218/)

Genomics, Bioinformatics & Medicine ; Focus on Bioinformatics

Bioinformatics is defined as the intersection between computational biology and molecular biology.

Distinction between Genomics and Bioinformatics

Genomics is defined as the science that determines the sequence of a genome and relates that sequence to the physical chromosome and the genetic map. A strength of genomics is that the new molecular knowledge it generates can help diagnose disease through DNA testing and help people who have a disease.

Bioinformatics is defined as the study of biological information. It studies the flow of information from the genome and uses computational biology to identify drug targets and to support rational drug design and genetic therapy.

The tools of Bioinformatics

- Machine Learning

- Robotics

- Statistics & Probability

- Artificial Intelligence

- Information Theory

- Graph Theory

- Algorithms

Examples of Bioinformatics tools

National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov)

NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information - all for the better understanding of molecular processes affecting human health and disease.

Human Genome Resources (http://www.ncbi.nlm.nih.gov/genome/guide/)

A challenge facing researchers today is the ability to piece together and analyze the multitudes of data currently being generated through the Human Genome Project. NCBI's Web site serves as an integrated, one-stop genomic information infrastructure for biomedical researchers from around the world, so that they may use these data in their research efforts.

Genes on Chromosome 7 (http://www.ncbi.nlm.nih.gov/genemap/map.cgi?CHR=7)

Sequence Alignment (http://motif.stanford.edu/alion/)

Decypher Similarity Search (http://decypher.stanford.edu/)

Prosite Consensus Patterns (http://www.expasy.ch/prosite/)

The Optimal Way to Develop Patterns (http://www.expasy.ch/images/cartoon/prosite.gif)

EMOTIF Pattern Discovery (http://motif.stanford.edu/emotif/)

Identifying Protein Functions (http://motif.stanford.edu/emotif-search)

Mapping Sequence Motifs to Structural Motifs (http://motif.stanford.edu/3motif/)

Block Signatures for a Protein Family (http://www.blocks.fhcrc.org/)

eMATRIX: Position-Specific Scoring Matrices (http://motif.stanford.edu/ematrix)

eMATRIX Search (http://exon/ematrix/ematrix-search.html)
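
The idea behind a position-specific scoring matrix can be sketched as follows. This is a generic log-odds construction with a uniform background and a pseudocount; it is not the eMATRIX algorithm itself, and the aligned family, the query sequence, and all function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned_seqs, pseudocount=1.0):
    """Build a log-odds matrix (one dict per column) from ungapped aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    pssm = []
    for pos in range(length):
        column = [seq[pos] for seq in aligned_seqs]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def score(pssm, segment):
    """Sum the per-position scores of a segment as long as the matrix."""
    return sum(column[aa] for column, aa in zip(pssm, segment))

def best_window(pssm, sequence):
    """Return the start index of the highest-scoring window in the sequence."""
    width = len(pssm)
    starts = range(len(sequence) - width + 1)
    return max(starts, key=lambda i: score(pssm, sequence[i:i + width]))

# Hypothetical aligned motif instances and a query sequence.
family = ["GKSGT", "GKTGS", "GKSGS", "GRSGT"]
pssm = build_pssm(family)
print(best_window(pssm, "MAAGKSGTLL"))  # 3
```

Unlike a consensus pattern, the matrix scores every residue at every position, so near matches still receive a graded score instead of a plain hit or miss.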

Central Paradigm of Molecular Biology

- Molecules ; Structure and Functions

- Processes ; Mechanism, Specificity and Regulation

Central Paradigm of Bioinformatics

Challenges Understanding Genetic Information

- Genetic information is redundant.

- Structural information is redundant.

- Single genes have multiple functions.

- Genes are one-dimensional, but function depends on three-dimensional structure.

Redundancy in Genomic & Protein Sequences

- DNA is double-stranded

- Genetic code

- Acceptable amino acid replacements.

- Intron-exon variation.

- Strain variation

- Sequencing errors
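
Two of the sources of redundancy listed above, the double-stranded nature of DNA and the degeneracy of the genetic code, can be sketched briefly. The codon table below is deliberately partial (only the standard codons for Met, Leu and Ser), and the example sequences are made up.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

# Partial standard genetic code: Met has one codon, Leu and Ser have six each.
CODONS = {
    "ATG": "M",
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L", "TTA": "L", "TTG": "L",
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S", "AGT": "S", "AGC": "S",
}

def translate(dna):
    """Translate complete codons using the partial table above."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, usable, 3))

# Two different DNA sequences that encode the same peptide.
seq_a = "ATGCTTTCT"
seq_b = "ATGTTGAGC"
print(translate(seq_a), translate(seq_b))  # MLS MLS
print(reverse_complement(seq_a))           # AGAAAGCAT
```

The same gene can therefore appear in a database as either strand, and the same protein can be encoded by many different DNA sequences.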