Protein Sequence Databases         


General protein sequence databases

Database name

Description                    

 Sequences of proteins with experimentally verified function

 All protein sequences: translated from GenBank and imported from other protein databases

 Protein sequences from model organisms, GO assignment and subcellular localization

 Protein information resource protein sequence database; has been merged into the UniProt knowledgebase

 PIR's non-redundant reference protein database

 Protein research foundation database of peptides: sequences, literature and unnatural amino acids

 Now UniProt/Swiss-Prot: expertly curated protein sequence database, section of the UniProt knowledgebase

 Now UniProt/TrEMBL: computer-annotated translations of EMBL nucleotide sequence entries, section of the UniProt knowledgebase

 UniProt archive: a repository of all protein sequences, consisting only of unique identifiers and sequences

 Universal protein knowledgebase: merged data from Swiss-Prot, TrEMBL and PIR protein sequence databases

MIPS

Protein databases at Munich Information Center for Protein Sequences

 UniProt non-redundant reference database: clustered sets of related sequences (including splice variants and isoforms)

Protein properties

Database name

Description

AAindex

 Physicochemical properties of amino acids

Cybase

 Proteins with cyclic backbones

PPD

 Protein pKa database

PINT

 Protein-protein interactions thermodynamic database

ProNIT

 Thermodynamic data on protein–nucleic acid interactions

ProTherm

 Thermodynamic data for wild-type and mutant proteins

REFOLD

Experimental data on protein refolding and purification

TECRdb

 Thermodynamics of enzyme-catalyzed reactions

Protein localization and targeting

Database name
Description
 Database of protein subcellular localization

LOCATE

Membrane organization and subcellular localization of mouse proteins
 Nuclear export signals database
 Nuclear localization signals
 Nuclear matrix associated proteins database
 Nucleolar proteome database
 Protein subcellular localization in bacteria

NURSA

Nuclear receptor signaling atlas
 Secreted protein database
 Transmembrane helices in genome sequences

 Experimentally characterized transmembrane topologies

Protein sequence motifs and active sites

Database name
Description
 Active sequence collection: biologically active peptides
 Alignments of conserved regions in protein families

 Catalytic site atlas: active sites and catalytic residues in enzymes of known 3D structure

 Co-ordination of metals etc.: classification of bioinorganic proteins (metalloproteins and some other complex proteins)

 Comprehensive peptide signature database
 Highly conserved protein sequence blocks
 Protein sequence motif determination and searches
 Metal-binding sites in metalloproteins
 O- and C-linked glycosylation sites in proteins
 3D structure of protein functional sites
 S/T/Y protein phosphorylation sites (formerly PhosphoBase)
 Prosthetic centers and metal ions in protein active sites
 Biologically significant protein patterns and profiles

SitesBase

Known ligand binding sites in the PDB

 Signature sequences at the protein N- and C-termini

Protein domain databases; protein classification

Database name
Description
 A database of protein domain classification

BIOZON

A database of gene and protein family classification

 Conserved domain database, includes protein domains from Pfam, SMART, COG and KOG databases

 Clusters of Swiss-Prot + TrEMBL proteins
 Functional divergence between the subfamilies of a protein domain family
 A database of protein domains and motifs
 Integrated resource of protein families, domains and functional sites
 Integrated protein classification database

MulPSSM

Multiple PSSMs of structural and sequence families
 Family/superfamily classification of whole proteins
 Hierarchical gene family fingerprints

 Protein families: multiple sequence alignments and profile hidden Markov models of protein domains

PIR-ALN

Curated database of protein sequence alignments
 Predicted and consensus interaction sites in enzymes

iProClass

Protein families defined by PIR superfamilies and PROSITE patterns
 Protein domain families
 Hierarchical classification of Swiss-Prot proteins
 Hierarchical clustering of Swiss-Prot proteins
 Structure-based sequence alignments of SCOP superfamilies
 Protein domain sequences and tools

 Simple modular architecture research tool: signalling, extracellular and chromatin-associated protein domains

 Grouping of sequence families into superfamilies
 Systematic re-searching and clustering of proteins
 TIGR protein families adapted for functional annotation

Databases of individual protein families

Database name
Description
 Aminoacyl-tRNA synthetase database
 Artificial selected proteins/peptides database
 Transcriptional regulators of AraC and TetR families
 Cold shock domain-containing proteins
 Structural proteins of Arthropod cuticle
 Database of copper-chelating proteins
 DEAD-box, DEAH-box and DExH-box proteins
 G protein-coupled receptors; expression in cell lines
 Esterases and other alpha/beta hydrolase enzymes
 Families of proteins functioning in the eye
 G protein-coupled receptors database
 G-proteins and their interaction with GPCRs
 Histone fold sequences and structures
 Homeobox proteins, classification and evolution
 Homeobox genes database

 Homeodomain sequences, structures and related genetic and genomic information

 Human olfactory receptor data exploratorium

 Inteins (protein splicing elements) database: properties, sequences, bibliography

 S/T/Y-specific protein kinases encoded in complete genomes

 Database of knottins: small proteins with an unusual 'disulfide through disulfide' knot

 Ligand-gated ion channel subunit sequences database
 Structure and function of lipases and esterases
 Mammalian, invertebrate, plant and fungal lipoxygenases
 Database of proteolytic enzymes (peptidases)
 Nuclear protein database
 Nuclear receptor superfamily
 Nuclear receptor superfamily
 Nuclear hormone receptors database
 Sequences for olfactory receptor-like molecules
 Object-oriented transcription factors database

 Protein kinase resource: sequences, enzymology, genetics and molecular and structural properties

 Pyridoxal-5'-phosphate dependent enzymes mutations
 A database of bacterial protease systems
 Proteases and natural and synthetic protease inhibitors
 Restriction enzymes and associated methylases
 RNase P sequences, alignments and structures
 Ribosomal protein gene database
 Receptor tyrosine kinase sequences
 Nuclear scaffold/matrix attached regions
 Database of scorpion toxins
 Structural database of allergenic proteins and food allergens
 Sensory signal transduction proteins
 7-transmembrane helix receptors (G-protein-coupled)
 Proteins of the signal recognition particles
 Transcription factor database
 Voltage-gated potassium channel database
 Wnt proteins and phenotypes