Protein Databases







UniProt/Swiss-Prot

The UniProt/Swiss-Prot Protein Knowledgebase is a curated protein sequence database that provides a high level of annotation, a minimal level of redundancy, and a high level of integration with other databases. It is maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).







Pfam Home Page

Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families. For each family in Pfam you can look at multiple alignments, view protein domain architectures, examine species distribution, and view known protein structures.







PIR-PSD

The Protein Information Resource (PIR), located at Georgetown University Medical Center, is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies. PIR maintains the Protein Sequence Database (PSD), an annotated protein database containing over 283,000 sequences covering the entire taxonomic range.







TrEMBL

The TrEMBL database contains the translations of all coding sequences (CDS) present in the EMBL Nucleotide Sequence Database that are not yet integrated into Swiss-Prot. TrEMBL is split into two main sections. SP-TrEMBL (Swiss-Prot TrEMBL) contains the entries that should eventually be incorporated into Swiss-Prot; it can be considered a preliminary section of Swiss-Prot, and its entries are assigned Swiss-Prot accession numbers. REM-TrEMBL (REMaining TrEMBL) contains the entries without accession numbers.







ExPASy PROSITE

PROSITE is a database of protein families and domains. It is based on the observation that most proteins can be grouped, on the basis of sequence similarities, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor. PROSITE currently contains patterns and profiles specific for more than a thousand protein families or domains. Each of these signatures comes with documentation providing background information on the structure and function of these proteins.
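To make the idea of a PROSITE pattern concrete, the sketch below translates a pattern string into a Python regular expression. It assumes the commonly documented pattern syntax (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repetition, `<` and `>` as terminus anchors, and a trailing period). The function name `prosite_to_regex` and the example patterns are illustrative choices, not part of PROSITE itself, and less common syntax (such as an anchor inside square brackets) is not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex (illustrative sketch)."""
    pattern = pattern.strip().rstrip(".")  # patterns conventionally end with a period
    pieces = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # '<' anchors the match at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # '>' anchors the match at the C-terminus
            suffix, element = "$", element[:-1]

        # Separate the residue specification from an optional count: (n) or (n,m).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError("cannot parse pattern element: %r" % element)
        core, low, high = match.groups()

        if core == "x":  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        # A single letter or a [...] set is already valid regex syntax.

        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high is not None else "") + "}"
        pieces.append(prefix + core + repeat + suffix)
    return "".join(pieces)


# Example: an N-glycosylation-style pattern scanned against a made-up sequence.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
hits = [m.start() for m in re.finditer(regex, "MKNGSAANPSA")]
```

Here `regex` is `N[^P][ST][^P]`, and the only hit in the toy sequence starts at index 2 (`NGSA`); the later `NPSA` is rejected because proline is excluded at the second position. Note that `re.finditer` reports non-overlapping matches only, which is a simplification for a real scan.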







SCOP

The SCOP database provides a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the Protein Data Bank (PDB). It is available as a set of tightly linked hypertext documents which make the large database comprehensible and accessible. In addition, the hypertext pages offer a panoply of representations of proteins, including links to PDB entries, sequences, references, images and interactive display systems.







CATH

The CATH database is a hierarchical domain classification of protein structures in the Brookhaven Protein Data Bank (PDB). Only crystal structures solved to a resolution better than 3.0 angstroms are considered, together with NMR structures (non-protein, model, and "C-alpha only" structures are not classified in CATH). There are four major levels in this hierarchy: Class, Architecture, Topology (fold family), and Homologous superfamily.
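The four-level hierarchy can be illustrated with a short sketch that expands a dotted classification code into one identifier per level. It assumes codes are written as up to four dot-separated integers, one per level in the order Class, Architecture, Topology, Homologous superfamily; the function name `cath_lineage` and the example code `3.40.50.300` are illustrative assumptions rather than anything defined by this page.

```python
from typing import Dict

# Level names in hierarchical order, as listed in the description above.
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def cath_lineage(code: str) -> Dict[str, str]:
    """Map each hierarchy level to the prefix of `code` that identifies it."""
    parts = code.split(".")
    if not 1 <= len(parts) <= len(CATH_LEVELS):
        raise ValueError("expected 1 to 4 dot-separated numbers: %r" % code)
    if not all(part.isdigit() for part in parts):
        raise ValueError("every component must be a number: %r" % code)
    # The identifier at depth i is the first i + 1 components joined by dots.
    return {
        level: ".".join(parts[: depth + 1])
        for depth, level in enumerate(CATH_LEVELS[: len(parts)])
    }


lineage = cath_lineage("3.40.50.300")
```

With this input, `lineage` maps Class to `3`, Architecture to `3.40`, Topology to `3.40.50`, and Homologous superfamily to `3.40.50.300`. A shorter code such as `3.40` yields only the first two levels.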








Protein Explorer







Dali Server







Protein Data Bank