Representative Proteomes (RP)

A stable, scalable and unbiased proteome set for sequence analysis
and functional annotation


Release 2015_07, Jun 24, 2015

Representative Proteomes (RPs) are proteomes selected from Representative Proteome Groups (RPGs), which contain similar proteomes grouped by co-membership in UniRef50 clusters. The Representative Proteome is the proteome that best represents all the proteomes in its group in terms of coverage of the sequence space and information. RPs at 75%, 55%, 35% and 15% co-membership thresholds are provided so that users can decrease or increase the granularity of the sequence space to suit their requirements (Chen et al., 2011).
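As a rough illustration of what a co-membership threshold means, the sketch below computes the percentage of one proteome's UniRef50 clusters that also occur in another proteome and compares it with a cut-off. This is an assumption made for illustration only: the exact co-membership measure and the grouping procedure are defined in Chen et al., 2011, and the function names and cluster identifiers here are hypothetical.

```python
from typing import Set


def co_membership_percent(clusters_a: Set[str], clusters_b: Set[str]) -> float:
    """Percentage of proteome A's clusters that are also found in proteome B.

    Assumed, simplified definition; see Chen et al., 2011 for the real one.
    """
    if not clusters_a:
        return 0.0
    shared = clusters_a & clusters_b
    return 100.0 * len(shared) / len(clusters_a)


def meets_cutoff(clusters_a: Set[str], clusters_b: Set[str], cutoff: float) -> bool:
    """True if A's co-membership with B reaches the cut-off (e.g. 75, 55, 35, 15)."""
    return co_membership_percent(clusters_a, clusters_b) >= cutoff


# Hypothetical cluster identifiers, for illustration only.
proteome_a = {"cluster_1", "cluster_2", "cluster_3", "cluster_4"}
proteome_b = {"cluster_1", "cluster_2", "cluster_3", "cluster_9"}

print(co_membership_percent(proteome_a, proteome_b))
print(meets_cutoff(proteome_a, proteome_b, 75))
```

Under this simplified view, a lower cut-off lets more distantly related proteomes fall into the same group, which is consistent with the smaller number of RPGs at the 15% cut-off than at the 75% cut-off.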

The RP set is updated every four weeks (synchronized with the UniProtKB release), and the data are available for browsing, downloading and BLAST search.


Starting with the 2014_09 release, UniProt Proteome identifiers are used as Representative Proteome identifiers instead of NCBI Taxonomy identifiers. For those using the RPG files below, the first column now contains UniProt Proteome identifiers; previously, it contained NCBI Taxonomy identifiers.


BLAST sequence search

Browse RPs database

Download RPs files

(download the complete proteome set, #Proteomes: 4193)

Cut-off        RPG file      Seq file*              #RPGs
75% cut-off    rpg-75.txt    rp-seqs-75.fasta.gz    2851
55% cut-off    rpg-55.txt    rp-seqs-55.fasta.gz    2124
35% cut-off    rpg-35.txt    rp-seqs-35.fasta.gz    1447
15% cut-off    rpg-15.txt    rp-seqs-15.fasta.gz    608

* Seq files for each cut-off include the sequences from model organisms with complete proteomes.
All sequence files have been filtered to contain one-protein-per-gene.
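After downloading one of the Seq files, a quick sanity check is to count its sequences. The following is a minimal sketch using only the Python standard library; it assumes the file is a gzip-compressed FASTA file with ">" header lines, and the function names are illustrative rather than part of any PIR tool.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_fasta_gz(path: str) -> Tuple[int, int]:
    """Return (number of sequences, total residues) for a gzipped FASTA file."""
    count = 0
    residues = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for _header, sequence in read_fasta(handle):
            count += 1
            residues += len(sequence)
    return count, residues


if __name__ == "__main__":
    import os

    path = "rp-seqs-75.fasta.gz"  # file name as listed in the table above
    if os.path.exists(path):
        n_seqs, n_residues = summarize_fasta_gz(path)
        print(f"{n_seqs} sequences, {n_residues} residues")
```

The reader streams the file line by line, so the full sequence set is never held in memory at once.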

Make your own RP sequence file

Representative Genome (RG) files

Users can retrieve the genomic sequences of the RPs from UniProt or NCBI.

UniProt: UniProt provides genomic sequences of bacterial and archaeal RPs at the 55% cut-off and of eukaryotic RPs at the 75% cut-off (ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/).

NCBI: NCBI provides the refseq-genbank.csv file, which maps Genome project id, RefSeq project id and taxonomy id (ftp://ftp.ncbi.nlm.nih.gov/bioproject/). The taxonomy id mappings available at http://pir.georgetown.edu/rps/data/current/75/rpg-75.txt can be used to look up the Genome project id or RefSeq project id in refseq-genbank.csv, which can then be used to retrieve the genomic sequence or CDS using NCBI e-utils.
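The lookup step described above can be sketched as a simple join on taxonomy id. The column names used here ("taxid", "refseq_project_id") and the sample rows are hypothetical placeholders, since the actual header of refseq-genbank.csv is not given on this page; check the real file and pass its column names instead. The later retrieval through NCBI e-utils is not shown.

```python
import csv
import io
from typing import Dict, Iterable, List


def load_project_ids(csv_text: str, taxid_column: str,
                     project_column: str) -> Dict[str, List[str]]:
    """Map each taxonomy id to the project ids listed for it in the CSV text."""
    mapping: Dict[str, List[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        taxid = row[taxid_column].strip()
        project = row[project_column].strip()
        if taxid and project:
            mapping.setdefault(taxid, []).append(project)
    return mapping


def projects_for_taxids(taxids: Iterable[str],
                        mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Look up project ids for the given taxonomy ids; unknown ids map to []."""
    return {taxid: mapping.get(taxid, []) for taxid in taxids}


# Made-up sample rows with hypothetical column names, for illustration only.
sample_csv = (
    "taxid,refseq_project_id\n"
    "1000,P1\n"
    "1000,P2\n"
    "2000,P3\n"
)

mapping = load_project_ids(sample_csv, "taxid", "refseq_project_id")
print(projects_for_taxids(["1000", "3000"], mapping))
```

Keeping a list of project ids per taxonomy id avoids silently dropping entries when one taxon maps to several projects.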

Previous Releases

Publication

Chen C, Natale DA, Finn RD, Huang H, Zhang J, Wu CH, Mazumder R. Representative proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation. PLoS One. 2011 Apr 27;6(4):e18910. PubMed PMID: 21556138; PubMed Central PMCID: PMC3083393.



©2014 Protein Information Resource