Computational Clustering of UniProtKB Virus Proteomes

Release 2017_03, March 15, 2017

Viral Reference Proteomes (Viral RPs) are computed from UniProtKB virus complete proteomes. For each pair of proteomes, we calculate their co-membership in UniRef50 clusters. We then hierarchically cluster similar proteomes into a set of Representative Proteome Groups (RPGs) based on their co-memberships at cut-off levels of 95%, 75%, 55%, 35% and 15%. The proteomes in each RPG are ranked using a Proteome Priority Score so that the top-ranked proteome can be selected as the representative of the group. We also use taxonomic group and host information to annotate the viral proteomes in each RPG. Viral RPs can be used to improve proteome annotation, protein classification, and the detection of taxonomic nomenclature bias in the viral proteome community.
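The grouping step described above can be illustrated with a minimal Python sketch. This page does not give the exact co-membership formula or the linkage rule, so the sketch rests on assumptions: co-membership of proteome A in proteome B is taken as the percentage of A's proteins whose UniRef50 cluster also contains a protein from B, and two proteomes are linked when that percentage reaches the cut-off in both directions. Single-linkage grouping via union-find stands in for the hierarchical clustering, and the function names, proteome IDs and cluster IDs are all hypothetical.

```python
from itertools import combinations


def co_membership(clusters_a, clusters_b):
    """Percentage of A's proteins whose UniRef50 cluster also holds a B protein.

    Assumed definition, for illustration only. Each argument is a list with
    one UniRef50 cluster ID per protein in that proteome.
    """
    if not clusters_a:
        return 0.0
    shared = set(clusters_b)
    hits = sum(1 for cluster in clusters_a if cluster in shared)
    return 100.0 * hits / len(clusters_a)


def group_at_cutoff(proteomes, cutoff):
    """Group proteomes whose co-membership is >= cutoff in both directions.

    Uses single-linkage grouping (union-find), which is a simplification of
    the hierarchical clustering mentioned in the text.
    """
    parent = {name: name for name in proteomes}

    def find(name):
        # Follow parent links to the group root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(proteomes, 2):
        forward = co_membership(proteomes[a], proteomes[b])
        backward = co_membership(proteomes[b], proteomes[a])
        if forward >= cutoff and backward >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in proteomes:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy data: proteome ID -> UniRef50 cluster ID of each protein.
toy_proteomes = {
    "P1": ["c1", "c2", "c3", "c4"],
    "P2": ["c1", "c2", "c3", "c5"],
    "P3": ["c9", "c10"],
}

for cutoff in (95, 75, 55, 35, 15):
    print(cutoff, group_at_cutoff(toy_proteomes, cutoff))
```

In this toy data, P1 and P2 share three of four clusters (75% in each direction), so they form separate groups at the 95% cut-off and merge at 75% and below. This mirrors how the number of RPGs shrinks as the cut-off is lowered.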

BLAST sequence search

Browse Viral RPs

Download Viral RPs files

(download the complete proteome set, #Proteomes: 63067)
Cut-off        RPG file      Seq file*              #RPGs
95% cut-off    rpg-95.txt    rp-seqs-95.fasta.gz    4485
75% cut-off    rpg-75.txt    rp-seqs-75.fasta.gz    2421
55% cut-off    rpg-55.txt    rp-seqs-55.fasta.gz    1914
35% cut-off    rpg-35.txt    rp-seqs-35.fasta.gz    1560
15% cut-off    rpg-15.txt    rp-seqs-15.fasta.gz    1176

* All sequence files have been filtered to contain one-protein-per-gene.

Make your own RP sequence file

There are two ways to make your own RP sequence file with respect to taxonomic group and cut-off level:

Using a script:
A Perl script and its configuration file are available for download from this page. Modify the configuration file to suit your needs, then run the script on your own machine.

Downloading the RP files by virus taxonomic group and co-membership cut-off from the table below:
Get one file from each row, then concatenate them to form your RP sequence file.

Virus Taxonomic Group 95% 75% 55% 35% 15%
Deltavirus x x x x x
Retro-transcribing viruses x x x x x
Satellites x x x x x
dsDNA viruses, no RNA stage x x x x x
dsRNA viruses x x x x x
ssDNA viruses x x x x x
ssRNA viruses x x x x x
Other viruses x x x x x
environmental samples x x x x x
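
Combining one downloaded file per taxonomic group into a single RP sequence file can be sketched in Python as below. This is an illustration under stated assumptions, not the Perl script offered on this page: it assumes the per-group files are gzip-compressed FASTA (as the full-set files above are), and the function name and file names are hypothetical placeholders for whatever you actually downloaded.

```python
import gzip


def merge_fasta_gz(input_paths, output_path):
    """Concatenate gzip-compressed FASTA files into one gzip-compressed file.

    Returns the number of sequence records written. Blank lines are dropped
    and every line is newline-terminated, so records from consecutive files
    cannot run together.
    """
    record_count = 0
    with gzip.open(output_path, "wt") as out:
        for path in input_paths:
            with gzip.open(path, "rt") as src:
                for line in src:
                    line = line.rstrip("\r\n")
                    if not line:
                        continue
                    if line.startswith(">"):
                        record_count += 1
                    out.write(line + "\n")
    return record_count


if __name__ == "__main__":
    # Hypothetical file names: one file per taxonomic group, each at the
    # cut-off level chosen for that group.
    chosen_files = [
        "dsDNA-viruses-75.fasta.gz",
        "ssRNA-viruses-35.fasta.gz",
    ]
    total = merge_fasta_gz(chosen_files, "my-rp-seqs.fasta.gz")
    print("records written:", total)
```

Because each row can use a different cut-off, this approach lets you keep fine-grained representatives for the groups you care about and coarser ones elsewhere.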

Chen C, Huang H, Mazumder R, Natale DA, McGarvey PB, Zhang J, Polson SW, Wang Y, Wu CH; UniProt Consortium.
Computational clustering for viral reference proteomes.
Bioinformatics. 2016;32(13):2041-3. doi: 10.1093/bioinformatics/btw110. Epub 2016 Feb 26.

