The PIRSF concept serves as a guiding principle for clustering UniProtKB sequences comprehensively and without overlap into a hierarchy that reflects their evolutionary relationships. The PIRSF classification system is based on whole proteins rather than on their component domains; it therefore allows annotation of both generic biochemical and specific biological functions, as well as classification of proteins without well-defined domains.

The table below shows examples of the PIRSF classification levels. The primary level is the homeomorphic family, whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity and a common domain architecture). Below the family level are subfamilies, which are clusters representing functional specialization and/or domain architecture variation within the family. Above the homeomorphic level there may be parent superfamilies that connect distantly related families and orphan proteins based on common domains. Because proteins can belong to more than one domain superfamily, the PIRSF structure is formally a network rather than a strict tree (Wu et al., 2004).
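
The levels described above can be pictured as a small graph in which a family may have more than one superfamily parent. The Python sketch below is only an illustration of that idea and not the PIRSF implementation: the class names, the level labels, and the identifiers (SF_A, SF_B, FAM_1, SUB_1a) are hypothetical placeholders, not real PIRSF accessions.

```python
from dataclasses import dataclass, field

# Level names taken from the text; the string labels are illustrative.
LEVELS = ("superfamily", "family", "subfamily")


@dataclass
class Node:
    node_id: str                      # placeholder ID, not a real PIRSF accession
    level: str                        # one of LEVELS
    parents: set = field(default_factory=set)


class Classification:
    """Toy model of a classification network (hypothetical, for illustration)."""

    def __init__(self):
        self.nodes = {}

    def add(self, node_id, level, parents=()):
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        for parent in parents:
            if parent not in self.nodes:
                raise KeyError(f"unknown parent: {parent}")
        self.nodes[node_id] = Node(node_id, level, set(parents))

    def ancestors(self, node_id):
        """Return every node reachable by following parent links upward."""
        seen = set()
        stack = list(self.nodes[node_id].parents)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.nodes[current].parents)
        return seen


def build_example():
    c = Classification()
    c.add("SF_A", "superfamily")
    c.add("SF_B", "superfamily")
    # A family linked to two domain superfamilies: this is what makes
    # the structure a network rather than a tree.
    c.add("FAM_1", "family", parents=["SF_A", "SF_B"])
    c.add("SUB_1a", "subfamily", parents=["FAM_1"])
    return c
```

In this sketch, calling `build_example().ancestors("SUB_1a")` walks up from the subfamily through its family to both superfamilies, which a single-parent tree could not represent.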

As part of the UniProt consortium, PIR has developed this classification strategy, together with rules for functional sites and protein names, to assist in the propagation and standardization of protein annotation and in the systematic detection of annotation errors. In this way, PIRSF improves the sensitivity of protein identification and functional inference, and also provides a basis for evolutionary and comparative genomics research.

PIRSF families are curated using a bioinformatics infrastructure implemented in a J2EE framework.

For more information, please see: A Proposal for the PIRSF Classification System

