Object-Relational Protein Sequence Database

Chunlin Xiao, Lai-Su Yeh, Zhenling Hou, Bruce Orcutt, and Cathy Wu
Protein Information Resource, National Biomedical Research Foundation,
Georgetown University Medical Center, 3900 Reservoir Road, NW, Washington, DC 20007-2195


For over thirty years, the Protein Information Resource (PIR) has maintained and distributed the PIR-International Protein Sequence Database (PSD), the most comprehensive, well-annotated, and non-redundant protein sequence database in the public domain. To facilitate annotation and assure database quality while keeping pace with the large influx of data from genome sequencing projects, we are migrating the PIR-PSD and auxiliary databases from our home-grown legacy system on VAX/VMS to the Oracle 8i object-relational database management system. The database design uses both relational and object models, based on entity-relationship (ER) and Unified Modeling Language (UML) modeling, and the implementation adopts a three-tier networking architecture. Flat files are generated for distribution, including a new XML format planned for our next quarterly release. A user-friendly, Java-based web interface has been developed for querying the database and for supporting database updates in both record and batch modes. The new object-relational system has greatly improved the data organization, consistency and integrity, information retrieval, scalability, maintainability, and interoperability of our databases. This work is supported in part by NIH grant P41 LM05798.

