About RCSB PDB: A Living Digital Data Resource That Enables Scientific Breakthroughs Across The Biological Sciences
Overview
RCSB PDB (RCSB.org) is the US data center for the global Protein Data Bank (PDB) archive of 3D structure data for large biological molecules (proteins, DNA, and RNA) essential for research and education in fundamental biology, health, energy, and biotechnology.
The Protein Data Bank (PDB) was established as the first open-access digital data resource in all of biology and medicine (Historical Timeline). Today it is a leading global resource for experimental data central to scientific discovery.
Through an internet information portal and downloadable data archive, PDB provides access to 3D structure data for the molecules of life, found in all organisms on the planet.
Knowing the 3D structure of a biological macromolecule is essential for understanding its role in human and animal health and disease, its function in plants and food and energy production, and its importance to other topics related to global prosperity and sustainability.
The enormous wealth of 3D structure data stored in the PDB has underpinned significant advances in our understanding of protein architecture, culminating in recent breakthroughs in protein structure prediction accelerated by artificial intelligence and machine learning methods, including deep learning.
RCSB PDB (Research Collaboratory for Structural Bioinformatics PDB) operates the US data center for the global PDB archive, and makes PDB data available at no charge to all data consumers without limitations on usage (Policies).
Recognized experts in fields including, but not limited to, structural biology, cell and molecular biology, computational biology, information technology, and education serve as advisors to the RCSB PDB.
PDB Archive contains >1 TB of Structure Data for Proteins, DNA, and RNA
The cost to replicate the contents of the PDB archive is estimated at more than $20 billion (Analysis)
The PDB Archive
- Grows at the rate of nearly 10% per year
- Serves ~5 million structure data file downloads per day
- Managed by an international collaboration spanning the US, Asia, and Europe
- Manages "Big Data" as a global Public Good
- Provides data critical to AI/ML development
PDB Data
- Enable research in subject areas from Agriculture to Zoology (Analysis)
- Contributed data to roughly 1 million published research papers
- Used by >490 biological data resources
PDB Data Impact
- Basic and applied research
- Patent applications
- Discovery of lifesaving drugs
- Innovations that can lead to new product development and company formation
- Training, Outreach, and Education: PDB-101 materials illustrate how PDB data help explain fundamental biology, biomedicine, energy sciences, and biotechnology
Millions of Data Consumers worldwide served every year
Researchers, scientists, educators, students, curious public, medical professionals, patients, and patient advocates
Public and Private sectors, including pharmaceutical and biotechnology companies
Generates annual Return on Investment of more than 1,500 times federal funding (Analysis)
Services Supporting Access to the Biological Molecules of the PDB Archive
- Service 1 Deposition/Biocuration: supports Data Depositors who submit the results of their structural studies of biological macromolecules to the PDB. All data deposited undergo expert review. Each structure is examined for self-consistency, standardized using controlled vocabularies, cross-referenced with other biological data resources, and validated for scientific/technical accuracy.
- Service 2 Archive Management and Access: supports PDB Data Consumers by maintaining the PDB archive, developing and standardizing the data dictionary, enabling global data delivery and DOI registration, and integrating PDB data with other available information.
- Service 3 Data Exploration: supports PDB Data Consumers in the US and around the world through our open-access web portal RCSB.org that provides tools for structure visualization and analysis.
- Service 4 Training, Outreach, and Education: builds and supports the broad PDB user community with a wide range of resources for understanding 3D biostructures.
- Service 0 IT Infrastructure: supports all RCSB PDB Services and systems by establishing policies and processes to ensure standardized system configuration and management, redundancy, security, high availability, and disaster recovery.
Funding
RCSB PDB is supported by grants from the U.S. National Science Foundation (DBI-2321666), the US Department of Energy (DE-SC0019749), and the National Cancer Institute, National Institute of Allergy and Infectious Diseases, and National Institute of General Medical Sciences of the National Institutes of Health under grant R01GM133198.
In the past, RCSB PDB was also funded by the National Library of Medicine, the National Center for Research Resources, the National Institute of Biomedical Imaging and Bioengineering, and the National Institute of Neurological Disorders and Stroke.
Other funding awards to RCSB PDB by the NSF and to PDBe by the UK Biotechnology and Biological Sciences Research Council are jointly supporting development of a Next Generation PDB archive (DBI-2019297, PI: S.K. Burley; BB/V004247/1, PI: Sameer Velankar) and new Mol* features (DBI-2129634, PI: S.K. Burley; BB/W017970/1, PI: Sameer.
Users
RCSB PDB supports an international community of users, including biologists (in fields such as structural biology, biochemistry, genetics, and pharmacology); other scientists (such as bioinformaticians and developers of software for data analysis and visualization); students and educators (at all levels); media writers, illustrators, and textbook authors; and the general public.
Impact
RCSB PDB services have broad impact across research and education. The inaugural RCSB PDB citation (Berman et al., Nucleic Acids Research 2000) is one of the top-cited scientific publications of all time. A 2017 bibliometric analysis performed by Clarivate Analytics shows that the PDB has motivated high-quality research throughout the world. Papers citing the PDB had a citation-based impact exceeding the world average in 16 scientific fields, including Biology & Biochemistry, Computer Science, Plant & Animal Sciences, Physics, Environment/Ecology, Mathematics, and Geosciences.
A 2017 economic analysis performed by the Rutgers Office of Research Analytics noted that a reasonable estimate to replicate the PDB data archive at the time was $12 billion.
- Impact of PDB Structures on US FDA Drug Approvals 2010-2016 (PDF)
- Supporting the Opportunities and Grand Challenges of the NSF (PDF)
- Supporting the NIH Turn Discovery into Health (PDF)
- Supporting the Research Goals of DOE (PDF)
- Impact of PDB Structures on Anti-Cancer Drug Approvals (PDF)
- PDB Structures and the Pandemic (PDF)
- Protein Data Bank and 50 years of Molecular Structures (PDF)
- PDB Citation MeSH Network Explorer
Collaborations
Worldwide Protein Data Bank (wwPDB)
The Worldwide Protein Data Bank (wwPDB) was formed to maintain a single PDB archive of macromolecular structural data that is freely and publicly available to the global community. It consists of organizations that act as deposition, data processing and distribution centers for PDB data. As the US Data Center, RCSB PDB biocurates structures submitted from the Americas and Oceania.
PDB-Dev
PDB-Dev is a prototype archiving system for structural models obtained using integrative or hybrid modeling.
EMDataResource
EMDataResource provides access to 3DEM density maps and metadata, news, events, software tools, data standards, and validation methods.
KBase
KBase enables users to analyze, share, and collaborate using data and tools designed to help build increasingly realistic models for biological function. KBase utilizes RCSB PDB APIs to provide users with access to PDB data.
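As an illustration of the kind of programmatic access described above, here is a minimal Python sketch of how a client might address a single PDB entry. It is not KBase's actual integration. The endpoint bases (`files.rcsb.org/download` for structure files, `data.rcsb.org/rest/v1/core/entry` for entry metadata) and the helper names are assumptions that do not appear on this page, and the ID check covers only the classic four-character PDB ID format. Consult the RCSB PDB API documentation before relying on any of it.

```python
import json
import re
import urllib.request

# Assumed endpoint bases; they are not stated on this page.
FILE_DOWNLOAD_BASE = "https://files.rcsb.org/download"
DATA_API_BASE = "https://data.rcsb.org/rest/v1/core/entry"

# Simplified check for classic four-character PDB IDs (a digit followed by
# three alphanumeric characters). Other ID formats are not handled here.
_PDB_ID = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def normalize_pdb_id(pdb_id: str) -> str:
    """Return the ID stripped and upper-cased, or raise ValueError."""
    candidate = pdb_id.strip().upper()
    if not _PDB_ID.match(candidate):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return candidate


def structure_file_url(pdb_id: str, fmt: str = "cif") -> str:
    """Build the (assumed) download URL for a structure data file."""
    if fmt not in ("cif", "pdb"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{FILE_DOWNLOAD_BASE}/{normalize_pdb_id(pdb_id)}.{fmt}"


def entry_metadata_url(pdb_id: str) -> str:
    """Build the (assumed) Data API URL for entry-level metadata."""
    return f"{DATA_API_BASE}/{normalize_pdb_id(pdb_id)}"


def fetch_entry_metadata(pdb_id: str, timeout: float = 10.0) -> dict:
    """Fetch entry metadata as a dict. Requires network access."""
    with urllib.request.urlopen(entry_metadata_url(pdb_id), timeout=timeout) as response:
        return json.load(response)
```

For example, `structure_file_url("4hhb")` builds the address for the mmCIF file of entry 4HHB, and `fetch_entry_metadata("4hhb")` would retrieve its metadata over the network. Keeping URL construction separate from the network call lets the ID validation be checked offline.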