General Help
Glossary
13C - A Carbon isotope used in NMR experiments.
15N - A Nitrogen isotope used in NMR experiments.
1H - A Hydrogen isotope used in NMR experiments.
2D NMR - Two-dimensional NMR
2Fo-Fc map - A composite electron density map typically used for the modeling of atomic coordinates from crystallographic data. It comprises the sum of a map of observed density (Fo) and a difference map of observed density minus density calculated from a model (Fo-Fc). During model building and refinement this map can usually show locations of incorrectly positioned atoms.
3-10 helix - A type of protein secondary structural element that is more tightly coiled than the alpha helix (3 amino acids per turn with 10 atoms in the ring completed by each intra-helical hydrogen bond). These elements are often found as a terminal extension of an alpha helix and are less common than alpha helices.
3D NMR - See multidimensional NMR.
3D Reconstruction (EM) - In 3D electron microscopy, this is the process of calculating a three-dimensional volume map from multiple two-dimensional images.
4D NMR - See multidimensional NMR.
A DNA (A-form DNA) - A conformation of right-handed, double stranded DNA in which the bases are tilted with reference to the helix axis. This conformation has more base pairs per turn compared to the canonical B-form DNA.
active site - A region in proteins and nucleic acids (that participate in chemical reactions), where reacting molecules (substrates) bind and make specific contacts necessary for chemical catalysis.
acylation - A chemical reaction that involves the addition of an acyl group (e.g., acetyl or myristoyl). An example is the post-translational modification of the N-terminal end of a protein after removal of the initiator methionine.
adenine (A) - A nitrogenous base that occurs in DNA and RNA nucleotides and pairs with thymine (in DNA) or uracil (in RNA) through two hydrogen bonds.
adenosine triphosphate - A molecule consisting of a ribose sugar molecule at the center with an attached adenine base on one side and a string of three phosphates on the other. The phosphates are linked via two high energy bonds, which function as the energy currency in cells.
alanine (Ala, A) - Alpha amino acid with a non-polar side chain. Alanine is weakly hydrophobic.
alignment - A comparison of two or more gene or protein sequences in order to determine their degree of similarity in bases or amino acids, respectively.
allosteric protein - A protein that changes between two or more structural conformations upon binding to a small molecule called an effector. This binding occurs at a site different from the protein's active site and enhances or reduces the protein's reactivity towards its normal ligand.
Allostery - This is a type of effect seen in proteins where the binding of a molecule, ion etc. to one location can have an impact on the structure and interactions at another location. See allosteric protein.
alpha carbon - The carbon in a molecule that is one carbon away from an aldehyde or ketone group. For example, the backbone carbon atom linked to the carbonyl group in an alpha amino acid, to which the side chain is attached.
alpha helix - A secondary structural motif of a protein. It is characterized by hydrogen bonds between the carbonyl group (-C=O) of one amino acid and the amino (N-H) group of the amino acid 4 residues further along the helix. The helix makes one complete turn every 3.6 amino acids. The backbone atoms of the peptide in this region form a right-handed helical structure, hence the name.
amide - A molecule containing a nitrogen covalently linked to a carbonyl carbon.
amide bond - See peptide bond
amine - A molecule containing nitrogen with a single bond to a carbon chain and two other single bonds to hydrogen or carbon.
amino acid - A building block of proteins. An alpha amino acid contains a basic amino group, an acidic carboxyl group, and a hydrogen or organic side chain attached to the central carbon atom. There are 20 different alpha amino acids commonly found in nature that can covalently link with each other to form short peptides or longer proteins.
amino acid motif - See motif
Anatomical Therapeutic Chemical (ATC) classification system - In the ATC classification system, active drugs are divided into different groups according to the organ or system on which they act and their therapeutic, pharmacological, and chemical properties. It is controlled by the World Health Organization Collaborating Centre for Drug Statistics Methodology.
angle outlier (bond angle outlier) - A bond angle that is significantly different from the expected bond angle value for standard amino acids and nucleotides. Bond angle deviations are considered serious outliers only when they are at least five standard deviations from their expected values.
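The five-standard-deviation rule in the entry above can be sketched as a simple z-score check. This is only an illustration: the function name, the default threshold argument, and the example expected value and standard deviation are assumptions made for the sketch, not values taken from any validation software.

```python
def is_serious_angle_outlier(observed_deg, expected_deg, sigma_deg, threshold=5.0):
    """Return True when a bond angle deviates from its expected value
    by at least `threshold` standard deviations."""
    z_score = (observed_deg - expected_deg) / sigma_deg
    return abs(z_score) >= threshold


# Hypothetical numbers: expected angle 111.0 degrees, standard deviation 2.0 degrees.
print(is_serious_angle_outlier(122.0, 111.0, 2.0))  # deviation of 5.5 standard deviations
print(is_serious_angle_outlier(114.0, 111.0, 2.0))  # deviation of 1.5 standard deviations
```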
angstrom (Å) - A unit of length equal to 10⁻¹⁰ meters. It is commonly used to measure atomic dimensions.
antibody - A protein produced by the immune system in response to an antigen in order to recognize and specifically bind to it, usually to protect the organism from infections or foreign substances.
area detector - An X-ray detector that can collect diffraction data from many different positions simultaneously as compared with a diffractometer. Image plates, charge-coupled devices (CCDs), multiwire detectors, and film detectors are all considered types of area detectors.
arginine (Arg, R) - Alpha amino acid with a charged basic side chain containing a guanidino group.
asparagine (Asn, N) - Alpha amino acid with an uncharged, polar side chain containing an amide group.
aspartic acid (Asp, D) - Alpha amino acid with a charged acidic side chain containing a carboxyl group. The ionized form is known as aspartate.
assembly - See biological assembly
asymmetric - No symmetry is present in the assembly.
asymmetric unit - The smallest part of a crystal structure to which space group symmetry can be applied to produce the entire crystal. The asymmetric unit may contain a whole molecule, a portion of a molecule, or multiple molecules and does not necessarily represent the functionally relevant unit or assembly of the molecule(s) under investigation.
atom - The smallest unit of a chemical element that has the characteristics of that element. An atom is composed of protons and neutrons in a nucleus that is surrounded by electrons. The number of protons determines the identity of the atom.
ATP - See adenosine triphosphate
B DNA (B-form DNA) - A conformation of right-handed, double stranded DNA commonly seen in solutions. In this conformation the bases are perpendicular with reference to the helix axis.
B-factor - See temperature factor
backbone - The main chain of a polymer molecule. In a protein it consists of the N, Cα, and C=O atoms to which the amino acid side chains are attached, while in a nucleic acid it is made of the linked sugar phosphate groups to which the bases are attached.
bacteria - Primitive, one-celled microorganisms without a nucleus.
ball and stick (Representation) - A molecular representation where atoms are shown as balls and covalent bonds connecting them are shown as sticks. Some visualization software represents the atoms as points and refers to this rendering as the stick representation.
base pair - Specific association between two complementary strands of nucleic acids that results from the formation of hydrogen bonds between the base components of the nucleotides of each strand. For example, Watson Crick base pairs A=T/U and G=C in the nucleic acid.
beamline - The location of experimental instrumentation and point of access to X-rays generated by a synchrotron, linear accelerator, or free electron laser.
beta sheet - An element of protein secondary structure comprised of two or more peptide strands that are either parallel or anti-parallel to each other and positioned such that carbonyl (C=O) and amino (N-H) groups of adjacent strands form hydrogen bonds between them to stabilize a sheet-like structure.
beta turn - A type of reverse turn, a class of protein secondary structure in which a beta strand reverses its direction in the protein.
bioinformatics - An interdisciplinary science that involves application of computational and statistical techniques to the management and exploration of biological data sets in order to identify patterns, make predictions based on trends, and organize biological data into meaningful groups.
biological assembly - The biologically relevant and/or functional grouping of a particular set of macromolecules. A biological assembly may comprise a single copy, multiple copies, or only a portion of a set of the modeled atomic coordinates. See also Asymmetric Unit.
Biological Magnetic Resonance Data Bank (BMRB) - BioMagResBank or BMRB is a publicly-accessible depository for NMR data of proteins, peptides, and nucleic acids.
BioZernike - A method for the rapid comparison of biomolecules based on their three-dimensional shapes. See also Structure Search.
BLAST - A program that performs fast alignments between two or more protein or nucleic acid sequences, often used to search a sequence database for a match to a query sequence, with the statistical significance of each match indicating the degree of similarity between related or homologous sequences.
BMRB - See Biological Magnetic Resonance Data Bank (BMRB)
carbohydrate - A biologically relevant molecule comprising only carbon, hydrogen, and oxygen atoms. Examples include sugars, starches, and cellulose.
carbonyl group - A chemical functional group comprising a carbon atom double-bonded to an oxygen atom. This group is present in the main chain / backbone portion of a peptide bond.
carboxyl group - A chemical functional group located at the protein C-terminus comprising a carbonyl group and a hydroxyl group attached to the carbon atom. During peptide bond formation, the hydroxyl group is eliminated with the addition of an amino acid.
cartesian coordinates - A coordinate system in which any point is identified by three coordinates (x,y,z) defined by their distance along each of three axes oriented perpendicular to one another.
cartoon (Representation) - A representation of biological macromolecules (proteins and nucleic acids), where the backbone atoms are shown as ribbons and coils denoting their secondary structure. In protein structures, the arrows indicate N-terminal to C-terminal direction. This may also be referred to as the ribbon representation.
CATH - A hierarchical classification of protein structural relationships which groups proteins at four major levels: Class (C), Architecture (A), Topology (T) and Homologous superfamily (H).
cation-pi interaction - A noncovalent interaction between the electron-rich π system (e.g., in benzene) and an adjacent cation (e.g., Na+). In proteins the cation may be replaced by a positively charged amino acid (e.g., Arg).
chain - Each modeled instance of a biopolymer, i.e., polypeptide (protein), polynucleotide (DNA, RNA), or oligosaccharide, in a structure in the PDB. Each chain in a structure will have a unique chain identifier, e.g., A, B so that they can be easily identified.
Chemical Component Dictionary (CCD) - A dictionary containing the definition (in mmCIF format) of all small molecules found in the wwPDB, including macromolecular building blocks (amino acids, nucleotides), ions, ligands, cofactors, inhibitors and solvent molecules.
chemical shift - A property of an atomic nucleus, measured using nuclear magnetic resonance (NMR) spectroscopy, that is dependent upon the atom, its chemical environment, and the incident magnetic field.
chi (χ) - The torsion angle centered on the alpha and beta carbon atoms of an amino acid that defines the orientation of the amino acid side chain relative to the main chain / backbone of a protein.
chirality - Spatial arrangement of points or atoms in a molecule that is non-superposable on its mirror image.
chromosome - Compact structures in the cell nuclei of eukaryotic organisms that use specialized proteins to package and manage all cellular DNA.
chromosome ideogram - A schematic representation of chromosomes that shows their relative sizes and banding patterns as well as the positions of specific genes.
clashscore - For every 1000 atoms in a structure (including hydrogens either modeled or added during the calculation), the number of atoms determined by MolProbity to be experiencing interatomic overlaps (clashes) greater than 0.4 Å.
coenzyme - A non-protein organic compound required for the activity of an enzyme; a specific type of cofactor.
cofactor - A non-protein chemical compound or metal ion required for the activity of an enzyme.
components (of structure) - The contents of a structure, including (bio)polymers, ligands, water (i.e., solvent), and ions.
configuration - The specific arrangement of covalent bonds and chiral centers in a molecule. Alteration of molecular configuration requires the breaking and/or formation of covalent bonds.
conformation - The specific arrangement of atoms in a molecule in a spatio-temporal context. Alteration of conformations requires rotations about bonds and changes in bond and torsion angles. Unlike changes in configuration, the breaking and/or formation of covalent bonds is not required.
conformer - The spatial arrangement of atoms in a molecule at a specific time or molecular context. Conversion between conformers neither changes chiral centers nor breaks or forms covalent bonds.
constraint (X-ray) - A fixed value assigned to a parameter during all or a portion of the refinement of an X-ray crystal structure. For example, early on in the refinement process, all occupancies might be constrained to values of 1.0. This constraint might be removed later on in the refinement process so that the occupancies could reflect alternative conformations.
cooperativity - Interaction between substrate/ligand binding sites of an allosteric enzyme where binding at one site changes the affinity of the binding sites on the other subunits by conformational change at the other binding sites.
coordinates - For the PDB these are the x, y, z positions for every atom present in a structure. See also cartesian coordinates.
correlation spectroscopy - An NMR technique that provides information about protons that are spin-spin coupled to help determine protein structure.
COSY - See correlation spectroscopy
covalent bond - A bond that exists between two atoms if they share electrons between them. One pair of electrons forms a single bond, whereas two pairs form a double bond.
CPK colors - A convention of colors used for atoms in a visualization representation where hydrogen is white, oxygen is red, nitrogen is blue, carbon is black (or grey) etc.
CPK model - A spacefilling three-dimensional representation of a molecule. Given the centers of the molecule's atoms and their relative van der Waals radii, a spherical CPK-type representation of a molecule can be built. This representation was proposed by Corey, Pauling, and Koltun, hence the name CPK.
cryo-electron microscopy - Electron microscopy performed at a very low temperature in order to preserve the sample in vitreous ice. This method also provides the ability to examine large biomolecular structures and complexes in close to physiological states, without the use of heavy atoms in the sample. Learn more about Methods for Determining Atomic Structures.
cryo-EM - See cryo-electron microscopy
crystal - A homogeneous solid in which molecules are arranged in a regular, repeating network.
crystallization - The formation and growth of a crystalline solid where molecules are positioned in an ordered lattice arrangement, by slowly altering the solubility of the molecules e.g., by changing temperature (cooling), evaporating a solution, or precipitating the molecule from solution by adding alcohols, salts etc.
crystallization components - The components of the solution(s) used to produce crystals. They include the molecule to be crystallized and precipitants.
crystallography - See X-ray crystallography
CSM - Computed Structure Models
cyclic symmetry - A type of symmetry where N identical subunits in an assembly are related to one another by successive rotations of (360/N)° around a single (N-fold rotational) axis. See Symmetry Resources in the PDB.
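The successive (360/N)° rotations described in the entry above can be illustrated with a short sketch. It assumes, purely for illustration, that the N-fold axis coincides with the z axis; the function name and the example point are hypothetical.

```python
import math


def cyclic_copies(point, n):
    """Return the n copies of an (x, y, z) point related by C_n symmetry,
    assuming the n-fold rotational axis is the z axis."""
    x, y, z = point
    copies = []
    for k in range(n):
        # The k-th copy is rotated by k * (360 / n) degrees about z.
        angle = math.radians(k * 360.0 / n)
        copies.append((
            x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle),
            z,
        ))
    return copies


# Four-fold (C4) example: one subunit position and its three symmetry mates.
for copy in cyclic_copies((1.0, 0.0, 0.5), 4):
    print(tuple(round(value, 3) for value in copy))
```

The first copy (k = 0) is the original point, so an assembly with N subunits is generated from one subunit and N - 1 rotations.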
cysteine (Cys, C) - Alpha amino acid with an uncharged side chain containing a sulfhydryl group. Cysteine can form covalent disulfide links within protein structures.
cytosine (C) - A nitrogenous base (pyrimidine) that occurs in DNA and RNA nucleotides and pairs with guanine (in DNA or RNA) through three hydrogen bonds.
dalton (or Da) - A unit of mass equal to one twelfth of the mass of an unbound neutral carbon-12 atom. Daltons are used to specify the mass of molecules.
data (NMR) - In NMR spectroscopy, the initial data collected include high-resolution multidimensional NMR spectra which reveal correlations between atoms that lie either within a few bonds of each other (J coupling) or within a short distance of each other (NOE coupling). NMR resonances are then assigned to specific atoms in the molecule and couplings are assigned to pairs or groups of atoms. These couplings are then used to produce a list of restraints that specify that particular atoms in the final model be near each other. Finally, a computer program is used to produce a model of the molecule which fits all of the restraints.
data (X-ray) - In X-ray crystallography, the initial data include the measured positions and intensities of the reflections in the diffraction pattern produced by the macromolecular crystal. The relative phases of the waves that produced the diffraction pattern are also determined. An electron-density map of the molecule is then computed from the positions, intensities and phases of all of the reflections. A model of the molecule contained within the crystal is then built to fit the electron-density map.
density modification - Density modification is a tool used for improving phase estimates, and therefore improving electron density maps, when a set of experimental structure factor magnitudes and some initial phase estimates are available. Often this includes calculation of phases for previously unphased reflections.
deoxyribonucleic acid - See DNA
deoxyribose - Type of sugar molecule found in DNA
diffraction (X-ray) - When a macromolecular crystal is irradiated with an X-ray beam, the electrons surrounding each atom in the molecule bend, or diffract, the X-ray beam. As the beam exits the crystal, the scattered X-rays produce spots, or a diffraction pattern, on a photographic film or detector.
diffractometer - An instrument for measuring and analyzing diffraction data produced by the scattering when a beam of radiation or particles (such as X-rays or neutrons) interacts with a material.
dihedral angle - The spatial relationship of four sequentially connected atoms (A, B, C, D), used as a measure of conformation about the bond between atoms B and C. It is defined by the angle between the plane containing atoms A, B, and C and the plane containing atoms B, C, and D. Another way to measure it is to look down the bond between B and C and measure the angle between the bonds A-B and C-D.
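The plane-based definition in the entry above can be turned into a small calculation: take the normals of the two planes and measure the signed angle between them about the B-C bond. The sketch below uses only the standard library; the function and helper names are illustrative, and the coordinates in the usage example are made up.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dihedral_angle(a, b, c, d):
    """Signed dihedral angle, in degrees, for four connected atoms A-B-C-D."""
    b1 = _sub(b, a)
    b2 = _sub(c, b)
    b3 = _sub(d, c)
    n1 = _cross(b1, b2)  # normal to the plane containing A, B, C
    n2 = _cross(b2, b3)  # normal to the plane containing B, C, D
    b2_length = math.sqrt(_dot(b2, b2))
    b2_unit = tuple(value / b2_length for value in b2)
    x = _dot(n1, n2)
    y = _dot(b2_unit, _cross(n1, n2))
    return math.degrees(math.atan2(y, x))


# Made-up coordinates with the B-C bond along z.
print(dihedral_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Using `atan2` rather than `acos` keeps the sign of the angle, so rotations in opposite directions about the B-C bond are distinguished.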
dihedral symmetry - A type of symmetry where 2N identical subunits in an assembly are related to one another by successive rotations of (360/N)° around an axis plus additional 180° rotations around N two-fold axes located perpendicular to the N-fold rotational axis. See Symmetry Resources in the PDB.
dimer - Composed of 2 units. In the context of a protein, this may refer to an assembly of 2 polymer chains or subunits.
disordered regions (NMR) - In NMR structures, disordered regions of molecules have high variability in their location so there may be large differences in these regions between the models of the ensemble. The implication is that the atom positions in these regions are very uncertain.
disordered regions (X-ray) - In crystal structures disordered regions of molecules appear as weak regions of electron density and as regions with high temperature factors. Atom positions in these regions are very uncertain.
disulfide bond - Covalent bond formed between the sulfur atoms of two cysteine residues in a protein. Disulfide bonds often stabilize protein structure or link multiple proteins in a complex. See a video.
disulfide bridge - See Disulfide bond
DNA - DNA or deoxyribonucleic acid is the molecule that encodes genetic information necessary for all cellular functions. DNA is composed of the sugar deoxyribose, phosphate groups, and the bases adenine (A), thymine (T), guanine (G) and cytosine (C). The DNA molecule is normally a double helix in which the two strands are bound through complementarity of the bases. Cs from one strand hydrogen bond to Gs of the other strand and vice versa. Similarly, As and Ts hydrogen bond to each other.
DOI - A digital object identifier (DOI) is a persistent identifier or handle used to identify objects uniquely, standardized by the International Organization for Standardization (ISO).
domain - A distinct area in a protein of specific tertiary structure that folds independently and may possess its own function.
E value - E value is a parameter that describes the number of hits one can "expect" to see by chance when searching a database of a particular size. It decreases exponentially as the Score (S) of the match increases. Essentially, the E value describes the random background noise.
EC number or Enzyme Commission number - EC number is a numerical classification scheme for enzymes, based on the chemical reactions they catalyze. As a system of enzyme nomenclature, every EC number is associated with a recommended name for the respective enzyme.
effector - In biochemistry, an effector is a molecule that binds to a specific protein and regulates the latter’s biological activity. An effector molecule acts as a ligand that is capable of increasing or decreasing the activity of that protein. It can also regulate the activity of certain mRNA molecules (e.g. riboswitches), gene expression, and cell signaling. The main types of effectors are the activators and the inhibitors.
electron microscopy - Imaging technique that uses a beam of high energy electrons to produce a magnified image of an object.
electron-density map - A depiction of the electron cloud surrounding a molecule. In crystallography, an atomic model is built to fit an electron density map. See also Fo-Fc and 2Fo-Fc maps.
electrostatic interaction - A non-bonded interaction between protein molecules. Electrostatic attraction between oppositely charged ions holds them close together.
EM - Electron Microscopy
ensemble - A collection of alternative models that each fits a set of NMR experimental data. A single model may be chosen by averaging the atomic positions of the set of models and then minimizing the energy of the result (see minimized average structure) or choosing a representative model from the ensemble (see representative conformer).
entity - A unique molecule under investigation. An entity can be one of three types: polymer, non-polymer, and water. Entities are described only once, even in those structures that contain multiple observations of an entity. For example, in a DNA molecule containing two identical chains the entity would be one of the individual chains rather than the double helix.
entry (or PDB entry) - An entry in the PDB archive comprises all 3D coordinates, experimental data, and metadata associated with a specific macromolecular structure.
enzyme - A macromolecule that is capable of catalyzing, or speeding up, specific biochemical reactions without being permanently altered or consumed. The term enzyme usually refers to proteins, but some RNA molecules can also function as enzymes.
enzyme classification - A hierarchical classification of enzymes grouped according to function.
eukaryote - A eukaryote is an organism, such as a plant or an animal, whose cells contain a nucleus distinct and separate from the cytoplasm.
experimental model - Experimental models are derived from structure determination methods involving an actual sample of macromolecules, and are determined by X-ray diffraction, NMR spectroscopy, or electron microscopy. In contrast, theoretical models include homology models and models obtained from simulations of folding and molecular dynamics.
experimental structure - See experimental model
expression system - Expression systems facilitate the production of proteins or nucleic acids from genetic constructs (a gene inserted into a vector for introduction into a host cell) to produce a protein or an RNA (ribonucleic acid), either inside or outside a cell.
FASTA format - FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes.
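A minimal sketch of reading FASTA text follows. It relies on the standard convention that each record starts with a description line beginning with ">", followed by one or more lines of single-letter sequence codes. The function name and the two example records are made up for illustration; real files may need extra handling (comments, lowercase, very large inputs).

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next description line.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical records, not taken from any database.
example = """>seq1 hypothetical peptide
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical DNA fragment
ATGCGTAC
"""

for header, sequence in parse_fasta(example):
    print(header, len(sequence))
```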
fatty acid - Aliphatic monocarboxylic acids found in esterified form in an animal/vegetable fat, oil, and wax.
Fo-Fc - This electron density map is also called the omit map and is used to show what has been overfit or not accounted for by the model.
Fo-Fc map - An Fo – Fc or mFo – DFc map is calculated by subtracting the structure factor amplitudes calculated from the current model (Fc) from the observed structure factor amplitudes (Fo). Thus, the Fo – Fc map shows where the model and experimental data differ i.e., regions that were present in the observed map but not accounted for by the model.
fold - Families of proteins are classified based on their secondary structural units of helices and sheets. Each different topology is considered as a fold.
Fourier transformation - A mathematical process that performs a conversion between real space and reciprocal space. Short distances in real space become long distances in reciprocal space, and vice versa. Using a Fourier transformation, a computer program can convert an X-ray diffraction pattern into an electron density map.
fractional coordinates - Coordinates in which the positions of atoms are described in terms of the sides of a unit cell.
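To show how fractional and Cartesian coordinates relate, the sketch below converts a fractional position into Cartesian coordinates from the unit cell lengths (a, b, c) and angles (alpha, beta, gamma). It assumes one common orientation convention (a along x, b in the xy plane); other conventions exist, and the function name and example cell are illustrative only.

```python
import math


def fractional_to_cartesian(frac, a, b, c, alpha, beta, gamma):
    """Convert fractional coordinates (u, v, w) to Cartesian (x, y, z).

    Cell lengths a, b, c are in any length unit (e.g., angstroms); angles
    are in degrees. Assumed convention: a along x, b in the xy plane.
    """
    u, v, w = frac
    cos_a = math.cos(math.radians(alpha))
    cos_b = math.cos(math.radians(beta))
    cos_g = math.cos(math.radians(gamma))
    sin_g = math.sin(math.radians(gamma))
    # Unit cell volume divided by a * b * c.
    volume_factor = math.sqrt(
        1.0 - cos_a ** 2 - cos_b ** 2 - cos_g ** 2 + 2.0 * cos_a * cos_b * cos_g
    )
    x = a * u + b * cos_g * v + c * cos_b * w
    y = b * sin_g * v + c * (cos_a - cos_b * cos_g) / sin_g * w
    z = c * volume_factor / sin_g * w
    return (x, y, z)


# Hypothetical orthorhombic cell: the center of the cell in Cartesian terms.
print(fractional_to_cartesian((0.5, 0.5, 0.5), 10.0, 20.0, 30.0, 90.0, 90.0, 90.0))
```

For a cell with all angles equal to 90°, the conversion reduces to multiplying each fractional coordinate by the corresponding cell length.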
functional genomics - The study of the expression of genes and their functions in different biological processes, including how they contribute to or cause disease.
Gaussian surface (representation) - A representation of biological macromolecules (proteins and nucleic acids), where the atoms are shown as Gaussian functions to create a 3D volume representing the overall shape of the molecule.
GenBank - The NIH sequence database that contains an annotated collection of all publicly available DNA sequences. Each entry includes a description of the sequence, information on the source organism, protein translation for coding regions, and bibliographic references.
gene - A segment of DNA that encodes a specific protein or a protein subunit.
gene expression - The process by which a gene's coded information is translated into proteins.
Gene Ontology (GO) - The Gene Ontology (GO) is a bioinformatics initiative to unify the representation of gene and gene product attributes across all species.
genetic code - The genetic code is the set of rules used by living cells to translate information encoded within genetic material (DNA or mRNA triplet sequences, or codons) into proteins.
genetics - The scientific study of heredity.
genome - All of the genetic material in the chromosomes of a particular organism.
genomics - Interdisciplinary field of biology linked to structure, function, evolution and mapping of genomes.
genotype - All of the genes possessed by a particular individual.
genotyping - A procedure by which DNA sequence differences between individuals are used to study the differences in the genetic make-up between them.
global symmetry - Identical subunits composing a complete assembly are arranged such that their positions and orientations can be directly related to one another via a defined set of rotations and/or translations. See Symmetry Resources in the PDB.
glutamic acid (Glu, E) - Alpha amino acid with a charged (acidic) side chain containing a carboxyl group. The ionized form of the amino acid is known as glutamate.
glutamine (Gln, Q) - Alpha amino acid with an uncharged, polar and hydrophilic side chain containing an amide group.
glycine (Gly, G) - Alpha amino acid with a hydrogen atom as its side chain. Glycine is weakly hydrophobic and is often located at bends or folds in proteins.
glycoprotein - Protein molecules that have one or more sugars (glycans) covalently attached to them. These proteins are involved in many physiological functions including immunity.
glycosidic bond - Covalent bond linking sugars in polysaccharides.
glycosylation - The addition of carbohydrate groups to particular residues on a protein.
guanine (G) - A nitrogenous base (purine) that occurs in DNA and RNA nucleotides and pairs with cytosine (in DNA or RNA) through three hydrogen bonds.
hairpin turn - Bends in the peptide backbone that often occur between two antiparallel beta-strands or helices. Hairpin turns usually contain proline residues.
halogen bonds - A non-covalent interaction that forms when there is evidence of a net attractive interaction between an electrophilic region associated with a halogen atom in a molecular entity and a nucleophilic region in another, or the same, molecular entity. See a video.
helical symmetry - A type of symmetry where an indeterminate number of identical subunits are related to one another by identical rotations around and translations parallel to a central ("helical") axis. See Symmetry Resources in the PDB.
helix-turn-helix motif - Helix-turn-helix (HTH) motifs are domains found in most DNA binding proteins. They consist of 2 alpha helices, a recognition helix that binds to the DNA and a stabilizing helix, separated by a short loop.
hertz - A unit of frequency defined as one cycle per second. It is often used to measure the frequency of electromagnetic radiation.
heterodimer - An assembly composed of 2 different molecular components. For example, a protein heterodimer is composed of 2 different protein chains.
Heteronuclear Multiple Quantum Coherence (HMQC) - An NMR technique that provides information about the correlation between directly-bonded protons and neighboring atoms (commonly, 13C and 15N) to help determine protein structure.
Heteronuclear Single Quantum Correlation (HSQC) - An NMR technique that provides information about the correlation between directly-bonded protons and neighboring atoms (commonly, 13C and 15N) to help determine protein structure.
heterotetramer - An assembly composed of 4 different molecular components. For example 4 different protein chains or subunits.
heterotrimer - An assembly composed of 3 different molecular components. For example 3 different protein chains or subunits.
histidine (His, H) - Alpha amino acid with a polar side chain that contains an imidazole group that can be ionized to acquire a positive charge.
homodimer - An assembly composed of 2 identical molecular components. A protein homodimer is composed of 2 identical protein chains or subunits.
homology - In biology, homology means having similar sequence or structure of protein molecules due to shared ancestry. It points to a potentially common 3D shape and function.
homotetramer - An assembly composed of 4 identical molecular components. A protein homotetramer is composed of 4 identical protein chains or subunits.
homotrimer - An assembly composed of 3 identical molecular components. A protein homotrimer is composed of 3 identical protein chains or subunits.
hydrogen bond - A bond between a hydrogen atom (that is covalently attached to one electronegative atom) and another electronegative atom. Hydrogen bonds are important in many biological interactions - e.g., stabilizing alpha helices and beta sheets; base pairing between complementary strands of nucleotides, and many other interactions. See a video.
hydrophilic - Water-loving property of functional groups, amino acids, molecules, or domains that prefer to be in an aqueous environment. Hydrophilic groups tend to make favorable interactions with water, usually through hydrogen bonds.
hydrophobic - Water-repelling property of functional groups, amino acids, molecules, or domains that prefer to be in a non-aqueous (lipid) environment because they cannot make favorable interactions with water.
hydrophobic interactions - Non-bonded interactions between hydrocarbons in aqueous solutions so that the hydrophobic groups in solutes aggregate together to form inter- or intramolecular interactions and exclude interactions with water molecules.
Icosahedral symmetry - A type of symmetry where sixty identical subunits are distributed on an icosahedron, related to one another via 6 five-fold rotational axes (through icosahedral vertices), 10 three-fold rotational axes (through icosahedral faces), and 15 two-fold rotational axes (through icosahedral edges). See Symmetry Resources in the PDB.
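The subunit counts quoted for point-group symmetries can be checked with simple arithmetic: each n-fold axis contributes n - 1 rotations besides the identity, and the total number of rotations equals the number of identical subunits. A minimal sketch follows; the function name is illustrative, and the second call uses the axis counts given in the octahedral symmetry entry.

```python
def rotation_count(axes):
    """Count the rotations in a point group from {fold: number_of_axes}.

    Each n-fold axis contributes n - 1 non-identity rotations;
    the identity operation is added once.
    """
    return 1 + sum((fold - 1) * count for fold, count in axes.items())


icosahedral = rotation_count({5: 6, 3: 10, 2: 15})
octahedral = rotation_count({4: 3, 3: 4, 2: 6})
print(icosahedral, octahedral)  # 60 24
```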
immunoglobulin (Ig) - These are protein assemblies produced by B lymphocytes to specifically bind to antigens, as part of an immune response. There are several types of immunoglobulins, of which IgG is the most common type, composed of two identical light chains and two identical heavy chains.
in situ - Maintenance or study of an organism within its native environment.
in vitro - A biological or chemical experiment that is performed outside an organism, for example in a test tube.
in vivo - A biological or chemical experiment that is performed within a cell or organism.
InChI - InChI is the International Chemical Identifier developed by IUPAC, the International Union of Pure and Applied Chemistry.
InChI Key - InChIKey is a compact chemical identifier derived from InChI. The InChIKey is always exactly 27 characters long and is not human-readable.
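As an illustration of the fixed 27-character layout, the sketch below checks whether a string has the shape of a standard InChIKey (14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and 1 uppercase letter). It is a format check only and cannot confirm that a key corresponds to a real compound; the function name and sample key are used purely for illustration.

```python
import re

# Layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter (27 characters).
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text):
    """Return True if text has the InChIKey layout (format check only)."""
    return INCHIKEY_PATTERN.fullmatch(text) is not None


sample = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"  # sample key used for illustration
print(len(sample), looks_like_inchikey(sample))  # 27 True
```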
inhibitor - A molecule that blocks or suppresses the biological activity of another molecule.
Instance - A particular occurrence of an entity. See also Organization of 3D Structures in the PDB.
International Chemical Identifier (InChI) - It is a textual identifier for a chemical substance. It encodes the molecular description in a standard way. See also InChI and InChI Key.
ion - An atom or group of atoms that has either gained or lost electrons and as a result bears an electrical charge. Positively charged ions are called cations and negatively charged ions are called anions.
ionic bond - A bond between two oppositely charged atoms formed by electrostatic attraction. See a video.
ionic interactions - Interactions between two groups of opposite charges.
isoleucine (Ile, I) - Alpha amino acid with a branched aliphatic, non-polar, and hydrophobic side chain.
isomer - One of two or more molecules with identical chemical compositions, but differing in the arrangement of their atoms. Two isomers may have different physical, chemical, and biological properties.
isotope - Isotopes are variants of a particular chemical element. They contain equal numbers of protons but different numbers of neutrons in their nuclei, and hence differ in relative atomic mass but not in chemical properties. Some isotopes are radioactive. Isotopes are often used to trace atoms or molecules in a metabolic pathway. In NMR, only isotopes whose nuclei have the appropriate magnetic properties (e.g., 1H, 13C, 15N) are useful.
J Coupling - Interaction observed in NMR between nuclear spins that is mediated through the bonds in the molecule.
JSON format - JSON is an open standard file format, and data interchange format, that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and array data types. It has a diverse range of applications, such as serving as a replacement for XML.
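A small sketch of attribute-value pairs and arrays in JSON, using Python's standard `json` module. The record and its field names are hypothetical and do not follow any particular PDB schema.

```python
import json

# Hypothetical record; field names and values are illustrative only.
record = {
    "entry_id": "XXXX",
    "method": "X-RAY DIFFRACTION",
    "resolution": 2.1,
    "chains": ["A", "B"],
}

text = json.dumps(record, indent=2)   # Python object -> JSON text
restored = json.loads(text)           # JSON text -> Python object
print(restored["chains"])  # ['A', 'B']
```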
kilodalton - A unit of mass equal to 1,000 daltons (see Dalton)
kinase - Kinase is an enzyme that catalyzes the transfer of phosphate groups from high-energy, phosphate-donating molecules to specific substrates. This process is known as phosphorylation.
label (Mol*) - A way to identify an amino acid or nucleic acid residue in a structure. Residues can be labeled by both their type and their position within the macromolecule's polymer sequence.
Laue diffraction - One of the oldest and most common X-ray crystallography techniques. Diffraction of X-rays occurs when a beam of X-rays falls on a crystal. The scattered X-rays then produce spots, or a diffraction pattern, on a photographic film. In Laue diffraction, the X-rays are polychromatic and the crystal is stationary throughout the exposure. In monochromatic diffraction, the X-rays have a single X-ray wavelength and the crystal is rotated during the exposure.
leucine (Leu, L) - Alpha amino acid with an aliphatic, non-polar, and hydrophobic side chain.
ligand - A molecule that binds specifically to another molecule (usually a protein or nucleic acid) to form a complex. A ligand can be another protein, but is often a small molecule.
lipid - A substance that is soluble in non-polar solvents (and insoluble in water). For example, fats and steroids.
local symmetry - Identical subunits composing a portion of an assembly are arranged such that their positions and orientations can be directly related to one another via a defined set of rotations and/or translations. See Symmetry Resources in the PDB.
Luzzati plot (X-ray) - A way of estimating the overall or average precision of atom positions in a refined crystallographic model. The Luzzati plot provides an estimate of the upper limit of error in the atomic coordinates.
lysine (Lys, K) - Alpha amino acid with a charged basic side chain containing an amino group.
macromolecule - A large molecule that is a polymer of smaller units joined together, such as DNA, RNA, protein, or carbohydrate.
MAD - See Multi-wavelength Anomalous Dispersion/Diffraction
Mass spectrometry (MS) - Mass spectrometry is an analytical tool useful for measuring the mass-to-charge ratio (m/z) of one or more molecules present in a sample. These measurements can often be used to calculate the exact molecular weight of the sample components as well.
Matthews coefficient - The crystal volume per unit of protein molecular weight. This helps estimate the solvent content in a crystal.
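A worked sketch of this calculation: the coefficient is the unit cell volume divided by the total molecular weight in the cell, and the solvent fraction is commonly estimated as 1 - 1.23/V_M, which assumes a typical protein partial specific volume. The function names and the input numbers below are made up for illustration.

```python
def matthews_coefficient(cell_volume_a3, molecules_per_cell, mol_weight_da):
    """Crystal volume per unit of molecular weight, in cubic angstroms per dalton."""
    return cell_volume_a3 / (molecules_per_cell * mol_weight_da)


def solvent_fraction(vm):
    """Approximate solvent fraction, assuming a typical protein density."""
    return 1.0 - 1.23 / vm


# Illustrative values: 200,000 cubic angstrom cell, 4 molecules of 20 kDa each.
vm = matthews_coefficient(200000.0, 4, 20000.0)
solvent = solvent_fraction(vm)
print(vm, round(solvent, 3))  # 2.5 0.508
```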
Medical Subject Headings (MeSH) - MeSH is a hierarchical set of keywords developed by the National Library of Medicine. It is used for indexing, cataloging, and searching of biomedical and health-related information.
megahertz - A unit of measurement equal to 1,000,000 hertz (see hertz).
messenger RNA - See mRNA
metal coordination - Coordinate covalent bonds are formed when the shared electrons in the bond are contributed by a donor (Lewis base) to an electron acceptor (Lewis acid), often a metal ion (or atom) in the center of the coordination complex. Transition elements (e.g., Cu, Zn, Ni, Fe, Co) and a few other elements can form coordination bonds with biological macromolecules.
metalloprotease - A class of protease characterized by a metal ion at its active site.
methionine (Met, M) - Alpha amino acid with a sulfur-containing, non-polar, hydrophobic side chain.
microbiology - The science of microscopic organisms e.g., bacteria, viruses, fungi.
minimized average conformation/structure - A single model derived from an ensemble set of NMR structures by averaging the atomic positions of the set of models and then minimizing the energy of the average set of atoms.
minimized average structure - See minimized average conformation
mmCIF - The macromolecular Crystallographic Information File (mmCIF) is a file format that is used to describe the features of a macromolecular structure. The mmCIF file is suitable for data archiving and data exchange. The initial CIF (Crystallographic Information File) format and dictionary were developed for archiving small molecule crystallographic experiments. In 1997, the dictionary was expanded to include data items relevant to macromolecular crystallographic experiments (PDBx/mmCIF). The PDBx/mmCIF file format and data dictionary are the basis of wwPDB data deposition, annotation, and archiving of PDB data from all supported experimental methods. This format overcomes limitations of the legacy PDB file format and supports data representing large structures, complex chemistry, and new and hybrid experimental methods. This format is also more amenable to automated searching for metadata in a given entry or throughout the entire PDB archive, relative to the legacy PDB format.
molecular biology - The study of nucleic acids and proteins in relation to the mechanisms of gene expression (how products are made from genes), regulation (activation/deactivation) and manipulation, and also DNA replication.
molecular chaperone - A protein that assists in the folding of a second protein into its active conformation.
Molecular Replacement (MR) - A technique used in X-ray crystallography that uses the structure of a molecule whose structure is already known (initial model) to determine the structure of the molecule of interest. This technique allows electron density maps to be calculated using the model combined with the experimental data. These maps reveal differences between the initial model and the structure of interest. The model therefore is changed to fit the calculated maps, and thereby the structure of the molecule of interest is obtained.
molecular surface (Representation) - A representation of biological macromolecules (proteins and nucleic acids), where a probe sphere is used to calculate a solvent-excluded surface that shows overall molecule shape, surface features, and accessible binding pockets.
molecule - The smallest unit of a non-ionic compound that has the characteristics of that substance. A molecule consists of one or more atoms covalently bound together.
monomer - Composed of 1 unit. In the context of a protein - this may refer to a single chain in an assembly.
monosaccharide - A single saccharide unit or carbohydrate building block that cannot be decomposed to a simpler sugar.
motif - A motif may refer to a sequence of amino acids or a small structural element, defined by a combination of secondary structures that has a specific topology and is organized into a characteristic three-dimensional structure with a specific structural and/or functional role.
mRNA - Messenger RNA is an RNA molecule synthesized by a cell, based on genetic information encoded by DNA, that directs the ribosomes of the cell to produce a specific protein. The DNA forms the permanent copy of the instructions for all the proteins a cell is to produce, whereas the mRNA molecules are short-lived copies, each containing the instructions for only one specific protein. Ribosomes read the information from the mRNA, and then synthesize a number of molecules of the correct protein.
Multi-wavelength Anomalous Dispersion/Diffraction (MAD) - A phasing technique used in X-ray crystallography where diffraction data is collected from a crystal containing an anomalous scatterer at different wavelengths, thereby relieving crystallographers from having to make several different metal-containing isomorphous crystals.
multidimensional (three- and four-dimensional) NMR - Together with advances in isotope labeling techniques of biological macromolecules, this technology has the advantage of resolving severe overlaps in 2D spectra and providing additional structural and dynamic information.
multimeric structure - A molecular structure composed of several identical or different polypeptide chains or subunits, held together by noncovalent (weak) bonds.
Multiple Isomorphous Replacement (MIR) - A phasing technique used in X-ray crystallography where diffraction data is collected from multiple heavy atom soaked crystals that are isomorphous to the native crystal.
native conformation - The physiological conformation (shape) of a protein or the conformation it has in its natural biological environment.
NDB - See Nucleic Acid Database
NMR - Nuclear Magnetic Resonance Spectroscopy is an experimental tool in which a magnetic field is applied to magnetically active nuclei (e.g., 1H, 13C, 15N), which are then perturbed by electromagnetic (e.g., radio frequency) pulses. Measurements of the responses of the magnetically active nuclei provide rich structural, dynamic, and kinetic information about the molecule's environment. This information is used to determine the structure of the molecule and study its overall shape and movements. Learn more about Methods for Determining Atomic Structures.
NMR ensemble - A set of structures, all of which are consistent with the same experimentally derived restraints.
NOESY - See Nuclear Overhauser Effect SpectroscopY
non-polar molecule - An (electrically neutral) molecule that has its electrons uniformly distributed through the molecule. Non-polar molecules tend to have low solubility in water, but high solubility in oils.
Nucleic Acid Database (NDB) - A searchable database of structures of nucleic acids and their complexes, with links to the corresponding entries in the Protein Data Bank.
Nuclear Magnetic Resonance (NMR) - See NMR
Nuclear Overhauser Effect SpectroscopY (NOESY) - An NMR technique that provides information about the distances between non-bonded hydrogen nuclei to help determine protein structure.
nucleic acid - A large molecule composed of nucleotide subunits. There are two types of nucleic acid, ribonucleic acid (RNA) and deoxyribonucleic acid (DNA).
nucleoside - Nucleotide precursor containing a nitrogenous base and a sugar molecule, but not a phosphate group.
nucleotide - The subunit of DNA or RNA that contains a nitrogenous base, a sugar, and a phosphate group. In DNA the bases may be adenine, cytosine, guanine, or thymine while in RNA it may be adenine, cytosine, guanine, or uracil. The sugar in DNA is deoxyribose, while that in RNA is ribose. Free nucleotides that have not been incorporated into DNA or RNA polymers may have up to three phosphate groups attached to a specific location in the sugar.
nucleus (cell) - A membrane-bound organelle that contains the cell's DNA.
occupancy (X-ray) - A measure reported for each atomic position determined by X-ray crystallography. Occupancy of an atom is the fraction of molecules in the crystal in which the atom is actually present in the position specified (model coordinates). If all molecules in a crystal have identical conformations, then occupancy for each atom is 1.00. If two or more conformations are observed for a small portion of the molecule in the crystal, the occupancy of the atoms making up that region may be less than 1.00. For example, a part of an amino acid side chain may occupy two alternate positions, so the occupancy of each position may be reported as 0.5. The reported occupancies of all distinct positions of an atom should sum to at most 1.00.
octahedral symmetry - A type of symmetry where twenty-four identical subunits are distributed on an octahedron, related to one another via 3 four-fold rotational axes (through octahedral vertices), 4 three-fold rotational axes (through octahedral faces), and 6 two-fold rotational axes (through octahedral edges). See Symmetry Resources in the PDB.
oligomer - Composed of a few units. In the context of a protein - this may refer to an assembly of a few polymer chains or subunits.
oligosaccharide - Composed of a small number of monosaccharide units. Oligosaccharides are frequently found attached to secreted proteins and those found on the surface of cells.
omega - Torsion angle (ω) measures the rotation around the bond between the nitrogen and the carbonyl carbon in a peptide bond. It is generally close to 180°, or, less often, near 0°.
orientation (Mol*, Representation) - Each polymer chain is represented as an oval or oblong shape to emphasize its orientation as well as the overall organization of a structure.
PDB - See Protein Data Bank
PDB format - A particular file format for describing structures.
PDBML/XML format - The Protein Data Bank Markup Language (PDBML) provides a representation of PDB data in XML format.
PDBx/mmCIF format - See mmCIF
pentamer - Composed of 5 units. In the context of a protein - this may refer to an assembly of 5 polymer chains or subunits.
peptide - A molecule composed of two or more amino acids linked by peptide bonds. Shorter peptides (containing 2 to ~15-20 amino acids) may be called oligopeptides, while long peptides are generally referred to as polypeptides or proteins.
peptide bond - A covalent bond linking the carboxyl (COOH) group carbon of one amino acid to the amino (NH2) group nitrogen of another amino acid. This amide bond is formed in a dehydration reaction, involving release of a water molecule, where an OH is removed from the carboxyl group and an H is removed from the NH2 group. Peptide bonds linking a series of amino acids form peptides and proteins.
Pfam - A database of protein families and domains. Protein families are classified based on sequence similarity, structural similarity, functional similarity and more.
pH - This is an abbreviation for power/potential of hydrogen and is a scale used to specify the acidity or basicity of a solution. Acidic solutions have a lower pH value (e.g., 0-6) while basic solutions have higher pH values (8-14). A neutral pH is ~7, while the pH of human blood is slightly basic (~7.4). Depending on the pH of the environment, amino acids and nucleotides may gain or lose protons (i.e., get protonated or deprotonated), resulting in the building blocks becoming charged (positive or negative). The charges may interact with each other to form ionic bonds or with ions and other molecules. Protonation and deprotonation of biomolecular building blocks may be influenced by the dielectric constant of their immediate neighborhoods - e.g., the ability of an amino acid on the surface of a protein to gain or lose a proton may differ from that of the same amino acid within a hydrophobic core. The PDB does not record pH values for individual amino acids or nucleotides. However, the pH of the experimental sample solution, crystallization buffers, etc. may be recorded and can be searched in the archive.
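The dependence of protonation on pH can be sketched with the Henderson-Hasselbalch relationship, which gives the fraction of a group that is protonated as 1 / (1 + 10^(pH - pKa)). The pKa of 6.0 used below is an assumed, approximate value for a histidine side chain; actual values shift with the local environment, as noted above.

```python
def protonated_fraction(ph, pka):
    """Fraction of an ionizable group that carries its proton at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


HIS_SIDE_CHAIN_PKA = 6.0  # assumed approximate value, for illustration only

at_pka = protonated_fraction(6.0, HIS_SIDE_CHAIN_PKA)
at_blood_ph = protonated_fraction(7.4, HIS_SIDE_CHAIN_PKA)
print(at_pka, round(at_blood_ph, 3))  # 0.5 0.038
```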
phenylalanine (Phe, F) - An amino acid with a large non-polar side chain. Phenylalanine is hydrophobic and is often found buried within the hydrophobic cores of protein structures.
phi - Torsion angle phi (φ) measures the rotation around the bond between the nitrogen and the alpha carbon of each amino acid.
phosphate - A chemical group composed of one phosphorus atom bound to four oxygen atoms. It can be transferred to proteins and other biological molecules for short-term energy storage, or else for regulatory or structural purposes.
phosphodiester bond - A bond between two sugar groups and a phosphate group. Such bonds link nucleotides together to form the sugar-phosphate-sugar backbone of DNA and RNA.
phosphorylation - The covalent attachment of a phosphoryl (PO3^2-) group to a hydroxyl group of an amino acid within a protein, or to a sugar. Phosphorylation is a critical step in many biochemical processes such as enzyme activity control. Phosphorylation is performed by enzymes that are referred to as kinases. (The reverse reaction, dephosphorylation, is performed by enzymes referred to as phosphatases.)
photosynthesis - The biochemical process by which green plants, algae, and some bacteria use the sun's energy to synthesize organic compounds (glucose) from carbon dioxide and water.
pI - The isoelectric pH is the pH of a solution at which the net charge of a protein becomes zero. It can be computed for a protein or peptide based on the pKa values of all component amino acids. In practice, molecules are at their lowest solubility in water when they are at pI. Knowing the pI of a protein can be helpful in purifying it using ion exchange chromatography. See also pH.
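A minimal sketch of how a pI might be computed from pKa values: the net charge is summed over all ionizable groups with the Henderson-Hasselbalch relationship, and the pH at which it crosses zero is located by bisection. The pKa table below is approximate and illustrative (published tables differ between sources), and the function names are hypothetical.

```python
# Approximate, illustrative pKa values; published tables differ between sources.
POSITIVE_PKA = {"N_TERM": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"C_TERM": 2.3, "D": 3.9, "E": 4.2, "C": 8.3, "Y": 10.1}


def net_charge(sequence, ph):
    """Net charge of a peptide (one-letter codes) at the given pH."""
    positive = ["N_TERM"] + [aa for aa in sequence if aa in ("K", "R", "H")]
    negative = ["C_TERM"] + [aa for aa in sequence if aa in ("D", "E", "C", "Y")]
    charge = sum(1.0 / (1.0 + 10.0 ** (ph - POSITIVE_PKA[g])) for g in positive)
    charge -= sum(1.0 / (1.0 + 10.0 ** (NEGATIVE_PKA[g] - ph)) for g in negative)
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """Find the pH of zero net charge by bisection between pH 0 and 14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid  # negatively charged: pI lies at lower pH
    return (low + high) / 2.0


basic_pi = isoelectric_point("KKKK")   # lysine-rich peptide
acidic_pi = isoelectric_point("EEEE")  # glutamate-rich peptide
print(basic_pi > acidic_pi)  # True
```

Bisection works here because the net charge decreases steadily as pH rises, so there is a single crossing point for typical sequences.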
pi-pi interaction - Interactions that occur between aromatic π systems binding either face to face or face to edge with one another.
plasmid - An autonomously replicating, extrachromosomal DNA molecule that is distinct from the normal bacterial genome. It is usually in the form of circular double-stranded DNA. Molecular biologists use plasmids as cloning vectors, in which they insert a gene of interest in order to introduce that gene into a recipient cell.
pLDDT - Predicted local distance difference test, a confidence measure for computed structure models.
point (Mol*, Representation) - Each subunit within a component (e.g., amino acids in a protein or atoms in a molecule) is represented as a single point.
Point Group - This is a group of symmetry operations whose axes intersect at a point and that, when applied to a molecule, leave it unmoved in space. For biomolecules, point groups may be a unit matrix (i.e., no symmetry), a 2, 3, or n-fold rotation (cyclic symmetry), or a combination of multiple intersecting rotational axes (e.g., 4-fold, 3-fold, and 2-fold axes in a cubic symmetry).
polar molecule - A molecule which has partial positive electric charge in one part of the molecule and a complementary partial negative charge in another part. Polar molecules tend to be soluble in water, but insoluble in oils.
polymer - A large molecule formed by joining small molecules (monomers) together.
polypeptide - A long string of amino acids linked by peptide bonds.
polysaccharide - A biological polymer composed of multiple sugar subunits.
post-translational modification - Alterations made to a protein after its synthesis in the cell (translation). For example, addition of phosphate groups (phosphorylation), carbohydrates (glycosylation) or fatty acid chains (acylation). These modifications impact protein-protein interactions and play critical roles in protein function and regulation of biological processes.
prenylation - A post-translational modification of a protein that involves the covalent attachment of prenyl groups to proteins. The prenyl groups include the 15-carbon farnesyl groups and the 20-carbon geranylgeranyl groups. Prenylation typically anchors the protein to the cell membrane from the intracellular side.
primary structure - The linear sequence of amino acids in a protein.
prokaryote - An organism whose cells do not contain a nucleus such as in bacteria or blue-green algae.
proline (Pro, P) - Alpha amino acid with a non-polar cyclic side chain and a secondary amine group. Proline is weakly hydrophobic and often seen in bends, turns, and at the ends of helical regions in a protein.
PROSITE - Database of protein domains, families and functional sites. Patterns and profiles to identify the protein domains, families etc. are also available.
prosthetic group - A tightly bound non-peptide organic or inorganic component of a protein (e.g., lipid, carbohydrate, phosphate group, metal ion, etc.). Prosthetic groups are involved in catalytic mechanisms and required for activity. They may also be called cofactors or coenzymes. For example, the heme group of hemoglobin.
protein - A large biological molecule composed of a long string of amino acids joined by peptide bonds. The order and identity of these amino acids is determined by the DNA sequence of the gene coding for the protein. Proteins perform a wide variety of functions and can serve as enzymes, antibodies, hormones, or structural components, among other functions. See a video about it.
Protein Data Bank - The single repository of experimentally determined structures of proteins, nucleic acids and complex biomolecular assemblies as managed by the Worldwide Protein Data Bank. Structures are freely available to download.
protein families - Sets of proteins that share a common evolutionary origin and typically have similar sequences, three-dimensional structures, and functions.
proteolysis - The process by which proteins are degraded into smaller polypeptides or individual amino acid residues via hydrolysis of peptide bonds. This reaction is often carried out by enzymes called proteases. For example, post-translational modifications of proteins such as removal of the N-terminal methionine in some proteins and cleavage of the signal peptides for maturation of secretory proteins, are both proteolysis events.
proteomics - Study of the entire protein complement of a cell, tissue, organ, biological fluid, or organism at a specific point in time.
pseudosymmetry - A form of approximate symmetry where similar but non-identical subunits are assembled in positions defined by the symmetry elements such as rotations and/or translations. See Symmetry Resources in the PDB.
psi - Torsion angle Psi (ψ) measures the rotation around the bond between the alpha carbon and the carbonyl carbon.
PubMed - A free resource supporting the search and retrieval of biomedical and life sciences literature. The database does not include full text journal articles, however links to the full text are often present when available from other sources, such as the publisher's website or PubMed Central (PMC).
purine - A nitrogen and carbon containing two joined ring structure (bicyclic nitrogenous base) found in DNA and RNA. Examples of purines include Adenine and Guanine (both commonly found in DNA and RNA).
putty (Mol*, Representation) - A representation of biological macromolecules (proteins and nucleic acids), where the protein backbone is shown as a putty- or clay-like strand whose diameter or cross-section is dependent upon a defined property of the structure.
pyrimidine - A nitrogen and carbon containing single ring structure (monocyclic nitrogenous base), found in DNA and RNA. Examples of pyrimidines include Thymine (commonly found in DNA), Cytosine (commonly found in DNA and RNA), and Uracil (commonly found in RNA).
Quasisymmetry - A form of approximate symmetry where multiple subunits are assembled within each position of a point group, while forming similar interactions with neighboring subunits. Most often found in icosahedral viruses, where multiples of 60 subunits are used to build larger spherical capsids. See also Quasisymmetry in Icosahedral Viruses.
quaternary structure - The arrangement of two or more protein subunits (or other biological polymers) interacting together to form higher order spatial arrangements.
R-factor - The residual, R-value or just R-factor is an indicator of model quality that measures the discrepancy between the amplitudes of the structure factors ("reflections") calculated from a crystallographic model and those from the original X-ray diffraction data.
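The conventional form of this statistic is R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), summed over reflections. The sketch below applies it to made-up amplitudes; the function name is illustrative, and the scale factor that refinement programs apply between observed and calculated amplitudes is omitted for simplicity.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over all reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must have equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Made-up structure factor amplitudes for three reflections.
observed = [100.0, 50.0, 25.0]
calculated = [90.0, 55.0, 25.0]
r = r_factor(observed, calculated)
print(round(r, 3))  # 0.086
```

The same function gives R-work or R-free when it is applied only to the working or the held-out subset of reflections.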
R-free - This is a special type of R-factor, calculated for a small subset of reflections that are not used in the refinement, and is used for cross-validation. A round of refinement typically decreases R-work; if the refinement was successful, then R-free should also decrease, even though the corresponding subset of reflections was not considered during the refinement. The R-free value is generally greater than R-work, but the two statistics should be similar. If they differ significantly, this indicates that the model may have been over-parameterized and may therefore be unreliable. Thus R-free is a useful validation metric to assess model quality and guide model refinement strategies. See also R-factor.
R-Observed - R-Observed or R-value Observed is the R-factor for all reflections that satisfy the resolution limits established for the refinement and meet the criterion for being "observed" {i.e., have a minimum value of amplitude to standard deviation ratio (F/sig(F))}. In contrast the R-work or R-value work excludes a subset of reflections that are used to calculate R-free or R-value free.
R-value Free - See R-free
R-value observed - See R-Observed
R-value work - See R-work
R-work - This R-factor is calculated for the majority of reflections in the data set (working data set) and is used in the refinement. A round of refinement typically decreases R-work. When examined along with the R-free it indicates the computation of a correct/true model for the data set. See also R-factor and R-free.
Ramachandran map - See Ramachandran plot
Ramachandran plot - A two-dimensional plot that shows the torsional angles phi (φ) and psi (ψ) of amino acid residues in a polypeptide. Pairs of φ-ψ angles are represented by points in the plot, whose axes each range from -180° to 180° for φ and ψ. The plot is used to assess protein structure quality and to identify areas in a structure with geometric problems and steric clashes.
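Each point in the plot comes from two torsion angles, and a torsion angle is defined by four consecutive atoms: C(i-1), N(i), CA(i), C(i) for φ and N(i), CA(i), C(i), N(i+1) for ψ. The sketch below computes a torsion angle from four 3D points using plain Python; the helper names and test coordinates are illustrative and not taken from any structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates with the central bond along the z axis.
quarter_turn = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(quarter_turn, 1))  # 90.0
```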
RCSB - The Research Collaboratory for Structural Bioinformatics operates the US data center for the global PDB archive. It is a founding member of the wwPDB and collaborates with multiple data resources including the Structural Biology Knowledgebase, Nucleic Acid Database, and EMDataBank. It makes macromolecular structural data, along with tools and resources to explore these structures, available at no charge to all data consumers without limitations on usage.
RCSB PDB 101 - Online portal for teachers, students, and the general public to promote education and exploration of protein and nucleic acid structure and function.
receptor - Molecules on a cell's surface that bind macromolecular or small molecule ligands (including hormones, neurotransmitters, drugs, toxins, virions, and intracellular messengers) to initiate a physiological response.
reciprocal space - A concept that is critical for the interpretation of X-ray diffraction patterns. The diffraction space is usually called the reciprocal space because of the inverse relationship between the real spacing in the object and the angle of diffraction. The further out you go in reciprocal space, the more the diffraction pattern becomes sensitive to objects that are in close proximity in "real space". This reflects the principle of Fourier transforms that linear scaling in one space corresponds to the inverse scaling in the other space. Short distances in real space become long distances in reciprocal space, and vice versa.
refinement (NMR) - The process of improving the agreement between the molecular model and the experimental NMR data. Simulated annealing or some other molecular dynamics program is used to fold the initial model using simulated forces that impose correct bond lengths and angles, provide weak van der Waals repulsion, and satisfy the restraints derived from NMR. The resulting model is examined for problems such as van der Waals collisions, and for large deviations from conformational restraints. Models that have such problems are eliminated. The whole simulated folding process is repeated until a number of models (an ensemble) are identified that make chemical sense and are consistent with the NMR-derived restraints.
refinement (X-ray) - The process of improving the agreement between the molecular model and the experimental diffraction data. Fitting models to the electron density map is an iterative process. The initial model contains phase and other errors, and certain measures are taken during the refinement process to correct these errors. For example, certain constraints and restraints may be imposed on the model during refinement. The success of refinement is indicated in part by a decreasing R-factor and the elimination of residues from unfavorable regions in the Ramachandran plot.
Regex - A REGular EXpression is a sequence of characters that defines a search pattern, used for advanced searching of large texts to find, extract, or manipulate specific patterns. For example, regular expressions are useful in bioinformatics for sequence motif searches.
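A short sketch of a sequence motif search with Python's standard `re` module. The pattern is the N-glycosylation sequon, commonly written in PROSITE-style notation as N-{P}-[ST]-{P}; the protein sequence is made up for illustration.

```python
import re

# N, then any residue except P, then S or T, then any residue except P.
MOTIF = re.compile(r"N[^P][ST][^P]")

sequence = "MKNGSAANPTQ"  # made-up sequence for illustration
matches = [(m.start(), m.group()) for m in MOTIF.finditer(sequence)]
print(matches)  # [(2, 'NGSA')]
```

The second asparagine (followed by proline) is not reported because the pattern excludes proline at that position. Note that `finditer` returns non-overlapping matches only.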
relaxation - Occurs when a chemical or biological system is perturbed by a rapid change in external conditions (e.g., temperature or pressure); the system response as it relaxes to the new equilibrium reveals the nature of the reaction kinetics.
representations - Different modes for depiction of atomic coordinates of a structure in a visualization software. (See also Ball and stick, Cartoons or Ribbons, Spacefill, Gaussian surface, putty).
representative conformer - A single model chosen from an ensemble (e.g., a set of NMR structures) as being representative of the set.
residue - Single molecular unit within a polymer. Can refer to a specific amino acid in a polypeptide, a sugar within a carbohydrate chain, or a nucleotide that forms the basic structural unit of nucleic acids.
resolution - Defined in X-ray crystallography as the highest resolvable peak in the diffraction pattern, whereas in cryo-EM resolution is based on a frequency space comparison of two halves of the data.
resonance assignments - Key step in NMR structure determination that involves linking resonance peaks to individual residues of a target protein sequence, which then allows for establishing intra- and inter-residue spatial relationships between atoms.
restraint (NMR) - The atomic distances and conformational angles that are determined from NMR couplings or correlations. These restraints specify which pairs of atoms are near each other through bonds or through space. The building of an NMR model complies with these restraints.
restraint (X-ray) - During refinement of a crystallographic model, a restraint or condition may be applied to specific parameters. For example, the condition that all bond lengths and bond angles fall within a certain range of values.
Revision History - The revision history shows the details of revisions made to an entry's mmCIF/PDBx file. These details have been publicly recorded in the category PDBX_VERSION since July 2011. These descriptions are more detailed than REVDAT records, which indicate what records have been changed in the PDB format file for the entry.
ribonucleic acid - See RNA
Ribonucleic acid (RNA) - A molecule consisting of four types of ribonucleotides: adenine (A), cytosine (C), guanine (G), and uracil (U); linked together by phosphodiester bonds. There are three major types of RNA: messenger RNA (mRNA), transfer RNA (tRNA), and ribosomal RNA (rRNA). The various types of RNA are responsible for a broad range of structural, genetic, and enzymatic functions critical to the proper functioning of cells and organisms.
ribose - Type of pentose sugar found in RNA
ribosomal RNA - Type of RNA that forms a component of ribosomes that is essential for messenger RNA decoding, peptide-bond formation, and ribosome dynamics during translation. See rRNA.
ribosome - Particle composed of ribosomal RNAs and proteins that is the site of protein synthesis in cells.
RMS deviations (NMR) - A measure for the precision of the NMR-derived models within an ensemble. A smaller value indicates similarity, while a higher value indicates divergence or difference in the structures.
ROESY - See Rotating frame nuclear Overhauser Effect Spectroscopy (ROESY)
Root-Mean-Square Deviation (RMSD) of atomic position - Most commonly used quantitative measure of the similarity between two superimposed atomic coordinates. A smaller value indicates similarity, while a higher value indicates divergence or difference in the structures.
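For N matched atoms, the RMSD is the square root of the mean squared distance between corresponding atoms. The Python sketch below shows that arithmetic only; it assumes the two coordinate sets have already been superimposed and list the same atoms in the same order. The function name rmsd and the sample coordinates are illustrative, not part of any PDB tool.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) positions.

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Two atoms; the second differs by 2 units along z, so RMSD = sqrt(4 / 2).
value = rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)])
```

The result is in the same length unit as the input coordinates (Å for PDB entries).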
rotamer (amino acid) - Conformation of a protein side chain arising from rotations around single bonds. Each amino acid has a set of preferred rotations that are defined by chi angles.
rotamer outlier - Amino acid with a rotamer conformation that lies outside the contours of the reference dataset, i.e., has an unusual rotamer conformation.
Rotating frame nuclear Overhauser Effect SpectroscopY (ROESY) - An NMR technique that provides information about the distances between hydrogen nuclei to help determine protein structure. The ROESY method is similar to NOESY and yields through-space correlations, but is better suited to mid-sized molecules with molecular masses of 1000 to 5000 Da.
rRNA - One of the structural and functional components of ribosomes.
RSCC - Real Space Correlation Coefficient, a measure of structure quality for X-ray structures
RSRZ outliers - The real-space R-value Z-score (RSRZ) is a resolution-dependent, residue-specific measure of the quality of fit between a part of an atomic model and the data in real space. RSRZ is calculated only for standard amino acids, as well as nucleotides in DNA and RNA chains. A residue is considered an RSRZ outlier if its RSRZ value is greater than 2.
saccharide - An organic compound containing sugar. Comes from the Greek word σάκχαρον (sákkharon), meaning "sugar".
SAD - See "Single-wavelength anomalous dispersion"
SAS - Small Angle Scattering
SCOP - The Structural Classification of Proteins (SCOP) database aims to provide detailed, comprehensive descriptions of the structural and evolutionary relationships between all proteins for which a structure is known.
secondary structure - Arrangement of polypeptides or polynucleotides into locally organized folded units, stabilized by the formation of intra-molecular hydrogen bonds. Main types of secondary structure include alpha helices, beta sheets, DNA double helix structures, and RNA stem-loops.
selenoproteins - Proteins that contain selenium in the form of selenocysteine, which is incorporated co-translationally. Selenoproteins perform a variety of cellular functions, including acting as a first line of defense against oxidants.
sequence alignment - A method for aligning sequences of protein or nucleic acid to identify regions of similarity that may indicate structural, functional, and/or evolutionary relationships.
serine (Ser, S) - Alpha amino acid with an uncharged, polar side chain containing a hydroxyl group.
side chain - Atom or group of atoms attached to the central carbon (alpha carbon) of an amino acid. Also referred to as R groups, side chains vary in size, shape, charge, hydrophobicity, and reactivity and thus are the defining feature of an amino acid.
side chain outliers - Amino acids whose side chain conformations are unusual for the given residue type for which the assessment is available.
SIFTS - Structure Integration with Function, Taxonomy and Sequence (SIFTS) is a residue-level mapping between the UniProt data resource and PDB.
signal transduction - The process involving the transmission of chemical, electrical, or biological signals into and within a cell.
similarity (homology) search - A method for predicting the structure and function of a newly sequenced gene. This method is based on the detection of significant sequence similarity to a protein of known structure and function, or of a sequence pattern that is characteristic of a protein family.
Simplified Molecular-Input Line-Entry system (SMILES) - Encodes the connectivity and stereochemistry of a molecule as a line of text.
Single-wavelength Anomalous Dispersion (SAD) - A method to determine the phases in protein crystallography. Crystals for this experiment either naturally include one or more atoms that display anomalous scattering (e.g., Se, Cl) or have such atoms artificially incorporated. A tuneable radiation source is required for recording high anomalous signals from intrinsic anomalous scatterers present in macromolecules. The dataset is measured at a single wavelength at which the X-ray fluorescence and the anomalous part f′ of the scattering factor of the natural or artificially incorporated anomalous scatterer are at or near a maximum. By looking at small differences between symmetry-related reflections in the diffraction pattern, the phases may be estimated directly.
Small-angle scattering (SAS) - An analytical technique that yields low-resolution information on the size and shape of complex macromolecules in solution by detecting their elastically scattered X-ray signals at small angles (typically 0.1°-10°).
solid state NMR - A special type of NMR to investigate the chemical composition, local structure, and dynamic properties of solids and semi-solids.
solution NMR - NMR experiments performed in a liquid or liquid crystal phase. See NMR.
solvent accessible surface area - The surface area of a biomolecule that is exposed to solvent in its native, folded state. This value helps estimate how much of the folded protein molecule is available to interact with other molecules or ligands.
solvent content (of crystal) - The fraction of the crystal volume occupied by solvent.
space group - This is a description in three-dimensional space that combines a crystal lattice (e.g., monoclinic, orthorhombic, tetragonal) with symmetry elements such as rotations and translations. Determination of the space-group of a crystal is an essential step in X-ray crystallography because it helps define the smallest portion of the crystal (asymmetric unit), whose structure must be determined in order to generate the structure of the entire crystal. There are 230 unique 3-dimensional space groups.
spacefill (Representation) - A representation of molecules where each atom is shown as a sphere having a radius proportional to size of that atom. This representation is useful for observing how much space a molecule fills.
stereochemistry - The study of how the shape of a molecule affects its chemistry. The branch of chemistry that involves the study of the relative spatial arrangement of atoms in a molecule. The study of stereochemistry focuses on stereoisomers, which are molecules with the same molecular formula and sequence of bonded atoms (constitution), but differ in the three-dimensional orientations of their atoms in space.
structural biology - A field of study that focuses on the determination of the three-dimensional structures of biomolecules in order to better understand their function.
structural genomics - A field of study dedicated to determining a large number of protein structures based on gene sequences. The goal is to be able to generate approximate structural models of any protein based on its nucleic acid sequence, and thereby infer its biological function.
structural parameters - Refers to the bond lengths, bond angles, and torsional angles in a model.
structure prediction - The use of algorithms to predict the secondary, tertiary and sometimes even quaternary structure of proteins from their sequences. Since the folded tertiary structure of a protein governs how it functions, being able to predict a protein's structure from its sequence is useful for inferring its biological function.
structure-based drug design - A methodology for designing new drugs that uses the three-dimensional structure of a target molecule to guide the design of the drug molecule.
substrate - A molecule that binds to an enzyme and is subsequently chemically modified.
subunit - One of the identical or non-identical protein molecules that make up a multimeric protein; for example, one of the ribonucleoprotein complexes that make up the ribosome.
sulfation (protein) - An irreversible post-translational modification of a protein that involves the addition of sulfate group to tyrosine side chains.
synchrotron - A large instrument used to generate very intense X-rays. Synchrotrons have been key in providing intense X-rays with tunable wavelengths, leading to shorter data collection time in X-ray diffraction experiments.
taxonomy - The scientific study of naming, defining and classifying groups of biological organisms based on shared characteristics.
temperature factor - A measure of how much the position of an atom deviates from that given in the atomic coordinates. This deviation is due to thermal motion and crystal imperfections. Temperature factors are also termed B-factors.
tertiary structure - The arrangement of the secondary structure elements of a protein arising from interactions of the side chains including the formation of disulfide bonds between cysteine residues as well as non-covalent forces.
tetrahedral symmetry - A type of symmetry where twelve identical subunits are distributed on a tetrahedron, related to one another via 4 three-fold rotational axes (through tetrahedral vertices and the faces opposite them) and 3 two-fold rotational axes (through tetrahedral edges). See Symmetry Resources in the PDB.
tetramer - Composed of 4 units. In the context of a protein - this may refer to an assembly of 4 polymer chains or subunits.
theoretical model - Theoretical models are produced by computational methods without the existence of an actual sample of the molecules whose structure is being solved. They may include homology models, models derived from simulations of folding and molecular dynamics, and "docking" experiments, in which researchers explore possible modes of interaction between experimental models (for example, protein-protein, protein-nucleic acid, or protein-ligand binding).
threonine (Thr, T) - Alpha amino acid with an uncharged, polar side chain containing a hydroxyl group.
thymine (T) - A nitrogenous base (pyrimidine) that occurs in DNA nucleotides and pairs with adenine through hydrogen bonds.
TOCSY - See TOtal Correlation SpectroscopY
torsion angle - See dihedral angle
TOtal Correlation SpectroscopY - An NMR technique for correlating all the spins that are mutually coupled. It is useful in identifying spin systems such as each individual amino acid residue. (Also called TOCSY).
transcription - The process of copying genetic information from the DNA by the synthesis of a complementary single strand of mRNA. It is performed by the RNA polymerase.
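The base-pairing rule behind this copying step can be sketched in a few lines of Python. This is a toy illustration, not a model of RNA polymerase: it assumes the DNA template strand is supplied written 3'→5', so the complementary mRNA can be read off position by position in the 5'→3' direction. The function name transcribe and the example template are hypothetical.

```python
# Pairing rules for building RNA on a DNA template: A->U, T->A, G->C, C->G.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Return the mRNA (5'->3') complementary to a DNA template written 3'->5'."""
    # A base outside A/T/G/C raises KeyError rather than being silently skipped.
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template_3_to_5.upper())

# Hypothetical template, for illustration only.
mrna = transcribe("TACGGTATT")
print(mrna)  # AUGCCAUAA
```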
transfer RNA - See tRNA
transgenic animals - Animals (usually mice) that have had a foreign gene inserted into their genome by recombinant DNA technology.
translation - The process in which the information encoded in mRNA is read and the corresponding sequence of amino acids is synthesized. This process is performed by tRNA and the ribosome (containing rRNA).
transmembrane protein - A protein that spans the cell membrane. These proteins may have hydrophobic regions that are compatible with the lipid nature of the membrane as well as hydrophilic regions that are compatible with the aqueous environment both inside and outside the cell.
trimer - Composed of 3 units. In the context of a protein - this may refer to an assembly of 3 polymer chains or subunits.
tRNA - RNA molecule that carries amino acids to the ribosome during protein synthesis.
tryptophan (Trp, W) - An alpha amino acid with an uncharged, non-polar side chain that contains a heterocyclic aromatic group. This amino acid is considered an essential amino acid (i.e., it cannot be synthesized in the human body and so must be obtained from the diet).
tyrosine (Tyr, Y) - An alpha amino acid with an uncharged side chain containing an aromatic phenolic group.
unexplained density (X-ray) - Experimentally determined electron density regions that are unaccounted for, after all known contents of the crystal have been located. These empty density features may represent ions present in the sample solution/buffer, reagents/detergents used in the purification or crystallization of the protein/complex, or small molecules/cofactors that co-purified with the protein(s) of interest.
UniProt - The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. Each UniProt entry has a specific identifier (UniProt ID), and contains the sequence of the protein along with various annotations of its features and descriptions of its known function(s).
unit cell - The minimum repeat unit within a crystal, which can be used to generate the entire crystal structure using only translation. Fractional coordinates are used to describe positions in the unit cell. These are fractions of a, b, and c (the axes), and they are analogous to the x, y, and z of Cartesian coordinates. Several unit cells can often be defined for the same arrangement. However, the convention is to choose the unit cell with the highest symmetry.
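To make the relationship between fractional and Cartesian coordinates concrete, the Python sketch below converts a fractional position into Cartesian coordinates from the cell lengths a, b, c and angles alpha, beta, gamma. It assumes one common orthogonalization convention (a along x, b in the xy plane); other conventions exist, and the function name fractional_to_cartesian is illustrative rather than taken from any particular package.

```python
import math

def fractional_to_cartesian(frac, a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Convert fractional (u, v, w) to Cartesian (x, y, z), in the unit of a, b, c.

    Angles are in degrees. Assumed convention: a along x, b in the xy plane.
    """
    u, v, w = frac
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit-cell volume divided by a*b*c.
    vol = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    x = a * u + b * cg * v + c * cb * w
    y = b * sg * v + c * (ca - cb * cg) / sg * w
    z = c * vol / sg * w
    return (x, y, z)

# Centre of a hypothetical 10 x 20 x 30 orthorhombic cell: approximately (5, 10, 15).
centre = fractional_to_cartesian((0.5, 0.5, 0.5), 10.0, 20.0, 30.0)
```

For a cell with all angles equal to 90°, the conversion reduces to multiplying each fraction by the corresponding axis length.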
uracil (U) - A nitrogenous base (pyrimidine) that occurs in RNA nucleotides and pairs with adenine through two hydrogen bonds.
Validation clashes (Mol*, Representation) - Atoms are shown with representations of interatomic steric clashes (overlaps between their van der Waals shells) identified during structure validation.
valine (Val, V) - Alpha amino acid with a non-polar, hydrophobic side chain with an isopropyl group.
van der Waals interactions - Weak attractive and repulsive interactions, occurring between uncharged atoms due to attraction and repulsion between permanent or transient electric dipole moments. It was named after the Dutch physicist Johannes van der Waals. The term is sometimes used loosely for the totality of nonspecific attractive or repulsive inter-molecular forces.
van der Waals radius - The radius of the electron cloud surrounding an atom that defines how close another atom or molecule can approach.
virus - Submicroscopic infectious agent that replicates only inside the living cells of an organism. Viruses infect all types of life forms, from animals and plants to microorganisms, including bacteria and archaea. A virus consists of proteins and either DNA or RNA.
wireframe (Representation) - A molecular representation where atoms are shown as dots connected by lines representing covalent bonds.
Worldwide Protein Data Bank - This organization manages the PDB archive and ensures that the PDB is freely and publicly available to the global community.
wwPDB - Worldwide PDB, international consortium of data centers that manages the PDB archive
X-ray - A form of electromagnetic radiation with a wavelength range of 0.01–10 nm and corresponding frequencies in the range of 3×10^16–3×10^19 Hz. While X-rays are used to image body parts such as bones and some soft tissues, they are also used to study the atomic and molecular structures of various crystalline materials such as minerals, metals, and crystals of biological molecules. See also X-ray crystallography.
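The wavelength and frequency limits quoted for X-rays are linked by frequency = c / wavelength, where c is the speed of light. A minimal Python check of that arithmetic is sketched below; the function name wavelength_nm_to_frequency_hz is illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by SI definition

def wavelength_nm_to_frequency_hz(wavelength_nm):
    """Frequency in Hz of electromagnetic radiation with the given wavelength in nm."""
    return SPEED_OF_LIGHT_M_PER_S / (wavelength_nm * 1e-9)

# The two ends of the X-ray range: about 3e16 Hz at 10 nm and about 3e19 Hz at 0.01 nm.
low_frequency = wavelength_nm_to_frequency_hz(10.0)
high_frequency = wavelength_nm_to_frequency_hz(0.01)
```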
X-ray crystallography - An experimental method used to determine detailed atomic and molecular three-dimensional structures. The experiment is done by shining an X-ray beam on a crystal of the molecule(s) being studied and carefully analyzing the resulting scattered rays. Learn more about Methods for Determining Atomic Structures.
XFEL - X-ray Free Electron Laser
Z DNA - A left-handed form of DNA present under physiological conditions. It contains short GC segments that are methylated and may play a role in regulating gene expression in eukaryotes.
Z value - Refers to the number of molecules in the crystallographic unit cell. It is dependent on the symmetry of the crystal. Increasing the unit cell axis or changing from a primitive cell to a centered cell can change the Z value.