chembl_23 Schema Documentation

ACTION_TYPE:

Table storing the distinct list of action types used in the drug_mechanism table, together with a higher-level parent action type.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKACTION_TYPEVARCHAR2(50)NOT NULLPrimary key. Type of action of the drug e.g., agonist, antagonist
DESCRIPTIONVARCHAR2(200)NOT NULLDescription of how the action type is used
PARENT_TYPEVARCHAR2(50)Higher-level grouping of action types e.g., positive vs negative action

ACTIVITIES:

Activity 'values' or 'end points' that are the results of an assay recorded in a scientific document. Each activity is described by a row.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKACTIVITY_IDNUMBER(11,0)NOT NULLUnique ID for the activity row
FKASSAY_IDNUMBER(9,0)NOT NULLForeign key to the assays table (containing the assay description)
FKDOC_IDNUMBER(9,0)Foreign key to documents table (for quick lookup of publication details - can also link to documents through compound_records or assays table)
FKRECORD_IDNUMBER(9,0)NOT NULLForeign key to the compound_records table (containing information on the compound tested)
FKMOLREGNONUMBER(9,0)Foreign key to compounds table (for quick lookup of compound structure - can also link to compounds through compound_records table)
STANDARD_RELATIONVARCHAR2(50)Symbol constraining the activity value (e.g. >, <, =)
PUBLISHED_VALUENUMBERDatapoint value as it appears in the original publication.
PUBLISHED_UNITSVARCHAR2(100)Units of measurement as they appear in the original publication
STANDARD_VALUENUMBERSame as PUBLISHED_VALUE but transformed to common units: e.g. mM concentrations converted to nM.
STANDARD_UNITSVARCHAR2(100)Selected 'Standard' units for data type: e.g. concentrations are in nM.
STANDARD_FLAGNUMBER(1,0)Shows whether the standardised columns have been curated/set (1) or just default to the published data (0).
STANDARD_TYPEVARCHAR2(250)Standardised version of the published_activity_type (e.g. IC50 rather than Ic-50/Ic50/ic50/ic-50)
ACTIVITY_COMMENTVARCHAR2(4000)Describes non-numeric activities i.e. 'Slighty active', 'Not determined'
PUBLISHED_TYPEVARCHAR2(250)Type of end-point measurement: e.g. IC50, LD50, %inhibition etc, as it appears in the original publication
FKDATA_VALIDITY_COMMENTVARCHAR2(30)Comment reflecting whether the values for this activity measurement are likely to be correct - one of 'Manually validated' (checked original paper and value is correct), 'Potential author error' (value looks incorrect but is as reported in the original paper), 'Outside typical range' (value seems too high/low to be correct e.g., negative IC50 value), 'Non standard unit type' (units look incorrect for this activity type).
POTENTIAL_DUPLICATENUMBER(1,0)When set to 1, indicates that the value is likely to be a repeat citation of a value reported in a previous ChEMBL paper, rather than a new, independent measurement. Note: value of zero does not guarantee that the measurement is novel/independent though
PUBLISHED_RELATIONVARCHAR2(50)Symbol constraining the activity value (e.g. >, <, =), as it appears in the original publication
PCHEMBL_VALUENUMBER(4,2)Negative log of selected concentration-response activity values (IC50/EC50/XC50/AC50/Ki/Kd/Potency)
FKBAO_ENDPOINTVARCHAR2(11)ID for the corresponding result type in BioAssay Ontology (based on standard_type)
UO_UNITSVARCHAR2(10)ID for the corresponding unit in Unit Ontology (based on standard_units)
QUDT_UNITSVARCHAR2(70)ID for the corresponding unit in QUDT Ontology (based on standard_units)

ACTIVITY_STDS_LOOKUP:

Table storing details of standardised activity types and units, with permitted value ranges. Used to determine the correct standard_type and standard_units for a range of published types/units.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKSTD_ACT_IDNUMBER(9,0)NOT NULLPrimary key.
UKSTANDARD_TYPEVARCHAR2(250)NOT NULLThe standard_type that other published_types in the activities table have been converted to.
DEFINITIONVARCHAR2(500)A description/definition of the standard_type.
UKSTANDARD_UNITSVARCHAR2(100)NOT NULLThe units that are applied to this standard_type and to which other published_units are converted. Note a standard_type may have more than one allowable standard_unit and therefore multiple rows in this table.
NORMAL_RANGE_MINNUMBER(24,12)The lowest value for this activity type that is likely to be genuine. This is only an approximation, so lower genuine values may exist, but it may be desirable to validate these before using them. For a given standard_type/units, values in the activities table below this threshold are flagged with a data_validity_comment of 'Outside typical range'.
NORMAL_RANGE_MAXNUMBER(24,12)The highest value for this activity type that is likely to be genuine. This is only an approximation, so higher genuine values may exist, but it may be desirable to validate these before using them. For a given standard_type/units, values in the activities table above this threshold are flagged with a data_validity_comment of 'Outside typical range'.

ASSAY_PARAMETERS:

Table storing additional parameters for an assay e.g., dose, administration route, time point

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKASSAY_PARAM_IDNUMBER(9,0)NOT NULLNumeric primary key
FK,UKASSAY_IDNUMBER(9,0)NOT NULLForeign key to assays table. The assay to which this parameter belongs
FK,UKPARAMETER_TYPEVARCHAR2(200)NOT NULLForeign key to parameter_type table, defining the meaning of the parameter
PARAMETER_VALUEVARCHAR2(2000)NOT NULLThe value of the particular parameter

ASSAY_TYPE:

Description of assay types (e.g., Binding, Functional, ADMET)

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKASSAY_TYPEVARCHAR2(1)NOT NULLSingle character representing assay type
ASSAY_DESCVARCHAR2(250)Description of assay type

ASSAYS:

Table storing a list of the assays that are reported in each document. Similar assays from different publications will appear as distinct assays in this table.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKASSAY_IDNUMBER(9,0)NOT NULLUnique ID for the assay
FKDOC_IDNUMBER(9,0)NOT NULLForeign key to documents table
DESCRIPTIONVARCHAR2(4000)Description of the reported assay
FKASSAY_TYPEVARCHAR2(1)Assay classification, e.g. B=Binding assay, A=ADME assay, F=Functional assay
ASSAY_TEST_TYPEVARCHAR2(20)Type of assay system (i.e., in vivo or in vitro)
ASSAY_CATEGORYVARCHAR2(20)screening, confirmatory (ie: dose-response), summary, panel or other.
ASSAY_ORGANISMVARCHAR2(250)Name of the organism for the assay system (e.g., the organism, tissue or cell line in which an assay was performed). May differ from the target organism (e.g., for a human protein expressed in non-human cells, or pathogen-infected human cells).
ASSAY_TAX_IDNUMBER(11,0)NCBI tax ID for the assay organism.
ASSAY_STRAINVARCHAR2(200)Name of specific strain of the assay organism used (where known)
ASSAY_TISSUEVARCHAR2(100)Name of tissue used in the assay system (e.g., for tissue-based assays) or from which the assay system was derived (e.g., for cell/subcellular fraction-based assays).
ASSAY_CELL_TYPEVARCHAR2(100)Name of cell type or cell line used in the assay system (e.g., for cell-based assays).
ASSAY_SUBCELLULAR_FRACTIONVARCHAR2(100)Name of subcellular fraction used in the assay system (e.g., microsomes, mitochondria).
FKTIDNUMBER(9,0)Target identifier to which this assay has been mapped. Foreign key to target_dictionary. From ChEMBL_15 onwards, an assay will have only a single target assigned.
FKRELATIONSHIP_TYPEVARCHAR2(1)Flag indicating of the relationship between the reported target in the source document and the assigned target from TARGET_DICTIONARY. Foreign key to RELATIONSHIP_TYPE table.
FKCONFIDENCE_SCORENUMBER(1,0)Confidence score, indicating how accurately the assigned target(s) represents the actually assay target. Foreign key to CONFIDENCE_SCORE table. 0 means uncurated/unassigned, 1 = low confidence to 9 = high confidence.
FKCURATED_BYVARCHAR2(32)Indicates the level of curation of the target assignment. Foreign key to curation_lookup table.
FKSRC_IDNUMBER(3,0)NOT NULLForeign key to source table
SRC_ASSAY_IDVARCHAR2(50)Identifier for the assay in the source database/deposition (e.g., pubchem AID)
FK,UKCHEMBL_IDVARCHAR2(20)NOT NULLChEMBL identifier for this assay (for use on web interface etc)
FKCELL_IDNUMBER(9,0)Foreign key to cell dictionary. The cell type or cell line used in the assay
FKBAO_FORMATVARCHAR2(11)ID for the corresponding format type in BioAssay Ontology (e.g., cell-based, biochemical, organism-based etc)
FKTISSUE_IDNUMBER(9,0)ID for the corresponding tissue/anatomy in Uberon. Foreign key to tissue_dictionary
FKVARIANT_IDNUMBER(9,0)Foreign key to variant_sequences table. Indicates the mutant/variant version of the target used in the assay (where known/applicable)

ATC_CLASSIFICATION:

WHO ATC Classification for drugs

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
WHO_NAMEVARCHAR2(2000)WHO/INN name for the compound
LEVEL1VARCHAR2(10)First level of classification
LEVEL2VARCHAR2(10)Second level of classification
LEVEL3VARCHAR2(10)Third level of classification
LEVEL4VARCHAR2(10)Fourth level of classification
PKLEVEL5VARCHAR2(10)NOT NULLComplete ATC code for compound
LEVEL1_DESCRIPTIONVARCHAR2(2000)Description of first level of classification
LEVEL2_DESCRIPTIONVARCHAR2(2000)Description of second level of classification
LEVEL3_DESCRIPTIONVARCHAR2(2000)Description of third level of classification
LEVEL4_DESCRIPTIONVARCHAR2(2000)Description of fourth level of classification

BINDING_SITES:

Table storing details of binding sites for a target. A target may have multiple sites defined in this table.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKSITE_IDNUMBER(9,0)NOT NULLPrimary key. Unique identifier for a binding site in a given target.
SITE_NAMEVARCHAR2(200)Name/label for the binding site.
FKTIDNUMBER(9,0)Foreign key to target_dictionary. Target on which the binding site is found.

BIO_COMPONENT_SEQUENCES:

Table storing the sequences for biotherapeutic drugs (e.g., monoclonal antibodies, recombinant proteins, peptides etc. For multi-chain biotherapeutics (e.g., mAbs) each chain is stored here as a separate component.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKCOMPONENT_IDNUMBER(9,0)NOT NULLPrimary key. Unique identifier for each of the molecular components of biotherapeutics in ChEMBL (e.g., antibody chains, recombinant proteins, synthetic peptides).
COMPONENT_TYPEVARCHAR2(50)NOT NULLType of molecular component (e.g., 'PROTEIN','DNA','RNA').
DESCRIPTIONVARCHAR2(200)Description/name of molecular component.
SEQUENCECLOBSequence of the biotherapeutic component.
SEQUENCE_MD5SUMVARCHAR2(32)MD5 checksum of the sequence.
TAX_IDNUMBER(11,0)NCBI tax ID for the species from which the sequence is derived. May be null for humanized monoclonal antibodies, synthetic peptides etc.
ORGANISMVARCHAR2(150)Name of the species from which the sequence is derived.

BIOASSAY_ONTOLOGY:

Lookup table providing labels for Bioassay Ontology terms used in assays and activities tables

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKBAO_IDVARCHAR2(11)NOT NULLBioassay Ontology identifier (BAO version 2.0)
LABELVARCHAR2(100)NOT NULLBioassay Ontology label for the term (BAO version 2.0)

BIOTHERAPEUTIC_COMPONENTS:

Links each biotherapeutic drug (in the biotherapeutics table) to its component sequences (in the bio_component_sequences table). A biotherapeutic drug can have multiple components and hence multiple rows in this table. Similarly, a particular component sequence can be part of more than one drug.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKBIOCOMP_IDNUMBER(9,0)NOT NULLPrimary key.
FK,UKMOLREGNONUMBER(9,0)NOT NULLForeign key to the biotherapeutics table, indicating which biotherapeutic the component is part of.
FK,UKCOMPONENT_IDNUMBER(9,0)NOT NULLForeign key to the bio_component_sequences table, indicating which component is part of the biotherapeutic.

BIOTHERAPEUTICS:

Stores sequence information/descriptions for protein therapeutics, including recombinant proteins, peptides and antibodies

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PK,FKMOLREGNONUMBER(9,0)NOT NULLForeign key to molecule_dictionary
DESCRIPTIONVARCHAR2(2000)Description of the biotherapeutic.
HELM_NOTATIONVARCHAR2(4000)Sequence notation generated according to the HELM standard (http://www.openhelm.org/home). Currently for peptides only

CELL_DICTIONARY:

Table storing information for cell lines in the target_dictionary (e.g., tissue/species origin). Cell_name should match pref_name in target_dictionary.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKCELL_IDNUMBER(9,0)NOT NULLPrimary key. Unique identifier for each cell line in the target_dictionary.
UKCELL_NAMEVARCHAR2(50)NOT NULLName of each cell line (as used in the target_dicitonary pref_name).
CELL_DESCRIPTIONVARCHAR2(200)Longer description (where available) of the cell line.
CELL_SOURCE_TISSUEVARCHAR2(50)Tissue from which the cell line is derived, where known.
CELL_SOURCE_ORGANISMVARCHAR2(150)Name of organism from which the cell line is derived.
UKCELL_SOURCE_TAX_IDNUMBER(11,0)NCBI tax ID of the organism from which the cell line is derived.
CLO_IDVARCHAR2(11)ID for the corresponding cell line in Cell Line Ontology
EFO_IDVARCHAR2(12)ID for the corresponding cell line in Experimental Factory Ontology
CELLOSAURUS_IDVARCHAR2(15)ID for the corresponding cell line in Cellosaurus Ontology
CL_LINCS_IDVARCHAR2(8)Cell ID used in LINCS (Library of Integrated Network-based Cellular Signatures)
FK,UKCHEMBL_IDVARCHAR2(20)ChEMBL identifier for the cell (used in web interface etc)

CHEMBL_ID_LOOKUP:

Lookup table storing chembl identifiers for different entities in the database (assays, compounds, documents and targets)

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKCHEMBL_IDVARCHAR2(20)NOT NULLChEMBL identifier
UKENTITY_TYPEVARCHAR2(50)NOT NULLType of entity (e.g., COMPOUND, ASSAY, TARGET)
UKENTITY_IDNUMBER(9,0)NOT NULLPrimary key for that entity in corresponding table (e.g., molregno for compounds, tid for targets)
STATUSVARCHAR2(10)NOT NULLIndicates whether the status of the entity within the database - ACTIVE, INACTIVE (downgraded), OBS (obsolete/removed).

COMPONENT_CLASS:

Links protein components of targets to the protein_family_classification table. A protein can have more than one classification (e.g., Membrane receptor and Enzyme).

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
FK,UKCOMPONENT_IDNUMBER(9,0)NOT NULLForeign key to component_sequences table.
FK,UKPROTEIN_CLASS_IDNUMBER(9,0)NOT NULLForeign key to the protein_classification table.
PKCOMP_CLASS_IDNUMBER(9,0)NOT NULLPrimary key.

COMPONENT_DOMAINS:

Links protein components of targets to the structural domains they contain (from the domains table). Contains information showing the start and end position of the domain in the component sequence.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKCOMPD_IDNUMBER(9,0)NOT NULLPrimary key.
FK,UKDOMAIN_IDNUMBER(9,0)Foreign key to the domains table, indicating the domain that is contained in the associated molecular component.
FK,UKCOMPONENT_IDNUMBER(9,0)NOT NULLForeign key to the component_sequences table, indicating the molecular_component that has the given domain.
UKSTART_POSITIONNUMBER(5,0)Start position of the domain within the sequence given in the component_sequences table.
END_POSITIONNUMBER(5,0)End position of the domain within the sequence given in the component_sequences table.

COMPONENT_GO:

Table mapping protein components in the COMPONENT_SEQUENCES table to the GO slim terms stored in the GO_CLASSIFICATION table

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKCOMP_GO_IDNUMBER(9,0)NOT NULLPrimary key
FK,UKCOMPONENT_IDNUMBER(9,0)NOT NULLForeign key to COMPONENT_SEQUENCES table. The protein component this GO term applies to
FK,UKGO_IDVARCHAR2(10)NOT NULLForeign key to the GO_CLASSIFICATION table. The GO term that this protein is mapped to

COMPONENT_SEQUENCES:

Table storing the sequences for components of molecular targets (e.g., protein sequences), along with other details taken from sequence databases (e.g., names, accessions). Single protein targets will have a single protein component in this table, whereas protein complexes/protein families will have multiple protein components.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKCOMPONENT_IDNUMBER(9,0)NOT NULLPrimary key. Unique identifier for the component.
COMPONENT_TYPEVARCHAR2(50)Type of molecular component represented (e.g., 'PROTEIN','DNA','RNA').
UKACCESSIONVARCHAR2(25)Accession for the sequence in the source database from which it was taken (e.g., UniProt accession for proteins).
SEQUENCECLOBA representative sequence for the molecular component, as given in the source sequence database (not necessarily the exact sequence used in the assay).
SEQUENCE_MD5SUMVARCHAR2(32)MD5 checksum of the sequence.
DESCRIPTIONVARCHAR2(200)Description/name for the molecular component, usually taken from the source sequence database.
TAX_IDNUMBER(11,0)NCBI tax ID for the sequence in the source database (i.e., species that the protein/nucleic acid sequence comes from).
ORGANISMVARCHAR2(150)Name of the organism the sequence comes from.
DB_SOURCEVARCHAR2(25)The name of the source sequence database from which sequences/accessions are taken. For UniProt proteins, this field indicates whether the sequence is from SWISS-PROT or TREMBL.
DB_VERSIONVARCHAR2(10)The version of the source sequence database from which sequences/accession were last updated.

COMPONENT_SYNONYMS:

Table storing synonyms for the components of molecular targets (e.g., names, acronyms, gene symbols etc.) Please note: EC numbers are also currently included in this table although they are not strictly synonyms and can apply to multiple proteins.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKCOMPSYN_IDNUMBER(9,0)NOT NULLPrimary key.
FK,UKCOMPONENT_IDNUMBER(9,0)NOT NULLForeign key to the component_sequences table. The component to which this synonym applies.
UKCOMPONENT_SYNONYMVARCHAR2(500)The synonym for the component.
UKSYN_TYPEVARCHAR2(20)The type or origin of the synonym (e.g., GENE_SYMBOL).

COMPOUND_PROPERTIES:

Table storing calculated physicochemical properties for compounds (note all but FULL_MWT are calculated on the parent structure)

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PK,FKMOLREGNONUMBER(9,0)NOT NULLForeign key to compounds table (compound structure)
MW_FREEBASENUMBER(9,2)Molecular weight of parent compound
ALOGPNUMBER(9,2)Calculated ALogP
HBANUMBER(3,0)Number hydrogen bond acceptors
HBDNUMBER(3,0)Number hydrogen bond donors
PSANUMBER(9,2)Polar surface area
RTBNUMBER(3,0)Number rotatable bonds
RO3_PASSVARCHAR2(3)Indicates whether the compound passes the rule-of-three (mw < 300, logP < 3 etc)
NUM_RO5_VIOLATIONSNUMBER(1,0)Number of violations of Lipinski's rule-of-five, using HBA and HBD definitions
ACD_MOST_APKANUMBER(9,2)The most acidic pKa calculated using ACDlabs v12.01
ACD_MOST_BPKANUMBER(9,2)The most basic pKa calculated using ACDlabs v12.01
ACD_LOGPNUMBER(9,2)The calculated octanol/water partition coefficient using ACDlabs v12.01
ACD_LOGDNUMBER(9,2)The calculated octanol/water distribution coefficient at pH7.4 using ACDlabs v12.01
MOLECULAR_SPECIESVARCHAR2(50)Indicates whether the compound is an acid/base/neutral
FULL_MWTNUMBER(9,2)Molecular weight of the full compound including any salts
AROMATIC_RINGSNUMBER(3,0)Number of aromatic rings
HEAVY_ATOMSNUMBER(3,0)Number of heavy (non-hydrogen) atoms
NUM_ALERTSNUMBER(3,0)Number of structural alerts for QED calculation (as defined by Brenk et al., ChemMedChem 2008)
QED_WEIGHTEDNUMBER(3,2)Weighted quantitative estimate of drug likeness (as defined by Bickerton et al., Nature Chem 2012)
MW_MONOISOTOPICNUMBER(11,4)Monoisotopic parent molecular weight
FULL_MOLFORMULAVARCHAR2(100)Molecular formula for the full compound (including any salt)
HBA_LIPINSKINUMBER(3,0)Number of hydrogen bond acceptors calculated according to Lipinski's original rules (i.e., N + O count))
HBD_LIPINSKINUMBER(3,0)Number of hydrogen bond donors calculated according to Lipinski's original rules (i.e., NH + OH count)
NUM_LIPINSKI_RO5_VIOLATIONSNUMBER(1,0)Number of violations of Lipinski's rule of five using HBA_LIPINSKI and HBD_LIPINSKI counts

COMPOUND_RECORDS:

Represents each compound extracted from scientific documents.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKRECORD_IDNUMBER(9,0)NOT NULLUnique ID for a compound/record
FKMOLREGNONUMBER(9,0)Foreign key to compounds table (compound structure)
FKDOC_IDNUMBER(9,0)NOT NULLForeign key to documents table
COMPOUND_KEYVARCHAR2(250)Key text identifying this compound in the scientific document
COMPOUND_NAMEVARCHAR2(4000)Name of this compound recorded in the scientific document
FKSRC_IDNUMBER(3,0)NOT NULLForeign key to source table
SRC_COMPOUND_IDVARCHAR2(150)Identifier for the compound in the source database (e.g., pubchem SID)

COMPOUND_STRUCTURAL_ALERTS:

Table showing which structural alerts (as defined in the STRUCTURAL_ALERTS table) are found in a particular ChEMBL compound. It should be noted some alerts/alert sets are more permissive than others and may flag a large number of compounds. Results should be interpreted with care, depending on the use-case, and not treated as a blanket filter (e.g., around 50% of approved drugs have 1 or more alerts from these sets).

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKCPD_STR_ALERT_IDNUMBER(9,0)NOT NULLPrimary key.
FK,UKMOLREGNONUMBER(9,0)NOT NULLForeign key to the molecule_dictionary. The compound for which the structural alert has been found.
FK,UKALERT_IDNUMBER(9,0)NOT NULLForeign key to the structural_alerts table. The particular alert that has been identified in this compound.

COMPOUND_STRUCTURES:

Table storing various structure representations (e.g., Molfile, InChI) for each compound

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PK,FKMOLREGNONUMBER(9,0)NOT NULLInternal Primary Key for the compound structure and foreign key to molecule_dictionary table
MOLFILECLOBMDL Connection table representation of compound
UKSTANDARD_INCHIVARCHAR2(4000)IUPAC standard InChI for the compound
UKSTANDARD_INCHI_KEYVARCHAR2(27)NOT NULLIUPAC standard InChI key for the compound
CANONICAL_SMILESVARCHAR2(4000)Canonical smiles, generated using pipeline pilot

CONFIDENCE_SCORE_LOOKUP:

Lookup table describing how assay2target confidence scores are assigned depending on the type of target(s) assigned to the assay and the level of confidence in their molecular identity

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKCONFIDENCE_SCORENUMBER(1,0)NOT NULL0-9 score showing level of confidence in assignment of the precise molecular target of the assay
DESCRIPTIONVARCHAR2(100)NOT NULLDescription of the target types assigned with each score
TARGET_MAPPINGVARCHAR2(30)NOT NULLShort description of the target types assigned with each score

CURATION_LOOKUP:

Lookup table for assays.curated_by column. Shows level of curation that has been applied to the assay to target mapping.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKCURATED_BYVARCHAR2(32)NOT NULLShort description of the level of curation
DESCRIPTIONVARCHAR2(100)NOT NULLDefinition of terms in the curated_by field.

DATA_VALIDITY_LOOKUP:

Table storing information about the data_validity_comment values used in the activities table.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKDATA_VALIDITY_COMMENTVARCHAR2(30)NOT NULLPrimary key. Short description of various types of errors/warnings applied to values in the activities table.
DESCRIPTIONVARCHAR2(200)Definition of the terms in the data_validity_comment field.

DEFINED_DAILY_DOSE:

WHO DDD (defined daily dose) information

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
FKATC_CODEVARCHAR2(10)NOT NULLATC code for the compound (foreign key to ATC_CLASSIFICATION table)
DDD_UNITSVARCHAR2(200)Units of defined daily dose
DDD_ADMRVARCHAR2(1000)Administration route for dose
DDD_COMMENTVARCHAR2(2000)Comment
PKDDD_IDNUMBER(9,0)NOT NULLInternal primary key
DDD_VALUENUMBER

DOCS:

Holds all scientific documents (journal articles or patents) from which assays have been extracted.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKDOC_IDNUMBER(9,0)NOT NULLUnique ID for the document
JOURNALVARCHAR2(50)Abbreviated journal name for an article
YEARNUMBER(4,0)Year of journal article publication
VOLUMEVARCHAR2(50)Volume of journal article
ISSUEVARCHAR2(50)Issue of journal article
FIRST_PAGEVARCHAR2(50)First page number of journal article
LAST_PAGEVARCHAR2(50)Last page number of journal article
PUBMED_IDNUMBER(11,0)NIH pubmed record ID, where available
DOIVARCHAR2(100)Digital object identifier for this reference
FK,UKCHEMBL_IDVARCHAR2(20)NOT NULLChEMBL identifier for this document (for use on web interface etc)
TITLEVARCHAR2(500)Document title (e.g., Publication title or description of dataset)
DOC_TYPEVARCHAR2(50)NOT NULLType of the document (e.g., Publication, Deposited dataset)
AUTHORSVARCHAR2(4000)For a deposited dataset, the authors carrying out the screening and/or submitting the dataset.
ABSTRACTCLOBFor a deposited dataset, a brief description of the dataset.
PATENT_IDVARCHAR2(20)Patent ID for this document

DOMAINS:

Table storing a non-redundant list of domains found in protein targets (e.g., Pfam domains).

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKDOMAIN_IDNUMBER(9,0)NOT NULLPrimary key. Unique identifier for each domain.
DOMAIN_TYPEVARCHAR2(20)NOT NULLIndicates the source of the domain (e.g., Pfam).
SOURCE_DOMAIN_IDVARCHAR2(20)NOT NULLIdentifier for the domain in the source database (e.g., Pfam ID such as PF00001).
DOMAIN_NAMEVARCHAR2(20)Name given to the domain in the source database (e.g., 7tm_1).
DOMAIN_DESCRIPTIONVARCHAR2(500)Longer name or description for the domain.

DRUG_INDICATION:

Table storing indications for drugs from a variety of sources (e.g., ATC, DailyMed, ClinicalTrials.gov)

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKDRUGIND_IDNUMBER(9,0)NOT NULLPrimary key
FK,UKRECORD_IDNUMBER(9,0)NOT NULLForeign key to compound_records table. Links to the drug record to which this indication applies
FKMOLREGNONUMBER(9,0)Molregno corresponding to the record_id in the compound_records table
MAX_PHASE_FOR_INDNUMBER(1,0)The maximum phase of development that the drug is known to have reached for this particular indication
UKMESH_IDVARCHAR2(7)NOT NULLMedical Subject Headings (MeSH) disease identifier corresponding to the indication
MESH_HEADINGVARCHAR2(200)NOT NULLMedical Subject Heading term for the MeSH disease ID
UKEFO_IDVARCHAR2(20)Experimental Factor Ontology (EFO) disease identifier corresponding to the indication
EFO_TERMVARCHAR2(200)Experimental Factor Ontology term for the EFO ID

DRUG_MECHANISM:

Table storing mechanism of action information for FDA-approved drugs and WHO anti-malarials

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKMEC_IDNUMBER(9,0)NOT NULLPrimary key for each drug mechanism of action
FKRECORD_IDNUMBER(9,0)NOT NULLRecord_id for the drug (foreign key to compound_records table)
FKMOLREGNONUMBER(9,0)Molregno for the drug (foreign key to molecule_dictionary table)
MECHANISM_OF_ACTIONVARCHAR2(250)Description of the mechanism of action e.g., 'Phosphodiesterase 5 inhibitor'
FKTIDNUMBER(9,0)Target associated with this mechanism of action (foreign key to target_dictionary table)
FKSITE_IDNUMBER(9,0)Binding site for the drug within the target (where known) - foreign key to binding_sites table
FKACTION_TYPEVARCHAR2(50)Type of action of the drug on the target e.g., agonist/antagonist etc (foreign key to action_type table)
DIRECT_INTERACTIONNUMBER(1,0)Flag to show whether the molecule is believed to interact directly with the target (1 = yes, 0 = no)
MOLECULAR_MECHANISMNUMBER(1,0)Flag to show whether the mechanism of action describes the molecular target of the drug, rather than a higher-level physiological mechanism e.g., vasodilator (1 = yes, 0 = no)
DISEASE_EFFICACYNUMBER(1,0)Flag to show whether the target assigned is believed to play a role in the efficacy of the drug in the indication(s) for which it is approved (1 = yes, 0 = no)
MECHANISM_COMMENTVARCHAR2(500)Additional comments regarding the mechanism of action
SELECTIVITY_COMMENTVARCHAR2(100)Additional comments regarding the selectivity of the drug
BINDING_SITE_COMMENTVARCHAR2(100)Additional comments regarding the binding site of the drug

FORMULATIONS:

Table linking individual ingredients in approved products (and their strengths) to entries in the molecule dictionary. Where products are mixtures of active ingredients they will have multiple entries in this table

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
FK,UKPRODUCT_IDVARCHAR2(30)NOT NULLUnique identifier of the product. FK to PRODUCTS
INGREDIENTVARCHAR2(200)Name of the approved ingredient within the product
STRENGTHVARCHAR2(200)Dose strength
FK,UKRECORD_IDNUMBER(9,0)NOT NULLForeign key to the compound_records table.
FKMOLREGNONUMBER(9,0)Unique identifier of the ingredient FK to MOLECULE_DICTIONARY
PKFORMULATION_IDNUMBER(9,0)NOT NULLPrimary key.

FRAC_CLASSIFICATION:

Table showing classification of fungicide mechanism of action according to the Fungicide Resistance Action Committee (FRAC): http://www.frac.info/publication/anhang/FRAC%20Code%20List%202013-final.pdf

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKFRAC_CLASS_IDNUMBER(9,0)NOT NULLUnique numeric primary key for each level5 code
ACTIVE_INGREDIENTVARCHAR2(500)NOT NULLName of active ingredient (fungicide) classified by FRAC
LEVEL1VARCHAR2(2)NOT NULLMechanism of action code assigned by FRAC
LEVEL1_DESCRIPTIONVARCHAR2(2000)NOT NULLDescription of mechanism of action
LEVEL2VARCHAR2(2)NOT NULLTarget site code assigned by FRAC
LEVEL2_DESCRIPTIONVARCHAR2(2000)Description of target provided by FRAC
LEVEL3VARCHAR2(6)NOT NULLGroup number assigned by FRAC
LEVEL3_DESCRIPTIONVARCHAR2(2000)Description of group provided by FRAC
LEVEL4VARCHAR2(7)NOT NULLNumber denoting the chemical group (number not assigned by FRAC)
LEVEL4_DESCRIPTIONVARCHAR2(2000)Chemical group name provided by FRAC
UKLEVEL5VARCHAR2(8)NOT NULLA unique code assigned to each ingredient (based on the level 1-4 FRAC classification, but not assigned by IRAC)
FRAC_CODEVARCHAR2(4)NOT NULLThe official FRAC classification code for the ingredient

GO_CLASSIFICATION:

Table storing the ChEMBL Drug Target GO slim (http://www.geneontology.org/ontology/subsets/goslim_chembl.obo)

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKGO_IDVARCHAR2(10)NOT NULLPrimary key. Gene Ontology identifier for the GO slim term
PARENT_GO_IDVARCHAR2(10)Gene Ontology identifier for the parent of this GO term in the ChEMBL Drug Target GO slim
PREF_NAMEVARCHAR2(200)Gene Ontology name
CLASS_LEVELNUMBER(1,0)Indicates the level of the term in the slim (L1 = highest)
ASPECTVARCHAR2(1)Indicates which aspect of the Gene Ontology the term belongs to (F = molecular function, P = biological process, C = cellular component)
PATHVARCHAR2(1000)Indicates the full path to this term in the GO slim

HRAC_CLASSIFICATION:

Table showing classification of herbicide mechanism of action according to the Herbicide Resistance Action Committee (HRAC): http://www.hracglobal.com/Education/ClassificationofHerbicideSiteofAction.aspx

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKHRAC_CLASS_IDNUMBER(9,0)NOT NULLUnique numeric primary key for each level3 code
ACTIVE_INGREDIENTVARCHAR2(500)NOT NULLName of active ingredient (herbicide) classified by HRAC
LEVEL1VARCHAR2(2)NOT NULLHRAC group code - denoting mechanism of action of herbicide
LEVEL1_DESCRIPTIONVARCHAR2(2000)NOT NULLDescription of mechanism of action provided by HRAC
LEVEL2VARCHAR2(3)NOT NULLIndicates a chemical family within a particular HRAC group (number not assigned by HRAC)
LEVEL2_DESCRIPTIONVARCHAR2(2000)Description of chemical family provided by HRAC
UKLEVEL3VARCHAR2(5)NOT NULLA unique code assigned to each ingredient (based on the level 1 and 2 HRAC classification, but not assigned by HRAC)
HRAC_CODEVARCHAR2(2)NOT NULLThe official HRAC classification code for the ingredient

INDICATION_REFS:

Table storing references indicating the source of drug indication information

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKINDREF_IDNUMBER(9,0)NOT NULLPrimary key
FK,UKDRUGIND_IDNUMBER(9,0)NOT NULLForeign key to the DRUG_INDICATION table, indicating the drug-indication link that this reference applies to
UKREF_TYPEVARCHAR2(50)NOT NULLType/source of reference
UKREF_IDVARCHAR2(2000)NOT NULLIdentifier for the reference in the source
REF_URLVARCHAR2(4000)NOT NULLFull URL linking to the reference

IRAC_CLASSIFICATION:

Table showing classification of insecticide mechanism of action according to the Insecticide Resistance Action Committee (IRAC): http://www.irac-online.org/documents/moa-classification/?ext=pdf

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKIRAC_CLASS_IDNUMBER(9,0)NOT NULLUnique numeric primary key for each level4 code
ACTIVE_INGREDIENTVARCHAR2(500)NOT NULLName of active ingredient (insecticide) classified by IRAC
LEVEL1VARCHAR2(1)NOT NULLClass of action e.g., nerve action, energy metabolism (code not assigned by IRAC)
LEVEL1_DESCRIPTIONVARCHAR2(2000)NOT NULLDescription of class of action, as provided by IRAC
LEVEL2VARCHAR2(3)NOT NULLIRAC main group code denoting primary site/mechanism of action
LEVEL2_DESCRIPTIONVARCHAR2(2000)NOT NULLDescription of site/mechanism of action provided by IRAC
LEVEL3VARCHAR2(6)NOT NULLIRAC sub-group code denoting chemical class of insecticide
LEVEL3_DESCRIPTIONVARCHAR2(2000)NOT NULLDescription of chemical class or exemplifying ingredient provided by IRAC
UKLEVEL4VARCHAR2(8)NOT NULLA unique code assigned to each ingredient (based on the level 1, 2 and 3 IRAC classification, but not assigned by IRAC)
IRAC_CODEVARCHAR2(3)NOT NULLThe official IRAC classification code for the ingredient

LIGAND_EFF:

Contains BEI (Binding Efficiency Index) and SEI (Surface Binding Efficiency Index) for each activity_id where such data can be calculated.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PK,FKACTIVITY_IDNUMBER(11,0)NOT NULLLink key to activities table
BEINUMBER(9,2)Binding Efficiency Index = p(XC50) *1000/MW_freebase
SEINUMBER(9,2)Surface Efficiency Index = p(XC50)*100/PSA
LENUMBER(9,2)Ligand Efficiency = deltaG/heavy_atoms [from the Hopkins DDT paper 2004]
LLENUMBER(9,2)Lipophilic Ligand Efficiency = -logKi-ALogP. [from Leeson NRDD 2007]

MECHANISM_REFS:

Table storing references for information in the drug_mechanism table

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKMECREF_IDNUMBER(9,0)NOT NULLPrimary key
FK,UKMEC_IDNUMBER(9,0)NOT NULLForeign key to drug_mechanism table - indicating the mechanism to which the references refer
UKREF_TYPEVARCHAR2(50)NOT NULLType/source of reference (e.g., 'PubMed','DailyMed')
UKREF_IDVARCHAR2(200)Identifier for the reference in the source (e.g., PubMed ID or DailyMed setid)
REF_URLVARCHAR2(400)Full URL linking to the reference

METABOLISM:

Table storing drug metabolic pathways, manually curated from a variety of sources

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKMET_IDNUMBER(9,0)NOT NULLPrimary key
FK,UKDRUG_RECORD_IDNUMBER(9,0)Foreign key to compound_records. Record representing the drug or other compound for which metabolism is being studied (may not be the same as the substrate being measured)
FK,UKSUBSTRATE_RECORD_IDNUMBER(9,0)Foreign key to compound_records. Record representing the compound that is the subject of metabolism
FK,UKMETABOLITE_RECORD_IDNUMBER(9,0)Foreign key to compound_records. Record representing the compound that is the result of metabolism
UKPATHWAY_IDNUMBER(9,0)Identifier for the metabolic scheme/pathway (may be multiple pathways from one source document)
PATHWAY_KEYVARCHAR2(50)Link to original source indicating where the pathway information was found (e.g., Figure 1, page 23)
UKENZYME_NAMEVARCHAR2(200)Name of the enzyme responsible for the metabolic conversion
FK,UKENZYME_TIDNUMBER(9,0)Foreign key to target_dictionary. TID for the enzyme responsible for the metabolic conversion
MET_CONVERSIONVARCHAR2(200)Description of the metabolic conversion
ORGANISMVARCHAR2(100)Organism in which this metabolic reaction occurs
UKTAX_IDNUMBER(11,0)NCBI Tax ID for the organism in which this metabolic reaction occurs
MET_COMMENTVARCHAR2(1000)Additional information regarding the metabolism (e.g., organ system, conditions under which observed, activity of metabolites)

METABOLISM_REFS:

Table storing references for metabolic pathways, indicating the source of the data

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKMETREF_IDNUMBER(9,0)NOT NULLPrimary key
FK,UKMET_IDNUMBER(9,0)NOT NULLForeign key to record_metabolism table - indicating the metabolism information to which the references refer
UKREF_TYPEVARCHAR2(50)NOT NULLType/source of reference (e.g., 'PubMed','DailyMed')
UKREF_IDVARCHAR2(200)Identifier for the reference in the source (e.g., PubMed ID or DailyMed setid)
REF_URLVARCHAR2(400)Full URL linking to the reference

MOLECULE_ATC_CLASSIFICATION:

Table mapping drugs in the molecule_dictionary to ATC codes in the atc_classification table

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKMOL_ATC_IDNUMBER(9,0)NOT NULLPrimary key
FKLEVEL5VARCHAR2(10)NOT NULLATC code (foreign key to atc_classification table)
FKMOLREGNONUMBER(9,0)NOT NULLDrug to which the ATC code applies (foreign key to molecule_dictionary table)

MOLECULE_DICTIONARY:

Non redundant list of compounds/biotherapeutics with associated identifiers

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKMOLREGNONUMBER(9,0)NOT NULLInternal Primary Key for the molecule
PREF_NAMEVARCHAR2(255)Preferred name for the molecule
FK,UKCHEMBL_IDVARCHAR2(20)NOT NULLChEMBL identifier for this compound (for use on web interface etc)
MAX_PHASENUMBER(1,0)NOT NULLMaximum phase of development reached for the compound (4 = approved). Null where max phase has not yet been assigned.
THERAPEUTIC_FLAGNUMBER(1,0)NOT NULLIndicates that a drug has a therapeutic application (as opposed to e.g., an imaging agent, additive etc).
DOSED_INGREDIENTNUMBER(1,0)NOT NULLIndicates that the drug is dosed in this form (e.g., a particular salt)
STRUCTURE_TYPEVARCHAR2(10)NOT NULLIndications whether the molecule has a small molecule structure or a protein sequence (MOL indicates an entry in the compound_structures table, SEQ indications an entry in the protein_therapeutics table, NONE indicates an entry in neither table, e.g., structure unknown)
CHEBI_PAR_IDNUMBER(9,0)Preferred ChEBI ID for the compound (where different from assigned)
MOLECULE_TYPEVARCHAR2(30)Type of molecule (Small molecule, Protein, Antibody, Oligosaccharide, Oligonucleotide, Cell, Unknown)
FIRST_APPROVALNUMBER(4,0)Earliest known approval year for the molecule
ORALNUMBER(1,0)NOT NULLIndicates whether the drug is known to be administered orally.
PARENTERALNUMBER(1,0)NOT NULLIndicates whether the drug is known to be administered parenterally
TOPICALNUMBER(1,0)NOT NULLIndicates whether the drug is known to be administered topically.
BLACK_BOX_WARNINGNUMBER(1,0)NOT NULLIndicates that the drug has a black box warning
NATURAL_PRODUCTNUMBER(1,0)NOT NULLIndicates whether the compound is natural product-derived (currently curated only for drugs)
FIRST_IN_CLASSNUMBER(1,0)NOT NULLIndicates whether this is known to be the first compound of its class (e.g., acting on a particular target).
CHIRALITYNUMBER(1,0)NOT NULLShows whether a drug is dosed as a racemic mixture (0), single stereoisomer (1) or is an achiral molecule (2)
PRODRUGNUMBER(1,0)NOT NULLIndicates that the molecule is a pro-drug (see molecule hierarchy for active component, where known)
INORGANIC_FLAGNUMBER(1,0)NOT NULLIndicates whether the molecule is inorganic (i.e., containing only metal atoms and <2 carbon atoms)
USAN_YEARNUMBER(4,0)The year in which the application for a USAN/INN name was made
AVAILABILITY_TYPENUMBER(1,0)The availability type for the drug (0 = discontinued, 1 = prescription only, 2 = over the counter)
USAN_STEMVARCHAR2(50)Where the compound has been assigned a USAN name, this indicates the stem, as described in the USAN_STEM table.
POLYMER_FLAGNUMBER(1,0)Indicates whether a molecule is a small molecule polymer (e.g., polistyrex)
USAN_SUBSTEMVARCHAR2(50)Where the compound has been assigned a USAN name, this indicates the substem
USAN_STEM_DEFINITIONVARCHAR2(1000)Definition of the USAN stem
INDICATION_CLASSVARCHAR2(1000)Indication class(es) assigned to a drug in the USP dictionary
WITHDRAWN_FLAGNUMBER(1,0)NOT NULLFlag indicating whether the drug has been withdrawn in at least one country (not necessarily in the US)
WITHDRAWN_YEARNUMBER(4,0)Year the drug was first withdrawn in any country
WITHDRAWN_COUNTRYVARCHAR2(1000)List of countries/regions where the drug has been withdrawn
WITHDRAWN_REASONVARCHAR2(1000)Reasons for withdrawl (e.g., safety)

MOLECULE_FRAC_CLASSIFICATION:

Table showing Fungicide Resistance Action Committee (FRAC) mechanism of action classification for known crop protection fungicides.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKMOL_FRAC_IDNUMBER(9,0)NOT NULLPrimary key.
FK,UKFRAC_CLASS_IDNUMBER(9,0)NOT NULLForeign key to frac_classification table showing the mechanism of action classification of the compound.
FK,UKMOLREGNONUMBER(9,0)NOT NULLForeign key to molecule_dictionary, showing the compound to which the classification applies.

MOLECULE_HIERARCHY:

Table storing relationships between parents, salts and active metabolites (for pro-drugs).

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PK,FKMOLREGNONUMBER(9,0)NOT NULLForeign key to compounds table. This field holds a list of all of the ChEMBL compounds with associated data (e.g., activity information, approved drugs). Parent compounds that are generated only by removing salts, and which do not themselves have any associated data will not appear here.
FKPARENT_MOLREGNONUMBER(9,0)Represents parent compound of molregno in first field (i.e., generated by removing salts). Where molregno and parent_molregno are same, the initial ChEMBL compound did not contain a salt component, or else could not be further processed for various reasons (e.g., inorganic mixture). Compounds which are only generated by removing salts will appear in this field only. Those which, themselves, have any associated data (e.g., activity data) or are launched drugs will also appear in the molregno field.
FKACTIVE_MOLREGNONUMBER(9,0)Where a compound is a pro-drug, this represents the active metabolite of the 'dosed' compound given by parent_molregno. Where parent_molregno and active_molregno are the same, the compound is not currently known to be a pro-drug.

MOLECULE_HRAC_CLASSIFICATION:

Table showing Herbicide Resistance Action Committee (HRAC) mechanism of action classification for known herbicidal compounds.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKMOL_HRAC_IDNUMBER(9,0)NOT NULLPrimary key
FK,UKHRAC_CLASS_IDNUMBER(9,0)NOT NULLForeign key to hrac_classification table showing the classification for the compound.
FK,UKMOLREGNONUMBER(9,0)NOT NULLForeign key to molecule_dictionary, showing the compound to which this classification applies.

MOLECULE_IRAC_CLASSIFICATION:

Table showing Insecticide Resistance Action Committee (IRAC) mechanism of action classification for known crop protection insecticides.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKMOL_IRAC_IDNUMBER(9,0)NOT NULLPrimary key.
FK,UKIRAC_CLASS_IDNUMBER(9,0)NOT NULLForeign key to the irac_classification table showing the mechanism of action classification for the compound.
FK,UKMOLREGNONUMBER(9,0)NOT NULLForeign key to the molecule_dictionary table, showing the compound to which the classification applies.

MOLECULE_SYNONYMS:

Stores synonyms for a compound (e.g., common names, trade names, research codes etc)

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
FK,UKMOLREGNONUMBER(9,0)NOT NULLForeign key to molecule_dictionary
UKSYN_TYPEVARCHAR2(50)NOT NULLType of name/synonym (e.g., TRADE_NAME, RESEARCH_CODE, USAN)
PKMOLSYN_IDNUMBER(9,0)NOT NULLPrimary key.
FKRES_STEM_IDNUMBER(9,0)Foreign key to the research_stem table. Where a synonym is a research code, this links to further information about the company associated with that code.
UKSYNONYMSVARCHAR2(200)Synonym for the compound

ORGANISM_CLASS:

Simple organism classification (essentially a cut-down version of the NCBI taxonomy for organisms in ChEMBL assay2target and target_dictionary tables), allowing browsing of ChEMBL data by taxonomic groups

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKOC_IDNUMBER(9,0)NOT NULLInternal primary key
UKTAX_IDNUMBER(11,0)NCBI taxonomy ID for the organism (corresponding to tax_ids in assay2target and target_dictionary tables)
L1VARCHAR2(200)Highest level classification (e.g., Eukaryotes, Bacteria, Fungi etc)
L2VARCHAR2(200)Second level classification
L3VARCHAR2(200)Third level classification

PARAMETER_TYPE:

Table storing definitions for the different types of parameters used in the assay_parameters table.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKPARAMETER_TYPEVARCHAR2(40)NOT NULLShort name for the type of parameter associated with an assay
DESCRIPTIONVARCHAR2(2000)Description of the parameter type

PATENT_USE_CODES:

Table from FDA Orange Book, showing definitions of different patent use codes (as used in the product_patents table).

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKPATENT_USE_CODEVARCHAR2(8)NOT NULLPrimary key. Patent use code from FDA Orange Book
DEFINITIONVARCHAR2(500)NOT NULLDefinition for the patent use code, from FDA Orange Book.

PREDICTED_BINDING_DOMAINS:

Table storing information on the likely binding domain of compounds in the activities table (based on analysis of the domain structure of the target. Note these are predictions, not experimentally determined. See Kruger F, Rostom R and Overington JP (2012), BMC Bioinformatics, 13(S17), S11 for more details.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKPREDBIND_IDNUMBER(9,0)NOT NULLPrimary key.
FKACTIVITY_IDNUMBER(11,0)Foreign key to the activities table, indicating the compound/assay(+target) combination for which this prediction is made.
FKSITE_IDNUMBER(9,0)Foreign key to the binding_sites table, indicating the binding site (domain) that the compound is predicted to bind to.
PREDICTION_METHODVARCHAR2(50)The method used to assign the binding domain (e.g., 'Single domain' where the protein has only 1 domain, 'Multi domain' where the protein has multiple domains, but only 1 is known to bind small molecules in other proteins).
CONFIDENCEVARCHAR2(10)The level of confidence assigned to the prediction (high where the protein has only 1 domain, medium where the compound has multiple domains, but only 1 known small molecule-binding domain).

PRODUCT_PATENTS:

Table from FDA Orange Book, showing patents associated with drug products.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKPROD_PAT_IDNUMBER(9,0)NOT NULLPrimary key
FK,UKPRODUCT_IDVARCHAR2(30)NOT NULLForeign key to products table - FDA application number for the product
UKPATENT_NOVARCHAR2(11)NOT NULLPatent numbers as submitted by the applicant holder for patents covered by the statutory provisions
UKPATENT_EXPIRE_DATEDATENOT NULLDate the patent expires as submitted by the applicant holder including applicable extensions
DRUG_SUBSTANCE_FLAGNUMBER(1,0)NOT NULLPatents submitted on FDA Form 3542 and listed after August 18, 2003 may have a drug substance flag set to 1, indicating the sponsor submitted the patent as claiming the drug substance
DRUG_PRODUCT_FLAGNUMBER(1,0)NOT NULLPatents submitted on FDA Form 3542 and listed after August 18, 2003 may have a drug product flag set to 1, indicating the sponsor submitted the patent as claiming the drug product
FK,UKPATENT_USE_CODEVARCHAR2(10)Code to designate a use patent that covers the approved indication or use of a drug product
DELIST_FLAGNUMBER(1,0)NOT NULLSponsor has requested patent be delisted if set to 1. This patent has remained listed because, under Section 505(j)(5)(D)(i) of the Act, a first applicant may retain eligibility for 180-day exclusivity based on a paragraph IV certification to this patent for a certain period. Applicants under Section 505(b)(2) are not required to certify to patents where this flag is set to 1

PRODUCTS:

Table containing information about approved drug products (mainly from the FDA Orange Book), such as trade name, administration route, approval date. Ingredients in each product are linked to the molecule dictionary via the formulations table.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
DOSAGE_FORMVARCHAR2(200)The dosage form of the product (e.g., tablet, capsule etc)
ROUTEVARCHAR2(200)The administration route of the product (e.g., oral, injection etc)
TRADE_NAMEVARCHAR2(200)The trade name for the product
APPROVAL_DATEDATEThe FDA approval date for the product (not necessarily first approval of the active ingredient)
AD_TYPEVARCHAR2(5)RX = prescription, OTC = over the counter, DISCN = discontinued
ORALNUMBER(1,0)Flag to show whether product is orally delivered
TOPICALNUMBER(1,0)Flag to show whether product is topically delivered
PARENTERALNUMBER(1,0)Flag to show whether product is parenterally delivered
BLACK_BOX_WARNINGNUMBER(1,0)Flag to show whether the product label has a black box warning
APPLICANT_FULL_NAMEVARCHAR2(200)Name of the company applying for FDA approval
INNOVATOR_COMPANYNUMBER(1,0)Flag to show whether the applicant is the innovator of the product
PKPRODUCT_IDVARCHAR2(30)NOT NULLFDA application number for the product
NDA_TYPEVARCHAR2(10)New Drug Application Type. The type of new drug application approval. New Drug Applications (NDA or innovator) are N. Abbreviated New Drug Applications (ANDA or generic) are A.

PROTEIN_CLASS_SYNONYMS:

Table storing synonyms for the protein family classifications (from various sources including MeSH, ConceptWiki and UMLS).

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKPROTCLASSSYN_IDNUMBER(9,0)NOT NULLPrimary key.
FK,UKPROTEIN_CLASS_IDNUMBER(9,0)NOT NULLForeign key to the PROTEIN_CLASSIFICATION table. The protein_class to which this synonym applies.
UKPROTEIN_CLASS_SYNONYMVARCHAR2(1000)The synonym for the protein class.
UKSYN_TYPEVARCHAR2(20)The type or origin of the synonym (e.g., ChEMBL, Concept Wiki, UMLS).

PROTEIN_CLASSIFICATION:

Table storing the protein family classifications for protein targets in ChEMBL (formerly in the target_class table)

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKPROTEIN_CLASS_IDNUMBER(9,0)NOT NULLPrimary key. Unique identifier for each protein family classification.
PARENT_IDNUMBER(9,0)Protein_class_id for the parent of this protein family.
PREF_NAMEVARCHAR2(500)Preferred/full name for this protein family.
SHORT_NAMEVARCHAR2(50)Short/abbreviated name for this protein family (not necessarily unique).
PROTEIN_CLASS_DESCVARCHAR2(410)NOT NULLConcatenated description of each classification for searching purposes etc.
DEFINITIONVARCHAR2(4000)Definition of the protein family.
CLASS_LEVELNUMBER(9,0)NOT NULLLevel of the class within the hierarchy (level 1 = top level classification)

PROTEIN_FAMILY_CLASSIFICATION:

WARNING! THIS TABLE IS NOW DEPRECATED, PLEASE USE PROTEIN_CLASSIFICATION INSTEAD.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKPROTEIN_CLASS_IDNUMBER(9,0)NOT NULLPrimary key. Unique identifier for each classification.
UKPROTEIN_CLASS_DESCVARCHAR2(810)NOT NULLConcatenated description of each classification for searching purposes etc.
UKL1VARCHAR2(100)NOT NULLFirst level classification (e.g., Enzyme, Transporter, Ion Channel).
UKL2VARCHAR2(100)Second level classification.
UKL3VARCHAR2(100)Third level classification.
UKL4VARCHAR2(100)Fourth level classification.
UKL5VARCHAR2(100)Fifth level classification.
UKL6VARCHAR2(100)Sixth level classification.
UKL7VARCHAR2(100)Seventh level classification.
UKL8VARCHAR2(100)Eighth level classification.

RELATIONSHIP_TYPE:

Lookup table for assay2target.relationship_type column, showing whether assays are mapped to targets of the correct identity and species ('Direct') or close homologues ('Homologue')

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKRELATIONSHIP_TYPEVARCHAR2(1)NOT NULLRelationship_type flag used in the assay2target table
RELATIONSHIP_DESCVARCHAR2(250)Description of relationship_type flags

RESEARCH_COMPANIES:

Table storing a list of pharmaceutical companies (including current and former names) corresponding to each research code stem in the research_stem table. A stem can sometimes be used by more than one company.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKCO_STEM_IDNUMBER(9,0)NOT NULLPrimary key.
FK,UKRES_STEM_IDNUMBER(9,0)Foreign key to research_stem table.
UKCOMPANYVARCHAR2(100)Name of current company associated with this research code stem.
COUNTRYVARCHAR2(50)Country in which the company uses this research code stem.
PREVIOUS_COMPANYVARCHAR2(100)Previous name of the company associated with this research code stem (e.g., if the company has undergone acquisitions/mergers).

RESEARCH_STEM:

Table storing a list of stems/prefixes used in research codes.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKRES_STEM_IDNUMBER(9,0)NOT NULLPrimary key. Unique ID for each research code stem.
UKRESEARCH_STEMVARCHAR2(20)The actual stem/prefix used in the research code.

SITE_COMPONENTS:

Table defining the location of the binding sites in the binding_sites table. A binding site could be defined in terms of which protein subunits (components) are involved, the domains within those subunits to which the compound binds, and possibly even the precise residues involved. For a target where the binding site is at the interface of two protein subunits or two domains, there will be two site_components describing each of these subunits/domains.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKSITECOMP_IDNUMBER(9,0)NOT NULLPrimary key.
FK,UKSITE_IDNUMBER(9,0)NOT NULLForeign key to binding_sites table.
FK,UKCOMPONENT_IDNUMBER(9,0)Foreign key to the component_sequences table, indicating which molecular component of the target is involved in the binding site.
FK,UKDOMAIN_IDNUMBER(9,0)Foreign key to the domains table, indicating which domain of the given molecular component is involved in the binding site (where not known, the domain_id may be null).
SITE_RESIDUESVARCHAR2(2000)List of residues from the given molecular component that make up the binding site (where not know, will be null).

SOURCE:

Table showing source from which ChEMBL activity data is derived (e.g., literature, deposited datasets etc)

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKSRC_IDNUMBER(3,0)NOT NULLIdentifier for each source (used in compound_records and assays tables)
SRC_DESCRIPTIONVARCHAR2(500)Description of the data source
SRC_SHORT_NAMEVARCHAR2(20)A short name for each data source, for display purposes

STRUCTURAL_ALERT_SETS:

Table showing list of sets of structural alerts that have been included in COMPOUND_STRUCTURAL_ALERT table. It should be noted some alerts/alert sets are more permissive than others and may flag a large number of compounds. Results should be interpreted with care, depending on the use-case, and not treated as a blanket filter (e.g., around 50% of approved drugs have 1 or more alerts from these sets).

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKALERT_SET_IDNUMBER(9,0)NOT NULLUnique ID for the structural alert set
UKSET_NAMEVARCHAR2(100)NOT NULLName (or origin) of the structural alert set
PRIORITYNUMBER(2,0)NOT NULLPriority assigned to the structural alert set for display on the ChEMBL interface (priorities >=4 are shown by default).

STRUCTURAL_ALERTS:

Table storing a list of structural features (encoded as SMARTS) that are potentially undesirable in drug discovery context. It should be noted some alerts/alert sets are more permissive than others and may flag a large number of compounds. Results should be interpreted with care, depending on the use-case, and not treated as a blanket filter (e.g., around 50% of approved drugs have 1 or more alerts from these sets).

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKALERT_IDNUMBER(9,0)NOT NULLPrimary key. Unique identifier for the structural alert
FK,UKALERT_SET_IDNUMBER(9,0)NOT NULLForeign key to structural_alert_sets table indicating which set this particular alert comes from
UKALERT_NAMEVARCHAR2(100)NOT NULLA name for the structural alert
UKSMARTSVARCHAR2(4000)NOT NULLSMARTS defining the structural feature that is considered to be an alert

TARGET_COMPONENTS:

Links molecular target from the target_dictionary to the components they consist of (in the component_sequences table). For a protein complex or protein family target, for example, there will be multiple protein components in the component_sequences table.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
FKTIDNUMBER(9,0)NOT NULLForeign key to the target_dictionary, indicating the target to which the components belong.
FKCOMPONENT_IDNUMBER(9,0)NOT NULLForeign key to the component_sequences table, indicating which components belong to the target.
PKTARGCOMP_IDNUMBER(9,0)NOT NULLPrimary key.
HOMOLOGUENUMBER(1,0)NOT NULLIndicates that the given component is a homologue of the correct component (e.g., from a different species) when set to 1. This may be the case if the sequence for the correct protein/nucleic acid cannot be found in sequence databases. A value of 2 indicates that the sequence given is a representative of a species group, e.g., an E. coli protein to represent the target of a broad-spectrum antibiotic.

TARGET_DICTIONARY:

Target Dictionary containing all curated targets for ChEMBL. Includes both protein targets and non-protein targets (e.g., organisms, tissues, cell lines)

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKTIDNUMBER(9,0)NOT NULLUnique ID for the target
FKTARGET_TYPEVARCHAR2(30)Describes whether target is a protein, an organism, a tissue etc. Foreign key to TARGET_TYPE table.
PREF_NAMEVARCHAR2(200)NOT NULLPreferred target name: manually curated
TAX_IDNUMBER(11,0)NCBI taxonomy id of target
ORGANISMVARCHAR2(150)Source organism of molecuar target or tissue, or the target organism if compound activity is reported in an organism rather than a protein or tissue
FK,UKCHEMBL_IDVARCHAR2(20)NOT NULLChEMBL identifier for this target (for use on web interface etc)
SPECIES_GROUP_FLAGNUMBER(1,0)NOT NULLFlag to indicate whether the target represents a group of species, rather than an individual species (e.g., 'Bacterial DHFR'). Where set to 1, indicates that any associated target components will be a representative, rather than a comprehensive set.

TARGET_RELATIONS:

Table showing relationships between different protein targets based on overlapping protein components (e.g., relationship between a protein complex and the individual subunits).

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
FKTIDNUMBER(9,0)NOT NULLIdentifier for target of interest (foreign key to target_dictionary table)
RELATIONSHIPVARCHAR2(20)NOT NULLRelationship between two targets (e.g., SUBSET OF, SUPERSET OF, OVERLAPS WITH)
FKRELATED_TIDNUMBER(9,0)NOT NULLIdentifier for the target that is related to the target of interest (foreign key to target_dicitionary table)
PKTARGREL_IDNUMBER(9,0)NOT NULLPrimary key

TARGET_TYPE:

Lookup table for target types used in the target dictionary

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKTARGET_TYPEVARCHAR2(30)NOT NULLTarget type (as used in target dictionary)
TARGET_DESCVARCHAR2(250)Description of target type
PARENT_TYPEVARCHAR2(25)Higher level classification of target_type, allowing grouping of e.g., all 'PROTEIN' targets, all 'NON-MOLECULAR' targets etc.

TISSUE_DICTIONARY:

Table storing information about tissues used in assays.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKTISSUE_IDNUMBER(9,0)NOT NULLPrimary key, numeric ID for each tissue.
UKUBERON_IDVARCHAR2(15)Uberon ontology identifier for this tissue.
PREF_NAMEVARCHAR2(200)NOT NULLName for the tissue (in most cases Uberon name).
UKEFO_IDVARCHAR2(20)Experimental Factor Ontology identifier for the tissue.
FK,UKCHEMBL_IDVARCHAR2(20)NOT NULLChEMBL identifier for this tissue (for use on web interface etc)
BTO_IDVARCHAR2(20)BRENDA Tissue Ontology identifier for the tissue.
CALOHA_IDVARCHAR2(7)Swiss Institute for Bioinformatics CALOHA Ontology identifier for the tissue.

USAN_STEMS:

Table storing definitions for stems used in USANs (United States Adopted Names).

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKUSAN_STEM_IDNUMBER(9,0)NOT NULLNumeric primary key.
UKSTEMVARCHAR2(100)NOT NULLStem defined for use in United States Adopted Names.
UKSUBGROUPVARCHAR2(100)NOT NULLMore specific subgroup of the stem defined for use in United States Adopted Names.
ANNOTATIONVARCHAR2(2000)Meaning of the stem (e.g., the class of compound it applies to).
STEM_CLASSVARCHAR2(100)Indicates whether stem is used as a Prefix/Infix/Suffix.
MAJOR_CLASSVARCHAR2(100)Protein family targeted by compounds of this class (e.g., GPCR/Ion channel/Protease) where known/applicable.
WHO_EXTRANUMBER(1,0)Stem not represented in USAN list, but added from WHO INN stem list (where set to 1).

VARIANT_SEQUENCES:

Table storing information about mutant sequences and other variants used in assays. The sequence provided is a representative sequence incorporating the reported mutation/variant used in the assay - it is not necessarily the exact sequence used in the experiment.

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKVARIANT_IDNUMBER(9,0)NOT NULLPrimary key, numeric ID for each sequence variant.
UKMUTATIONVARCHAR2(2000)Details of variant(s) used, with residue positions adjusted to match provided sequence.
UKACCESSIONVARCHAR2(25)UniProt accesion for the representative sequence used as the base sequence (without variation).
VERSIONNUMBER(9,0)Version of the UniProt sequence used as the base sequence.
ISOFORMNUMBER(9,0)Details of the UniProt isoform used as the base sequence where relevant.
SEQUENCECLOBVariant sequence formed by adjusting the UniProt base sequence with the specified mutations/variations.
ORGANISMVARCHAR2(200)Organism from which the sequence was obtained.

VERSION:

Table showing release version and creation date for the database

KEYSCOLUMN_NAMEDATA_TYPENULLABLECOMMENT
PKNAMEVARCHAR2(20)NOT NULLName of release version
CREATION_DATEDATEDate database created
COMMENTSVARCHAR2(2000)Description of release version