The European Bioinformatics Institute (EMBL-EBI)



The European Molecular Biology Laboratory (EMBL) is Europe’s flagship laboratory for the life sciences, operated as an intergovernmental organisation with 21 member states. EMBL-EBI is an outstation of EMBL. It provides freely available, open data from life science experiments covering the full spectrum of molecular biology, including the life science literature. Alongside this service mission, about 20% of EMBL-EBI is devoted to basic research, and extensive training programmes help researchers use the resources we provide.

We maintain a range of structured deposition databases. These include the European Nucleotide Archive, which is part of the International Nucleotide Sequence Database Collaboration (together with GenBank, USA, and the DNA Data Bank of Japan), the world’s largest nucleotide sequence database; PDBe, for protein structures, which is part of the Worldwide Protein Data Bank; and molecular "omics" databases such as ArrayExpress (gene expression), PRIDE (proteomics) and MetaboLights (metabolomics). More recently, the BioStudies database has begun collecting metadata, links and files to aggregate all the data about a study in one place, supporting data discovery and the provenance of scientific assertions, and potentially simplifying data citation. Building on these deposition databases are knowledgebases such as UniProt, for protein information; Ensembl, home to reference genomes ranging from human to plant to microbe; and the Expression Atlas, which indicates in which tissues and cells different genes are expressed. All resources are built through international collaboration and community engagement, have governance structures, and make significant crosslinks to the scientific literature, provided by Europe PMC.

Europe PMC is EMBL-EBI's database for the life science research literature. It contains over 30 million abstracts and 4.4 million full-text articles, about 1.8 million of which are "gold" open access, i.e. free to read and reuse. One of the key directives of Europe PMC is to integrate the literature with related data. EMBL-EBI also operates an established identifier resolving system that enables referencing of data for the life sciences community. It handles persistent identifiers (PIDs) in the form of URIs and CURIEs, which allows data to be referenced in both a location-independent and resource-dependent manner, and it registers namespaces and providers for over 600 biomedical resources.

This expertise in literature, open data and PID services in the life sciences makes EMBL-EBI an excellent match for the tasks outlined in this proposal.