DataCite Commons - Exploiting the Power of PIDs and the PID Graph

Author: Martin Fenner (repost from DataCite Blog)

Today DataCite is proud to announce the launch of DataCite Commons. DataCite Commons is a discovery service that enables simple searches while giving users a comprehensive overview of connections between entities in the research landscape. This means that DataCite members registering DOIs with us will have easier access to information about the use of their DOIs and can discover and track connections between their DOIs and other entities. DataCite Commons was developed as part of the EC-funded project Freya and will form the basis of new DataCite services.

DataCite Commons has a lot of content to search. One of its most important features is the ability to search across all DOIs, no matter whether they were registered with DataCite, Crossref, or one of the other scholarly DOI registration agencies. Users want to search for content or look up metadata for a particular DOI, and not worry about where to look. DataCite initially focused on registering DOIs for datasets (approaching 8 million DOIs so far), but our members to date have also registered almost 6 million DOIs for text publications. At the same time, Crossref members have registered almost 2 million DOIs for datasets in addition to the DOIs for journal articles, book chapters, and other text publications. Other content types, e.g. dissertations or preprints, can be found at both DataCite and Crossref. And there are 6 more DOI registration agencies that register DOIs for scholarly content. Including the more than 110 million Crossref DOIs in DataCite Commons is a huge undertaking. We currently have 10 million Crossref DOIs in DataCite Commons, with the import of many more ongoing, together with 20 million DOIs from DataCite.



DataCite Commons not only has a lot more content to search but also exposes the connections between DOIs in the form of citations, versions, and collections. It also shows the connections between content with DOIs and people, research organizations, and funders. Together these form what we call the PID Graph: scholarly resources identified via persistent identifiers (PIDs) and connected in standard ways. We integrate with both the ORCID and ROR (Research Organization Registry) APIs to enable a search for people (10 million) and organizations (100,000) and to show the associated content. For funding, we take advantage of the inclusion of Crossref Funder IDs in ROR metadata. We combine these connections so that a funder, research organization, or researcher sees not only their content but also its citations, views, and downloads where available, together with aggregate statistics such as numbers by year or content type.
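To make the idea of a PID Graph a little more concrete, here is a minimal in-memory sketch in Python. It is not how DataCite Commons is implemented; the class name, the relation names ("has_work", "cites"), and all identifiers are hypothetical placeholders chosen for illustration. It only shows how works linked to a person can be listed, how citations can be counted, and how simple per-year statistics can be aggregated once the connections are stored as edges.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class PidGraph:
    """Toy PID graph: nodes are identifiers, edges are typed connections."""

    # node id -> attributes, e.g. {"kind": "work", "year": 2020}
    nodes: dict = field(default_factory=dict)
    # (source id, relation name) -> set of target ids
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, pid, **attrs):
        self.nodes[pid] = attrs

    def connect(self, source, relation, target):
        self.edges[(source, relation)].add(target)

    def works_of(self, pid):
        """Works directly connected to a person, organization, or funder."""
        return sorted(self.edges.get((pid, "has_work"), set()))

    def citation_count(self, work):
        """Number of distinct sources that cite the given work."""
        return sum(
            1
            for (_, relation), targets in self.edges.items()
            if relation == "cites" and work in targets
        )

    def works_by_year(self, pid):
        """Aggregate the works connected to pid by publication year."""
        return Counter(self.nodes[w]["year"] for w in self.works_of(pid))


# Made-up identifiers, not real ORCID iDs or DOIs.
graph = PidGraph()
graph.add_node("person:example-1", kind="person")
graph.add_node("doi:10.0000/example.a", kind="work", year=2019)
graph.add_node("doi:10.0000/example.b", kind="work", year=2020)
graph.add_node("doi:10.0000/example.c", kind="work", year=2020)
graph.connect("person:example-1", "has_work", "doi:10.0000/example.a")
graph.connect("person:example-1", "has_work", "doi:10.0000/example.b")
graph.connect("doi:10.0000/example.b", "cites", "doi:10.0000/example.a")
graph.connect("doi:10.0000/example.c", "cites", "doi:10.0000/example.a")
```

In this toy graph, graph.works_of("person:example-1") lists the two works linked to the person, and graph.citation_count("doi:10.0000/example.a") counts the two works citing the first one. A real service would of course keep these connections in a search index or graph store rather than in a Python dictionary.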


For a single work, e.g. a dataset registered with a DOI, we show views, downloads, and citations if available.


By mapping all Crossref metadata to the corresponding metadata in DataCite, we can support much more granular search queries than if we mapped only basic metadata. With this release, we are also launching a new set of filters for content search. We added license type, fields of science, primary language, and DOI registration agency to the existing filters for publication year and work type. As described in a July blog post (Fenner, 2020a), we are using existing controlled vocabularies for these filters (license type: SPDX, fields of science: OECD, and language: ISO 639-1), and are re-indexing all our metadata (almost completed) to align with these standard vocabularies where possible. We encourage our members to use these standard vocabularies when registering content. This should help users find content that has a license allowing unrestricted re-use and that is in the research field and language they are interested in. Using these widely adopted vocabularies should also help with interoperability with other services.
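The value of controlled vocabularies for filtering can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not the DataCite Commons code: the lookup table, the Work record, and the helper functions are hypothetical, and the DOIs are placeholders. The SPDX identifiers and ISO 639-1 codes used as values are real, but the free-text keys mapped to them are invented examples of what unnormalized metadata might look like.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from free-text license strings to SPDX identifiers.
LICENSE_TO_SPDX = {
    "creative commons attribution 4.0": "CC-BY-4.0",
    "cc by 4.0": "CC-BY-4.0",
    "cc0 1.0": "CC0-1.0",
    "mit license": "MIT",
}


def normalize_license(raw: str) -> Optional[str]:
    """Return the SPDX identifier for a license string, or None if unknown."""
    key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    return LICENSE_TO_SPDX.get(key)


@dataclass
class Work:
    doi: str
    license: Optional[str]   # SPDX identifier, or None if not recognized
    language: Optional[str]  # ISO 639-1 code, e.g. "en"
    field_of_science: str    # OECD field label
    agency: str              # DOI registration agency
    year: int


def filter_works(works, **criteria):
    """Keep the works whose attributes equal every given criterion."""
    return [
        w for w in works
        if all(getattr(w, name) == value for name, value in criteria.items())
    ]


# Placeholder records with made-up DOIs.
works = [
    Work("10.0000/example.1", normalize_license("CC BY 4.0"),
         "en", "Natural sciences", "DataCite", 2020),
    Work("10.0000/example.2", normalize_license("Creative Commons  Attribution 4.0"),
         "de", "Social sciences", "Crossref", 2019),
    Work("10.0000/example.3", normalize_license("custom terms"),
         "en", "Natural sciences", "Crossref", 2020),
]

open_english = filter_works(works, license="CC-BY-4.0", language="en")
```

Because both differently worded license strings normalize to the same SPDX identifier, a single filter value matches them consistently. The third record, whose license string is not in the table, ends up without a license value and is excluded from license-based filters, which is one reason registering content with the standard vocabularies from the start is helpful.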

Original Blog Post: