Data Integration and Analysis System (DIAS)
National Institute of Informatics (NII) Group


Assignment of DOI to Research Data

What is DOI?

DOI (Digital Object Identifier) is a worldwide identifier system operated by the International DOI Foundation. It was originally designed to give scholarly papers published on publishers' websites an identifier more stable than a URL (Uniform Resource Locator), which breaks easily. Assigning DOIs to scholarly papers has become common practice, and DOIs now form the basis of other systems such as citation indexes and research evaluation. Following this success, there has been growing interest in applying the same system to other types of scholarly information. Among these, research data is the most active area of research and development for the use of DOIs.

Globally, DataCite, established in 2009, has played a central role in assigning DOIs to research data. In Japan, this role has been played by Japan Link Center (JaLC), which is operated jointly by four academic organizations. The "Experimental project for DOI registration on research data" started in October 2014, and in October 2015 its results were compiled into the "Guideline for DOI registration on research data." DIAS participated in this project and actively contributed its opinions. Since the guideline was published, a few academic organizations have started assigning DOIs, and DIAS has also been establishing a framework and improving its system so that it can begin assigning DOIs. As a result, the first DOI at DIAS was assigned and resolved on March 28, 2017.

DOI and Open Science

Assigning DOIs to research data is part of the infrastructure of open science, which aims to make data and information easily accessible to anyone in the world.

First, specifying the location of data with a DOI encourages the use of that data. URLs are widely used as location information on the Internet, but they cannot be regarded as permanent, because a URL can break when a system changes or a server is relocated. A DOI, by contrast, is robust against such changes because location information is managed centrally by the DOI system, so users can rely on the data with the expectation that it will remain accessible in the future.
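The indirection described above can be illustrated with a minimal sketch. It assumes a toy in-memory registry standing in for the real DOI system (which resolves names through the Handle System, for example via https://doi.org/<DOI>); the class name, DOI, and URLs are hypothetical placeholders. The point is that relocating a dataset changes only the registry entry, while the DOI cited in papers stays the same.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDoiRegistry:
    """Hypothetical stand-in for the central DOI system: maps DOI -> current URL."""
    locations: dict[str, str] = field(default_factory=dict)

    def register(self, doi: str, url: str) -> None:
        # DOI names are case-insensitive, so normalize before storing.
        self.locations[doi.lower()] = url

    def relocate(self, doi: str, new_url: str) -> None:
        # Only the registry entry changes; the DOI that papers cite does not.
        key = doi.lower()
        if key not in self.locations:
            raise KeyError(f"unregistered DOI: {doi}")
        self.locations[key] = new_url

    def resolve(self, doi: str) -> str:
        # Look up the current location for a DOI.
        return self.locations[doi.lower()]


def resolver_url(doi: str) -> str:
    # Public proxy form of a DOI; the real doi.org service answers with an HTTP redirect.
    # No percent-encoding is applied here, which is a simplification.
    return "https://doi.org/" + doi


registry = ToyDoiRegistry()
doi = "10.1234/dias.example"  # hypothetical DOI
registry.register(doi, "https://old-server.example.org/data/1")
registry.relocate(doi, "https://new-server.example.org/datasets/1")
print(registry.resolve(doi))  # https://new-server.example.org/datasets/1
```

A URL written into a paper would have broken at the relocation step; the DOI keeps working because only the central mapping was updated.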

Second, citing DOIs in academic papers enables the evaluation of data. A DOI string is sufficient to identify the data used in academic papers and other publications, so data usage can be tracked by counting how often the DOI appears. This can strengthen the incentive to release data, for example by recognizing people who released data that is in high demand, and is expected to contribute to capacity building, such as training and education that increase the number of experts in earth environmental information.

It is recognized worldwide that the key to promoting data-driven research using machine learning or artificial intelligence lies in high-quality open data and data sharing among stakeholders. In Japan, however, it is not yet well understood that human and financial resources are crucial for the sustainable evolution of open data infrastructure. Assigning DOIs to research data is the first step toward building an advanced data infrastructure. First, we will retrospectively assign DOIs to data that has already been released at DIAS and meets the conditions for DOI assignment. Second, we will seek out earth environmental data that is worth releasing from DIAS and promote its release by assigning DOIs. Because data papers also require a DOI at submission, some researchers choose to release DOI-assigned data from DIAS after review.

The ultimate goal is to increase the use of released data, raise the recognition of the people who released it, and create an environment in which the data ecosystem forms a sustainable cycle. There is, however, a long way to go. As the core platform for earth environmental data, DIAS plans to continue developing and promoting earth environmental data, both for academic depth and for industrial innovation.

One of our efforts to achieve this goal is the Mahalo Button. This button is placed on a dataset's landing page to aggregate the dataset's usage, and it links to the DOIs of research results that used the dataset. If all datasets and research results were assigned DOIs, the contribution of each dataset could be evaluated through the resulting network of DOIs.
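Evaluating a dataset through a network of DOIs could look roughly like the sketch below. The page does not describe how the Mahalo Button aggregates usage, so this is only one possible approach, assuming the links are available as hypothetical (citing DOI, cited DOI) pairs. The "total" count, which also credits works that build on the dataset indirectly through a chain of citations, is an illustrative assumption and not a documented DIAS metric.

```python
from collections import defaultdict, deque


def build_cited_by(links: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Invert (citing DOI, cited DOI) pairs into: cited DOI -> set of citing DOIs."""
    cited_by: dict[str, set[str]] = defaultdict(set)
    for citing, cited in links:
        cited_by[cited.lower()].add(citing.lower())
    return cited_by


def contribution(dataset_doi: str, cited_by: dict[str, set[str]]) -> tuple[int, int]:
    """Return (direct, total) counts of DOIs that build on the dataset.

    direct: works that cite the dataset DOI itself.
    total:  works that reach it directly or through a chain of citations.
    """
    start = dataset_doi.lower()
    direct = len(cited_by.get(start, set()))
    seen: set[str] = set()
    queue = deque([start])
    # Breadth-first search over "is cited by" edges; `seen` guards against cycles.
    while queue:
        node = queue.popleft()
        for citing in cited_by.get(node, set()):
            if citing not in seen and citing != start:
                seen.add(citing)
                queue.append(citing)
    return direct, len(seen)


links = [  # hypothetical DOIs
    ("10.1234/paper.1", "10.1234/dias.a"),
    ("10.1234/paper.2", "10.1234/dias.a"),
    ("10.1234/paper.3", "10.1234/paper.1"),
    ("10.1234/paper.4", "10.1234/dias.b"),
]
print(contribution("10.1234/dias.a", build_cited_by(links)))  # (2, 3)
```

Here the dataset `10.1234/dias.a` is used directly by two papers, and a third paper builds on one of them, which is the kind of indirect contribution that only becomes visible when every item in the chain carries a DOI.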

References on the Assignment of DOI at DIAS

List of DIAS datasets with DOI: DataCite Search