Data Integration and Analysis System (DIAS)
National Institute of Informatics (NII) Group

Data Repository and LLM Study Group

We are launching a study group to consider how large language models (LLMs, a form of generative AI) can be used in data repositories. Building on the LLM study group, it is intended as a place to exchange information informally and to share experimental attempts and code.

The focus is on applying LLMs within data repositories. Topics for discussion include the following, among others.

1. Streamlining Tasks

  1. Streamlining metadata creation
  2. Generating metadata from dataset papers
  3. Translating metadata
  4. Summarizing related papers
  5. Organizing information on dataset usage

2. Advancing Search

  1. Searching semantically by direct input of natural language sentences
  2. Converting natural language into a query language (e.g. SPARQL)
  3. Organizing and summarizing search results; presenting integrated results from multiple databases
  4. Generating search results tailored to the user's level

3. Facilitating Usage

  1. Preparing metadata for LLM augmentation
  2. Generating code reflecting the schema of the dataset
  3. Generating code for dataset analysis and visualization
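
As one illustration of topic 2.2 above (converting natural language into a query language such as SPARQL), the following is a minimal sketch of how a prompt for an LLM might be assembled. It is not a method adopted by the study group. The names `Predicate`, `build_sparql_prompt`, and `PREDICATES` are hypothetical, the Dublin Core terms are used only as an example vocabulary, and the call to an actual LLM is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Predicate:
    """One property in a (hypothetical) repository vocabulary."""
    iri: str
    description: str


def build_sparql_prompt(question: str, predicates: list[Predicate]) -> str:
    """Assemble a prompt asking an LLM to translate a question into SPARQL.

    The prompt lists the allowed predicates so that the model is steered
    toward the repository's own vocabulary rather than invented terms.
    """
    vocab_lines = "\n".join(f"- <{p.iri}>: {p.description}" for p in predicates)
    return (
        "Translate the question into a single SPARQL SELECT query.\n"
        "Use only the predicates listed below.\n\n"
        f"Predicates:\n{vocab_lines}\n\n"
        f"Question: {question}\n"
        "SPARQL:"
    )


# Assumed example vocabulary; a real repository would supply its own schema.
PREDICATES = [
    Predicate("http://purl.org/dc/terms/title", "title of the dataset"),
    Predicate("http://purl.org/dc/terms/subject", "topic keyword"),
]

prompt = build_sparql_prompt("Which datasets are about precipitation?", PREDICATES)
print(prompt)
```

The resulting string would be passed to whichever LLM is under evaluation, and the returned query should be validated before it is run against a repository endpoint. Listing the vocabulary explicitly in the prompt is one possible design choice; retrieving only the relevant part of a large schema is another that the group may wish to compare.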