QA Subtask

Sat Dec 07, 2019 by NTCIR-16 Data Search 2 Organizers

In this subtask, given a question and a dataset, a system is expected to generate an answer to the question by extracting a part of the dataset.

More specifically, participants are given a question and a dataset (ID). For example,

Question: What is the population in Tokyo in 2020?
Dataset:  000002505731

A system is expected to generate 13,510,000, a value contained in the dataset, as the answer to the given question.

Questions

A question list will be released around the registration deadline. We may not be able to provide questions for training.

Participants are expected to find answers from:

English
- data_search_e_collection.jsonl.bz2 (metadata)
- data_search_e_data.tar.bz2 (data files)
Japanese
- data_search_j_collection.jsonl.bz2 (metadata)
- data_search_j_data.tar.bz2 (data files)

These files are available at Data Search Test Collection.
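The metadata files can be streamed without decompressing them to disk first. The following is a minimal sketch, assuming each metadata file is UTF-8 text with one JSON object per line (as the .jsonl extension suggests). The function name iter_metadata is hypothetical, and no field names are assumed because the record schema is not described here.

```python
import bz2
import json


def iter_metadata(path):
    """Yield one parsed JSON object per non-empty line of a .jsonl.bz2 file.

    Assumption: the file is UTF-8 encoded JSON Lines compressed with bzip2.
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, if any
                yield json.loads(line)
```

For example, iterating over iter_metadata("data_search_e_collection.jsonl.bz2") would yield one record at a time, which keeps memory use low for a large collection.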

Generated answers are evaluated in two ways, as is done in reading comprehension tasks:

Exact match
- The fraction of answers that exactly match the ground truth answers.
Macro-averaged F1 score
- Let X be a set of words in the generated answer and Y be a set of words in the ground truth answer. Precision is defined as P = |X ∩ Y| / |X|, while recall is defined as R = |X ∩ Y| / |Y|. F1 score of an answer is computed as 2PR / (P + R). Macro-averaged F1 score is the average of the F1 scores over all of the answers.

Exact match will be used as the primary evaluation metric.
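The two metrics can be sketched as follows. This is an illustration, not the organizers' evaluation script: it assumes that words are obtained by splitting on whitespace, that no normalization (case, punctuation) is applied, and that an answer with no overlapping words scores an F1 of 0. The function names are hypothetical.

```python
def exact_match(predictions, references):
    """Fraction of generated answers identical to the ground truth answers."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(1 for p, r in zip(predictions, references) if p == r)
    return hits / len(references)


def answer_f1(prediction, reference):
    """F1 over word sets: X from the generated answer, Y from the ground truth."""
    x = set(prediction.split())  # assumption: whitespace tokenization
    y = set(reference.split())
    overlap = len(x & y)
    if overlap == 0:
        return 0.0  # assumption: avoids division by zero when P + R = 0
    precision = overlap / len(x)
    recall = overlap / len(y)
    return 2 * precision * recall / (precision + recall)


def macro_f1(predictions, references):
    """Average of the per-answer F1 scores over all answers."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    scores = [answer_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)
```

With the generated answers ["13,510,000", "Tokyo Japan"] and the ground truth ["13,510,000", "Tokyo"], exact match is 0.5. The second answer has P = 1/2 and R = 1, so its F1 is 2/3 and the macro-averaged F1 is 5/6.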

Each team is allowed to submit one run per day. Runs should be generated automatically.

The first line of the run file should describe your algorithm, which will be used when the organizers report participants' results:

<SYSDESC>[REPLACE ME]</SYSDESC>

e.g. <SYSDESC>BERT-based approach</SYSDESC>

The other lines in the file should be of the form:

[QUESTION_ID][TAB][ANSWER]

e.g.

DS2-QA-J-1001	2010
DS2-QA-J-1002	Tokyo
...

where each field should be separated by a TAB character (\t or U+0009).

Note that the run files should contain the results for all the test questions.
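A run file in this format could be produced and checked as follows. This is a minimal sketch under stated assumptions: the helper names format_run and missing_questions are hypothetical, lines are assumed to end with a newline character, and rejecting answers that contain a TAB or newline is a precaution of this sketch rather than a stated rule.

```python
def format_run(description, answers):
    """Build run-file text: a SYSDESC line, then QUESTION_ID<TAB>ANSWER lines.

    `answers` is a list of (question_id, answer) string pairs.
    """
    lines = [f"<SYSDESC>{description}</SYSDESC>"]
    for question_id, answer in answers:
        # Assumption: a TAB or newline inside a field would corrupt the format.
        if any(ch in question_id + answer for ch in "\t\n"):
            raise ValueError(f"field contains TAB or newline: {question_id!r}")
        lines.append(f"{question_id}\t{answer}")
    return "\n".join(lines) + "\n"


def missing_questions(question_ids, answers):
    """Return the test question IDs that have no line in the run."""
    answered = {question_id for question_id, _ in answers}
    return [qid for qid in question_ids if qid not in answered]
```

Calling missing_questions with the full list of test question IDs before submission gives a quick check that the run covers every question.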