kgxqr.github.io
Experiments
- RQ1. We evaluate the intrinsic quality of the concept knowledge graph by assessing the correctness of its concepts and relations.
- RQ2. We evaluate the effectiveness of KGXQR for relevant question retrieval by comparing it against multiple baselines on a benchmark we created, which consists of three test datasets with different characteristics (a scoring sketch follows this list).
- The AnswerBot dataset is a small, human-annotated dataset containing only Java-related queries, while the duplicate question datasets and title edit datasets for Java/C#/JavaScript/Python are large, automatically created datasets covering the four most popular programming languages. The detailed evaluation results are shown in RQ2.xlsx, and the corresponding document stores for Java, C#, JavaScript, and Python are also included in the dataset.
- The Stack Overflow dump used in our implementation includes 1,096,708 duplicate question records and 2,554,062 edit records for 16,663,358 questions. From it, we sample 21,172 positive samples from the duplicate question records and 80,000 from the question edit records. Detailed information about the positive training data and the corresponding negative training data is provided in two files, the duplicate data file and the history data file (a pair-construction sketch follows this list).
- RQ3. To evaluate the usefulness of the explanations provided by KGXQR, we compare users' performance in selecting relevant questions for specific programming tasks when they are given explanations generated by KGXQR versus explanations generated by a baseline method.
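
For RQ2, the following is a minimal sketch of how retrieval effectiveness could be scored on such a benchmark. The function names, the `(ranked_ids, relevant_ids)` input shape, and the choice of MRR/Recall@k are illustrative assumptions, not the actual evaluation code behind RQ2.xlsx.

```python
from typing import Iterable, Sequence, Set, Tuple

# One run per test query: the retriever's ranked question IDs plus the
# ground-truth relevant IDs (e.g., the known duplicate target).
Run = Tuple[Sequence[str], Set[str]]

def mean_reciprocal_rank(runs: Iterable[Run]) -> float:
    """Average reciprocal rank of the first relevant question per query."""
    total, n = 0.0, 0
    for ranked_ids, relevant_ids in runs:
        n += 1
        for rank, qid in enumerate(ranked_ids, start=1):
            if qid in relevant_ids:
                total += 1.0 / rank
                break
    return total / n if n else 0.0

def recall_at_k(runs: Iterable[Run], k: int = 10) -> float:
    """Fraction of queries with at least one relevant question in the top k."""
    hits, n = 0, 0
    for ranked_ids, relevant_ids in runs:
        n += 1
        if any(qid in relevant_ids for qid in ranked_ids[:k]):
            hits += 1
    return hits / n if n else 0.0
```

Pass the runs as a list rather than a one-shot iterator if both metrics are computed over the same queries.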
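
For the training data, here is a minimal sketch, under assumed field shapes, of how positive pairs might be drawn from the duplicate question records and paired with random negatives; `build_pairs` and its inputs are hypothetical, not the released pipeline behind the duplicate and history data files.

```python
import random

def build_pairs(duplicate_records, all_titles, negatives_per_positive=1, seed=42):
    """duplicate_records: iterable of (query_title, duplicate_title) pairs.
    all_titles: pool of question titles to draw random negatives from.
    Returns (title_a, title_b, label) triples with label 1/0."""
    rng = random.Random(seed)
    pairs = []
    for query_title, dup_title in duplicate_records:
        # The duplicate pair itself is a positive sample.
        pairs.append((query_title, dup_title, 1))
        for _ in range(negatives_per_positive):
            # A randomly drawn, non-duplicate title serves as a negative sample.
            neg = rng.choice(all_titles)
            while neg == dup_title or neg == query_title:
                neg = rng.choice(all_titles)
            pairs.append((query_title, neg, 0))
    return pairs
```

The same scheme would apply to the title edit records, treating a question's pre-edit and post-edit titles as a positive pair.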