Home
This is an archival project for Recognyze evaluations in order to track progress.
This evaluation uses TAC-KBP and therefore follows its philosophy. In addition, we aim to fix several errors found in such evaluations, so we sometimes reference specific error types.
All these concepts are described in the Error Analysis page.
You need to check out the following projects in order for these scripts to work:
- error_analysis - already merged into this project!
- neleval (official TAC-KBP client) - https://github.com/wikilinks/neleval.git
- weblyzard_api - https://github.com/weblyzard/weblyzard_api
You will need to replace the neleval/neleval/analyze.py file with the version provided in this project.
The clients from the error_analysis package have been modified recently, so if you notice any mistakes please report them.
Please be aware that, due to the eWRT and weblyzard_api dependencies, you will have to set your user, pass and url in a .bash_profile or .bashrc file.
Also, the Recognyze client (recognyze evals) generally needs to be run as sudo; make sure you run it with the appropriate bash_profile loaded (e.g., when running as sudoer you will want to do source /home/user/.bash_profile or a similar command, depending on your OS).
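The exact variable names depend on your eWRT/weblyzard_api setup, so check their documentation; as a sketch (the names and values below are hypothetical), the entries in your .bash_profile might look like:

```shell
# Hypothetical variable names -- replace with the ones your eWRT/weblyzard_api
# installation actually reads
export WEBLYZARD_API_URL="https://example.org/api"
export WEBLYZARD_API_USER="your_user"
export WEBLYZARD_API_PASS="your_password"
```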
Example usage for verifying that the Recognyze client is working
sudo su
source /home/user/.bash_profile
cd git/error_analysis/src
python recognyzetest.py
Corpora and local paths were removed from the Recognyze scripts in the latest version.
If you want to jump directly to using it, use the following two examples:
(new version)
Run classic evals
./run_test.sh reuters128 date_in_format_yyyymmdd tool
Run evals with overlap
./run_test_fixer.sh reuters128 date_in_format_yyyymmdd tool
These are explained in the guideline (currently Recognyze only).
This workflow should be applied to all tools. The current tool dependency in the eval scripts should be removed (WIP).
Please stick to the following rules:
Whenever you discover errors, please recompute the latest runs. This is experimental, but the latest results should be as reliable as possible!
Announce any change to the workflows and keep documentation up-to-date!
In order to run this type of evaluation you need to put a folder with Recognyze evaluation results in one of the date folders of this project (e.g., 20170204/test).
Each dataset should be contained in its own folder (e.g., reuters128, kore50).
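Assuming results from 2017-02-04 for a run called test, the layout could be created as follows (the paths are illustrative, not prescribed by the scripts):

```shell
# Illustrative layout: one date folder, one run subfolder, one folder per dataset
mkdir -p nel_archive/20170204/test/reuters128
mkdir -p nel_archive/20170204/test/kore50
# Copy your Recognyze output files into the matching dataset folder, then verify:
ls nel_archive/20170204/test
```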
All scripts are available in the script folder.
The format for running classic TAC-KBP evals is the following:
FORMAT
./run_eval.sh [git_path] [dataset] [profile] [nel_archive_folder]
- [git_path] - the path to the git folders
- [dataset] - the dataset name (e.g., reuters128, kore50, rbb, etc.)
- [profile] - the short profile name (e.g., wikipedia, dbpediasolo, advanced, etc.)
- [nel_archive_folder] - the path to the nel_archive folder where the Recognyze results were copied
Examples:
REUTERS128
./run_eval.sh gitpath reuters128 advanced nel_archive/20170204/advanced
KORE50
./run_eval.sh gitpath kore50 advanced nel_archive/20170204/advanced
This creates two files as output:
- [dataset]-[profilename]-results (classic results)
- [dataset]-[profilename]-resultsbytype (results by type)
The basic error analysis returned by TAC is printed on the console.
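For instance, with dataset reuters128 and profile advanced, the two output file names are built by joining dataset and profile as follows (a sketch of the naming scheme, not the eval itself):

```shell
dataset=reuters128
profile=advanced
# The two output files produced by a run
echo "${dataset}-${profile}-results"
echo "${dataset}-${profile}-resultsbytype"
```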
All the gold standards are in the gold folder. Changes to neleval are in the neleval folder.
This fixes the issue that, in TAC-KBP evals, you are penalized twice for partial matches with correct links.
FORMAT
./run_fixer.sh [git_path] [dataset] [profile] [nel_archive_folder]
- [git_path] - the path to the git folders
- [dataset] - the dataset name (e.g., reuters128, kore50, rbb, etc.)
- [profile] - the short profile name (e.g., wikipedia, dbpediasolo, advanced, etc.)
- [nel_archive_folder] - the path to the nel_archive folder where the Recognyze results were copied
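Mirroring the run_eval.sh examples above, a run_fixer.sh call takes its arguments in the same order; the sketch below only echoes the command so the argument order is visible (the paths are illustrative):

```shell
GIT_PATH=gitpath
DATASET=reuters128
PROFILE=advanced
NEL_ARCHIVE=nel_archive/20170204/advanced
# Argument order matches the FORMAT line above
echo "./run_fixer.sh $GIT_PATH $DATASET $PROFILE $NEL_ARCHIVE"
```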
This creates diffs between two consecutive runs. The format is the following:
FORMAT
./run_eval_diff.sh [git_path] [dataset] [path_to_folder1] [path_to_folder2] [path_to_output_folder]
Example:
./run_eval_diff.sh gitpath reuters128 nel_archive/20170315/big nel_archive/20170317/big nel_archive/20170317/big-diff
- [git_path] - the path to the git folder
- [dataset] - the name of the dataset
- [path_to_folder1] - the path to run 1
- [path_to_folder2] - the path to run 2
- [path_to_output_folder] - the output folder
In order to run Error Analysis for Human Annotators we have two options:
- TAC-KBP workflow
- TAC-KBP workflow + Partial matches workflow
This is still a WIP.
It is possible (and recommended) to combine different workflows or execute them in a certain order.
In any pipeline, the first script should always be: TAC-KBP eval.
This can be followed either by Partial-Matches/Duplicates or by Error Analysis Workflows (Error Analysis for Human Annotators, Automated Error Analysis).
This project contains the following folders:
- [date] folders - in format YYYYMMDD - which contain profile subfolders for the respective day
- gold - gold standards
- neleval - scripts that were changed from neleval in order to achieve certain goals
- observations - whenever needed observations or analysis files can be added separately (e.g., if annotated by humans, if fixes were done)
- scripts - automated scripts