Home
This is an archival project for Recognyze evaluations in order to track progress.
This evaluation uses TAC-KBP and therefore follows its philosophy. In addition, we aim to fix several errors found in such evaluations, so we sometimes reference specific error types.
All these concepts are described in the Error Analysis page.
You need to check out the following projects in order for these scripts to work:
- error_analysis - already merged into this project!
- neleval (official TAC-KBP client) - https://github.com/wikilinks/neleval.git
- weblyzard_api - https://github.com/weblyzard/weblyzard_api
You will need to replace the neleval/neleval/analyze.py file with the version provided in this project.
The clients from the error_analysis package have been modified recently, so if you notice any mistakes please report them.
Please be aware that, due to the eWRT and weblyzard_api dependencies, you will have to set your user, pass and url in a .bash_profile or .bashrc file.
Also, the Recognyze client (recognyze evals) generally needs to be run as sudo; make sure you run it with the appropriate bash_profile loaded (e.g., when running as sudoer you will want to do source /home/user/.bash_profile or a similar command, depending on your OS).
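The exact variable names depend on your eWRT/weblyzard_api setup, so check their documentation; as a sketch (the names and values below are hypothetical), the entries in your .bash_profile might look like:

```shell
# Hypothetical variable names -- replace with the ones your eWRT/weblyzard_api
# installation actually reads
export WEBLYZARD_API_URL="https://example.org/api"
export WEBLYZARD_API_USER="your_user"
export WEBLYZARD_API_PASS="your_password"
```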
Example usage for verifying that the Recognyze client is working
sudo su
source /home/user/.bash_profile
cd git/error_analysis/src
python recognyzetest.py
Corpora and local paths were removed from the Recognyze scripts in the latest version.
If you want to jump directly to using it, use the following two examples:
(new version)
Run classic evals
./run_test.sh reuters128 date_in_format_yyyymmdd tool
Run evals with overlap
./run_test_fixer.sh reuters128 date_in_format_yyyymmdd tool
These are explained in the guideline (currently Recognyze only).
This workflow should be applied to all tools. The current tool dependency in the eval scripts should be removed (WIP).
Please stick to the following rules:
Whenever you discover errors, please recompute the latest runs. This is experimental, but the latest results should be as reliable as possible!
Announce any change to the workflows and keep documentation up-to-date!
In order to run this type of evaluation you need to put a folder with Recognyze evaluation results in one of the date folders of this project (e.g., 20170204/test).
Each dataset should be contained in its own folder (e.g., reuters128, kore50).
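Assuming results from 2017-02-04 for a run called test, the layout could be created as follows (the paths are illustrative, not prescribed by the scripts):

```shell
# Illustrative layout: one date folder, one run subfolder, one folder per dataset
mkdir -p nel_archive/20170204/test/reuters128
mkdir -p nel_archive/20170204/test/kore50
# Copy your Recognyze output files into the matching dataset folder, then verify:
ls nel_archive/20170204/test
```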
All scripts are available in the script folder.
The format for running classic TAC-KBP evals is the following:
FORMAT
./run_eval.sh [git_path] [dataset] [profile] [nel_archive_folder]
- [git_path] - the path to the git folders
- [dataset] - the dataset name (e.g., reuters128, kore50, rbb, etc.)
- [profile] - the short profile name (e.g., wikipedia, dbpediasolo, advanced, etc.)
- [nel_archive_folder] - the path to the nel_archive folder where the Recognyze results were copied
Examples:
REUTERS128
./run_eval.sh gitpath reuters128 advanced nel_archive/20170204/advanced
KORE50
./run_eval.sh gitpath kore50 advanced nel_archive/20170204/advanced
This creates two files as output:
- [dataset]-[profilename]-results (classic results)
- [dataset]-[profilename]-resultsbytype (results by type)
The basic error analysis returned by TAC is printed on the console.
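For instance, with dataset reuters128 and profile advanced, the two output file names are built by joining dataset and profile as follows (a sketch of the naming scheme, not the eval itself):

```shell
dataset=reuters128
profile=advanced
# The two output files produced by a run
echo "${dataset}-${profile}-results"
echo "${dataset}-${profile}-resultsbytype"
```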
All the gold standards are in the gold folder. Changes to neleval are in the neleval folder.
This fixes the issue that, in TAC-KBP evals, you are penalized twice for partial matches with correct links.
FORMAT
./run_fixer.sh [git_path] [dataset] [profile] [nel_archive_folder]
- [git_path] - the path to the git folders
- [dataset] - the dataset name (e.g., reuters128, kore50, rbb, etc.)
- [profile] - the short profile name (e.g., wikipedia, dbpediasolo, advanced, etc.)
- [nel_archive_folder] - the path to the nel_archive folder where the Recognyze results were copied
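Mirroring the run_eval.sh examples above, a run_fixer.sh call takes its arguments in the same order; the sketch below only echoes the command so the argument order is visible (the paths are illustrative):

```shell
GIT_PATH=gitpath
DATASET=reuters128
PROFILE=advanced
NEL_ARCHIVE=nel_archive/20170204/advanced
# Argument order matches the FORMAT line above
echo "./run_fixer.sh $GIT_PATH $DATASET $PROFILE $NEL_ARCHIVE"
```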
This creates diffs between two consecutive runs. The format is the following:
FORMAT
./run_eval_diff.sh [git_path] [dataset] [path_to_folder1] [path_to_folder2] [path_to_output_folder]
Example:
./run_eval_diff.sh gitpath reuters128 nel_archive/20170315/big nel_archive/20170317/big nel_archive/20170317/big-diff
- [git_path] - the path to the git folder
- [dataset] - the name of the dataset
- [path_to_folder1] - the path to run 1
- [path_to_folder2] - the path to run 2
- [path_to_output_folder] - the output folder
In order to run Error Analysis for Human Annotators we have two options:
- TAC-KBP workflow
- TAC-KBP workflow + Partial matches workflow
This is still a WIP.
It is possible (and recommended) to combine different workflows or execute them in a certain order.
In any pipeline, the first script should always be: TAC-KBP eval.
This can be followed either by Partial-Matches/Duplicates or by Error Analysis Workflows (Error Analysis for Human Annotators, Automated Error Analysis).
This project contains the following folders:
- [date] folders - in format YYYYMMDD - which contain profile subfolders for the respective day
- gold - gold standards
- neleval - scripts that were changed from neleval in order to achieve certain goals
- observations - whenever needed observations or analysis files can be added separately (e.g., if annotated by humans, if fixes were done)
- scripts - automated scripts