Feature Request: Import the exported annotation data and keep all the original info #482

Closed
mingwei-liu opened this issue Dec 3, 2019 · 2 comments · Fixed by #1310
Labels
feature request feature request for doccano

Comments

@mingwei-liu

Feature description

We currently have a project that several people label at the same time. We exported the labeled data from the website, created a new project, and tried to import the exported data so that each sentence would keep every annotator's original annotation. We found that this is not possible.

Currently, it seems that export can include the labeling results of all annotators at once. On import, however, we either get a new project with no annotations or, after preprocessing (giving the “labels” field a unique value), a project where each sentence has only a single label. I think import and export should work together seamlessly.
This would also let users merge and batch-delete labeled data: they could export the data, process it with their own code, and import it back into the annotation system.
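
To illustrate the export, process, re-import workflow described above, here is a minimal sketch of a batch deletion done offline. The JSONL layout and the field names (`text`, `annotations`, `user`, `label`) are assumptions made for the example, not doccano's actual export schema.

```python
import json

# Hypothetical export: one JSON object per line. The field names are
# assumptions for illustration, not doccano's real schema.
exported = "\n".join([
    json.dumps({"text": "Great movie", "annotations": [
        {"user": "alice", "label": "positive"},
        {"user": "bob", "label": "negative"},
    ]}),
    json.dumps({"text": "Terrible plot", "annotations": [
        {"user": "alice", "label": "negative"},
    ]}),
])


def drop_annotator(jsonl, user):
    """Remove every annotation made by `user`, keeping all other fields."""
    out = []
    for line in jsonl.splitlines():
        record = json.loads(line)
        record["annotations"] = [
            a for a in record["annotations"] if a["user"] != user
        ]
        out.append(json.dumps(record))
    return "\n".join(out)


# The cleaned text is what one would want to import back, with each
# remaining annotation still attributed to its original annotator.
cleaned = drop_annotator(exported, "bob")
```

The point of the request is that `cleaned` should be importable as-is, without collapsing the per-annotator entries into one label per sentence.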

@Hironsan Hironsan added the feature request feature request for doccano label Dec 3, 2019
@AmirAktify

I've been thinking about this feature myself. I might add a pull request at some point with the changes needed...

@cgill95
Contributor

cgill95 commented Aug 3, 2020

I started working on this feature and implemented a solution for CSV import and export. So far it has been tested with document classification only. It is backwards compatible: plain "text,label" files can still be imported just like before.
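
For readers wondering what backwards compatibility could look like here, this is a rough sketch and not the code from the branch. It assumes the extended CSV adds optional `user` and `meta` columns (hypothetical names) next to the original `text` and `label` columns.

```python
import csv
import io


def read_rows(csv_text):
    """Read both plain "text,label" files and files with extra columns.

    The `user` and `meta` column names are assumptions for illustration.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        rows.append({
            "text": row["text"],
            "label": row["label"],
            # Missing or empty optional columns fall back to defaults.
            "user": row.get("user") or None,
            "meta": row.get("meta") or "{}",
        })
    return rows


old_style = read_rows("text,label\nhello,pos\n")
new_style = read_rows("text,label,user,meta\nhello,pos,alice,{}\n")
```

Old files simply yield rows with no annotator and empty metadata, so existing "text,label" imports keep working unchanged.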

One problem I ran into, though, is what I like to call "meta stacking". It is probably caused by the way the metadata is exported: each time the same document is reimported, its metadata becomes longer and longer, as you can see in the example.
[Image: meta_stacking]
The first two lines were exported and reimported, and the metadata did not handle this very well.
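
One plausible cause of this kind of stacking, which the thread does not confirm, is that the importer wraps the already-serialized metadata cell in a fresh dict instead of parsing it. The sketch below reproduces that behaviour with hypothetical `export_row`, `naive_import`, and `fixed_import` helpers; none of these names come from doccano.

```python
import json


def export_row(doc):
    # The meta dict is serialized to a JSON string for the CSV cell.
    return {"text": doc["text"], "meta": json.dumps(doc["meta"])}


def naive_import(row):
    # Illustrated bug: every non-text column is treated as opaque metadata,
    # so the serialized meta string gets wrapped in a new dict.
    extra = {k: v for k, v in row.items() if k != "text"}
    return {"text": row["text"], "meta": extra}


def fixed_import(row):
    # Parse the meta cell back into a dict instead of wrapping it.
    return {"text": row["text"], "meta": json.loads(row["meta"])}


doc = {"text": "hello", "meta": {"source": "a.csv"}}
naive = doc
fixed = doc
for _ in range(3):
    # Three export/import round trips with each importer.
    naive = naive_import(export_row(naive))
    fixed = fixed_import(export_row(fixed))
```

With the naive importer the metadata gains one nesting level (and more escaping) per round trip, while the parsing importer returns the original document unchanged.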

The entire feature can be found here: https://github.com/cgill95/doccano/tree/feature/unify_import_export. It relies on the implementation of #889, which was needed to look up the correct label by name instead of searching the entire database for the label's ID.

I'm looking for feedback or suggestions on how this could be improved.
