Annotations Lost After Network failure for around 2 mins. #6675

Open
abhijith96 opened this issue Nov 18, 2024 · 3 comments

Comments

@abhijith96

Hi, I have around 10,000 images in an Azure Blob Storage container. There are intermittent network connection issues, and sometimes the call to Azure Blob Storage fails. When that happens, some of the annotations are lost.

To Reproduce
Steps to reproduce the behaviour:

1. Set up data import from Azure Blob Storage. Use a container prefix path.
2. Annotate a few files.
3. Simulate a network error (turn off the internet connection so that the GET request to Azure Blob Storage fails).
4. Up to 50% of the annotations will be missing after refreshing.

My assumption was that all annotations would be persisted in the SQLite database in my local storage. Losing half of the data I annotated is causing great pain. During the first network connection error, the task I was annotating was number 2003. After the refresh, only tasks 1 to 500 had annotations; the rest of the tasks had none.

OS: macOS
Label Studio Version 1.14.0.post0

@heidi-humansignal
Collaborator

Hello,

Does this happen when you have target storage set up, or is it just source storage that causes the issue?

I would also suggest:

Switch to a PostgreSQL Database:
Label Studio uses SQLite by default, which might not be ideal for large projects with many tasks. SQLite can have limitations with concurrent access and large datasets, potentially leading to data integrity issues during network interruptions. Switching to a PostgreSQL database can provide better performance and reliability.
You can find instructions on how to set up Label Studio with PostgreSQL in our documentation:

https://docs.humansignal.com/guide/storedata.html#PostgreSQL-database

Thank you,
Abu
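
Before changing the database backend, it may be worth taking a consistent copy of the existing SQLite file so that the annotations still present are not put at further risk. The following is a minimal sketch using Python's standard `sqlite3` backup API. The function name `snapshot_sqlite` and both file paths are hypothetical and are not part of Label Studio; substitute the location of your own database file.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(src_path, dest_path):
    """Copy a SQLite database to dest_path using SQLite's online backup API.

    The source is opened read-only, so a wrong path raises an error
    instead of silently creating an empty database.
    """
    # Build a file: URI so the source can be opened in read-only mode.
    src_uri = Path(src_path).resolve().as_uri() + "?mode=ro"
    src = sqlite3.connect(src_uri, uri=True)
    dest = sqlite3.connect(str(dest_path))
    try:
        # backup() copies every page of the source into the destination.
        src.backup(dest)
    finally:
        dest.close()
        src.close()


if __name__ == "__main__":
    # Hypothetical paths: point these at your own database file.
    snapshot_sqlite("label_studio.sqlite3", "label_studio.backup.sqlite3")
```

Unlike a plain file copy, the backup API reads through SQLite itself, so the copy should stay consistent even if another process has the database open.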

Comment by Abubakar Saad

@abhijith96
Author

Hi,
Thanks for getting back to me.
I have an Azure Blob Storage container as source storage.
Here is a snippet from the source storage settings:
"Status Completed
Tasks 10734 (0 new)
Last Sync November 15, 2024 ∙ 22:37:38"

I also have an Azure Blob Storage container in the same storage account as target storage.
"Status Completed
Annotations 0 (0 total)
Last Sync November 15, 2024 ∙ 22:33:17"

I have not synced it yet after starting annotations.
I also checked the SQLite tables for annotation information. It seems that some of the missing annotations have been cleared from the SQLite database, and there are duplicate entries for some annotated files. All of this happened after the network disconnection.
I will try PostgreSQL if it helps, but I am not sure whether switching to PostgreSQL will solve the issue.
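
For anyone wanting to repeat the check on the SQLite tables described above, here is a rough sketch of how the annotated-task count and repeated entries could be pulled out with Python's standard `sqlite3` module. It assumes annotations live in a table with a `task_id` column. The default table name `task_completion` is an assumption about the Label Studio schema and should be verified against your own database; the helper name `annotation_summary` is hypothetical.

```python
import sqlite3


def annotation_summary(conn, table="task_completion"):
    """Return (number_of_annotated_tasks, {task_id: row_count}) for the table.

    The dict holds only tasks that have more than one annotation row.
    The table name is an assumption; check it against your own schema.
    """
    # Table names cannot be bound as SQL parameters, so validate before use.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    rows = conn.execute(
        f"SELECT task_id, COUNT(*) FROM {table} "
        "GROUP BY task_id ORDER BY task_id"
    ).fetchall()
    repeated = {task_id: n for task_id, n in rows if n > 1}
    return len(rows), repeated


if __name__ == "__main__":
    # Demo against an in-memory database with a made-up, minimal schema.
    demo = sqlite3.connect(":memory:")
    demo.execute(
        "CREATE TABLE task_completion (id INTEGER PRIMARY KEY, task_id INTEGER)"
    )
    demo.executemany(
        "INSERT INTO task_completion (task_id) VALUES (?)",
        [(1,), (2,), (2,), (5,)],
    )
    print(annotation_summary(demo))
```

More than one row per task is not necessarily a bug, since a task can legitimately carry several annotations. The output is only a starting point for comparing against what the UI shows.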

@heidi-humansignal
Collaborator

Hello,

The problem may be related to using SQLite as the backend database. SQLite is suitable for small projects and testing purposes but can encounter issues with larger datasets and network instability. Since you're working with over 10,000 tasks, switching to a more robust database like PostgreSQL is highly recommended.

Let us know how that goes.

Thank you,
Abu

Comment by Abubakar Saad

2 participants