REQ: auto-merge if data is identical or if only specific fields are mismatched #112

Open
huachuman opened this issue Nov 11, 2024 · 8 comments

Comments

@huachuman

For example, auto-merge items that have identical metadata. Or merge all items whose only mismatched fields are the date, URL, accessed date, language, rights, etc. (the list of fields to be set in settings).

@huachuman changed the title from "REQ: auto-merge only if specific fields are mismatched" to "REQ: auto-merge if data is identical or if only specific fields are mismatched" on Nov 11, 2024
@ChenglongMa
Owner

Hi @huachuman,

Thanks for your suggestion!

To better understand your idea, could you tell me how the duplicates should be processed in the following scenario?

  1. In the settings window, I set the mismatched fields to URL and language;
  2. Items A and B have different values in these fields: authors, URL, and issue;
  3. I choose to keep Item A in the dialog.

Then, what are the expected results?

Thank you so much!

Best,
Chenglong

@huachuman
Author

huachuman commented Nov 12, 2024

Edit: this is a bit confusing. To be more specific, the use case is as follows:

In a library, there are several duplicate items that are almost identical. I want to decide which fields I don't care about. For items that are identical apart from the fields I don't care about, the choice is arbitrary: whether you keep item A or item B makes no difference. Does that make sense?

I'm not sure about the exact implementation in the dialog box, as my original idea was a bulk/auto-merge with no dialog. The point is to avoid having to do this manually, so I'm not sure the dialog would come into play.
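The use case above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Zoplicate's implementation: the `ItemFields` type, both function names, and the field names are hypothetical, and items are assumed to be already flattened to plain string values.

```typescript
// Hypothetical shape: one library item flattened to field name -> value.
// This is an illustrative type, not Zotero's or Zoplicate's data model.
type ItemFields = Record<string, string>;

// True when every item in the group matches the first one on all fields
// that are not listed in ignoredFields. Missing fields count as "".
function identicalExceptIgnored(group: ItemFields[], ignoredFields: string[]): boolean {
  const first = group[0];
  if (first === undefined) return true; // an empty group has no mismatches
  const ignored = new Set(ignoredFields);
  return group.slice(1).every((item) => {
    const names = Object.keys(first).concat(Object.keys(item));
    return names.every(
      (name) => ignored.has(name) || (first[name] ?? "") === (item[name] ?? ""),
    );
  });
}

// The remaining differences do not matter, so any member can serve as the
// master item; this sketch simply takes the first one. Returns null when
// the group should be left for a manual decision.
function pickMasterForAutoMerge(group: ItemFields[], ignoredFields: string[]): ItemFields | null {
  if (group.length < 2 || !identicalExceptIgnored(group, ignoredFields)) return null;
  return group[0] ?? null;
}

// Example: the two items differ only in date and URL, which are ignored.
const exampleA: ItemFields = { title: "Deep Learning", date: "2015", url: "https://example.org/a" };
const exampleB: ItemFields = { title: "Deep Learning", date: "2015-05-28", url: "https://example.org/b" };
const exampleMaster = pickMasterForAutoMerge([exampleA, exampleB], ["date", "url"]);
console.log(exampleMaster === exampleA); // true: the first item is kept
```

Taking the first item as the master reflects the point that the choice is arbitrary once the only differences are in ignored fields. A real plugin might still prefer, say, the oldest item, but that would be a separate design decision.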

@ChenglongMa
Owner

ChenglongMa commented Nov 12, 2024

Ah, I got you. If "important" differences exist in key fields (like author), you would prefer to handle the duplicates manually. However, if the only differences are in minor fields (like URL), the duplicates can be merged automatically without much concern.

Did I understand correctly?

Thanks!
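As a rough sketch of the rule summarized above, the following applies it to the earlier scenario (minor fields: URL and language; Items A and B differ in authors, URL, and issue). Everything here is an assumption made for illustration: the `FieldMap` and `Triage` types, the function names, the field names, and the sample values are hypothetical and do not come from Zoplicate.

```typescript
// Hypothetical flat view of an item: field name -> value as a string.
type FieldMap = Record<string, string>;

type Triage = { decision: "auto-merge" | "manual"; blocking: string[] };

// Names of the fields whose values differ between a and b, sorted by name.
// A field missing on one side is compared as the empty string.
function differingFields(a: FieldMap, b: FieldMap): string[] {
  const names = new Set(Object.keys(a).concat(Object.keys(b)));
  return Array.from(names)
    .filter((name) => (a[name] ?? "") !== (b[name] ?? ""))
    .sort();
}

// Auto-merge only when every differing field is in the "minor" list;
// otherwise report the fields that block the automatic merge.
function triagePair(a: FieldMap, b: FieldMap, minorFields: string[]): Triage {
  const minor = new Set(minorFields);
  const blocking = differingFields(a, b).filter((name) => !minor.has(name));
  return { decision: blocking.length === 0 ? "auto-merge" : "manual", blocking };
}

// The scenario from the thread, with made-up values.
const scenarioA: FieldMap = { authors: "Smith, J.", url: "https://example.org/1", issue: "3", language: "en" };
const scenarioB: FieldMap = { authors: "Smith, John", url: "https://example.org/2", issue: "4", language: "en" };
const scenarioResult = triagePair(scenarioA, scenarioB, ["url", "language"]);
// decision is "manual"; blocking is ["authors", "issue"]
console.log(scenarioResult.decision, scenarioResult.blocking);
```

Under this reading, the scenario falls through to manual handling because authors and issue differ and neither is in the minor list, so the choice of Item A in the dialog still applies as usual.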

@huachuman
Author

I think so, yes. My apologies, I have edited my post many times to try to be clearer and to work this out for myself.

@ChenglongMa
Owner

No worries! Thanks for your detailed explanation. It's my first time seeing a "live" comment 😂.

I will try to implement this idea in the next version.

Thanks!

@huachuman
Author

Yes, I saw your edits live too! Brilliant, GitHub is.

That is absolutely amazing to hear. I am not an advanced developer but if you need help along the way I will do what I can.

@ChenglongMa
Owner

Sure, @huachuman! Thank you so much for your help. I'll let you know if I need any further advice :) Really appreciate it.

@fflamerie

Hi,

First of all, thank you for developing Zoplicate!

I'd like to submit the following scenario, which I think meets the same need.

As part of a systematic review, I need to deduplicate very large sets of records (several thousand). I'd like to divide them into 3 subsets, which could be processed in any order. For example, I might want to have a quick look at subset 3 first, then analyse and merge records in subset 2, then return to subset 3 to merge records, etc.

The 3 subsets would be as follows:

  1. Articles with the same DOI and the same title
  2. Articles with the same DOI but different titles
  3. Articles with different DOIs but the same title

I'll probably want to merge all the articles in subset 1 after a quick visual check. On the other hand, I will pay close attention to the articles in subsets 2 and 3, and a number of them should probably be marked as non-duplicates.

Do you think a workflow like this would be possible with Zoplicate?
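The three subsets described above could be expressed as a simple classification of candidate pairs. The sketch below is illustrative only and makes several assumptions that are not in the thread: the `Citation` type and function names are hypothetical, DOIs and titles are compared after trimming, lowercasing, and collapsing whitespace, and an empty DOI or title never counts as a match.

```typescript
// Hypothetical minimal record: only the two fields the workflow compares.
interface Citation {
  doi: string;
  title: string;
}

// 1, 2, 3 correspond to the numbered subsets in the list above;
// null means the pair shares neither DOI nor title.
type Subset = 1 | 2 | 3 | null;

// Assumed normalization: trim, lowercase, collapse runs of whitespace.
function normalizeValue(value: string): string {
  return value.trim().toLowerCase().replace(/\s+/g, " ");
}

// Two values match only when both are non-empty and equal after normalization.
function sameValue(x: string, y: string): boolean {
  const nx = normalizeValue(x);
  return nx !== "" && nx === normalizeValue(y);
}

function classifyPair(a: Citation, b: Citation): Subset {
  const sameDoi = sameValue(a.doi, b.doi);
  const sameTitle = sameValue(a.title, b.title);
  if (sameDoi && sameTitle) return 1; // same DOI, same title
  if (sameDoi) return 2; // same DOI, different titles
  if (sameTitle) return 3; // different DOIs, same title
  return null;
}

// Example with made-up records: DOI matches (case-insensitively), titles differ.
const recordX: Citation = { doi: "10.1000/ABC", title: "A study of things" };
const recordY: Citation = { doi: "10.1000/abc", title: "A study of things: erratum" };
console.log(classifyPair(recordX, recordY)); // 2
```

Note that under these assumptions a pair in which one record has no DOI lands in subset 3 when the titles match, which may or may not be the intended behaviour for a systematic review.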


3 participants