Create a script to extract transgenes (known and unknown) from papers #1

Open
valearna opened this issue Oct 24, 2024 · 0 comments
@valearna
Collaborator

valearna commented Oct 24, 2024

We need a new pipeline to extract transgenes from all WormBase (WB) papers. The pipeline needs to run periodically so that it also picks up transgenes from newly added papers.

Detailed info
== For known transgenes ==

  • read the list of transgenes already known to WB from trp_publicname
  • match those names against the paper texts
  • store the WBPaperIDs of matching papers in trp_paper
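A minimal sketch of the matching step for known transgenes, assuming the public names have already been read from trp_publicname into a list and each paper's text is available as a string keyed by its WBPaper ID (the input shapes and function names are hypothetical). It compiles all names into one whole-word regex and returns (paper ID, transgene) pairs that could then be written to trp_paper.

```python
import re
from typing import Dict, Iterable, Set, Tuple


def build_known_matcher(public_names: Iterable[str]) -> "re.Pattern[str]":
    """Compile known public names into a single whole-word alternation."""
    names = sorted(set(public_names))  # dedupe; sorted for a deterministic pattern
    if not names:
        return re.compile(r"(?!)")  # never matches, avoids empty-string hits
    alternation = "|".join(re.escape(n) for n in names)
    # \b on both sides: "ctIs40" must not match inside "ctIs400".
    return re.compile(r"\b(?:" + alternation + r")\b")


def match_known(papers: Dict[str, str], public_names: Iterable[str]) -> Set[Tuple[str, str]]:
    """Return (paper_id, transgene_name) pairs for every known name found."""
    matcher = build_known_matcher(public_names)
    hits: Set[Tuple[str, str]] = set()
    for paper_id, text in papers.items():
        for m in matcher.finditer(text):
            hits.add((paper_id, m.group(0)))
    return hits
```

Matching is case-sensitive on purpose here, since transgene names mix lower- and uppercase letters (e.g. the "Is" in ctIs40).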

== For unknown transgenes ==

  • match new transgenes using this regular expression: \b([a-z]{1,3}(Is|In|Si|Ex)[0-9]+([a-z]?))\b
  • filter out transgenes already listed in trp_publicname
  • create entries for the remaining matches in trp_name (which holds the WBTransgene IDs; each new object gets the next ID, i.e. the current highest + 1) and in trp_publicname
  • for each added transgene, create a related entry in trp_curator with WBPerson4793 (Arun)
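The steps for unknown transgenes could be sketched as below. This assumes WBTransgene IDs have the form WBTransgene plus eight zero-padded digits and that "+1" means each new object takes the next number after the current highest ID; the function names and the row dictionaries are hypothetical and do not reflect the real table layout. The regex is the one from the issue, with the optional trailing letter written as [a-z]? and the Is|In|Si|Ex group made non-capturing.

```python
import re
from typing import Dict, Iterable, List, Set

# Regex from the issue: optional trailing letter as "[a-z]?", Perl-style
# trailing "/" dropped, inner group non-capturing so findall() returns names.
NEW_TRANSGENE_RE = re.compile(r"\b([a-z]{1,3}(?:Is|In|Si|Ex)[0-9]+[a-z]?)\b")
CURATOR = "WBPerson4793"   # Arun, per the issue
ID_PREFIX = "WBTransgene"  # assumed ID format: prefix + 8 digits


def find_unknown(papers: Dict[str, str], known: Set[str]) -> Set[str]:
    """Return candidate transgene names that are not already in `known`."""
    found: Set[str] = set()
    for text in papers.values():
        for name in NEW_TRANSGENE_RE.findall(text):
            if name not in known:
                found.add(name)
    return found


def next_transgene_ids(existing_ids: Iterable[str], count: int) -> List[str]:
    """Allocate `count` new IDs, continuing from the current maximum."""
    current = max((int(i[len(ID_PREFIX):]) for i in existing_ids), default=0)
    return [f"{ID_PREFIX}{current + k:08d}" for k in range(1, count + 1)]


def plan_new_entries(new_names: Set[str], existing_ids: Iterable[str]) -> List[Dict[str, str]]:
    """Pair each new name with a fresh ID and the curator (hypothetical row shape)."""
    names = sorted(new_names)  # deterministic ID assignment across runs
    ids = next_transgene_ids(existing_ids, len(names))
    return [
        {"trp_name": tg_id, "trp_publicname": name, "trp_curator": CURATOR}
        for tg_id, name in zip(ids, names)
    ]
```

Because the leading \b requires a word boundary, a token such as abcdIs1 (four leading letters) is rejected rather than partially matched as bcdIs1.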

Note: every edit to the DB must also update the corresponding history tables.
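One way to honor the history-table requirement, sketched with sqlite3 so it runs standalone: every insert goes to both the main table and a companion history table inside a single transaction. The _hst suffix, the (joinkey, value, timestamp) column layout, and the helper names are assumptions for illustration; the real WB tables may be laid out differently and likely live in another database.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical set of tables the pipeline writes to; each is assumed to have
# a companion history table named "<table>_hst" with the same columns.
ALLOWED_TABLES = {"trp_name", "trp_publicname", "trp_curator", "trp_paper"}


def create_tables(conn: sqlite3.Connection) -> None:
    """Create toy versions of the main and history tables (assumed layout)."""
    for table in sorted(ALLOWED_TABLES):
        for name in (table, f"{table}_hst"):
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {name} "
                "(joinkey TEXT, value TEXT, timestamp TEXT)"
            )


def insert_with_history(conn: sqlite3.Connection, table: str, joinkey: str, value: str) -> None:
    """Insert a row into `table` and mirror it into `<table>_hst` atomically."""
    # Table names cannot be bound as SQL parameters, so whitelist them.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table}")
    row = (joinkey, value, datetime.now(timezone.utc).isoformat())
    with conn:  # commits if both inserts succeed, rolls back otherwise
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)
        conn.execute(f"INSERT INTO {table}_hst VALUES (?, ?, ?)", row)
```

Routing every write through a single helper like this keeps the history tables from falling behind the main tables when new entries are created.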

valearna self-assigned this Oct 24, 2024