CSL metadata incomplete when there's a consortium in the author list #158
Comments
curl --silent --location \
--header "Accept: application/vnd.citationstyles.csl+json" \
https://doi.org/10.1038/ng.3834 \
| python -m json.tool
You'll notice the following:
"author": [
{
"given": "Colby",
"family": "Chiang",
"sequence": "first",
"affiliation": []
},
{
"name": "GTEx Consortium",
"sequence": "first",
"affiliation": []
},
The mistake is that
Nice workaround... it ends up using a Zotero translator that is likely designed specifically for the Nature website. |
I'm not sure I understand. Just to clarify, in practice all the references containing a Consortium didn't format well in the final output of the manuscript (when using I didn't know if this issue fit better here or in the rootstock repo. This was for this manuscript. |
https://jmonlong.github.io/manu-vgsv/ is looking great!
This is correct! The code that actually downloads citation metadata lives here.
Yes, the issue (as I understand it) is that all of these DOIs are registered with Crossref. We therefore retrieve the metadata from Crossref using "DOI Content Negotiation". We request the metadata in CSL JSON format, but Crossref returns it in a pseudo CSL JSON format with lots of fields that don't meet the CSL JSON specification (past examples at CrossRef/rest-api-doc#222 (comment)). In this case, Crossref should put the consortium name in a field called "literal" rather than "name". As far as solutions go:
Here is some prototype code regarding 3 showing that it would work for this case (not perfectly, but enough):
>>> import manubot.cite.zotero
>>> zotero_data = manubot.cite.zotero.search_query('doi:10.1038/ng.3834')
>>> csl_json_data = manubot.cite.zotero.export_as_csl(zotero_data)
>>> csl_json_data[0]['author'][:2]
[{'family': 'GTEx Consortium', 'given': ''}, {'family': 'Chiang', 'given': 'Colby'}] |
@jmonlong I opened a pull request at #206 that should fix this issue. It switches the service we use to convert DOI metadata to CSL JSON. The new service does better at properly handling the consortium author name. See the new raw data in https://data.crosscite.org/application/vnd.citationstyles.csl+json/10.1038/ng.3834. The "GTEx Consortium" will now show up properly in the reference list, although it won't be in the right place. However, this is due to improperly submitted metadata.
I found those DOIs and commented on them at crosscite/content-negotiation#92 (comment). The consortium name in https://doi.org/10.1101/664623 is properly parsed now, but https://doi.org/10.1186/s13742-015-0103-4 still has trouble using DOI content negotiation. |
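For readers following along, the two retrieval routes mentioned in this thread (DOI content negotiation against doi.org with an `Accept` header, and the data.crosscite.org URL form linked above) can be sketched in Python. This is only an illustration, not manubot's actual code: the helper names `doi_org_request` and `crosscite_url` are hypothetical, and nothing is sent over the network until the request is opened.

```python
import urllib.request

# Media type used in the curl command and in the crosscite URL above.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def doi_org_request(doi):
    """Build (but do not send) a content-negotiation request to doi.org.

    Hypothetical helper: mirrors the curl command from this thread.
    """
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": CSL_JSON},
    )


def crosscite_url(doi):
    """Build the data.crosscite.org URL form shown in this thread.

    Hypothetical helper: follows the pattern of the link above.
    """
    return f"https://data.crosscite.org/{CSL_JSON}/{doi}"


request = doi_org_request("10.1038/ng.3834")
url = crosscite_url("10.1038/ng.3834")
```

Passing `request` or `url` to `urllib.request.urlopen` would fetch the metadata. Comparing the two responses for a given DOI is one way to see whether the consortium author arrives under "name" or "literal".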
This unfortunately still seems to be an issue for me with certain DOIs with consortium authors. The resulting references don't have any authors since the consortium is the only author. @dhimmel could we reopen this or is there a configuration option I'm missing?
Also shows up for this DOI:
|
@twrightsman what author lists are you expecting to see for these two examples? The single consortium author matches what is shown at the publisher's site: For the second article, PubMed has a different author list that you can obtain with |
@agitter When manubot processes the JSON metadata the "name" key under authors gets dropped because it is invalid in the CSL schema, if I am understanding the initial conversation in the issue correctly. See this section of the
Even though the Brachypodium genome paper raw JSON has the author:
This ends up showing as a blank author list in my rendered manubot manuscript. |
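To make the failure mode concrete, here is a minimal sketch of the kind of fix discussed earlier in the thread: moving an author's non-standard "name" value into "literal" before any schema-based cleanup drops it. It is an assumption about how such a fix could look, not manubot's implementation, and the function name `normalize_consortium_authors` is hypothetical.

```python
import copy


def normalize_consortium_authors(csl_item):
    """Return a copy of csl_item in which author entries using the
    non-standard "name" key carry that value under "literal" instead.

    Hypothetical helper for illustration only.
    """
    item = copy.deepcopy(csl_item)  # leave the caller's data untouched
    for author in item.get("author", []):
        if "name" in author:
            name = author.pop("name")
            # Keep an existing "literal" if the record already has one.
            author.setdefault("literal", name)
    return item


# Author list as returned for 10.1038/ng.3834 at the top of this issue.
example = {
    "author": [
        {"given": "Colby", "family": "Chiang", "sequence": "first", "affiliation": []},
        {"name": "GTEx Consortium", "sequence": "first", "affiliation": []},
    ]
}
fixed = normalize_consortium_authors(example)
```

With this applied first, the consortium entry would survive as a "literal" author instead of becoming an empty author object.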
Thanks. I understand the issue now. |
I believe this is related to the changes in #319. I'll note that for
@dhimmel should we reconsider this possible solution you listed earlier:
|
I'm following up on this issue with a workaround that has been helpful for some of my citations involving consortium authors that have broken metadata. Using Zotero to fetch the metadata from the PubMed URL works in many of my examples. For the paper above:
This works even though the PubMed citation does not (the literal author is still missing):
|
We noticed that some references didn't format well and that it always happened when there was a consortium in the author list. Also, the consortium name was missing.
There is always the solution of fixing them manually. After doing that I realized that using URLs (@url:https://doi.org/DOI) works too, like for the bioRxiv problem (#16).
Example:
No consortium and a {} as second author that affects the final output.
Looks good.