
Commit b746270

Update manubot for more flexible citation processing
merges manubot/rootstock#342

Includes major updates to how citations are processed by the pandoc-manubot-cite filter.
See the following commit messages for more information:

- manubot/manubot@7055bcc
- manubot/manubot@47b03e0

Removes requirement to prefix certain citekeys with `raw:` or `tag:`.
Removes support for deprecated `content/citation-tags.tsv`.
Switches from tag to alias terminology for citation aliases.
closes manubot/manubot#120
Removes pandas and jsonref dependencies, which are no longer needed.
1 parent 8b9b5ce commit b746270
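
The documentation changes in this commit rename two citekey prefixes (`pmcid:` to `pmc:`, `pmid:` to `pubmed:`). They also drop the `url:`, `tag:`, and `raw:` prefixes from citekeys. The following minimal Python sketch performs that rewrite on existing manuscript text. The `migrate_citekeys` function and its prefix table are hypothetical and are not part of Manubot.

The sketch assumes citekeys are written as `@prefix:...`. It skips any `@` that directly follows a letter, digit, underscore, or `.`, as in an e-mail address. It does not rewrite alias targets after `]:` or the `"id"` fields in manual reference files, although the examples in this commit change those as well.

```python
import re

# Hypothetical mapping from citekey prefixes used in the old USAGE.md examples
# to the forms used after this commit ("" means the prefix is simply dropped).
PREFIX_RENAMES = {
    "pmcid:": "pmc:",
    "pmid:": "pubmed:",
    "url:": "",
    "tag:": "",
    "raw:": "",
}

# "@" followed by an old prefix, unless the "@" directly follows a word
# character or "." (which would suggest an e-mail address, not a citekey).
OLD_PREFIX = re.compile(
    r"(?<![\w.])@(" + "|".join(re.escape(p) for p in PREFIX_RENAMES) + r")"
)


def migrate_citekeys(markdown: str) -> str:
    """Rewrite old-style citekey prefixes in a Markdown string."""
    return OLD_PREFIX.sub(lambda m: "@" + PREFIX_RENAMES[m.group(1)], markdown)


if __name__ == "__main__":
    print(migrate_citekeys("[@pmid:26158728; @tag:deep-review; @raw:old-manuscript]"))
    # prints: [@pubmed:26158728; @deep-review; @old-manuscript]
```

Review the result before committing it. This commit does not say whether the old `pmcid:`, `pmid:`, or `url:` prefixes stop working, so some of the rewrites may be optional.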

3 files changed (+50, -52 lines)

USAGE.md

Lines changed: 42 additions & 42 deletions
@@ -76,51 +76,54 @@ We recommend always specifying the width of SVG images (even if just `width="100

 ### Citations

-Manubot supports Pandoc [citations](https://pandoc.org/MANUAL.html#citations).
+Manubot supports [Pandoc citations](https://pandoc.org/MANUAL.html#citations), but with added support for citing persistent identifiers directly.
 Citations are processed in 3 stages:

 1. Pandoc parses the input Markdown to locate citation keys.
-2. The [`pandoc-manubot-cite`](https://github.com/manubot/manubot#pandoc-filter) filter automatically retreives the bibliographic metadata for citation keys.
-3. The [`pandoc-citeproc`](https://github.com/jgm/pandoc-citeproc/blob/master/man/pandoc-citeproc.1.md) filter renders in-text citations and generates styled references.
+2. The [`pandoc-manubot-cite` filter](https://github.com/manubot/manubot#pandoc-filter) automatically retrieves the bibliographic metadata for citation keys.
+3. The [`pandoc-citeproc` filter](https://github.com/jgm/pandoc-citeproc/blob/master/man/pandoc-citeproc.1.md) renders in-text citations and generates styled references.

-When using Manubot, citation keys should be formatted like `@source:identifier`,
-where `source` is one of the options described below.
+When citing persistent identifiers, citation keys should be formatted like `@prefix:accession`,
+where `prefix` is one of the options described below.
 When choosing which source to use for a citation, we recommend the following order:

 1. DOI (Digital Object Identifier), cite like `@doi:10.15363/thinklab.4`.
    Shortened versions of DOIs can be created at [shortdoi.org](http://shortdoi.org/).
    shortDOIs begin with `10/` rather than `10.` and can also be cited.
    For example, Manubot will expand `@doi:10/993` to the DOI above.
    We suggest using shortDOIs to cite DOIs containing forbidden characters, such as `(` or `)`.
-2. PubMed Central ID, cite like `@pmcid:PMC4497619`.
-3. PubMed ID, cite like `@pmid:26158728`.
+2. PubMed Central ID, cite like `@pmc:PMC4497619`.
+3. PubMed ID, cite like `@pubmed:26158728`.
 4. _arXiv_ ID, cite like `@arxiv:1508.06576v2`.
 5. ISBN (International Standard Book Number), cite like `@isbn:9781339919881`.
-6. URL / webpage, cite like `@url:https://nyti.ms/1QUgAt1`.
+6. URL / webpage, cite like `@https://nyti.ms/1QUgAt1`.
    URL citations can be helpful if the above methods return incorrect metadata.
-   For example, `@doi:10.1038/ng.3834` [incorrectly handles](https://github.com/manubot/manubot/issues/158) the consortium name resulting in a blank author, while `@url:https://doi.org/10.1038/ng.3834` succeeds.
-   Similarly, `@url:https://doi.org/10.1101/142760` is a [workaround](https://github.com/manubot/manubot/issues/16) to set the journal name of bioRxiv preprints to _bioRxiv_.
+   For example, `@doi:10.1038/ng.3834` [incorrectly handles](https://github.com/manubot/manubot/issues/158) the consortium name resulting in a blank author, while `@https://doi.org/10.1038/ng.3834` succeeds.
+   Similarly, `@https://doi.org/10.1101/142760` is a [workaround](https://github.com/manubot/manubot/issues/16) to set the journal name of bioRxiv preprints to _bioRxiv_.
 7. Wikidata Items, cite like `@wikidata:Q50051684`.
-   Note that anyone can edit or add records on [Wikidata](https://www.wikidata.org), so users are encouraged to contribute metadata for hard-to-cite works to Wikidata as an alternative to using a `raw` citation.
-8. For references that do not have any of the persistent identifiers above, use a raw citation like `@raw:old-manuscript`.
-   Metadata for raw citations must be provided manually.
+   Note that anyone can edit or add records on [Wikidata](https://www.wikidata.org), so users are encouraged to contribute metadata for hard-to-cite works to Wikidata.
+8. Any other compact identifier supported by <https://identifiers.org>.
+   Manubot uses the Identifiers.org Resolution Service to support [hundreds](https://github.com/manubot/manubot/blob/7055bcc6524fdf1ef97d838cf0158973e2061595/manubot/cite/handlers.py#L122-L831 "Actual prefix support is determined by this manubot source code.") of [prefixes](https://registry.identifiers.org/registry "Identifiers.org prefix search").
+   For example, citing `@clinicaltrials:NCT04280705` will produce the same bibliographic metadata as `@https://identifiers.org/clinicaltrials:NCT04280705` or `@https://clinicaltrials.gov/ct2/show/NCT04280705`.
+9. For references that do not have any of the above persistent identifiers, the citation key does not need to include a prefix.
+   Citing `@old-manuscript` will work, but only if reference metadata is [provided manually](#reference-metadata).

 Cite multiple items at once like:

 ```md
-Here is a sentence with several citations [@doi:10.15363/thinklab.4; @pmid:26158728; @arxiv:1508.06576; @isbn:9780394603988].
+Here is a sentence with several citations [@doi:10.15363/thinklab.4; @pubmed:26158728; @arxiv:1508.06576; @isbn:9780394603988].
 ```

 Note that multiple citations must be semicolon separated.
 Be careful not to cite the same study using identifiers from multiple sources.
-For example, the following citations all refer to the same study, but will be treated as separate references: `[@doi:10.7717/peerj.705; @pmcid:PMC4304851; @pmid:25648772]`.
+For example, the following citations all refer to the same study, but will be treated as separate references: `[@doi:10.7717/peerj.705; @pmc:PMC4304851; @pubmed:25648772]`.

 Citation keys must adhere to the syntax described in the [Pandoc manual](https://pandoc.org/MANUAL.html#citations):

 > The citation key must begin with a letter, digit, or `_`, and may contain alphanumerics, `_`, and internal punctuation characters (`:.#$%&-+?<>~/`).

 To evaluate whether a citation key fully matches this syntax, try [this online regex](https://regex101.com/r/mXZyY2/latest).
-If the citation key is not valid, use the [citation tag](#citation-tag) workaround below.
+If the citation key is not valid, use the [citation aliases](#citation-aliases) workaround below.
 This is required for citation keys that contain forbidden characters such as `;` or `=` or end with a non-alphanumeric character such as `/`.
 <!-- See [jgm/pandoc#6026](https://github.com/jgm/pandoc/issues/6026) for progress on a more flexible Markdown citation key syntax. -->

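The hunk above points readers to an online regex for checking citekeys against Pandoc's syntax. A local check can be sketched in Python as follows. The `is_valid_citekey` helper is hypothetical, and Python's `\w` stands in for Pandoc's notion of an alphanumeric character. One extra rule accepts `:` or `/` when the next character is `/`. That rule is an assumption inferred from the `@https://...` examples in this commit, not part of the quoted rule. A `False` result should be read as a hint to use a citation alias, not as a definitive verdict.

```python
import re

# Punctuation Pandoc allows inside (but not at the end of) a citation key,
# per the rule quoted in USAGE.md.
_INTERNAL_PUNCT = r"[:.#$%&\-+?<>~/]"

# Assumption: ":" or "/" is also accepted when followed by "/", so that keys
# such as "https://doi.org/..." parse as a single citekey.
CITEKEY_PATTERN = re.compile(
    r"\w"                           # first char: letter, digit, or _
    r"(?:\w"                        # then alphanumerics or _
    rf"|{_INTERNAL_PUNCT}(?=\w)"    # or punctuation followed by one of those
    r"|[:/](?=/)"                   # or ":" / "/" followed by "/"
    r")*"
)


def is_valid_citekey(key: str) -> bool:
    """Return True if `key` (without the leading @) fits the syntax above."""
    return CITEKEY_PATTERN.fullmatch(key) is not None


if __name__ == "__main__":
    print(is_valid_citekey("doi:10.15363/thinklab.4"))            # True
    print(is_valid_citekey("doi:10.1016/S0022-2836(05)80360-2"))  # False: "(" is forbidden
    print(is_valid_citekey("example.org/"))                        # False: trailing "/"
```

The second key is the parenthesized DOI that the "Citation aliases" section later handles with an alias.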
@@ -134,66 +137,63 @@ pandoc:
   manubot-fail-on-errors: True
 ```

-#### Citation tags
+#### Citation aliases

-The system also supports citation tags, which map from one citation key (an alias) to another.
-Tags are recommended for the following applications:
+The system also supports citation aliases, which map from one citation key (the "alias" or "tag") to another.
+Aliases are recommended for the following applications:

-1. A citation's identifier contains forbidden characters, you must use a tag.
+1. A citation key contains forbidden characters.
 2. A single reference is cited many times.
-   Therefore, it might make sense to define a tag, so if the citation updates (e.g. a newer version becomes available), only a single change is required.
+   Therefore, it might make sense to define an alias, so if the citation updates (e.g. a newer version becomes available), only a single change is required.

-Tags can be defined using Markdown's [link reference syntax](https://spec.commonmark.org/0.29/#link-reference-definitions) as follows:
+Aliases can be defined using Markdown's [link reference syntax](https://spec.commonmark.org/0.29/#link-reference-definitions) as follows:

 ```markdown
-Citing a URL containing a `?` character [@tag:my-url].
-Citing a DOI containing parentheses [@doi:my-doi].
+Citing a URL containing a `?` character [@my-url].
+Citing a DOI containing parentheses [@my-doi].

-[@tag:my-url]: url:https://openreview.net/forum?id=HkwoSDPgg
-[@tag:my-doi]: doi:10.1016/S0022-2836(05)80360-2
+[@my-url]: https://openreview.net/forum?id=HkwoSDPgg
+[@my-doi]: doi:10.1016/S0022-2836(05)80360-2
 ```

 This syntax is also used by [`pandoc-url2cite`](https://github.com/phiresky/pandoc-url2cite).
 Make sure to place these link reference definitions in their own paragraphs.
 These paragraphs can be in any of the content Markdown files.

-Another method for defining tags is to define `pandoc.citekey-aliases` in `metadata.yaml`:
+Another method for defining aliases is to define `pandoc.citekey-aliases` in `metadata.yaml`:

 ```yaml
 pandoc:
   citekey-aliases:
-    tag:my-url: url:https://openreview.net/forum?id=HkwoSDPgg
-    tag:my-doi: doi:10.1016/S0022-2836(05)80360-2
+    my-url: https://openreview.net/forum?id=HkwoSDPgg
+    my-doi: doi:10.1016/S0022-2836(05)80360-2
 ```

-For backwards compatibility, tags can also be defined in `content/citation-tags.tsv`.
-If `citation-tags.tsv` defines the tag `study-x`, then this study can be cited like `@tag:study-x`.
-This method is deprecated.
-
 ## Reference metadata

 Manubot stores the bibliographic details for references (the set of all cited works) as CSL JSON ([Citation Style Language Items](http://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html#csl-json-items)).
-For all citation sources besides `raw`, Manubot automatically generates CSL JSON.
+Manubot automatically generates CSL JSON for most persistent identifiers (as described in [Citations](#citations) above).
 In some cases, automatic metadata retrieval fails or provides incorrect or incomplete information.
-Errors are most common for `url` references.
+Errors are most common for references generated from scraping HTML metadata from websites.
+This occurs most frequently for `https`/`http`/`url` citations as well as identifiers.org prefixes without explicit support listed above.
 Therefore, Manubot supports user-provided metadata, which we refer to as "manual references".
 When a manual reference is provided, Manubot uses the supplied metadata and does not attempt to generate it.

 Manubot searches the `content` directory for files that match the glob pattern `manual-references*.*` and expects that these files contain manual references.
 [`content/manual-references.json`](content/manual-references.json) is the default file to specify custom CSL JSON metadata.
 Manual references are matched to citations using their "id" field.
-For example, to manually specify the metadata for the citation `@url:https://github.com/manubot/rootstock`, add a CSL JSON Item to `manual-references.json` that contains the following excerpt:
+For example, to manually specify the metadata for the citation `@https://github.com/manubot/rootstock`, add a CSL JSON Item to `manual-references.json` that contains the following excerpt:

 ```json
-"id": "url:https://github.com/manubot/rootstock",
+"id": "https://github.com/manubot/rootstock",
 ```

-The metadata for `raw` citations must be provided in a manual reference file (e.g. `manual-references.json`) or an error will occur.
-For example, to cite `@raw:private-message` in a manuscript, a corresponding CSL JSON Item is required, such as:
+The metadata for unhandled citations — any citation key that is not a supported persistent ID — must be provided in a manual reference file (e.g. `manual-references.json`) or an error will occur.
+For example, to cite `@private-message` in a manuscript, a corresponding CSL JSON Item is required, such as:

 ```json
 {
-  "id": "raw:private-message",
+  "id": "private-message",
   "type": "personal_communication",
   "title": "Personal communication with Doctor X"
 }
@@ -204,10 +204,10 @@ For guidance on what CSL JSON should be like for different document types, refer

 Manubot offers some support for other bibliographic metadata formats besides CSL JSON, by delegating conversion to the `pandoc-citeproc --bib2json` [utility](https://github.com/jgm/pandoc-citeproc/blob/master/man/pandoc-citeproc.1.md#convert-mode).
 Formats are inferred from filename extensions.
-So, for example, to provide metadata for `@url:https://github.com/manubot/rootstock` in BibTeX format, create the file `content/manual-references.bib` and create an item whose definition starts with the excerpt:
+So, for example, to provide metadata for `@https://github.com/manubot/rootstock` in BibTeX format, create the file `content/manual-references.bib` and create an item whose definition starts with the excerpt:

 ```latex
-@misc{url:https://github.com/manubot/rootstock,
+@misc{https://github.com/manubot/rootstock,
 ```

 Processed reference metadata in CSL JSON format, either generated by Manubot or specified via manual references, is exported to `references.json`.
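
The USAGE.md text above says that manual references are found through the glob `manual-references*.*` in `content` and matched to citations by their `"id"` field. A minimal Python sketch of that matching for the JSON case follows. The function names are hypothetical, and each JSON file is assumed to hold a CSL JSON array of items. Non-JSON formats such as `.bib` are skipped here, since Manubot converts them with `pandoc-citeproc --bib2json`. If two files define the same id, the later file in sorted order wins in this sketch. Manubot's own precedence is not described in this commit.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable

# A CSL JSON item, e.g. {"id": "private-message", "type": "personal_communication", ...}
CSLItem = Dict[str, Any]


def index_csl_items(items: Iterable[CSLItem]) -> Dict[str, CSLItem]:
    """Key CSL JSON items by their "id" field, the value matched against citekeys."""
    return {item["id"]: item for item in items}


def load_manual_references(content_dir: Path) -> Dict[str, CSLItem]:
    """Gather items from content_dir/manual-references*.json (JSON files only)."""
    references: Dict[str, CSLItem] = {}
    if not content_dir.is_dir():
        return references
    for path in sorted(content_dir.glob("manual-references*.json")):
        items = json.loads(path.read_text(encoding="utf-8"))
        # In this sketch, a later file (in sorted order) overrides an earlier one.
        references.update(index_csl_items(items))
    return references


if __name__ == "__main__":
    manual = load_manual_references(Path("content"))
    # @private-message resolves only if some item has "id": "private-message".
    print(manual.get("private-message"))
```

The ids must match the citekeys exactly, without the leading `@`. After this commit, that means `"https://github.com/manubot/rootstock"` rather than `"url:https://github.com/manubot/rootstock"`.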

build/environment.yml

Lines changed: 1 addition & 3 deletions
@@ -8,7 +8,6 @@ dependencies:
   - ghp-import=0.5.5
   - jinja2=2.11.2
   - jsonschema=3.2.0
-  - pandas=1.0.3
   - pandoc=2.9.2
   - pango=1.40.14
   - pip=20.0
@@ -20,8 +19,7 @@ dependencies:
   - yamllint=1.21.0
   - pip:
     - errorhandler==2.0.1
-    - git+https://github.com/manubot/manubot@890b76891f139a26d36cd9a4aa652f7e019501f8
-    - jsonref==0.2
+    - git+https://github.com/manubot/manubot@31968197d1ccd96a46bf092cdba4b575764bb954
     - opentimestamps-client==0.7.0
     - opentimestamps==0.4.1
     - pandoc-eqnos==2.1.1

content/02.delete-me.md

Lines changed: 7 additions & 7 deletions
@@ -95,24 +95,24 @@ Bare URL link: <https://manubot.org>

 Citation by DOI [@doi:10.7554/eLife.32822].

-Citation by PubMed Central ID [@pmcid:PMC6103790].
+Citation by PubMed Central ID [@pmc:PMC6103790].

-Citation by PubMed ID [@pmid:30718888].
+Citation by PubMed ID [@pubmed:30718888].

 Citation by Wikidata ID [@wikidata:Q56458321].

 Citation by ISBN [@isbn:9780262517638].

-Citation by URL [@url:https://greenelab.github.io/meta-review/].
+Citation by URL [@https://greenelab.github.io/meta-review/].

-Citation by tag [@tag:deep-review].
+Citation by alias [@deep-review].

-Multiple citations can be put inside the same set of brackets [@doi:10.7554/eLife.32822; @tag:deep-review; @isbn:9780262517638].
-Manubot plugins provide easier, more convenient visualization of and navigation between citations [@doi:10.1371/journal.pcbi.1007128; @pmid:30718888; @pmcid:PMC6103790; @tag:deep-review].
+Multiple citations can be put inside the same set of brackets [@doi:10.7554/eLife.32822; @deep-review; @isbn:9780262517638].
+Manubot plugins provide easier, more convenient visualization of and navigation between citations [@doi:10.1371/journal.pcbi.1007128; @pubmed:30718888; @pmc:PMC6103790; @deep-review].

 Citation tags (i.e. aliases) can be defined in their own paragraphs using Markdown's reference link syntax:

-[@tag:deep-review]: doi:10.1098/rsif.2017.0387
+[@deep-review]: doi:10.1098/rsif.2017.0387

 ## Referencing figures, tables, equations

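The test manuscript defines the alias `[@deep-review]` with Markdown link reference syntax. This commit does not show how Manubot parses these definitions. The Python sketch below only illustrates the idea under simplifying assumptions: one definition per line, starting at column 0, with a single-token target and no CommonMark titles or angle-bracket destinations. The names `collect_aliases` and `resolve_citekey` are hypothetical.

```python
import re
from typing import Dict

# Hypothetical pattern: a definition such as
#   [@deep-review]: doi:10.1098/rsif.2017.0387
# on a line of its own, starting at column 0, with a single-token target.
ALIAS_DEFINITION = re.compile(r"^\[@([^\]\s]+)\]:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def collect_aliases(markdown: str) -> Dict[str, str]:
    """Map each alias (without the leading "@") to the citekey it stands for."""
    return dict(ALIAS_DEFINITION.findall(markdown))


def resolve_citekey(citekey: str, aliases: Dict[str, str]) -> str:
    """Swap an alias for its target; any other citekey passes through unchanged."""
    return aliases.get(citekey, citekey)


if __name__ == "__main__":
    text = (
        "Citation by alias [@deep-review].\n"
        "\n"
        "[@deep-review]: doi:10.1098/rsif.2017.0387\n"
    )
    aliases = collect_aliases(text)
    print(resolve_citekey("deep-review", aliases))  # doi:10.1098/rsif.2017.0387
```

Aliases defined under `pandoc.citekey-aliases` in `metadata.yaml` could be merged into the same dictionary. This sketch does not read YAML.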