Creating a diff between two manuscript versions #54
Comments
For the Project Rephetio manuscript, now published in eLife, I had to create diffs to show changes in response to reviewers. I ended up enabling DOCX export (dhimmel/rephetio-manuscript@b7b8bd3), and then using Microsoft Word to compare the documents. While manual and thus sub-optimal, this worked. We may want to consider setting |
That's good to know you were able to satisfy the journal. Did you not encounter the image embedding problems I did in #40? I'm okay defaulting to |
Well we used PNG not SVG images, so they exported to DOCX fine. But in this case the export failure would have been a feature, since the journal required images be uploaded separately! |
Should we resurrect #40 to merge? |
@vsmalladi I'm still leaning against any heavyweight SVG export solution as these are things that really make the most sense to fix upstream. We don't want to place ourselves in a position where we have to maintain this heavy machinery. |
@dhimmel that makes sense. |
Here is another approach used with a GitHub-based project, the COP21 project: https://github.com/okfn/cop21/blob/gh-pages/scripts/diff.sh (with example output). It is still a bit manual for specific document versions but could likely be automated more.
Thanks, the output looks great. |
Thanks @rgieseke. To summarize, this method pipes the output of |
Draftable: I came across the Draftable webapp for creating diffs of PDF and DOCX files. Their example showed that it worked well for diffing two arXiv PDFs. They have an API and a Python package for using it. The free tier of the API is limited to 200 requests per month. API calls return a URL for viewing the diff. We could potentially use this tool for creating diffs. The URLs could even be embedded into the CI logs, so you could see the changes a PR would make to the PDF output. Obviously, the whole registration / API key / quota / third-party dependency thing kind of sucks. There may be an open source PDF diff solution that works as well, like https://vslavik.github.io/diff-pdf/. We could even create a probot to comment on GitHub PRs with the PDF diff uploaded as an attachment.
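As a rough sketch of the open source route, the diff-pdf command-line tool linked above could be driven from a build script. The helper names `diff_pdf_command` and `pdfs_differ` and the file names are hypothetical. The `--output-diff` option and the exit status (0 when the PDFs match, 1 when they differ) follow the diff-pdf README; treating any other status as an error is an assumption, and the tool is assumed to be installed and on the PATH.

```python
import shutil
import subprocess


def diff_pdf_command(old_pdf, new_pdf, out_pdf):
    """Build the diff-pdf invocation that writes a visual diff to out_pdf."""
    return ["diff-pdf", f"--output-diff={out_pdf}", old_pdf, new_pdf]


def pdfs_differ(old_pdf, new_pdf, out_pdf="diff.pdf"):
    """Run diff-pdf; return True if the PDFs differ, False if they match."""
    if shutil.which("diff-pdf") is None:
        raise RuntimeError("diff-pdf is not installed or not on PATH")
    result = subprocess.run(diff_pdf_command(old_pdf, new_pdf, out_pdf))
    # Assumption: statuses other than 0 (same) and 1 (different) mean failure.
    if result.returncode not in (0, 1):
        raise RuntimeError(f"unexpected diff-pdf exit status {result.returncode}")
    return result.returncode == 1
```

In CI, something like `pdfs_differ("old/manuscript.pdf", "new/manuscript.pdf")` could decide whether to upload `diff.pdf` as a build artifact.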
I wrote a little notebook that will highlight the differences between two manuscript versions in the HTML and PDF. It is not pretty, but in my limited testing, it seems to do an okay job and I personally like it better than using the external tools listed above. The notebook is here, with the limitations listed at the bottom. For example, I compared manuscript versions Here is the PDF as of Here is the PDF as of |
@slochower nice approach. I agree that using HTML tags to color portions of the text in the source markdown document may be the right solution. I don't think it's inelegant to put HTML in the markdown (we already do that for manuscripts in places). However, as you note, tables and figures and some other more complex constructs might be problematic. Also I find the whole line highlighting problematic. It would be much better to get behavior along the lines of I think your approach of using HTML to demarcate markdown source based on git diff output is a promising direction. Were we to refine it a bit more, I think it could be appropriate for Manubot. |
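To make the word-level idea concrete, here is a minimal sketch that marks changes inside a paragraph with HTML tags instead of highlighting whole lines. It uses Python's standard `difflib` rather than git output, so it is a swapped-in technique and not the notebook's method. The function name `word_diff_html` and the choice of `<del>`/`<ins>` tags are assumptions. Whitespace is collapsed, so it is only meant for a single paragraph of prose, not tables or figures.

```python
import difflib


def word_diff_html(old, new):
    """Return text with word-level changes wrapped in <del>/<ins> tags."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed = " ".join(old_words[i1:i2])
        added = " ".join(new_words[j1:j2])
        if tag == "equal":
            pieces.append(added)
            continue
        if removed:  # non-empty for "delete" and "replace"
            pieces.append(f"<del>{removed}</del>")
        if added:  # non-empty for "insert" and "replace"
            pieces.append(f"<ins>{added}</ins>")
    return " ".join(pieces)


print(word_diff_html("The quick brown fox", "The slow brown fox jumps"))
# prints: The <del>quick</del> <ins>slow</ins> brown fox <ins>jumps</ins>
```

The tags can then be styled with CSS (for example red strikethrough for `del`, green for `ins`) in the HTML and PDF outputs.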
I agree. The issue is getting either vanilla It would probably be pretty fragile, but I suppose we could simply parse the ANSI codes that do the coloring in the output of |
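A minimal sketch of the ANSI-parsing idea, assuming the colored diff marks removals in red (`\x1b[31m`), additions in green (`\x1b[32m`), and ends each colored run with a reset (`\x1b[m` or `\x1b[0m`). These are conventional SGR codes, but the codes a given tool and configuration actually emit are an assumption here and should be checked against real output. The name `ansi_to_html` is hypothetical.

```python
import re

# SGR escape sequences such as "\x1b[31m" (red), "\x1b[32m" (green), "\x1b[m" (reset).
SGR = re.compile(r"\x1b\[([0-9;]*)m")

# Assumed mapping from color code to HTML tag; adjust to the tool's real output.
TAGS = {"31": "del", "32": "ins"}


def ansi_to_html(text):
    """Replace red/green ANSI runs with <del>/<ins>; drop other SGR codes."""
    out = []
    open_tag = None
    pos = 0
    for match in SGR.finditer(text):
        out.append(text[pos:match.start()])
        pos = match.end()
        if open_tag is not None:  # simplification: any SGR code ends the current run
            out.append(f"</{open_tag}>")
            open_tag = None
        tag = TAGS.get(match.group(1))
        if tag is not None:
            out.append(f"<{tag}>")
            open_tag = tag
    out.append(text[pos:])
    if open_tag is not None:  # close a run left open at end of input
        out.append(f"</{open_tag}>")
    return "".join(out)
```

As noted above, this is fragile: combined codes (bold plus color, for instance) or a different color configuration would need extra handling.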
I just learned about |
Adding to the thread that Google Docs now has a feature to compare two documents (in Tools -> Compare documents). So we can build the DOCX output for two versions of the manuscript, upload them to Google Drive, convert them to Google Docs, and use this feature. It is just another option, like the LibreOffice compare documents feature. Still manual, but some people might prefer Google Docs. The end result is a bit different too, so it may be worth trying out if LibreOffice doesn't work properly. In our experience, going through Google Docs helped with the tables. It was worth it to even upload the "diff" DOCX produced by LibreOffice, just to get the tables right. (Maybe it has to do with my version of LibreOffice on Ubuntu.) Also, Google Docs doesn't seem to be able to print or export the tracked changes to PDF except when printing from Chrome.
To add to the record here, here is a project doing diffs for JATS XML: |
Oftentimes, it's important (and required in scholarly publishing) to show the changes between two versions of a manuscript. It would be ideal if Manubot users could "track changes" between two manuscript versions.
Pandoc doesn't have builtin support for diffs: jgm/pandoc#2374. Other options would be:
- diffing manuscript.md as a text file (perhaps using diff, prettydiff, or rich-text-diff)
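As a rough illustration of the plain-text option, the sketch below produces a unified diff of two versions of the manuscript source with Python's standard `difflib`, which gives output in the same format as `diff -u`. The function name `manuscript_diff`, the file labels, and the sample text are hypothetical.

```python
import difflib


def manuscript_diff(old_text, new_text,
                    old_label="old/manuscript.md",
                    new_label="new/manuscript.md"):
    """Return a unified diff between two versions of the manuscript source."""
    lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    )
    return "".join(lines)


old = "# Title\n\nWe analyzed 10 samples.\n"
new = "# Title\n\nWe analyzed 12 samples.\n"
print(manuscript_diff(old, new))
```

A line-based diff like this is easy to produce, but because a paragraph in manuscript.md is often a single long line, one changed word marks the whole paragraph as changed.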