External links support #5
Thinking more about this, it does not make sense to check for broken external links whenever a new commit lands on master. An external reference becoming invalid is not logically tied to a code change, so checking on every code change (and attaching the results to that change) does not make for a sensible UX. GitHub Actions could still run hyperlink periodically, but instead of producing a binary result (master broken/not broken), I think it would make more sense to create GitHub issues and assign them to the people who introduced the broken links, similar to how Sentry tracks production errors (Sentry plus its suspect commits feature could probably be abused to avoid building a new frontend). External sites' URLs are too much of a moving target, so nobody wants to check them in CI, and people end up not checking them at all. Or they use some spider service that does it for them, but those are either expensive or lack any kind of issue-tracking workflow.
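The "assign it to whoever introduced the link" part could be approximated with git's pickaxe search. A minimal sketch, assuming the link text appears literally in the docs sources tracked in the repository; `suspect_commit` is a hypothetical helper, not part of hyperlink or Sentry. It reports the oldest commit that added or removed the string, so a link that was deleted and later re-added would be attributed to its original author.

```python
import subprocess
from typing import Optional, Tuple


def suspect_commit(repo: str, needle: str) -> Optional[Tuple[str, str]]:
    """Return (commit hash, author name) of the oldest commit in `repo` whose
    diff changed the number of occurrences of `needle`, or None if none did."""
    # `git log -S<string>` (the "pickaxe") keeps only commits that add or
    # remove the literal string; --reverse lists the oldest match first.
    result = subprocess.run(
        ["git", "-C", repo, "log", f"-S{needle}", "--reverse", "--format=%H%x09%an"],
        check=True,
        capture_output=True,
        text=True,
    )
    lines = result.stdout.splitlines()
    if not lines:
        return None
    sha, _, author = lines[0].partition("\t")  # %x09 is a tab separator
    return sha, author
```

A periodic job could call this for each broken URL and use the author as the assignee of the issue it opens.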
How would you feel about an option to simply count the external links and print the total in the final summary, something like "Found 371 external links"? I'm happy to put together a PR if this would be useful.
@mwcz that's probably useful, yeah. Feel free to give it a try.
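Counting external links only requires collecting `href` values and classifying them. A minimal sketch using Python's standard `html.parser`, assuming "external" means an absolute `http(s)` URL or a protocol-relative `//host/...` URL; it counts link occurrences rather than unique URLs, and the function names are hypothetical, not hyperlink's.

```python
import os
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Collects the href of every <a> tag (HTMLParser lowercases tag/attr names)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def is_external(href: str) -> bool:
    # Absolute http(s) URLs and protocol-relative "//host/..." links count as external;
    # relative paths, fragments and mailto: links do not.
    return urlsplit(href).scheme in ("http", "https") or href.startswith("//")


def count_external_links(root: str) -> int:
    total = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith((".html", ".htm")):
                parser = HrefCollector()
                with open(os.path.join(dirpath, name), encoding="utf-8", errors="replace") as f:
                    parser.feed(f.read())
                parser.close()
                total += sum(1 for href in parser.hrefs if is_external(href))
    return total
```

Usage: `print(f"Found {count_external_links('public')} external links")`, where `public` stands in for the built site directory.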
Another idea: allow specifying URL remaps on the command line. Specific use case here: we publish our docs as a static site to https://docs.tigerbeetle.com. From those docs, we occasionally want to refer to source files in the GitHub repo. It would be great to check those links in the HTML, but there's no need to actually curl GitHub for them; we can check them against local files in the neighbouring directory. P.S. Thanks for building hyperlink, such a no-nonsense piece of software, love it!
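A naive version of such a remap could look like the sketch below. The remap table (for example a prefix like `https://github.com/tigerbeetle/tigerbeetle/blob/main/` mapped to `../tigerbeetle`) and the helper names are hypothetical; the sketch only checks that a file or directory exists, and ignores GitHub-specific behaviour such as `tree/` vs `blob/` URLs or line anchors.

```python
import os
from typing import Dict, Optional
from urllib.parse import unquote, urlsplit


def remap(url: str, remaps: Dict[str, str]) -> Optional[str]:
    """Map `url` to a local path using the longest matching URL prefix."""
    for prefix in sorted(remaps, key=len, reverse=True):
        if url.startswith(prefix):
            # Drop any query string and #fragment (e.g. "#L10"), decode %XX escapes,
            # and strip leading slashes so os.path.join stays under the local root.
            rest = unquote(urlsplit(url[len(prefix):]).path).lstrip("/")
            return os.path.join(remaps[prefix], rest)
    return None


def check_remapped(url: str, remaps: Dict[str, str]) -> Optional[bool]:
    """True/False if a remap covers `url` and the local path exists or not;
    None if no remap applies, so the link is left to other checks."""
    local = remap(url, remaps)
    if local is None:
        return None
    return os.path.exists(local)
```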
I think if the API looks like that, hyperlink will have to make assumptions about which paths are valid that are incompatible with how static sites are typically served. For example, consider linking to a directory. One could fix this particular example by adding an "assume directory listings" option, but that's just one example: particularly around anchors, GitHub's way of serving up a directory tree differs too much from a static site host. Right now I believe the sitemap idea is the best one, except make it a simple text file instead of XML. Then one can add or remove URLs in that file. The downside, of course, is that all URLs have to be enumerated upfront, which hurts if there are very few external links to check. Another option is to shell out to a command for every link, but that is unacceptable in all cases except the one where there really are very few external links to check. Thoughts?
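The plain-text sitemap idea amounts to a set lookup. A minimal sketch, assuming one URL per line (skipping blank lines and `#` comments is an added assumption) and that the list is only authoritative for URLs under given prefixes; fragments are stripped because a URL list cannot vouch for anchors. The function names are hypothetical.

```python
from typing import Iterable, List, Set


def load_url_list(text: str) -> Set[str]:
    """Parse a one-URL-per-line file; blank lines and '#' comments are skipped."""
    urls: Set[str] = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.add(line)
    return urls


def unknown_links(links: Iterable[str], known: Set[str], covered_prefixes: Iterable[str]) -> List[str]:
    """Links under a covered prefix whose page URL is missing from `known`."""
    prefixes = tuple(covered_prefixes)
    missing = []
    for link in links:
        page = link.split("#", 1)[0]  # anchors are not checkable against a URL list
        if page.startswith(prefixes) and page not in known:
            missing.append(link)
    return missing
```

Links outside the covered prefixes are ignored rather than reported, which keeps the check fast and offline.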
🤔 Maybe flip this around? Have hyperlink print the list of external URLs as plain text on stdout, so the user can pipe it into a custom script with arbitrary logic? And maybe also add a dual subcommand that takes a list of URLs as input and checks them against the directory? That way, the original issue with two cross-linking static sites could be solved by:

`hyperlink ./site-a -print-urls | hyperlink ./site-b -read-urls`
Yeah, I think that's better.

> On Fri, Nov 3, 2023, at 09:51, Alex Kladov wrote:
> `hyperlink ./site-a -print-urls | hyperlink ./site-b -read-urls`
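The "read URLs and check them against a directory" half of that pipeline can be sketched as a standalone script. This is not hyperlink's implementation: the `base_url` parameter, the script name and the URL-to-file rules (`/a/` served by `a/index.html`; `/a` by `a`, `a.html` or `a/index.html`) are assumptions about a typical static host.

```python
import os
import sys
from typing import Iterable, List
from urllib.parse import unquote, urlsplit


def candidate_files(url: str, base_url: str, site_dir: str) -> List[str]:
    """Files that could serve `url` under the assumed static-host rules."""
    path = unquote(urlsplit(url[len(base_url):]).path).lstrip("/")
    target = os.path.join(site_dir, path)
    if path == "" or path.endswith("/"):
        return [os.path.join(target, "index.html")]
    return [target, target + ".html", os.path.join(target, "index.html")]


def read_urls(lines: Iterable[str], base_url: str, site_dir: str) -> List[str]:
    """Return the URLs under `base_url` that no file in `site_dir` can serve."""
    missing = []
    for line in lines:
        url = line.strip()
        if not url.startswith(base_url):
            continue  # another site's URL (or a blank line): not ours to judge
        if not any(os.path.isfile(p) for p in candidate_files(url, base_url, site_dir)):
            missing.append(url)
    return missing


if __name__ == "__main__":
    # usage: <one URL per line on stdin> | python read_urls.py https://site-b.example/ ./site-b
    bad = read_urls(sys.stdin, sys.argv[1], sys.argv[2])
    for url in bad:
        print(f"missing: {url}")
    sys.exit(1 if bad else 0)
```

Filtering by `base_url` inside the reader lets the producer side stay dumb and print every external URL.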
Now I see no cheap way to validate changes to one site without caching the SSG output somewhere. If I want to run CI to validate a newly added external link to site A, I need to:
Compare to sitemap-style:
Perhaps both approaches need to be implemented.
Version 0.1.32 is out, which contains a new experimental
External link support was not built because fetching remote content is slow and flaky. Ideas:
Why do it this way? Because our actual use case is only checking links from docs.sentry.io to sentry.io. Both are static sites we control, so we could make sure everything has sitemaps and still get away with very fast builds. sentry.io already has a sitemap.
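Reading a standard sitemap.xml into a set of known page URLs is a few lines with the standard library. A minimal sketch, assuming the sitemaps.org 0.9 namespace; for a `<sitemapindex>` the `<loc>` values are child sitemap URLs, which this sketch does not follow. `sitemap_urls` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_bytes: bytes) -> Set[str]:
    """All <loc> values in a sitemap, with surrounding whitespace stripped."""
    root = ET.fromstring(xml_bytes)
    return {loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text}
```

A docs link to sentry.io would then count as valid if its URL, with any `#fragment` removed, is in that set.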
However, for a general-purpose external link checker we probably need to support real HTTP requests, maybe backed by a local cache file. Also, sitemap.xml doesn't work for checking anchors, since it only lists page URLs.
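Real HTTP checks plus a local cache could be sketched as below. Assumptions: a `HEAD` request is enough to judge a page (some servers answer `HEAD` with 405 or rate-limit it), results are stored as JSON in a hypothetical file such as `.link-cache.json`, and entries expire after a TTL so CI runs mostly hit the cache. Anchors are still not checked; that would need a `GET` and parsing the page.

```python
import json
import os
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, List


def http_status(url: str, timeout: float = 10.0) -> int:
    """HEAD `url` (urllib follows redirects) and return the final status code,
    or 0 if the server could not be reached (DNS failure, refused, timeout)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses still carry a code
        return err.code
    except OSError:
        return 0


class LinkCache:
    """URL -> [status, checked_at] stored in a JSON file; entries older than
    `ttl` seconds are re-fetched, everything else is served from the file."""

    def __init__(self, path: str, ttl: float = 24 * 3600,
                 fetch: Callable[[str], int] = http_status,
                 clock: Callable[[], float] = time.time) -> None:
        self.path, self.ttl, self.fetch, self.clock = path, ttl, fetch, clock
        self.entries: Dict[str, List[float]] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.entries = json.load(f)

    def status(self, url: str) -> int:
        cached = self.entries.get(url)
        if cached is not None and self.clock() - cached[1] < self.ttl:
            return int(cached[0])
        code = self.fetch(url)
        self.entries[url] = [code, self.clock()]
        return code

    def save(self) -> None:
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.entries, f)
```

Injecting `fetch` and `clock` keeps the cache logic testable without network access.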