Request to omit all statements on csarven.ca #2
Comments
oh wow, hey @csarven. apologies, i totally missed this. absolutely, will do.
done! your site is now gone from the dataset. details. fwiw, here's what i see in http://csarven.ca/robots.txt right now. i do get that you may want to allow some crawlers but not others, like indie map.
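To illustrate the point about allowing some crawlers but not others, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `indie-map` agent token are hypothetical examples, not the file csarven.ca actually serves, and `is_allowed` is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
# It blocks a crawler identifying as "indie-map" and allows everyone else.
ROBOTS_TXT = """\
User-agent: indie-map
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("indie-map", "otherbot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "http://example.com/page"))
```

Rules like these only take effect if the crawler announces a distinct user-agent and checks robots.txt before fetching.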
Thanks! Pardon me if I'm looking at the wrong code, but perhaps it is also worthwhile to update the
interesting idea. you mean, set it to Indie Map? i could! technically the user agent still is wget though, right? Indie Map is just the use case? maybe I'm splitting hairs.
I would classify indie-map (the software of this repository) as the user-agent, as opposed to a particular library that's doing the fetching. Just as Firefox uses its own library to negotiate resources. But yes, generally it can be arbitrary. So, you can do something like
also fwiw, i've only actually done this whole crawl once, and i have no plans to do it again right now, regularly or otherwise. still, good idea. thanks for the nudge.
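As a rough illustration of the user-agent suggestion discussed above: a crawler can identify itself by project name rather than by the fetching library. The sketch below uses Python's standard `urllib.request` instead of wget, and the agent string, version, and info URL are made-up placeholders, not values from this repository. With wget itself, the equivalent knob is its `--user-agent` option.

```python
from urllib.request import Request

# Hypothetical identifier: project name, a version, and a URL where site
# owners could read about the crawler. None of these come from the repo.
USER_AGENT = "indie-map/1.0 (+https://example.com/about-this-crawler)"


def build_request(url: str, user_agent: str = USER_AGENT) -> Request:
    """Build (but do not send) a request that announces the crawler by name."""
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_request("http://example.com/")
    # urllib stores header names capitalized as "User-agent".
    print(req.get_header("User-agent"))
```

A distinct, stable token like this gives site owners something concrete to match in robots.txt.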
Hi. I'm the current owner of csarven.ca. I would appreciate it if the dataset, map, and anything else, can omit all statements pertaining to csarven.ca. If there is any information that's currently on csarven.ca that the crawler is discovering, please let me know, I can remove those. Same goes for any other place that I might have access to. Thanks.