wa

Web file archiver written in Python

wa (web archiver) uses wget to download resources and store them in a directory heirarchy according to the hostname and the path of the URL.

Installation

To install, download the source to a directory (e.g. ~/src/) or your choice and run the install script, which will make soft links to the executables in your path.

cd ~/src
./install.sh

or,

./install.sh /path/to/bin  # override default

Uninstall

cd ~/src
./uninstall.sh

or,

./uninstall.sh /path/to/bin  # override default

WAPATH

The web archive directory is given by the environment variable $WAPATH. If not defined, it defaults to ~/var/wa.

`.warc`

If it exists, Python will read the file ~/.warc where the following options can be defined

# sets a User Agent string
user_agent = User Agent String

The -m command line switch will mirror a request recursivily from the authority/path, so use carefully.

The directory heirarchy in the form:

<wapath>/<tld>/<private>/<subdomain>/<path>/<file>

Usage and Options

usage:
    wa [-p] [-m] [-d] [-t CSV,tag,list] <URL1> [URL2...]

options:
    -p    preserve (download dependencies)
    -m    mirror authority/path
    -d    override wapath
    -t    comma seperated list of tags without whitespace

Web archive example

Fetches and stores files in wapath (default ~/var/wa):

wa -t archive,test http://www.example.com

tree ~/var/wa
/home/musetronstar/var/wa
└── com
    └── example
        └── www
            └── index.html

A history file is written in the form:

<DATE>	<TYPE>	<URL>	<PATH>	<TAGS>

$ tail -n 1 ~/var/wa/.history 
2021-06-18-11:54:00	S	http://example.com	com/example/index.html	archive,test

History Archive Types

'S' single file download
'M' mirror

This software is in the public domain.

Name		Name	Last commit message	Last commit date
Latest commit History 48 Commits
bin		bin
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
archiver.py		archiver.py
config.py		config.py
install.sh		install.sh
uninstall.sh		uninstall.sh
wa.py		wa.py
wget-save		wget-save

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

wa

Installation

Uninstall

WAPATH

`.warc`

Usage and Options

Web archive example

History Archive Types

About

Releases

Packages

Languages

License

musetronstar/wa

Folders and files

Latest commit

History

Repository files navigation

wa

Installation

Uninstall

WAPATH

.warc

Usage and Options

Web archive example

History Archive Types

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

`.warc`

Packages