doc/backup_and_restore.txt

Backup and Restore
===================

Backup of juju's state is a critical feature, not only for juju users
but for use inside juju itself.  This is likewise the case for the
ability to restore previous backups.  This doc is intended as an
overview of both since changes in juju are prone to break both.

Backup
-------------------

Backing up juju state involves dumping the state database (currently
from mongo) and copying all files that are critical to juju's operation.
All the files are bundled up into an archive file.  Effectively the
archive represents a snapshot of juju state.

That snapshot is stored by the controller in such a way that only a
backup ID is needed for restore (no need to upload an archive).  Note
that if the controller is not available, restoring with just the ID is
not an option.  While that situation will need to be addressed in the
short-term, it should not require much additional effort.

We make reasonable efforts to ensure that the archive is consistent with
the snapshot.  This includes stopping the database for the length of
time it takes to dump it.  There is, however, room for improvement with
regard to ensuring the consistency of the archive.

First of all, running juju commands will fail while the DB is
unavailable (already running services should not be affected).  While
this period of time is rather short, we expect that it will grow over
time.  Furthermore, the larger a model's state, the larger the
impact of this downtime.  In the long term this makes it less than ideal
to run backup as often as it should be.

Secondly, state currently does not block for the backup process as a
whole.  This means that if we dump the DB first, it may be out of date
by the time we finish gathering the state-related files.  In practice
this isn't a big concern since we do not expect the files to change
during the interval that backup is running.  However, we do backup some
log files, so there is a small chance they will differ from when backups
started.

Restore
-------------------

Restore involves reviving the juju state in a new model by
reversing the steps taken by backup.  However, the process is a bit more
complicated than just gathering files and dumping the DB.

If no controller is present restore will do the following:

1. bootstrap a new node in safe mode (ProvisionerSafeMode reports
   whether the provisioner should not destroy machines it does not know
   about), 
2. stop juju-db,
3. stop jujud-machine,
4. load the backed-up db in place of the recently created one,
5. un-tar the fs files onto the root dir of the current machine,
6. run a set of bash scripts that replace the dns/instance names of the
   old machine with those of the new machine in the relevant config
   files and also in the db (if this step is not performed peergrouper
   will kick our machine out of the vote list and fill it with the old
   dead ones),
7. restart all services.

As noted above, restoring via an uploaded archive (rather than by using
an ID) will need to be addressed in the short term, since the existing
restore functionality works this way.  However, it shouldn't involve
more than bootstrapping a new model, uploading the archive to it,
and then requesting restore of that backup.  The design in this document
already accommodates doing this.

HA
-------------------

HA is a work in progress, for the moment we have a basic support which is an
extension of the regular backup functionality.
Read carefully before attempting backup/restore on an HA model.

In the case of HA, the backup process will backup files/db for machine 0,
support for "any working controller" is plans for the near future.
We assume, for now, that if you are running restore is because you have
lost all controller machines. Out of this restore you will get one
functioning controller that you can use to start you other state machines.
BEWARE, only run restore in the case where you no longer have working
controllers since otherwise this will take them offline and possibly
cripple your model.

Previous Implementation
-------------------

Backup and restore were both implemented as plugins (in cmd/plugins/) to
the juju CLI.  The plugins were essentially scripts we sent over SSH to
the state machine and ran there.  However, they were definitely distinct
pieces of software.


Implementation
===================

Key Points
-------------------

* Backups are created and then stored on the state machine.
* Each backup archive has an associated metadata document (stored in
  mongo).
* Each backup archive is stored relative to the state machine (currently
  env storage).
* Restore will have access to the state machine where the archive and
  metadata are stored.
* In the common case there is no need to upload or download a backup.
* The backups machinery has its own facade in state/apiserver/backups.
* The choice of mechanism for uploading and downloading backups has not
  been decided yet.
* The backups machinery is divided into 4 layers:
  - state-dependent functionality,
  - state-independent functionality,
  - the state API facade for backups,
  - the juju CLI sub-command for backups.
* The state-independent functionality can be broken down further:
  - a high-level backups interface/implementation,
  - low-level backup/restore functionality,
  - components of the backups machinery.
* Backups depend on the github.com/juju/utils/filestorage package.
* Backups have a special relationship with state (see note at
  beginning of state/backups.go).

Backup Archives
-------------------

Each backup archive is a gzipped tar file (.tar.gz).  It has the
following structure.

juju-backup/
    metadata.json - the backup metadata for the archive.
    root.tar      - the bundle of state-related files (exluding mongo).
    dump/         - all the files dumped from the DB (using mongodump).

At present we do not include any sort of manifest/index file in the
archive.

For more information, see:
  - state/backups/db/dump.go     - how the DB is dumped;
  - state/backups/files/files.go - which files are included in root.tar.

File Layout
--------------------

The layering of the backups machinery and divisions of the state-
independent functionality map almost directly to the following
structure in the filesystem.  The state API facade for backups is spread
between state/apiserver and state/api.

state/
    backups.go - state-dependent functionality (basically the
                 interaction with mongo and with env storage)
    backups/   - state-independent functionality
        backups.go - high-level/public backups interface/implementation
        create.go  - low-level implementation of backing up juju state
        restore.go - low-level implementation of restoring juju state
        archive/   - an abstraction of a backups archive
        db/        - all stuff related to external interaction with the
                     DB (internal interactions live in state/backups.go)
        files/     - all stuff related to files we back up and restore
        metadata/  - the backups metadata implementation
    apiserver/
        backups/ - the state API facade for backups
            backups.go - facade implementation (not including methods
                         for end-points)
            create.go  - implementation of the Create() end-point
            info.go    - (wraps state/backups/backups.go:Backups.Get)
            list.go
            remove.go
            restore.go - implementation of the Restore() end-point
    api/
        backups.go - the juju state API client implementation for the
                     backups facade
        params/
            backups.go - the backups-related API arg/result types
    cmd/
        juju/
            backups.go - the juju CLI sub-command implementation

Note that upload/download aren't accommodated in apiserver/backups/ yet.

Layers of Abstraction
--------------------

As noted above, the backups machinery is divided in 4 layers and the
state-independent portion into 3 parts.  Here is an example (using
"create") of how those layers interact.

* The juju CLI for backups wraps:
  - the backups facade's Create() method.
* The state API facade wraps:
  - the high-level backups implementation (state/backups/backups.go),
  - the state-backups interactions (state/backups.go).
* the backups implementation wraps:
  - a "filestorage" implementation (../utils/filestorage:FileStorage),
  - the low-level "create" implementation,
  - DB connection info (state/backups/db/info.go),
  - the backup's metadata.
* the "create" implementation makes use of:
  - the code in state/backups/{archive,db,files}.

Backups Interface
--------------------

Backups
  Add(meta Metadata, archive io.ReadCloser) error
  Create() (id string, err error)
  Get(id string) (Metadata, io.ReadCloser, error)
  List() ([]Metadata, error)
  Remove(id string) error
  Restore(id string) error

Note: Restore() makes use of Get().

State API Facade
--------------------

BackupsAPI
  Create(BackupsCreateArgs) BackupsCreateResult
  Info(BackupsInfoArgs) BackupsMetadataResult
  List() BackupsMetadataListResult
  Remove(BackupsRemoveArgs)
  Restore(BackupsRestoreArgs)

Again note that upload and download are not yet included here.

CLI sub-command
--------------------

The juju CLI sub-command for backups is called "backups".  Its own
sub-commands have basically a 1-to-1 equivalence with the API client
methods of the same respective names.  The essential sub-commands are
exposed via the following options:

- juju backups [--create] [--quiet] [<notes>]
- juju backups --info <ID>
- juju backups --list [--brief]
- juju backups --remove <ID>
- juju backups --restore <ID>

Note: further options may be appropriate for later addition
      (e.g. [<filename>] on --restore).

Other anticipated subcommands:

- juju backups --download <ID> [<filename>]
- juju backups --upload <filename>

Note that download and upload are only hypothetical, pending support in
the API facade.  When we add the ability to restore from an archive
(rather than an ID), --download and --upload (or a --filename option on
restore) will become essential.