forked from juju/juju
-
Notifications
You must be signed in to change notification settings - Fork 0
/
backup_and_restore.txt
256 lines (210 loc) · 9.92 KB
/
backup_and_restore.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
Backup and Restore
===================
Backup of juju's state is a critical feature, not only for juju users
but for use inside juju itself. This is likewise the case for the
ability to restore previous backups. This doc is intended as an
overview of both since changes in juju are prone to break both.
Backup
-------------------
Backing up juju state involves dumping the state database (currently
from mongo) and copying all files that are critical to juju's operation.
All the files are bundled up into an archive file. Effectively the
archive represents a snapshot of juju state.
That snapshot is stored by the controller in such a way that only a
backup ID is needed for restore (no need to upload an archive). Note
that if the controller is not available, restoring with just the ID is
not an option. While that situation will need to be addressed in the
short-term, it should not require much additional effort.
We make reasonable efforts to ensure that the archive is consistent with
the snapshot. This includes stopping the database for the length of
time it takes to dump it. There is, however, room for improvement with
regard to ensuring the consistency of the archive.
First of all, running juju commands will fail while the DB is
unavailable (already running services should not be affected). While
this period of time is rather short, we expect that it will grow over
time. Furthermore, the larger a model's state, the larger the
impact of this downtime. In the long term this makes it less than ideal
to run backup as often as it should be.
Secondly, state currently does not block for the backup process as a
whole. This means that if we dump the DB first, it may be out of date
by the time we finish gathering the state-related files. In practice
this isn't a big concern since we do not expect the files to change
during the interval that backup is running. However, we do backup some
log files, so there is a small chance they will differ from when backups
started.
Restore
-------------------
Restore involves reviving the juju state in a new model by
reversing the steps taken by backup. However, the process is a bit more
complicated than just gathering files and dumping the DB.
If no controller is present restore will do the following:
1. bootstrap a new node in safe mode (ProvisionerSafeMode reports
whether the provisioner should not destroy machines it does not know
about),
2. stop juju-db,
3. stop jujud-machine,
4. load the backed-up db in place of the recently created one,
5. un-tar the fs files onto the root dir of the current machine,
6. run a set of bash scripts that replace the dns/instance names of the
old machine with those of the new machine in the relevant config
files and also in the db (if this step is not performed peergrouper
will kick our machine out of the vote list and fill it with the old
dead ones),
7. restart all services.
As noted above, restoring via an uploaded archive (rather than by using
an ID) will need to be addressed in the short term, since the existing
restore functionality works this way. However, it shouldn't involve
more than bootstrapping a new model, uploading the archive to it,
and then requesting restore of that backup. The design in this document
already accommodates doing this.
HA
-------------------
HA is a work in progress, for the moment we have a basic support which is an
extension of the regular backup functionality.
Read carefully before attempting backup/restore on an HA model.
In the case of HA, the backup process will backup files/db for machine 0,
support for "any working controller" is plans for the near future.
We assume, for now, that if you are running restore is because you have
lost all controller machines. Out of this restore you will get one
functioning controller that you can use to start you other state machines.
BEWARE, only run restore in the case where you no longer have working
controllers since otherwise this will take them offline and possibly
cripple your model.
Previous Implementation
-------------------
Backup and restore were both implemented as plugins (in cmd/plugins/) to
the juju CLI. The plugins were essentially scripts we sent over SSH to
the state machine and ran there. However, they were definitely distinct
pieces of software.
Implementation
===================
Key Points
-------------------
* Backups are created and then stored on the state machine.
* Each backup archive has an associated metadata document (stored in
mongo).
* Each backup archive is stored relative to the state machine (currently
env storage).
* Restore will have access to the state machine where the archive and
metadata are stored.
* In the common case there is no need to upload or download a backup.
* The backups machinery has its own facade in state/apiserver/backups.
* The choice of mechanism for uploading and downloading backups has not
been decided yet.
* The backups machinery is divided into 4 layers:
- state-dependent functionality,
- state-independent functionality,
- the state API facade for backups,
- the juju CLI sub-command for backups.
* The state-independent functionality can be broken down further:
- a high-level backups interface/implementation,
- low-level backup/restore functionality,
- components of the backups machinery.
* Backups depend on the github.com/juju/utils/filestorage package.
* Backups have a special relationship with state (see note at
beginning of state/backups.go).
Backup Archives
-------------------
Each backup archive is a gzipped tar file (.tar.gz). It has the
following structure.
juju-backup/
metadata.json - the backup metadata for the archive.
root.tar - the bundle of state-related files (exluding mongo).
dump/ - all the files dumped from the DB (using mongodump).
At present we do not include any sort of manifest/index file in the
archive.
For more information, see:
- state/backups/db/dump.go - how the DB is dumped;
- state/backups/files/files.go - which files are included in root.tar.
File Layout
--------------------
The layering of the backups machinery and divisions of the state-
independent functionality map almost directly to the following
structure in the filesystem. The state API facade for backups is spread
between state/apiserver and state/api.
state/
backups.go - state-dependent functionality (basically the
interaction with mongo and with env storage)
backups/ - state-independent functionality
backups.go - high-level/public backups interface/implementation
create.go - low-level implementation of backing up juju state
restore.go - low-level implementation of restoring juju state
archive/ - an abstraction of a backups archive
db/ - all stuff related to external interaction with the
DB (internal interactions live in state/backups.go)
files/ - all stuff related to files we back up and restore
metadata/ - the backups metadata implementation
apiserver/
backups/ - the state API facade for backups
backups.go - facade implementation (not including methods
for end-points)
create.go - implementation of the Create() end-point
info.go - (wraps state/backups/backups.go:Backups.Get)
list.go
remove.go
restore.go - implementation of the Restore() end-point
api/
backups.go - the juju state API client implementation for the
backups facade
params/
backups.go - the backups-related API arg/result types
cmd/
juju/
backups.go - the juju CLI sub-command implementation
Note that upload/download aren't accommodated in apiserver/backups/ yet.
Layers of Abstraction
--------------------
As noted above, the backups machinery is divided in 4 layers and the
state-independent portion into 3 parts. Here is an example (using
"create") of how those layers interact.
* The juju CLI for backups wraps:
- the backups facade's Create() method.
* The state API facade wraps:
- the high-level backups implementation (state/backups/backups.go),
- the state-backups interactions (state/backups.go).
* the backups implementation wraps:
- a "filestorage" implementation (../utils/filestorage:FileStorage),
- the low-level "create" implementation,
- DB connection info (state/backups/db/info.go),
- the backup's metadata.
* the "create" implementation makes use of:
- the code in state/backups/{archive,db,files}.
Backups Interface
--------------------
Backups
Add(meta Metadata, archive io.ReadCloser) error
Create() (id string, err error)
Get(id string) (Metadata, io.ReadCloser, error)
List() ([]Metadata, error)
Remove(id string) error
Restore(id string) error
Note: Restore() makes use of Get().
State API Facade
--------------------
BackupsAPI
Create(BackupsCreateArgs) BackupsCreateResult
Info(BackupsInfoArgs) BackupsMetadataResult
List() BackupsMetadataListResult
Remove(BackupsRemoveArgs)
Restore(BackupsRestoreArgs)
Again note that upload and download are not yet included here.
CLI sub-command
--------------------
The juju CLI sub-command for backups is called "backups". Its own
sub-commands have basically a 1-to-1 equivalence with the API client
methods of the same respective names. The essential sub-commands are
exposed via the following options:
- juju backups [--create] [--quiet] [<notes>]
- juju backups --info <ID>
- juju backups --list [--brief]
- juju backups --remove <ID>
- juju backups --restore <ID>
Note: further options may be appropriate for later addition
(e.g. [<filename>] on --restore).
Other anticipated subcommands:
- juju backups --download <ID> [<filename>]
- juju backups --upload <filename>
Note that download and upload are only hypothetical, pending support in
the API facade. When we add the ability to restore from an archive
(rather than an ID), --download and --upload (or a --filename option on
restore) will become essential.