Hacking the juju-core/state package

This document remains a work in progress; it's an attempt to capture the various conventions and things to bear in mind that aren't necessarily written down anywhere else.

return values: ok vs err

By convention, anything that could reasonably fail must use a separate channel to communicate the failure. Broadly speaking, methods that can fail in more than one way must return an error; those that can only fail in one way (e.g. Machine.InstanceId: the value is either valid or missing) are expected to return a bool for consistency's sake, even if the type of the return value is such that failure could be signalled in-band.
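The convention above can be sketched roughly as follows. This is a minimal illustration, not the real state code: the `machine` type, its fields, `checkUsable`, and both error values are hypothetical stand-ins; only the `InstanceId` method name comes from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// machine is a hypothetical stand-in for a state entity.
type machine struct {
	instanceId string
	life       string
}

// InstanceId can only fail in one way (the value is missing), so it
// reports failure with a bool rather than using "" as an in-band signal.
func (m *machine) InstanceId() (string, bool) {
	return m.instanceId, m.instanceId != ""
}

// Hypothetical errors, used only for this illustration.
var (
	errNotAlive       = errors.New("machine is not alive")
	errNotProvisioned = errors.New("machine is not provisioned")
)

// checkUsable can fail in more than one way, so it returns an error
// that tells the caller which failure occurred.
func (m *machine) checkUsable() error {
	if m.life != "alive" {
		return errNotAlive
	}
	if _, ok := m.InstanceId(); !ok {
		return errNotProvisioned
	}
	return nil
}

func main() {
	m := &machine{life: "alive"}
	if id, ok := m.InstanceId(); ok {
		fmt.Println("instance id:", id)
	} else {
		fmt.Println("no instance id yet")
	}
	fmt.Println("usable check:", m.checkUsable())
}
```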

changes to entities

Entity objects reflect remote state that may change at any time, but we don't want to make that too obvious. By convention, the only methods that should change an entity's in-memory document are as follows:

  • Refresh(), which should update every field.
  • Methods that set fields remotely should update only those fields locally.

The upshot of this is that it's not appropriate to call Refresh on any argument to a state method, including receivers; if you're in a context in which that would be helpful, you should clone the entity first. This is simple enough that there are no Clone() methods, but it would be kinda nice to implement them in our Copious Free Time; I think there are places outside state that would also find them useful.
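A minimal sketch of cloning before refreshing, assuming an entity that wraps a flat document. `Unit`, `unitDoc`, the `load` hook, and `currentLife` are hypothetical names for this illustration; a plain struct copy is only safe here because the document holds no slices or maps.

```go
package main

import "fmt"

// unitDoc is a hypothetical in-memory copy of a remote document.
type unitDoc struct {
	Name string
	Life string
}

// Unit is a hypothetical entity; load stands in for a mongodb read.
type Unit struct {
	doc  unitDoc
	load func(name string) (unitDoc, error)
}

// Refresh replaces every field of the in-memory document.
func (u *Unit) Refresh() error {
	doc, err := u.load(u.doc.Name)
	if err != nil {
		return err
	}
	u.doc = doc
	return nil
}

// clone copies the entity so the copy can be refreshed freely.
// A shallow copy suffices only because unitDoc has no reference fields.
func (u *Unit) clone() *Unit {
	c := *u
	return &c
}

// currentLife needs fresh state, but must not change what the caller
// sees, so it refreshes a clone instead of its argument.
func currentLife(u *Unit) (string, error) {
	fresh := u.clone()
	if err := fresh.Refresh(); err != nil {
		return "", err
	}
	return fresh.doc.Life, nil
}

func main() {
	remote := unitDoc{Name: "wordpress/0", Life: "dying"}
	u := &Unit{
		doc:  unitDoc{Name: "wordpress/0", Life: "alive"},
		load: func(string) (unitDoc, error) { return remote, nil },
	}
	life, err := currentLife(u)
	fmt.Println("remote life:", life, "err:", err)
	fmt.Println("caller's view:", u.doc.Life)
}
```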

care and feeding of mgo/txn

Just about all our writes to mongodb are mediated by the mgo/txn package, and using this correctly demands some care. Not all the code has been written in a fully aware state, and cases in which existing practice is divergent from the advice given below should be regarded with some suspicion.

The txn package lets you make watchable changes to mongodb via lists of operations with the following critical properties:

  • transactions can apply to more than one document (this is rather the point of having them).
  • transactions will complete if every assert in the transaction passes; they will not run at all if any assert fails.
  • multi-document transactions are not atomic; the operations are applied in the order specified by the list.
  • operations, and hence assertions, can only be applied to documents with ids that are known at the time the operation list is built; this means that it takes extra work to specify a condition like "no unit document newer than X exists".

The second point above deserves further discussion. Whenever you are implementing a state change, you should consider the impact of the following possible mongodb states on your code:

  • if mongodb is already in a state consistent with the transaction having already been run -- which is always possible -- you should return immediately without error.
  • if mongodb is in a state that indicates the transaction is not valid -- e.g. trying to add a unit to a dying service -- you should return immediately with a descriptive error.

Each of the above situations should generally be checked as part of preparing a new []txn.Op, but in some cases it's convenient to trust (to begin with) that the entity's in-memory state reflects reality. Regardless, your job is to build a list of operations which assert that:

  • the transaction still needs to be applied.
  • the transaction is actively valid.
  • facts on which the transaction list's form depends remain true.

If you're really lucky you'll get to write a transaction in which the third requirement collapses to nothing, but that's not really the norm. In practice, you need to be prepared for the run to return txn.ErrAborted; if this happens, you need to check for previous success (and return nil) or for a known cause of invalidity (and return an error describing it).

If neither of these cases applies, you should assume it's an assertion failure of the third kind; in that case, you should build a new transaction based on more recent state and try again. If ErrAborteds just keep coming, give up; there's an ErrExcessiveContention that helps to describe the situation.

watching entities, and select groups thereof

The mgo/txn log enables very convenient notifications of changes to particular documents and groups thereof. The state/watcher package converts the txn event log into events and sends them down client-supplied channels for further processing; the state code itself implements watchers in terms of these events.

All the internally-relevant watching code is implemented in the file state/watcher.go. These constructs can be broadly divided into groups as follows:

  • single-document watchers: dead simple, they notify of every change to a given doc. SettingsWatcher bucks the convention of NotifyWatcher in that it reads whole *Settings values to send down the channel rather than just sending notifications (we probably shouldn't do that, but I don't think there's time to fix it right now).
  • entity group watchers: pretty simple, very common; they notify of every change to the Life field among a group of entities in the same collection, with group membership determined in a wide variety of ways.
  • relation watchers: of a similar nature to entity group watchers, but generating carefully-ordered events from observed changes to several different collections, none of which have Life fields.

Implementation of new watchers is not necessarily a simple task, unless it's a genuinely trivial refactoring (or, given lack of generics, copy-and-paste job). Unless you already know exactly what you're doing, please start a discussion about what you're trying to do before you embark on further work in this area.

transactions and reference counts

As described above, it's difficult to assert things like "no units of service X exist". It can be done, but not without additional work, and we're generally using reference counts to enable this sort of thing.

In general, reference counts should not be stored within entities: such changes are part of the operation of juju and are not important enough to be reflected as a constant stream of events. We didn't figure this out until recently, though, so relationDoc has a UnitCount, and serviceDoc has both a UnitCount and a RelationCount. This doesn't matter for the relation -- nothing watches relation docs directly -- but I fear it matters quite hard for the service, because every added or removed unit or relation will be an unnecessary event sent to every unit agent.

We'll deal with that when it's a problem rather than just a concern; but new reference counts should not be thoughtlessly added to entity documents, and opportunities to separate them from existing docs should be considered seriously.
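To illustrate the idea of keeping reference counts apart from entity documents, here is a small in-memory sketch. Everything in it is hypothetical (the `store` type, its two maps, and the methods); it only models how "no units of service X exist" becomes an assertion against a single document with a known id, without touching the watched service document on every unit change.

```go
package main

import (
	"errors"
	"fmt"
)

// errAborted is a local stand-in for a failed transaction assertion.
var errAborted = errors.New("transaction aborted")

// store is a hypothetical in-memory model of two collections.
type store struct {
	services map[string]bool // entity docs; agents watch these
	unitRefs map[string]int  // separate refcount docs; nothing watches these
}

// addUnit asserts the service exists, then bumps only the refcount doc.
func (s *store) addUnit(service string) error {
	if !s.services[service] {
		return errAborted
	}
	s.unitRefs[service]++
	return nil
}

// removeUnit asserts a reference exists, then drops it.
func (s *store) removeUnit(service string) error {
	if s.unitRefs[service] == 0 {
		return errAborted
	}
	s.unitRefs[service]--
	return nil
}

// removeService asserts "no units exist" by checking one refcount doc
// whose id is known up front, rather than scanning for unit documents.
func (s *store) removeService(service string) error {
	if !s.services[service] || s.unitRefs[service] != 0 {
		return errAborted
	}
	delete(s.services, service)
	delete(s.unitRefs, service)
	return nil
}

func main() {
	s := &store{
		services: map[string]bool{"wordpress": true},
		unitRefs: map[string]int{},
	}
	fmt.Println("add unit:", s.addUnit("wordpress"))
	fmt.Println("remove service while referenced:", s.removeService("wordpress"))
	fmt.Println("remove unit:", s.removeUnit("wordpress"))
	fmt.Println("remove service when unreferenced:", s.removeService("wordpress"))
}
```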

[TODO: write about globalKey and what it's good for... constraints, settings, statuses, etc; occasionally profitable use of fast prefix lookup on indexed fields.]