
Reimplement Nuisance to sample in an unconstrained space #1007

Open
sethaxen opened this issue Aug 9, 2018 · 3 comments


@sethaxen
Contributor

sethaxen commented Aug 9, 2018

Drawing on some of the tricks used in probabilistic programming packages, I'm proposing a reimplementation of IMP.isd.Nuisance that should enable better sampling. The beginnings of an implementation can be found here. How this changes things is detailed in the bullet points in the exchanges with @benmwebb below:

@sdaxen:

I've been reading about modern practices with Hamiltonian Monte Carlo (HMC), and one thing I've learned is that HMC typically has a hard time exploring the parameters of constrained variables near the bound of the constraint (Gibbs is even worse at this), which can result in oscillations between two highly biased regimes in the resulting samples, preventing an accurate sample from being drawn. The trick that's usually employed here is to transform the variables so that their support is the real line. The usual transformations are log for a variable with only an upper or lower bound and logit for a variable with both (with the necessary shifting factors, etc.). These transformations usually make exploration of the space much more efficient. Modifying all restraints to operate on log-scale parameters, etc., is really cumbersome.

My idea is to modify the implementation of Nuisance so that instead of storing the nuisance value and enforcing a constraint, it stores only the transformed nuisance (log, logit, etc.) and computes the untransformed nuisance upon request. Adding to the nuisance derivative would simply add to the derivative of the transformed nuisance multiplied by the Jacobian. Somehow the necessary Jacobian correction to the posterior will always be added to the score and derivative (once per transformed nuisance per evaluation). The bounds would be used to determine which transformation should be applied and to warn if the user manually sets the nuisance to a value outside of the constraints. An additional benefit is that all nuisances, regardless of scale/bounds, are on the same rough scale when transformed, so a nuisance mover can operate with some default parameters on the transformed nuisance value. The only change to how Nuisances behave is that the bounds will be exclusive instead of inclusive, but for parameters like Scale, we generally don't want them to ever be 0 anyway, and the fact that they can be 0 often requires us to catch a resulting inf/nan in our restraints. What are your thoughts on this?
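
As a concrete illustration of the transformations described above, here is a minimal, self-contained Python sketch (not IMP API; the helper names are hypothetical) of the log and logit transforms, their inverses, and the Jacobian dx/dy used to chain-rule derivatives:

```python
import math

# Hypothetical sketch of the proposed transformations (not IMP API).
# A lower-bounded nuisance x > lb is stored as y = log(x - lb);
# a doubly-bounded nuisance lb < x < ub is stored via a shifted logit.

def to_unconstrained(x, lb=None, ub=None):
    if lb is not None and ub is not None:
        return math.log((x - lb) / (ub - x))  # shifted/scaled logit
    if lb is not None:
        return math.log(x - lb)               # log for a lower bound
    if ub is not None:
        return math.log(ub - x)               # log for an upper bound
    return x

def from_unconstrained(y, lb=None, ub=None):
    if lb is not None and ub is not None:
        return lb + (ub - lb) / (1.0 + math.exp(-y))  # inverse logit
    if lb is not None:
        return lb + math.exp(y)
    if ub is not None:
        return ub - math.exp(y)
    return y

def jacobian(y, lb=None, ub=None):
    """dx/dy: chain-rules derivatives and gives the -log|dx/dy| score term."""
    if lb is not None and ub is not None:
        s = 1.0 / (1.0 + math.exp(-y))
        return (ub - lb) * s * (1.0 - s)
    if lb is not None:
        return math.exp(y)
    if ub is not None:
        return -math.exp(y)
    return 1.0
```

Note that the transformed value is unbounded in every case, so a mover with fixed step-size defaults behaves sensibly regardless of the original bounds.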

@benmwebb:

Seems reasonable, assuming it doesn't break existing code. Your suggestion to make a branch to play with this sounds ideal.

@sdaxen:

Cool. I've implemented most of it, and I suspect it will break some tests that check the result of scoring function evaluation or the derivative of a nuisance, but in a well-defined way that we can check. Any sample drawn from the posterior should be improved. There are a few other behaviors it changes (question below):

  • Bounds are now exclusive. If setting the nuisance outside of a bound, it sets the transformed nuisance such that the nuisance is as close to the bound as possible without being equal. This means some code that set nuisances to have tighter bounds than intended to avoid overflow is probably unnecessary.
    Related: Hard bounds should only be used where there is a physical/mathematical constraint (e.g. Scale, Switching). To avoid unrealistically high values, a properly tuned prior should be used instead. Hard bounds that are reachable during sampling can really break inference, and if they're unreachable, there's no need for a hard constraint (this is now documented). The crosslinking restraints in PMI are an example of where this is done and really should be updated accordingly.
  • Upper and lower bounds now can't be equal. This should go without saying, as the proper way to constrain a nuisance to a singular value is just to not optimize it. There are a couple tests where upper and lower bounds are carelessly set to be equal but the particle is set to be optimized.
  • NormalMover moves the transformed nuisance, not the nuisance. This is necessary to solve the biased sampling issue that's the whole point. Might be necessary to tweak a couple of the hardcoded nuisance moves in PMI so they behave similarly to a mover on the Nuisance.
  • Moving a Nuisance that is used for an upper or lower bound changes the value of the bounded nuisance, though its transformation is unchanged. I can't see any way to not do this without adding a decorator for NuisanceBound. I've not found anywhere in the IMP code where we bound a Nuisance with a Nuisance, so not certain how big of a deal this is.
  • The NuisanceScoreState is now used to update the derivative of the transformed nuisance with the derivative of the Jacobian-based modification of the prior due to change of variables. This happens after evaluation. I chose to do it here instead of adding it in the prior restraint because we want to maintain the default uniform prior on the untransformed nuisance if the user doesn't specify a prior.
  • Upon evaluation, the score should have the Jacobian-based modifications of all Nuisances added to it.
    The last point is the one thing that's missing (besides a couple Jacobian tests) before I go on hunts for failed tests. My idea is that it should be handled by the ScoringFunction base class upon evaluation. So an arbitrary ScoringFunction should have access to a list of all particles that have been decorated as nuisances in the model. Is that information accessible from the attribute table or would we need a new entry? Or is there a better way to do this? I'm trying to make it user-proof so that the user never needs to worry about setting up the transformation and its corrections properly and can just trust that the samples drawn using the scoring function are equivalent to draws from the untransformed scoring function that they defined.
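
To make the Jacobian bookkeeping in the bullets above concrete, here is a minimal sketch (plain Python with hypothetical names, not IMP API) of the score and derivative corrections for a single lower-bounded nuisance stored as y = log(x):

```python
import math

# For a nuisance x > 0 stored as y = log(x): the density sampled in y is
# p(x(y)) * |dx/dy|, so the score (-log posterior) picks up -log|dx/dy|,
# and the gradient in y is the chain rule plus the derivative of that term.

def corrected_score_and_grad(score_x, dscore_dx, y):
    """score_x, dscore_dx: score and its derivative w.r.t. the
    untransformed nuisance x; y: the stored transformed value."""
    x = math.exp(y)                  # untransformed nuisance
    score_y = score_x - y            # -log|dx/dy| = -log(exp(y)) = -y
    dscore_dy = dscore_dx * x - 1.0  # chain rule + Jacobian-term derivative
    return score_y, dscore_dy
```

In the proposed design the chain-rule part would happen when adding to the derivative, and the -log|dx/dy| part once per nuisance per evaluation, matching the bullets above.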

@benmwebb:

I don't think it's a good idea to have the scoring function examine all particles in the model looking for Nuisances (if I understand you correctly). If nothing else, this will end up considering Nuisances that aren't even used in the scoring function. The simplest way would be to add a suitable restraint to the scoring function that gets given a list of all the Nuisances and then calculates the necessary correction. A more involved but "automatic" way might be to write a custom ScoringFunction subclass that goes through all of its restraints, calls get_particles() on each one, tries to cast to Nuisance and then calculates the correction.
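
A rough sketch of that "automatic" subclass idea, in plain Python with stand-in classes (none of these names are real IMP API):

```python
import math

# Stand-in classes mimicking the shape of the discussion; hypothetical,
# not IMP's actual class hierarchy.

class Nuisance:
    """A transformed nuisance carrying its Jacobian dx/dy."""
    def __init__(self, y, dx_dy):
        self.y, self.dx_dy = y, dx_dy

class Restraint:
    def __init__(self, particles, score):
        self._particles, self._score = particles, score
    def get_particles(self):
        return self._particles
    def evaluate(self):
        return self._score

class JacobianScoringFunction:
    """Sums restraint scores, then walks each restraint's particles,
    "casts" to Nuisance, and adds the -log|dx/dy| correction once per
    distinct nuisance."""
    def __init__(self, restraints):
        self.restraints = restraints
    def evaluate(self):
        score = sum(r.evaluate() for r in self.restraints)
        seen = set()
        for r in self.restraints:
            for p in r.get_particles():
                if isinstance(p, Nuisance) and id(p) not in seen:
                    seen.add(id(p))
                    score -= math.log(abs(p.dx_dy))
        return score
```

The `seen` set matters: a nuisance shared by several restraints must be corrected exactly once per evaluation.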

@sethaxen
Contributor Author

sethaxen commented Aug 9, 2018

@benmwebb:

I don't think it's a good idea to have the scoring function examine all particles in the model looking for Nuisances (if I understand you correctly). ... The simplest way would be to add a suitable restraint to the scoring function that gets given a list of all the Nuisances and then calculates the necessary correction. A more involved but "automatic" way might be to write a custom ScoringFunction subclass that goes through all of its restraints, calls get_particles() on each one, tries to cast to Nuisance and then calculates the correction.

I was more thinking that the attribute table would contain a special list nuisances_ with ParticleIndexes that have been decorated with Nuisance. It could be populated by IMP.isd.Nuisance.setup_particle. Then ScoringFunction would just iterate through this list, and for each Nuisance that is optimized, add the necessary correction to the score.

I don't think it's sufficient to have a subclass of ScoringFunction do this, because then it's not user-proof. If the user didn't use the subclass, then their samples from the posterior are guaranteed to be wrong, because the posterior is using the wrong measure. Likewise, if the user has to add a special restraint to ScoringFunction to get this behavior, there will likely be cases where that doesn't happen. Or perhaps I'm misunderstanding how ScoringFunction works; my understanding is that it produces the score (the -log posterior) by computing a weighted sum of the scores of the individual restraints. I suppose a compromise here is a special user-hidden Restraint that ScoringFunction automatically adds to the list of provided restraints. The caveat is that (I think) this should be unaffected by temperature in MC/MD. Infinite temperature reduces all priors in the untransformed space to uniform, which means that the scoring function would just consist of the -log Jacobian term, unmodified by the temperature. So maybe that means this should be something entirely separate from the scoring function that these optimizers make special use of.

@benmwebb:

If nothing else, this will end up considering Nuisances that aren't even used in the scoring function.

That's actually the necessary behavior. If a Nuisance is being optimized using a scoring function in which it never appears, this is equivalent to including a uniform restraint on the untransformed parameter. But if the transformed parameter is being optimized, the corresponding restraint would not be uniform due to the Jacobian correction. i.e. to maintain the default behavior that an unrestrained untransformed parameter is uniform, the Jacobian correction to the score is needed even if the parameter is not restrained with a prior.
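
This point is easy to verify numerically (plain Python, no IMP): if x is uniform and y = log(x), the implied density of y is proportional to e^y, so leaving y "unrestrained" would not reproduce a uniform x.

```python
import math
import random

# Sample x ~ Uniform(0, 1) and look at y = log(x). If y were instead
# sampled with a flat density, x = exp(y) would pile up near 0; the
# factor e^y (the Jacobian) is exactly what keeps x uniform.
random.seed(0)
ys = [math.log(random.random()) for _ in range(200_000)]

# P(y > -1) should match P(x > e^-1) = 1 - 1/e ~ 0.632, far from the
# 50/50 split a symmetric "uniform-ish" density around a point would give.
frac = sum(1 for y in ys if y > -1.0) / len(ys)
```

So even a nuisance that appears in no restraint needs its -log|dx/dy| term once it is sampled in the transformed space.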

@sethaxen
Contributor Author

sethaxen commented Aug 12, 2018

Here's a better alternative: Implement a child of Nuisance called TransformedNuisance. It should be usable everywhere where Nuisance is used, but the value of the nuisance is constrained to be the inverse transform of TransformedNuisance and is therefore only guaranteed to be correct after Model.evaluate(). A single score state will handle this and the Jacobian adjustment of derivatives for all TransformedNuisances. Switching to the transformed nuisance should be tear-free.

The only complicated thing here is what I mentioned in the last post. The log Jacobian score adjustment should be invariant to re-weighting. Right now optimizers like IMP.core.MonteCarlo don't allow for any changes to the score after scoring function evaluation but before acceptance/rejection. A new class IMP.core.JacobianAdjuster will be responsible for returning the adjustment to the score needed for transformed variables (nuisances and any others in the future), and optimizers will need to explicitly handle this.
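
A sketch of how an optimizer might consume such an adjuster in its acceptance step (hypothetical interface; IMP.core.JacobianAdjuster is the proposed class, but this is not its real signature):

```python
import math
import random

def mc_accept(old_score, new_score, old_adjustment, new_adjustment,
              temperature):
    """Metropolis criterion in which only the restraint scores are
    tempered; the Jacobian adjustment enters untempered, so the implied
    prior in the untransformed space stays temperature-independent."""
    delta = (new_score - old_score) / temperature \
        + (new_adjustment - old_adjustment)
    return delta <= 0.0 or random.random() < math.exp(-delta)
```

This is the "explicitly handle it" option: the optimizer queries the adjuster after each evaluation, rather than the adjustment being baked into the scoring function and wrongly divided by the temperature.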

sethaxen added a commit that referenced this issue Aug 23, 2018
This is primarily used when an unintuitive transformation of a
constrained parameter exists in an unconstrained space. To sample in the
unconstrained space but represent the parameter in the constrained space
requires an adjustment to the score and gradient, handled by
JacobianScoreAdjuster.

Relates #1007
sethaxen added a commit that referenced this issue Aug 23, 2018
@sethaxen
Contributor Author

All the pieces are there for uni- and multi-variate adjustment of score and gradients. Without modifying IMP.ScoringFunction, the last piece is for IMP.core.JacobianAdjuster.get_score_adjustment() to be called everywhere in an IMP.Optimizer where an IMP.ScoringFunction is evaluated.

sethaxen added a commit that referenced this issue Sep 16, 2018
Because tempering divides the score and its gradient by the temperature, the
Jacobian adjustment must be multiplied by the temperature so that it is
unaffected by tempering.

Relates #1007
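
In other words (hypothetical helper, not the actual commit's code): if the optimizer divides the whole score by the temperature, pre-multiplying the adjustment by the temperature cancels that division.

```python
def total_tempered_score(restraint_score, jacobian_adjustment, temperature):
    # (restraint + T * adjustment) / T == restraint / T + adjustment,
    # so the Jacobian term is unaffected by tempering.
    return (restraint_score
            + temperature * jacobian_adjustment) / temperature
```

This keeps the invariance argued earlier in the thread without requiring optimizers to special-case the adjustment at acceptance time.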