-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding geometric distribution for GLM.jl #480
Conversation
…t, etc. The probability that a batsman can make a successful hit on the ball before he gets out by the bowler, is calculated efficiently using the geometric probability distribution function. Now each player will have their own distribution depending on their batting ability. So, the probability of successfully playing a ball can be modelled as a function of several covariates of the player. Note that, Geometric distribution is a special case Negative Binomial distribution when the number of failures is 1.
…ace `approx` function by `≈` in test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Have you checked the results against another implementation?
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #480 +/- ##
==========================================
+ Coverage 85.12% 85.28% +0.16%
==========================================
Files 7 7
Lines 827 836 +9
==========================================
+ Hits 704 713 +9
Misses 123 123
Continue to review full report at Codecov.
|
src/glmtools.jl
Outdated
@@ -419,6 +431,10 @@ function devresid(::Binomial, y, μ::Real) | |||
end | |||
end | |||
devresid(::Gamma, y, μ::Real) = -2 * (log(y / μ) - (y - μ) / μ) | |||
function devresid(::Geometric, y, μ::Real) | |||
v = 2 * (xlogy(y, y / μ) + xlogy(y + 1, (μ + 1) / (y + 1))) | |||
return μ == 0 ? oftype(v, NaN) : v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably move this check above to shortcircuit faster when mu==0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right - I will update the code similar to the following:
μ == 0 ? NaN : 2 * (xlogy(y, y / μ) + xlogy(y + 1, (μ + 1) / (y + 1)))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
μ == 0 && return oftype(μ, NaN)
return 2 * (xlogy(y, y / μ) + xlogy(y + 1, (μ + 1) / (y + 1)))
is probably how I would do it, but it's really a matter of personal style and what @nalimilan thinks 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it sounds better to adapt the NaN
type to that of the inputs. But I'm not sure whether it should be the type of μ
, that of y
, or promote_type(typeof(μ), typeof(y))
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
promote_type(typeof(μ), typeof(y))
seems the way to go. Given that we have an expression with y/mu
and xlogy
never returns a narrower type than the inputs, we can expect the result to be at least as wide as y/mu
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually promote_type(typeof(μ), typeof(y))
will give a failure if both types are Integer
as conversion of NaN
won't work. So we need to do float(promote_type(typeof(μ), typeof(y)))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to document where the numbers in the "Geometric LogLink" test set come from, but otherwise looks good to me.
I don't think we note this in the code usually, I think the implicit assumption is that values have been checked against R, either directly or indirectly (here, by checking against |
Bumping this - what remains to merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, we kind of forgot about this. We just need to fix the discussion about the devresid
return type.
function devresid(::Geometric, y, μ::Real) | ||
μ == 0 && return oftype(μ, NaN) | ||
return 2 * (xlogy(y, y / μ) - xlogy(y + 1, (y + 1) / (μ + 1))) | ||
end | ||
devresid(::InverseGaussian, y, μ::Real) = abs2(y - μ) / (y * abs2(μ)) | ||
function devresid(d::NegativeBinomial, y, μ::Real) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you adapt this method to use the same pattern for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed oftype(η, NaN)
to convert(float(typeof(η)), NaN)
wherever possible in inverselink
functions since GLM.inverselink(GLM.IdentityLink(), 0)
is throwing an error.
Also, the devresid
function for Negative Binomial is defined as how it is defined for Geometric.
Thanks for your suggestion. The suggestion has been tested and committed. Co-authored-by: Milan Bouchet-Valat <[email protected]>
… possible since `GLM.inverselink(GLM.IdentityLink(), 0)` is throwing an error.
|
||
linkfun(::InverseLink, μ::Real) = inv(μ) | ||
linkinv(::InverseLink, η::Real) = inv(η) | ||
mueta(::InverseLink, η::Real) = -inv(abs2(η)) | ||
function inverselink(::InverseLink, η::Real) | ||
μ = inv(η) | ||
return μ, -abs2(μ), oftype(μ, NaN) | ||
return μ, -abs2(μ), convert(float(typeof(μ)), NaN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are likely not needed as μ = inv(η)
will (almost?) always be a float. Though I guess that doesn't hurt much.
Thanks! |
The following is the summary of the changes in the following source files:
LogLink
as canonical link function forGeometric
distributionglmvar
function forGeometric
distributionmustart
function forGeometric
distributiondevresid
function forGeometric
distributionloglik_obs
function forGeometric
distributionglmvar
in docstringmustart
in docstringGeometric
distribution withLogLink
Geometric
distribution withInverseLink