Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding geometric distribution for GLM.jl #480

Merged
merged 11 commits into from
May 20, 2022
Merged

Conversation

ayushpatnaikgit
Copy link
Contributor

The following is the summary of the changes in the following source files:

  1. In docs/src/index.md:
  • Added Geometric to the list of distribution in family, under the “Fitting GLM models” section
  1. In src/GLM.jl:
  • Exported Geometric
  1. In src/glmtools.jl:
  • Defined LogLink as canonical link function for Geometric distribution
  • Defined glmvar function for Geometric distribution
  • Defined mustart function for Geometric distribution
  • Defined devresid function for Geometric distribution
  • Defined loglik_obs function for Geometric distribution
  • Added an example glmvar in docstring
  • Added an example mustart in docstring
  1. In test/runtests.jl:
  • Added a test case of Geometric distribution with LogLink
  • Added a test case of Geometric distribution with InverseLink

Mousum added 4 commits April 15, 2022 16:37
…t, etc. The probability that a batsman can make a successful hit on the ball before he gets out by the bowler, is calculated efficiently using the geometric probability distribution function. Now each player will have their own distribution depending on their batting ability. So, the probability of successfully playing a ball can be modelled as a function of several covariates of the player. Note that, Geometric distribution is a special case Negative Binomial distribution when the number of failures is 1.
…ace `approx` function by `≈` in test cases.
Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Have you checked the results against another implementation?

mousum-github and others added 4 commits April 25, 2022 10:06
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Milan Bouchet-Valat <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Apr 25, 2022

Codecov Report

Merging #480 (633ed64) into master (42a0d04) will increase coverage by 0.16%.
The diff coverage is 94.11%.

@@            Coverage Diff             @@
##           master     #480      +/-   ##
==========================================
+ Coverage   85.12%   85.28%   +0.16%     
==========================================
  Files           7        7              
  Lines         827      836       +9     
==========================================
+ Hits          704      713       +9     
  Misses        123      123              
Impacted Files Coverage Δ
src/GLM.jl 50.00% <ø> (ø)
src/glmtools.jl 83.80% <94.11%> (+1.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 42a0d04...633ed64. Read the comment docs.

@nalimilan nalimilan requested review from palday and dmbates April 25, 2022 07:15
src/glmtools.jl Outdated
@@ -419,6 +431,10 @@ function devresid(::Binomial, y, μ::Real)
end
end
devresid(::Gamma, y, μ::Real) = -2 * (log(y / μ) - (y - μ) / μ)
function devresid(::Geometric, y, μ::Real)
v = 2 * (xlogy(y, y / μ) + xlogy(y + 1, (μ + 1) / (y + 1)))
return μ == 0 ? oftype(v, NaN) : v
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably move this check above to shortcircuit faster when mu==0.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - I will update the code similar to the following:
μ == 0 ? NaN : 2 * (xlogy(y, y / μ) + xlogy(y + 1, (μ + 1) / (y + 1)))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

   μ == 0 && return oftype(μ, NaN)
   return 2 * (xlogy(y, y / μ) + xlogy(y + 1, (μ + 1) / (y + 1)))

is probably how I would do it, but it's really a matter of personal style and what @nalimilan thinks 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it sounds better to adapt the NaN type to that of the inputs. But I'm not sure whether it should be the type of μ, that of y, or promote_type(typeof(μ), typeof(y))...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

promote_type(typeof(μ), typeof(y)) seems the way to go. Given that we have an expression with y/mu and xlogy never returns a narrower type than the inputs, we can expect the result to be at least as wide as y/mu.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually promote_type(typeof(μ), typeof(y)) will give a failure if both types are Integer as conversion of NaN won't work. So we need to do float(promote_type(typeof(μ), typeof(y))).

Copy link
Member

@palday palday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to document where the numbers in the "Geometric LogLink" test set come from, but otherwise looks good to me.

@nalimilan
Copy link
Member

It would be good to document where the numbers in the "Geometric LogLink" test set come from

I don't think we note this in the code usually, I think the implicit assumption is that values have been checked against R, either directly or indirectly (here, by checking against NegativeBinomial which was previously checked against R).

@ViralBShah
Copy link
Contributor

Bumping this - what remains to merge?

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, we kind of forgot about this. We just need to fix the discussion about the devresid return type.

function devresid(::Geometric, y, μ::Real)
μ == 0 && return oftype(μ, NaN)
return 2 * (xlogy(y, y / μ) - xlogy(y + 1, (y + 1) / (μ + 1)))
end
devresid(::InverseGaussian, y, μ::Real) = abs2(y - μ) / (y * abs2(μ))
function devresid(d::NegativeBinomial, y, μ::Real)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you adapt this method to use the same pattern for consistency?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed oftype(η, NaN) to convert(float(typeof(η)), NaN) wherever possible in inverselink functions since GLM.inverselink(GLM.IdentityLink(), 0) is throwing an error.

Also, the devresid function for Negative Binomial is defined as how it is defined for Geometric.

mousum-github and others added 2 commits May 20, 2022 13:00
Thanks for your suggestion. The suggestion has been tested and committed.

Co-authored-by: Milan Bouchet-Valat <[email protected]>
… possible since `GLM.inverselink(GLM.IdentityLink(), 0)` is throwing an error.

linkfun(::InverseLink, μ::Real) = inv(μ)
linkinv(::InverseLink, η::Real) = inv(η)
mueta(::InverseLink, η::Real) = -inv(abs2(η))
function inverselink(::InverseLink, η::Real)
μ = inv(η)
return μ, -abs2(μ), oftype(μ, NaN)
return μ, -abs2(μ), convert(float(typeof(μ)), NaN)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are likely not needed as μ = inv(η) will (almost?) always be a float. Though I guess that doesn't hurt much.

@nalimilan nalimilan requested a review from palday May 20, 2022 08:02
@nalimilan nalimilan merged commit 3a3a27c into JuliaStats:master May 20, 2022
@nalimilan
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants