
safer int for binomial loglik #504

Merged: 9 commits merged into master on Oct 22, 2022
Conversation

@palday (Member) commented Oct 21, 2022

closes #503
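
For context, the old Int(y*wt) conversion in loglik_obs (visible in the diff further down) throws whenever the floating-point product is only approximately integral. A hypothetical illustration of that failure mode (my own numbers, not necessarily the ones from #503):

y, wt = 0.07, 100      # e.g. a proportion of successes and its case weight

y * wt                 # 7.000000000000001, one ulp above the intended count
Int(y * wt)            # throws InexactError: Int64(7.000000000000001)
round(Int, y * wt)     # 7, hence rounding plus a tolerance check is the safer conversion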

@kleinschmidt (Member) left a comment:

LGTM, thanks for the quick fix!

(Review threads on test/runtests.jl and src/glmtools.jl resolved; suggested changes committed with co-authorship credited to Dave Kleinschmidt and Alex Arslan.)
@codecov-commenter commented Oct 22, 2022

Codecov Report

Base: 87.32% // Head: 87.39% // Increases project coverage by +0.06% 🎉

Coverage data is based on head (4fc8539) compared to base (1459737).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #504      +/-   ##
==========================================
+ Coverage   87.32%   87.39%   +0.06%     
==========================================
  Files           7        7              
  Lines         947      952       +5     
==========================================
+ Hits          827      832       +5     
  Misses        120      120              
Impacted Files    Coverage Δ
src/glmtools.jl   93.82% <100.00%> (+0.19%) ⬆️


palday and others added 2 commits on October 22, 2022 (both co-authored by Alex Arslan).
@nalimilan (Member) left a comment:

Makes sense. Can you check performance not only with a microbenchmark like the one in #503, but also by calling loglikelihood on a model with large data?

@palday (Member, Author) commented Oct 22, 2022

@nalimilan

Here's the benchmark:

using BenchmarkTools
using GLM
using Random

# one million observations; responses are proportions k/3 with constant weight 3,
# so y * wt should recover an integer success count
n = 1_000_000
y = rand(MersenneTwister(42), [1, 2, 3], n) ./ 3
wts = fill(3, n)

m = glm(@formula(y ~ 1), (; y), Binomial(); wts)

@benchmark loglikelihood(m) samples=100 seconds=10

for current master:

BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  78.540 ms … 96.700 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     81.484 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   81.709 ms ±  2.325 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▂▃  ▂  ▁██                                            
  ▃▁▃▁▃▄▄███▆█▄▄███▇▇▄▄▁▃▆▄▃▃▁▁▁▃▁▄▁▁▁▃▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▃ ▃
  78.5 ms         Histogram: frequency by time        89.7 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

for this branch:

julia> @benchmark loglikelihood(m) samples=100 seconds=10
BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  83.523 ms … 91.340 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     85.213 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   85.559 ms ±  1.121 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▁▁▄▇█▅ ▁       ▇▂                                     
  ▃▁▁▃▅▁▁████████▅██▆▃▆▆███▁▃▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
  83.5 ms         Histogram: frequency by time        91.1 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

So this branch adds roughly 4-5 ms on a million observations, which is generally within the run-to-run variability of current master.

This was tested on a different machine than #503; here's the new version info:

Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × Intel(R) Xeon(R) E-2288G CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 16 virtual cores

@palday palday merged commit 0c05716 into master Oct 22, 2022
@palday palday deleted the pa/binloglik branch October 22, 2022 16:34
@@ -512,7 +525,7 @@ The loglikelihood of a fitted model is the sum of these values over all the obse
function loglik_obs end

loglik_obs(::Bernoulli, y, μ, wt, ϕ) = wt*logpdf(Bernoulli(μ), y)
-loglik_obs(::Binomial, y, μ, wt, ϕ) = logpdf(Binomial(Int(wt), μ), Int(y*wt))
+loglik_obs(::Binomial, y, μ, wt, ϕ) = logpdf(Binomial(Int(wt), μ), _safe_int(y*wt))
A project member commented on this line:
An alternative would have been

loglik_obs(::Binomial, y, μ, wt, ϕ) = logpdf(Beta(y*wt + 1, (1 - y)*wt + 1), μ) - log(wt + 1)

It would throw when y*wt isn't roughly an integer, though.
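
As a side note, the identity behind that alternative (the Binomial(n, μ) log-pmf at k equals the Beta(k + 1, n - k + 1) log-density at μ minus log(n + 1)) is easy to check numerically; a quick sketch using Distributions with arbitrary example values:

using Distributions

n, k, μ = 10, 3, 0.37
logpdf(Binomial(n, μ), k) ≈ logpdf(Beta(k + 1, n - k + 1), μ) - log(n + 1)  # true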

Another project member replied:

> It would throw when y*wt isn't roughly an integer, though.

That's what _safe_int does though?

@palday (Member, Author) replied:

We could also just leave out the conversion: logpdf(::Binomial) is defined for near-integer values as well, but it returns -Inf if you're not near enough, which would essentially be an untrapped error. I figured it would be better to catch that higher in the call stack so that the error is more interpretable. We could even return more informative errors, like "response is not a multiple of weight".
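
For readers following along, a helper in the spirit of _safe_int might look like the sketch below (my own approximation of the idea, not the code merged in this PR): round to the nearest integer and throw only when the input isn't close to that integer.

# Hypothetical sketch, not GLM.jl's actual _safe_int:
# accept floats that are integral up to rounding noise, reject the rest.
function _approx_int(x::AbstractFloat)
    r = round(Int, x)
    isapprox(x, r) || throw(InexactError(:_approx_int, Int, x))
    return r
end
_approx_int(x::Integer) = Int(x)

_approx_int(7.000000000000001)   # 7
_approx_int(7.4)                 # throws InexactError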

Successfully merging this pull request may close these issues.

Int conversion in loglik_obs(::Binomial)