-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
safer int for binomial loglik #504
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the quick fix!
Co-authored-by: Dave Kleinschmidt <[email protected]>
Co-authored-by: Alex Arslan <[email protected]>
Codecov ReportBase: 87.32% // Head: 87.39% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #504 +/- ##
==========================================
+ Coverage 87.32% 87.39% +0.06%
==========================================
Files 7 7
Lines 947 952 +5
==========================================
+ Hits 827 832 +5
Misses 120 120
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
Co-authored-by: Alex Arslan <[email protected]>
Co-authored-by: Alex Arslan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Can you check performance not only with a microbenchmark like at #503 but also by calling loglikelihood
on a model with large data?
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Here's the benchmark: using BenchmarkTools
using GLM
using Random
n = 1_000_000
y = rand(MersenneTwister(42), [1, 2, 3], n) ./ 3
wts = fill(3, n)
m = glm(@formula(y ~ 1), (; y), Binomial(); wts)
@benchmark loglikelihood(m) samples=100 seconds=10 for current master: BenchmarkTools.Trial: 100 samples with 1 evaluation.
Range (min … max): 78.540 ms … 96.700 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 81.484 ms ┊ GC (median): 0.00%
Time (mean ± σ): 81.709 ms ± 2.325 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▃ ▂ ▁██
▃▁▃▁▃▄▄███▆█▄▄███▇▇▄▄▁▃▆▄▃▃▁▁▁▃▁▄▁▁▁▃▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▃ ▃
78.5 ms Histogram: frequency by time 89.7 ms <
Memory estimate: 16 bytes, allocs estimate: 1. for this branch: julia> @benchmark loglikelihood(m) samples=100 seconds=10
BenchmarkTools.Trial: 100 samples with 1 evaluation.
Range (min … max): 83.523 ms … 91.340 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 85.213 ms ┊ GC (median): 0.00%
Time (mean ± σ): 85.559 ms ± 1.121 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▁▄▇█▅ ▁ ▇▂
▃▁▁▃▅▁▁████████▅██▆▃▆▆███▁▃▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
83.5 ms Histogram: frequency by time 91.1 ms <
Memory estimate: 16 bytes, allocs estimate: 1. So 4-5ms extra on a million observations and generally within the variability of current master. This was tested on a different machine than #503, here's the new version info: Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × Intel(R) Xeon(R) E-2288G CPU @ 3.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
Threads: 1 on 16 virtual cores |
@@ -512,7 +525,7 @@ The loglikelihood of a fitted model is the sum of these values over all the obse | |||
function loglik_obs end | |||
|
|||
loglik_obs(::Bernoulli, y, μ, wt, ϕ) = wt*logpdf(Bernoulli(μ), y) | |||
loglik_obs(::Binomial, y, μ, wt, ϕ) = logpdf(Binomial(Int(wt), μ), Int(y*wt)) | |||
loglik_obs(::Binomial, y, μ, wt, ϕ) = logpdf(Binomial(Int(wt), μ), _safe_int(y*wt)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative would have been
loglik_obs(::Binomial, y, μ, wt, ϕ) = logpdf(Beta(y*wt + 1, (1 - y)*wt + 1), μ) - log(wt + 1)
It would throw when y*wt
isn't roughly an integer, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would throw when
y*wt
isn't roughly an integer, though.
That's what _safe_int
does though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also just leave out the conversion -- logpdf(::Binomial)
is defined for nearly integers as well, but returns -Inf
if you're not near enough, but this would essentially be an untrapped error. I figured it would be better to catch that higher in the call stack so that the error is more interpretable. We could even return more informative ones like "response is not a multiple of weight".
closes #503