
safer int for binomial loglik #504

Merged: 9 commits merged into master on Oct 22, 2022
Conversation

@palday (Member) commented Oct 21, 2022

closes #503
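
For context, the old Int(y*wt) conversion in loglik_obs (visible in the diff further down) throws whenever the floating-point product is only approximately integral. A hypothetical illustration of that failure mode (my own numbers, not necessarily the ones from #503):

y, wt = 0.07, 100      # e.g. a proportion of successes and its case weight

y * wt                 # 7.000000000000001, one ulp above the intended count
Int(y * wt)            # throws InexactError: Int64(7.000000000000001)
round(Int, y * wt)     # 7, hence rounding plus a tolerance check is the safer conversion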

@kleinschmidt (Member) left a comment:

LGTM, thanks for the quick fix!

(Review threads on test/runtests.jl and src/glmtools.jl resolved; suggested changes committed with co-authorship credited to Dave Kleinschmidt and Alex Arslan.)
@codecov-commenter commented Oct 22, 2022

Codecov Report

Base: 87.32% // Head: 87.39% // Increases project coverage by +0.06% 🎉

Coverage data is based on head (4fc8539) compared to base (1459737).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #504      +/-   ##
==========================================
+ Coverage   87.32%   87.39%   +0.06%     
==========================================
  Files           7        7              
  Lines         947      952       +5     
==========================================
+ Hits          827      832       +5     
  Misses        120      120              
Impacted Files    Coverage Δ
src/glmtools.jl   93.82% <100.00%> (+0.19%) ⬆️


palday and others added 2 commits on October 22, 2022 (both co-authored by Alex Arslan).
@nalimilan (Member) left a comment:

Makes sense. Can you check performance not only with a microbenchmark like the one in #503, but also by calling loglikelihood on a model with large data?

@palday (Member, Author) commented Oct 22, 2022

@nalimilan

Here's the benchmark:

using BenchmarkTools
using GLM
using Random

# one million observations; responses are proportions k/3 with constant weight 3,
# so y * wt should recover an integer success count
n = 1_000_000
y = rand(MersenneTwister(42), [1, 2, 3], n) ./ 3
wts = fill(3, n)

m = glm(@formula(y ~ 1), (; y), Binomial(); wts)

@benchmark loglikelihood(m) samples=100 seconds=10

for current master:

BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  78.540 ms … 96.700 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     81.484 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   81.709 ms ±  2.325 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▂▃  ▂  ▁██                                            
  ▃▁▃▁▃▄▄███▆█▄▄███▇▇▄▄▁▃▆▄▃▃▁▁▁▃▁▄▁▁▁▃▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▃ ▃
  78.5 ms         Histogram: frequency by time        89.7 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

for this branch:

julia> @benchmark loglikelihood(m) samples=100 seconds=10
BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  83.523 ms … 91.340 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     85.213 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   85.559 ms ±  1.121 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▁▁▄▇█▅ ▁       ▇▂                                     
  ▃▁▁▃▅▁▁████████▅██▆▃▆▆███▁▃▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
  83.5 ms         Histogram: frequency by time        91.1 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

So this branch adds roughly 4-5 ms on a million observations, which is generally within the run-to-run variability of current master.

This was tested on a different machine than #503; here's the new version info:

Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × Intel(R) Xeon(R) E-2288G CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 16 virtual cores

@palday palday merged commit 0c05716 into master Oct 22, 2022
@palday palday deleted the pa/binloglik branch October 22, 2022 16:34
@@ -512,7 +525,7 @@ The loglikelihood of a fitted model is the sum of these values over all the obse
function loglik_obs end

loglik_obs(::Bernoulli, y, μ, wt, ϕ) = wt*logpdf(Bernoulli(μ), y)
-loglik_obs(::Binomial, y, μ, wt, ϕ) = logpdf(Binomial(Int(wt), μ), Int(y*wt))
+loglik_obs(::Binomial, y, μ, wt, ϕ) = logpdf(Binomial(Int(wt), μ), _safe_int(y*wt))
A project member commented on this line:
An alternative would have been

loglik_obs(::Binomial, y, μ, wt, ϕ) = logpdf(Beta(y*wt + 1, (1 - y)*wt + 1), μ) - log(wt + 1)

It would throw when y*wt isn't roughly an integer, though.
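
As a side note, the identity behind that alternative (the Binomial(n, μ) log-pmf at k equals the Beta(k + 1, n - k + 1) log-density at μ minus log(n + 1)) is easy to check numerically; a quick sketch using Distributions with arbitrary example values:

using Distributions

n, k, μ = 10, 3, 0.37
logpdf(Binomial(n, μ), k) ≈ logpdf(Beta(k + 1, n - k + 1), μ) - log(n + 1)  # true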

Another project member replied:

> It would throw when y*wt isn't roughly an integer, though.

That's what _safe_int does though?

@palday (Member, Author) replied:

We could also just leave out the conversion: logpdf(::Binomial) is defined for near-integer values as well, but it returns -Inf if you're not near enough, which would essentially be an untrapped error. I figured it would be better to catch that higher in the call stack so that the error is more interpretable. We could even return more informative errors, like "response is not a multiple of weight".
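
For readers following along, a helper in the spirit of _safe_int might look like the sketch below (my own approximation of the idea, not the code merged in this PR): round to the nearest integer and throw only when the input isn't close to that integer.

# Hypothetical sketch, not GLM.jl's actual _safe_int:
# accept floats that are integral up to rounding noise, reject the rest.
function _approx_int(x::AbstractFloat)
    r = round(Int, x)
    isapprox(x, r) || throw(InexactError(:_approx_int, Int, x))
    return r
end
_approx_int(x::Integer) = Int(x)

_approx_int(7.000000000000001)   # 7
_approx_int(7.4)                 # throws InexactError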

Successfully merging this pull request may close these issues.

Int conversion in loglik_obs(::Binomial)