Feature request: Support non-integer dependent variable for Poisson regression #494
Closed
Description
Hi,
I know that the Poisson distribution is technically defined for integers. But in many cases, there are times that it could be a good approximation by treating non-integer variable as Poisson. Other statistical programs (e.g. R, Stata) seem to take it. I hope it is possible to do a similar approach in Julia. Below is the MWE (including a comparison with R code) and stacktrace:
using DataFrames, GLM, Random, Distributions
using RCall
# Generate random dataset with Float64 type dependent variable
df = DataFrame(y = rand(Uniform(0,10), 10),
x1 = randn(10),
x2 = randn(10))
# Poisson Regression in R
println("This is Poisson regression in R")
R"print(summary(glm(y ~ x1 + x2, family=poisson, data=$df)))"
# Poisson Regression in Julia
println("\n This is Poisson regression in Julia")
fit(GeneralizedLinearModel, @formula(y ~ x1 + x2), df, Poisson())
This is Poisson regression in R
Call:
glm(formula = y ~ x1 + x2, family = poisson, data = `#JL`$df)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.5851 -1.3313 0.2815 1.0060 1.5319
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.8127 0.1315 13.782 <2e-16 ***
x1 0.1248 0.1370 0.911 0.363
x2 -0.1083 0.1269 -0.853 0.394
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 22.002 on 9 degrees of freedom
Residual deviance: 20.690 on 7 degrees of freedom
AIC: Inf
Number of Fisher Scoring iterations: 5
┌ Warning: RCall.jl: Warning in dpois(y, mu, log = TRUE) : non-integer x = 9.620932
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 7.250401
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 1.245370
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 8.879959
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 4.531989
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 9.078459
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 9.979947
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 1.019928
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 7.847729
│ Warning in dpois(y, mu, log = TRUE) : non-integer x = 2.645772
└ @ RCall C:\Users\yangk\.julia\packages\RCall\6kphM\src\io.jl:172
This is Poisson regression in Julia
ERROR: ArgumentError: y must be in the support of D
Stacktrace:
[1] checky
@ C:\Users\yangk\.julia\packages\GLM\P0Ris\src\glmfit.jl:642 [inlined]
[2] GLM.GlmResp(y::Vector{Float64}, d::Poisson{Float64}, l::LogLink, η::Vector{Float64}, μ::Vector{Float64}, off::Vector{Float64}, wts::Vector{Float64})
@ GLM C:\Users\yangk\.julia\packages\GLM\P0Ris\src\glmfit.jl:36
[3] GLM.GlmResp(y::Vector{Float64}, d::Poisson{Float64}, l::LogLink, off::Vector{Float64}, wts::Vector{Float64})
@ GLM C:\Users\yangk\.julia\packages\GLM\P0Ris\src\glmfit.jl:61
[4] fit(::Type{GeneralizedLinearModel}, X::Matrix{Float64}, y::Vector{Float64}, d::Poisson{Float64}, l::LogLink; dofit::Bool, wts::Vector{Float64}, offset::Vector{Float64}, fitargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ GLM C:\Users\yangk\.julia\packages\GLM\P0Ris\src\glmfit.jl:488
[5] fit (repeats 2 times)
@ C:\Users\yangk\.julia\packages\GLM\P0Ris\src\glmfit.jl:484 [inlined]
[6] fit(::Type{GeneralizedLinearModel}, f::FormulaTerm{Term, Tuple{Term, Term}}, data::DataFrame, args::Poisson{Float64}; contrasts::Dict{Symbol, Any}, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ StatsModels C:\Users\yangk\.julia\packages\StatsModels\qY2YS\src\statsmodel.jl:88
[7] fit(::Type{GeneralizedLinearModel}, f::FormulaTerm{Term, Tuple{Term, Term}}, data::DataFrame, args::Poisson{Float64})
@ StatsModels C:\Users\yangk\.julia\packages\StatsModels\qY2YS\src\statsmodel.jl:82
[8] top-level scope
@ c:\julialang\Question_poisson_glm.txt:15
Metadata
Assignees
Labels
No labels