-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixest: sparse matrix support #18
Comments
Probably worth waiting for lrberge/fixest#418 to land before pushing, but if you install locally from this branch, you can use |
Looking good so far 👀 library(fixest)
library(Matrix)
library(summclust)
# library(devtools)
# load_all()
nlswork <- haven::read_dta("http://www.stata-press.com/data/r9/nlswork.dta")
# drop NAs at the moment
nlswork <- nlswork[, c("ln_wage", "grade", "age", "birth_yr",
"union", "race", "msp", "ind_code")]
nlswork <- na.omit(nlswork)
# 100 groups
nlswork$group <- (seq_len(nrow(nlswork)) - 1) %% 100 + 1
lm_fit <- lm(
ln_wage ~ 0 + factor(union) + factor(msp) + factor(race),
data = nlswork
)
feols_fit <- feols(
ln_wage ~ 0 + i(union) + i(msp) + i(race),
data = nlswork
)
obj = feols_fit
vcovCR3_feols = vcov_CR3J(feols_fit, ~ group, type = "CRV3")
sqrt(diag(vcovCR3_feols))
#> union::0 union::1 msp::1 race::2 race::3
#> 0.006237060 0.009494587 0.007800561 0.007650714 0.032370445
vcovCR3_lm = vcov_CR3J(lm_fit, ~ group, type = "CRV3")
sqrt(diag(vcovCR3_lm))[c("factor(union)0", "factor(union)1", "factor(msp)1", "factor(race)2", "factor(race)3")]
#> factor(union)0 factor(union)1 factor(msp)1 factor(race)2 factor(race)3
#> 0.006237060 0.009494587 0.007800561 0.007650714 0.032370445
X = fixest::sparse_model_matrix(feols_fit, type = c("rhs", "fixef"))
y = fixest::sparse_model_matrix(feols_fit, type = "lhs")
cluster_vec = nlswork$group
tictoc::tic("Sparse")
res = summclust:::cluster_jackknife_sparse(y, X, cluster_vec, type = "CRV3")
tictoc::toc()
#> Sparse: 0.331 sec elapsed
tictoc::tic("Dense")
res = summclust:::cluster_jackknife(as.matrix(y), as.matrix(X), data.frame(group = cluster_vec), type = "CRV3")
tictoc::toc()
#> Dense: 2.521 sec elapsed Created on 2023-06-09 with reprex v2.0.2 |
Oh, wow, awesome, that's indeed quite a nice speed up! |
One thing I learned: using RcppArmadillo's pinv is twice as fast as library(microbenchmark)
N <- 1000
X <- as.factor(sample(1:1000, 100000, TRUE))
df <- data.frame(X = X)
mf <- model.frame(~X, df)
mm <- sparse.model.matrix(mf)
X <- crossprod(mm)
dim(X)
X <- as.matrix(X)
microbenchmark(
MASS::ginv(X),
pinv(X),
times = 1
)
# expr min lq mean median uq max neval
# MASS::ginv(X) 3.111503 3.111503 3.111503 3.111503 3.111503 3.111503 1
# pinv(X) 1.759175 1.759175 1.759175 1.759175 1.759175 1.759175 1
all.equal(pinv(X), MASS::ginv(X))
# TRUE |
For very high-dimensional fixed effects problems, using sparse matrices should make the MNW jackknife estimator computationally feasible (based on a suggestion by @kylebutts).
The text was updated successfully, but these errors were encountered: