Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixest: sparse matrix support #18

Open
s3alfisc opened this issue Jun 8, 2023 · 4 comments · May be fixed by #16
Open

fixest: sparse matrix support #18

s3alfisc opened this issue Jun 8, 2023 · 4 comments · May be fixed by #16

Comments

@s3alfisc
Copy link
Owner

s3alfisc commented Jun 8, 2023

For very high-dimensional fixed effects problems, using sparse matrices should make the MNW jackknife estimator computationally feasible (based on a suggestion by @kylebutts).

@s3alfisc s3alfisc linked a pull request Jun 8, 2023 that will close this issue
@kylebutts
Copy link
Contributor

Probably worth waiting for lrberge/fixest#418 to land before pushing, but if you install locally from this branch, you can use fixest::sparse_model_matrix in dev. I'm working on an implementation and will send you

@kylebutts
Copy link
Contributor

Looking good so far 👀

library(fixest)
library(Matrix)
library(summclust)
# library(devtools)
# load_all()

nlswork <- haven::read_dta("http://www.stata-press.com/data/r9/nlswork.dta")

# drop NAs at the moment
nlswork <- nlswork[, c("ln_wage", "grade", "age", "birth_yr",
                        "union", "race", "msp", "ind_code")]
nlswork <- na.omit(nlswork)

# 100 groups
nlswork$group <- (seq_len(nrow(nlswork)) - 1) %% 100 + 1

lm_fit <- lm(
  ln_wage ~ 0 + factor(union) + factor(msp) + factor(race),
  data = nlswork
)
feols_fit <- feols(
  ln_wage ~ 0 + i(union) + i(msp) + i(race),
  data = nlswork
)
obj = feols_fit

vcovCR3_feols = vcov_CR3J(feols_fit, ~ group, type = "CRV3")
sqrt(diag(vcovCR3_feols))
#>    union::0    union::1      msp::1     race::2     race::3 
#> 0.006237060 0.009494587 0.007800561 0.007650714 0.032370445

vcovCR3_lm    = vcov_CR3J(lm_fit, ~ group, type = "CRV3")
sqrt(diag(vcovCR3_lm))[c("factor(union)0", "factor(union)1", "factor(msp)1", "factor(race)2", "factor(race)3")]
#> factor(union)0 factor(union)1   factor(msp)1  factor(race)2  factor(race)3 
#>    0.006237060    0.009494587    0.007800561    0.007650714    0.032370445

X = fixest::sparse_model_matrix(feols_fit, type = c("rhs", "fixef"))
y = fixest::sparse_model_matrix(feols_fit, type = "lhs")
cluster_vec = nlswork$group

tictoc::tic("Sparse")
res = summclust:::cluster_jackknife_sparse(y, X, cluster_vec, type = "CRV3")
tictoc::toc()
#> Sparse: 0.331 sec elapsed
tictoc::tic("Dense")
res = summclust:::cluster_jackknife(as.matrix(y), as.matrix(X), data.frame(group = cluster_vec), type = "CRV3")
tictoc::toc()
#> Dense: 2.521 sec elapsed

Created on 2023-06-09 with reprex v2.0.2

@s3alfisc
Copy link
Owner Author

s3alfisc commented Jun 9, 2023

Oh, wow, awesome, that's indeed quite a nice speed up!

@s3alfisc
Copy link
Owner Author

One thing I learned: using RcppArmadillo's pinv is twice as fast as MASS::ginv:

library(microbenchmark)

N <- 1000
X <- as.factor(sample(1:1000, 100000, TRUE))
df <- data.frame(X = X)
mf <- model.frame(~X, df)
mm <- sparse.model.matrix(mf)
X <- crossprod(mm)
dim(X)
X <- as.matrix(X)

microbenchmark(
  MASS::ginv(X), 
  pinv(X), 
  times = 1
)
# expr      min       lq     mean   median       uq      max neval
# MASS::ginv(X) 3.111503 3.111503 3.111503 3.111503 3.111503 3.111503     1
# pinv(X) 1.759175 1.759175 1.759175 1.759175 1.759175 1.759175     1

all.equal(pinv(X), MASS::ginv(X))
# TRUE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants