Tags: Quantco/glum
glum v3.0 (#677)

* Make tests green with densematrix-refactor branch
* Remove most Matrixbase subclass checks
* Simplify _group_sum
* Pre-commit autoupdate (#672)
* Use boa in CI. (#673)
* Fix covariance matrix mutating feature names (#671)
* Do not use _set_up_... in covariance_matrix
* Add changelog entry
* Add the option to store the covariance matrix to avoid recomputing it (#661)
* Add option to store covariance matrix during fit
* Fix fitting with variance matrix estimation: `.covariance_matrix()` expects X and weights in a different format than what we have at the end of `.fit()`.
* Store covariance matrix after estimation
* Handle the alpha_search and glm_cv cases
* Propagate covariance parameters
* Add changelog
* Slightly more lenient tests
* Pre-commit autoupdate (#676)
Co-authored-by: quant-ranger[bot] <132915763+quant-ranger[bot]@users.noreply.github.com>
* Fix covariance_matrix dtypes
* Make CI use pre-release tabmat
* Column names à la Tabmat #278 (#678)
* Delegate column naming to tabmat
* Add tests
* More tests
* Test for dropping complete categories
* Add docstrings for new argument
* Add changelog entry
* Convert to pandas at the correct place
* Reorganize converting from pandas
* Remove xfail from test
* Formula interface (#670)
* Add formulaic to dependencies
* Add function for transforming the formula
* Add tests
* First draft of glum formula interface
* Fixes and tests
* Handle intercept correctly
* Add formula functionality to glm_cv
* Variables from local context
* Test predict with formulas
* Add formula tutorial
* Fix tutorial
* Reformat tutorial
* Improve function signatures and docstrings
* Handle two-sided formulas in covariance_matrix
* Make mypy happy about module names
* Matthias' suggestions
* Improve tutorial
* Improve tutorial
* Formula- and term-based Wald-tests (#689)
* Add formulaic to dependencies
* Add function for transforming the formula
* Add tests
* First draft of glum formula interface
* Fixes and tests
* Handle intercept correctly
* Add formula functionality to glm_cv
* Variables from local context
* Test predict with formulas
* Add formula tutorial
* Fix tutorial
* Reformat tutorial
* Improve function signatures and docstrings
* Handle two-sided formulas in covariance_matrix
* Make mypy happy about module names
* Matthias' suggestions
* Add back term-based Wald-tests
* Tests for term names
* Add formula-based Wald-test
* Tests for formula-based Wald-test
* Add changelog
* Fix exception message
* Additional test case
* make docstrings clearer in the case of terms
* Support for missing values in categorical columns (#684)
* Delegate column naming to tabmat
* Add tests
* More tests
* Test for dropping complete categories
* Add docstrings for new argument
* Add changelog entry
* Convert to pandas at the correct place
* Reorganize converting from pandas
* Remove xfail from test
* Implement missing categorical support
* Add test
* Solve adding missing category when predicting
* Apply Matthias' suggestions
* Add changelog entry
* Fix formula context (#691)
* Make tests fail
* Propagate context through methods
* pyupgrade
* ensure_full_rank != drop_first
* fix
* move feature name assignment to right spot
* fix
* remove blank line
* bump minimum formulaic version (stateful transforms)
* improve backward compatibility
* Remove code that is not needed in tabmat v4 / glum v3 (#741)
* Remove check_array from predict(): We don't need it here as predict calls linear_predictor, and the latter does this check. We can avoid doing it twice.
* Remove _name_categorical_variable parts: There is no need for those as Tabmat v4 handles variable names internally.
---------
Co-authored-by: Martin Stancsics <[email protected]>
* Fix formula test: consider presence of intercept in full rankness check when constructing the model matrix externally (#746)
* deal with intercept in formula test correctly
* naming [skip ci]
* test varying significance level in coef table test (#749)
* pin formulaic to 0.6 (#752)
* Add illustration of formula interface to example in README (#751)
* add illustration of formula to readme
* rephrase
* spacing
* add linear term for illustration
* Determine presence of intercept only by `fit_intercept` argument (#747)
* always use self.fit_intercept; raise if formula conflicts with it
* wording [skip ci]
* adjust other tests, cosmetics
* don't compare specs with singular matrix to smf
* fix smf test formula
* fix intercept in context test
* remove outdated sentence; clean up
* fix
* adjust tutorial
* adjust tutorial
* consistent linebreaks in docstring
* remove obsolete arg in docstring
* Informative error when encountering categories that were not seen in training (#748)
* drop missings not seen in training
* zero not drop
* better (?) name [skip ci]
* catch case of unseen missings and fail method
* fix
* respect categorical missing method with formula; test different categorical missing methods also with formula
* shorten the tests
* don't allow fitting in case of conversion of categoricals and presence of formula
* clearer error msg
* also change the error msg in the regex (facepalm)
* remove matches
* fix
* better name
* describe more restrictive behavior in tutorial
* Raise error on unseen levels when predicting
* Allow cat_missing_method='convert' again
* Update test
* Check for unseen categories
* Adapt align_df_categories tests to changes
* Make pre-commit happy
* Avoid unnecessary work
* Correctly expand penalties with categoricals and `cat_missing_method="convert"` (#753)
* Correctly expand penalties when cat_missing_method=convert
* Add test
* Improve variable names
Co-authored-by: Matthias Schmidtblaicher <[email protected]>
---------
Co-authored-by: Matthias Schmidtblaicher <[email protected]>
* bump tabmat pre-release version
---------
Co-authored-by: Martin Stancsics <[email protected]>
* docstring cosmetics
* even more docstring cosmetics
* Do not fail when an estimator misses class members that are new in v3 (#757)
* do not fail on missing class members that are new in v3
* simplify
* convert
* shorten the comment
* simplify
* don't use getattr unnecessarily
* cosmetics
* fix unrelated typo
* tiny cosmetics [skip ci]
* No regularization as default (#758)
* set alpha=0 as default
* fix docstring
* add alpha where needed to avoid LinAlgError
* add changelog entry
* also set alpha in golden master
* change name in persisted file too
* set alpha in model_parameters again
* don't modify case of no alpha attribute, which is RegressorCV
* remove invalid alpha argument
* wording
* Improve code readability
* Make arguments to public methods except `X`, `y`, `sample_weight` and `offset` keyword-only and make initialization keyword-only (#764)
* make all args except X, y, sample_weight, offset keyword only; make initialization keyword only
* add changelog [skip ci]
* mention that also RegressorBase was changed [skip ci]
* fix import
* clean up changelog
* Restructure distributions (#768)
* Explain `scale_predictors` more (#778)
* Expand on effect of scale_predictors and remove note
* Update src/glum/_glm.py
Co-authored-by: Jan Tilly <[email protected]>
* remove sentence
---------
Co-authored-by: Jan Tilly <[email protected]>
* Move helpers into `_utils` (#782)
* Patch docstring
* Update CHANGELOG.rst
Co-authored-by: Luca Bittarello <[email protected]>
* Apply suggestions from code review
Co-authored-by: Luca Bittarello <[email protected]>
* shorten docstrings of private functions; typos in defaults; other suggestions
* context docstring
* kwargs
* no context as default; small cleanups
* add explanation to get calling scope
* adjust to tabmat release
* keep whitespace
* temporarily add tabmat_dev channel again to investigate env solving failure on CI
* remove tabmat_dev channel again
* for now, disable conda build test on osx and Python 3.12
* Add a different environment for macos (#786)
* try solving on ci with different env for macos
* add missing if
* typo
* try and remove --no-test flag
* replace deprecated scipy.sparse.*_matrix.A
* replace other instance of .A
* two more
* simply replace all instances of .A by .toarray() (tabmat knows both)
* update CHANGELOG for release
---------
Co-authored-by: quant-ranger[bot] <132915763+quant-ranger[bot]@users.noreply.github.com>
Co-authored-by: Jan Tilly <[email protected]>
Co-authored-by: Marc-Antoine Schmidt <[email protected]>
Co-authored-by: Matthias Schmidtblaicher <[email protected]>
Co-authored-by: Matthias Schmidtblaicher <[email protected]>
Co-authored-by: Martin Stancsics <[email protected]>
Co-authored-by: Luca Bittarello <[email protected]>
Co-authored-by: lbittarello <[email protected]>
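
The keyword-only change (#764) in the v3.0 entry is easiest to see in a small example. The sketch below is plain Python and does not use glum: `ToyRegressor`, its `verbose` flag, and the helper `accepts_positional` are hypothetical names made up for illustration. It only shows the general pattern the commit describes, where a bare `*` in a signature forces every later argument to be passed by name, and it assumes the `alpha=0` default mentioned in #758.

```python
class ToyRegressor:
    """Hypothetical estimator used only for illustration; not glum's class."""

    def __init__(self, *, alpha=0.0, fit_intercept=True):
        # The bare `*` makes every constructor argument keyword-only.
        self.alpha = alpha
        self.fit_intercept = fit_intercept

    def fit(self, X, y, sample_weight=None, offset=None, *, verbose=False):
        # X, y, sample_weight and offset may still be passed positionally;
        # anything after the `*` (here the made-up `verbose`) may not.
        self.n_samples_ = len(X)
        self.verbose_ = verbose
        return self


def accepts_positional(*args):
    """Return True if ToyRegressor(*args) succeeds, False on TypeError."""
    try:
        ToyRegressor(*args)
    except TypeError:
        return False
    return True


# Keyword initialization works; positional initialization is rejected.
model = ToyRegressor(alpha=0.5).fit([[1.0], [2.0]], [1.0, 2.0])
print(accepts_positional())     # no arguments at all is fine
print(accepts_positional(0.5))  # a positional alpha is rejected
```

Code written against an older positional style, such as `ToyRegressor(0.5)`, would need to be updated to `ToyRegressor(alpha=0.5)` under this pattern.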
Remove code that is not needed in tabmat v4 / glum v3 (#741)

* Remove check_array from predict(): We don't need it here as predict calls linear_predictor, and the latter does this check. We can avoid doing it twice.
* Remove _name_categorical_variable parts: There is no need for those as Tabmat v4 handles variable names internally.
---------
Co-authored-by: Martin Stancsics <[email protected]>
Support for missing values in categorical columns (#684)

* Delegate column naming to tabmat
* Add tests
* More tests
* Test for dropping complete categories
* Add docstrings for new argument
* Add changelog entry
* Convert to pandas at the correct place
* Reorganize converting from pandas
* Remove xfail from test
* Implement missing categorical support
* Add test
* Solve adding missing category when predicting
* Apply Matthias' suggestions
* Add changelog entry