Add Sparse Matrix Support For HistGradientBoostingClassifier #15336
Comments
Thank you for posting this feature request. We can discuss what kind of semantics we want for sparse matrix support, i.e. whether to treat a zero as a missing value or as a literal zero. LightGBM uses a parameter to decide which semantics to use. |
No problem. Without knowing the full extent of what is required, I'd be happy to try and tackle it with your guidance on where to look, etc. |
Zero semantics would be consistent with every other estimator (except for pairwise data). |
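To make the two readings of a sparse zero concrete, here is a small sketch. The helper `densify` and its `zero_as_missing` flag are hypothetical names used only for illustration and are not part of scikit-learn. It assumes NaN is the marker for a missing value, and it densifies the matrix, so it only makes sense on toy data.

```python
import numpy as np
from scipy import sparse


def densify(X, zero_as_missing=False):
    """Hypothetical helper: expand a sparse matrix under one of two semantics.

    zero_as_missing=False: entries that are not stored become literal 0.0.
    zero_as_missing=True:  entries that are not stored become NaN (missing).
    """
    dense = X.toarray().astype(np.float64)
    if zero_as_missing:
        # Mark the positions that are explicitly stored in the sparse matrix.
        coo = X.tocoo()
        stored = np.zeros(X.shape, dtype=bool)
        stored[coo.row, coo.col] = True
        # Everything else is an implicit zero, which is read here as "missing".
        dense[~stored] = np.nan
    return dense


X = sparse.csr_matrix(np.array([[0.0, 2.0], [3.0, 0.0]]))
as_zeros = densify(X)                          # implicit entries stay 0.0
as_missing = densify(X, zero_as_missing=True)  # implicit entries become NaN
```

Note that an explicitly stored zero counts as a stored value in this sketch, so it stays 0.0 under both settings. Whether that is the desired behavior is part of the semantics question discussed above.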
For reference, I noted some implementation suggestions in #16885. I believe @StealthyKamereon wants to give it a shot. Regarding semantics of zeros: we can have a boolean parameter |
Following what you said regarding semantics of zeros, I think in addition to the |
Can anyone suggest a temporary workaround for this problem? |
Description
Hi!
I'm receiving the error below when attempting to pass a sparse matrix to HistGradientBoostingClassifier. The matrix is the result of using CountVectorizer and TfidfTransformer on input text.
In my case, the size of the text prohibits converting the sparse matrix to a dense one (I run out of memory).
Steps/Code to Reproduce
Expected Results
No error is thrown.
Actual Results
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
Versions
System:
python: 3.7.3 (default, Oct 1 2019, 18:28:53) [GCC 5.4.0 20160609]
executable: /local_disk0/pythonVirtualEnvDirs/virtualEnv-3631eab5-084b-4139-952e-5aff594ac1bb/bin/python
machine: Linux-4.15.0-1050-azure-x86_64-with-debian-stretch-sid
Python deps:
pip: 19.0.3
setuptools: 40.8.0
sklearn: 0.21.3
numpy: 1.16.2
scipy: 1.2.1
Cython: 0.29.6
pandas: 0.24.2