[WIP] pandas migration #1347
Closed
140 commits
bb960ae Extend str in Variable. (sstanovnik)
5a8846c pandas migration: first huge, breaking, table update. (sstanovnik)
97fab90 Enable strict read-only access on Table X/Y/meta views. (sstanovnik)
3e7a853 Further changes to Table, as per recent comments. (sstanovnik)
81ddf10 Add Table.attributes to the pandas persistence scheme. (sstanovnik)
89907af Insert pandas into requirements-core. (sstanovnik)
b732ed7 Table constructors and other fixes. (sstanovnik)
7af5a67 OWSelectRows: transform usage of Filter into pandas syntax. (sstanovnik)
85f6920 Completely remove Filter. (sstanovnik)
b8ee3b5 Table Domain changes, Variable inference and miscellaneous fixes. (sstanovnik)
7d3e29b Make indexing and weights work from empty Tables. (sstanovnik)
ebb5a1f Remove RowInstance completely. (sstanovnik)
5853ec5 Remove and transform infrequently-used old syntax. (sstanovnik)
77bf036 Completely remove Instance. (sstanovnik)
43f1a7f Remove Storage. (sstanovnik)
d7a159d Some minor fixes and cleanup in Table. (sstanovnik)
8623abc A multitude of small fixes of bugs shown by tests. (sstanovnik)
3f7369b Use pandas' reader for csv and tab. (sstanovnik)
827d4e1 Tab reader fixes for less common behaviour. (sstanovnik)
36d5b94 Port ExcelReader to use pandas' Excel reader. (sstanovnik)
8535478 Improve DiscreteVariable discreteness determination and parsing. (sstanovnik)
ba08eaf Transform values we interpret as null with actual null values when (sstanovnik)
ce5cf4f Handle NA weights when setting them. (sstanovnik)
f3a064b Use pandas' categorical coltype for DiscreteVariable. (sstanovnik)
c170ade Convert TimeVariable functionality to pandas. (sstanovnik)
745ca03 Small fixes: sniffer size, NA weights handling. (sstanovnik)
4a656b1 Variable equality fix and TimeVariable test modification. (sstanovnik)
ac914f0 A lot of small fixes for issues found by tests. (sstanovnik)
eca5618 Fix handling null values in TimeVariable columns. (sstanovnik)
fae01ac Improve reading tab and csv files. (sstanovnik)
601d82d Remove an unneeded test. (sstanovnik)
99ecf47 Compatibility shims for SQL table. (sstanovnik)
135eb47 Make Data Table work, transfer basic stats to pandas. (sstanovnik)
78f0db1 Multiple fixes: TableSeries retain attributes, constructor works (sstanovnik)
699b240 Remove Value. (sstanovnik)
360d0df Fix recent TimeVariable changes. (sstanovnik)
38b648c Some basic fixes for subscripting pandas. (sstanovnik)
9a4ef13 Migrate distributions to pandas. (sstanovnik)
2711440 Ported contingency to pandas. (sstanovnik)
e6f0cd9 Remove statistics/util.py. (sstanovnik)
bef5741 Adapt distances and tests to work with the new Table. (sstanovnik)
81bad83 Adapt preprocessors (discretize, impute) to work with the new Table. (sstanovnik)
5020b89 Loads of test and compatibility fixes. (sstanovnik)
90c43ec k-Means compatibility fixes. (sstanovnik)
2dab765 Transform Continuize for usage with pandas. (sstanovnik)
54677f0 Fix parsing files with discrete variables which specify values. (sstanovnik)
9146bfa Discretization pandas compatibility, also test fixes. (sstanovnik)
494caf0 Evaluation - scoring test compatibility fix. (sstanovnik)
f43eb2c Interpret missing value markers when reading from file. (sstanovnik)
d2297f8 Impute pandas adaptation, with tests. (sstanovnik)
fc5b426 Fix transforming discrete ordinal values into descriptor values. (sstanovnik)
7dde951 Distributions should use weights instead of counts. (sstanovnik)
238a838 Miscellaneous test adaptations. (sstanovnik)
9d5093d Convert normalization, use groupby instead of value_count in distribu… (sstanovnik)
72e2eb0 Add copying to table constructors, very important! (sstanovnik)
dc8027d Migrate randomization to pandas. (sstanovnik)
7317045 A small fix for caching table transformations. (sstanovnik)
8c10f2a Miscellaneous test compatibility fixes. (sstanovnik)
9d3cf40 Migrate remover and its tests. (sstanovnik)
d019e8d Simple tree and softmax adaptation. (sstanovnik)
db2a2a9 Only allow one of specified delimiters when reading file. (sstanovnik)
2b53079 Remove Value tests. (sstanovnik)
55d233a Miscellaneous table test compatibility fixes. (sstanovnik)
5cd878f Use 0 instead of NA when values don't exist in distributions. (sstanovnik)
1fde4f2 Feature scoring test compatibility. (sstanovnik)
aa1d4c6 A bucketload of fixes for widgets. (sstanovnik)
3efd943 Fix owcontinuize to use proper continuization behaviour. (sstanovnik)
f5ee17d Don't interpret None as a missing value when reading a table. (sstanovnik)
a44e1e4 Use proper top-level imports. D'oh! (sstanovnik)
14efa24 Use a more robust way of computing basic stats. (sstanovnik)
74bc375 A small fix for the new single-class test. (sstanovnik)
bc64dbc Fixes for some elusive tests. (sstanovnik)
d21a3e8 Port SQLTable to a pandas backend. Some breaking changes. (sstanovnik)
23d42c4 Completely overhaul the Table class inheritance structure. (sstanovnik)
5c23db1 Fix some broken Table imports. (sstanovnik)
6004bed Basic SparseTable functionality. (sstanovnik)
bc84af8 Distributions for sparse tables. (sstanovnik)
c190a5a A snail-paced implementation of contingency computation for sparse (sstanovnik)
5864dd1 Improved the reading capabilities. (sstanovnik)
30b7f13 Fix elusive tests. (sstanovnik)
7e5b420 Use add numexpr to requirements-core. (sstanovnik)
9b94cb1 Use actual values instead of indices when constructing discretes. (sstanovnik)
c672319 Merge domain when not rowstacking concatenated tables. (sstanovnik)
645e50b Widget test adaptation and widget fixes. (sstanovnik)
ed500db REVIEWME: 'fixed' displaying SQL tables. (sstanovnik)
cd93a3f Test fixes, remove sql.compat.Value (sstanovnik)
0d7ee88 Hopefully fix some strange failing tests. (sstanovnik)
5d2f8a7 Remove val_from_str_add. (sstanovnik)
430be00 Add to_var_col, a slightly optimized version of to_val. (sstanovnik)
a9f63c2 Remove TableBase.DENSE and related indicators. (sstanovnik)
e1f708d Remove PanelBase and SparseTablePanel. (sstanovnik)
d17088e Docstring bonanza! (sstanovnik)
265b590 Remove some old, unused, deprecated things from TableBase. (sstanovnik)
21bcc56 Remove variable.to_val_col. (sstanovnik)
6f0f16e Documentation slightly updated. (sstanovnik)
0bd7160 Increase test coverage, some bugfixes. (sstanovnik)
9c2c6ba Improve distributions, also coverage. (sstanovnik)
4f6993b Increase contingency coverage. (sstanovnik)
caa733d Fix OWHeatmap and its recent tests. (sstanovnik)
4843efc Bump minimum version of pandas above 0.18.0. (sstanovnik)
b3031c9 Sparse fixes and improvements. (sstanovnik)
411d027 Excel sheet naming. (sstanovnik)
7ad4909 A multitude of fixes. (sstanovnik)
25c5c2b A truckload of changes. (sstanovnik)
496f808 Remove Table.append. (sstanovnik)
4f50571 From list with missing class fixes, indent, lesser __setitem__ breakage. (sstanovnik)
96755ab Remove the many missing-value replaces. (sstanovnik)
811e3e9 Prevent multiple calls to __init__. (sstanovnik)
8517bf2 Proper finalization and domain filtering. (sstanovnik)
41ae304 Custom __str__ and __repr__, needs some work. (sstanovnik)
33afcf0 REVIEWME: custom __iter__, iterates over rows, breaks pandas contract. (sstanovnik)
dfc1aba Merge Data fix, new fun TableBase.merge method! (sstanovnik)
2e156fa Fix venn diagram. (sstanovnik)
2add840 Fix Data Table. (sstanovnik)
ac6539d Switch inputs from Table to TableBase. (sstanovnik)
98c6353 Much better __str__, uses pandas magic. (sstanovnik)
2fd14db Except Orange instead of pandas behaviour in constructors. (sstanovnik)
78e9450 Use pure numpy ops for transforming discretes into categoricals. (sstanovnik)
287b049 Change usages of checksum to hash. (sstanovnik)
f2ef9ba Add a notification comment and test for the iterrows wrapper. (sstanovnik)
fc9d867 Remove shuffle in favour of .sample(frac=1). (sstanovnik)
b576a7e Consolidate usages of the _transferer hack. (sstanovnik)
d0a03dd Comments and tests to setUpClass, other test fixes. (sstanovnik)
118a8d1 Add time component awareness to TimeVariable. (sstanovnik)
a70e2c5 Fix a failing doctest. (sstanovnik)
8dbc1c7 Improve time column display with month and day. (sstanovnik)
b22e0d8 Add a pandas git build to travis. (sstanovnik)
98b745f Some general fixes, report test fixes. (sstanovnik)
50e0989 Requirements.txt requires a different requirement format. (sstanovnik)
b561d2f Further improvements to the documentation. (sstanovnik)
27c596f Revert 68b18c5: overriding __iter__. (sstanovnik)
6ea711a Simplify weight assignment. (sstanovnik)
06bcc77 Cherry-pick: sstanovnik/orange3:benches. (sstanovnik)
2bfca35 Weight setting robustness. (sstanovnik)
6460ab0 Properer sparse handling. (sstanovnik)
259bfc1 Always convert weights to floats on assignment. (sstanovnik)
ac75022 Fix visualizing continuous variables in Data Table. (sstanovnik)
3317ff0 Significantly improve feature constructor performance. (sstanovnik)
fc48858 Domain editor fix and file reader hardening. (sstanovnik)
3e6030f Fix a failing owkmeans test. (sstanovnik)
Add time component awareness to TimeVariable.
commit 118a8d1c0b983eed9bf31d3cc84ca717ad4fedc8
Could we instead use `pd.tseries.tools._guess_datetime_format()` and save a format string, like `"%Y"`? Sure, it's a private interface, but we can catch that in a try-except at the top of the module and just adapt it when it changes. It would be so much nicer to format datetimes constructed from yearly series as `"%Y"` only ...
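A minimal sketch of the guarded import suggested in the comment above, not code from this pull request. The import path is the one quoted in the comment and may well be missing in other pandas versions, so the sketch treats a failed import as normal. The fallback guesser, its candidate format list, and the names `fallback_guess_format` and `guess_format` are assumptions made up for illustration.

```python
from datetime import datetime

# Private pandas helper named in the review comment. Its location is an
# implementation detail, so the import is allowed to fail.
try:
    from pandas.tseries.tools import _guess_datetime_format
except ImportError:
    _guess_datetime_format = None

# Hypothetical fallback: candidate formats, most specific first.
_CANDIDATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%Y-%m", "%Y")


def fallback_guess_format(text):
    """Return the first candidate format that parses `text`, or None."""
    for fmt in _CANDIDATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            continue
        return fmt
    return None


def guess_format(text):
    """Prefer the pandas helper when it imported, else use the fallback."""
    if _guess_datetime_format is not None:
        guessed = _guess_datetime_format(text)
        if guessed is not None:
            return guessed
    return fallback_guess_format(text)
```

With this shape, a yearly series whose entries look like `2015` would be stored with the format `"%Y"`, provided every string in the column follows the same pattern.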
I don't like that: what if the strings have different formats? Parsing the column first does away with all parsing responsibility. There is a `_guess_datetime_format_array`, but it only considers the first non-null entry. Nonetheless, I added the ability to display only the year or the year and month.
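The parse-first approach described in this reply can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in commit 118a8d1: the function name `infer_display_format` and the four display formats are hypothetical, and the column is modelled as a plain list of `datetime` objects with `None` for missing values.

```python
from datetime import datetime

FULL_FORMAT = "%Y-%m-%d %H:%M:%S"


def infer_display_format(values):
    """Pick the shortest display format that loses no information.

    `values` holds already-parsed datetimes; None marks a missing value.
    """
    present = [v for v in values if v is not None]
    if not present:
        return FULL_FORMAT
    # Any non-midnight value means the time component carries information.
    if any((v.hour, v.minute, v.second, v.microsecond) != (0, 0, 0, 0)
           for v in present):
        return FULL_FORMAT
    # Days other than the 1st mean the day component carries information.
    if any(v.day != 1 for v in present):
        return "%Y-%m-%d"
    # Months other than January mean the month component carries information.
    if any(v.month != 1 for v in present):
        return "%Y-%m"
    return "%Y"


# Usage: a yearly series parsed from strings, with one missing value.
yearly = [datetime.strptime(s, "%Y") if s else None
          for s in ("2014", "2015", None)]
fmt = infer_display_format(yearly)
labels = [v.strftime(fmt) if v is not None else "?" for v in yearly]
```

Because the decision is made on parsed values, strings in mixed input formats do not matter here. One trade-off of this heuristic: a column that genuinely contains only 1 January dates would also be shown as years.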