Feature request: Way to convert Pandas dataframe to Table object #68

Closed
fccoelho opened this issue Jul 16, 2014 · 14 comments

@fccoelho

Hi, Pandas is quickly becoming a standard in data analysis in the scientific Python community. So I think you should consider adding support for converting data in a pandas DataFrame to an Orange Table object.

The other day I had a very complex CSV file which I wanted to read into Orange. The available CSV importer couldn't read it, but pandas, or a plain Python script, loaded it without problems. So I added a Python Script block to my project. But then I got stuck because I couldn't find a way to transform the imported table into an Orange Table object.

Perhaps better documentation on how to build a Table from raw data would already help a lot.

@hlin117

hlin117 commented Oct 16, 2014

I posted this question on Stack Overflow here:
http://stackoverflow.com/questions/26320638/converting-pandas-dataframe-to-orange-table?noredirect=1#comment41471770_26320638
I'm hoping someone will be able to answer it.

@mfitzp

mfitzp commented Mar 28, 2015

I'm wondering if there would be potential for Pandas DataFrames (or a subclass) to be used as the internal data object for Orange? If there is a move towards numpy anyway, they work very nicely together + you get a lot of functionality for free.

@janezd
Contributor

janezd commented Mar 30, 2015

We have seriously considered using Pandas instead of pure numpy, but we would have had to add too many things on our own, so we decided it would be easier to start from numpy.

However, Orange is no longer limited to data in a single format. It can already use data that is stored in SQL, moving it to local memory only when needed (e.g. it can compute a naive Bayes classifier on the database, without moving the data to the client). Adding Pandas DataFrames should be even simpler. It is not on any short-term list - our priority now is to port most of Orange 2 first - but we can do it in the future.
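
To make the "compute on the database" idea concrete, here is a minimal sketch using only Python's standard `sqlite3` module. It is not Orange's implementation: the `weather` table, its columns, and the `posterior` helper are hypothetical, and no smoothing is applied. The point is only that the counting runs as `GROUP BY` queries inside the database, so the client receives a handful of aggregated rows instead of the full data.

```python
import sqlite3

# Hypothetical toy data; in practice the table would already live in the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (outlook TEXT, play TEXT)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [("sunny", "no"), ("sunny", "no"), ("rainy", "yes"),
     ("overcast", "yes"), ("sunny", "yes"), ("rainy", "no")],
)

# The database does the counting; only the aggregated rows reach the client.
class_counts = dict(
    conn.execute("SELECT play, COUNT(*) FROM weather GROUP BY play")
)
joint_counts = {
    (outlook, play): n
    for outlook, play, n in conn.execute(
        "SELECT outlook, play, COUNT(*) FROM weather GROUP BY outlook, play"
    )
}


def posterior(outlook):
    """Normalised P(play | outlook) from the counts.

    No smoothing; assumes `outlook` occurs at least once in the table.
    """
    total = sum(class_counts.values())
    scores = {
        play: (n / total) * (joint_counts.get((outlook, play), 0) / n)
        for play, n in class_counts.items()
    }
    norm = sum(scores.values())
    return {play: score / norm for play, score in scores.items()}
```

Calling `posterior("sunny")` then works purely from the two small count tables fetched above.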

@mfitzp

mfitzp commented Mar 30, 2015

@janezd That's great to hear. I can understand the thinking - Pandas does bring its own restrictions as a base format (not least allowing only up to 2D data, which is limiting for image data, etc., depending on your plans for Orange). The restriction of data formats in Orange has previously prevented me from implementing some tools... time to take another look!

@janezd
Contributor

janezd commented Mar 30, 2015

Check this: http://docs.orange.biolab.si/3/modules/data.table.html. The data is still 2D, but it can be, for instance, a view into a 3D array. It is much more flexible now.

@jamartinh

from collections import OrderedDict

import Orange


def construct_domain(df):
    columns = OrderedDict(df.dtypes)

    def create_variable(col):
        name, dtype = col[0], str(col[1])
        # Floats, and integers with more than 50 distinct values, are continuous.
        if dtype.startswith('float'):
            return Orange.data.ContinuousVariable(name)
        if dtype.startswith('int') and len(df[name].unique()) > 50:
            return Orange.data.ContinuousVariable(name)
        # Dates and generic objects are converted to strings (this modifies df in place).
        if dtype.startswith('date'):
            df[name] = df[name].values.astype(str)  # np.str was removed from NumPy
        if dtype == 'object':
            df[name] = df[name].astype(str)

        # Everything else is treated as discrete, with the observed values as categories.
        return Orange.data.DiscreteVariable(name, values=df[name].unique().tolist())

    return Orange.data.Domain(list(map(create_variable, columns.items())))


def pandas_to_orange(df):
    domain = construct_domain(df)
    orange_table = Orange.data.Table.from_list(domain=domain, rows=df.values.tolist())
    return orange_table
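
For readers who want to try the type-inference rule without Orange or pandas installed, the heuristic in the snippet above can be sketched in plain Python. This is only an illustration: `infer_kind` and `MAX_DISCRETE_VALUES` are hypothetical names, not part of either library, and the threshold of 50 is simply the one used in the snippet.

```python
# Hypothetical helper illustrating the heuristic above; not part of Orange or pandas.
MAX_DISCRETE_VALUES = 50  # same threshold as in the snippet above


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def infer_kind(values):
    """Return "continuous" or "discrete" for a list of column values."""
    if not values:
        return "discrete"
    # All-float columns are always continuous.
    if all(isinstance(v, float) for v in values):
        return "continuous"
    # Integer columns are continuous only if they have many distinct values.
    if all(_is_int(v) for v in values) and len(set(values)) > MAX_DISCRETE_VALUES:
        return "continuous"
    # Everything else (strings, dates, small integer codes) is discrete.
    return "discrete"
```

With this rule, a column of 100 distinct integers is classified as continuous, while a column of small integer codes such as `[1, 2, 3, 1]` stays discrete, which may or may not be what you want for a given dataset.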

@janezd
Contributor

janezd commented Oct 16, 2015

Can you create a pull request with this? This makes it easier to follow the code and its potential changes.

@jamartinh

There's a new add-on available, Orange3-spark.

This add-on includes widgets to convert between Spark<--->Pandas<--->Orange.

Perhaps the Pandas<--->Orange widgets could be included in the default Orange distribution?

@kernc
Contributor

kernc commented Dec 11, 2015

@jamartinh Could be, but can you please explain how they would be used?

@ajdapretnar
Contributor

@kernc Can we close the issue, since implementing Pandas DataFrame support is our GSoC project? Or should we do it after the project is done?

@kernc
Contributor

kernc commented May 13, 2016

Is the issue fixed? 😸

@ajdapretnar
Contributor

It's a feature request, not an issue per se. So yes, the feature request is accepted and is in the process of being implemented.
Also see this: #796

sstanovnik added this to the pandas milestone Aug 26, 2016
@sstanovnik
Contributor

With the new pandas implementation (#1347), there's a Table.from_dataframe method that resolves this issue.

@kernc
Contributor

kernc commented Jul 29, 2018

Duplicate of #2932.

thocevar pushed a commit to thocevar/orange3 that referenced this issue Jun 26, 2020
[FIX] Double-click NodeItem when above a LinkItem