Support for Figure object output type #1333

Open
Blubbaa opened this issue Aug 21, 2024 · 3 comments
Labels
enhancement New feature or request

Blubbaa commented Aug 21, 2024

🚀 The feature

I would love to be able to set the output type to a plotly or matplotlib figure, instead of saving plots as PNG files and returning file paths, which is of limited use.

Motivation, pitch

I recently started using pandasai to build custom data analysis apps, and I like it quite a lot so far. I was wondering why it is limited to the four output data types, and even more why the (cumbersome) approach of saving images to disk and returning a file path was chosen. Maybe there is a security-related reason, or it is due to the client-server architecture of pandasai? Returning objects instead (as is already done for dataframes) would open up much more potential, especially for plots and figures.

I have already tinkered with modifying output_type_template.tmpl and output_validator.py to make pandasai return figure objects. However, I do not really know the potential problems or implications of this, so I am proposing it here as a feature request, since my "hacky" implementation is probably not how it should be done.

Here you can see the resulting prompt used and the generated code, which works fine for now in a standalone app.
Prompt used:

<dataframe>
dfs[0]:150x5
Sepal_Length,Sepal_Width,Petal_Length,Petal_Width,Class
7.2,3.4,6.4,0.3,Iris-setosa
4.5,4.1,3.5,1.7,Iris-virginica
6.0,2.6,5.9,2.5,Iris-versicolor
</dataframe>


Update this initial code:
"""python
# TODO: import the required dependencies
import pandas as pd

# Write code here

# Declare result var:
type (must be "figure"), value must be a matplotlib.figure or plotly.graph_objects.Figure. Example: { "type": "figure", "value": go.Figure(...) }   

"""



### QUERY
 Plot the sepal length and width of the data and color points by class

Variable `dfs: list[pd.DataFrame]` is already declared.

At the end, declare "result" variable as a dictionary of type and value.

If you are asked to plot a chart, use "plotly" for charts, save as png.


Generate python code and return full updated code:

Resulting Code:

import plotly.express as px

# `dfs` is provided by PandasAI at execution time
df = dfs[0]
fig = px.scatter(df, x='Sepal_Length', y='Sepal_Width', color='Class', title='Sepal Length vs Sepal Width', labels={'Sepal_Length': 'Sepal Length', 'Sepal_Width': 'Sepal Width'})
result = {'type': 'figure', 'value': fig}

Alternatives

I know it's also possible to convert plotly figures to and from JSON. So another option might be to return (or potentially also save) the figure as JSON instead.
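
As a rough illustration of the JSON alternative, here is a minimal sketch. It relies on the `to_json()` method that plotly figures provide and on `plotly.io.from_json` for the reverse direction. The helper names and the `"figure_json"` result type are hypothetical and are not part of pandasai.

```python
from typing import Any


def serialize_figure_result(result: dict[str, Any]) -> dict[str, Any]:
    """Replace a figure in a result dict with its JSON string.

    The "figure_json" type is a hypothetical name used for illustration only.
    """
    if result.get("type") != "figure":
        return result
    # plotly figures expose to_json(); other values are passed through unchanged
    to_json = getattr(result["value"], "to_json", None)
    if to_json is None:
        return result
    return {"type": "figure_json", "value": to_json()}


def load_figure(figure_json: str):
    """Rebuild a plotly figure from its JSON string."""
    # Imported lazily so plotly is only needed when a figure is actually loaded
    import plotly.io as pio

    return pio.from_json(figure_json)
```

A client could then call `load_figure(result["value"])` on its side to get an interactive figure back, which might also fit a client-server setup, since only a string crosses the boundary.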

Additional context

Final Result in Chatbot App:
image

dosubot (bot) added the enhancement label on Aug 21, 2024
@at-eez-jedi

@Blubbaa May I ask how you modified output_type_template.tmpl and output_validator.py to make pandasai return figure objects?


Blubbaa commented Sep 3, 2024

@at-eez-jedi Yes, sure. I modified output_type_template.tmpl like this:

{% if not output_type %}
type (possible values "string", "number", "dataframe", "plot", "figure"). Examples: { "type": "string", "value": f"The highest salary is {highest_salary}." } or { "type": "number", "value": 125 } or { "type": "dataframe", "value": pd.DataFrame({...}) } or { "type": "plot", "value": "temp_chart.png" } or { "type": "figure", "value": go.Figure(...) }
{% elif output_type == "number" %}
type (must be "number"), value must be int. Example: { "type": "number", "value": 125 }
{% elif output_type == "string" %}
type (must be "string"), value must be string. Example: { "type": "string", "value": f"The highest salary is {highest_salary}." }
{% elif output_type == "dataframe" %}
type (must be "dataframe"), value must be pd.DataFrame or pd.Series. Example: { "type": "dataframe", "value": pd.DataFrame({...}) }
{% elif output_type == "plot" %}
type (must be "plot"), value must be string. Example: { "type": "plot", "value": "temp_chart.png" }
{% elif output_type == "figure" %}
type (must be "figure"), value must be a matplotlib.figure or plotly.graph_objects.Figure. Example: { "type": "figure", "value": go.Figure(...) }
{% endif %}

I also inserted this part in generate_python_code.tmpl to deal with the "save as PNG" instruction:

At the end, declare "result" variable as a dictionary of type and value.
{% if viz_lib %}
If you are asked to plot a chart, use "{{viz_lib}}" for charts.
{% endif %}
{% if output_type == "plot" %}
Save charts as PNG.
{% endif %}
{% if output_type == "figure" %}
Do not save the figure to file.
{% endif %}

And output_validator.py:

    def validate_value(self, expected_type: str) -> bool:
        if not expected_type:
            return True
        elif expected_type == "number":
            return isinstance(self, (int, float))
        elif expected_type == "string":
            return isinstance(self, str)
        elif expected_type == "dataframe":
            return isinstance(self, (pd.DataFrame, pd.Series))
        elif expected_type == "plot":
            if not isinstance(self, (str, dict)):
                return False

            if isinstance(self, dict):
                return True

            path_to_plot_pattern = r"^(\/[\w.-]+)+(/[\w.-]+)*$|^[^\s/]+(/[\w.-]+)*$"
            return bool(re.match(path_to_plot_pattern, self))
        elif expected_type == "figure":
            return "plotly.graph_objs._figure.Figure" in repr(type(self)) or "matplotlib.figure.Figure" in repr(type(self))

    @staticmethod
    def validate_result(result: dict) -> bool:
        if not isinstance(result, dict) or "type" not in result:
            raise InvalidOutputValueMismatch(
                "Result must be in the format of dictionary of type and value"
            )

        if not result["type"]:
            return False

        elif result["type"] == "number":
            return isinstance(result["value"], (int, float, np.int64))
        elif result["type"] == "string":
            return isinstance(result["value"], str)
        elif result["type"] == "dataframe":
            return isinstance(result["value"], (pd.DataFrame, pd.Series))
        elif result["type"] == "plot":
            if "plotly" in repr(type(result["value"])):
                return True

            if not isinstance(result["value"], (str, dict)):
                return False

            if isinstance(result["value"], dict) or (
                isinstance(result["value"], str)
                and "data:image/png;base64" in result["value"]
            ):
                return True

            path_to_plot_pattern = r"^(\/[\w.-]+)+(/[\w.-]+)*$|^[^\s/]+(/[\w.-]+)*$"
            return bool(re.match(path_to_plot_pattern, result["value"]))
        elif result["type"] == "figure":
            return "plotly.graph_objs._figure.Figure" in repr(type(result["value"])) or "matplotlib.figure.Figure" in repr(type(result["value"]))
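
The `repr(type(...))` string check for figures could also be written as a small helper that walks the class hierarchy, so that subclasses of the figure classes are accepted as well. This is only a sketch under that assumption. `is_figure` is a hypothetical name, not an existing pandasai function, and it avoids importing plotly or matplotlib so it works when only one of them is installed.

```python
def is_figure(value: object) -> bool:
    """Return True if value is an instance of a plotly or matplotlib Figure class.

    Checks every class in the MRO, so subclasses of Figure are accepted too.
    """
    for cls in type(value).__mro__:
        top_level_package = cls.__module__.split(".")[0]
        if cls.__name__ == "Figure" and top_level_package in ("plotly", "matplotlib"):
            return True
    return False
```

Both `elif ... == "figure"` branches could then return `is_figure(...)` on the respective value instead of repeating the string comparison.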

@phoenixor

I have the same requirement. I use Streamlit as the front end, and I want PandasAI to return the chart object so that I can draw interactive charts with Streamlit. At present, I can only check whether the output is an image path and display it with st.image, but every time a pop-up window tells me that no local program can be found to open the picture, with the error below. Do you know why this happens and how to fix it?
Error: no "view" rule for type "image/png" passed its test case
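
For the Streamlit use case, a dispatch on the result type might look like the sketch below. It assumes the result dict follows the `{"type": ..., "value": ...}` shape used in this thread, including the proposed `"figure"` type. `show_result` is a hypothetical helper name. The pop-up itself may come from PandasAI trying to open the saved chart in a local image viewer; if your installed version supports an `open_charts` config option, setting it to `False` might suppress that, but this is an assumption worth verifying against your version.

```python
from typing import Any


def show_result(result: dict[str, Any]) -> None:
    """Render a PandasAI-style result dict in a Streamlit app."""
    # Imported lazily so this helper can be imported without Streamlit installed
    import streamlit as st

    value = result["value"]
    module = type(value).__module__

    if result["type"] == "figure" and module.startswith("plotly"):
        st.plotly_chart(value)  # interactive plotly chart
    elif result["type"] == "figure" and module.startswith("matplotlib"):
        st.pyplot(value)  # static matplotlib figure
    elif result["type"] == "plot":
        st.image(value)  # path to the PNG file saved to disk
    else:
        st.write(value)  # strings, numbers, dataframes
```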
