UnicodeEncodeError when trying pivot_ui(df) #2

Closed
Paul-Yuchao-Dong opened this issue Sep 14, 2015 · 17 comments

@Paul-Yuchao-Dong

Great tool for data exploration, I think this should be a core utility of Jupyter!

I got the error message:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 8-11: ordinal not in range(128)

I think it is caused by the DataFrame.to_csv call; adding some encoding handling there could fix this.

But not really sure how...

thank you for the great js magic!

@nicolaskruchten
Owner

OK, I'll see if I can fix this :)

Can you provide a sample data frame that causes this problem?

@Paul-Yuchao-Dong
Author

Sorry total newbie here, not sure how to do a proper uploading of the data.

Basically, one of the columns contains Chinese characters, like 巴西出口投资促进局北京代表处.

I tested df.to_csv(encoding='utf-8') and it writes the CSV correctly. I am using pandas 0.16+.

@nicolaskruchten
Owner

I'm having trouble replicating this problem... Can you provide a bit more context/sample code please? What versions of Jupyter/Pandas/Python are you using?

@ahmetanildindar

Same problem :)
Versions => Jupyter/Pandas/Python = 4.1.0 / 0.18.1 / 2.7.12


The columns named "Kullanıcı" and "İsim" cause an encoding problem.

Any solution or recommendation?

++Ahmet

@nicolaskruchten
Owner

Sorry for the long delay on this, but I'm back to giving this project a bit of love... Can someone post a link to a CSV file that exposes this problem? I still can't replicate it.

@indivisible

This reproduces the error:

import pandas as pd
import io
from pivottablejs import pivot_ui
csv_data = ',\xe1rv\xedzt\u0171r\u0151 t\xfck\xf6rf\xfar\xf3g\xe9p\n0,42\n'
csv = io.StringIO(csv_data)
df = pd.read_csv(csv)
pivot_ui(df)

You should probably treat both the input and the template as unicode, and then encode the result explicitly when writing it out. Easy to fix:

diff --git a/pivottablejs/__init__.py b/pivottablejs/__init__.py
index 64c2b2f..fd70da5 100644
--- a/pivottablejs/__init__.py
+++ b/pivottablejs/__init__.py
@@ -1,4 +1,4 @@
-TEMPLATE = """
+TEMPLATE = u"""
 <!DOCTYPE html>
 <html>
     <head>
@@ -70,8 +70,8 @@ import json

 def pivot_ui(df, outfile_path = "pivottablejs.html", url="",
     width="100%", height="500", **kwargs):
-    with open(outfile_path, 'w') as outfile:
-        outfile.write(TEMPLATE %
-            dict(csv=df.to_csv(), kwargs=json.dumps(kwargs)))
+    with open(outfile_path, 'wb') as outfile:
+        outfile.write((TEMPLATE %
+            dict(csv=df.to_csv(), kwargs=json.dumps(kwargs))).encode('utf8'))
     return IFrame(src=url or outfile_path, width=width, height=height)

This fixes it for py3.6, and should work for 2.7 too (not tested).

@nicolaskruchten
Owner

Interesting. So in Python 2.7, your code results in an error on the line csv_data = ',\xe1rv\xedzt\u0171r\u0151 t\xfck\xf6rf\xfar\xf3g\xe9p\n0,42\n' but if I prefix the string with u then the code runs without error, including the pivot table:

[image: screenshot of the rendered pivot table]

@nicolaskruchten
Owner

I'm actually seeing the same behaviour in Python 3.5

@indivisible

The problem is that you do things without ever specifying an encoding, so behaviour is system dependent.
See the docs for example:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.

You should always specify the encoding you wish to use for text IO, even if it happens to work at the time on your dev environment.
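To see what this means on a given machine, here is a minimal sketch (Python 3, standard library only). The helper name can_encode is hypothetical and only used for illustration; the sample text is the column name from the repro above. It prints the platform default and checks which encodings can represent the header at all:

```python
import locale

# Encoding that text-mode open() falls back to when none is specified.
default_encoding = locale.getpreferredencoding(False)
print("platform default:", default_encoding)

# Non-ASCII column name taken from the repro script earlier in this thread.
header = u"\xe1rv\xedzt\u0171r\u0151 t\xfck\xf6rf\xfar\xf3g\xe9p"


def can_encode(text, encoding):
    """Hypothetical helper: True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for name in ("ascii", "cp1252", "utf-8"):
    print(name, can_encode(header, name))
```

If the default reported on your system is ASCII or cp1252, writing this header without an explicit encoding would be expected to fail, which matches the original report.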

@indivisible

Also a cleaner way than my fix would be to just use io.open(outfile_path, 'w', encoding='utf8') to open the file.
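Roughly, that could look like the sketch below. This is not the actual pivottablejs code: write_html is a hypothetical stand-in for the file-writing part of pivot_ui, and the HTML string is made up for the example. The only point is that io.open takes an encoding argument, so the output no longer depends on the locale:

```python
import io
import os
import tempfile


def write_html(path, html):
    """Hypothetical helper: write unicode text as UTF-8, ignoring the locale."""
    with io.open(path, "w", encoding="utf8") as outfile:
        outfile.write(html)


# Made-up payload containing the non-ASCII header from the repro script.
html = u"<td>\xe1rv\xedzt\u0171r\u0151 t\xfck\xf6rf\xfar\xf3g\xe9p</td>"

path = os.path.join(tempfile.mkdtemp(), "pivottablejs.html")
write_html(path, html)

# Read the file back with the same explicit encoding to check the round trip.
with io.open(path, "r", encoding="utf8") as infile:
    round_trip = infile.read()

print(round_trip == html)
```

Note that io.open expects unicode text in text mode, so on Python 2.7 everything passed to write() would have to be a unicode object, not a byte str.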

@nicolaskruchten
Owner

nicolaskruchten commented May 10, 2017 via email

@nicolaskruchten
Owner

@indivisible your suggestion does work in Python 3 but causes all sorts of encoding errors in Python 2.7 for me with the following CSV: https://github.com/nicolaskruchten/pivottable/blob/master/examples/mps.csv

@indivisible

I see, sorry.

I finally installed python2.7 and looked a bit closer at the problem. I've made a gist that has a script to reproduce the problem, sample outputs, and a slightly better patch.

@nicolaskruchten
Owner

OK, nice, this seems like a reasonably elegant fix @indivisible, thanks!

Question re your gist: in output.txt the fourth test case fails... Is that with or without your patch? I wasn't able to replicate this test case locally, as here on my Mac even with LC_ALL=en_US the preferred locale is still UTF8.

@indivisible

Without. With the patch applied, all outputs matched. I have no idea how locale settings work on Macs; the only systems I could test on were Linux and Windows. The sample output is from Linux. On Windows the "preferred encoding" seems to be cp1252...

Cheers!

@nicolaskruchten
Owner

Awesome, thanks so much for the help! I've just released v0.7.0 with these fixes :)
