UnicodeEncodeError when trying pivot_ui(df) #2

Paul-Yuchao-Dong · 2015-09-14T01:52:46Z

Great tool for data exploration, I think this should be a core utility of Jupyter!

I got the error message:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 8-11: ordinal not in range(128)
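A minimal sketch (Python 3 assumed) of where this kind of message comes from. It uses the Chinese value quoted later in this thread. Encoding non-ASCII text with the 'ascii' codec fails, while UTF-8 can represent it.

```python
# Sample value taken from this thread.
text = u"巴西出口投资促进局北京代表处"

try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    # str(exc) has the same "'ascii' codec can't encode characters in
    # position ..." form as the error reported above.
    print(exc)

# UTF-8 can encode every Unicode code point (3 bytes per CJK character here).
utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes))
```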

I think it is because of the DataFrame.to_csv method; adding some encoding handling could fix this.

But not really sure how...

thank you for the great js magic!

nicolaskruchten · 2015-09-14T02:23:00Z

OK, I'll see if I can fix this :)

Can you provide a sample data frame that causes this problem?

Paul-Yuchao-Dong · 2015-09-14T02:29:46Z

Sorry, total newbie here; I'm not sure how to upload the data properly.

Basically, one of the columns contains Chinese characters, like 巴西出口投资促进局北京代表处.

I tested df.to_csv(encoding='utf-8') and the data was written to a CSV correctly. I am using pandas 0.16+.

nicolaskruchten · 2015-09-27T02:40:55Z

I'm having trouble replicating this problem... Can you provide a bit more context/sample code please? What versions of Jupyter/Pandas/Python are you using?

ahmetanildindar · 2016-07-17T12:29:10Z

Same problem :)
Versions => Jupyter/Pandas/Python = 4.1.0 / 0.18.1 / 2.7.12

The columns named "Kullanıcı" and "İsim" cause the encoding problem.

Any solution or recommendation?

++Ahmet

nicolaskruchten · 2017-05-07T16:21:59Z

Sorry for the long delay on this, but I'm back to giving this project a bit of love... Can someone post a link to a CSV file that exposes this problem? I still can't replicate it.

indivisible · 2017-05-10T19:39:16Z

This reproduces the error:

import pandas as pd
import io
from pivottablejs import pivot_ui
csv_data = ',\xe1rv\xedzt\u0171r\u0151 t\xfck\xf6rf\xfar\xf3g\xe9p\n0,42\n'
csv = io.StringIO(csv_data)
df = pd.read_csv(csv)
pivot_ui(df)

You should probably treat the inputs and the template as Unicode, and then encode explicitly when writing the file. It's an easy fix:

diff --git a/pivottablejs/__init__.py b/pivottablejs/__init__.py
index 64c2b2f..fd70da5 100644
--- a/pivottablejs/__init__.py
+++ b/pivottablejs/__init__.py
@@ -1,4 +1,4 @@
-TEMPLATE = """
+TEMPLATE = u"""
 <!DOCTYPE html>
 <html>
     <head>
@@ -70,8 +70,8 @@ import json

 def pivot_ui(df, outfile_path = "pivottablejs.html", url="",
     width="100%", height="500", **kwargs):
-    with open(outfile_path, 'w') as outfile:
-        outfile.write(TEMPLATE %
-            dict(csv=df.to_csv(), kwargs=json.dumps(kwargs)))
+    with open(outfile_path, 'wb') as outfile:
+        outfile.write((TEMPLATE %
+            dict(csv=df.to_csv(), kwargs=json.dumps(kwargs))).encode('utf8'))
     return IFrame(src=url or outfile_path, width=width, height=height)

This fixes it for py3.6, and should work for 2.7 too (not tested).
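The pattern in this patch, sketched standalone: build the page as Unicode, encode it to UTF-8 explicitly, and write bytes. This is an illustration, not the package's code. The cut-down TEMPLATE, the write_html function and the csv_text argument (standing in for df.to_csv()) are all hypothetical, and Python 3 is assumed.

```python
import json
import os
import tempfile

# Hypothetical, heavily cut-down stand-in for the real HTML template.
TEMPLATE = u"""<!DOCTYPE html>
<html>
<body>
<pre id="data">%(csv)s</pre>
<script>var options = %(kwargs)s;</script>
</body>
</html>
"""


def write_html(csv_text, outfile_path, **kwargs):
    # 1. Fill the template while everything is still Unicode text.
    html = TEMPLATE % dict(csv=csv_text, kwargs=json.dumps(kwargs))
    # 2. Encode explicitly and write bytes ('wb'), so the platform's
    #    default text encoding is never consulted.
    with open(outfile_path, "wb") as outfile:
        outfile.write(html.encode("utf8"))
    return outfile_path


# The reproduction string from above, standing in for df.to_csv().
csv_text = u",\xe1rv\xedzt\u0171r\u0151 t\xfck\xf6rf\xfar\xf3g\xe9p\n0,42\n"
path = write_html(csv_text, os.path.join(tempfile.mkdtemp(), "pivottablejs.html"), rows=["a"])

with open(path, "rb") as f:
    page = f.read().decode("utf8")
```

Because the bytes are produced by an explicit .encode("utf8"), the file content is identical on Linux, macOS and Windows.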

nicolaskruchten · 2017-05-10T20:09:07Z

Interesting. In Python 2.7, your code results in an error on the line csv_data = ',\xe1rv\xedzt\u0171r\u0151 t\xfck\xf6rf\xfar\xf3g\xe9p\n0,42\n', but if I prefix the string with u, the code runs without error and the pivot table renders.

nicolaskruchten · 2017-05-10T20:13:30Z

I'm actually seeing the same behaviour in Python 3.5

indivisible · 2017-05-10T20:26:56Z

The problem is that you never specify an encoding, so the behaviour is system-dependent.
See the docs for open(), for example:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.

You should always specify the encoding you wish to use for text IO, even if it happens to work at the time on your dev environment.
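To make the point concrete, a small sketch (Python 3 assumed). It prints the platform default on the current machine, then shows that the same text yields different bytes under UTF-8 and cp1252. A file written under one and read under the other comes back garbled rather than raising an error.

```python
import locale

# The encoding open() falls back to in text mode when none is given;
# it varies by OS and locale.
print(locale.getpreferredencoding(False))

text = u"t\xfck\xf6rf\xfar\xf3g\xe9p"  # "tükörfúrógép"

as_utf8 = text.encode("utf-8")
as_cp1252 = text.encode("cp1252")
print(as_utf8)
print(as_cp1252)

# Bytes written as UTF-8 but read back as cp1252 decode "successfully"
# into the wrong characters (mojibake) instead of failing loudly.
misread = as_utf8.decode("cp1252")
print(misread)
```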

indivisible · 2017-05-10T20:36:03Z

Also, a cleaner alternative to my fix would be to simply open the file with io.open(outfile_path, 'w', encoding='utf8').
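A sketch of that approach, using a hypothetical write_text helper. It assumes everything being written is already Unicode text. On Python 2, a file opened with io.open in text mode only accepts unicode objects, not byte strings.

```python
import io
import os
import tempfile


def write_text(path, text):
    # io.open is available on Python 2.6+ and is the same object as the
    # built-in open() on Python 3. The explicit encoding means the result
    # no longer depends on the locale.
    with io.open(path, "w", encoding="utf8") as outfile:
        outfile.write(text)


path = os.path.join(tempfile.mkdtemp(), "pivottablejs.html")
content = u"<td>Kullan\u0131c\u0131 / \u0130sim</td>"
write_text(path, content)

# Read it back as text, and also as raw bytes to check what hit the disk.
with io.open(path, "r", encoding="utf8") as infile:
    round_trip = infile.read()

with open(path, "rb") as raw_file:
    raw = raw_file.read()
```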

nicolaskruchten · 2017-05-10T21:51:55Z

Ok, I'll look into that, thanks! To your earlier point, which specific call am I making that has system-dependent behaviour when no encoding is specified?


nicolaskruchten · 2017-05-10T21:52:40Z

Ah never mind, I see the link now :)


nicolaskruchten · 2017-05-11T14:35:36Z

@indivisible your suggestion does work in Python 3 but causes all sorts of encoding errors in Python 2.7 for me with the following CSV: https://github.com/nicolaskruchten/pivottable/blob/master/examples/mps.csv

indivisible · 2017-05-11T17:25:40Z

I see, sorry.

I finally installed Python 2.7 and took a closer look at the problem. I've made a gist with a script that reproduces the problem, sample outputs, and a slightly better patch.

nicolaskruchten · 2017-05-12T03:02:00Z

OK, nice, this seems like a reasonably elegant fix @indivisible, thanks!

Question re your gist: in output.txt the fourth test case fails... Is that with or without your patch? I wasn't able to replicate this test case locally, because here on my Mac the preferred encoding is still UTF-8 even with LC_ALL=en_US.

indivisible · 2017-05-12T07:38:57Z

Without. With the patch applied, all outputs matched. I have no idea how locale settings work on Macs; the only systems I could test on were Linux and Windows. The sample output is from Linux. On Windows the "preferred encoding" seems to be cp1252...
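A quick check (Python 3 assumed) of why a cp1252 default breaks on the samples from this thread. cp1252 has no code points for ű/ő, the Turkish dotless ı and dotted İ, or Chinese characters, whereas UTF-8 encodes all of them. The encodable helper is hypothetical.

```python
# Samples quoted in this thread (Hungarian reproduction, Turkish column
# names, Chinese value).
samples = {
    "Hungarian": u"\xe1rv\xedzt\u0171r\u0151 t\xfck\xf6rf\xfar\xf3g\xe9p",
    "Turkish": u"Kullan\u0131c\u0131 \u0130sim",
    "Chinese": u"巴西出口投资促进局北京代表处",
}


def encodable(text, codec):
    # True if every character in text has a representation in codec.
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


results = {
    name: (encodable(t, "cp1252"), encodable(t, "utf-8"))
    for name, t in samples.items()
}
for name, (cp, u8) in sorted(results.items()):
    print(name, "cp1252:", cp, "utf-8:", u8)
```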

Cheers!

nicolaskruchten · 2017-05-12T15:26:01Z

Awesome, thanks so much for the help! I've just released v0.7.0 with these fixes :)

nicolaskruchten added a commit that referenced this issue May 12, 2017

fixing Unicode issues reported in #2

4091a43

nicolaskruchten closed this as completed May 12, 2017
