CSV files are no longer imported correctly when double quotes inside strings are delimited with backslashes #1812

Closed
bjoernross opened this issue Oct 8, 2017 · 6 comments

@bjoernross

Expected Behavior

Gephi 0.9.2 should be able to import CSV files in which double quotes inside strings are escaped with backslashes (see below for an example). This worked in 0.9.1 but no longer does.

Also, an error message should be shown when an IOException occurs while reading a CSV file.

Current Behavior

Since 0.9.2, CSV files are only imported correctly when double quotes inside strings are escaped with double quotes. When another character is used, the import fails, beginning with the first line to contain the escaped quotes.

No error message is shown. The rest of the file is simply ignored. This means that when importing a file with 100,000 edges you may end up with 10,000 and not know why.
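Until the importer reports such failures itself, a file can be checked before import. The following is only a rough sketch in Python, not Gephi code: it parses the file under strict doubled-quote rules (roughly what 0.9.2 appears to expect) and reports how many records were read before the first parse error. The function name `first_bad_line` is made up for illustration.

```python
import csv
import io


def first_bad_line(text, delimiter=";"):
    """Parse CSV text strictly, treating only "" as an escaped quote.

    Returns (records_read, error_line), where error_line is the 1-based
    line number of the first parse error, or None if the text parsed cleanly.
    """
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar='"',
        strict=True,  # raise csv.Error instead of guessing at malformed input
    )
    records_read = 0
    try:
        for _ in reader:
            records_read += 1
    except csv.Error:
        return records_read, reader.line_num
    return records_read, None


if __name__ == "__main__":
    sample = '"Source";"Target"\n"Dwayne \\"The Rock\\" Johnson";"John Cena"\n'
    print(first_bad_line(sample))
```

Comparing `records_read` with the number of lines in the file gives a quick hint that an import would be cut short.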

Possible Solution

When importing a CSV file, let the user set the escape character, just like the user can currently set the field delimiter, or auto-detect it.

Pass this parameter to CSVFormat’s withEscape function.

Steps to Reproduce

  1. Create a file with the following content:
 "Source";"Target"
 "Dwayne \"The Rock\" Johnson";"John Cena"
  2. Data Laboratory -> Import Spreadsheet -> Select File, import as Edges List
  3. No data is shown in the preview. After clicking 'Finish', Gephi reports a successful import of zero edges and zero nodes.

Context

In CSV files, some people escape quotes inside strings with backslashes (see above). Some people escape them with double quotes:

 "Source";"Target"
 "Dwayne ""The Rock"" Johnson";"John Cena"

Gephi 0.9.1 expected the former; Gephi 0.9.2 expects the latter (as does Excel). I haven't seen this change documented anywhere. It means that, for example, some scripts written for older versions of Gephi that produce CSV files for import into Gephi are now broken.
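For scripts that already produce backslash-escaped files, one possible workaround is to rewrite the output in the doubled-quote style before importing. The sketch below uses Python's standard `csv` module and assumes a semicolon delimiter and that every field should be quoted, as in the examples above. The helper name `backslash_to_doubled` is hypothetical.

```python
import csv
import io


def backslash_to_doubled(text, delimiter=";"):
    """Rewrite CSV text that escapes quotes as \\" so that it uses "" instead."""
    # Read side: a backslash escapes the next character; "" is not special.
    source = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar='"',
        escapechar="\\",
        doublequote=False,
    )
    out = io.StringIO()
    # Write side: quote every field and double any embedded quote characters.
    target = csv.writer(
        out,
        delimiter=delimiter,
        quotechar='"',
        quoting=csv.QUOTE_ALL,
        doublequote=True,
        lineterminator="\n",
    )
    target.writerows(source)
    return out.getvalue()


if __name__ == "__main__":
    old_style = '"Source";"Target"\n"Dwayne \\"The Rock\\" Johnson";"John Cena"\n'
    print(backslash_to_doubled(old_style), end="")
```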

Your Environment

  • Version used: Gephi 0.9.2
  • OS: I have tested this on macOS and Windows.

Relevant part of messages.log:

SEVERE: IOException reading next record: java.io.IOException: (line 2) invalid char between encapsulated token and delimiter
SEVERE [null]: Last record repeated 4 more times.
INFO [DefaultProcessor]: # Nodes loaded: 0
INFO [DefaultProcessor]: # Edges loaded: 0
@eduramiba
Member

Thanks for the report!

@bjoernross
Author

It looks like if you simply pass the backslash as an escape character, both examples will work. So this might actually already be the solution for my use case.
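The same idea can be illustrated outside Gephi. This is a small Python sketch, not the Commons CSV code path Gephi uses: with a backslash set as the escape character and doubled quotes still enabled, both example files parse to the same records. The function name `parse_edges` is made up for this example.

```python
import csv
import io


def parse_edges(text, delimiter=";", escape="\\"):
    """Parse CSV text, accepting both \\" and "" as an escaped quote."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar='"',
        escapechar=escape,  # a backslash escapes the next character
        doublequote=True,   # "" inside a quoted field is one literal quote
    )
    return list(reader)


if __name__ == "__main__":
    backslash_style = '"Source";"Target"\n"Dwayne \\"The Rock\\" Johnson";"John Cena"\n'
    doubled_style = '"Source";"Target"\n"Dwayne ""The Rock"" Johnson";"John Cena"\n'
    print(parse_edges(backslash_style) == parse_edges(doubled_style))
```

One trade-off, at least in this Python version: once a backslash is an escape character, a literal backslash in the data has to be written as `\\`, otherwise it is dropped.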

Of course, the separate issue remains that the user should always be notified if something goes wrong during import.

@eduramiba
Member

Yeah I think I will do that, to not over-complicate the GUI.

@eduramiba
Member

Fix committed, please try with 0.9.3-SNAPSHOT when the Travis job ends: https://travis-ci.org/gephi/gephi/builds/285083055

@bjoernross
Author

Works great! Thank you.

@eduramiba
Member

Great! This will be part of future in-app updates of Gephi 0.9.2

eduramiba added a commit that referenced this issue Nov 18, 2017
Based on commit 6634da3

#1516 Edge labels not retained on graphml export
#1788 GephiFormatException: Gephi failed saving the project.
#1789 NullPointerException: The fileObject parameter cannot be null
#1802 Exception with no-merge strategy in some cases. Incompatible edge should not be created
#1810 GephiFormatException can cause ArrayIndexOutOfBoundsException: 0
#1811 NullPointerException on EdgeTypeFilter
#1812 CSV files are no longer imported correctly when double quotes inside strings are delimited with backslashes
#1815 Add support of Byte Order Mark to CSV parser
#1840 Import of graphml still confuses d3 and label fields
#1848 Import CSV error edges: force undirected makes edges disappear when merged