Import of graphml still confuses d3 and label fields #1840

Closed
vparunak opened this issue Nov 13, 2017 · 5 comments
@vparunak

Expected Behavior

Gephi should display the 'label' node attribute as the node's label.

Current Behavior

Gephi displays the 'label' attribute if there is no key 'd3', but otherwise displays attribute 'd3'.
This problem appears to be related to #1719 and #1575, both of which are reported fixed. But I'm still getting it with 0.9.2.

Possible Solution

Steps to Reproduce

  1. The attached archive contains two small graphml files. The one with filename *_ok.graphml displays fine. The one with filename *_bad.graphml does not. The actual labels differ between the two, because they are generated with a random algorithm, but structurally the only difference is whether the 'label' attribute is associated with key 'd3' or not.
    Archive.zip

Context

Your Environment

  • Version used: Gephi 0.9.2
  • Java version: 1.8
  • Operating System: Mac OSX 10.12.6
@eduramiba
Member

What does d3 contain in your file? The problem is some users want d3 to be the label, and others don't. Not sure what to do with this format, but it seems it's either not well defined or not well used by many tools (why "d3"? it says very little about what it contains).

Anyway, I suggest you try to remove that d3 attribute or try to make the label attribute appear after d3 attribute, so label overwrites it.

@vparunak
Author

vparunak commented Nov 14, 2017

I think there's some confusion here among keys, attributes, and values, that has led to an inconsistency in how Gephi handles node attributes.

GraphML allows the labeling of nodes and edges with data. This labeling takes the form of partial functions. Multiple functions can be defined over the same domain.
Internally, GraphML indexes these functions with the key construct. Here's an example of a conformant key statement (from graphdrawing.org, the official GraphML site):

<key id="d0" for="node" attr.name="color" attr.type="string">

This element defines a function over nodes ('for = "node"'), whose name is "color" and whose values are "string"s. Because there may be other functions over nodes, GraphML assigns a key, "d0", for cross-referencing within the .graphml file. Assuming we already have a element, here's how the value of this function for a given node is defined:

<node id="n2">
<data key="d0">blue</data>
</node>
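
To make the key/name distinction concrete, here is a minimal sketch that reads a GraphML document and reports each node's data under its attr.name rather than its key id. It uses only Python's standard library; the function name `node_attributes_by_name` and the `SAMPLE` document are made up for illustration and are not part of Gephi or networkx.

```python
import xml.etree.ElementTree as ET

# The GraphML namespace, bound to a prefix for ElementTree path lookups.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Illustrative document built from the key and node shown above.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="color" attr.type="string"/>
  <graph id="G" edgedefault="undirected">
    <node id="n2">
      <data key="d0">blue</data>
    </node>
  </graph>
</graphml>"""


def node_attributes_by_name(graphml_text):
    """Return {node id: {attr.name: value}}, resolving key ids to names."""
    root = ET.fromstring(graphml_text)
    # Map each key id (e.g. "d0") to its declared attr.name (e.g. "color").
    names = {
        key.get("id"): key.get("attr.name")
        for key in root.findall("g:key", NS)
        if key.get("for", "all") in ("node", "all")
    }
    result = {}
    for node in root.iterfind(".//g:node", NS):
        attrs = {}
        for data in node.findall("g:data", NS):
            name = names.get(data.get("key"))
            if name is not None:
                attrs[name] = data.text
        result[node.get("id")] = attrs
    return result


print(node_attributes_by_name(SAMPLE))
```

The key id "d0" never appears in the result; it is only used to look up the name "color".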

But d0 is just an index. The name of the attribute in question is "color". Graph generation packages (such as networkx for Python) don't give the user access to the key at all. One just adds named attributes. For example, if "pattern" is a networkx graph object, the statements

pattern.node[i]['detail'] = 0
pattern.node[i]['kernelQ'] = 0
pattern.node[i]['label'] = a_label

define three attributes, named respectively 'detail', 'kernelQ', and 'label'. Internally, these get mapped to keys d1, d2, d3 (in this order, as it happens, but that shouldn't matter). If we invoke the 'label' line first, it will go into d1, 'detail' into d2, and 'kernelQ' into d3. Importantly, GraphML has no intrinsic "label" attribute for nodes. If a user wants such an attribute, it has to be defined via a <key> element.
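
As a rough illustration of why the key id depends only on definition order, the sketch below hands out key ids in the order attribute names are first seen. `assign_key_ids` is a hypothetical helper, not networkx's actual exporter code, and numbering starts at d1 purely to match the example in this comment.

```python
def assign_key_ids(attribute_names, start=1):
    """Hypothetical exporter step: give each new attribute name the next key id."""
    key_ids = {}
    for name in attribute_names:
        if name not in key_ids:
            # The id is just a running counter; it carries no meaning of its own.
            key_ids[name] = f"d{start + len(key_ids)}"
    return key_ids


# Same three attributes, defined in two different orders.
order_a = assign_key_ids(["detail", "kernelQ", "label"])
order_b = assign_key_ids(["label", "detail", "kernelQ"])
print(order_a)  # 'label' ends up on d3
print(order_b)  # 'label' ends up on d1, 'kernelQ' on d3
```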

Now here's where Gephi is inconsistent. When Gephi unpacks a graphml file, the column headings in the Data Laboratory view are drawn for the most part from attribute names. (The initial 'id' column comes from the <node> element header itself.) If I define a 'foo' attribute, Gephi doesn't care whether its key is d1, d10, or d100. It extracts the attr.name and lists for each node the value defined by that node's <data> element. But there is one exception. If there is a key d3, Gephi picks that up (whatever its attr.name may be) and assigns it to the attribute 'Label'. If the user's 'label' attribute happens to be associated with key d1 or d2, it gets clobbered by the values associated with d3. Interestingly, Gephi also unpacks whatever attr.name is associated with d3 and presents that as a separate column in the Data Laboratory (with the correct attr.name), but the user's 'label' function is lost.

You are correct: if I always define my 'label' attribute with a key at or after d3, Gephi labels my graph correctly. But the current implementation imposes a constraint not in the graphml spec, by treating d3 as a distinguished key for label information, IF it exists and IF no higher key defines a label. This hardly seems a clean implementation.
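
The behavior described here can be modeled with a toy function. This is only a sketch of what is observed in this thread (key d3 is treated as a label source, and a later write replaces an earlier one), not Gephi's actual importer code; `imported_label` and its arguments are hypothetical names.

```python
def imported_label(data_entries, key_names, special_key="d3"):
    """Toy model: data_entries is a list of (key_id, value) in file order."""
    label = None
    for key_id, value in data_entries:
        # Either the distinguished key id or an attribute named 'label'
        # writes to the label; the last write wins.
        if key_id == special_key or key_names.get(key_id) == "label":
            label = value
    return label


# 'label' defined first, so it sits on d1 and an unrelated attribute sits on d3.
bad_keys = {"d1": "label", "d2": "detail", "d3": "kernelQ"}
bad_entries = [("d1", "alpha"), ("d2", "0"), ("d3", "0")]
print(imported_label(bad_entries, bad_keys))  # the kernelQ value replaces 'alpha'

# 'label' defined third, so it sits on d3 and nothing overwrites it.
ok_keys = {"d1": "detail", "d2": "kernelQ", "d3": "label"}
ok_entries = [("d1", "0"), ("d2", "0"), ("d3", "alpha")]
print(imported_label(ok_entries, ok_keys))
```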

So I'm running, but it would be nice if I didn't have to worry about the order in which I define node attributes to be sure that 'label' is always defined in third place.

Thanks for a really great package!

@eduramiba
Member

Oh that makes it more clear, thank you.

It's just that the original code comment, which mentions specific software, confused me:

properties.addNodePropertyAssociation(NodeProperties.LABEL, "d3"); // Default node label used by yEd from yworks.com.

So I guess we should just ignore the keys and only rely on attribute names. If some software exports labels, it will have to do it in an attribute with attr.name=label.
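
A name-only lookup along those lines might look like the sketch below. It is written in Python for brevity and is not the actual change made in Gephi; `label_by_name` and its arguments are hypothetical.

```python
def label_by_name(data_entries, key_names, label_name="label"):
    """Pick the label purely from attr.name; key ids are only used for lookup."""
    for key_id, value in data_entries:
        if key_names.get(key_id) == label_name:
            return value
    return None  # no attribute named 'label' was exported for this node


# 'label' sits on d1 and an unrelated attribute sits on d3.
key_names = {"d1": "label", "d2": "detail", "d3": "kernelQ"}
entries = [("d1", "alpha"), ("d2", "0"), ("d3", "0")]
print(label_by_name(entries, key_names))
```

With this rule, a file that stores something else under d3 no longer affects the label, and a tool that wants its labels picked up has to export them under attr.name="label".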

It actually works the same in gexf attributes (https://gephi.org/gexf/format/data.html), where they have a key and a title, so that sounds reasonable.

I will add this to 0.9.3 milestone, sorry for the inconvenience!

@vparunak
Author

You might add a configuration option to GraphML that allows the user to select any attribute visible in the Data Laboratory view for display in the Overview view. That would accommodate the yEd problem...and also might be useful for data exploration for other string-valued attributes.

@eduramiba
Member

Yeah, that's already possible to configure in overview settings panel.

eduramiba added a commit that referenced this issue Nov 18, 2017
Based on commit 6634da3

#1516 Edge labels not retained on graphml export
#1788 GephiFormatException: Gephi failed saving the project.
#1789 NullPointerException: The fileObject parameter cannot be null
#1802 Exception with no-merge strategy in some cases. Incompatible edge should not be created
#1810 GephiFormatException can cause ArrayIndexOutOfBoundsException: 0
#1811 NullPointerException on EdgeTypeFilter
#1812 CSV files are no longer imported correctly when double quotes inside strings are delimited with backslashes
#1815 Add support of Byte Order Mark to CSV parser
#1840 Import of graphml still confuses d3 and label fields
#1848 Import CSV error edges: force undirected makes edges disappear when merged

2 participants