Sum edges merge strategy gives surprising results #2038
Comments
Hi. If the edges don't have an interval, their weights will be summed. If there are intervals/timestamps of existence, the intervals/timestamps will be merged instead.
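The rule described in this comment can be sketched roughly as follows. This is only an illustration in Python, not Gephi's actual implementation: the `Edge` class and `merge_pair` function are hypothetical names, intervals are modeled as plain `(start, end)` tuples, and keeping the first edge's weight in the interval case is an assumption, since the comment only says the weights are not summed there.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    # Hypothetical edge record for this sketch; not a Gephi type.
    source: str
    target: str
    weight: float = 1.0
    intervals: list = field(default_factory=list)  # [(start, end), ...]


def merge_pair(a: Edge, b: Edge) -> Edge:
    """Merge two parallel edges following the rule described above."""
    if not a.intervals and not b.intervals:
        # No intervals of existence: the weights are summed.
        return Edge(a.source, a.target, a.weight + b.weight)
    # Intervals present: the spells are merged and the weight is NOT summed.
    # Keeping a.weight here is an assumption made for the sketch.
    spells = sorted(set(a.intervals) | set(b.intervals))
    return Edge(a.source, a.target, a.weight, spells)
```

Under this rule, two static edges of weight 2.0 and 3.0 merge into one edge of weight 5.0, while two edges with spells `(0, 1)` and `(2, 3)` merge into one edge that lists both spells but carries only a single weight.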
There are multiple possible semantics here. The implementation that exists doesn't cover the two that are arguably the most reasonable.
Given that there are multiple semantics, the behaviour should be properly documented, and there should probably be alternatives to how the edge merge is performed. By way of background: under the analysis being performed, the edges are indeed equivalent, representing multiple interactions between the addresses in the mail history. Note that the bug also includes serial overwrites of existing data; if the edges are not equivalent, this should not be happening. It is.
Please remove the "Not an issue" label; it is incorrect.
We will need to review the behavior in each case, but I guess that, when merged, weight values can only be summed if the intervals overlap or are the same. The overwriting of non-dynamic values seems unavoidable given that edges are chosen to be merged. You can still choose not to merge them, and you will keep the original edges.
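A minimal sketch of the "sum only if the intervals overlap" idea, assuming closed `(start, end)` intervals. The function names are hypothetical and this is not how Gephi decides anything; it only makes the condition concrete.

```python
def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def summed_weight(w1, i1, w2, i2):
    """Return w1 + w2 if the intervals overlap, else None (no single sum applies)."""
    return w1 + w2 if overlaps(i1, i2) else None
```

With this definition, weights on `(0, 5)` and `(3, 8)` can be summed, while weights on `(0, 1)` and `(2, 3)` cannot.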
At the moment, data is being lost, but not in all cases. It looks to me very much like the intended behaviour was what I originally expected. The intervals are merged when the edge merge is done: each individual spell is listed in the spells. The attributes are not, despite the fact that the spec describes attributes as unbounded list items. Not merging the edges is not an option. The graphs I am working with have on the order of half a million edges, and I am finding that Gephi is unable to do layout on these effectively unless there is edge flattening. If I perform the edge flattening in the code that actually constructs the graph, Gephi complains that attributes are overlapping in time (despite being attached to different edges).
To illustrate, the following is the GEXF export of the summed data from the file above:
It would be perfectly reasonable to expect it to end up looking like this:
Hi, thanks for the detailed case. I've reviewed it and my conclusion is that the weight should indeed be merged. For static weights, we generally define edge existence as a triplet (source, target, type). Given that your GEXF doesn't specify the weight column, it's implied that weight is static (default value) and therefore it should be merged. With this change, we would still give two options for the users not wanting to merge
I hope I'm getting this right; let me know if I'm missing something. Regarding the attributes not being merged correctly, that was a separate issue, which has since been resolved.
Yeah, it's been three years since I was thinking about this. That's probably right.
Thanks for your patience! The fix will be included in the upcoming 0.9.3 release.
Expected Behavior
The "Sum" merge strategy should give a merged edge whose weight is the sum of the weights of the merged edges and whose attributes are the union of the attributes of the merged edges.
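To make the expected result concrete, here is a small Python sketch that flattens parallel edges by summing their weights and keeping every attribute value. The tuple layout, the `sum_merge` name, and the attribute names used with it are assumptions made for illustration; this is not Gephi code.

```python
from collections import defaultdict


def sum_merge(edges):
    """Merge parallel edges: sum the weights, keep every attribute value.

    `edges` is an iterable of (source, target, weight, attrs) tuples, where
    attrs is a dict. This layout is an assumption made for the example.
    """
    merged = {}
    for source, target, weight, attrs in edges:
        entry = merged.setdefault(
            (source, target), {"weight": 0.0, "attrs": defaultdict(list)}
        )
        entry["weight"] += weight
        for key, value in attrs.items():
            # Append rather than assign, so earlier values are not lost.
            entry["attrs"][key].append(value)
    return merged
```

For two edges between the same pair of addresses, each with weight 1.0 and its own message-ID, the merged edge would carry weight 2.0 and a list holding both message-IDs.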
Current Behavior
Edge weights are not summed and edge attributes are serially overwritten.
Possible Solution
Given that the last attvalue is reflected in the edge, it appears that the container for edge attributes is not being extended on read.
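The suspected mechanism can be shown with a short Python sketch that reads `attvalue` elements in two ways. The XML fragment is a simplified, namespace-free stand-in written for this example (the attribute id and values are made up), and the two reader functions are hypothetical; none of this is taken from Gephi's importer.

```python
import xml.etree.ElementTree as ET

# Simplified fragment written for this example; the attribute id "0" and
# the values are made up.
FRAGMENT = """
<edge source="a" target="b">
  <attvalues>
    <attvalue for="0" value="first"/>
    <attvalue for="0" value="second"/>
  </attvalues>
</edge>
"""


def read_overwriting(edge):
    # Plain assignment: each attvalue replaces the previous one.
    values = {}
    for av in edge.iter("attvalue"):
        values[av.get("for")] = av.get("value")
    return values


def read_extending(edge):
    # Append to a list so that earlier values survive.
    values = {}
    for av in edge.iter("attvalue"):
        values.setdefault(av.get("for"), []).append(av.get("value"))
    return values


edge = ET.fromstring(FRAGMENT)
print(read_overwriting(edge))  # {'0': 'second'}
print(read_extending(edge))    # {'0': ['first', 'second']}
```

The first reader reproduces the symptom reported here, where only the last attvalue survives; the second keeps all of them.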
Steps to Reproduce
Context
I have an mbox contacts construction program here that writes out a GEXF-formatted email contact graph with message-ID and date information annotated onto the edges. The edge weight is used to perform community detection and PageRank on the nodes, but the edge weights here are underestimated and the message-IDs are lost.
Your Environment