Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dispatcher classes #209

Merged
merged 4 commits into from
Nov 2, 2022
Merged

Conversation

leotrs
Copy link
Collaborator

@leotrs leotrs commented Nov 1, 2022

As I was writing this comment, I realized that the current state of finding/creating/dispatching NodeStats is sorta ridiculous. So I decided to improve things.

  1. This PR removes three (!) classes (StatDispatcher and its descendants), creates two functions instead (stats.dispatch_stat and stats.dispatch_many_stats).
  2. As a result, the IDView classes no longer need to create and hold reference to a dispatcher object.
  3. On the other hand, IDView classes now need to hold a reference to the network itself, in the newly created instance variable _net.

The improvement is clear from the following benchmark. Note that this is measuring the time it takes to access H.nodes.degree on an empty network.

Before:

>>> import xgi
>>> import cProfile
>>> cProfile.run("H = xgi.Hypergraph(); [H.nodes.degree for i in range(1000000)]")
         3000017 function calls (3000016 primitive calls) in 0.854 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.530    0.530    0.850    0.850 <string>:1(<listcomp>)
        1    0.003    0.003    0.854    0.854 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 __init__.py:114(__init__)
        1    0.000    0.000    0.000    0.000 __init__.py:128(__init__)
        1    0.000    0.000    0.000    0.000 __init__.py:135(__init__)
        2    0.000    0.000    0.000    0.000 __init__.py:71(__init__)
        1    0.000    0.000    0.000    0.000 __init__.py:78(__getattr__)
        1    0.000    0.000    0.000    0.000 hypergraph.py:101(__init__)
  1000000    0.088    0.000    0.088    0.000 hypergraph.py:256(nodes)
        1    0.000    0.000    0.000    0.000 reportviews.py:520(__init__)
        1    0.000    0.000    0.000    0.000 reportviews.py:613(__init__)
        2    0.000    0.000    0.000    0.000 reportviews.py:86(__init__)
  1000000    0.157    0.000    0.233    0.000 reportviews.py:98(__getattr__)
        1    0.000    0.000    0.854    0.854 {built-in method builtins.exec}
1000001/1000000    0.076    0.000    0.076    0.000 {built-in method builtins.getattr}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}

After this PR:

>>> import xgi
>>> import cProfile
>>> cProfile.run("H = xgi.Hypergraph(); [H.nodes.degree for i in range(1000000)]")
         1000014 function calls in 0.285 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.193    0.193    0.282    0.282 <string>:1(<listcomp>)
        1    0.003    0.003    0.285    0.285 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 __init__.py:425(dispatch_stat)
        1    0.000    0.000    0.000    0.000 __init__.py:64(__init__)
        1    0.000    0.000    0.000    0.000 hypergraph.py:101(__init__)
  1000000    0.089    0.000    0.089    0.000 hypergraph.py:256(nodes)
        1    0.000    0.000    0.000    0.000 reportviews.py:100(__getattr__)
        1    0.000    0.000    0.000    0.000 reportviews.py:529(__init__)
        1    0.000    0.000    0.000    0.000 reportviews.py:621(__init__)
        2    0.000    0.000    0.000    0.000 reportviews.py:88(__init__)
        1    0.000    0.000    0.285    0.285 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}

This is a 3x improvement in the time it takes to access an attribute!.

Technical note: the way that things used to work is that a NodeView object would create and hold reference to a NodeStatDispatcher object. Then, whenever a NodeStat was requested, say by executing H.nodes.degree, the NodeView would rely on the NodeStatDispatcher to find and create the corresponding NodeStat object. After this PR, this is no longer the case, and NodeStat just calls the dispatch_stat function instead. No need for middle-men dispatcher classes!

…etattr__. Refactor: StatDispatcher no longer keeps a reference to the network or the view - this is in preparation for the next commit where each IDView will no longer keep a separate IDDispatcher instance
…e constructor -- instead, it stores the class and instantiates it
…tDispatcher have been removed and the functions dispatch_stat and dispatch_many_stats have been created instead. As a result, IDView contains one fewer instance variable (_dispatcher has been removed).
@maximelucas maximelucas mentioned this pull request Nov 2, 2022
@maximelucas
Copy link
Collaborator

Nice. I'm not following all the implementation details, but I did think a few days ago that the stats package had many classes.

@leotrs leotrs merged commit 3ac4417 into main Nov 2, 2022
@leotrs leotrs deleted the refactor-remove-dispatcher-classes branch November 2, 2022 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants