I’d be curious to hear how this would work in a production setting. Having a single instance makes the current implementation consistent at the expense of being “HA”.
I believe it’s not as simple as keeping a “warm” second instance.
Looking forward to part 3.
Yeah, I’ve been thinking about having shared-nothing instances and fanning out writes to each instance to keep them up to date. The features I want to power with this are all low-consistency ones like social proof, not correctness-critical things, but even then I want to see how correct I can make it without too much effort.
I’m actually rethinking the in-memory component of this. I’m tempted to use an LRU cache for bitmaps that are actively being used, load everything else on demand from sharded SQLite files, and sync back to SQLite on write with lower synchronous pragmas to keep the disk thrash down.
I want to power social proof for things like likes on posts as well, which should be doable with one Roaring Bitmap per post that tracks the UIDs of likers; then I can intersect those with the follow bitmaps too.
Ideally the entire graph should scale to the size of your local disk, with the available RAM used for caching hot values (though potentially a LOT of hot values, since the bitmaps are so small).
Is there a trick to using this with identifiers that aren’t integers? It doesn’t feel like it.
No, but you can intern your identifiers into integers (described in part 1). The integers stay internal to the service while still allowing queries and responses with the full string IDs.
Basically you can keep a nextUID uint32 and put a mutex on it, and every time you see a new string identifier, you assign it the nextUID and bump the int. You can then track everything internally by UID and keep a mapping of UIDs to string identifiers and string identifiers to UIDs that’s cheap and easy to do lookups in.

Cool, thank you. I missed that going over the article.