Applications of Simulation in Peer 2 Peer Networking

_These are notes from a session at libp2p developers meetings in Berlin, 12-07-2018._

Designing distributed algorithms is hard.

# Simulation Driven Development

### P2P Simulation for Problem Solving
@kubuxu - KAD DHT gives us problems, but before that we had interesting

The DHT is probably the most complex system in IPFS when it comes to network topology & interaction. In most other parts of the system interactions are 1-to-1, whereas DHT interaction is whole-network.

Not too long ago we discovered that for a very long time we were shipping a beta DHT to peers, causing them

The current problem is that 40-60% of peers can't be dialed into, so when the DHT tries to dial someone because they are close in the DHT space, the requests time out.
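A minimal sketch of how this dialing problem could be reproduced in a simulation. This is not the go-libp2p-kad-dht implementation: peer IDs are random integers, closeness is plain Kademlia XOR distance, and `undialable_fraction` is a hypothetical model parameter standing in for peers that cannot accept inbound connections.

```python
import random

def closest_peers(peer_ids, target, k):
    """Return the k peer IDs closest to target by Kademlia XOR distance."""
    return sorted(peer_ids, key=lambda pid: pid ^ target)[:k]

def simulate_lookup_dials(n_peers=1000, k=20, undialable_fraction=0.5, seed=1):
    """Count how many of the k closest peers to a random target can be dialed.

    undialable_fraction is an assumed parameter: the share of peers whose
    inbound dials would time out (for example behind a NAT or firewall).
    """
    rng = random.Random(seed)
    peer_ids = rng.sample(range(2**32), n_peers)
    undialable = {pid for pid in peer_ids if rng.random() < undialable_fraction}
    target = rng.randrange(2**32)
    candidates = closest_peers(peer_ids, target, k)
    reachable = [pid for pid in candidates if pid not in undialable]
    return len(reachable), len(candidates)
```

Sweeping `undialable_fraction` between 0.4 and 0.6 would give a rough feel for how many dial attempts per lookup are wasted on timeouts in this simplified model.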


## Metrics & Request Tracing

### OpenCensus
Basically OpenTracing + stats.
Able to include opt-in metrics in OpenTracing spans.
@lanzafame has adapted the libp2p/rpc library to include method calls and arbitrary stats in cluster.
OpenCensus separates the creation of a metric from its aggregation. Record creation isn't aggregated in-band.
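The split between recording and aggregation can be illustrated with a toy sketch. This is not the OpenCensus API: `Recorder`, `aggregate`, and the measure and tag names used with them are hypothetical, chosen only to show raw records being collected first and summarized later, out of band.

```python
from collections import defaultdict

class Recorder:
    """Collects raw measurements; does no aggregation on the hot path."""

    def __init__(self):
        self.records = []

    def record(self, measure, value, tags=None):
        # Recording only appends a raw data point.
        self.records.append((measure, value, dict(tags or {})))

def aggregate(records, measure, tag_key):
    """Aggregate out of band: count and mean of one measure, grouped by one tag."""
    groups = defaultdict(list)
    for name, value, tags in records:
        if name == measure:
            groups[tags.get(tag_key)].append(value)
    return {key: {"count": len(vals), "mean": sum(vals) / len(vals)}
            for key, vals in groups.items()}
```

Because aggregation is a separate step, the same raw records can be regrouped by a different tag without touching the code that records them.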

### A note on tracing & performance
It's super expensive.
When cluster is running with full verbose tracing on, it can't even get anything out.
There's a difference between doing simulation to see emergent behaviours vs. debugging a specific issue.
Visualizations are better for seeing emergent behaviour.

Steps for looking for emergent behaviour:
* develop a hypothesis of what is going on
* visualize it 

Logging is a primitive form of visualization

### Visualization

### Hive Plots
The idea is to take advantage of a "grid" to create easier-to-reason-about plots that are comparable.
When networks are plotted in arbitrary space it's difficult to tell when two networks share the same topology.
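A rough sketch of the layout idea, under stated assumptions: nodes are assigned to one of three radial axes by relative degree and placed along the axis by normalized degree. The bucketing rule in `axis_for` is an illustrative choice, not something prescribed in the session. Because a position depends only on structure, two networks with the same topology but different node labels end up with the same set of coordinates.

```python
import math

def axis_for(degree, max_degree, n_axes=3):
    """Assumed rule: bucket nodes onto axes by relative degree, low to high."""
    return min(n_axes - 1, int(n_axes * degree / max_degree))

def hive_layout(edges, n_axes=3):
    """Return {node: (x, y)} with nodes placed on radial axes by degree."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    max_degree = max(degree.values())
    layout = {}
    for node, deg in degree.items():
        angle = 2 * math.pi * axis_for(deg, max_degree, n_axes) / n_axes
        radius = deg / max_degree  # distance from the centre along the axis
        layout[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return layout
```

Comparing `sorted(hive_layout(edges).values())` for two edge lists is a quick way to see whether they produce the same picture.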

### simulation vs production testing
* simulation can be done by an engineering team
* production testing has eaten up a *lot* of time, with fewer results
* can't control production environment much / at all

### network topologies & algorithm interaction
Simulation should help you dial out
We should be able to iterate through many network topologies, "fuzz".
The idea that a node can only dial out isn't something that can be, or is being, taken into consideration when designing an algorithm.
Specifying failure conditions. By analogy with TDD, we should build a simulator before we even write any networking code.
Adding in bad actors.
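The "fuzz" idea above could be sketched as follows, assuming a very simple model: each node dials a few random peers, only dialable peers accept dials, and a property is checked on every generated topology. The function names and parameters are hypothetical.

```python
import random

def random_topology(n, dial_out_only_fraction, dials_per_node, seed):
    """Build an undirected link set where only dialable nodes accept dials."""
    rng = random.Random(seed)
    nodes = list(range(n))
    dialable = [v for v in nodes if rng.random() >= dial_out_only_fraction]
    links = set()
    for v in nodes:
        targets = [t for t in dialable if t != v]
        for t in rng.sample(targets, min(dials_per_node, len(targets))):
            links.add(frozenset((v, t)))
    return nodes, links

def is_connected(nodes, links):
    """Depth-first search over the links, starting from the first node."""
    neighbours = {v: set() for v in nodes}
    for link in links:
        a, b = tuple(link)
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)

def fuzz(property_check, seeds, **topology_args):
    """Return the seeds whose generated topology violates the property."""
    return [s for s in seeds
            if not property_check(*random_topology(seed=s, **topology_args))]
```

Since every topology is derived from a seed, a failing case returned by `fuzz` can be regenerated exactly and inspected by hand.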

#### iterating on performance
In regard to simulation, before you simulate, you'd like to know if the algorithm is working in a "best-case" context. It's hard to know how the optimizations you're aiming to implement will affect the failure conditions of the algorithm in practice.

"Simulation driven Design"

#### expansion on IPTB
Three use cases:
* 1 to get a better understanding of _how_ something is working. Manual poking
* 2 in a testing environment: how do we specify performance regressions
* 3 Scale, how does this algorithm perform or fail with 5, 10, or 10000 nodes
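The scale question in the list above can be sketched as a sweep over network sizes. Push gossip is used here purely as a stand-in algorithm, not one discussed in the session, and `gossip_rounds` and `sweep` are hypothetical names.

```python
import random

def gossip_rounds(n_nodes, seed=0, max_rounds=1000):
    """Rounds of push gossip until every node is informed, or None on timeout."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n_nodes:
        if rounds == max_rounds:
            return None  # treated as a failure at this scale
        # Every informed node pushes to one uniformly random node.
        informed |= {rng.randrange(n_nodes) for _ in range(len(informed))}
        rounds += 1
    return rounds

def sweep(sizes, seed=0):
    """Run the same algorithm at several network sizes."""
    return {n: gossip_rounds(n, seed) for n in sizes}
```

For example, `sweep([5, 10, 10000])` returns a round count per size. Comparing those counts against a stored baseline is one possible way to express a performance regression.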


### Network Topology Specification
- [YANG Network Topology Data Model](https://tools.ietf.org/html/rfc8345)
- [YANG Tree Diagram Syntax](https://tools.ietf.org/html/rfc8340)
- [YANG Parser and Compiler](https://github.com/openconfig/goyang)

Using the OpenConfig YANG tooling we should be able to specify network configurations and get an AST out that we then use to construct the simulation network. It would require the addition of 'testing' aspects to YANG, e.g. adding random variation to the interface speed.

**Starting conditions**
* number of bootstraps
* only dial out
* number of connections
* NAT/firewall
* peer resources?
* stable conditions?
    * start from bootnodes and grows?
    * or already well-connected
    * or decrease?
    * !! record all of these changes?

**Flux conditions(over ? network)**
* growth of peers
* decrease of peers
* connectivity failures

**variant peer properties**
* latency
* degree of connectivity

**YANG example**
```
network {
    peer {
        connections: {jitter:0.5%},
        transport: …,
        latency: …,
        dial: in/out,
        node_type: bad_actor/new_actor, ()
    }
}
```
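One way the specification above might be turned into concrete simulated peers, sketched under assumptions: `PeerSpec` and `materialise` are hypothetical names, `dial: in/out` and `node_type` are read as a choice between alternatives, jitter is applied as a uniform relative variation, and the elided `transport` field is left out.

```python
import random
from dataclasses import dataclass

@dataclass
class PeerSpec:
    connections: int           # target number of connections per peer
    connections_jitter: float  # fraction; 0.005 corresponds to 0.5%
    latency_ms: float
    dial: str                  # "in" or "out"
    node_type: str             # e.g. "bad_actor" or "new_actor"

def materialise(spec, count, seed=0):
    """Expand one spec into `count` peers with jittered connection counts."""
    rng = random.Random(seed)
    peers = []
    for peer_id in range(count):
        factor = 1 + rng.uniform(-spec.connections_jitter, spec.connections_jitter)
        peers.append({
            "id": peer_id,
            "connections": max(0, round(spec.connections * factor)),
            "latency_ms": spec.latency_ms,
            "dial": spec.dial,
            "node_type": spec.node_type,
        })
    return peers
```

Keeping the random variation behind a seed means the same specification always expands to the same network, which matters when comparing runs.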

## Tests & Actions
These should be arbitrary functions


## Failure Conditions
To turn 

After some thought we came to the conclusion that it's

Two _Types_ of failure conditions:

## 1. A state in which we know a failure has occurred
* "If Peer A _ever_ has x, we know that we've failed"
* "If Peer B _ever_ has no connections, we've failed"
* "If Peers A & B _ever_ have no route to each other, we've failed"

This hints that the ability to attach state checks to individual nodes would be helpful here. You need to specify the behaviour of the nodes.
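A minimal sketch of such state checks, assuming the simulator exposes the current connections as a dictionary mapping each node to the set of its neighbours. `has_route` and `check_invariants` are hypothetical helpers, meant to be run after every simulation step.

```python
from collections import deque

def has_route(connections, a, b):
    """Breadth-first search: True if b is reachable from a."""
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for nxt in connections.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def check_invariants(connections, checks):
    """Run each named check on the current state; return the names that failed."""
    return [name for name, check in checks.items() if not check(connections)]

# Example checks mirroring the statements above (peer names are illustrative).
EXAMPLE_CHECKS = {
    "B has a connection": lambda c: len(c.get("B", ())) > 0,
    "A has a route to B": lambda c: has_route(c, "A", "B"),
}
```

A non-empty result from `check_invariants` at any step means the run has entered a known failure state and can stop immediately.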

### Pros
* 

### Cons
* Difficult to know ahead of time
* Failure condition may not always occur

## 2. A measurement-based failure occurs
* "If this takes more than 5 minutes, we've failed"
* "If there are more than 200 connections across the network, we've failed"
* "No activity has happened for x amount of time"
* "If each node on the network has"

### Pros
* Much easier to reason about
* easy to make general statements about the whole network / each node on the network
### Cons
* Not deterministic
* Hard to define things like "completion" sometimes


Being able to inspect the state of the global network is useful here. We can measure global state by having all nodes in the simulation report to a central location.
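The central reporting location could look roughly like this. It is a sketch, not an existing tool: `Collector`, its thresholds, and the failure names are assumptions, and time is whatever simulated clock value the caller passes in.

```python
class Collector:
    """Central place where every simulated node reports its state."""

    def __init__(self, max_total_connections=200, max_idle_seconds=60.0):
        self.max_total_connections = max_total_connections
        self.max_idle_seconds = max_idle_seconds
        self.connections = {}     # node id -> latest reported connection count
        self.last_activity = 0.0  # simulated time of the most recent report

    def report(self, node, n_connections, now):
        """Called by a node whenever its connection count changes."""
        self.connections[node] = n_connections
        self.last_activity = now

    def failures(self, now):
        """Return the measurement-based conditions that currently fail."""
        failed = []
        # Sums per-node counts, so a link is counted once per endpoint.
        if sum(self.connections.values()) > self.max_total_connections:
            failed.append("too many connections")
        if now - self.last_activity > self.max_idle_seconds:
            failed.append("no activity")
        return failed
```

Both thresholds are constructor arguments so that the same check can be reused across simulations with different scales and durations.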

## Specifying Failure conditions

Assigning tests to nodes
You're trying to ask a question of the network.
Frequently you're trying to initiate an action on a node, and evaluate the effect it has on the network.

### Time
* global clock
* measurement frequency

The most important thing to think about here is that time-bound measurement metrics should be configurable.
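A small sketch of what configurable time bounds might look like with a simulated global clock. `TimeConfig` and `run` are hypothetical. The default deadline of 300 seconds mirrors the "more than 5 minutes" example, and `step`, `is_done`, and `measure` are callbacks the simulation would supply.

```python
from dataclasses import dataclass

@dataclass
class TimeConfig:
    tick_seconds: float = 1.0        # how far the global clock moves per step
    measure_every: int = 10          # take a measurement every N ticks
    deadline_seconds: float = 300.0  # time bound after which the run has failed

def run(step, is_done, measure, config):
    """Advance a simulated clock until done or past the deadline.

    Returns (ok, elapsed_seconds, samples), where samples is a list of
    (elapsed_seconds, measurement) pairs.
    """
    elapsed, ticks, samples = 0.0, 0, []
    while not is_done():
        if elapsed >= config.deadline_seconds:
            return False, elapsed, samples
        step()
        ticks += 1
        elapsed = ticks * config.tick_seconds
        if ticks % config.measure_every == 0:
            samples.append((elapsed, measure()))
    return True, elapsed, samples
```

Because the clock is simulated, a five minute bound costs only as many loop iterations as there are ticks, and the bound can be tightened or relaxed per test without touching the algorithm under test.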

### Related Github Issues:
- https://github.com/ipfs/notes/issues/34
- https://github.com/libp2p/js-libp2p/issues/215
- https://github.com/

#### Papers
- [Topology generators for Software Defined Network testing](https://ipfs.io/ipfs/QmWRitPWrHuRwFcJXjZPQvzHPay6fyVENiToJ6SUdsUrWC)

### Existing Tooling & Useful Links
- [Hashicorp Serf Convergence Simulator](https://www.serf.io/docs/internals/simulator.html)
- [Metamask Mesh Testing](https://github.com/MetaMask/mesh-testing) - _includes visualization, recently added support for pubsub_
- [mininet](http://mininet.org/sample-workflow/)
- [Eth Swarm simulation tool](https://github.com/ethersphere/go-ethereum/blob/a5e7f0cf9c7c2dcd7f516315a063c410f81ffe83/p2p/simulations/README.md)
- [Eth Swarm viz tool](https://github.com/ethersphere/simple-p2p-d3/)
- [Go Libp2p Swarm Testing](https://github.com/libp2p/go-libp2p-swarm/blob/master/testing/testing.go)
- [Qri p2p-testbed](http://github.com/qri-io/p2p-testbed/) - _includes opentracing spans_
- [CJDNS](https://github.com/cjdelisle/cjdns) - _@kubuxu mentioned there's a visualizer in there somewhere..._
- [Hive Plots](http://hiveplot.net/) - _a method for consistently visualizing network graphs_

### Physical Networks with Testing Support
- planet lab
- ripe atlas

### Initial Session Abstract:
A discussion of best practices for presenting & iterating on networking solutions, with the goal of refining a document that attempts to distinguish, enumerate, and label different approaches to p2p network simulation, outlining strengths, weaknesses, and examples.

Points of context for this discussion may include:

* machine-local simulation
* the intersection of testing and simulations
* container-based network simulations
* cryptocurrency test networks
* language-agnostic & multi-language environments
* how the different layers of libp2p map to each approach
