forked from valentinitnelav/bootstrapnet
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathexplain-sampling-method.Rmd
230 lines (192 loc) · 8.72 KB
/
explain-sampling-method.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
---
title: "Explain me the sampling method"
author: "by [Valentin Stefan](https://github.com/valentinitnelav) - last update `r format(Sys.time(), '%d %B %Y')`"
---
```{r global-options, include=FALSE}
knitr::opts_chunk$set(echo=FALSE, warning=FALSE, message=FALSE)
```
```{r setup, include=FALSE}
# For avoiding long waiting time, read already saved/cached objects
nest <- readRDS(file = "./man/cache/explain-sampling-method-nest-1.rds")
```
```{r load-packages}
# Install bootstrapnet if not already done:
# install.packages("devtools")
# devtools::install_github("valentinitnelav/bootstrapnet")
library(bootstrapnet)
library(bipartite)
library(dplyr)
library(data.table)
library(ggplot2)
library(animation)
library(patchwork)
library(filesstrings)
data(Safariland)
```
```{r global-functions, include=FALSE}
matrix_plot <- function(data){
ggplot(data = data,
aes(x = Insects,
y = Plants)) +
geom_raster(aes(fill = counts_fct)) +
scale_fill_manual(values = "gray", na.value = "transparent") +
geom_text(aes(label = counts),
size = 3) +
coord_fixed() +
theme_bw() +
theme(axis.text.x = element_text(size = 9, angle = 90, hjust = 1, vjust = 0.3),
axis.text.y = element_text(size = 9),
legend.position = "none")
}
matrix_plot_sampled <- function(data){
data %>%
count(lower, higher) %>%
rename(Plants = lower,
Insects = higher) %>%
full_join(x = saf_long, y = ., by = c("Plants", "Insects")) %>%
mutate(counts = n,
counts_fct = ifelse(is.na(n), yes = NA, no = "red"),
Plants = factor(Plants, levels = levels(saf_long$Plants)),
Insects = factor(Insects, levels = levels(saf_long$Insects))) %>%
matrix_plot() +
theme(axis.title = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank())
}
plot_nest <- function(df_nest){
ggplot(data = df_nest,
aes(x = n_interctions,
y = nestedness)) +
geom_point() +
geom_line() +
scale_x_continuous(breaks = seq(from = 100, to = 1200, by = 100),
limits = c(100, 1150)) +
scale_y_continuous(breaks = seq(from = 15, to = 35, by = 5),
limits = c(19, 35)) +
theme_bw()
}
```
# One iteration
## Explain
Consider the `Safariland` web from the `bipartite` package, which looks like this:
```{r about-Safariland, fig.cap = 'Fig. 1 - Safariland web from which we will sample without replacement', fig.align="center"}
saf_long <- reshape2::melt(Safariland) %>%
rename(Plants = Var1,
Insects = Var2,
counts = value) %>%
filter(counts != 0) %>%
mutate(counts_fct = "gray",
Plants = reorder(Plants, desc(Plants)))
matrix_plot(saf_long)
# visweb(Safariland, labsize = 4, text = "interaction", textsize = 2)
# kable(Safariland)
```
There are `r sum(Safariland)` total interactions in the web. We can draw the first `start` sample of, say, 100 random interactions without replacement.
```{r get-split-indices}
dt <- Safariland %>% web_matrix_to_df() %>% setDT()
ids_lst <- bootstrapnet:::sample_indices(data = dt, start = 100, step = 50, seed = 666)
```
Side note: for a start value of 100 interactions and a step of 50, the sampling procedure splits the `r sum(Safariland)` interactions into `r length(ids_lst)` sub-webs (first one contains 100 interactions, then each subsequent one contains approximatively 50 more sampled interactions on top of the previous one).
So, with the start sample of 100 interactions we can form this web below (Fig 2). All names of plants and insects are kept for easy visual comparison with the entire web from above (Fig 1).
```{r start-sample, fig.cap = 'Fig. 2 - The web formed with the first sample of interactions from Safariland', fig.align="center"}
# dt[ids_lst[[1]], table(lower, higher)] %>%
# visweb(labsize = 1, text = "interaction", textsize = 2)
dt[ids_lst[[1]]] %>%
count(lower, higher) %>%
rename(Plants = lower,
Insects = higher) %>%
full_join(x = saf_long, y = ., by = c("Plants", "Insects")) %>%
mutate(counts = n,
counts_fct = ifelse(is.na(n), yes = NA, no = "red"),
Plants = factor(Plants, levels = levels(saf_long$Plants)),
Insects = factor(Insects, levels = levels(saf_long$Insects))) %>%
matrix_plot()
```
Adding about 50 more sampled interactions to the previous one, we get the second sub-web, which looks like this (Fig. 3):
```{r second-sample, fig.cap = 'Fig. 3 - The web formed with adding the second sample of interactions from Safariland', fig.align="center"}
# dt[ids_lst[[2]], table(lower, higher)] %>%
# visweb(labsize = 1, text = "interaction", textsize = 2)
dt[ids_lst[[2]]] %>%
count(lower, higher) %>%
rename(Plants = lower,
Insects = higher) %>%
full_join(x = saf_long, y = ., by = c("Plants", "Insects")) %>%
mutate(counts = n,
counts_fct = ifelse(is.na(n), yes = NA, no = "red"),
Plants = factor(Plants, levels = levels(saf_long$Plants)),
Insects = factor(Insects, levels = levels(saf_long$Insects))) %>%
matrix_plot()
```
Note that, in the second sub-web we managed to sample new species by sampling 50 more interactions:
- higher level species (OX axis): `r setdiff(dt[ids_lst[[2]],higher], dt[ids_lst[[1]],higher])`
- lower level species (OY axis): `r setdiff(dt[ids_lst[[2]],lower], dt[ids_lst[[1]],lower])`
And we keep on sampling without replacement until we sample the entire Safariland. For each sub-web we can compute a network index, say 'nestedness'. So, we get a vector of `r length(ids_lst)` values, the last one corresponding to the entire network.
```{r nestedness-1-boot, fig.cap = 'Fig. 4 - Nestedness values for each sampled web. The last value corresponds to the entire Safariland web', fig.align="center"}
metric_lst <- vector(mode = "list", length = length(ids_lst))
for (i in 1:length(ids_lst)){
metric_lst[[i]] <- try({
web <- dt[ids_lst[[i]], table(lower, higher)]
set.seed(42)
bipartite::networklevel(web = web, index = "nestedness", level = "both")
})
}
# Prepare results
df_nest <- metric_lst %>%
lapply(rbind) %>%
lapply(as.data.frame) %>%
data.table::rbindlist()
df_nest$n_interctions <- ids_lst %>% sapply(length)
ggplot(data = df_nest,
aes(x = n_interctions,
y = nestedness)) +
geom_point() +
geom_line() +
theme_bw()
```
## Animation
Below is an animation of the sampling method (one iteration):
```{r build-gif, eval = FALSE}
safariland_gg <- matrix_plot(saf_long) +
scale_x_discrete(position = "top") +
theme(axis.text.x = element_text(size = 9, angle = 90, hjust = 0),
axis.title = element_blank())
saveGIF({
for (i in 1:length(ids_lst)) {
safariland_sample_gg <- dt[ids_lst[[i]]] %>% matrix_plot_sampled()
nest_gg <- plot_nest(df_nest[1:i]) +
coord_fixed(ratio = 20)
wrap_plots(safariland_gg, safariland_sample_gg, nest_gg,
ncol = 1) %>%
plot()
}
},
movie.name = "sample-nestedness-1-boot.gif",
ani.width = 600, ani.height = 800,
interval = 0.5)
file.move(files = "./sample-nestedness-1-boot.gif",
destinations = "./man/cache",
overwrite = TRUE)
```
![](https://github.com/valentinitnelav/bootstrapnet/raw/master/man/cache/sample-nestedness-1-boot.gif)
# Multiple iterations
If we repeat this sampling procedure n times, then we get n lines. We can then compute an average line with 95% quantile based confidence intervals as in Fig. 5 below.
```{r compute-nestedness-n-boots, eval = FALSE}
nest <- list(web = Safariland) %>%
lapply(web_matrix_to_df) %>%
boot_networklevel(col_lower = "lower", # column name for plants
col_higher = "higher", # column name for insects
index = "nestedness",
level = "both", # here, nestedness is not affected by level
start = 100,
step = 50,
n_boot = 10,
n_cpu = 2)
saveRDS(nest, file = "./man/cache/explain-sampling-method-nest-1.rds")
```
```{r nestedness-n-boots, fig.cap = 'Fig. 5 - Sampled nestedness values, 10 interations. The thick continuous line represents the average line and the dashed lines depict the 95% quantile based confidence intervals around the mean line.', fig.align="center"}
gg_networklevel(nest)$nestedness +
labs(x = "n_interctions") +
theme_bw() +
theme(legend.position = "none")
```
Such accumulation/rarefaction curves allow comparison of networks/webs with different number of interactions. Ideally the indices/metrics will be compared if the curves display a trend of reaching an asymptote. That means that if we keep on investing effort to sample interactions (observe plant-pollinator in the field) we will not gain much further information, so network comparison is already possible.