I am trying to bootstrap a confidence interval for the difference in means between pg1 and pg2. Here's the "manual" version:

    import numpy as np
    import seaborn as sns
    from tqdm import tqdm

    # pg1, pg2, pg1_shape, pg2_shape and Nboot come from the prep work
    bd = []
    for i in tqdm(range(Nboot)):
        # resample each group independently, with replacement
        res_pg1 = np.random.choice(pg1, size=pg1_shape, replace=True, p=None)
        res_pg2 = np.random.choice(pg2, size=pg2_shape, replace=True, p=None)
        bd.append(np.mean(res_pg1) - np.mean(res_pg2))

    sns.histplot(bd)

Here's what I think is the equivalent via Pingouin:

    import pingouin as pg

    # same statistic as above: difference in means, unpaired samples
    ci, dist = pg.compute_bootci(
        x=pg1,
        y=pg2,
        func=lambda x, y: np.mean(x) - np.mean(y),
        confidence=0.9,
        n_boot=Nboot,
        paired=False,
        return_dist=True,
    )
    print(ci)

    sns.histplot(dist)

Why are the histograms so different? The distribution returned by `pg.compute_bootci()` - is it not the distribution of the output of the `func` parameter?

@FlorinAndrei you are right. I have just looked at the implementation of the recently added `scipy.stats.bootstrap`, and it does indeed resample each group separately when `paired=False`, which makes more sense because otherwise you're discarding data if `x` and `y` do not have the same length. My implementation was based on Matlab's `bootci`, which only supports the case where `x` and `y` have the same length:

> When you use multiple data input arguments d1,...,dN, you can specify some arguments as scalar values, but all nonscalar arguments must have the same number of rows.

I'll open an issue for this. Thanks
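
For anyone who wants to try the independent-resampling behaviour described above, here is a minimal sketch using `scipy.stats.bootstrap`. The arrays `group_a` and `group_b` are made-up stand-ins for `pg1` and `pg2` (the real dataset is not reproduced here), and `mean_diff` is just a helper name chosen for illustration. It assumes a SciPy version recent enough to support multi-sample statistics.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical data standing in for pg1 and pg2, deliberately of unequal length.
group_a = rng.normal(loc=10.0, scale=2.0, size=120)
group_b = rng.normal(loc=8.0, scale=2.0, size=80)


def mean_diff(x, y, axis=-1):
    """Difference in means along `axis` (the axis argument allows vectorized resampling)."""
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)


# paired=False: each group is resampled separately, at its own length.
res = stats.bootstrap(
    (group_a, group_b),
    mean_diff,
    n_resamples=2000,
    paired=False,
    vectorized=True,
    confidence_level=0.9,
    method="percentile",
    random_state=rng,
)

ci_low, ci_high = res.confidence_interval
observed = mean_diff(group_a, group_b)
print(f"observed difference: {observed:.3f}")
print(f"90% percentile CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

Because each group is resampled at its own length, unequal sample sizes are not a problem and no observations are discarded. The percentile method is used here only to keep the result directly comparable with a manual resampling loop.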
This is the dataset: Prep work:
Another question is that the confidence interval returned by
Which gives:
In the past, I've noticed |
@FlorinAndrei please see the source code of Pingouin here: lines 343 to 363 in 8825a21.

A few differences:

As for the different confidence intervals, please refer to the source code, which should hopefully be self-explanatory: lines 365 to 393 in 8825a21.

For the unit tests of the confidence intervals against Matlab, see `pingouin/pingouin/tests/test_effsize.py`, lines 80 to 94 in 8825a21.
Quick question: if |