[Image: pecks_hist]

I am trying to bootstrap a confidence interval for the difference in means between pg1 and pg2. Here's the "manual" version:

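```python
import numpy as np
import seaborn as sns
from tqdm import tqdm

# Resample each group independently, with replacement, and record the difference in means
bd = []
for i in tqdm(range(Nboot)):
    res_pg1 = np.random.choice(pg1, size=pg1_shape, replace=True, p=None)
    res_pg2 = np.random.choice(pg2, size=pg2_shape, replace=True, p=None)
    bd.append(np.mean(res_pg1) - np.mean(res_pg2))

sns.histplot(bd)
```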

[Image: man_hist]

Here's what I think is the equivalent via Pingouin:

```python
import pingouin as pg

# Bootstrap the difference in means with Pingouin and return the bootstrap distribution
ci, dist = pg.compute_bootci(
    x=pg1,
    y=pg2,
    func=lambda x, y: np.mean(x) - np.mean(y),
    confidence=0.9,
    n_boot=Nboot,
    paired=False,
    return_dist=True
)
print(ci)

sns.histplot(dist);
```

[Image: pg_hist]

Why are the histograms so different?

Isn't the distribution returned by pg.compute_bootci() supposed to be the distribution of the output of the func parameter?

@FlorinAndrei you are right. I have just looked at the implementation of the recently added scipy.stats.bootstrap, and it does resample each group separately when paired=False. That makes more sense, because otherwise you're discarding data if x and y do not have the same length. My implementation was based on Matlab's bootci, which only supports the case where x and y have the same length:

> When you use multiple data input arguments d1,...,dN, you can specify some arguments as scalar values, but all nonscalar arguments must have the same number of rows.

I'll open an issue for this. Thanks
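
For anyone landing here later, this is a rough sketch of the scipy.stats.bootstrap call that resamples each group separately. The synthetic `pg1` and `pg2` arrays, their sizes, the seed, and the `mean_diff` helper are placeholders I made up for illustration (the original data is not shown in this thread), and `method="percentile"` is just one of the available interval methods.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical stand-ins for pg1 and pg2; note the deliberately different lengths
pg1 = rng.normal(loc=10.0, scale=2.0, size=120)
pg2 = rng.normal(loc=9.0, scale=2.0, size=80)


def mean_diff(x, y, axis=-1):
    # Vectorized statistic: SciPy passes batches of resamples along `axis`
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)


res = stats.bootstrap(
    (pg1, pg2),            # each sample is resampled independently
    mean_diff,
    n_resamples=2000,
    paired=False,          # no shared indices between the two groups
    vectorized=True,
    confidence_level=0.9,
    method="percentile",
    random_state=rng,
)

ci_low, ci_high = res.confidence_interval
print(ci_low, ci_high)
```

With `paired=False` the two samples can have different lengths and every observation in both groups can be drawn, which is what the manual loop in the question does.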

Answer permalink: https://github.com/raphaelvallat/pingouin/discussions/274#discussioncomment-2972610

meaning of pg.compute_bootci() with custom function #274

Answered by raphaelvallat
FlorinAndrei asked this question in Q&A

Answer selected by raphaelvallat