Add percentages instead of counts to countplot #1027
Comments
As of v0.13, normalization is built directly into countplot:

sns.countplot(diamonds, x="cut", stat="percent")  # or "proportion"

The recommendation is otherwise to use histplot:

sns.histplot(tips, x="day", hue="sex", stat="percent", multiple="dodge", shrink=.8)

Original answer (context for the rest of the thread): This is already pretty easy to do with barplot, e.g.
Oh! IMHO it would be worth adding this example somewhere in the docs. Thanks
Unfortunately, this doesn't work if both x and y are non-numeric. I get the following error:
Pass |
Can you explain from your original example - ax = sns.barplot(x="x", y="x", data=df, estimator=lambda x: len(x) / len(df) * 100)
If I understood the answers to these two questions, I could move forward solving this on my own. Thanks so much for your time and attention. |
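For readers with the same question about the estimator: barplot groups the rows by the x value and calls the estimator on each group's y values, so inside the lambda `len(x)` is the size of one group while `len(df)` is the total number of rows. The following is a minimal pure-Python sketch of that arithmetic, not seaborn's implementation; the data and the helper name `percent_per_category` are made up for illustration.

```python
from collections import defaultdict


def percent_per_category(values, estimator):
    """Group identical values, then apply `estimator` to each group.

    Hypothetical helper that mimics how a bar plot aggregates one
    group at a time; it is not part of seaborn.
    """
    groups = defaultdict(list)
    for value in values:
        groups[value].append(value)
    return {category: estimator(members) for category, members in groups.items()}


data = ["a", "a", "b", "c", "c", "c", "c", "b"]  # made-up data
total = len(data)  # plays the role of len(df)

# Same shape as the estimator in the thread: group size / total rows * 100
result = percent_per_category(data, lambda x: len(x) / total * 100)
print(result)
```

Each bar height is therefore the share of rows that fall in that category, expressed as a percentage.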
That's certainly one way to do it. But it is by no means the only way to do it. What if someone wants to have both

That said, I think people are somewhat forgetting that, while it can be convenient to be able to pass a full dataset to a plotting function and get a figure in one step, pandas is quite useful. It's really not very difficult to generate the plot you want, exactly the way you want it, with just one more step external to seaborn:
You can even do this in one method chain, saving a temporary variable name, if that's your preferred style:
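The code that accompanied this comment did not survive in this copy of the thread. As a stand-in, here is a minimal sketch of what such a pandas step might look like, assuming a DataFrame with a categorical column named `day` (made-up data); it is not the commenter's original snippet.

```python
import pandas as pd

# Made-up data; any DataFrame with a categorical column would do.
df = pd.DataFrame({"day": ["Thur", "Fri", "Sat", "Sat", "Sun", "Sun", "Sun", "Sat"]})

# One method chain: relative frequencies -> percentages -> tidy two-column frame.
pct = (
    df["day"]
    .value_counts(normalize=True)   # share of rows per category
    .mul(100)                       # convert shares to percentages
    .rename_axis("day")             # name the index so it becomes a column
    .reset_index(name="percent")    # columns: "day", "percent"
)
print(pct)
```

The resulting frame can then be handed to a bar-drawing function, for example `sns.barplot(data=pct, x="day", y="percent")`.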
I appreciate the response. And naturally it's not the only way to do it. And I can also appreciate the difficulty in finding where to draw the line for a suitably general API. Had I not seen the R snippet above and also stumbled across this discussion thread, I would probably not have bothered to say anything. But it looks to me like having some kind of normalized rendition could be a pretty generalized need. (I notice that ggplot outputs these values with
but still gives normalized values on the graph. I doubt it throws anyone for too big of a loop. Or am I misunderstanding how you propose that normalized values are obtained?)

I may be completely wrong in my idea that this is a reasonably generalized desire, and I'm not sure if there's a good way to find out, though this thread and Stack Exchange are suggestive at least. I posted because the ggplot inclusion of this functionality was also suggestive to me that it is of general use. My inexperience with ggplot may mean that there's something important I'm missing.

I can also appreciate the argument that this can be done in basically a one-liner in pandas. But I find this line of reasoning a little strange, because of the inclusion of countplot in the first place. I've only had a glance at the code for countplot and haven't fully wrapped my head around it, but am I right in my understanding that countplot is basically a special-case function implementing the same underlying plotting functionality as barplot? This is what confuses me: surely it would be even more trivial to pass counts into barplot than it is to pass percentages or normalized values. So why include countplot? This is part of what I really like about seaborn.

Anyway, it's possible that this "quality of life" handling of percentages out of the box is not worth the effort. Honestly, I don't know. Would it be worth including the code snippet above as an example in countplot? I guess I might just write some wrapper function that performs as desired, but I have to think that something like this would interest more people than just me.

Edit: Another idea might be to include something like 'scaling' as a passed parameter in countplot and factorplot. It would take a function, similar to the 'estimator' parameter in barplot, and scale the counts according to that function. Maybe this would be generalized enough while also being convenient enough.
I guess things like Gaussian distributions would then also be trivial to do, for example?
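To make the 'scaling' proposal concrete, here is a rough pure-Python sketch of how such a parameter could behave. Everything in it is hypothetical: neither the function `scaled_counts` nor a `scaling` argument exists in seaborn, and the callable's two-argument signature is an assumption chosen for illustration.

```python
from collections import Counter


def scaled_counts(values, scaling=None):
    """Count each category, optionally rescaling the counts.

    `scaling` is a hypothetical callable taking (count, total) and
    returning the value to plot. With scaling=None the raw counts
    are returned, which is what a plain count plot would show.
    """
    counts = Counter(values)
    if scaling is None:
        return dict(counts)
    total = sum(counts.values())
    return {category: scaling(n, total) for category, n in counts.items()}


days = ["Sat", "Sun", "Sat", "Fri"]  # made-up data

raw = scaled_counts(days)
pct = scaled_counts(days, scaling=lambda n, total: 100 * n / total)
print(raw)
print(pct)
```

Passing a different callable, such as `lambda n, total: n / total`, would give proportions instead of percentages without changing the counting step.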
I was able to get the early barplot code from @mwaskom to work for visualizing the distribution of a categorical variable with a small DataFrame, but when working with a DataFrame that has millions of rows my kernel seems to freeze up. What's odd is that countplot has no issue and runs in under 2 seconds for the same dataset. Any ideas why that might be the case? |
You probably want |
With #2125:

tips = sns.load_dataset("tips")
sns.histplot(tips, x="day", stat="probability")

tips = sns.load_dataset("tips")
sns.histplot(tips, x="day", hue="sex", stat="probability", multiple="dodge")

tips = sns.load_dataset("tips")
sns.histplot(tips, x="day", hue="sex", stat="probability", multiple="fill", shrink=.8)
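One detail worth keeping in mind when combining `hue` with a normalized statistic: histplot has a `common_norm` parameter (default `True`) that normalizes over the full dataset, so the bars across all hue levels together sum to 1, whereas `common_norm=False` normalizes each hue level on its own. The sketch below illustrates the difference in plain Python with made-up rows; the helper `probabilities` is hypothetical and only mirrors the arithmetic, not seaborn's code.

```python
from collections import Counter

# Made-up (day, sex) rows standing in for a dataset.
rows = [
    ("Sat", "Male"), ("Sat", "Male"), ("Sat", "Female"),
    ("Sun", "Male"), ("Sun", "Female"), ("Sun", "Female"),
    ("Fri", "Male"), ("Fri", "Female"),
]


def probabilities(rows, common_norm=True):
    """Return the bar height for each (x, hue) pair.

    common_norm=True divides by the total number of rows;
    common_norm=False divides by the number of rows in that hue level.
    """
    counts = Counter(rows)
    hue_totals = Counter(hue for _, hue in rows)
    total = len(rows)
    return {
        (x, hue): n / (total if common_norm else hue_totals[hue])
        for (x, hue), n in counts.items()
    }


joint = probabilities(rows)                       # all bars sum to 1
per_hue = probabilities(rows, common_norm=False)  # each hue level sums to 1
print(joint[("Sat", "Male")], per_hue[("Sat", "Male")])
```

Which normalization is appropriate depends on whether the question is about shares of the whole dataset or shares within each hue level.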
Hello,
I would like to make a proposal: could we add an option to countplot that would display percentages/frequencies instead of counts? Thanks