Big Data with Python

Barnard | Department of Environmental Science | Professor Brian J. Mailloux

Class 19 Groupby and Categorical Data

Posted on November 13, 2022 by Brian Mailloux

Class 19 (Maybe). Groupby and Categorical Data

I numbered this class 19 so that it lands in the correct spot on the website. This is the kind of data I have the most trouble wrapping my mind around: data where a lot of categorical variables describe each point. It is hard to think this way.

BIG WARNING: Some people with a newer Python installation get an error when doing the first mean on a groupby. If that happens, update the line to this:

df.groupby('old tree number').mean(numeric_only=True)

I will update the packet and GitHub soon.
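Here is a minimal sketch of why the fix works. It assumes the error comes from pandas 2.0 and later, where mean() on a groupby no longer silently skips text columns. Only the column name 'old tree number' comes from the class notebook; the 'species' and 'diameter_cm' columns and all of the values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the class data: a grouping label, a text
# column, and one numeric measurement per row.
df = pd.DataFrame(
    {
        "old tree number": ["T1", "T1", "T2", "T2"],
        "species": ["oak", "oak", "maple", "maple"],  # made-up text column
        "diameter_cm": [30.0, 34.0, 12.0, 16.0],      # made-up numeric column
    }
)

# numeric_only=True tells pandas to average only the numeric columns
# and leave out text columns such as "species".
means = df.groupby("old tree number").mean(numeric_only=True)
print(means)
```

The result has one row per tree number and only the numeric column. The text column is left out instead of causing an error.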

Files we need

Groupby folder on GitHub

Homework Due TBA

See packet

6 thoughts on “Class 19 Groupby and Categorical Data”

Gillian Murphy says:

May 14, 2025 at 3:27 pm

This notebook made me pull out my hair a bit, but I learned a lot through being challenged! My recommendation would be to spend a little longer explaining the seaborn correlation matrix, because I did that part quite blindly without fully understanding why we were doing it. And perhaps more generally, to have a longer lecture / introduction portion on what exactly we’re looking at and what the correlation analysis means.

ts3419 says:

April 29, 2024 at 5:21 pm

I really liked using groupby and I think it would be helpful to introduce it earlier in the semester, as we could have used it for different projects/notebooks.

jav2184 says:

April 29, 2024 at 11:53 am

The groupby section was super helpful for describing datasets. I’m glad we went over it. Personally, I found what we needed to submit for the homework a bit confusing; I thought we only had to submit the notebook, haha.

slb2237 says:

December 19, 2022 at 3:45 am

This was one of my favourite topics! I found it so satisfying to be able to filter scary data into something much more digestible.

crl2149 says:

December 16, 2022 at 3:33 pm

I think this class helped me realize just how useful python is in looking at large data sets. The data scope was incredible and it was really cool feeling like I had an understanding of how to manipulate that. The bar graph we did for this notebook was also useful for my final project as it showed me how to do a specific column of the data set.

sd3541 says:

December 11, 2022 at 10:22 am

This was one of the most useful tools I think we learned all semester. I thought the notebook went over it well.
