Class 19 (Maybe). Groupby and Categorical Data
I numbered this class 19 to try to have it go to the correct spot on the website. This is the kind of data I have the most trouble wrapping my mind around, because it takes a different way of thinking: each data point is described by a lot of categorical data (labels or categories rather than numeric measurements).
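To make "a lot of categorical data describing each point" concrete, here is a minimal sketch with a tiny made-up table. The column names (`species`, `neighborhood`, `diameter`) and values are hypothetical and are not taken from the Groupby folder on GitHub. Each row is one tree, the text columns are categories that describe it, and `groupby` collects the rows that share a category so each group can be summarized.

```python
import pandas as pd

# Hypothetical data: each row is one tree, described by two categorical
# columns (species, neighborhood) and one numeric column (diameter).
trees = pd.DataFrame({
    "species": ["oak", "maple", "oak", "pine", "maple", "oak"],
    "neighborhood": ["north", "north", "south", "south", "south", "north"],
    "diameter": [30.0, 22.0, 41.0, 18.0, 25.0, 35.0],
})

# How many trees fall into each species category?
counts = trees.groupby("species").size()
print(counts)

# Group by two categories at once: count trees per (neighborhood, species) pair.
pairs = trees.groupby(["neighborhood", "species"]).size()
print(pairs)
```

Grouping by a list of columns gives one group per combination of categories, which is often where this kind of data starts to feel hard to picture.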
BIG WARNING: Some people with newer versions of Python (more precisely, pandas 2.0 and later) get an error on the first groupby mean, because mean no longer skips non-numeric columns by default. Update that line to this:
df.groupby('old tree number').mean(numeric_only=True)
I will update the packet and GitHub soon.
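A minimal sketch of the fix in the warning above, on a small made-up table (the columns `species`, `street`, `diameter`, and `height` are hypothetical; the notebook groups by `'old tree number'` instead). With a text column present, newer pandas refuses to average it, and `numeric_only=True` restricts the mean to the numeric columns.

```python
import pandas as pd

# Hypothetical table: "species" is the grouping key, "street" is a text
# column, and "diameter" and "height" are numeric.
trees = pd.DataFrame({
    "species": ["oak", "maple", "oak", "maple"],
    "street": ["Elm St", "Main St", "Oak Ave", "Elm St"],
    "diameter": [30.0, 22.0, 40.0, 26.0],
    "height": [12.0, 8.0, 16.0, 10.0],
})

# In newer pandas, trees.groupby("species").mean() also tries to average
# the text column "street" and raises an error. numeric_only=True tells
# pandas to average only the numeric columns.
means = trees.groupby("species").mean(numeric_only=True)
print(means)
```

Another option is to pick the numeric columns before taking the mean, for example `trees.groupby("species")[["diameter", "height"]].mean()`, which gives the same result here and makes it explicit which columns are being averaged.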
Files we need
- Groupby folder on GitHub
Homework Due TBA
- See packet
This notebook made me pull out my hair a bit, but I learned a lot through being challenged! My recommendation would be to spend a little longer explaining the seaborn correlation matrix, because I did that part quite blindly without fully understanding why we were doing it. More generally, perhaps have a longer lecture/introduction on what exactly we're looking at and what the correlation analysis means.
I really liked using groupby, and I think it would be helpful to introduce it earlier in the semester, as we could have used it for different projects/notebooks.
The groupby section was super helpful for describing datasets, and I'm glad we went over it. Personally, I found what we needed to submit for the homework a bit confusing; I thought we only had to submit the notebook, ahah.
This was one of my favourite topics! I found it so satisfying to be able to filter scary data into something much more digestible.
I think this class helped me realize just how useful Python is for looking at large data sets. The scope of the data was incredible, and it was really cool to feel like I understood how to manipulate it. The bar graph we did for this notebook was also useful for my final project, as it showed me how to graph a specific column of the data set.
This was one of the most useful tools I think we learned all semester. I thought the notebook went over it well.