Today we are going to try to automate pandas (https://github.com/bmaillou/BigDataPython/tree/master/11-morePandas). Let's see how we do.

Homework Due: Next Class
(Monday October 12, 2025)
- use the well_data.xlsx file.
- Make a pdf file that contains a plot of As versus every other parameter. Make sure it is labeled nicely and in one pdf file. Arsenic should be on the x-axis and the other parameters should be on the y-axis.
- Each parameter should be on one page.
- make sure your axes go in the correct direction.
- Don’t leave in leftover code that messes you up.
- You will need to hand in your pdf file of results.
- It can be all one color!! Choose your own symbol and color.
- The column “Drink” is an object and not a float. It plots okay when on an x-axis but not a y-axis. If your code crashes you need to skip this column.
- Make a plot with subplots that is As versus at least 3 other parameters each on their own graph but all on one page. You can decide if it is better to share an x or share a y axis. But Arsenic is the independent variable.
- Save this graph to a jpeg and turn that in.
- For this your graphs should be square. Adjust the figure size to make them square.
- They should have letters denoting them like in the packet.
- Final 10%
- Redo #3 (the plot with 3 boxes), but color the points by your parameter from the last homework (pump, platform, aquifer, shared). You should only need to alter the original scatter call, add a second scatter call, add a legend, and save under a new name. Choose the color scheme from your favorite team, company, etc., and state on the figure or in your notebook what your color scheme represents.
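One possible starting point for the subplot figure is sketched below. It is only a sketch under stated assumptions: the DataFrame is a small made-up stand-in for well_data.xlsx, the column names other than As (Fe, Mn, Depth) are hypothetical, and the marker, color, letter position, and output file name are arbitrary choices. It uses set_size_inches() and set_box_aspect(1) to get square panels and writes a letter in the corner of each one.

```python
import os
import string
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for pd.read_excel("well_data.xlsx"); only "As" comes from
# the assignment, the other column names are hypothetical.
demo = pd.DataFrame({
    "As": [5.0, 40.0, 120.0, 300.0],
    "Fe": [0.1, 1.2, 3.4, 6.0],
    "Mn": [0.5, 0.4, 0.9, 1.1],
    "Depth": [150.0, 90.0, 60.0, 45.0],
})
y_cols = ["Fe", "Mn", "Depth"]

# One row of panels that all share the arsenic x-axis.
fig, axes = plt.subplots(1, len(y_cols), sharex=True)
fig.set_size_inches(4 * len(y_cols), 4)  # 4 x 4 inches per panel

for ax, col, letter in zip(axes, y_cols, string.ascii_uppercase):
    ax.scatter(demo["As"], demo[col], marker="o", color="tab:green")
    ax.set_xlabel("As")
    ax.set_ylabel(col)
    ax.set_box_aspect(1)  # keep each plotting box square
    # Panel letter in the upper-left corner, in axes coordinates.
    ax.text(0.05, 0.9, letter, transform=ax.transAxes, fontweight="bold")

fig.tight_layout()

# Saved to a temporary folder here; use your own file name in the homework.
jpeg_path = os.path.join(tempfile.mkdtemp(), "as_subplots.jpg")
fig.savefig(jpeg_path, dpi=150)

panel_letters = [ax.texts[0].get_text() for ax in axes]
```

Sharing the x-axis is used here because arsenic is the independent variable in every panel; whether sharing y makes more sense depends on the parameters you pick.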
HINTS
The Drink column is going to cause scatter to crash. So you will need to avoid it. See notes from class.
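As a rough illustration of one way to avoid the Drink column, the sketch below loops over the columns, skips any that are not numeric, and writes one page per parameter into a single PDF with matplotlib's PdfPages. The data are made up to stand in for well_data.xlsx, the Fe and Depth column names and the function name plot_as_vs_all are hypothetical, and the class notes may handle the skip differently (for example with try/except).

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages
from pandas.api.types import is_numeric_dtype


def plot_as_vs_all(df, pdf_path, x_col="As"):
    """Write one scatter plot per numeric column to a multi-page PDF.

    Returns the list of columns that were actually plotted.
    """
    plotted = []
    with PdfPages(pdf_path) as pdf:
        for col in df.columns:
            # Skip the x column itself and anything that is not numeric
            # (an object column such as "Drink" falls in this group).
            if col == x_col or not is_numeric_dtype(df[col]):
                continue
            fig, ax = plt.subplots()
            ax.scatter(df[x_col], df[col], marker="s", color="tab:blue")
            ax.set_xlabel(x_col)
            ax.set_ylabel(col)
            ax.set_title(f"{col} vs. {x_col}")
            pdf.savefig(fig)  # each savefig call adds one page
            plt.close(fig)    # free the figure so they do not pile up
            plotted.append(col)
    return plotted


# Made-up stand-in for pd.read_excel("well_data.xlsx").
demo = pd.DataFrame({
    "As": [5.0, 40.0, 120.0, 300.0],
    "Fe": [0.1, 1.2, 3.4, 6.0],
    "Depth": [150.0, 90.0, 60.0, 45.0],
    "Drink": ["Y", "Y", "N", "N"],
})

# Saved to a temporary folder here; use your own file name in the homework.
pdf_path = os.path.join(tempfile.mkdtemp(), "as_vs_all.pdf")
plotted_columns = plot_as_vs_all(demo, pdf_path)
```

Checking df.dtypes before the loop shows which columns are float and which are object, which makes it easier to see why a column gets skipped.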
I appreciated how this notebook encouraged us to revisit concepts from earlier notebooks, especially giving me a chance to practice using for loops again. I do think it would be helpful to provide a clearer explanation of the steps needed to create a file with all the parameters, as I found it a bit confusing to figure out what exactly was expected.
This assignment helped me get better at automating plots using pandas. I learned how to loop through columns, filter out non-float types like the “Drink” column, and create clean, well-labeled plots. At first, setting up the subplots and formatting the figure size was a bit tricky, but adjusting things like fig.set_size_inches() really helped improve the layout. It was also helpful to practice exporting results to both PDF and JPEG formats. Overall, it was a great intro to handling larger datasets visually and writing cleaner, more automated code.
I like this notebook because it felt very applicable to something that we could do in a real-life context. It was particularly nice learning how we could differentiate graphs with different letters (A/B/C/D….) and also automating the axes and the titles using for loops. It definitely makes it faster and more coherent to automate this rather than setting axes labels and titles individually! One part that was tricky was learning to shorten the axes (i.e., modify the names from what is present on the graphs). I also found it valuable to learn the savefig() trick. I’m sure it’ll be useful later!
This assignment was definitely a bit challenging, as one could not just plug and chug, but in thinking through the assignment, I really think I became a much better coder! It allowed, for the first time, some degree of creativity, which was a great way to start sending us out on our own with coding, which is what most of the rest of the class is!
This class was important for demonstrating the capabilities of Python in creating plots for large datasets as well as determining useful correlations via the generated graphs. I gained an understanding of how the program reads files and ultimately allows us to see trends in the data. I found this class, along with many of the others that taught us how to create plots to support our hypotheses, to be very useful.
It was really cool to be able to make a bunch of plots with one box of code, and I feel like the packet was helpful in walking us through it step by step. The one thing I wish we had clarified more is data types. I didn’t really understand what the difference was between an object and a float and how that impacts your graphs.
This class I felt was the crux for intersecting the data analysis and coding we learned before. Being able to line up graphs side by side and have a legend of the statistical significance for all the data is so incredible. Python is really powerful and this homework let me realize that you don’t have to have a super long code to have a beautiful graph that tells a detailed story.
This class session was particularly helpful in guiding my final project. I also really enjoyed using real and relevant data; it put an interesting analytical lens on the class in addition to learning the commands.
I think we made a huge leap in our understanding of Python’s and Pandas’ capability with this lesson. Automating correlations is a great way to practice for loops while figuring out how to slice and substitute properly, as well as learn how to most efficiently articulate the regression code. All of those skills come up again and again as I discovered in my final project. This lesson also introduced the way to cull a new data frame from an existing one – reflecting on the relevance given our final poster work, I think it would be helpful to specify when it’s best to just make a new data frame object by setting a new variable name to a subset of the data versus just create a new df. What is the nature of the data type, in each case, and what are the attendant limitations when using each?
This class really transformed the way I look at this course and helped me see the applicability of Python to real life. I think the concepts we learnt in class today are those that I will be taking back at the end of the class and using in my work. Really wishing I knew these cool tricks in high school for science lab reports! My only concern was that sometimes new vocabulary showed up in the packet, like “sharey”, which caught me off guard, but a quick Google search does the trick 🙂
I liked how this notebook made us reflect on previous notebooks. This made me practice for loops again. I think a better explanation on the steps to take in order to make a file with all the parameters would help in trying to understand what is supposed to be done. I had difficulty in understanding where the code should be placed and if there were additional steps to take besides the one mentioned because the output was not clearly shown on the packet. I understand that the output cannot be really shown because it would take a lot of paper but maybe describing the output would help.
This was a useful assignment, and I have actually made similar plots using similar code in research outside of class. It was a little tricky figuring out how to use the for loop to graph each parameter on a separate page, but overall, not too bad.
I thought the pandas lesson was great, and I really appreciated how it was split into two parts so that we could really get a good grasp of the foundational Python analysis. The pandas packet went step by step and was very detailed and helpful for understanding a new concept. I found the homework to be a great combination of useful and basic enough to do at the beginning of the learning process with pandas.
As stated by others, the introduction of dictionaries would have helped with this lesson but besides that, I found that this class and the p-hacking class were the most interesting and helpful of the semester.
Honestly, these Pandas classes were perhaps the most practical parts of learning Python and reading in data. I agree with others on learning dictionaries, and I think learning groupby would also be helpful here.
I wish I better understood what dictionaries are and how they could be used in different contexts. I first learned about dictionaries through code academy, I think, and code academy showed different ways to manipulate dictionaries to do many different things. I think they are quite a useful tool and wish we had a whole class dedicated to them (like for loops and lists and linspace).
I agree about the dictionary. But as we use Pandas more we don’t need them that much. So I made a choice to focus on other areas instead. I will think about this more for next time!
I found this class and homework really useful, especially as it relates to the data analysis I will be conducting in my environmental science classes. It was a great tool to be able to quickly plot certain things and instead of having them in different files, to have them all in one pdf so it is easily accessible. The classwork correlated with the homework which was a great way to reinforce the skills.
The first part of this assignment was unclear to me. I ended up making one graph that has As on the x axis and all of the other parameters on the y axis. This didn’t really make sense to me because the graph seemed useless (it was so crowded that you couldn’t really see anything). I did like learning how to save plots as pdfs and I think that it will be a very useful skill to know in the future!
I wanted to make a general point about code academy but didn’t know where to put it. I feel like Code Academy lacks support in many of the lessons. The hints in the lower left corner were sometimes helpful but especially with the bigger project pieces I had to scour the internet to find more substantive help. For a lot of the smaller pieces and for understanding what you can do in python, it was very helpful, but it still lacks enough features to make it a great learning tool.
This class was important because it was the first time we really combined for loops and pandas slicing to read and graph the data. I found myself constantly referring to this section for help with later assignments. I would also agree that dictionaries should be taught earlier. In a way, I had to re-think what I was doing once I later incorporated dictionaries.
It would have been nice to learn how to use dictionaries in this class section instead of later when we are doing correlations. Overall, it was really useful to learn how to make pdf files and use various pandas functions.