We are doing this notebook over two classes. Today we are going to look for correlations (pdf=pandas-correlations) in the arsenic data and make some more nice subplots, and through this we are going to get more practice with pandas. Then on Wednesday I am going to show you some tricks with dataframes to make your life easier and make the plots nicer. This will set us up for your homework, which uses a different data set on arsenic and bacteria and will be due in a week, so you only have one homework assignment on this topic.

We are going to go through the arsenic data set, find the best correlations, and then plot them as either 4 or 6 plots. See the figures below! Then for homework you are going to take the other data and plot all the correlations, followed by the best 4 and best 6 plots. In the plots below I added letters to the panels so we can reference them in the figure caption and make the figures look publication quality.
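To give a feel for the workflow, here is a minimal sketch of ranking correlations and plotting the best four with panel letters. It is not the class notebook: the data are synthetic, and the column names (`Depth`, `As`, `Fe`, `Mn`, `P`) are hypothetical stand-ins for whatever the real arsenic file contains.

```python
import string

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the arsenic data (hypothetical columns, made-up values).
rng = np.random.default_rng(0)
n = 50
depth = rng.uniform(5, 100, n)
df = pd.DataFrame({
    "Depth": depth,
    "As": 2.0 * depth + rng.normal(0, 20, n),  # built to correlate with Depth
    "Fe": rng.uniform(0, 10, n),
    "Mn": rng.uniform(0, 5, n),
})
df["P"] = 0.5 * df["Fe"] + rng.normal(0, 0.5, n)  # built to correlate with Fe

# Pearson correlation between every pair of columns.
corr = df.corr()

# Keep only the upper triangle so each pair appears once and the
# diagonal (every column against itself, r = 1) is dropped.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Rank pairs by the absolute value of r and keep the best four.
best = pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(4)
print(best)

# One scatter plot per pair in a 2 x 2 grid, each labeled (a), (b), ...
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, letter, ((x, y), r) in zip(axes.flat, string.ascii_lowercase, best.items()):
    ax.scatter(df[x], df[y])
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.text(0.05, 0.9, f"({letter}) r = {r:.2f}", transform=ax.transAxes)
fig.tight_layout()
```

For six plots, the same loop should work with `head(6)` and `plt.subplots(2, 3)`.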
What I am teaching you is called p-hacking, and it is not always looked upon favorably. With programs such as Python and R you can search your data for correlations even if they make no scientific sense. I am sure you have heard some weird result in the news. When a correlation found that way is reported as a real result, that is bad science and is p-hacking. But it can be a good start to looking at your data. To understand the controversy, look at the links on this website.
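To see why searching for correlations can mislead, here is a small sketch that fills a dataframe with pure random noise and counts how many pairs still look "significant". Everything in it is an assumption for illustration: the sizes, the column names, and the cutoff of |r| > 0.361, which is roughly the two-tailed p < 0.05 critical value for 30 samples.

```python
import numpy as np
import pandas as pd

# 20 unrelated variables, 30 samples each: there is no real signal here.
rng = np.random.default_rng(42)
n_samples, n_vars = 30, 20
noise = pd.DataFrame(
    rng.normal(size=(n_samples, n_vars)),
    columns=[f"var{i}" for i in range(n_vars)],  # hypothetical names
)

# Correlate every column with every other column.
corr = noise.corr()

# Pull out each pair once (upper triangle, no diagonal).
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
r_values = corr.values[mask]
n_pairs = r_values.size

# Approximate critical |r| for p < 0.05 (two-tailed) with n = 30.
r_crit = 0.361
n_hits = int((np.abs(r_values) > r_crit).sum())

print(f"{n_hits} of {n_pairs} pairs pass |r| > {r_crit} in pure noise")
```

With 190 pairs tested at the 5% level, you would expect somewhere around 9 or 10 of them to pass by chance alone, which is why a correlation found by searching needs a scientific reason behind it before you believe it.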
Video
Video Quiz 4
ARSENIC BACKGROUND
Arsenic Paper
1 Fendorf, S., Michael, H. A. & van Geen, A. Spatial and Temporal Variations of Groundwater Arsenic in South and Southeast Asia. Science 328, 1123-1127, doi:10.1126/science.1172974 (2010). fendorf-science-09
Also, here is my PowerPoint on arsenic to give us all some background: Bangladesh-python. In case you want more arsenic information, this article has the latest numbers and predictions.
Answers