Class 12 – Correlations in Pandas

We will work through this notebook over two classes.  Today we are going to look for correlations (pdf=pandas-correlations) in the arsenic data and make some more nice subplots, and through this we will get more practice with pandas.  Then on Wednesday I am going to show you some tricks with dataframes to make your life easier and make the plots nicer.  This will set us up for your homework, which uses a different data set on arsenic and bacteria and will be due in a week, so you only have 1 homework assignment on this topic.  In class we are going to go through the arsenic data set, find the best correlations, and then plot them as either 4 or 6 plots.  See the figures below!  For homework you are going to take the other data set and plot all the correlations, and then the best 4 and the best 6.  In the plots below I added letters to the panels so we can reference them in the figure caption and make the figures look publication quality.
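To give a feel for the workflow before class, here is a minimal sketch of ranking correlations and plotting the best ones with lettered panels. It is not the class notebook: the data are synthetic, the column names (Depth, As, Fe, Mn, P) are made up for illustration, and the helper functions rank_correlations and plot_best are hypothetical names, not part of pandas.

```python
import string

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the arsenic data. Column names are hypothetical.
rng = np.random.default_rng(0)
n = 50
depth = rng.uniform(5, 100, n)
fe = rng.uniform(0, 10, n)
df = pd.DataFrame({
    "Depth": depth,
    "As": 2.0 * depth + rng.normal(0, 10, n),  # built to track depth closely
    "Fe": fe,
    "Mn": rng.uniform(0, 2, n),                # unrelated noise
    "P": 0.5 * fe + rng.normal(0, 1.5, n),     # built to track Fe loosely
})


def rank_correlations(data):
    """Return each unique column pair with its Pearson r, strongest |r| first."""
    corr = data.corr()
    # Keep only the upper triangle so each pair appears once
    # and the diagonal (r = 1 with itself) is dropped.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.reindex(order)


def plot_best(data, pairs, n_plots=4, ncols=2):
    """Scatter-plot the top n_plots pairs, one panel each, labeled (a), (b), ..."""
    nrows = int(np.ceil(n_plots / ncols))
    fig, axes = plt.subplots(nrows, ncols, squeeze=False,
                             figsize=(4 * ncols, 3.5 * nrows))
    best = pairs.head(n_plots)
    for ax, letter, ((x, y), r) in zip(axes.ravel(),
                                       string.ascii_lowercase,
                                       best.items()):
        ax.scatter(data[x], data[y], s=15)
        ax.set_xlabel(x)
        ax.set_ylabel(y)
        # Panel letter in axes coordinates, so it sits in the same corner everywhere.
        ax.text(0.05, 0.9, f"({letter}) r = {r:.2f}", transform=ax.transAxes)
    fig.tight_layout()
    return fig


ranked = rank_correlations(df)
print(ranked.head(6))
fig = plot_best(df, ranked, n_plots=4)
```

Calling plot_best(df, ranked, n_plots=6, ncols=3) would give the 6-panel version.  With real data you would replace the synthetic dataframe with one read from your file.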

What I am teaching you is called p-hacking, and it is not always looked upon favorably.  With programs such as Python and R you can search your data for correlations even if they make no scientific sense.  I am sure you have heard some weird result in the news.  Reporting a correlation like that as a finding is bad science, and that is p-hacking.  But it can be a good start to looking at your data.  To understand the controversy, look at the links on this website.
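To see why searching for correlations can mislead, here is a small sketch that looks for correlations in pure random noise.  Everything in it is an assumption for illustration: the variable names are made up, and the cutoff of 0.361 is the approximate tabulated value of |r| needed for p < 0.05 (two-tailed) with 30 samples.

```python
import numpy as np
import pandas as pd

# 20 columns of pure noise, 30 samples each. No column is related to any other.
rng = np.random.default_rng(42)
n_samples, n_vars = 30, 20
noise = pd.DataFrame(
    rng.normal(size=(n_samples, n_vars)),
    columns=[f"var{i}" for i in range(n_vars)],
)

# All unique pairwise correlations (upper triangle, diagonal excluded).
corr = noise.corr()
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
r_values = corr.where(mask).stack().dropna()

# Approximate critical |r| for p < 0.05 (two-tailed) with n = 30.
r_crit = 0.361
hits = r_values[r_values.abs() > r_crit]

print(f"pairs tested: {len(r_values)}")
print(f"pairs that look 'significant': {len(hits)}")
print(f"strongest |r| found in noise: {r_values.abs().max():.2f}")
```

With 190 pairs tested at the 5% level, you should expect roughly 9 or 10 to pass the cutoff by chance alone.  That is why a correlation found by searching needs a scientific reason behind it before you believe it.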



Video Quiz 4


Arsenic Paper

1 Fendorf, S., Michael, H. A. & van Geen, A. Spatial and Temporal Variations of Groundwater Arsenic in South and Southeast Asia. Science 328, 1123-1127, doi:10.1126/science.1172974 (2010).   fendorf-science-09


Also, here is my PowerPoint on arsenic to give us all some background: Bangladesh-python.  In case you want more arsenic information, this article has the latest numbers and predictions.



Figures: As-corr14, As-corr6