We are going to do this notebook over two classes. Today we are going to look for correlations in the arsenic data and make some more nice subplots, and through this we will get more practice with pandas. Then on Wednesday I am going to show you some tricks with dataframes to make your life easier and make the plots nicer. This will set us up for your homework, which uses a different data set on arsenic and bacteria and will be due in a week, so you only have one homework assignment on this topic. In class we are going to go through the arsenic data set, find the best correlations, and then plot them as either 4 or 6 subplots. See the figures below! For homework you are going to take the other data set, plot all the correlations, and then plot the best 4 and the best 6. In the plots below I added a letter to each panel so we can reference the panels in the figure caption and make the figure look publication quality.
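To give a feel for the workflow, here is a minimal sketch of ranking correlations and plotting the best four with panel letters. It is not the class notebook: the data are synthetic and the column names (Depth, As, Fe, Mn, P) are made up for illustration, so treat it as one possible way to do it rather than the way we will do it in class.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the arsenic data (assumption: these columns and
# relationships are invented just to have something to correlate).
rng = np.random.default_rng(0)
n = 50
depth = rng.uniform(5, 100, n)
df = pd.DataFrame({
    "Depth": depth,
    "As": 2.0 * depth + rng.normal(0, 20, n),
    "Fe": rng.normal(5, 2, n),
    "Mn": rng.normal(1, 0.5, n),
})
df["P"] = 0.5 * df["Fe"] + rng.normal(0, 0.5, n)

# Pearson correlation between every pair of columns.
corr = df.corr()

# Keep each pair once: upper triangle only, without the diagonal of 1s.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Rank by strength (absolute value) and keep the best four.
best = pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(4)
print(best)

# One scatter plot per pair, with a letter in each panel for the caption.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, letter, ((x, y), r) in zip(axes.flat, "ABCD", best.items()):
    ax.scatter(df[x], df[y])
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_title(f"r = {r:.2f}")
    ax.text(0.05, 0.9, letter, transform=ax.transAxes, fontweight="bold")
fig.tight_layout()
```

For six plots, the same loop works with `head(6)`, `plt.subplots(2, 3)`, and the letters `"ABCDEF"`.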
What I am teaching you is called p-hacking, and it is not always looked upon favorably. With programs such as Python and R, you can search your data for correlations even if they make no scientific sense. I am sure you have heard some weird result like that in the news. Reporting a correlation like that as a finding is bad science, and that is p-hacking. But searching for correlations can be a good start to looking at your data.
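To see why searching for correlations can mislead you, here is a small sketch that tests every pair of columns in a table of pure random noise. The table size (30 rows, 20 columns), the column names, and the 0.05 cutoff are my assumptions for illustration. There is no real relationship anywhere in these data, yet with 190 pairs to test, some will usually come out "significant" by chance alone.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Pure noise: 20 unrelated variables, 30 samples each (made-up sizes).
rng = np.random.default_rng(42)
noise = pd.DataFrame(
    rng.normal(size=(30, 20)),
    columns=[f"var{i}" for i in range(20)],
)

# Test every pair of columns, recording r and its p-value.
rows = []
for a, b in combinations(noise.columns, 2):
    r, p = pearsonr(noise[a], noise[b])
    rows.append((a, b, r, p))
results = pd.DataFrame(rows, columns=["x", "y", "r", "p"])

# Count how many pairs pass the usual p < 0.05 cutoff by chance.
n_sig = int((results["p"] < 0.05).sum())
print(f"{n_sig} of {len(results)} pairs have p < 0.05 in pure noise")

# The "best" correlations found by searching random data.
print(results.sort_values("p").head())
```

At a 0.05 cutoff you would expect roughly 1 in 20 tests on unrelated data to pass, which is why a correlation found by searching needs a scientific reason behind it before you believe it.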
Today’s github.
https://github.com/bmaillou/BigDataPython/tree/master/12-13-Correlations
Video
Video Quiz 4
ARSENIC BACKGROUND
Arsenic Paper
1 Fendorf, S., Michael, H. A. & van Geen, A. Spatial and Temporal Variations of Groundwater Arsenic in South and Southeast Asia. Science 328, 1123-1127, doi:10.1126/science.1172974 (2010). fendorf-science-09
Also, here is my PowerPoint on arsenic to give us all some background. Bangladesh-python. In case you want more arsenic information, this article has the latest numbers and predictions.
Answers