Here is a data set we recently published (Gnanaprakasam-mBio-2017-2j44e56). This is the figure I made for the paper.
What happened was that I did field work with a colleague in Britain, who then did Illumina sequencing to look at species diversity in Bangladesh sediment. He did 16S rRNA analysis of the DNA and found the percent abundance of each species. I then analyzed the water and sediment to get the arsenic content. My colleague wants to know which bacterial species he should investigate further. Now we want to look at more species and see if there are correlations with the different arsenic species. He is asking if you can give him a list of 4 and then 6 species to investigate further.
Your mission, if you choose... is to p-hack this data. Tell me which bacterial species might be important and which ones we should investigate further.
Due Monday October 20, 2025
- Look up which arsenic species and which site you will analyze.
 - Next, p-hack the data. Search the data to see if any correlations are significant, and put all of the results in one PDF file. This might be big and cumbersome but could be helpful. It will have hundreds of pages! Choose to p-hack with arsenic or hematite.
 - Then you should look at the plots and results.
 - Save the r2 values, sort them and make two more figures.
 
- One with 4 graphs on one figure, for the species with the 4 highest r2 values.
 - One with 6 graphs on one figure, for the species with the 6 highest r2 values.
 - All graphs should have a text box with the linregress results that shows the slope, intercept, r2, and p-value. Remember, r2 is just the r-value squared.
 - For your best correlation, Google the bacterial species and see if the description makes sense as to why there is a correlation. Describe in 1-3 sentences.
 - The data is in Excel this time! It is on GitHub.
 - Final 10%
 - The column names are long and ugly from the sequencing. Fix this for your plots of 4 and 6! Use the method from the renaming notebook. This gets you 5%. These long names are what gets downloaded automatically from NIH/NCBI.
 - The second 5% is if you make a seaborn correlation matrix for at least 5 columns including the 3 arsenic species and 2 bacterial species.
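
The search-and-sort steps above can be sketched roughly as follows. This is only one possible approach, not the official solution. The function name `screen_columns`, the column names (`As_total`, `species_a`, and so on), and the synthetic data are all made up for illustration; the real table would come from the Excel file (for example with `pandas.read_excel`) and will have different names.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress


def screen_columns(df, target, candidates):
    """Regress every candidate column against `target` and rank by r2.

    `screen_columns` is a hypothetical helper, not part of any library.
    """
    fields = ["species", "slope", "intercept", "r2", "p_value"]
    rows = []
    for col in candidates:
        # Keep only rows where both values are present.
        pair = df[[target, col]].dropna()
        # Skip columns that cannot be fit: too few points or no variation.
        if len(pair) < 3 or pair[col].nunique() < 2 or pair[target].nunique() < 2:
            continue
        fit = linregress(pair[target], pair[col])
        rows.append({
            "species": col,
            "slope": fit.slope,
            "intercept": fit.intercept,
            "r2": fit.rvalue ** 2,  # r2 is the r-value squared
            "p_value": fit.pvalue,
        })
    table = pd.DataFrame(rows, columns=fields)
    # Highest r2 first, so head(4) and head(6) give the top candidates.
    return table.sort_values("r2", ascending=False).reset_index(drop=True)


# Synthetic stand-in data (assumption): real column names come from the Excel file.
rng = np.random.default_rng(0)
arsenic = np.linspace(1.0, 10.0, 12)
demo = pd.DataFrame({
    "As_total": arsenic,
    "species_a": 2.0 * arsenic + 1.0,   # exact linear relationship
    "species_b": rng.normal(size=12),   # pure noise
    "species_c": np.zeros(12),          # constant column, skipped by the guard
})

ranked = screen_columns(demo, "As_total", ["species_a", "species_b", "species_c"])
top4 = ranked.head(4)
print(ranked)
```

To get every plot into a single PDF, one option is matplotlib's `PdfPages` (from `matplotlib.backends.backend_pdf`), saving one figure per page inside the loop. Keep in mind that when hundreds of regressions are run, some will come out "significant" by chance alone, which is exactly what p-hacking exploits.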
 
YOU WILL NEED TO HAND IN
- Your Notebook as usual.
 - PDF file of all the results.
 - JPG Plot of 4 parameters!
- I am not showing the answers yet!
 
 - JPG Plot of 6 parameters!
- I am not showing the answers yet!
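
The 4-graph and 6-graph figures could be laid out along these lines. This is a minimal sketch under assumptions, not the answer key: `plot_top_fits` is a hypothetical helper, and the column names and data are synthetic placeholders for the real (renamed) columns with the highest r2 values.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import linregress


def plot_top_fits(df, target, columns, ncols=2):
    """Draw one scatter plot per column with its fitted line and a stats text box."""
    nrows = int(np.ceil(len(columns) / ncols))
    fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 4 * nrows),
                             squeeze=False)
    panels = axes.ravel()
    for ax, col in zip(panels, columns):
        pair = df[[target, col]].dropna()
        fit = linregress(pair[target], pair[col])
        ax.scatter(pair[target], pair[col])
        # Draw the fitted line between the smallest and largest x value.
        xs = np.array([pair[target].min(), pair[target].max()])
        ax.plot(xs, fit.intercept + fit.slope * xs, color="red")
        label = (f"slope = {fit.slope:.3g}\n"
                 f"intercept = {fit.intercept:.3g}\n"
                 f"r2 = {fit.rvalue ** 2:.3f}\n"
                 f"p = {fit.pvalue:.3g}")
        # Text box in the upper-left corner, positioned in axes coordinates.
        ax.text(0.05, 0.95, label, transform=ax.transAxes, va="top",
                bbox=dict(boxstyle="round", facecolor="white", alpha=0.8))
        ax.set_xlabel(target)
        ax.set_ylabel(col)
    # Hide any panels left over when the grid is larger than the column list.
    for ax in panels[len(columns):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig


# Synthetic stand-in data (assumption): swap in the real table and top-ranked columns.
rng = np.random.default_rng(1)
arsenic = np.linspace(1.0, 10.0, 15)
demo = pd.DataFrame({"As_total": arsenic})
for i in range(6):
    demo[f"species_{i + 1}"] = (i + 1) * arsenic + rng.normal(scale=2.0, size=15)

names = list(demo.columns[1:])
fig4 = plot_top_fits(demo, "As_total", names[:4], ncols=2)
fig6 = plot_top_fits(demo, "As_total", names[:6], ncols=3)
```

Each figure can then be written out with something like `fig4.savefig("top4.jpg", dpi=300)`; the file name here is only an example.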