Class 11 – More Pandas

Today we are going to try to automate pandas (pdf=pandas_well_data_part2).  Let's see how we can do.  Our first goal is to make a plot of every parameter versus depth.  This is what my file looks like.  Click on view raw to get the file.  These are the subplots we are also going to try to make.  Let's see how far we get!  Yours should look much better!
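
To give a feel for the first goal, here is a minimal sketch of plotting every parameter versus depth as a row of subplots. The DataFrame and its column names (Depth, As, Fe, Mn) are made up for illustration and are not the class file, so swap in your own data and names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the well data; the real file has its own columns.
df = pd.DataFrame({
    "Depth": [10.0, 20.0, 30.0, 40.0],
    "As": [5.0, 40.0, 120.0, 80.0],
    "Fe": [0.1, 1.5, 3.2, 2.0],
    "Mn": [0.5, 0.4, 0.9, 0.2],
})

# Every column except depth gets its own panel.
params = [col for col in df.columns if col != "Depth"]

# One row of panels that all share the depth (y) axis.
fig, axes = plt.subplots(1, len(params), sharey=True)
for ax, col in zip(axes, params):
    ax.scatter(df[col], df["Depth"], marker="o", color="tab:blue")
    ax.set_xlabel(col)

axes[0].set_ylabel("Depth")
axes[0].invert_yaxis()  # depth increases downward; the shared axis follows
fig.set_size_inches(10, 5)
fig.savefig("params_vs_depth.png")
```

Because the loop runs over the column names, adding a parameter to the file adds a panel without touching the plotting code.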

VIDEO

 

Homework: Due Next Class

(Wednesday February 28, 2024)

  1. Make a pdf file that contains a plot of As versus every other parameter.  Make sure it is labeled nicely and in one pdf file.   Arsenic should be on the x-axis and the other parameters should be on the y-axis.
    1. Each parameter should be on one page.
    2. Make sure your axes go in the correct direction.
    3. Don’t leave in leftover code that messes you up.
    4. You will need to hand in your pdf file of results.
    5. You do not need to color by arsenic.  It can be all one color!! Since you are plotting against arsenic the colors become redundant.  But choose a good symbol and color.
    6. The column “Drink” is an object and not a float.  It plots okay when on an x-axis but not a y-axis.  If your code crashes you need to skip this column.
  2. Make a plot with subplots showing As versus at least 3 other parameters, each on its own graph but all on one page.  You can decide whether it is better to share an x-axis or a y-axis, but arsenic is the independent variable.
    1. Save this graph to a jpeg and turn that in also.
    2. For this your graphs should be square.  They should have letters denoting them like in the packet.
  3. If a graph is a weird shape or scrunched, that is not a good thing.  You need the data to look nice. I use fig.set_size_inches(10,15), where the 10 and 15 are the width and height of the page in inches. You can play with the numbers to get a good-looking figure.
  4. When you make a pdf file or any file for class, make sure to turn the file in on Courseworks.
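
One possible way to get one plot per page in a single pdf file is matplotlib's PdfPages. The sketch below is only a starting point under stated assumptions: the data, the column names other than As and Drink, the output file name, and the symbol and color choices are all invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

# Hypothetical stand-in for the well data, including a non-float Drink column.
df = pd.DataFrame({
    "As": [5.0, 40.0, 120.0, 80.0],
    "Depth": [10.0, 20.0, 30.0, 40.0],
    "Fe": [0.1, 1.5, 3.2, 2.0],
    "Drink": ["yes", "no", "no", "no"],
})

pages_written = []  # keep track of which parameters ended up in the pdf
with PdfPages("as_vs_parameters.pdf") as pdf:
    for col in df:
        # Skip arsenic itself and anything that is not a float (e.g. Drink).
        if col == "As" or df[col].dtype != float:
            continue
        fig, ax = plt.subplots()
        ax.scatter(df["As"], df[col], marker="s", color="darkred")
        ax.set_xlabel("As")
        ax.set_ylabel(col)
        ax.set_title("{} versus As".format(col))
        if col == "Depth":
            ax.invert_yaxis()  # assumed convention: depth increases downward
        fig.set_size_inches(8, 8)  # square page; adjust to taste
        pdf.savefig(fig)  # each savefig call adds one page
        plt.close(fig)  # free the figure before the next loop pass
        pages_written.append(col)
```

Closing each figure inside the loop keeps memory use down when the file has many columns.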

 

HINTS

The Drink column is going to cause scatter to crash, so you will need to avoid it. There are multiple methods. For example, you could use an if statement and say

if col != 'Drink':
    # then do stuff
A more robust method might be to check and make sure the parameter is a float.
if df[col].dtype == float:
    # then do stuff
AN EXAMPLE
for col in df:
    if df[col].dtype == float:
        print('{} is a float'.format(col))
    else:
        print('{} is NOT a float'.format(col))
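
One thing to watch with the float check is that a column of whole numbers is stored as integers, so it would also be skipped. If that matters for your file, a possible alternative is pandas' is_numeric_dtype, sketched below on a made-up DataFrame (the Year column is hypothetical and only there to show the difference).

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical data: one float column, one integer column, one text column.
df = pd.DataFrame({
    "As": [5.0, 40.0, 120.0],
    "Year": [1998, 2001, 2005],    # integers: numeric, but not float
    "Drink": ["yes", "no", "no"],  # text, stored as a non-numeric column
})

# The check from the hint: only float columns pass.
float_cols = [col for col in df if df[col].dtype == float]

# A broader check: any numeric column (float or integer) passes.
numeric_cols = [col for col in df if is_numeric_dtype(df[col])]

print(float_cols)
print(numeric_cols)
```

Either list can then drive the plotting loop in place of looping over every column.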