Classes 17 and 18 – CO2 Time Series

Overview

(Class numbers may be slightly off; don’t let that confuse you. If a number gets skipped, we are still going in numerical order. Changing headers messes up the website.)

Pandas is known for its time series capability, where you make the index the time. We are going to do this with CO₂ data. First we will analyze the data from the La Jolla Pier. Here is the website that the data comes from. I made a CO2 folder in GitHub. I will walk you through how to do time series analysis with Pandas (pdf=pandastimeseries). Work through it and take your time. Then you will do your own analysis of the Mauna Loa data, building on what you learned. This is our work for this class and next.

Try this link http://scrippsco2.ucsd.edu/assets/data/atmospheric/stations/flask_co2/daily/daily_flask_co2_ljo.csv
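In Pandas, a time series is simply a DataFrame whose index holds timestamps. Below is a minimal sketch of that setup. It does not download the Scripps file; it builds made-up, roughly weekly numbers instead, and the column names Date, CO2 and Flags are assumptions (check the header of the CSV you actually download, which may differ). It shows converting text dates, setting the index, keeping only flag-0 samples, slicing by a date string, and resampling to one mean per calendar year.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the La Jolla flask file (roughly weekly samples).
# The column names Date, CO2 and Flags are assumptions, not the real CSV header.
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", "2004-12-31", freq="7D")
raw = pd.DataFrame({
    "Date": dates.strftime("%Y-%m-%d"),  # dates arrive as text when read from a CSV
    "CO2": 370 + 2.0 * np.arange(len(dates)) / 52 + rng.normal(0, 0.5, len(dates)),
    "Flags": rng.choice([0, 1], size=len(dates), p=[0.9, 0.1]),
})

# Convert the text column to real timestamps and make it the index.
raw["Date"] = pd.to_datetime(raw["Date"])
df = raw.set_index("Date").sort_index()

# Keep only the good samples (flag 0), as in the class code.
df = df[df.Flags == 0]

# A DatetimeIndex allows slicing with a date string...
df_2002 = df.loc["2002"]

# ...and resampling to a new frequency: here one mean per calendar year ("YS" = year start).
annual_mean = df["CO2"].resample("YS").mean()
print(annual_mean)
```

Making the dates the index, rather than an ordinary column, is what unlocks the date-string slicing and resample calls used in the rest of this unit.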

VIDEO

Homework.

Notebook 13.

1. Get the daily flask CO2 data from the Keeling curve data for Mauna Loa. USE THE DAILY FLASK DATA! That is Flask CO2 Data -> daily_flask_co2_mlo.csv at http://scrippsco2.ucsd.edu/data/atmospheric_co2/mlo.html. If you use the daily flask data and follow the class example, things should work. When using this data you need to clean it up by keeping only the samples with a flag of 0. We used this in our code: df_scripps=df_scripps[df_scripps.Flags==0]

2. Predict the annual MAX CO2 out to 2050. Present a graph of this data, properly labeled. You can show the equation for each line you use for the predictions. Make it look good! This is similar to what we did in class, so it is a good warm-up: in class we used the mean, and now we are using the max. I am NOT showing my answer below. Choose your own unique linestyle, scatter symbol, and scatter color.

3. This is the harder one. When looking at CO2 data we see yearly patterns. It peaks in late spring and decreases through the summer. Here is a worldwide visualization of 20 years of CO2. We know this is caused by plant growth during the northern hemisphere spring and summer. I want to see the monthly patterns by themselves. To do this we need to subtract the yearly mean from our samples. This is easy to do but took me a while to figure out. You can do it at least two ways. Choose one and go for it.

a. Pandas has a nice function called rolling. It used to be called rolling_mean, but that was deprecated; now you call rolling and then .mean() on the result. Since we have roughly weekly data, a window of 52 weeks gives a year-long average around each point. It is not the average of that calendar year but a rolling average with a one-year window. Do not specify the window in weeks; it is buggy. Use days instead, with a time-based window of 365D. You cannot use years or months for the window because they are not fixed lengths (leap years, months of 28 to 31 days), and Pandas only accepts fixed-length offsets for time-based rolling windows. Also note that by default the window ends at each point rather than being centered on it; in recent Pandas versions center=True centers it. Here are three good links: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html and https://stackoverflow.com/questions/43556344/pandas-monthly-rolling-operation and https://pandas.pydata.org/pandas-docs/stable/computation.html#rolling-windows To be entertaining and distracting, here is a video of a rolling panda (a slightly different Google search). If you compute the rolling mean in a new column, you get a nice smooth curve that you can subtract from each sample to look at the monthly differences.

b. If you really want to subtract the calendar-year mean, it is slightly different. This is the January 1 to December 31 mean. I found two examples; here is a Stack Overflow description of one. But I also found a slightly different method. If you make a new column that holds the year of each sample, you can find the mean for each year this way: mlo.groupby('year').CO2.mean() This is similar to how we resampled for the trend analysis. Now you need to put each year's mean back into the larger dataframe. You can do this using join. Look at the accepted (green-checked) answer here: https://stackoverflow.com/questions/12200693/python-pandas-how-to-assign-groupby-operation-results-back-to-columns-in-parent

c. I think you could also use resample, but I have not figured it out yet, so I wouldn’t try it.

Now you should be able to make 3 plots very quickly from the new column of data you create. I am going to paste in the three graphs you need to make. I did not label my graphs, so don’t follow my lead. Make yours look much better. I shouldn’t have to tell you that by now, but I am reminding you. Add a figure caption to each figure. Good luck making them.

4. Final 10%. Can you make a simple box model of CO2 concentrations? Take a look at this notebook (pdf=MLO-Bonus-BoxModel) and see if you can do it. I put plenty of hints for you in the notebook. My graph, which is not labeled well, is below!

1. The difference between every sampling point and the mean value for its year, showing the seasonal differences.

2. Same as #1 but for the years 2000 to present. I am showing the answers for rolling and groupby so you can compare.

3. A boxplot by month.

4. Bonus graph of boxmodel results. You will need to submit two graphs! This is just the first!
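For homework item 2, one way to go from samples to a 2050 projection is sketched below. The data are made up (a 1.8 ppm/yr trend plus a seasonal swing and noise, standing in for the flag-filtered Mauna Loa series), and a straight-line least-squares fit with np.polyfit is only one possible choice of prediction line; the marker, color and linestyle are placeholders for your own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs as a script; drop in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up, already flag-filtered stand-in for the Mauna Loa flask data:
# a 1.8 ppm/yr trend, a 3 ppm seasonal swing and some noise.
rng = np.random.default_rng(1)
dates = pd.date_range("1990-01-01", "2020-12-31", freq="7D")
t = np.asarray((dates - dates[0]).days) / 365.25  # years since the first sample
co2 = 354 + 1.8 * t + 3 * np.sin(2 * np.pi * t) + rng.normal(0, 0.5, len(dates))
mlo = pd.DataFrame({"CO2": co2}, index=dates)

# One value per calendar year: the maximum rather than the mean used in class.
annual_max = mlo["CO2"].resample("YS").max()
years = annual_max.index.year.to_numpy()

# Least-squares straight line through the annual maxima, extended to 2050.
slope, intercept = np.polyfit(years, annual_max.to_numpy(), 1)
future = np.arange(years[0], 2051)
prediction = slope * future + intercept

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(years, annual_max, marker="D", color="darkorange", label="Annual maximum")
ax.plot(future, prediction, linestyle="-.", color="navy",
        label=f"Linear fit: CO2 = {slope:.2f} x year {intercept:+.1f}")
ax.set_xlabel("Year")
ax.set_ylabel("CO$_2$ (ppm)")
ax.set_title("Annual maximum CO$_2$ with a linear projection to 2050")
ax.legend()
print(f"Projected 2050 annual maximum: {prediction[-1]:.1f} ppm")
```

Putting the fitted equation in the legend label is one simple way to show the equation for each prediction line on the graph itself.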
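Option 3a (the rolling mean) can be sketched as follows, again on made-up weekly data rather than the real file. It assumes a sorted DatetimeIndex and a Pandas version recent enough (1.3 or later, as far as I know) to accept center=True with a time-based "365D" window; without center=True the window trails each sample, and because CO2 is rising, the subtracted values would be shifted upward.

```python
import numpy as np
import pandas as pd

# Made-up, already flag-filtered stand-in for the Mauna Loa flask data.
rng = np.random.default_rng(2)
dates = pd.date_range("1990-01-01", "2020-12-31", freq="7D")
t = np.asarray((dates - dates[0]).days) / 365.25
mlo = pd.DataFrame(
    {"CO2": 354 + 1.8 * t + 3 * np.sin(2 * np.pi * t) + rng.normal(0, 0.5, len(dates))},
    index=dates,
)

# Time-based window over a sorted DatetimeIndex. "365D" is a fixed length;
# month- or year-based offsets are rejected for rolling windows.
# center=True places the window around each sample instead of before it.
mlo["CO2_rolling"] = mlo["CO2"].rolling("365D", center=True).mean()

# Seasonal signal = each sample minus its running one-year mean.
mlo["anomaly_rolling"] = mlo["CO2"] - mlo["CO2_rolling"]

# Near the start and end the window covers less than a year, so trim
# about half a year from each end before plotting.
half_year = pd.Timedelta("183D")
core = mlo[(mlo.index >= mlo.index[0] + half_year) & (mlo.index <= mlo.index[-1] - half_year)]
print(core[["CO2", "CO2_rolling", "anomaly_rolling"]].head())
```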
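Option 3b (the calendar-year mean put back with join), plus the monthly boxplot from graph 3, might look like the sketch below. The data are made up, and the column names year, CO2_year_mean, anomaly_year and month are illustrative choices, not names from the class notebook. The groupby(...).transform("mean") line is shown only as a cross-check that gives the same numbers in one step.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs as a script; drop in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up, already flag-filtered stand-in for the Mauna Loa flask data.
rng = np.random.default_rng(3)
dates = pd.date_range("1990-01-01", "2020-12-31", freq="7D")
t = np.asarray((dates - dates[0]).days) / 365.25
mlo = pd.DataFrame(
    {"CO2": 354 + 1.8 * t + 3 * np.sin(2 * np.pi * t) + rng.normal(0, 0.5, len(dates))},
    index=dates,
)

# Calendar-year (January 1 to December 31) mean, one value per year.
mlo["year"] = mlo.index.year
year_mean = mlo.groupby("year").CO2.mean().rename("CO2_year_mean")

# join(..., on="year") matches each row's year column to year_mean's index
# and keeps mlo's DatetimeIndex.
mlo = mlo.join(year_mean, on="year")
mlo["anomaly_year"] = mlo["CO2"] - mlo["CO2_year_mean"]

# Cross-check: transform returns the group mean aligned to every row.
via_transform = mlo["CO2"] - mlo.groupby("year").CO2.transform("mean")
assert np.allclose(via_transform, mlo["anomaly_year"])

# Graph 2 uses only the samples from 2000 onward.
recent = mlo.loc["2000":]

# Graph 3: boxplot of the seasonal deviation for each calendar month.
mlo["month"] = mlo.index.month
fig, ax = plt.subplots(figsize=(8, 5))
mlo.boxplot(column="anomaly_year", by="month", ax=ax)
fig.suptitle("")  # remove the automatic "Boxplot grouped by month" heading
ax.set_title("Seasonal CO$_2$ deviation from the calendar-year mean")
ax.set_xlabel("Month")
ax.set_ylabel("CO$_2$ minus calendar-year mean (ppm)")
```

Because every sample is compared with the mean of its own calendar year, the deviations average to zero within each year; the rolling version does not have that property, which is one reason the two answers differ slightly.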
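For the bonus item, here is one generic kind of box model, offered only as a sketch and not as the model in the MLO-Bonus-BoxModel notebook: a single atmospheric box gains CO2 from emissions and loses it to a sink proportional to its excess over a reference level, stepped forward with forward Euler. The conversion of about 2.12 GtC per ppm is approximate; the emissions ramp, the 280 ppm reference, the 50-year relaxation time and the 317 ppm starting value (roughly the 1960 level) are hypothetical inputs for illustration.

```python
import numpy as np

# Hypothetical one-box model; none of these values come from the course notebook.
GTC_PER_PPM = 2.12  # approximate carbon mass (GtC) that raises atmospheric CO2 by 1 ppm


def box_model(years, emissions_gtc, c_start, c_equilibrium=280.0, tau_years=50.0):
    """Step an atmospheric CO2 box forward with forward Euler.

    years         : increasing 1-D array of model times (years)
    emissions_gtc : emissions into the box at each time (GtC per year)
    c_start       : concentration at years[0] (ppm)
    c_equilibrium : level the sink pulls the box back toward (ppm), hypothetical
    tau_years     : relaxation time of the sink (years), hypothetical
    """
    conc = np.empty(len(years))
    conc[0] = c_start
    for i in range(1, len(years)):
        dt = years[i] - years[i - 1]
        source = emissions_gtc[i - 1] / GTC_PER_PPM       # ppm/yr added by emissions
        sink = (conc[i - 1] - c_equilibrium) / tau_years  # ppm/yr removed by the sink
        conc[i] = conc[i - 1] + dt * (source - sink)
    return conc


years = np.arange(1960, 2051)
# Made-up emissions path: a linear ramp from 3 to 12 GtC/yr.
emissions = np.linspace(3.0, 12.0, len(years))
co2 = box_model(years, emissions, c_start=317.0)
print(f"Modeled CO2 in 2050: {co2[-1]:.1f} ppm")
```

A quick sanity check on any box model is that with zero emissions and a start at the reference level, the concentration should stay put; you could then compare the modeled curve with the Mauna Loa annual means.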

Big Data with Python

Barnard | Department of Environmental Science | Professor Brian J. Mailloux
