Classes 16 and 17 – CO2 Time Series

Overview

(Class numbers may be slightly off; don’t let that bother you.  We are going in numerical order even if a number gets skipped.  Changing the headers messes up the website.)

Pandas is known for its time series capability, where you make time the index.  We are going to do this with CO2 data.  First we will analyze the data from the La Jolla Pier.  Here is the website that the data comes from.  I made a CO2 folder in GitHub.  I will walk you through how to do time series analysis with Pandas (pdf=pandastimeseries).  Work through it and take your time.  Then you will do your own analysis of the data from Mauna Loa, building on what you learned.  This is our work for this class and the next.

Try this link http://scrippsco2.ucsd.edu/assets/data/atmospheric/stations/flask_co2/daily/daily_flask_co2_ljo.csv
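A minimal sketch of the "make time the index" step, for orientation only.  The sample text below is made up and does not reproduce the real Scripps file layout; the column names `Date`, `Flags`, and `CO2` are assumptions chosen to match the names used in the homework text.  The real file may need extra `read_csv` options to skip its header lines.

```python
import io
import pandas as pd

# Made-up sample rows: NOT real measurements and NOT the real file layout.
sample_csv = io.StringIO(
    "Date,Flags,CO2\n"
    "2000-01-05,0,369.1\n"
    "2000-01-12,0,369.4\n"
    "2000-01-19,3,380.0\n"
    "2000-01-26,0,369.9\n"
)

# Parse the date column and use it as the index, so the frame is a time series.
df_scripps = pd.read_csv(sample_csv, parse_dates=["Date"], index_col="Date")
df_scripps = df_scripps.sort_index()

# Keep only the samples whose quality flag is 0.
df_scripps = df_scripps[df_scripps.Flags == 0]
print(df_scripps)
```

With the real data you would pass the URL string to `pd.read_csv` in place of `sample_csv`.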

VIDEO

Homework.

Notebook 13.

1.  Get the daily flask CO2 data from the Keeling curve data from Mauna Loa.  USE THE DAILY FLASK!  That is Flask CO2 Data->daily_flask_co2_mlo.csv .   http://scrippsco2.ucsd.edu/data/atmospheric_co2/mlo.html.   If you use the daily flask data and follow the class example, things should work.  When using this data you need to clean it up by keeping only the samples with flag=0.  We did this in our class code:  df_scripps=df_scripps[df_scripps.Flags==0]

2. Predict the annual MAX CO2 out to 2050.  Present the graph showing this data, all properly labeled.  You can show the equations for each line that you use for the predictions.  Make it look good!  This is similar to what we did in class, so it is a good warm-up.  I am NOT showing my answer below.  In class we did the mean.  Now we are doing the max.  Choose your own unique linestyle, scatter symbol, and scatter color.

3. This is the harder one.  When looking at CO2 data we see yearly patterns.  It peaks in the late spring and decreases in the summer.  Here is a worldwide visualization for 20 years of CO2.  We know this is caused by plant growth during the northern hemisphere spring and summer.  I want to see the monthly patterns by themselves.  To do this we need to subtract the yearly mean from our samples.  This is easy to do but took me a while to figure out.  You can do it one of at least two ways.  Choose one and go for it.

a.  Pandas has a nice method called rolling.  It used to be a function called rolling_mean, but that was deprecated; now you call rolling and then mean.  Since we have roughly weekly data, a window size of 52 weeks gives a year-long average around each point.  It is not the average of that calendar year but a rolling average with a 52-week window.  Do not specify the window in weeks.  It is buggy.  Use days: a window of 365D.  You cannot use a window of years or months because those lengths are not fixed (leap years, for example, change the length of a year).  Here are some good links: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html and https://stackoverflow.com/questions/43556344/pandas-monthly-rolling-operation and https://pandas.pydata.org/pandas-docs/stable/computation.html#rolling-windows    To be entertaining and distracting, here is a video of a rolling panda (a slightly different Google search).  So if you compute the rolling mean in a new column you get a nice smooth curve, and you can subtract it from the data to look at the monthly differences.

b.  If you really want to subtract out the calendar-year mean, it is slightly different.  This is the January 1 to December 31 mean.  I found two examples; here is a stackoverflow description of one.  But I also found a slightly different method.  If you make a new column that is the year of each sample, you can find the mean for each year this way:  mlo.groupby('year').CO2.mean()   It is similar to how we resampled for the trend analysis.  Now you need to put the yearly means back into the larger dataframe.  You can do this using join.  Look at the green-checked answer here.  https://stackoverflow.com/questions/12200693/python-pandas-how-to-assign-groupby-operation-results-back-to-columns-in-parent

c.  I think you could also use resample but I have not figured it out yet.  So I wouldn’t try it yet.
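Options (a) and (b) can be sketched side by side as follows.  This is an illustration under stated assumptions, not the class solution: the series is synthetic (a straight-line rise plus a sine-shaped yearly cycle), and the column names `rolling_mean`, `year_mean`, `diff_rolling`, and `diff_year` are made up for this example.

```python
import numpy as np
import pandas as pd

# Synthetic weekly series (NOT real data): linear rise plus a yearly cycle.
dates = pd.date_range("2000-01-01", "2004-12-31", freq="7D")
t = (dates - dates[0]).days / 365.25
mlo = pd.DataFrame({"CO2": 370 + 2.0 * t + 3.0 * np.sin(2 * np.pi * t)},
                   index=dates)

# (a) Rolling mean with a time-based window. For each sample, pandas averages
# every sample in the 365 days ending at that sample (a trailing window).
# This needs a sorted DatetimeIndex.
mlo["rolling_mean"] = mlo["CO2"].rolling("365D").mean()
mlo["diff_rolling"] = mlo["CO2"] - mlo["rolling_mean"]

# (b) Calendar-year mean: one mean per year, joined back onto every row.
mlo["year"] = mlo.index.year
year_mean = mlo.groupby("year").CO2.mean().rename("year_mean")
mlo = mlo.join(year_mean, on="year")
mlo["diff_year"] = mlo["CO2"] - mlo["year_mean"]

print(mlo[["CO2", "rolling_mean", "year_mean"]].tail())
```

Two things to keep in mind when comparing the results.  A time-based window defaults to `min_periods=1`, so the first rows are averaged over less than a full year.  And because the window trails each point, the rolling mean lags a rising trend, so `diff_rolling` tends to sit a little higher than `diff_year`.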

Now you should be able to make 3 plots very quickly from the new column of data you created.  I am going to paste in the three graphs you need to make.  I did not label my graphs, so don’t follow my lead.  Make yours look much better.  I shouldn’t have to tell you that by now, but I am reminding you.  Add a figure caption to each figure.  Good luck making them.

4.  Final 10%.  Can you make a simple box model of CO2 concentrations?  Take a look at this notebook(pdf=MLO-Bonus-BoxModel) and see if you can do it.  I have too many hints for you in the notebook.  My graph that is not labeled well is below!
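For problem 2 above, the fit-and-extend step might look like the sketch below.  It assumes a straight-line fit with `np.polyfit` and `np.poly1d`, and it groups by calendar year instead of resampling; both are choices made for this example.  The data are synthetic, so the printed numbers mean nothing for the real record.

```python
import numpy as np
import pandas as pd

# Synthetic weekly series (NOT real data): linear rise plus a yearly cycle.
dates = pd.date_range("1990-01-01", "2019-12-31", freq="7D")
t = (dates - dates[0]).days / 365.25
mlo = pd.DataFrame({"CO2": 355 + 1.8 * t + 3.0 * np.sin(2 * np.pi * t)},
                   index=dates)

# Annual maximum: group the samples by calendar year and take the largest.
annual_max = mlo.groupby(mlo.index.year).CO2.max()

# Fit a degree-1 polynomial (a straight line) to annual max versus year.
years = annual_max.index.to_numpy()
coeffs = np.polyfit(years, annual_max.to_numpy(), 1)
fit = np.poly1d(coeffs)

# Evaluate the line from the first data year out to 2050.
years_future = np.arange(years.min(), 2051)
predicted = fit(years_future)

print(f"CO2_max = {coeffs[0]:.3f} * year + {coeffs[1]:.1f}")
print(f"Predicted annual max in 2050: {fit(2050):.1f} ppm")
```

The `coeffs` array holds the slope and intercept, which is what you need to write the equation of the line on your graph.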

1.  The difference between every sampling point and the mean value for the year to see the seasonal differences.

2.  Same as #1 but for the years 2000 to present.  I am showing the answers for rolling and groupby so you can compare.

3.  A boxplot by month.

co2-monthly-diff
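For graph 3, it can help to look at the numbers a box plot by month actually draws.  The sketch below uses synthetic data and made-up column names (`diff_year`, `month`), and it uses `transform` as a shortcut for the groupby-then-join step.  It computes the quartiles and median of the seasonal difference for each month, which are the edges and center line of each box.

```python
import numpy as np
import pandas as pd

# Synthetic weekly series (NOT real data): linear rise plus a yearly cycle.
dates = pd.date_range("2000-01-01", "2009-12-31", freq="7D")
t = (dates - dates[0]).days / 365.25
mlo = pd.DataFrame({"CO2": 370 + 2.0 * t + 3.0 * np.sin(2 * np.pi * t)},
                   index=dates)

# Difference between each sample and the mean of its calendar year.
mlo["year"] = mlo.index.year
mlo["month"] = mlo.index.month
mlo["diff_year"] = mlo["CO2"] - mlo.groupby("year").CO2.transform("mean")

# One row per month: lower quartile, median, upper quartile of the difference.
monthly_box = (mlo.groupby("month").diff_year
               .quantile([0.25, 0.5, 0.75])
               .unstack())
monthly_box.columns = ["q1", "median", "q3"]
print(monthly_box.round(2))

# To draw the boxes (requires matplotlib):
# mlo.boxplot(column="diff_year", by="month")
```

Each box pools every sample from that month across all years, so the plot shows how far a typical January, February, and so on sits from its yearly mean.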

 

4.  Bonus graph of boxmodel results.  You will need to submit two graphs!  This is just the first!

CO2-box-model
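If stepping a differential equation forward in Python is unfamiliar, here is a generic one-box sketch.  It is NOT the model from the bonus notebook: the equation dC/dt = E - (C - c_pre)/tau, the function name `run_box_model`, and every parameter value are assumptions picked only to show the mechanics of forward-Euler time stepping.

```python
import numpy as np

def run_box_model(emissions, c0, c_pre=280.0, tau=50.0, dt=1.0):
    """Forward-Euler integration of dC/dt = E(t) - (C - c_pre) / tau.

    emissions : source term for each time step (concentration units per year)
    c0        : starting concentration
    c_pre     : concentration the sink relaxes toward (hypothetical value)
    tau       : removal time scale in years (hypothetical value)
    dt        : time step in years
    """
    conc = np.empty(len(emissions) + 1)
    conc[0] = c0
    for i, e in enumerate(emissions):
        # Rate of change right now: source minus first-order sink.
        dcdt = e - (conc[i] - c_pre) / tau
        # Step forward: new value = old value + rate * time step.
        conc[i + 1] = conc[i] + dcdt * dt
    return conc

# Hypothetical constant source of 2 units per year for 500 years.
emissions = np.full(500, 2.0)
conc = run_box_model(emissions, c0=280.0)
print(f"Final concentration: {conc[-1]:.2f}")
print(f"Analytic steady state: {280.0 + 2.0 * 50.0:.2f}")
```

With a constant source the box settles where the source balances the sink, at c_pre + E * tau, which is a handy check that the loop is doing what you expect.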

38 thoughts on “Classes 16 and 17 – CO2 Time Series”

  1. This class was a difficult but rewarding one. I enjoyed analyzing data that is of interest to me. The homework was much more difficult for these classes because we had done similar things to only half of it in class. I struggled with visualizing the monthly patterns. However, I do think the challenge was an important one, and we got enough class time and hints to make it possible to accomplish. For me using groupby was much easier and more intuitive than using rolling mean. Being able to plot the monthly patterns after all of the difficulties was worth it. I’m not sure about this idea, but the groupby packet that was given out when we started working on our final projects may have been useful as a bonus packet for this lesson. It’s not necessary, but maybe it would help students understand the groupby function better.

  2. This was one of the more challenging assignments, especially trying to calculate the running mean and then the difference. But it really made you think and threw a lot of new things at us like the boxplot, which truthfully was my favorite thing we made behind the gifs. Also I like the classical music for the video rather than the EDM haha.

  3. As others have said, it was fun to interact with data we’ve discussed through other classes in the major. I found it really tough to figure out the rolling mean on the homework and understand stack overflows in general. On one hand, I understand the importance of being able to figure things out independently from google and stack overflow but on the other hand it takes so long to figure out for what usually ends up being a simple line of code.

  4. I appreciate that we learned about the box model of CO2 concentration. It was my favorite way from the entire semester of how we connected our Python skills with what actually happens in the real world.

  5. This was a good lesson that helped me better understand pandas. It would be helpful if in the notes there was a reminder on how to reset the x tick labels to custom labels such as months for time series.

  6. Calculating the seasonal difference was the most difficult part of the homework for me, but at the end I understood what I was doing and what I wanted my code to do. This was one of the most difficult homework assignments, but I felt accomplished after completing it.

  7. I found this assignment to be so useful for learning how to manipulate data in a data frame, to prepare the data before analysis. Creating new columns and resampling was a great way to help visualize data in a way that may not be initially obvious from what is given in a particular dataset. One thing I found tricky was adding a new column by actually creating an entirely separate dataframe, and I think an extra note on why this is necessary could be useful.

  8. In my opinion, this was by far the most challenging assignment throughout the whole class. I really struggled with the rolling mean function and wish we had spent a bit more time on it. However, I also think this was the most important class of the semester, because it incorporated almost everything we had learned up to it. Also, pandas is a very useful system to know and probably the most applicable for future use outside the Big Data with Python classroom.

  9. I definitely could have used more information about rolling before trying to do this homework. I don’t think the websites are as helpful as examples in class of how to use the function. I also thought this homework was just generally longer and more complicated than most of the other ones. And I would have liked more experience with boxplots as well. I think a better explanation of what the graphs were actually supposed to mean in the homework assignment would have made this homework a lot less painful.

  10. I expect Brian has already explored this extensively, but I wonder if it would be worthwhile to spend more time on box plots and how to make them pretty. I appreciated that we didn’t spend a long time on box plots because the thing where you have to add a list of tick mark labels (including blank labels) if you want your tick marks to be labeled properly is atrocious. But I also know there are good reasons people might want to use box plots for data visualization. Maybe, from a “teaching data analysis” point of view, we could spend a little bit of time reflecting on why you might choose a box plot to visualize your data vs. a scatter plot of mean annual CO2 concentrations, and what you lose by switching from a box plot to a scatter plot.

    • Also, this page is labeled in the sidebar link as “Classes 17 and 18,” but its title at the top of the page is “Classes 16 and 17,” which is slightly confusing!

  11. This was a really enjoyable packet, especially when we were able to make a graph that has so much meaning to me already as an environmental major. However, the Notebook was extremely difficult! I did enjoy figuring out how to make the graphs, but perhaps a bit more instruction about how to incorporate the equation would have been helpful. We did do it already with the polynomial homework, but hinting that we had to separate different parts of the equation out would have saved me a lot of time and frustration!

  12. What I enjoyed most about this is the fact that we were working with relevant data that we see illustrated a lot, the Mauna Loa data. It was most satisfying to see a graph that I recognize and then make it myself. This was the first assignment introducing pandas, and I think that with pandas, in addition to other data analysis tools, we should go over, maybe in a short lecture and out loud in class, what data analysis tool we are using and why, and what it can help us find.

  13. This was probably the most frustrating assignment for me all semester. Even working with a partner, I think we sat for several hours just trying to figure it out without getting anything done. Eventually we figured it out (with some help from you I think) but it was pretty painful. I think writing the directions in a clearer way would help students know what they were supposed to be aiming for. And maybe I just didn’t get that good an understanding of the notebook in class, but I felt like it didn’t really prepare me for the harder elements of this assignment.

  14. First of all, this was hands down my favorite pre-class video because of your decision to wear orange for Halloween. Secondly, I thought this particular class and Homework assignment were particularly challenging. I think it would have been better to explain each graph we were expected to construct a bit more thoroughly instead of just showing a picture of what it was supposed to look like upon completion. While it was possible to decipher, it could have been more clear!

  15. This was probably the lecture I enjoyed the most, as it connected Python analysis with a concept I had seen numerous times in class. I also appreciated the integration of the data through the url, as that allows for the data to be routinely updated as the next month’s concentration data is posted.

    I think it may have been valuable to integrate the box model notebook into the main one; I went to do the homework and didn’t realize some crucial details and hints were in the notebook. Nonetheless, being able to use an equation from literature to visualize a popular dataset was a lot of fun, and hopefully can be extended to other data I find in the future!

  16. This was my least favorite lesson/homework/class of all because I found the material really really challenging and the connection between the homework and the lesson was a little too minimal considering how difficult the content was. Additionally, I did not really understand the box plot in particular and spent many hours trying to piece this one together.

  17. Personally, this was the most difficult class/homework for me. My lack of environmental science background definitely did not help either. I know Brian likes to “make us suffer a bit,” but I was really lost. Maybe better descriptions of concepts or steps on the classwork would help? Also I think the homework requirements could be written more clearly.

  18. This lecture was interesting because it combined a lot of the concepts we had learned until that point. The homework was definitely one of the more difficult ones this semester as I found that it involved some concepts that hadn’t been discussed in depth in lecture. However, it taught me to be patient and to apply the skills that I had learned in a new scenario, something that I think will be really helpful using Python going forward.

  19. I didn’t really understand what the rolling mean was. I wish we had a blurb on how python actually calculated the rolling mean so that I knew what I was working with. Overall, the homework was quite difficult, but very useful as CO2 Time Series are probably super popular these days.

  20. Definitely one of the hardest homework assignments of the semester but really pushed me to use a lot of little tricks from past homework. This assignment is a direct application of class materials on a larger data set. I would recommend writing the directions out in a different manner just to make it easier to understand exactly what is expected.

  21. This assignment was my favorite and the most useful one for me. I really liked learning how to get data from the web. Super helpful! I also found the datetime index extremely useful, and it came in handy when trying to do other work. I like how we used the equation for the model. I did have some problems in the homework, but I thought the information you gave was sufficient to complete the assignment. I thought the Stack Overflow description to use for the 'normed' was an easier way to find the rolling means. At first it was a bit confusing, like many of the answers on Stack Overflow, but once I get what they are saying it can be so helpful! I just want to say that this assignment was really helpful for anyone in earth sciences. I learned a lot!

  22. This was a fun and exciting class, because the data was something we’ve studied in other courses as environmental science majors, and now we got to plot it ourselves with code. It also made me sad though because that morning we had elected a climate denier as our next president, and just hours later we were being made to create the most iconic climate change curve. While the notebook itself was straightforward, I had trouble creating the plots for the homework on my own. This notebook was also helpful for referring back to when working on final projects.

  23. I think this class was really useful in terms of learning how to analyze data sets using different statistical methods. Learning how to plot the residuals of a scatter plot was one of the most useful things I learned this semester, and I think that it should be given some extra explanation and emphasis in the future, as should using poly1d. Students who aren’t as familiar with statistics may not understand why plotting the residuals is important, and students who use statistics often may find themselves needing to learn how to do it in Python.

  24. I thought most of this lesson was pretty straightforward, and think that the information provided will be very helpful in future projects. I also appreciated the opportunity to get more practice with poly1d. I think further explanation of rolling means beforehand could have prevented some confusion about the homework.

  25. I found this assignment really interesting and applicable to other coursework. I’m still a little bit confused about boxplots and what boxplot grouped by month actually means. Perhaps it might be useful to go over this with the whole class a bit more.

  26. This was a valuable assignment: I agree it was good to learn about datetime indices, bar graphs and doing math in dataframes. However, the connections between the classwork and homework could have used more scaffolding: How do you calculate rolling_mean? If you are not familiar with differential equations, how do you break them down and do the math in Python?

  27. The in-class notes were fairly easy to follow and I found datetime very useful. I struggled with the statistics involved in the homework assignment, because I didn’t really understand how monthly cycles were supposed to look wrt the graphs, and I didn’t know how rolling means worked.

  28. This exercise was fun and I think it helped a lot with my ability to manipulate data sets, especially with regards to setting and using an index. I wish that we had spent more time on rolling mean, or looked at another way to plot seasonal variability, because it seems to me like that is generally an important thing with large environmental data sets.

  29. This exercise of plotting CO2 data collected at Mauna Loa was very interesting. Any student who is interested in basic Earth Science would have seen the weekly CO2 Keeling curve or had to make similar Excel plots for homework in related courses (particularly EESC V2100 The Climate System). So this class was very relatable.

    For plotting the first graph for the homework (seasonal difference), I wasn’t sure how to put the date on the x-axis instead of mol['Year']. I think it would be useful to remind students in the notebook that “filename.index” refers to whichever index we created and stored in pandas, which would be “Date” in this case.

  30. Extending the line to predict potential outcomes will be very helpful for my project and as Dani said above finding new ways to make your data look more presentable is very helpful!
