
STA 483/583 Semester Project

Part 2 – Performing a historical analysis of a time series
Due: November 19, 2018
In this project we will be exploring some atmospheric environmental data from Madrid, Spain.
The file madrid_01_17.zip (compressed zip file on canvas site) contains the raw hourly air quality
measurements recorded in Madrid, Spain from 2001 through 2017 at 24 sites in and around the city.
The site locations are available in the file stations.csv. A data dictionary is available in the
file dataDescription.pdf.
General Goal (part 2):
Perform a historic assessment to determine if the amounts of carbon monoxide and nitrogen dioxide have
changed over time, with the caveat that you should base your assessment only on stations with a
reasonably complete record (more on that below).
Notes & nuances in the data:
- The data is fairly big: 75.6MB (zipped), 3,729,128 rows after properly combining the 17 years of data
- Some of the files have different numbers of columns (e.g., in some years there is no NOx measure). It is
probably best to trim each year to the relevant variables before combining.
- Because of equipment differences, some measurements are completely missing from some stations.
- The list of stations (with names) is in a separate file from the raw data; you will need to link the two.
- This is a real dataset and, like all real data, occasionally experiences some real data problems.
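The trimming and linking steps mentioned in the notes above might look roughly like the sketch below. It runs on small made-up data frames rather than the real files, and the column names (date, station, CO, NO_2, id, name) are assumptions that should be checked against dataDescription.pdf and the actual file headers.

```r
library(dplyr)

# Assumed column names; confirm against dataDescription.pdf before use.
keep_cols <- c("date", "station", "CO", "NO_2")

# Toy stand-ins for two yearly files that carry different extra columns.
y2001 <- data.frame(date = c("2001-01-01 01:00:00", "2001-01-01 02:00:00"),
                    station = c(1L, 2L), CO = c(0.5, NA), NO_2 = c(40, 55),
                    NOx = c(90, 120), stringsAsFactors = FALSE)
y2002 <- data.frame(date = "2002-01-01 01:00:00", station = 1L,
                    CO = 0.4, NO_2 = 38, O_3 = 12, stringsAsFactors = FALSE)

# Trim each year to the relevant variables first, then stack the years.
trim_year <- function(df) select(df, all_of(keep_cols))
air <- bind_rows(lapply(list(y2001, y2002), trim_year))

# Toy station lookup (hypothetical names); link on the station identifier.
stations <- data.frame(id = c(1L, 2L), name = c("Site A", "Site B"),
                       stringsAsFactors = FALSE)
air <- left_join(air, stations, by = c("station" = "id"))
```

With the real data, the list of toy data frames would be replaced by something like lapply(files, function(f) trim_year(read.csv(f))), where files is the vector of yearly file paths obtained after unzipping.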
Specifics:
To successfully complete this part of the project, you will need to:
1. Read in the data successfully
2. Determine which stations have a reasonably complete record of measurements for carbon
monoxide and nitrogen dioxide. To do this, perform the following:
a) Aggregate the carbon monoxide and nitrogen dioxide measurements into year-month
averages for each site in the dataset.
b) Construct plots to explore the historic record for each site, determining which sites you feel
are reasonably complete (you make that determination).
c) Justify your decision on which sites you feel have a reasonably complete record. For the
remainder of the analysis, only use data from these sites.
3. Using only the sites you feel are reasonably complete, aggregate the carbon monoxide and
nitrogen dioxide measures into daily averages, year-month averages and yearly averages.
4. Graphically explore each of the three aggregated measures of carbon monoxide and nitrogen
dioxide to determine if you feel the measurements have changed in time.
5. Using the year-month average of carbon monoxide and nitrogen dioxide, build statistical
models that model all systematic components of the time series (possible trends, seasonality,
autocorrelation) and use the model to address the question of whether the measurement of
carbon monoxide or nitrogen dioxide has changed in time. Your chosen model should be as
parsimonious as possible while adequately modeling all aspects of the time series.
6. Make some overall conclusions regarding the air quality levels of carbon monoxide and
nitrogen dioxide in Madrid. Your overall findings must be supported by graphical and/or
numeric summaries that help tell the story.
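As a rough illustration of the kind of model step 5 describes, the sketch below fits a linear trend plus monthly seasonality with AR(1) errors using arima() from base R. The series is simulated, not the Madrid data, and the model form is only one possibility under those assumptions; your own model should be chosen from what the plots and diagnostics show.

```r
set.seed(483)
n <- 120                                    # ten years of simulated monthly averages
t <- seq_len(n)
month <- factor(rep(1:12, length.out = n))

# Simulated series: downward trend + annual cycle + AR(1) noise (all made up).
y <- 2 - 0.01 * t + 0.3 * cos(2 * pi * t / 12) +
  as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 0.1))

# Regressors: linear trend plus 11 monthly indicators (month 1 is the baseline).
X <- cbind(trend = t, model.matrix(~ month)[, -1])

# Regression with AR(1) errors.
fit <- arima(y, order = c(1, 0, 0), xreg = X)

# Approximate z-statistic for the trend coefficient.
z_trend <- coef(fit)["trend"] / sqrt(diag(vcov(fit)))["trend"]

# Check whether autocorrelation remains in the residuals.
lb <- Box.test(residuals(fit), lag = 12, type = "Ljung-Box", fitdf = 1)
```

A trend coefficient that is large relative to its standard error, together with residuals that show no remaining autocorrelation, would support a conclusion that the level has changed over time. Dropping terms that are not needed keeps the model parsimonious.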
Some hints:
There are several ways to take monthly aggregates in R:
o the functions group_by() and summarize() in the package dplyr will be handy.
The package lubridate will be useful for working with timestamps.
Feel free to use other software languages (SAS) if you find them helpful, but your results must
be completely reproducible! You will need to replicate the results on future parts of the project.
The option na.rm=TRUE will be handy in your aggregation if using R.
This analysis does involve model building but also includes graphical and numerical
summaries. Make sure to properly label plots and tables.
Several DataCamp modules will also be posted that may be helpful!
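The hints above can be combined along the following lines. This is a minimal sketch on made-up hourly records, and the column names (date, station, CO, NO_2) and the helper avg_by() are assumptions for illustration. The counts of non-missing hours are included because they may help when judging which sites have a reasonably complete record.

```r
library(dplyr)
library(lubridate)

# Toy hourly records; the real column names may differ.
hourly <- data.frame(
  date = c("2001-01-01 01:00:00", "2001-01-01 02:00:00",
           "2001-01-02 01:00:00", "2001-02-01 01:00:00"),
  station = 1L,
  CO = c(0.4, NA, 0.6, 1.0),
  NO_2 = c(40, 50, 60, 30),
  stringsAsFactors = FALSE
)

# Parse the timestamp and derive the grouping variables.
hourly <- hourly %>%
  mutate(date = ymd_hms(date),
         year = year(date), month = month(date), day = as_date(date))

# Hypothetical helper: average both pollutants within the given groups.
# The counts are computed before the means so they still see the hourly values.
avg_by <- function(df, ...) {
  df %>%
    group_by(...) %>%
    summarize(n_CO = sum(!is.na(CO)), n_NO_2 = sum(!is.na(NO_2)),
              CO = mean(CO, na.rm = TRUE), NO_2 = mean(NO_2, na.rm = TRUE)) %>%
    ungroup()
}

daily   <- avg_by(hourly, station, day)
monthly <- avg_by(hourly, station, year, month)
yearly  <- avg_by(hourly, station, year)
```

Note that with na.rm = TRUE a group with no recorded values gives NaN rather than an error, so a count of zero is the clearer signal of a gap in the record.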
Encouragement:
The underlying idea of this assignment is largely the same as the lab days in October where we fit
deterministic models with autocorrelation. Here, however, you are combining several aspects of what we
covered as well as dealing with date and time stamps. Part of the modern practice of statistics is dealing
with difficult data and getting it into a usable format.
What to turn in:
A short well-written report outlining your exploratory analysis (I suggest using Rmarkdown). There is
no page limit requirement, but I would expect at least a 3-5 page report (keep in mind you are
including several plots and tables in your writeup). Make sure your report addresses all questions and is
well-formatted (information on formatting Rmarkdown documents is also available on the canvas site).
This report is due on Monday, November 19 on canvas. Your report should include all necessary work
but be as brief as possible. Large chunks of source code should (largely) be relegated to an appendix.
Grades:
This is part 2 (of 3) of a semester long project looking at this environmental data. This part will count
as 30% of your total semester project grade, which corresponds to 6% of your total grade in the course
(project is worth 20% total). Undergraduate students are allowed to work in pairs while graduate
students are expected to work on their own!
If you are an undergraduate and working with another student you must tell me who your partner is before
Friday, November 9, 2018.
In part 3 you will construct a predictive model and undergraduates once again will be allowed to pair
up; however, you will NOT be allowed to work with the same student.

 
