Recently I hit a brick wall when I tried to analyse the time-series data from my first-year experiments.
In my experiments, I measured tree sap flow velocity and several environmental controls: temperature, solar radiation, relative humidity, wind speed and soil moisture. However, these data were recorded at different time steps. The sap flow velocity was logged hourly, while the environmental controls were measured every 10 or 20 minutes. So I had to aggregate everything onto the same time sequence.
There are two ways of aggregating the data. The first is to assemble the data for every hour on each date, so that they can go into a correlation analysis or a multiple linear regression; the second is to average all the values measured at the same hour of the day, disregarding the date, in order to analyse the 24-hour profile of each variable.
Thanks to R, I managed to process my four months of data in just ten seconds. Now I'd like to share the code with you, using my soil moisture data as an example. 🙂
1. Making a 24-hour profile of soil moisture
Here is part of my soil moisture data (unit: VW%) from my Treatment 2 plot. I measured the soil moisture at two depths: Shallow (20 cm) and Deep (40 cm).
Fig. 1 Example of initial soil moisture data
(1) Save the Excel file as *.csv and import it into RStudio. In my case, I read the data into a data frame named soilt2. Remember to convert the Datetime column to a date-time type, and the Shallow and Deep columns to numeric.
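If it helps, here is a minimal sketch of that import step. The file name soilt2.csv and the date format string are my own assumptions; adjust them to match your exported file.
soilt2 <- read.csv("soilt2.csv", stringsAsFactors=FALSE) ### Read the exported *.csv (the file name is an assumption).
soilt2$Datetime <- as.POSIXct(soilt2$Datetime, format="%Y-%m-%d %H:%M:%S") ### Turn Datetime into a date-time type; the format string is an assumption, so match it to how your dates are written.
soilt2$Shallow <- as.numeric(soilt2$Shallow) ### Make sure the soil moisture columns are numeric.
soilt2$Deep <- as.numeric(soilt2$Deep)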
Fig. 2 Input soil moisture dataset
(2) Run the following code:
soilt2$Datetime1 <- as.POSIXlt(soilt2$Datetime)$hour ### This adds a new column named Datetime1 to the soilt2 dataset; as shown below, it contains only the hour extracted from Datetime.
Fig. 3 The new column Datetime1 is shown on the right of the table.
soilt2 <- aggregate(cbind(Shallow, Deep) ~ Datetime1, data=soilt2, FUN=mean) ### This averages Shallow and Deep by hour of day, regardless of the date. In my case I overwrote my original dataset; if you don't want to overwrite yours, assign the result to a new name, such as soilt3.
Fig. 4 The aggregated soilt2 data frame.
Done! Yay!
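As a quick sanity check on the 24-hour profile, you can plot the aggregated data frame with base R. This is just a sketch; the depth labels follow my setup above, so change them to suit your data.
plot(soilt2$Datetime1, soilt2$Shallow, type="l", xlab="Hour of day", ylab="Soil moisture (VW%)") ### Shallow profile as a line over the 24 hours.
lines(soilt2$Datetime1, soilt2$Deep, lty=2) ### Add the Deep profile as a dashed line.
legend("topright", legend=c("Shallow (20 cm)", "Deep (40 cm)"), lty=c(1, 2))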
2. Assembling the data at each hour on each date
(1) Import the dataset (the same as in the first method).
(2) Run the following code:
soilt2$Datetime2 <- droplevels(cut(soilt2$Datetime, breaks="hour")) ### This adds a new column named Datetime2 to soilt2, containing the date and hour extracted from the Datetime column.
Fig. 5 The new column Datetime2 is shown on the right of the table.
soilt3 <- aggregate(cbind(Shallow, Deep) ~ Datetime2, data=soilt2, FUN=mean) ### This calculates the mean of Shallow and Deep for each hour on each date.
Fig. 6 The new soilt3 data frame.
Yay again! Done!
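For the correlation or regression analysis I mentioned at the start, the hourly datasets can then be joined on the shared timestamp. Below is only a rough sketch: the sapflow data frame and its Velocity column are hypothetical names, and I'm assuming you build its Datetime2 column the same way as for soilt3.
sapflow$Datetime2 <- droplevels(cut(sapflow$Datetime, breaks="hour")) ### Hypothetical sap flow data frame; give it the same hourly timestamp as soilt3.
combined <- merge(sapflow, soilt3, by="Datetime2") ### Join the two hourly datasets on the timestamp.
cor(combined$Velocity, combined$Shallow) ### e.g. a simple correlation between the (hypothetical) sap flow velocity and shallow soil moisture.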
If you want to export these data to Excel, you can load the xlsx package and use write.xlsx(). For example, I use the following code:
library(xlsx) ### Load the xlsx package first.
write.xlsx(soilc1, "c:/Users/Yi Wang/Desktop/soilc1.xlsx")
There are definitely other ways to aggregate the data, but the code I've put in this post is, in my opinion, the simplest. If you disagree, or you'd like to share your own code, please feel free to leave a comment.
Thank you for reading!
May 4, 2017