I produced several graphs for my field research study project, some are more insightful than others. One that surprised me, and which I was grateful to have produced was for elevation data obtained from my handheld Garmin GPS device, plotted against sampling order. The reason that I plotted these variables was to identify if the elevation data was actually accurate or not. Because I did not sample sites sequentially by order of their elevation (I sampled randomly), there should be no relationship between sampling order and elevation. However, as you can see, elevation clearly declined with each sampling event, suggesting that each reading was continually improving (or worsening) as additional satellites connected with the device.
Summarizing and aggregating data was made much easier by using the R statistical software which is free and has an excellent community of users and resources available online for support. I highly recommend using R if you need to complete analysis, wrangle or clean data, manipulate tables, and/produce figures. It has amazing functionality and is completely free. There are lots of great YouTube resources out there as well, I recommend the channel Statistics Globe.
I also think it’s important to share knowledge and help others not to have to reinvent the wheel, so in case there are any R users out there or students wishing to learn and use a free and open source program for statistical analysis and data visualization, please feel free to use and adapt the below R script for your own analysis, or redistribute/share as you wish. The code is all fully annotated (using ###) so that it is a little easier to understand. You should be able to copy paste this into R Studio and use it or change as needed.
#### get working directory
getwd()
#### set working directory
setwd(“C:/Users/Elliot/Desktop/Bsc Environmental Science/TRU/BIOL 3021 Community and Ecosystem Ecology”)
#### read data into R from .csv format
df <- read.csv(“Sampling Data.csv”)
#### make df a data frame
data.frame(df)
#### create objects for variables and observations
species <- as.factor(df$species)
flood <- as.factor(df$flood.frequency)
cover <- as.numeric(df$cover)
site <- as.factor(df$site)
beaver <- as.factor(df$beaver)
disturbance <- as.factor(df$disturbance)
total.cover <- as.numeric(df$total.woody.cover)
sampling.order <- as.numeric(df$sampling.order)
elev <- as.numeric(df$elevation)
invasive <- as.numeric(df$invasive.cover)
native <- as.numeric(df$native.cover)
#### packages needed for analysis and visualization,
#### use install.packages(“packagename”) for first time
library(tidyr)
library(dplyr)
library(ggplot2)
library(psych)
######## scratch workspace for random analysis/visualization
df$native.cover = df$total.woody.cover – df$invasive.cover
specsum <- summary(species)
specsum <- data.frame(specsum)
speciesmeans <- describeBy(cover, species)
covermeans <- describeBy(total.cover, flood)
data_count_2 <- df %>% # Applying group_by & summarise
group_by(site) %>%
summarise(count = n_distinct(species))
hist(data_count_2$count, xlab = “Species Richness”, main = “”)
mean(data_count_2$count)
######## visualizations
#### cover by species, all sites
ggplot(df, aes(x = species, y = cover, fill = species)) + geom_boxplot() +
coord_flip() + labs(x = “Species”, y = “% Cover”) + theme(legend.position = “none”) +
theme(axis.text.y = element_text(face = “italic”))
#### total woody cover by flood frequency
ggplot(df, aes(x = flood, y = total.cover, fill = flood)) + geom_boxplot() +
labs(x = “Flood Return Period (Years)”, y = “Total Woody Species Cover (%)”)
#### invasive species cover by flood frequency
ggplot(df, aes(flood, y = invasive)) + geom_boxplot() +
labs(y = “% Cover Invasive Species”, x = “Flood Return Period (Years)”)
#### elevation by flood frequency, this was to test GPS accuracy (it was bad)
ggplot(df, aes(x = flood, y = elev, fill = flood)) + geom_boxplot() +
labs(x = “Flood Frequency”, y = “Elevation (masl)”)
#### elevation versus sampling order
#### obviously the elevation just went down every reading, totally inaccurate
plot(sampling.order, elev, xlab = “Sampling Order”, ylab = “Elevation (masl)”)
#### summary stats for GPS elevations
summary(elev)
######### analysis of flood level on cover
#### create new dataframe with 5 variables
dat <- df %>% select(invasive.cover, total.woody.cover, native.cover, flood.frequency, disturbance)
#### remove NAs
dat <- na.omit(dat)
#### summary stats for woody species cover
summary(dat$total.woody.cover)
sd(dat$total.woody.cover)
#### summary stats for invasive species cover
summary(dat$invasive.cover)
sd(dat$invasive.cover)
#### create objects for each variable
dat$invasive.cover = as.numeric(dat$invasive.cover)
dat$flood.frequency = as.factor(dat$flood.frequency)
dat$total.woody.cover = as.numeric(dat$total.woody.cover)
dat$native.cover = as.numeric(dat$native.cover)
#### boxplot of invasive cover at each flood freq
ggplot(dat) +
aes(x = flood.frequency, y = invasive.cover) +
geom_boxplot() +
theme(legend.position = “none”)
#### boxplot of native cover at each flood freq
ggplot(dat) +
aes(x = flood.frequency, y = native.cover) +
geom_boxplot() +
theme(legend.position = “none”)
#### create object for one-way ANOVA of response groups vs. flood freq
inv_aov <- aov(invasive.cover ~ flood.frequency, data = dat)
total_aov <- aov(total.woody.cover ~ flood.frequency, data = dat)
native_aov <- aov(native.cover ~ flood.frequency, data = dat)
#### create object for two-way ANOVA of response groups vs. flood freq
twoway <- aov(invasive.cover ~ flood.frequency + disturbance, data = dat)
twoway2 <- aov(invasive.cover ~ flood.frequency + disturbance + flood.frequency:disturbance, data = dat)
#### summary stats for one and two-way ANOVA results
summary(inv_aov)
summary(total_aov)
summary(native_aov)
summary(twoway)
summary(twoway2)
#### histograms of one-way ANOVA residuals
hist(inv_aov$residuals)
hist(total_aov$residuals)
#### calculate means and standard dev for invasive cover at each flood frequency
aggregate(invasive.cover ~ flood.frequency,
data = dat,
function(x) round(c(mean = mean(x), sd = sd(x)), 2))
aggregate(total.woody.cover ~ flood.frequency,
data = dat,
function(x) round(c(mean = mean(x), sd = sd(x)), 2))
#### another way of doing ANOVA in R
oneway.test(invasive.cover ~ flood.frequency, data = dat,
var.equal = TRUE)
oneway.test(total.woody.cover ~ flood.frequency, data = dat,
var.equal = TRUE)
oneway.test(native.cover ~ flood.frequency, data = dat,
var.equal = TRUE)
#### post-hoc analysis using Tukey HSD for significant ANOVA results
tuk1 <- TukeyHSD(inv_aov)
tuk1 <- as.data.frame(tuk1$flood.frequency)
tuk2 <- TukeyHSD(total_aov)
tuk2 <- as.data.frame(tuk2$flood.frequency)
tuk3 <- TukeyHSD(native_aov)
tuk3 <- as.data.frame(tuk3$flood.frequency)
######## analysis of disturbance type vs cover
#### create new data frame of 4 variables related to disturbance
disturb <- df %>% select(invasive.cover, total.woody.cover, native.cover, disturbance)
#### remove NA
disturb <- na.omit(disturb)
#### summary stats for disturbance data frame
summary(disturb)
#### create objects for variables
disturb$invasive.cover = as.numeric(disturb$invasive.cover)
disturb$disturbance = as.factor(disturb$disturbance)
disturb$total.woody.cover = as.numeric(disturb$total.woody.cover)
disturb$native.cover = as.numeric(disturb$native.cover)
#### boxplot of invasive cover at different disturbance types
ggplot(disturb) +
aes(x = disturbance, y = invasive.cover) +
geom_boxplot() +
theme(legend.position = “none”)
#### another way of doing a boxplot, using base R syntax
boxplot(invasive.cover ~ disturbance, data = disturb)
#### one-way ANOVA for cover variables vs. disturbance types
dist.inv_aov <- aov(invasive.cover ~ disturbance, data = disturb)
dist.total_aov <- aov(total.woody.cover ~ disturbance, data = disturb)
dist.native_aov <- aov(native.cover ~ disturbance, data = disturb)
#### summary stats for one-way ANOVA results
summary(dist.inv_aov)
summary(dist.total_aov)
summary(dist.native_aov)
#### histogram of ANOVA residuals
hist(dist.inv_aov$residuals)
hist(dist.total_aov$residuals)
#### calculate means and standard dev for invasive cover at each disturbance type
aggregate(invasive.cover ~ disturbance,
data = disturb,
function(x) round(c(mean = mean(x), sd = sd(x)), 2))
aggregate(total.woody.cover ~ disturbance,
data = disturb,
function(x) round(c(mean = mean(x), sd = sd(x)), 2))
#### another way of doing ANOVA in R
oneway.test(invasive.cover ~ disturbance, data = disturb,
var.equal = TRUE)
oneway.test(total.woody.cover ~ disturbance, data = disturb,
var.equal = TRUE)
#### post-hoc analysis using Tukey HSD for significant ANOVA results
tuk4 <- TukeyHSD(dist.inv_aov)
tuk4 <- as.data.frame(tuk4$disturbance)
tuk5 <- TukeyHSD(dist.total_aov)
tuk5 <- as.data.frame(tuk5$disturbance)
tuk6 <- TukeyHSD(dist.native_aov)
tuk6 <- as.data.frame(tuk6$disturbance)
######## analysis of beaver vs. cover
#### no annotation of code beyond this point
beav <- df %>% select(invasive.cover, total.woody.cover, native.cover, beaver)
beav <- na.omit(beav)
summary(beav)
beav$invasive.cover = as.numeric(beav$invasive.cover)
beav$disturbance = as.factor(beav$beaver)
beav$total.woody.cover = as.numeric(beav$total.woody.cover)
beav$native.cover = as.numeric(beav$native.cover)
ggplot(beav) +
aes(x = beaver, y = invasive.cover) +
geom_boxplot() +
theme(legend.position = “none”)
boxplot(invasive.cover ~ beaver, data = beav)
beav.inv_aov <- aov(invasive.cover ~ beaver, data = beav)
beav.total_aov <- aov(total.woody.cover ~ beaver, data = beav)
beav.native_aov <- aov(native.cover ~ beaver, data = beav)
summary(beav.inv_aov)
summary(beav.total_aov)
summary(beav.native_aov)
hist(beav.inv_aov$residuals)
hist(beav.total_aov$residuals)
hist(beav.native_aov$residuals)
aggregate(invasive.cover ~ beaver,
data = beav,
function(x) round(c(mean = mean(x), sd = sd(x)), 2))
aggregate(total.woody.cover ~ beaver,
data = beav,
function(x) round(c(mean = mean(x), sd = sd(x)), 2))
oneway.test(invasive.cover ~ beaver, data = beav,
var.equal = TRUE)
oneway.test(total.woody.cover ~ beaver, data = beav,
var.equal = TRUE)
oneway.test(native.cover ~ beaver, data = beav,
var.equal = TRUE)
TukeyHSD(beav.inv_aov)
TukeyHSD(beav.total_aov)
TukeyHSD(beav.native_aov)
#### woohoo, all done!!
Impressed with your R usage, your paper should have some nice figures. You will have to address how your elevation and sampling order results impact the rest of your results…