This is the #TidyTuesday data for 9/1/20
Tidy Tuesday Summary
This week I was pretty excited to attack this data set. I am from central Illinois and agriculture is a major part of my life. The Tidy Tuesday data is about crop yields over the years and was retrieved from Our World in Data. For ease, I used the tidytuesdayR package template for my analysis. Upon loading, I discovered five different data sets with all kinds of information, but each table had a country and a year, so I combined the tables with left joins. I thought it would be interesting to see how farming has changed over the years in the United States, so I standardized each series as a ratio to its first available year, most commonly 1961.
There is quite a bit of information contained within the plot. We can see that the amount of land needed to produce the same amount of crops has decreased significantly over the years. Also, the number of tractors per unit of arable land has stayed relatively unchanged. The most interesting aspect is how crop yields have risen alongside population over the years. It seems population is a major driver of the need for increased crop production. These crops also feed animals, and animal production has presumably followed the same trend as population increases, so it makes sense that crop yield would need to outpace population growth on average. Further, the land needed for a fixed amount of crops has decreased as population has grown. People need a place to live, and more people need more places to live, which takes up space. Population growth also seems to be driving innovation and engineering to make the most of the land we have to use.
This analysis has turned out to be incredibly fascinating. It has been a pleasure diving into this #TidyTuesday.
Analysis Section
Load the weekly Data
Download the weekly data and make it available in the tt object.
library(tidyverse)
library(tidytuesdayR)
# tt <- tt_load("2020-09-01")
tt <- read_rds("tt-9-1-20.rds")
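The commented-out `tt_load()` call next to the `read_rds()` call suggests the data was downloaded once and then read from a local copy. A minimal sketch of that caching pattern, assuming a hypothetical helper `load_cached()` (not part of tidytuesdayR) and a stand-in download function, using only base R's `readRDS()` and `saveRDS()`:

```r
# Hypothetical helper: read a cached .rds file if it exists; otherwise
# call `fetch()` once and store the result for next time.
load_cached <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  data <- fetch()
  saveRDS(data, path)
  data
}

# Stand-in for a slow download; in the post this role would be played by
# tidytuesdayR::tt_load("2020-09-01").
fake_download <- function() list(example = data.frame(Year = 1961:1963))

cache_file <- file.path(tempdir(), "tt-example.rds")
tt_example <- load_cached(cache_file, fake_download)
```

On a second call with the same path, the fetch function is never invoked, which keeps repeated knits from hitting the network.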
Readme
Take a look at the readme for the weekly data to get insight on the dataset. This includes a data dictionary, source, and a link to an article on the data.
tt
Glimpse Data
Take an initial look at the format of the data available.
tt %>%
  map(glimpse)
## Rows: 11,280
## Columns: 4
## $ Entity <chr> …
## $ Code <chr> …
## $ Year <dbl> …
## $ `Arable land needed to produce a fixed quantity of crops ((1.0 = 1961))` <dbl> …
## Rows: 11,965
## Columns: 5
## $ Entity <chr> "Afghanistan", "Afg…
## $ Code <chr> "AFG", "AFG", "AFG"…
## $ Year <dbl> 1961, 1962, 1963, 1…
## $ `Cereal yield (tonnes per hectare)` <dbl> 1.1151, 1.0790, 0.9…
## $ `Nitrogen fertilizer use (kilograms per hectare)` <dbl> NA, NA, NA, NA, NA,…
## Rows: 49,403
## Columns: 6
## $ Entity <chr> "Afghanistan"…
## $ Code <chr> "AFG", "AFG",…
## $ Year <chr> "1961", "1962…
## $ `Tractors per 100 sq km arable land` <dbl> 0.1568627, 0.…
## $ `Cereal yield (kilograms per hectare) (kg per hectare)` <dbl> 1115.1, 1079.…
## $ `Total population (Gapminder)` <dbl> 9169000, 9351…
## Rows: 13,075
## Columns: 14
## $ Entity <chr> "Afghanistan", "Afghanistan", "Afg…
## $ Code <chr> "AFG", "AFG", "AFG", "AFG", "AFG",…
## $ Year <dbl> 1961, 1962, 1963, 1964, 1965, 1966…
## $ `Wheat (tonnes per hectare)` <dbl> 1.0220, 0.9735, 0.8317, 0.9510, 0.…
## $ `Rice (tonnes per hectare)` <dbl> 1.5190, 1.5190, 1.5190, 1.7273, 1.…
## $ `Maize (tonnes per hectare)` <dbl> 1.4000, 1.4000, 1.4260, 1.4257, 1.…
## $ `Soybeans (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `Potatoes (tonnes per hectare)` <dbl> 8.6667, 7.6667, 8.1333, 8.6000, 8.…
## $ `Beans (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `Peas (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `Cassava (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `Barley (tonnes per hectare)` <dbl> 1.0800, 1.0800, 1.0800, 1.0857, 1.…
## $ `Cocoa beans (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `Bananas (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## Rows: 49,259
## Columns: 6
## $ Entity <chr> "Afghanis…
## $ Code <chr> "AFG", "A…
## $ Year <chr> "1961", "…
## $ `Cereal yield index` <dbl> 100, 97, …
## $ `Change to land area used for cereal production since 1961` <dbl> 100, 103,…
## $ `Total population (Gapminder)` <dbl> 9169000, …
## $arable_land_pin
## # A tibble: 11,280 x 4
## Entity Code Year `Arable land needed to produce a fixed quantity of cr…
## <chr> <chr> <dbl> <dbl>
## 1 Afghanist… AFG 1961 1
## 2 Afghanist… AFG 1962 0.984
## 3 Afghanist… AFG 1963 1.01
## 4 Afghanist… AFG 1964 0.939
## 5 Afghanist… AFG 1965 0.907
## 6 Afghanist… AFG 1966 0.927
## 7 Afghanist… AFG 1967 0.832
## 8 Afghanist… AFG 1968 0.812
## 9 Afghanist… AFG 1969 0.792
## 10 Afghanist… AFG 1970 0.876
## # … with 11,270 more rows
##
## $cereal_crop_yield_vs_fertilizer_application
## # A tibble: 11,965 x 5
## Entity Code Year `Cereal yield (tonnes p… `Nitrogen fertilizer use (kil…
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Afghanis… AFG 1961 1.12 NA
## 2 Afghanis… AFG 1962 1.08 NA
## 3 Afghanis… AFG 1963 0.986 NA
## 4 Afghanis… AFG 1964 1.08 NA
## 5 Afghanis… AFG 1965 1.10 NA
## 6 Afghanis… AFG 1966 1.01 NA
## 7 Afghanis… AFG 1967 1.22 NA
## 8 Afghanis… AFG 1968 1.29 NA
## 9 Afghanis… AFG 1969 1.31 NA
## 10 Afghanis… AFG 1970 1.11 NA
## # … with 11,955 more rows
##
## $cereal_yields_vs_tractor_inputs_in_agriculture
## # A tibble: 49,403 x 6
## Entity Code Year `Tractors per 100… `Cereal yield (kilo… `Total populatio…
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Afghan… AFG 1961 0.157 1115. 9169000
## 2 Afghan… AFG 1962 0.195 1079 9351000
## 3 Afghan… AFG 1963 0.258 986. 9543000
## 4 Afghan… AFG 1964 0.256 1083. 9745000
## 5 Afghan… AFG 1965 0.385 1099. 9956000
## 6 Afghan… AFG 1966 0.511 1012. 10175000
## 7 Afghan… AFG 1967 0.637 1224. 10400000
## 8 Afghan… AFG 1968 0.637 1288. 10637000
## 9 Afghan… AFG 1969 0.700 1310. 10894000
## 10 Afghan… AFG 1970 0.699 1105. 11174000
## # … with 49,393 more rows
##
## $key_crop_yields
## # A tibble: 13,075 x 14
## Entity Code Year `Wheat (tonnes … `Rice (tonnes p… `Maize (tonnes …
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Afgha… AFG 1961 1.02 1.52 1.4
## 2 Afgha… AFG 1962 0.974 1.52 1.4
## 3 Afgha… AFG 1963 0.832 1.52 1.43
## 4 Afgha… AFG 1964 0.951 1.73 1.43
## 5 Afgha… AFG 1965 0.972 1.73 1.44
## 6 Afgha… AFG 1966 0.867 1.52 1.44
## 7 Afgha… AFG 1967 1.12 1.92 1.41
## 8 Afgha… AFG 1968 1.16 1.95 1.71
## 9 Afgha… AFG 1969 1.19 1.98 1.72
## 10 Afgha… AFG 1970 0.956 1.81 1.48
## # … with 13,065 more rows, and 8 more variables: `Soybeans (tonnes per
## # hectare)` <dbl>, `Potatoes (tonnes per hectare)` <dbl>, `Beans (tonnes per
## # hectare)` <dbl>, `Peas (tonnes per hectare)` <dbl>, `Cassava (tonnes per
## # hectare)` <dbl>, `Barley (tonnes per hectare)` <dbl>, `Cocoa beans (tonnes
## # per hectare)` <dbl>, `Bananas (tonnes per hectare)` <dbl>
##
## $land_use_vs_yield_change_in_cereal_production
## # A tibble: 49,259 x 6
## Entity Code Year `Cereal yield in… `Change to land area… `Total populatio…
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Afghan… AFG 1961 100 100 9169000
## 2 Afghan… AFG 1962 97 103 9351000
## 3 Afghan… AFG 1963 88 103 9543000
## 4 Afghan… AFG 1964 97 104 9745000
## 5 Afghan… AFG 1965 99 104 9956000
## 6 Afghan… AFG 1966 91 104 10175000
## 7 Afghan… AFG 1967 110 94 10400000
## 8 Afghan… AFG 1968 115 92 10637000
## 9 Afghan… AFG 1969 118 93 10894000
## 10 Afghan… AFG 1970 99 96 11174000
## # … with 49,249 more rows
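The glimpse output shows `Year` stored as `<chr>` in two of the tables and `<dbl>` in the others, which matters before joining on it. A small sketch, using a made-up list in place of `tt` and an invented non-numeric year label, of how one might list the column types and find the values that `as.numeric()` would turn into `NA`:

```r
library(purrr)

# Toy stand-in for `tt`: two tables whose Year columns differ in type.
# The "500 BCE" label is invented for illustration.
toy_tt <- list(
  yields   = data.frame(Year = c(1961, 1962)),
  tractors = data.frame(Year = c("1961", "1962", "500 BCE"),
                        stringsAsFactors = FALSE)
)

# Type of the Year column in each table
year_types <- map_chr(toy_tt, ~ class(.x$Year)[1])

# Values that do not parse as numbers (these would become NA on coercion)
parsed   <- suppressWarnings(as.numeric(toy_tt$tractors$Year))
bad_year <- unique(toy_tt$tractors$Year[is.na(parsed)])
```

Inspecting `bad_year` before converting makes it clear which rows will be dropped or set to `NA`, rather than learning it from a coercion warning.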
Wrangle
Explore the data and process it into a nice format for plotting! Access each dataset by name by placing a dollar sign after the tt object, followed by the name of the dataset.
corn <- read_csv("cornyields.csv") %>%
  rename(`Corn (tonnes per hectare)` = `Corn yield (USDA (2017) & FAO (2017))`)
## Parsed with column specification:
## cols(
## Year = col_double(),
## `Corn yield (USDA (2017) & FAO (2017))` = col_double()
## )
crop <- tt$key_crop_yields
land <- tt$arable_land_pin
fertilizer <- tt$cereal_crop_yield_vs_fertilizer_application
# Year is stored as character in this table; convert it so it can be joined on
tractor <- tt$cereal_yields_vs_tractor_inputs_in_agriculture %>%
  mutate(Year = as.numeric(Year))
## Warning in mask$eval_all_mutate(dots[[i]]): NAs introduced by coercion
yield <- tt$land_use_vs_yield_change_in_cereal_production
crops <- crop %>%
  # Join the other tables onto the key crop yields table
  left_join(land) %>%
  left_join(corn) %>%
  left_join(fertilizer) %>%
  left_join(tractor) %>%
  filter(Entity == "United States") %>%
  rename(wheat = `Wheat (tonnes per hectare)`,
         soybeans = `Soybeans (tonnes per hectare)`,
         barley = `Barley (tonnes per hectare)`,
         land = `Arable land needed to produce a fixed quantity of crops ((1.0 = 1961))`,
         nitrogen = `Nitrogen fertilizer use (kilograms per hectare)`,
         tractors = `Tractors per 100 sq km arable land`,
         corn = `Corn (tonnes per hectare)`,
         population = `Total population (Gapminder)`) %>%
  select(Year, wheat, soybeans, barley, land, nitrogen, tractors,
         population, corn) %>%
  # Express each series as a ratio to its baseline row (row 1 is the first year)
  mutate(wheat = wheat / wheat[1],
         soybeans = soybeans / soybeans[1],
         barley = barley / barley[1],
         land = land / land[1],
         # nitrogen uses row 42 as its baseline, presumably because the
         # series starts later than the others
         nitrogen = nitrogen / nitrogen[42],
         tractors = tractors / tractors[1],
         corn = corn / corn[1],
         population = population / population[1])
## Joining, by = c("Entity", "Code", "Year")
## Joining, by = "Year"
## Joining, by = c("Entity", "Code", "Year")
## Joining, by = c("Entity", "Code", "Year")
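Dividing each series by a fixed row works when the row positions are known in advance. A more defensive variant, sketched here on invented numbers, divides each column by its first non-missing value so that a series starting later than 1961 needs no hard-coded index. The helper name `index_to_first()` is hypothetical, and the sketch assumes the rows are sorted by `Year`.

```r
library(dplyr)

# Hypothetical helper: scale a series so its first non-missing value is 1.
index_to_first <- function(x) x / x[!is.na(x)][1]

# Invented values; `nitrogen` starts two years late on purpose.
toy <- tibble(
  Year     = 1961:1964,
  wheat    = c(2, 3, 4, 5),
  nitrogen = c(NA, NA, 50, 75)
)

# Sort by Year first so "first non-missing" means "earliest available year"
indexed <- toy %>%
  arrange(Year) %>%
  mutate(across(-Year, index_to_first))
```

With this approach every column shares one rule, at the cost of hiding which year each series actually starts in, so it may be worth recording those start years separately.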
Visualize
Using your processed dataset, create your unique visualization.
ggplot(crops, aes(x = Year)) +
  # One line per series; the color label doubles as the legend entry
  geom_line(aes(y = wheat, color = "Wheat")) +
  geom_line(aes(y = corn, color = "Corn")) +
  geom_line(aes(y = soybeans, color = "Soybeans")) +
  geom_line(aes(y = barley, color = "Barley")) +
  geom_line(aes(y = land,
                color = "Arable land needed to produce a fixed quantity of crops")) +
  geom_line(aes(y = nitrogen, color = "Nitrogen fertilizer use")) +
  geom_line(aes(y = tractors,
                color = "Tractors needed for a fixed quantity of land")) +
  geom_line(aes(y = population, color = "Total population")) +
  # The plotted values are ratios to each series' baseline, not percentages
  labs(title = "Farming over Time", subtitle = "United States, 1961-2018",
       x = "Year", y = "Relative to first available year (baseline = 1)",
       caption = "Twitter: @Corey_Maxedon | Source: Our World in Data") +
  scale_color_manual(name = "", values =
                       c("Corn" = "dark green",
                         "Soybeans" = "green",
                         "Wheat" = "yellow",
                         "Barley" = "brown",
                         "Total population" = "cyan",
                         "Arable land needed to produce a fixed quantity of crops" = "red",
                         "Tractors needed for a fixed quantity of land" = "purple",
                         "Nitrogen fertilizer use" = "blue")) +
  theme(legend.position = "bottom", legend.title = element_text(size = 5),
        legend.text = element_text(size = 6)) +
  guides(color = guide_legend(nrow = 2, byrow = TRUE, override.aes = list(size = 0.5))) +
  ylim(0, 3)
## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 42 row(s) containing missing values (geom_path).
## Warning: Removed 11 row(s) containing missing values (geom_path).
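The eight `geom_line()` layers can also be expressed as a single layer over long-format data. This is a sketch of that alternative on invented values, not the code that produced the post's figure; the column names mirror a few of those in the `crops` table, and `toy_crops`, `series`, and `index` are names chosen here for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Invented indexed values standing in for the `crops` table
toy_crops <- tibble(
  Year       = 1961:1963,
  wheat      = c(1, 1.1, 1.2),
  corn       = c(1, 1.2, 1.3),
  population = c(1, 1.02, 1.03)
)

# One row per Year/series pair, so a single geom_line() can draw every series
toy_long <- toy_crops %>%
  pivot_longer(-Year, names_to = "series", values_to = "index")

p <- ggplot(toy_long, aes(x = Year, y = index, color = series)) +
  geom_line() +
  labs(x = "Year", y = "Index (first year = 1)", color = NULL)
```

The long form makes adding or dropping a series a data change rather than a plotting change, while the wide form used in the post keeps each legend label explicit in the code.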
Save Image
Save your image for sharing. Be sure to use the #TidyTuesday hashtag in your post on Twitter!
# This will save your most recent plot
ggsave(
  filename = "tt-9-1-20.jpg",
  device = "jpg")
## Saving 7 x 5 in image
## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 42 row(s) containing missing values (geom_path).
## Warning: Removed 11 row(s) containing missing values (geom_path).
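`ggsave()` saves the most recently displayed plot unless told otherwise, and it takes the image size from the current graphics device (hence the "Saving 7 x 5 in image" message). A sketch of a more explicit call, which passes the plot object and fixes the size so the output does not depend on session state; the plot, file name, and dimensions here are placeholders.

```r
library(ggplot2)

# Placeholder plot; in the post this would be the crops figure
p <- ggplot(data.frame(x = 1:3, y = c(1, 2, 3)), aes(x, y)) +
  geom_line()

out_file <- file.path(tempdir(), "example-plot.jpg")

# Passing `plot`, `width`, `height`, and `dpi` makes the saved file reproducible;
# the device is inferred from the ".jpg" extension
ggsave(filename = out_file, plot = p, width = 7, height = 5, dpi = 300)
```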