Tidy Tuesday Summary

This week I was pretty excited to dig into this data set. I am from central Illinois, and agriculture is a major part of my life. The Tidy Tuesday data is about crop yields over the years and was retrieved from Our World in Data. For ease, I used the tidytuesdayR package template for my analysis. Upon loading, I discovered five different data sets with all kinds of information, but each table had a country and a year, so I combined the tables with left joins. I thought it would be interesting to see how farming has changed over the years in the United States, so I standardized each series as a ratio to its value in its own start year, most commonly 1961.

There is quite a bit of information in the plot. The amount of land needed to produce the same amount of crops has decreased significantly over the years, while the number of tractors per fixed area of land has gone relatively unchanged. The most interesting aspect is how crop yields have increased in step with population growth. Population seems to be a major driver of the need for increased crop production. These crops also feed animals, and animal production has surely followed the same trend as population has increased, so it makes sense that crop yield would need to outpace population growth on average.

Further, the land needed for a fixed amount of crops has decreased as population has grown. People need a place to live; more people need more places to live, and that takes up space. Population growth also seems to be driving the innovation and engineering that make the most of the land we have.

This analysis turned out to be incredibly fascinating. It has been a pleasure diving into this #TidyTuesday.

Analysis Section

Load the weekly Data

Download the weekly data and make it available in the tt object.

library(tidyverse)     # read_rds(), dplyr, purrr, ggplot2
library(tidytuesdayR)  # tt_load()
# tt <- tt_load("2020-09-01")
tt <- read_rds("tt-9-1-20.rds")

Readme

Take a look at the readme for the weekly data to get insight on the dataset. This includes a data dictionary, source, and a link to an article on the data.

tt

Glimpse Data

Take an initial look at the format of the data available.

tt %>%
  map(glimpse)
## Rows: 11,280
## Columns: 4
## $ Entity                                                                   <chr> …
## $ Code                                                                     <chr> …
## $ Year                                                                     <dbl> …
## $ `Arable land needed to produce a fixed quantity of crops ((1.0 = 1961))` <dbl> …
## Rows: 11,965
## Columns: 5
## $ Entity                                            <chr> "Afghanistan", "Afg…
## $ Code                                              <chr> "AFG", "AFG", "AFG"…
## $ Year                                              <dbl> 1961, 1962, 1963, 1…
## $ `Cereal yield (tonnes per hectare)`               <dbl> 1.1151, 1.0790, 0.9…
## $ `Nitrogen fertilizer use (kilograms per hectare)` <dbl> NA, NA, NA, NA, NA,…
## Rows: 49,403
## Columns: 6
## $ Entity                                                  <chr> "Afghanistan"…
## $ Code                                                    <chr> "AFG", "AFG",…
## $ Year                                                    <chr> "1961", "1962…
## $ `Tractors per 100 sq km arable land`                    <dbl> 0.1568627, 0.…
## $ `Cereal yield (kilograms per hectare) (kg per hectare)` <dbl> 1115.1, 1079.…
## $ `Total population (Gapminder)`                          <dbl> 9169000, 9351…
## Rows: 13,075
## Columns: 14
## $ Entity                             <chr> "Afghanistan", "Afghanistan", "Afg…
## $ Code                               <chr> "AFG", "AFG", "AFG", "AFG", "AFG",…
## $ Year                               <dbl> 1961, 1962, 1963, 1964, 1965, 1966…
## $ `Wheat (tonnes per hectare)`       <dbl> 1.0220, 0.9735, 0.8317, 0.9510, 0.…
## $ `Rice (tonnes per hectare)`        <dbl> 1.5190, 1.5190, 1.5190, 1.7273, 1.…
## $ `Maize (tonnes per hectare)`       <dbl> 1.4000, 1.4000, 1.4260, 1.4257, 1.…
## $ `Soybeans (tonnes per hectare)`    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `Potatoes (tonnes per hectare)`    <dbl> 8.6667, 7.6667, 8.1333, 8.6000, 8.…
## $ `Beans (tonnes per hectare)`       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `Peas (tonnes per hectare)`        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `Cassava (tonnes per hectare)`     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `Barley (tonnes per hectare)`      <dbl> 1.0800, 1.0800, 1.0800, 1.0857, 1.…
## $ `Cocoa beans (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `Bananas (tonnes per hectare)`     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
## Rows: 49,259
## Columns: 6
## $ Entity                                                      <chr> "Afghanis…
## $ Code                                                        <chr> "AFG", "A…
## $ Year                                                        <chr> "1961", "…
## $ `Cereal yield index`                                        <dbl> 100, 97, …
## $ `Change to land area used for cereal production since 1961` <dbl> 100, 103,…
## $ `Total population (Gapminder)`                              <dbl> 9169000, …
## $arable_land_pin
## # A tibble: 11,280 x 4
##    Entity     Code   Year `Arable land needed to produce a fixed quantity of cr…
##    <chr>      <chr> <dbl>                                                  <dbl>
##  1 Afghanist… AFG    1961                                                  1
##  2 Afghanist… AFG    1962                                                  0.984
##  3 Afghanist… AFG    1963                                                  1.01
##  4 Afghanist… AFG    1964                                                  0.939
##  5 Afghanist… AFG    1965                                                  0.907
##  6 Afghanist… AFG    1966                                                  0.927
##  7 Afghanist… AFG    1967                                                  0.832
##  8 Afghanist… AFG    1968                                                  0.812
##  9 Afghanist… AFG    1969                                                  0.792
## 10 Afghanist… AFG    1970                                                  0.876
## # … with 11,270 more rows
##
## $cereal_crop_yield_vs_fertilizer_application
## # A tibble: 11,965 x 5
##    Entity    Code   Year `Cereal yield (tonnes p… `Nitrogen fertilizer use (kil…
##    <chr>     <chr> <dbl>                    <dbl>                          <dbl>
##  1 Afghanis… AFG    1961                    1.12                              NA
##  2 Afghanis… AFG    1962                    1.08                              NA
##  3 Afghanis… AFG    1963                    0.986                             NA
##  4 Afghanis… AFG    1964                    1.08                              NA
##  5 Afghanis… AFG    1965                    1.10                              NA
##  6 Afghanis… AFG    1966                    1.01                              NA
##  7 Afghanis… AFG    1967                    1.22                              NA
##  8 Afghanis… AFG    1968                    1.29                              NA
##  9 Afghanis… AFG    1969                    1.31                              NA
## 10 Afghanis… AFG    1970                    1.11                              NA
## # … with 11,955 more rows
##
## $cereal_yields_vs_tractor_inputs_in_agriculture
## # A tibble: 49,403 x 6
##    Entity  Code  Year  `Tractors per 100… `Cereal yield (kilo… `Total populatio…
##    <chr>   <chr> <chr>              <dbl>                <dbl>             <dbl>
##  1 Afghan… AFG   1961               0.157                1115.           9169000
##  2 Afghan… AFG   1962               0.195                1079            9351000
##  3 Afghan… AFG   1963               0.258                 986.           9543000
##  4 Afghan… AFG   1964               0.256                1083.           9745000
##  5 Afghan… AFG   1965               0.385                1099.           9956000
##  6 Afghan… AFG   1966               0.511                1012.          10175000
##  7 Afghan… AFG   1967               0.637                1224.          10400000
##  8 Afghan… AFG   1968               0.637                1288.          10637000
##  9 Afghan… AFG   1969               0.700                1310.          10894000
## 10 Afghan… AFG   1970               0.699                1105.          11174000
## # … with 49,393 more rows
##
## $key_crop_yields
## # A tibble: 13,075 x 14
##    Entity Code   Year `Wheat (tonnes … `Rice (tonnes p… `Maize (tonnes …
##    <chr>  <chr> <dbl>            <dbl>            <dbl>            <dbl>
##  1 Afgha… AFG    1961            1.02              1.52             1.4
##  2 Afgha… AFG    1962            0.974             1.52             1.4
##  3 Afgha… AFG    1963            0.832             1.52             1.43
##  4 Afgha… AFG    1964            0.951             1.73             1.43
##  5 Afgha… AFG    1965            0.972             1.73             1.44
##  6 Afgha… AFG    1966            0.867             1.52             1.44
##  7 Afgha… AFG    1967            1.12              1.92             1.41
##  8 Afgha… AFG    1968            1.16              1.95             1.71
##  9 Afgha… AFG    1969            1.19              1.98             1.72
## 10 Afgha… AFG    1970            0.956             1.81             1.48
## # … with 13,065 more rows, and 8 more variables: `Soybeans (tonnes per
## #   hectare)` <dbl>, `Potatoes (tonnes per hectare)` <dbl>, `Beans (tonnes per
## #   hectare)` <dbl>, `Peas (tonnes per hectare)` <dbl>, `Cassava (tonnes per
## #   hectare)` <dbl>, `Barley (tonnes per hectare)` <dbl>, `Cocoa beans (tonnes
## #   per hectare)` <dbl>, `Bananas (tonnes per hectare)` <dbl>
##
## $land_use_vs_yield_change_in_cereal_production
## # A tibble: 49,259 x 6
##    Entity  Code  Year  `Cereal yield in… `Change to land area… `Total populatio…
##    <chr>   <chr> <chr>             <dbl>                 <dbl>             <dbl>
##  1 Afghan… AFG   1961                100                   100           9169000
##  2 Afghan… AFG   1962                 97                   103           9351000
##  3 Afghan… AFG   1963                 88                   103           9543000
##  4 Afghan… AFG   1964                 97                   104           9745000
##  5 Afghan… AFG   1965                 99                   104           9956000
##  6 Afghan… AFG   1966                 91                   104          10175000
##  7 Afghan… AFG   1967                110                    94          10400000
##  8 Afghan… AFG   1968                115                    92          10637000
##  9 Afghan… AFG   1969                118                    93          10894000
## 10 Afghan… AFG   1970                 99                    96          11174000
## # … with 49,249 more rows
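
The output above shows every table twice: glimpse() returns its input invisibly, so map() collects the five tibbles into a list, and that list is then printed in full. As a minimal sketch, using a toy stand-in list named `toy` (hypothetical data, not the real tt object), purrr's walk() gives one glimpse per table and prints nothing afterwards. The same sketch checks the type of the Year column in each table, since the glimpses show it is sometimes dbl and sometimes chr.

```r
library(purrr)
library(dplyr)   # glimpse(), tibble()

# Toy stand-in for the tt object (hypothetical data, two small tables)
toy <- list(
  a = tibble(Entity = c("X", "Y"), Year = c(1961, 1962)),
  b = tibble(Entity = c("X", "Y"), Year = c("1961", "1962"))
)

# walk() calls glimpse() for its side effect and returns the input invisibly,
# so the list itself is not auto-printed after the glimpses
walk(toy, glimpse)

# Report the class of the Year column in every table
year_types <- map_chr(toy, ~ class(.x$Year)[1])
year_types
```

A mismatch like the one in `year_types` is why a character Year has to be converted before it can be joined to a numeric one.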

Wrangle

Explore the data and process it into a nice format for plotting! Access each data set by name by putting a dollar sign after the tt object, followed by the name of the data set. A separate corn yield file, cornyields.csv, is read in alongside the weekly tables.

corn <- read_csv("cornyields.csv") %>%
            rename(`Corn (tonnes per hectare)` = `Corn yield (USDA (2017) & FAO (2017))`)
## Parsed with column specification:
## cols(
##   Year = col_double(),
##   `Corn yield (USDA (2017) & FAO (2017))` = col_double()
## )
crop <- tt$key_crop_yields
land <- tt$arable_land_pin
fertilizer <- tt$cereal_crop_yield_vs_fertilizer_application
tractor <- tt$cereal_yields_vs_tractor_inputs_in_agriculture %>%
                mutate(Year = as.numeric(Year))
## Warning in mask$eval_all_mutate(dots[[i]]): NAs introduced by coercion
yield <- tt$land_use_vs_yield_change_in_cereal_production

crops <- crop %>%
    left_join(land) %>%
    left_join(corn) %>%
    left_join(fertilizer) %>%
    left_join(tractor) %>%
    filter(Entity == "United States") %>%
    rename(wheat = `Wheat (tonnes per hectare)`,
           soybeans = `Soybeans (tonnes per hectare)`,
           barley = `Barley (tonnes per hectare)`,
           land = `Arable land needed to produce a fixed quantity of crops ((1.0 = 1961))`,
           nitrogen = `Nitrogen fertilizer use (kilograms per hectare)`,
           tractors = `Tractors per 100 sq km arable land`,
           corn = `Corn (tonnes per hectare)`,
           population = `Total population (Gapminder)`) %>%
    select(Year, wheat, soybeans, barley, land, nitrogen, tractors,
           population, corn) %>%
    mutate(wheat = wheat / wheat[1],
           soybeans = soybeans / soybeans[1],
           barley = barley / barley[1],
           land = land / land[1],
           nitrogen = nitrogen / nitrogen[42],
           tractors = tractors / tractors[1],
           corn = corn / corn[1],
           population = population / population[1])
## Joining, by = c("Entity", "Code", "Year")
## Joining, by = "Year"
## Joining, by = c("Entity", "Code", "Year")
## Joining, by = c("Entity", "Code", "Year")
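
The mutate() step above divides each series by the value at a fixed row position: row 1 for most columns and row 42 for nitrogen, presumably because its earlier rows are missing. As a sketch under the assumption that the intended baseline is each series' first non-missing value, a small helper (hypothetically named `index_to_first`, not part of the original analysis) avoids hard-coding positions. The numbers below are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: divide a series by its first non-missing value
index_to_first <- function(x) {
  x / x[which(!is.na(x))[1]]
}

# Toy data (made up): nitrogen is missing in the first two years
toy <- tibble(
  Year     = 1961:1964,
  wheat    = c(2, 3, 4, 5),
  nitrogen = c(NA, NA, 50, 75)
)

# Apply the same indexing to every measured column
indexed <- toy %>%
  mutate(across(c(wheat, nitrogen), index_to_first))

indexed
```

With this approach each series starts at 1.0 in its own first available year, whichever row that happens to be.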

Visualize

Using your processed dataset, create your unique visualization.

ggplot(crops, aes(x = Year)) +
    geom_line(aes(y = wheat,
                  color = "Wheat")) +
    geom_line(aes(y = corn,
                  color = "Corn")) +
    geom_line(aes(y = soybeans,
                  color = "Soybeans")) +
    geom_line(aes(y = barley,
                  color = "Barley")) +
    geom_line(aes(y = land,
              color = "Arable land needed to produce a fixed quantity of crops")) +
    geom_line(aes(y = nitrogen,
              color = "Nitrogen fertilizer use")) +
    geom_line(aes(y = tractors,
                  color = "Tractors needed for a fixed quantity of land")) +
    geom_line(aes(y = population,
              color = "Total population")) +
    labs(title="Farming over Time", subtitle = "United States, 1961-2018",
         x = "Year", y = "Ratio to baseline year (1.0 = 1961 for most series)",
         caption = "Twitter: @Corey_Maxedon | Source: Our World in Data") +
    scale_color_manual(name="",values=
       c("Corn"="dark green",
         "Soybeans"="green",
         "Wheat"="yellow",
         "Barley"="brown",
         "Total population"="cyan",
         "Arable land needed to produce a fixed quantity of crops"="red",
         "Tractors needed for a fixed quantity of land"="purple",
         "Nitrogen fertilizer use"="blue")) +
    theme(legend.position="bottom", legend.title = element_text(size = 5),
           legend.text = element_text(size = 6)) +
    guides(color=guide_legend(nrow=2, byrow=TRUE, override.aes = list(size = 0.5))) +
    ylim(0, 3)
## Warning: Removed 4 row(s) containing missing values (geom_path).

## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 42 row(s) containing missing values (geom_path).
## Warning: Removed 11 row(s) containing missing values (geom_path).
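
The plot above adds one geom_line() per series. As an alternative sketch (not the code used for the figure), the indexed table could be reshaped to long form with tidyr's pivot_longer() so that a single geom_line() draws every series. The table `toy` and its values are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy indexed data (made-up values; 1.0 = baseline year)
toy <- tibble(
  Year       = 1961:1964,
  wheat      = c(1.0, 1.1, 1.2, 1.3),
  corn       = c(1.0, 1.2, 1.3, 1.5),
  population = c(1.0, 1.02, 1.03, 1.05)
)

# One row per Year/series pair
toy_long <- toy %>%
  pivot_longer(-Year, names_to = "series", values_to = "ratio")

# A single geom_line() mapped to color draws all series
p <- ggplot(toy_long, aes(x = Year, y = ratio, color = series)) +
  geom_line() +
  labs(x = "Year", y = "Ratio to baseline year", color = NULL) +
  theme(legend.position = "bottom")

p
```

The trade-off is that legend labels come from the column names, so descriptive labels would need to be set when reshaping or through a color scale.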

Save Image

Save your image for sharing. Be sure to use the #TidyTuesday hashtag in your post on Twitter!

# This will save your most recent plot
ggsave(
  filename = "tt-9-1-20.jpg",
  device = "jpg")
## Saving 7 x 5 in image
## Warning: Removed 4 row(s) containing missing values (geom_path).

## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 42 row(s) containing missing values (geom_path).
## Warning: Removed 11 row(s) containing missing values (geom_path).
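
The call above saves whichever plot was displayed last, at the size of the current graphics device (7 x 5 in here). A minimal sketch of a more explicit variant, assuming the plot has been stored in an object (here a toy plot named `p` with made-up data) and writing to a temporary file rather than the real output path:

```r
library(ggplot2)

# Toy plot (made-up data) standing in for the figure above
p <- ggplot(data.frame(x = 1:3, y = c(1, 1.5, 2)), aes(x, y)) +
  geom_line()

# Temporary file path so the sketch does not overwrite anything
out_file <- tempfile(fileext = ".jpg")

# Passing the plot and its size explicitly avoids relying on the last plot
# displayed and on the current graphics device size
ggsave(filename = out_file, plot = p, width = 7, height = 5, dpi = 300)

file.exists(out_file)
```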

Last Updated: 09/09/2020

Thanks for Reading!

I hope you enjoyed. Have a great day.