Tidy Tuesday Exercise 2

To be filled..

Packages loaded: tidytuesdayR, dplyr, skimr, memisc, ggplot2, reshape, ggthemes, DataExplorer, table1 (R warned that each was built under R version 4.2.2 or 4.2.3).

Download/access data

eggproduction  <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-04-11/egg-production.csv')
Rows: 220 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): prod_type, prod_process, source
dbl  (2): n_hens, n_eggs
date (1): observed_month

cagefreepercentages <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-04-11/cage-free-percentages.csv')
Rows: 96 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): source
dbl  (2): percent_hens, percent_eggs
date (1): observed_month


Explore data

#take a first look at the datasets

view(cagefreepercentages)
view(eggproduction)

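#plot percent_hens over time; the data argument (cagefreepercentages) and the colour ("red") are assumptions
ggplot2::ggplot(cagefreepercentages, ggplot2::aes(x = observed_month, y = percent_hens)) + ggplot2::geom_point(color = "red", size = 1) + ggplot2::geom_line(color = "red") + ggplot2::scale_x_date(breaks = scales::date_breaks("1 year"), date_labels = "%Y")

#missing data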

plot_missing(cagefreepercentages)

#cagefreepercentages appears to be missing values in almost half of its rows.

plot_missing(eggproduction)

#eggproduction looks like a complete dataset with no missing values.

view(eggproduction)

#There seems to be an imbalance: hatching eggs only have the "all" process type, whereas table eggs have all three process types. So the plan is to remove the hatching eggs observations for a cleaner analysis.

#Now let's understand what the missing values are about in the cagefreepercentages dataset.

view(cagefreepercentages)

#It looks like the rows sourced from Egg-Markets-Overview-2019-10-19.pdf don't have percent_eggs computed. Since we don't have the denominator, we cannot compute it ourselves, so it is best to remove the NAs.

cagefreepercentages <- cagefreepercentages[complete.cases(cagefreepercentages), ]
view(cagefreepercentages)
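
Before dropping rows, it can help to confirm where the missing values sit. The following is a minimal sketch on a made-up stand-in for cagefreepercentages (the values and the source names report_a.pdf and report_b.pdf are hypothetical, not taken from the real data), using only base R:

```r
# Toy stand-in for cagefreepercentages; values and source names are made up.
toy <- data.frame(
  observed_month = as.Date(c("2019-01-31", "2019-02-28", "2019-03-31", "2019-04-30")),
  percent_hens   = c(15.0, 15.4, 15.9, 16.3),
  percent_eggs   = c(NA, 14.1, NA, 15.2),
  source         = c("report_a.pdf", "report_b.pdf", "report_a.pdf", "report_b.pdf")
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Number of rows with a missing percent_eggs, split by source
na_by_source <- tapply(is.na(toy$percent_eggs), toy$source, sum)
print(na_by_source)

# Keep only rows with no missing values, as complete.cases() does above
toy_complete <- toy[complete.cases(toy), ]
print(nrow(toy_complete))
```

Running the same two counts on the real data frame would show whether the NAs are confined to one column and one source before any rows are removed.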

#cagefreepercentages variables: observed_month/percent_eggs/percent_hens/source, with n=54
#eggproduction variables: observed_month/prod_type/prod_process/n_eggs/n_hens/source, with n=220
#by combining the two datasets we should have

#Let's plot some graphs and see if there are any relationships.

ggplot(cagefreepercentages, aes(observed_month)) + #basic graph object
  geom_line(aes(y = percent_hens), colour = "red") + #layer 1
  geom_line(aes(y = percent_eggs), colour = "green") + #layer 2
  ggtitle("Percentages of Hens (Red) and Eggs (Green), 2017-2021") +
  xlab("Year") + #add x axis label
  ylab("Percentages") #add y axis label

#Both the hen and egg percentages show an upward trend over time, so the two appear to be related. Note: I was not able to add a legend within the plotting call.

#Now let's look at the eggproduction dataset.
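
One common way to get a legend in ggplot2 is to stack the two percentage columns into long format and map colour inside aes(), since a colour set outside aes() is a fixed setting and produces no legend. The sketch below uses a made-up stand-in data frame (values are hypothetical); the column names measure and percent are introduced here for illustration only:

```r
library(ggplot2)

# Toy stand-in for cagefreepercentages; values are made up.
toy <- data.frame(
  observed_month = as.Date(c("2019-01-31", "2019-02-28", "2019-03-31")),
  percent_hens   = c(15.0, 15.4, 15.9),
  percent_eggs   = c(14.0, 14.5, 15.1)
)

# Stack the two percentage columns: one row per month and measure
toy_long <- rbind(
  data.frame(observed_month = toy$observed_month, measure = "Hens", percent = toy$percent_hens),
  data.frame(observed_month = toy$observed_month, measure = "Eggs", percent = toy$percent_eggs)
)

# Mapping colour inside aes() is what makes ggplot2 draw a legend
p <- ggplot(toy_long, aes(x = observed_month, y = percent, colour = measure)) +
  geom_line() +
  scale_colour_manual(values = c(Hens = "red", Eggs = "green")) +
  labs(x = "Year", y = "Percentages", colour = "Measure")
print(p)
```

With the legend in place, the colour names no longer need to be spelled out in the plot title.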

mosaicplot(prod_type~prod_process,data=eggproduction,col=c("Blue","Red","Pink")) 

#This figure illustrates the two product types (hatching eggs and table eggs) and the three process types: cage-free (organic), cage-free (non-organic), and all. Hatching eggs contain only the "all" value, whereas table eggs have equal amounts of cage-free (organic), cage-free (non-organic), and all (combined).
#Still trying to find inspiration from this data.
#Perhaps we can go ahead and merge these two datasets for a more complete picture.

#Merge the two datasets on their common variable, observed_month (inner join)

merged<- merge(x = cagefreepercentages, y = eggproduction, by = c("observed_month"))
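
To make the behaviour of this inner join concrete, here is a small sketch on two made-up data frames (all values and source names are hypothetical). It shows two things worth checking in the real result: months missing from either side are dropped, and a non-key column present in both inputs, such as source, comes back twice with the default .x and .y suffixes:

```r
# Toy stand-ins for the two datasets; values and source names are made up.
left <- data.frame(
  observed_month = as.Date(c("2019-01-31", "2019-02-28")),
  percent_hens   = c(15.0, 15.4),
  source         = c("pct_a", "pct_b")
)
right <- data.frame(
  observed_month = as.Date(c("2019-01-31", "2019-01-31", "2019-03-31")),
  prod_type      = c("table eggs", "hatching eggs", "table eggs"),
  source         = c("prod_a", "prod_a", "prod_c")
)

# Inner join: only months present in both data frames survive
joined <- merge(x = left, y = right, by = "observed_month")
print(joined)

# The shared non-key column is kept twice, as source.x and source.y
print(names(joined))
```

Each row of the left table is repeated once per matching row on the right, which is why the merged data has more rows than cagefreepercentages.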

#check your merged data

view(merged)

#Let's take a look at summary statistics for the merged data.

Eggdata <- (table1(~ n_hens + n_eggs  + factor(prod_process) +factor(prod_type)+percent_eggs + percent_hens , data=merged))

Eggdata <- t1kable(Eggdata)
Eggdata
Variable                   Overall (N=216)
n_hens
  Mean (SD)                111000000 (124000000)
  Median [Min, Max]        59900000 [13500000, 341000000]
n_eggs
  Mean (SD)                2610000000 (3090000000)
  Median [Min, Max]        1150000000 [298000000, 8600000000]
factor(prod_process)
  all                      108 (50.0%)
  cage-free (non-organic)  54 (25.0%)
  cage-free (organic)      54 (25.0%)
factor(prod_type)
  hatching eggs            54 (25.0%)
  table eggs               162 (75.0%)
percent_eggs
  Mean (SD)                17.1 (4.26)
  Median [Min, Max]        16.2 [9.56, 24.5]
percent_hens
  Mean (SD)                18.0 (4.33)
  Median [Min, Max]        17.2 [10.1, 25.2]
Eggdata <- (table1(~ factor(prod_process) + n_hens + n_eggs +percent_eggs + percent_hens  | prod_type, data=merged))

Eggdata <- t1kable(Eggdata)
Eggdata
Variable                   hatching eggs (N=54)                  table eggs (N=162)                    Overall (N=216)
factor(prod_process)
  all                      54 (100%)                             54 (33.3%)                            108 (50.0%)
  cage-free (non-organic)  0 (0%)                                54 (33.3%)                            54 (25.0%)
  cage-free (organic)      0 (0%)                                54 (33.3%)                            54 (25.0%)
n_hens
  Mean (SD)                61600000 (2310000)                    127000000 (140000000)                 111000000 (124000000)
  Median [Min, Max]        61900000 [56900000, 66100000]         41500000 [13500000, 341000000]        59900000 [13500000, 341000000]
n_eggs
  Mean (SD)                1170000000 (53800000)                 3090000000 (3440000000)               2610000000 (3090000000)
  Median [Min, Max]        1170000000 [1010000000, 1270000000]   937000000 [298000000, 8600000000]     1150000000 [298000000, 8600000000]
percent_eggs
  Mean (SD)                17.1 (4.29)                           17.1 (4.27)                           17.1 (4.26)
  Median [Min, Max]        16.2 [9.56, 24.5]                     16.2 [9.56, 24.5]                     16.2 [9.56, 24.5]
percent_hens
  Mean (SD)                18.0 (4.36)                           18.0 (4.34)                           18.0 (4.33)
  Median [Min, Max]        17.2 [10.1, 25.2]                     17.2 [10.1, 25.2]                     17.2 [10.1, 25.2]

The table above shows that hatching eggs are recorded only under the "all" production process, whereas the cage-free (organic) and cage-free (non-organic) processes are available only for table eggs.

Next steps: remove the "all" observations for hatching eggs, so that only table eggs remain, with their three prod_process types. #Here I got stuck with an error about deleting atomic vectors.
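
A minimal sketch of this row removal, assuming the goal is simply to keep every row whose prod_type is not "hatching eggs". It runs on a made-up stand-in for the merged data (values are hypothetical) and uses base R logical subsetting; it is one possible approach, not a diagnosis of the error mentioned above:

```r
# Toy stand-in for the merged data; values are made up.
toy <- data.frame(
  prod_type    = c("hatching eggs", "table eggs", "table eggs", "table eggs"),
  prod_process = c("all", "all", "cage-free (organic)", "cage-free (non-organic)"),
  n_hens       = c(60, 300, 15, 45)
)

# Keep every row whose prod_type is not "hatching eggs".
# The comma after the condition matters: it selects rows and keeps all columns.
table_only <- toy[toy$prod_type != "hatching eggs", ]
print(table_only)

# Confirm that all three process types are still present
print(table(table_only$prod_process))
```

Since dplyr is already loaded, dplyr::filter(merged, prod_type != "hatching eggs") would be an equivalent way to write the same step on the real data.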