
Simulating Starfish Dataset Weights and Arm Lengths

2. Generating random normal distributions for variables in the dataset

library(tidyverse)
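# Starting weight of each of the 30 starfish
start_weight <- rnorm(n=30, mean=33.5, sd=11)

# Weight change for treatment A (first 15) and treatment B (last 15),
# drawn separately but with the same parameters
delta_weight_1 <- rnorm(n=15, mean=2.9, sd=1.8)
delta_weight_2 <- rnorm(n=15, mean=2.9, sd=1.8)

delta_weight <- c(delta_weight_1, delta_weight_2)

# Treatment labels in the same order as delta_weight
treatment <- c(rep('A', 15), rep('B', 15))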

3. Formatting into a data frame
Sample sizes, means, and standard deviations were taken directly from the dataset I used in HW6, which comes from a dietary treatment experiment on how diet affects sea star biomass.

S <- data.frame(treatment, start_weight, delta_weight)
S <- mutate(.data = S, end_weight = start_weight + delta_weight)
S
##    treatment start_weight delta_weight end_weight
## 1          A    43.822020    0.4835810  44.305601
## 2          A    19.458313    2.3399052  21.798218
## 3          A    26.182849    4.6069136  30.789762
## 4          A    47.302912    1.2927389  48.595651
## 5          A    35.356280    1.6405651  36.996845
## 6          A    37.461265    2.8623616  40.323627
## 7          A     7.697095    1.2452992   8.942394
## 8          A    40.765986    3.8700006  44.635987
## 9          A    28.049277    1.6011701  29.650447
## 10         A    36.183842    1.2791809  37.463023
## 11         A    46.212124    4.1159798  50.328104
## 12         A    40.214683   -0.6190749  39.595608
## 13         A    37.239988    1.6445456  38.884533
## 14         A    48.280210    5.1442362  53.424447
## 15         A    34.378930    3.5900107  37.968940
## 16         B    37.295184    5.0272279  42.322412
## 17         B    41.626548    1.7553283  43.381876
## 18         B    21.064475    1.4363871  22.500862
## 19         B    34.372670    3.5388323  37.911503
## 20         B    24.537295    2.3189837  26.856279
## 21         B    39.288685    1.4995790  40.788264
## 22         B    48.422636    3.2876683  51.710304
## 23         B    26.778920    3.4398655  30.218785
## 24         B    49.383181    6.7228718  56.106053
## 25         B    10.071664    3.7257295  13.797394
## 26         B    33.941488    4.5141678  38.455656
## 27         B    36.502637    2.3957632  38.898400
## 28         B    46.953216    5.2667922  52.220008
## 29         B    44.806838    5.1836046  49.990443
## 30         B    53.561719    5.1833850  58.745104
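Sections 6 and 7 repeat these same steps with different means, standard deviations, and group sizes, so it can help to wrap them in one function. The sketch below is a hypothetical helper (`simulate_starfish()` and its argument names are my own, not part of the original analysis). It uses base R's `data.frame()` instead of `mutate()` so it runs without the tidyverse, and the seed is arbitrary, chosen only so a run can be reproduced.

```r
# Hypothetical helper (not from the original analysis): bundles the simulation
# steps so the means, SD, and group size can be changed in one place.
simulate_starfish <- function(n_per_group = 15,
                              mean_a = 2.9, mean_b = 2.9, sd_delta = 1.8,
                              start_mean = 33.5, start_sd = 11) {
  # Starting weights for both groups, from a single normal distribution
  start_weight <- rnorm(n = 2 * n_per_group, mean = start_mean, sd = start_sd)
  # Weight change drawn separately for treatment A and treatment B
  delta_weight <- c(rnorm(n = n_per_group, mean = mean_a, sd = sd_delta),
                    rnorm(n = n_per_group, mean = mean_b, sd = sd_delta))
  # Labels in the same order as delta_weight
  treatment <- rep(c("A", "B"), each = n_per_group)
  data.frame(treatment, start_weight, delta_weight,
             end_weight = start_weight + delta_weight)
}

set.seed(1)  # arbitrary seed, only for reproducibility
S_sim <- simulate_starfish()
head(S_sim)
```

For example, `simulate_starfish(n_per_group = 10, mean_b = 5, sd_delta = 1.5)` would correspond to the settings used in section 7.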

4. Welch t-test of end weight between treatments A and B

t.test(data = S, end_weight ~ treatment)
## 
##  Welch Two Sample t-test
## 
## data:  end_weight by treatment
## t = -0.6062, df = 27.652, p-value = 0.5493
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -11.741229   6.381208
## sample estimates:
## mean in group A mean in group B 
##        37.58021        40.26022
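By default, `t.test()` runs Welch's test, which does not assume equal variances in the two groups. It computes t = (mean A − mean B) / sqrt(s_A²/n_A + s_B²/n_B) and uses the Welch-Satterthwaite approximation for the degrees of freedom. The sketch below reproduces those quantities by hand and compares them with `t.test()`. It uses freshly simulated vectors with an arbitrary seed rather than the table above, so its numbers will not match the output shown.

```r
# Sketch: compute the Welch t statistic, df, and two-sided p-value by hand.
set.seed(42)  # arbitrary seed; data are simulated with this report's parameters
a <- rnorm(15, mean = 33.5, sd = 11) + rnorm(15, mean = 2.9, sd = 1.8)  # end weights, group A
b <- rnorm(15, mean = 33.5, sd = 11) + rnorm(15, mean = 2.9, sd = 1.8)  # end weights, group B

va <- var(a) / length(a)  # squared standard error of mean A
vb <- var(b) / length(b)  # squared standard error of mean B

t_manual  <- (mean(a) - mean(b)) / sqrt(va + vb)
# Welch-Satterthwaite degrees of freedom
df_manual <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))
p_manual  <- 2 * pt(-abs(t_manual), df = df_manual)

welch <- t.test(a, b)  # var.equal = FALSE (Welch) is the default
c(t = t_manual, df = df_manual, p = p_manual)
c(welch$statistic, welch$parameter, p = welch$p.value)
```

The non-integer df in the outputs in this report (e.g. 27.652) comes from this approximation.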

Boxplot of end weight by treatment

G <- ggplot(S, aes(x=treatment, y=end_weight)) + 
  geom_boxplot()
print(G)

Analysis

Multiple runs with the same parameters produce a fair amount of variability in the results. Because new random numbers are drawn for each treatment on every run, the test sometimes finds a statistically significant difference between the treatment groups and sometimes does not, depending on the particular draws. This happens even though the two groups are drawn from separate normal distributions with identical parameters; with a 0.05 significance threshold, such chance "significant" results are expected in roughly 1 run in 20.
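One way to put a number on this run-to-run variability is to repeat the simulation many times and count how often p < 0.05. The sketch below does that with the same parameters as section 2; the number of runs (2000) and the seed are arbitrary choices of mine. Because both groups share the same true mean, the proportion of significant runs estimates the false-positive rate, which should sit near 0.05.

```r
# Sketch: repeat the section 2 simulation many times with identical
# parameters for both groups and record the Welch t-test p-value each time.
set.seed(123)   # arbitrary seed for reproducibility
n_runs <- 2000  # number of simulated experiments (assumed, not from the original)

p_values <- replicate(n_runs, {
  start_weight <- rnorm(30, mean = 33.5, sd = 11)
  delta_weight <- c(rnorm(15, mean = 2.9, sd = 1.8),
                    rnorm(15, mean = 2.9, sd = 1.8))
  end_weight <- start_weight + delta_weight
  treatment  <- rep(c("A", "B"), each = 15)
  t.test(end_weight ~ treatment)$p.value
})

false_positive_rate <- mean(p_values < 0.05)
false_positive_rate  # proportion of runs significant purely by chance
```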

6. Adjusting the means of the groups

Given the sample sizes you have chosen, how small can the difference between the groups (the "effect size") be for you to still detect a significant pattern (p < 0.05)?

Target parameter: delta_weight. Over multiple runs, the means do not strictly need to change for a significant difference (p < 0.05) to appear: because each group is an independent random draw, chance deviations occasionally produce a significant result even with identical parameters, though rarely. Raising the mean weight change for treatment B from 2.9 to about 3.5 to 4.0 produces a significant difference between the two treatment groups roughly 50% of the time. Raising it to around 4.5 gives a significant difference most of the time, and raising it to 5 makes the difference significant almost every time. So a difference of about 2 in mean weight change (5 vs. 2.9) is enough to reliably detect a significant difference in weight change between the treatments. Note that the code below also lowers the standard deviation of the weight change from 1.8 to 1.5 for both groups and runs the t-test on delta_weight rather than end_weight.
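The trial-and-error search over treatment B's mean can be cross-checked analytically. The sketch below computes the standardized effect size (Cohen's d, the mean difference divided by the SD) and the power for each candidate mean using base R's `power.t.test()`, assuming 15 starfish per group and an SD of 1.5. One caveat: `power.t.test()` models the equal-variance (Student) t-test, so it only approximates the Welch test used in this report. The last line asks for the smallest difference detectable with 80% power, a conventional target I chose, not one from the original.

```r
# Sketch: effect size and approximate power for candidate treatment B means.
# Assumes n = 15 per group and sd = 1.5; power.t.test() models Student's
# equal-variance t-test, so it approximates the Welch test used above.
sd_delta <- 1.5  # SD of weight change in both groups
n_group  <- 15   # starfish per treatment
mean_a   <- 2.9  # mean weight change, treatment A

mean_b <- c(3.5, 4.0, 4.5, 5.0)           # candidate means for treatment B
d      <- (mean_b - mean_a) / sd_delta    # Cohen's d

pwr <- sapply(mean_b - mean_a, function(delta) {
  power.t.test(n = n_group, delta = delta, sd = sd_delta, sig.level = 0.05)$power
})
data.frame(mean_b, d = round(d, 2), power = round(pwr, 2))

# Smallest mean difference detectable with 80% power at this sample size
min_delta <- power.t.test(n = n_group, sd = sd_delta, sig.level = 0.05,
                          power = 0.8)$delta
min_delta
```

Power here is the long-run fraction of simulated experiments expected to reach p < 0.05, which is the quantity the repeated runs above estimate by eye.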

start_weight <- rnorm(n=30, mean=33.5, sd=11)

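# Treatment A keeps mean 2.9; treatment B's mean is raised to 5.
# The SD of the weight change is lowered to 1.5 for both groups.
delta_weight_1 <- rnorm(n=15, mean=2.9, sd=1.5)
delta_weight_2 <- rnorm(n=15, mean=5, sd=1.5)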

delta_weight <- c(delta_weight_1, delta_weight_2)

treatment <- c(rep('A', 15),rep('B', 15))

S <- data.frame(treatment, start_weight, delta_weight)
S <- mutate(.data = S, end_weight = start_weight + delta_weight)

t.test(data = S, delta_weight ~ treatment)
## 
##  Welch Two Sample t-test
## 
## data:  delta_weight by treatment
## t = -2.6057, df = 27.999, p-value = 0.01452
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -3.0315992 -0.3630155
## sample estimates:
## mean in group A mean in group B 
##        3.389628        5.086935
# Plot the weight change, the variable tested above
G <- ggplot(S, aes(x=treatment, y=delta_weight)) + 
  geom_boxplot()
print(G)

7. Changing the sample sizes for each distribution.

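For this section I keep mean1 = 2.9 and mean2 = 5, the values that produced a significant difference with N = 15. (If the two groups had the same mean, increasing N would not make the difference reliably significant; only about 5% of runs would reach p < 0.05 by chance, whatever the N.)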

# Sample size reduced to 10 starfish per treatment (20 in total)
start_weight <- rnorm(n=20, mean=33.5, sd=11)

delta_weight_1 <- rnorm(n=10, mean=2.9, sd=1.5)
delta_weight_2 <- rnorm(n=10, mean=5, sd=1.5)

delta_weight <- c(delta_weight_1, delta_weight_2)

treatment <- c(rep('A', 10),rep('B', 10))

S <- data.frame(treatment, start_weight, delta_weight)
S <- mutate(.data = S, end_weight = start_weight + delta_weight)

t.test(data = S, delta_weight ~ treatment)
## 
##  Welch Two Sample t-test
## 
## data:  delta_weight by treatment
## t = -2.0727, df = 17.307, p-value = 0.05344
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -2.65579840  0.02177254
## sample estimates:
## mean in group A mean in group B 
##        3.248129        4.565142
# Plot the weight change, the variable tested above
G <- ggplot(S, aes(x=treatment, y=delta_weight)) + 
  geom_boxplot()
print(G)

Testing with multiple runs, a sample size of 10 per group seems to be about as low as we can go and still get significance most of the time. Once the samples drop below 10 with the current parameters, the test mostly returns p > 0.05.
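The "multiple runs" check can be automated by estimating, for several group sizes, the share of simulated experiments in which the Welch t-test on weight change reaches p < 0.05. The sketch below keeps mean1 = 2.9, mean2 = 5, and sd = 1.5 from this section; the list of group sizes, the 1000 runs per size, and the seed are illustrative choices of mine. Only delta_weight is simulated, because start_weight does not enter the test on delta_weight.

```r
# Sketch: proportion of significant Welch t-tests on delta_weight for
# several group sizes, with mean1 = 2.9, mean2 = 5, and sd = 1.5.
set.seed(2024)               # arbitrary seed
n_values <- c(5, 8, 10, 15)  # starfish per group (illustrative choice)
n_runs   <- 1000             # simulated experiments per group size

sig_rate <- sapply(n_values, function(n) {
  p <- replicate(n_runs, {
    delta_a <- rnorm(n, mean = 2.9, sd = 1.5)  # treatment A weight change
    delta_b <- rnorm(n, mean = 5,   sd = 1.5)  # treatment B weight change
    t.test(delta_a, delta_b)$p.value
  })
  mean(p < 0.05)  # share of runs that are significant
})

data.frame(n_per_group = n_values, prop_significant = sig_rate)
```

Reading the table against a chosen target (for example, significance in most runs) gives a less subjective cutoff than judging a handful of runs by eye.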