Saturday, 22 March 2014

Ideas about sampling

# set up a vector of zeros

sam20<-rep(0,100)
# loop round 100 times: select a sample of size 20 from the vector rectangle_areas, calculate
# the average of these 20 numbers, and save it in the ith item of vector sam20

for (i in 1:100) {
  sam20[i]<-mean(sample(rectangle_areas,size=20,replace=FALSE))
}

mean(sam20)
sqrt(var(sam20))   #this gives the standard deviation

The next 3 histograms illustrate how much the averages of samples of size 5, 10 and 20 vary: each histogram shows a set of 100 sample means.
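
Histograms like these could be produced roughly as follows. This is only a sketch: the original rectangle_areas data is not shown in this post, so the code makes up a stand-in population of 100 areas, and sample_means is a helper name invented here for illustration.

```r
# ASSUMPTION: made-up stand-in for rectangle_areas (the real data is not shown here)
set.seed(1)
rectangle_areas <- sample(1:18, size = 100, replace = TRUE)

# Hypothetical helper: the means of `reps` samples of size n, drawn without replacement
sample_means <- function(n, reps = 100) {
  replicate(reps, mean(sample(rectangle_areas, size = n, replace = FALSE)))
}

sam5  <- sample_means(5)
sam10 <- sample_means(10)
sam20 <- sample_means(20)

# Use the same x-axis limits on every histogram so the spreads can be compared by eye
x_range <- range(rectangle_areas)
par(mfrow = c(3, 1))
hist(sam5,  xlim = x_range, main = "Means of samples of size 5",  xlab = "Sample mean")
hist(sam10, xlim = x_range, main = "Means of samples of size 10", xlab = "Sample mean")
hist(sam20, xlim = x_range, main = "Means of samples of size 20", xlab = "Sample mean")
```

Because the samples are random, the exact shapes will differ from run to run and from the histograms in this post.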



















I have also created a set of 3 boxplots.


The scale of the charts is the same, so can you see what is happening to the spread?
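
A set of boxplots on a shared scale might be drawn along these lines. Again this is a sketch under assumptions: the population below is made up because the original rectangle_areas is not shown, and means_by_size is a name chosen here for illustration.

```r
# ASSUMPTION: made-up stand-in for rectangle_areas
set.seed(2)
rectangle_areas <- sample(1:18, size = 100, replace = TRUE)

sizes <- c(5, 10, 20)

# For each sample size, collect 100 sample means (sampling without replacement)
means_by_size <- lapply(sizes, function(n) {
  replicate(100, mean(sample(rectangle_areas, size = n)))
})
names(means_by_size) <- paste("n =", sizes)

# One call draws all three boxplots side by side on a single shared axis
boxplot(means_by_size, ylab = "Sample mean",
        main = "100 sample means for each sample size")
```

Drawing the three boxplots in a single call is what keeps them on the same scale.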

> mean(sam5)
[1] 7.182
> sqrt(var(sam5))
[1] 2.207443
> mean(sam10)
[1] 7.527
> sqrt(var(sam10))
[1] 1.382127
> mean(sam20)
[1] 7.5245
> sqrt(var(sam20))
[1] 0.9615721
>






The averages are all close together, between about 7.2 and 7.5.
The standard deviations are getting smaller as the sample size increases - why is this?
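
One way to explore this question is to compare the simulated standard deviations with what theory suggests: the standard deviation of a sample mean is roughly the population standard deviation divided by the square root of the sample size, with a small correction when sampling without replacement from a finite list. The sketch below assumes a made-up population of 100 areas (the original rectangle_areas is not shown) and uses 10000 repetitions rather than 100 so that the comparison is less noisy.

```r
# ASSUMPTION: made-up stand-in for rectangle_areas
set.seed(3)
rectangle_areas <- sample(1:18, size = 100, replace = TRUE)
N <- length(rectangle_areas)

sizes <- c(5, 10, 20)

# Simulated: standard deviation of 10000 sample means for each sample size
observed <- sapply(sizes, function(n) {
  sd(replicate(10000, mean(sample(rectangle_areas, size = n))))
})

# Theory: sd / sqrt(n), multiplied by a finite population correction
# because each sample is drawn without replacement from only N values
expected <- sd(rectangle_areas) / sqrt(sizes) * sqrt(1 - sizes / N)

round(data.frame(n = sizes, observed, expected), 3)
```

In the results above, the standard deviation falls by a factor of about 1.6 and then about 1.4 each time the sample size doubles, which is in the region of the square root of 2 (about 1.41) that this rule would suggest.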