Saturday, 22 March 2014

Ideas about sampling

# set up a vector of zeros

sam20<-rep(0,100)
# loop round 100 times: select a sample of size 20 from the list rectangle_areas,
# calculate the average of these 20 numbers, and save it in the ith item of vector sam20

for (i in 1:100) {
  sam20[i]<-mean(sample(rectangle_areas,size=20,replace=FALSE))
}

mean(sam20)
sqrt(var(sam20))   #this gives the standard deviation

The next 3 histograms show how much the averages of samples of size 5, 10 and 20 vary. Each histogram is drawn from a set of 100 sample means.
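The notes only show the loop for samples of 20, so here is a minimal sketch of how the three sets of means could be made and drawn on one scale. It assumes sam5 and sam10 were built the same way as sam20. The rectangle_areas below is a made-up stand-in population (not the class data), and sample_means is a hypothetical helper name.

```r
# Stand-in population: NOT the real class data, just 100 made-up areas
set.seed(1)
rectangle_areas <- sample(1:18, size = 100, replace = TRUE)

# Hypothetical helper: the means of n_samples samples, each of the given size
sample_means <- function(population, size, n_samples = 100) {
  replicate(n_samples, mean(sample(population, size = size, replace = FALSE)))
}

sam5  <- sample_means(rectangle_areas, 5)
sam10 <- sample_means(rectangle_areas, 10)
sam20 <- sample_means(rectangle_areas, 20)

# Use the same x-axis for all three so the spreads can be compared by eye
common_range <- range(c(sam5, sam10, sam20))
par(mfrow = c(1, 3))
hist(sam5,  xlim = common_range, main = "Samples of 5",  xlab = "sample mean")
hist(sam10, xlim = common_range, main = "Samples of 10", xlab = "sample mean")
hist(sam20, xlim = common_range, main = "Samples of 20", xlab = "sample mean")
par(mfrow = c(1, 1))
```

Using replicate() does the same job as filling a vector of zeros inside a for loop, with less typing.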



















I have also created a set of 3 boxplots.


The scale of the charts is the same, so you can see what is happening to the spread.

> mean(sam5)
[1] 7.182
> sqrt(var(sam5))
[1] 2.207443
> mean(sam10)
[1] 7.527
> sqrt(var(sam10))
[1] 1.382127
> mean(sam20)
[1] 7.5245
> sqrt(var(sam20))
[1] 0.9615721
>






The averages are all close together, between about 7.2 and 7.5
The standard deviations are getting smaller - why is this?
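One way to explore this question is to compare the simulation with theory. For a sample of size n drawn without replacement from a population of N values with standard deviation sigma, the standard deviation of the sample mean is sigma/sqrt(n) multiplied by a finite population correction sqrt((N-n)/(N-1)). The sketch below checks this on a made-up stand-in population (not the class data), so the numbers will not match the output above.

```r
set.seed(2)
population <- sample(1:18, size = 100, replace = TRUE)   # made-up stand-in population
N <- length(population)

# Population standard deviation (divides by N, not N - 1)
pop_sd <- sqrt(mean((population - mean(population))^2))

sizes <- c(5, 10, 20)

# Standard deviation of 5000 simulated sample means for each sample size
simulated_sd <- sapply(sizes, function(n) {
  sd(replicate(5000, mean(sample(population, size = n, replace = FALSE))))
})

# What theory predicts for sampling without replacement
theoretical_sd <- pop_sd / sqrt(sizes) * sqrt((N - sizes) / (N - 1))

round(data.frame(size = sizes, simulated_sd, theoretical_sd), 3)
```

The two columns should be close, and both shrink as the sample size grows: averaging more values leaves less room for one unusual value to pull the mean around.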


Friday, 28 February 2014

Doing loops and testing them


To simulate rolling a dice

outcome6<-c(1,2,3,4,5,6)   #this is because there are 6 sides to the dice

sim1000<-sample(outcome6,size=1000, replace=TRUE)
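A quick way to test the simulation is to count how often each face came up. This is only a sketch: the seed is set just to make the run repeatable, and the names counts and proportions are my own.

```r
set.seed(3)
outcome6 <- c(1, 2, 3, 4, 5, 6)
sim1000 <- sample(outcome6, size = 1000, replace = TRUE)

# factor() with levels keeps all six faces in the table, even one that never came up
counts <- table(factor(sim1000, levels = outcome6))
proportions <- counts / length(sim1000)

round(proportions, 3)   # each should be somewhere near 1/6 = 0.167
```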


Relative histograms

This website looks useful
http://www.statmethods.net/graphs/bar.html

Here is a similar example of what we were trying to do

Simple Bar Plot

# Simple Bar Plot
counts <- table(mtcars$gear)
barplot(counts, main="Car Distribution",
   xlab="Number of Gears")


The following will work for your simulated data

# Relative histogram of 100 rolls
sim100<-sample(outcome6,size=100, replace=TRUE)   #100 simulated rolls, made the same way as sim1000
count<-table(sim100)

barplot(count/100,main="Simulated rolling of a dice - 100 times",
    xlab="Number on dice", ylab = "Proportion of rolls")



Sequences

rle - finds the runs (sequences of the same value) in a vector, and gives the length of each run together with the value that is repeated in it

streak[[2]] - the longest run for the 2nd item (in this case "T")

max(streak) - the longest run regardless of "H" or "T"
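A small example with a short vector makes it easier to check by hand what rle and tapply are doing. The tosses vector is made up for illustration.

```r
tosses <- c("H", "H", "T", "H", "H", "H", "T", "T")

runs <- rle(tosses)
runs$lengths   # 2 1 3 2
runs$values    # "H" "T" "H" "T"

# Longest run for each value; items come out in alphabetical order, so "H" is 1st
streak <- tapply(runs$lengths, runs$values, max)
streak[["H"]]  # 3
streak[["T"]]  # 2
max(streak)    # 3
```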


Loops

Write a for loop that stores in an object the longest run of consecutive heads in each of 1000 sets of 200 coin tosses.
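outcomes<-c("H","T")   #assumed here: these notes do not show how outcomes was defined
coinmax<-rep(0,1000)
for (i in 1:1000) {
  coin200 <- sample(outcomes, size = 200, replace = TRUE)
  runs <- rle(coin200)   #work out the runs once and reuse them
  streak <- tapply(runs$lengths, runs$values, max)
  coinmax[i]<-streak[[1]]   #1st item is the heads, because "H" sorts before "T"
}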

max(coinmax) - this will print out the longest run of heads found in any of the 1000 sets of tosses

Wednesday, 29 January 2014

How to get the scale right in a chart


Exercise 2: We can use help to find how to adjust the plot to meet our requirements. For example, you can add a title to the x-axis using the additional argument xlab="year". Try this, and also add an appropriate y-axis label and plot title. Provide your plot for the solution to this exercise (you can use the ‘Export’ button in the graphics window). [4 Marks]
Exercise 3 Is there an apparent trend in the number of girls baptized over the years? [2 Marks]


#exercise 2
plot(x = arbuthnot$year, y = arbuthnot$girls, main="Baptism records for girls born in London 1629 to 1710",
     xlab="year", ylab = "number of newborn girls",type="l")





Making the y-axis start at zero is good practice







plot(x = arbuthnot$year, y = arbuthnot$girls, main="Baptism records for girls born in London 1629 to 1710",
     xlab="year", ylab = "number of newborn girls",type="l",ylim=c(0,10000))





Exercise 4: Now, make a plot of the proportion of boys over time. What do you see? Tip: As well as by using the up and down arrow keys, you can access your command history using the history tab in the upper right panel. This might save you a lot of typing. [3 Marks]
#exercise 4
boygirlrat<-arbuthnot$boys/(arbuthnot$boys + arbuthnot$girls)


plot(arbuthnot$year, boygirlrat,type="l")
This chart looks exciting - the proportion of boys seems to be all over the place

But maybe not - it is just about 0.5 with a bit of wiggling, which is clear once the y-axis runs from 0 to 1


plot(arbuthnot$year, boygirlrat,type="l",ylim=c(0,1))
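The reason the first chart looks so dramatic is that plot() chooses the y-axis limits from the range of the data unless ylim is given, so even tiny wiggles fill the whole chart. Here is a sketch with simulated stand-in counts (NOT the real arbuthnot data; the 10000 births a year and the probability 0.51 are assumptions picked only for illustration).

```r
set.seed(4)
# Simulated stand-in counts, NOT the real arbuthnot data
year  <- 1629:1710
boys  <- rbinom(length(year), size = 10000, prob = 0.51)   # assumed values
girls <- 10000 - boys
prop_boys <- boys / (boys + girls)

range(prop_boys)         # the default y-axis covers roughly this narrow interval
diff(range(prop_boys))   # the total wiggle, tiny compared with the full 0 to 1 scale
```

Checking range() before plotting is a quick way to judge whether the movement in a chart is large or just the effect of a zoomed-in axis.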

Sunday, 26 January 2014

Starting off

Examples of how to use R with comments and help

One good thing about RStudio is that you can create a list of commands which you can keep.  This means that when you've worked out how to do something, you can save the command.

To do this, go to File on the top line of RStudio, then select New File > R Script

Your screen will now look like this, and you can see the two commands in a box at the top left-hand side.  This is the R script





Once you have the R script, you can run the commands by selecting a few lines and then going to Code > Run line(s)