Generating sample data in R

When testing functions or predictions in R, it is usually useful to have some sample or random data at hand. Many packages, including those shipped with base R, come with good sample datasets, but let me show you a nice way of generating a data frame of random data.

We will generate random data using the rnorm function (random generation from the normal distribution with a given mean and standard deviation). Then, for each x value, the sapply function (which applies a function to every element of a list or vector) draws a y value from a normal distribution whose mean is the linear function 2x+6. Similar functions are lapply and vapply.

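# 1000 draws from a normal distribution with mean 10 and standard deviation 5
x <- rnorm(1000, 10, 5)
# for each x, draw one y from a normal distribution with mean 2*x+6 and sd 10
y <- sapply(x, function(x) rnorm(1, 2*x+6, 10))
dat_set <- data.frame(x, y)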
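
As a side note, the same kind of data can be drawn without sapply, because rnorm accepts a whole vector of means. The sketch below shows that variant next to vapply, one of the similar functions mentioned earlier. The seed value 42 and the names n, y_vec, y_vap and dat_vec are assumptions made for illustration only.

```r
# Fix the seed so the random draws are reproducible (42 is an arbitrary choice)
set.seed(42)
n <- 1000
x <- rnorm(n, mean = 10, sd = 5)

# rnorm is vectorised over its mean argument, so one call draws all y values
y_vec <- rnorm(n, mean = 2 * x + 6, sd = 10)

# vapply works like sapply, but checks that each result is a single numeric value
y_vap <- vapply(x, function(xi) rnorm(1, mean = 2 * xi + 6, sd = 10), numeric(1))

dat_vec <- data.frame(x = x, y = y_vec)
str(dat_vec)
```

Both y_vec and y_vap come from the same distribution as the sapply version, although the individual values differ because each call consumes fresh random numbers.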

After this, we can visualize the dataset dat_set to see the dispersion.

library(ggplot2)
ggplot() + geom_point(data=dat_set, aes(x=x, y=y), size=1, color='brown')

Visualization looks like:

[Image: RStudio screenshot of the scatter plot of dat_set]

One can tell that the data follow the linear function y=2x+6, with random noise added to each y value through the sapply call.
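
One way to check this numerically, rather than by eye, is to fit a straight line with lm and compare the estimated coefficients with 6 and 2. This is only a minimal sketch: it regenerates the data with an assumed seed (123), and the object name fit is hypothetical.

```r
# Regenerate the data with a fixed seed (123 is an arbitrary choice)
set.seed(123)
x <- rnorm(1000, 10, 5)
y <- sapply(x, function(x) rnorm(1, 2 * x + 6, 10))
dat_set <- data.frame(x, y)

# Fit y = a + b*x by ordinary least squares and print the estimates of a and b
fit <- lm(y ~ x, data = dat_set)
coef(fit)
```

The estimates should land near an intercept of 6 and a slope of 2, but not exactly on them, because every y value carries noise with a standard deviation of 10.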
