# Probability Distributions in R

In this fourteenth article in the R series, we shall explore the common probability distribution functions available in R.

We will use R version 4.2.1 installed on Parabola GNU/Linux-libre (x86-64) for the code snippets.

```
$ R --version

R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
```

Consider the mtcars data set, which ships with R in the datasets package:

```
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```

The summary() function produces basic statistics for each column of the data, as shown below (only the first four columns are reproduced here):

```
> summary(mtcars)
      mpg             cyl             disp             hp
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
 Median :19.20   Median :6.000   Median :196.3   Median :123.0
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
```

Tukey’s five-number summary (minimum, lower hinge, median, upper hinge, and maximum) can be obtained using the fivenum() function, as follows:

```
> mtcars$disp
 [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6 275.8
[13] 275.8 275.8 472.0 460.0 440.0  78.7  75.7  71.1 120.1 318.0 304.0 350.0
[25] 400.0  79.0 120.3  95.1 351.0 145.0 301.0 121.0

> fivenum(mtcars$disp)
[1]  71.10 120.65 196.30 334.00 472.00
```
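
Note that the upper hinge (334.00) differs from the third quartile reported by summary() (326.0), because the two functions use different rules. The following is a minimal sketch comparing them; the variable names are chosen here for illustration only.

```r
disp <- mtcars$disp

# Tukey's hinges: for this even-sized sample, the medians of the
# lower and upper halves of the sorted data
hinges <- fivenum(disp)

# Default (type 7) sample quantiles, the rule summary() relies on
quartiles <- quantile(disp, probs = c(0, 0.25, 0.5, 0.75, 1))

hinges
quartiles
```

The minimum, median and maximum agree between the two; only the hinges and quartiles differ.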

A histogram of the mtcars displacement data is shown in Figure 1.
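
The figure is not reproduced here. As a rough sketch, a comparable histogram could be drawn with the base hist() function; the title, axis label and colour below are assumptions made for illustration and may differ from those used in Figure 1.

```r
# Histogram of engine displacement from the built-in mtcars data.
# The title, label and colour are illustrative choices.
h <- hist(mtcars$disp,
          main = "Histogram of mtcars displacement",
          xlab = "Displacement (cu. in.)",
          col  = "lightblue")

# hist() invisibly returns the bin boundaries and the counts per bin
h$breaks
h$counts
```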

The stem() function generates a stem-and-leaf plot of the input data. For example, we can obtain a plot of the mtcars cylinder counts, as demonstrated below:

```
> mtcars$cyl
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

> stem(mtcars$cyl)

  The decimal point is at the |

  4 | 00000000000
  4 |
  5 |
  5 |
  6 | 0000000
  6 |
  7 |
  7 |
  8 | 00000000000000
```
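
The number of leaves on each stem equals the frequency of that value. As a quick cross-check, the same counts can be tabulated with table(); the name cyl_counts is just an illustrative choice.

```r
# Frequency of each cylinder count in mtcars
cyl_counts <- table(mtcars$cyl)
cyl_counts
```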

## Uniform

The Uniform distribution has the following density function:

`f(x) = 1/(b-a)`

…where a <= x <= b. The runif() function generates random deviates, while the dunif() function gives the density, as illustrated below:

```
> d <- runif(10, min=0, max=1)
> d
 [1] 0.48730291 0.26676446 0.15025110 0.77975962 0.42828890 0.05255129
 [7] 0.54070173 0.54715839 0.42703064 0.31628728

> dunif(d)
 [1] 1 1 1 1 1 1 1 1 1 1
```

These functions accept the following common arguments:

| Arguments | Description |
| --- | --- |
| x, q | A vector of quantiles |
| p | A vector of probabilities |
| n | Number of observations |
| log, log.p | If TRUE, probabilities p are given as log(p) |
| lower.tail | If TRUE (the default), probabilities are P[X <= x]; otherwise, P[X > x] |
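
To make the effect of lower.tail, log.p and log concrete, here is a small sketch using the standard uniform distribution on [0, 1]. The variable names are illustrative only.

```r
# P[X <= 0.3] for X uniform on [0, 1]; the lower tail is the default
p_lower <- punif(0.3)

# lower.tail = FALSE returns the upper tail, P[X > 0.3]
p_upper <- punif(0.3, lower.tail = FALSE)

# log.p = TRUE returns the logarithm of the probability
log_p <- punif(0.3, log.p = TRUE)

# log = TRUE returns the logarithm of the density
log_d <- dunif(0.3, log = TRUE)

c(p_lower, p_upper, exp(log_p), exp(log_d))
```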

The specific set of arguments for the uniform distribution is given below:

| Arguments | Description |
| --- | --- |
| min, max | Finite lower and upper distribution limits |
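
As a sketch of how min and max enter the calculations, the snippet below evaluates the density, distribution and quantile functions on an arbitrarily chosen interval [2, 6]. The limits and test points are assumptions picked for illustration.

```r
a <- 2
b <- 6
x <- c(1, 2, 3.5, 6, 7)

# The density is 1/(b - a) inside [a, b] and 0 outside
dens <- dunif(x, min = a, max = b)

# The distribution function rises linearly from 0 at a to 1 at b
cdf <- punif(x, min = a, max = b)

# qunif() inverts punif() inside the interval
q <- qunif(cdf[3], min = a, max = b)

dens
cdf
q
```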

## Binomial

The density of the binomial distribution with probability ‘p’ and size ‘n’ is given by:

`p(x) = choose(n,x) p^x (1-p)^(n-x)`

…where ‘x’ varies from 0 to n. In R, p(x) is computed using Loader’s algorithm. The pbinom() function gives the distribution function while the rbinom() function provides random values, as shown below:

```
> rbinom(5, 5, 1)
[1] 5 5 5 5 5

> rbinom(5, 5, 0.5)
[1] 3 2 2 4 2

> rbinom(5, 5, 0.25)
[1] 0 3 0 2 2
```

Since pbinom() returns the cumulative probability P[X <= x], the following example gives the probability of at most five successes in 10 trials, each having probability 0.5:

```
> pbinom(5, 10, 0.5)
[1] 0.6230469
```
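
The value above can be checked against the density formula directly. The following sketch sums the formula for x = 0 to 5 and compares the result with pbinom(); it also shows dbinom() for exactly five successes. Variable names are illustrative.

```r
n <- 10
p <- 0.5
k <- 0:5

# Density formula applied term by term
manual <- choose(n, k) * p^k * (1 - p)^(n - k)

# Cumulative probability of at most five successes, two ways
cdf_manual  <- sum(manual)
cdf_builtin <- pbinom(5, size = n, prob = p)

# Probability of exactly five successes
exactly_five <- dbinom(5, size = n, prob = p)

c(cdf_manual, cdf_builtin, exactly_five)
```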

In addition to the common arguments listed previously, the binomial distribution function accepts the following arguments:

| Arguments | Description |
| --- | --- |
| size | Number of trials |
| prob | Probability of success in each trial |

## Exponential

The density of the exponential distribution with rate ‘lambda’ is given by:

`f(x) = lambda e^(-lambda x)`

…where x >= 0. The rexp() function generates random deviates, the qexp() function gives the quantile function, and the pexp() function provides the distribution function. A couple of examples are given below; the second one reuses the vector ‘d’ generated in the Uniform section:

```
> rexp(5, rate = 1)
[1] 0.44090566 0.18508747 1.18596321 0.86235515 0.05020871

> pexp(d, rate = 1)
 [1] 0.38571907 0.23414656 0.13950812 0.54148378 0.34837687 0.05119434
 [7] 0.41766053 0.42140839 0.34755644 0.27114996
```

The ‘rate’ argument is specific to the exponential distribution functions:

| Arguments | Description |
| --- | --- |
| rate | A vector of rates |
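
Integrating the density gives the distribution function 1 - e^(-lambda x). The sketch below compares this with pexp(), checks that qexp() inverts it, and compares a sample mean with the theoretical mean 1/lambda. The rate, test points and seed are arbitrary choices made for illustration.

```r
rate <- 2
x <- c(0, 0.5, 1, 3)

# Distribution function: built-in versus the closed form
cdf_builtin <- pexp(x, rate = rate)
cdf_manual  <- 1 - exp(-rate * x)

# qexp() maps the probabilities back to the original points
x_back <- qexp(cdf_builtin, rate = rate)

# The sample mean of many deviates should be close to 1/rate
set.seed(42)
sample_mean <- mean(rexp(10000, rate = rate))

cdf_builtin
x_back
sample_mean
```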

## Geometric

The geometric distribution density with probability ‘p’ is defined as follows:

`p(x) = p (1-p)^x`

…where x = 0, 1, 2, and so on, and 0 < p <= 1. The rgeom() function for 10 observations and probability 0.5 is given below:

```
> d <- rgeom(10, 0.5)
> d
 [1] 0 1 1 0 0 0 0 0 1 1
```

The pgeom() and qgeom() functions are based on closed-form formulae and provide the distribution and quantile functions, respectively.

```
> pgeom(d, 0.5)
 [1] 0.50 0.75 0.75 0.50 0.50 0.50 0.50 0.50 0.75 0.75

> qgeom((1:10)/15, prob = 0.2)
 [1] 0 0 0 1 1 2 2 3 4 4
```

The argument specific to the geometric distribution functions is given below:

| Arguments | Description |
| --- | --- |
| prob | Probability of success in each trial |
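
Summing the density from 0 to x gives the closed form 1 - (1-p)^(x+1) for the distribution function. The following sketch checks both formulas against dgeom() and pgeom(); the variable names are illustrative.

```r
p <- 0.5
x <- 0:5

# Density and distribution function written out by hand
pmf_manual <- p * (1 - p)^x
cdf_manual <- 1 - (1 - p)^(x + 1)

# Built-in equivalents
pmf_builtin <- dgeom(x, prob = p)
cdf_builtin <- pgeom(x, prob = p)

rbind(pmf_manual, pmf_builtin, cdf_manual, cdf_builtin)
```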

## Normal

The density of the normal distribution with mean ‘mu’ and standard deviation ‘sigma’ is given by:

`f(x) = 1/(sqrt(2 pi) sigma) e^-((x - mu)^2/(2 sigma^2))`

A sample of ten random values for mean 0 with standard deviation 1 can be generated using the rnorm() function, as follows:

```
> d <- rnorm(10, mean=0, sd=1)
> d
 [1]  0.66512403 -0.43102217  0.50305386  1.55028087  1.20598557  0.33704698
 [7]  0.09200192  0.67090448 -0.24424568 -0.78798191
```

Example uses of pnorm() and dnorm() functions are given below:

```
> pnorm(d, mean=0, sd=1)
 [1] 0.7470144 0.3332261 0.6925368 0.9394629 0.8860885 0.6319593 0.5366517
 [8] 0.7488593 0.4035203 0.2153536

> dnorm(d, mean=0, sd=1)
 [1] 0.3197763 0.3635536 0.3515265 0.1199568 0.1927928 0.3769138 0.3972575
 [8] 0.3185439 0.3872184 0.2924691
```

The arguments specific to the normal distribution are listed below:

| Arguments | Description |
| --- | --- |
| mean | A vector of means |
| sd | A vector of standard deviations |
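
As a hedged illustration, the snippet below evaluates the density formula by hand, uses pnorm() to compute the probability of falling within one, two and three standard deviations of the mean, and uses qnorm() to recover a familiar quantile. The test points are arbitrary.

```r
mu <- 0
sigma <- 1
x <- c(-1.5, 0, 0.7)

# Density formula written out by hand, to compare with dnorm()
dens_manual  <- 1 / (sqrt(2 * pi) * sigma) * exp(-((x - mu)^2 / (2 * sigma^2)))
dens_builtin <- dnorm(x, mean = mu, sd = sigma)

# Probability of lying within 1, 2 and 3 standard deviations of the mean
k <- 1:3
within <- pnorm(k) - pnorm(-k)

# qnorm() is the inverse of pnorm(): the 97.5% point of the standard normal
q975 <- qnorm(0.975)

dens_builtin
within
q975
```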

## Poisson

The Poisson distribution density is defined as follows:

`p(x) = lambda^x exp(-lambda)/x!`

…where x = 0, 1, 2, … The mean and variance are both equal to lambda. The rpois() function generates random deviates, as shown below:

```
> d <- rpois(10, 0.9)
> d
 [1] 2 2 1 3 1 2 1 1 0 0
```

The dpois() function gives the density while the ppois() function gives the distribution function; logarithms are returned only when log = TRUE or log.p = TRUE is passed. A couple of examples are given below:

```
> dpois(d, 0.9)
 [1] 0.16466071 0.16466071 0.36591269 0.04939821 0.36591269 0.16466071
 [7] 0.36591269 0.36591269 0.40656966 0.40656966

> ppois(d, 0.9)
 [1] 0.9371431 0.9371431 0.7724824 0.9865413 0.7724824 0.9371431 0.7724824
 [8] 0.7724824 0.4065697 0.4065697
```

The parameter specific to the Poisson distribution functions is lambda, as shown below:

| Arguments | Description |
| --- | --- |
| lambda | A vector of non-negative means |
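
The density formula, and the claim that lambda is both the mean and the variance, can be checked with a short sketch. The sample size and seed are arbitrary choices; sample moments only approximate the theoretical values.

```r
lambda <- 0.9
x <- 0:4

# Density formula written out by hand
pmf_manual <- lambda^x * exp(-lambda) / factorial(x)

# Built-in density, on the natural scale and on the log scale
pmf_builtin <- dpois(x, lambda)
log_dens    <- dpois(x, lambda, log = TRUE)

# Sample mean and variance of many deviates should both be near lambda
set.seed(1)
s <- rpois(100000, lambda)
c(mean = mean(s), variance = var(s))
```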

The Poisson CDF can be plotted in a graph, as shown below:

```
> x <- seq(-0.5, 5, 0.05)
> head(x)
[1] -0.50 -0.45 -0.40 -0.35 -0.30 -0.25

> plot(x, ppois(x, 1), type = "s", ylab = "F(x)", main = "Poisson CDF")
```

## Weibull

The shape parameter ‘a’ and scale parameter ‘b’ define the Weibull distribution density, as follows:

`f(x) = (a/b) (x/b)^(a-1) exp(- (x/b)^a)`

…where x > 0. The dweibull() function gives the density while the rweibull() function generates random deviates, as illustrated below:

```
> x <- c(0, rlnorm(10))
> x
 [1] 0.0000000 0.5194988 2.2301132 0.1844523 0.5267321 0.5487738 1.3437871
 [8] 2.0694222 3.6199835 6.3532987 5.9563711

> all.equal(dweibull(x, shape=1), dexp(x))
[1] TRUE

> rweibull(10, 5, 1)
 [1] 0.5197291 0.6851027 1.0279023 1.1397738 0.8810083 0.7945575 0.5939060
 [8] 0.8762381 0.7979457 0.6977511
```

The parameters specific to the Weibull distribution are as follows:

| Arguments | Description |
| --- | --- |
| shape, scale | Shape and scale parameters |
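
Integrating the density gives the distribution function 1 - exp(-(x/b)^a). The sketch below checks the density and distribution formulas against dweibull() and pweibull(), and shows that a shape of 1 reduces to an exponential distribution with rate 1/b. The parameter values and test points are arbitrary.

```r
a <- 5   # shape
b <- 1   # scale
x <- c(0.5, 1, 1.5)

# Density and distribution function written out by hand
dens_manual <- (a / b) * (x / b)^(a - 1) * exp(-(x / b)^a)
cdf_manual  <- 1 - exp(-(x / b)^a)

# Built-in equivalents
dens_builtin <- dweibull(x, shape = a, scale = b)
cdf_builtin  <- pweibull(x, shape = a, scale = b)

# With shape = 1, the Weibull density equals the exponential density
# with rate = 1/scale
weib_shape1 <- dweibull(x, shape = 1, scale = 2)
exp_equiv   <- dexp(x, rate = 1 / 2)

rbind(dens_builtin, cdf_builtin, weib_shape1, exp_equiv)
```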

You are encouraged to read the manual pages for the above R functions to learn more about their arguments, options and usage.