# Probability Distributions in R

In this fourteenth article in the R series, we shall explore the common probability distribution functions available in R.

We will use R version 4.2.1 installed on Parabola GNU/Linux-libre (x86-64) for the code snippets.

```
$ R --version

R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
```

Consider the mtcars data set, which ships with R in the datasets package:

```
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```

The summary() function can be used to produce the basic statistics of the data, as shown below:

```
> summary(mtcars)
      mpg             cyl             disp             hp
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
 Median :19.20   Median :6.000   Median :196.3   Median :123.0
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
```

Tukey’s five-number summary (minimum, lower hinge, median, upper hinge, and maximum) can be obtained using the fivenum() function, as follows:

```
> mtcars$disp
 [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6 275.8
[13] 275.8 275.8 472.0 460.0 440.0  78.7  75.7  71.1 120.1 318.0 304.0 350.0
[25] 400.0  79.0 120.3  95.1 351.0 145.0 301.0 121.0

> fivenum(mtcars$disp)
[1]  71.10 120.65 196.30 334.00 472.00
```
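The hinges returned by fivenum() need not coincide with the quartiles that summary() reports, because summary() relies on quantile() instead. The following is a minimal sketch comparing the two on the displacement column; the variable names (disp, hinges, quartiles) are chosen only for illustration.

```r
# mtcars is part of R's built-in datasets package
disp <- mtcars$disp

# Tukey's hinges: medians of the lower and upper halves of the sorted data
hinges <- fivenum(disp)

# Quartiles as computed by quantile() with its default settings
quartiles <- quantile(disp)

# Side-by-side comparison; the minimum, median and maximum rows agree,
# so only the 25% and 75% rows can differ
data.frame(fivenum = hinges, quantile = unname(quartiles),
           row.names = names(quartiles))
```

For this column the upper hinge is 334.00, whereas summary() reports a third quartile of 326.0, as the two outputs above show.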

A histogram of the mtcars displacement data is shown in Figure 1.
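The code used to produce the figure is not reproduced here. One plausible way to draw such a histogram is sketched below; the title and axis label are our own assumptions, not necessarily those of the original figure.

```r
# Displacement column of the built-in mtcars data set
disp <- mtcars$disp

# Draw the histogram; main and xlab are illustrative label choices
h <- hist(disp,
          main = "Histogram of mtcars$disp",
          xlab = "Displacement (cu. in.)")

# hist() invisibly returns the bin boundaries and the counts per bin
h$breaks
h$counts
```

Keeping the returned object in `h` makes it easy to inspect how the 32 observations were binned.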

The stem() function generates a stem-and-leaf plot of its input data. For example, a plot of the number of cylinders (the cyl column of mtcars) can be obtained as demonstrated below:

```
> mtcars$cyl
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

> stem(mtcars$cyl)

  The decimal point is at the |

  4 | 00000000000
  4 |
  5 |
  5 |
  6 | 0000000
  6 |
  7 |
  7 |
  8 | 00000000000000
```
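The number of leaves on each stem can be cross-checked with table(), which tallies every distinct value. A short sketch (the name counts is illustrative):

```r
# Tally how many cars have 4, 6 and 8 cylinders
counts <- table(mtcars$cyl)
counts
```

The three counts should match the number of zeros printed on the 4, 6 and 8 stems above.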

## Uniform

The Uniform distribution has the following density function:

`f(x) = 1/(b-a)`

…where a <= x <= b. The runif() function generates random deviates, while the dunif() function gives the density, as illustrated below:

```
> d <- runif(10, min=0, max=1)
> d
 [1] 0.48730291 0.26676446 0.15025110 0.77975962 0.42828890 0.05255129
 [7] 0.54070173 0.54715839 0.42703064 0.31628728

> dunif(d)
 [1] 1 1 1 1 1 1 1 1 1 1
```

These functions accept the following common arguments:

| Arguments | Description |
|-----------|-------------|
| x, q | A vector of quantiles |
| p | A vector of probabilities |
| n | Number of observations |
| log, log.p | If TRUE, probabilities p are given as log(p) |
| lower.tail | If TRUE (the default), probabilities are P[X <= x]; otherwise, P[X > x] |

The specific set of arguments for the uniform distribution is given below:

| Arguments | Description |
|-----------|-------------|
| min, max | Finite lower and upper distribution limits |
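The d, p, q and r prefixes follow the same pattern for every distribution covered in this article. As a rough sketch for the uniform case, with limits a = 2 and b = 5 picked purely for illustration and an arbitrary seed:

```r
a <- 2   # lower limit (illustrative)
b <- 5   # upper limit (illustrative)

set.seed(1)  # arbitrary seed, used only for reproducibility
x <- runif(5, min = a, max = b)

# The density is constant at 1/(b - a) everywhere inside [a, b]
dens <- dunif(x, min = a, max = b)

# punif() is the distribution function and qunif() is its inverse
p_mid <- punif(3.5, min = a, max = b)   # 3.5 is the midpoint of [2, 5]
q_mid <- qunif(p_mid, min = a, max = b)

# Outside [a, b] the density is zero
dens_out <- dunif(6, min = a, max = b)
```

Here `p_mid` should come out as 0.5 and `q_mid` should map back to 3.5, since the midpoint of the interval splits the probability evenly.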

## Binomial

The density of the binomial distribution with probability ‘p’ and size ‘n’ is given by:

`p(x) = choose(n, x) p^x (1-p)^(n-x)`

…where ‘x’ varies from 0 to n. The density p(x), available through dbinom(), is computed using Loader’s algorithm. The pbinom() function gives the distribution function, while the rbinom() function generates random values, as shown below:

```
> rbinom(5, 5, 1)
[1] 5 5 5 5 5

> rbinom(5, 5, 0.5)
[1] 3 2 2 4 2

> rbinom(5, 5, 0.25)
[1] 0 3 0 2 2
```

An example of the pbinom() function, giving the probability of at most five successes in 10 trials, each with success probability 0.5, is as follows:

```
> pbinom(5, 10, 0.5)
[1] 0.6230469
```

In addition to the common arguments listed previously, the binomial distribution function accepts the following arguments:

| Arguments | Description |
|-----------|-------------|
| size | Number of trials |
| prob | Probability of success in each trial |
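The relationship between the binomial density and its distribution function can be checked numerically. The sketch below reuses the 10-trial, probability 0.5 setting from the pbinom() example; the variable names are illustrative.

```r
size <- 10
prob <- 0.5

# P(X = 5): exactly five successes
p_exact <- dbinom(5, size, prob)

# The same value written out from the density formula
p_formula <- choose(size, 5) * prob^5 * (1 - prob)^(size - 5)

# P(X <= 5): summing the density over 0..5 reproduces pbinom(5, 10, 0.5)
p_at_most <- sum(dbinom(0:5, size, prob))

# P(X > 5): the upper tail, requested with lower.tail = FALSE
p_more <- pbinom(5, size, prob, lower.tail = FALSE)
```

The two tail probabilities, `p_at_most` and `p_more`, should add up to 1.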

## Exponential

The density of the exponential distribution with rate ‘lambda’ is given by:

`f(x) = lambda e^(-lambda x)`

…where x >= 0. The rexp() function generates random deviates, the qexp() function gives the quantile function, and the pexp() function gives the distribution function. A couple of examples are given below; note that ‘d’ in the second one is still the vector of uniform deviates generated in the Uniform section:

```
> rexp(5, rate = 1)
[1] 0.44090566 0.18508747 1.18596321 0.86235515 0.05020871

> pexp(d, rate = 1)
 [1] 0.38571907 0.23414656 0.13950812 0.54148378 0.34837687 0.05119434
 [7] 0.41766053 0.42140839 0.34755644 0.27114996
```

The ‘rate’ argument is specific to the exponential distribution functions.

| Arguments | Description |
|-----------|-------------|
| rate | A vector of rates |
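The exponential density and its distribution function, which has the closed form 1 - e^(-lambda x), can be compared against dexp() and pexp(). A small sketch follows, using an illustrative rate of 2 and a handful of arbitrary evaluation points:

```r
rate <- 2                # illustrative rate
x <- c(0.1, 0.5, 1, 2)   # illustrative evaluation points

# Density and distribution function written out from their closed forms
dens_formula <- rate * exp(-rate * x)
cdf_formula  <- 1 - exp(-rate * x)

# qexp() inverts pexp(): map x to a probability and back again
roundtrip <- qexp(pexp(x, rate = rate), rate = rate)

# The median of the distribution is log(2) / rate
med <- qexp(0.5, rate = rate)
```

Comparing `dens_formula` with `dexp(x, rate)` and `cdf_formula` with `pexp(x, rate)` using all.equal() should return TRUE in both cases.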

## Geometric

The geometric distribution density with probability ‘p’ is defined as follows:

`p(x) = p (1-p)^x`

…where x = 0, 1, 2, …, and 0 < p <= 1. A call to rgeom() for 10 observations with probability 0.5 is shown below:

```
> d <- rgeom(10, 0.5)
> d
 [1] 0 1 1 0 0 0 0 0 1 1
```

The pgeom() and qgeom() functions are based on closed-form formulae and provide the distribution and quantile functions, respectively.

```
> pgeom(d, 0.5)
 [1] 0.50 0.75 0.75 0.50 0.50 0.50 0.50 0.50 0.75 0.75

> qgeom((1:10)/15, prob = 0.2)
 [1] 0 0 0 1 1 2 2 3 4 4
```

The argument specific to the geometric distribution functions is given below:

| Arguments | Description |
|-----------|-------------|
| prob | Probability of success in each trial |
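The closed-form expressions behind pgeom() and qgeom() can be illustrated directly. The sketch below assumes the standard result that the distribution function is 1 - (1-p)^(x+1), and uses prob = 0.2 as in the qgeom() example; the variable names are illustrative.

```r
prob <- 0.2
x <- 0:5

# Density from the formula p (1-p)^x
dens_formula <- prob * (1 - prob)^x

# Closed form for the distribution function: 1 - (1-p)^(x+1)
cdf_formula <- 1 - (1 - prob)^(x + 1)

# qgeom() returns the smallest x whose cumulative probability
# reaches the requested level (here 0.5)
q <- qgeom(0.5, prob)
c(below = pgeom(q - 1, prob), at = pgeom(q, prob))
```

The last line shows that the cumulative probability first reaches 0.5 at `q` and is still below 0.5 one step earlier.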

## Normal

The density of the normal distribution with mean ‘mu’ and standard deviation ‘sigma’ is given by:

`f(x) = 1/(sqrt(2 pi) sigma) e^-((x - mu)^2/(2 sigma^2))`

A sample of ten random values for mean 0 with standard deviation 1 can be generated using the rnorm() function, as follows:

```
> d <- rnorm(10, mean=0, sd=1)
> d
 [1]  0.66512403 -0.43102217  0.50305386  1.55028087  1.20598557  0.33704698
 [7]  0.09200192  0.67090448 -0.24424568 -0.78798191
```

Example uses of pnorm() and dnorm() functions are given below:

```
> pnorm(d, mean=0, sd=1)
 [1] 0.7470144 0.3332261 0.6925368 0.9394629 0.8860885 0.6319593 0.5366517
 [8] 0.7488593 0.4035203 0.2153536

> dnorm(d, mean=0, sd=1)
 [1] 0.3197763 0.3635536 0.3515265 0.1199568 0.1927928 0.3769138 0.3972575
 [8] 0.3185439 0.3872184 0.2924691
```

The arguments specific to the normal distribution functions are listed below for your reference.

| Arguments | Description |
|-----------|-------------|
| mean | A vector of means |
| sd | A vector of standard deviations |
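To connect the density formula with dnorm(), and to show how pnorm() and qnorm() relate to each other, consider the following sketch. The mean of 1 and standard deviation of 2 are arbitrary illustrative values.

```r
mu <- 1        # illustrative mean
sigma <- 2     # illustrative standard deviation
x <- c(-1, 0, 1, 2.5)

# Density written out from the formula given above
dens_formula <- 1 / (sqrt(2 * pi) * sigma) * exp(-((x - mu)^2 / (2 * sigma^2)))

# Half of the probability mass lies below the mean
p_at_mean <- pnorm(mu, mean = mu, sd = sigma)

# qnorm() inverts pnorm(); the 97.5% quantile of the standard normal
# is the familiar value of roughly 1.96
z <- qnorm(0.975)

# Probability mass within 1.96 standard deviations of the mean (about 95%)
central <- pnorm(1.96) - pnorm(-1.96)
```

Calling `all.equal(dnorm(x, mu, sigma), dens_formula)` should return TRUE.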

## Poisson

The Poisson distribution density is defined as follows:

`p(x) = lambda^x exp(-lambda)/x!`

…where x = 0, 1, 2, … The mean and variance are both equal to lambda. The rpois() function generates random deviates, as shown below:

```
> d <- rpois(10, 0.9)
> d
 [1] 2 2 1 3 1 2 1 1 0 0
```

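The dpois() function gives the density and the ppois() function gives the distribution function. Each returns its result on the log scale only when log = TRUE (for dpois()) or log.p = TRUE (for ppois()) is passed. A couple of examples are given below: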

```
> dpois(d, 0.9)
 [1] 0.16466071 0.16466071 0.36591269 0.04939821 0.36591269 0.16466071
 [7] 0.36591269 0.36591269 0.40656966 0.40656966

> ppois(d, 0.9)
 [1] 0.9371431 0.9371431 0.7724824 0.9865413 0.7724824 0.9371431 0.7724824
 [8] 0.7724824 0.4065697 0.4065697
```

The parameter specific to the Poisson distribution functions is lambda, as shown below.

| Arguments | Description |
|-----------|-------------|
| lambda | A vector of non-negative means |
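The Poisson density formula, the effect of the log argument, and the claim that the mean and variance both equal lambda can be checked with a short sketch. The seed and the sample size of 100000 are arbitrary choices made for illustration.

```r
lambda <- 0.9
x <- 0:4

# Density written out from the formula lambda^x exp(-lambda) / x!
dens_formula <- lambda^x * exp(-lambda) / factorial(x)

# With log = TRUE, dpois() returns the logarithm of the density
log_dens <- dpois(x, lambda, log = TRUE)

# A large sample should have mean and variance both close to lambda
set.seed(123)   # arbitrary seed, used only for reproducibility
s <- rpois(100000, lambda)
c(mean = mean(s), variance = var(s))
```

Exponentiating `log_dens` should recover the plain density values returned by `dpois(x, lambda)`.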

The Poisson CDF can be plotted in a graph, as shown below:

```
> x <- seq(-0.5, 5, 0.05)
> head(x)
[1] -0.50 -0.45 -0.40 -0.35 -0.30 -0.25

> plot(x, ppois(x, 1), type = "s", ylab = "F(x)", main = "Poisson CDF")
```

## Weibull

The shape parameter ‘a’ and scale parameter ‘b’ define the Weibull distribution density, as follows:

`f(x) = (a/b) (x/b)^(a-1) exp(- (x/b)^a)`

…where x > 0. The dweibull() function gives the density while the rweibull() function generates random deviates, as illustrated below:

```
> x <- c(0, rlnorm(10))
> x
 [1] 0.0000000 0.5194988 2.2301132 0.1844523 0.5267321 0.5487738 1.3437871
 [8] 2.0694222 3.6199835 6.3532987 5.9563711

> all.equal(dweibull(x, shape=1), dexp(x))
[1] TRUE

> rweibull(10, 5, 1)
 [1] 0.5197291 0.6851027 1.0279023 1.1397738 0.8810083 0.7945575 0.5939060
 [8] 0.8762381 0.7979457 0.6977511
```

The specific parameters for the Weibull distribution are as follows.

| Arguments | Description |
|-----------|-------------|
| shape, scale | Shape and scale parameters |
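The all.equal() check above works because a Weibull distribution with shape 1 reduces to an exponential distribution with rate 1/scale. The sketch below verifies this for a non-default scale and also evaluates the density formula by hand. The parameter values are illustrative, and the closed form 1 - exp(-(x/b)^a) for the distribution function is the standard result rather than something stated in this article.

```r
shape <- 2               # 'a' in the formula (illustrative)
scale <- 3               # 'b' in the formula (illustrative)
x <- c(0.5, 1, 2, 4)     # illustrative evaluation points

# Density written out from the formula given above
dens_formula <- (shape / scale) * (x / scale)^(shape - 1) * exp(-(x / scale)^shape)

# Distribution function in closed form: 1 - exp(-(x/b)^a)
cdf_formula <- 1 - exp(-(x / scale)^shape)

# With shape = 1 the Weibull density matches the exponential density
# whose rate is 1/scale
exp_match <- all.equal(dweibull(x, shape = 1, scale = scale),
                       dexp(x, rate = 1 / scale))
```

`exp_match` should be TRUE, and `dens_formula` should agree with `dweibull(x, shape, scale)`.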

You are encouraged to read the manual pages for the above R functions to learn more about their arguments, options, and usage.

Shakthi Kannan
The author is a free software developer at the Fedora project, and also a blogger. He co-maintains the Fedora Electronic Lab project.