*In this fourteenth article in the R series, we shall explore the common probability distribution functions available in R.*

We will use version 4.2.1 installed on Parabola GNU/Linux-libre (x86-64) for the code snippets.

```
$ R --version
R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
https://www.gnu.org/licenses/.
```

Consider the *mtcars* data set, which is available in R's built-in *datasets* package:

```
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```

The *summary()* function produces basic statistics for each column of the data, as shown below (only the first four columns of the output are reproduced here):

```
> summary(mtcars)
      mpg             cyl             disp             hp       
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
```

Tukey's five-number summary (minimum, lower hinge, median, upper hinge and maximum) can be obtained using the *fivenum()* function, as follows:

```
> mtcars$disp
 [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6 275.8
[13] 275.8 275.8 472.0 460.0 440.0  78.7  75.7  71.1 120.1 318.0 304.0 350.0
[25] 400.0  79.0 120.3  95.1 351.0 145.0 301.0 121.0
> fivenum(mtcars$disp)
[1]  71.10 120.65 196.30 334.00 472.00
```
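The hinges returned by *fivenum()* are not always the same as the quartiles reported by *summary()*. The short sketch below, using only base R functions, compares the two for the displacement column; the variable names are our own.

```r
# Tukey's hinges (from fivenum) versus quartiles (from quantile's default, type 7)
hinges    <- fivenum(mtcars$disp)
quartiles <- quantile(mtcars$disp)

print(hinges)     # lower and upper hinges are the 2nd and 4th values
print(quartiles)  # 25% and 75% quantiles, as reported by summary()

# The two methods agree on the minimum, median and maximum
all.equal(hinges[c(1, 3, 5)], unname(quartiles[c(1, 3, 5)]))  # TRUE
```

For *disp*, the hinges are 120.65 and 334.00, while the default quartiles are 120.825 and 326.0, which is why *summary()* reports 120.8 and 326.0.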

A histogram of the mtcars displacement data is shown in Figure 1.
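A similar plot can be drawn with base R's *hist()* function. This is a sketch only: the options used for Figure 1 are not stated, so the title and axis label below are our own assumptions.

```r
# Histogram of engine displacement (cu. in.) from the mtcars data set.
# Title and axis label are illustrative; Figure 1 may use other options.
h <- hist(mtcars$disp,
          main = "Histogram of mtcars$disp",
          xlab = "Displacement (cu. in.)")

# hist() also returns the bin breaks and counts it plotted
print(h$breaks)
print(h$counts)
```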

The *stem()* function generates a stem-and-leaf plot of the input data. For example, we can obtain a plot for the *mtcars* cylinder data, as demonstrated below:

```
> mtcars$cyl
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
> stem(mtcars$cyl)

  The decimal point is at the |

  4 | 00000000000
  4 | 
  5 | 
  5 | 
  6 | 0000000
  6 | 
  7 | 
  7 | 
  8 | 00000000000000
```
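As a quick cross-check, the number of leaves on each stem should match a simple frequency table of the same vector. A minimal sketch using *table()*:

```r
# Frequency table of cylinder counts; each count should equal the
# number of zero leaves on the matching stem of the stem-and-leaf plot
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)
```

This prints counts of 11, 7 and 14 for 4, 6 and 8 cylinders, matching the leaves in the plot.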

## Uniform

The Uniform distribution has the following density function:

$$f(x) = \frac{1}{b - a}$$

…where a <= x <= b. The *runif()* function generates random deviates, while the *dunif()* function gives the density, as illustrated below:

```
> d <- runif(10, min=0, max=1)
> d
 [1] 0.48730291 0.26676446 0.15025110 0.77975962 0.42828890 0.05255129
 [7] 0.54070173 0.54715839 0.42703064 0.31628728
> dunif(d)
 [1] 1 1 1 1 1 1 1 1 1 1
```

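R follows a naming convention for each distribution: the *d* prefix gives the density, *p* the distribution function, *q* the quantile function and *r* random deviates. These functions accept the following common arguments: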

| Arguments | Description |
|---|---|
| x, q | A vector of quantiles |
| p | A vector of probabilities |
| n | Number of observations |
| log, log.p | If TRUE, probabilities p are given as log(p) |
| lower.tail | If TRUE (the default), probabilities are P[X <= x]; otherwise, P[X > x] |
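The effect of the log, log.p and lower.tail arguments can be seen with the standard uniform distribution, where P[X <= q] is simply q. This is an illustrative sketch; the value 0.3 is arbitrary.

```r
# Common arguments demonstrated with the standard uniform distribution
# (min = 0 and max = 1 by default)
p_lower <- punif(0.3)                       # P[X <= 0.3]
p_upper <- punif(0.3, lower.tail = FALSE)   # P[X > 0.3]
p_log   <- punif(0.3, log.p = TRUE)         # log(P[X <= 0.3])
d_log   <- dunif(0.5, log = TRUE)           # log of the density, log(1) = 0

print(c(p_lower, p_upper, exp(p_log), d_log))
```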

The arguments specific to the uniform distribution are given below:

| Arguments | Description |
|---|---|
| min, max | Lower and upper limits of the distribution; must be finite (defaults 0 and 1) |
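The sketch below checks the density formula f(x) = 1/(b - a) for a uniform distribution on [2, 5], and shows *dunif()*, *punif()*, *qunif()* and *runif()* side by side. The limits 2 and 5 are arbitrary values chosen for illustration.

```r
# Uniform distribution on [a, b] = [2, 5]
a <- 2
b <- 5

dunif(3, min = a, max = b)     # density is 1/(b - a) = 1/3 inside [a, b]
dunif(6, min = a, max = b)     # density is 0 outside [a, b]
punif(3.5, min = a, max = b)   # (3.5 - a)/(b - a) = 0.5
qunif(0.5, min = a, max = b)   # the median, (a + b)/2 = 3.5

# Random deviates always fall inside [a, b]
x <- runif(1000, min = a, max = b)
range(x)
```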

## Binomial

The density of the binomial distribution with probability ‘p’ and size ‘n’ is given by:

$$p(x) = \binom{n}{x} p^x (1-p)^{n-x}$$

…where ‘x’ varies from 0 to n. R computes p(x) using Loader's algorithm. The *pbinom()* function gives the distribution function, while the *rbinom()* function generates random deviates, as shown below (with prob = 1, every trial succeeds, so each draw equals the size, 5):

```
> rbinom(5, 5, 1)
[1] 5 5 5 5 5
> rbinom(5, 5, 0.5)
[1] 3 2 2 4 2
> rbinom(5, 5, 0.25)
[1] 0 3 0 2 2
```

The *pbinom()* function returns the cumulative probability P[X <= x]. For example, the probability of at most five successes in 10 trials, each with success probability 0.5, is computed as follows:

```
> pbinom(5, 10, 0.5)
[1] 0.6230469
```
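To see what *pbinom()* adds up, the sketch below sums the individual probabilities from *dbinom()* and compares *dbinom()* with the binomial formula written out using *choose()*. The parameter values reuse those of the example above.

```r
# P[X <= 5] for 10 trials with success probability 0.5 is the sum of the
# individual probabilities P[X = 0], ..., P[X = 5] returned by dbinom()
n <- 10
p <- 0.5

cdf_direct <- pbinom(5, size = n, prob = p)
cdf_summed <- sum(dbinom(0:5, size = n, prob = p))
all.equal(cdf_direct, cdf_summed)                        # TRUE

# dbinom() agrees with the formula choose(n, x) p^x (1 - p)^(n - x)
x <- 0:n
pmf_formula <- choose(n, x) * p^x * (1 - p)^(n - x)
all.equal(dbinom(x, size = n, prob = p), pmf_formula)    # TRUE

# The probability of exactly five successes is smaller
dbinom(5, size = n, prob = p)   # exactly 252/1024 = 0.24609375
```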

In addition to the common arguments listed previously, the binomial distribution functions accept the following arguments:

| Arguments | Description |
|---|---|
| size | Number of trials |
| prob | Probability of success in each trial |

## Exponential

The exponential distribution with rate ‘lambda’ has the following density:

$$f(x) = \lambda e^{-\lambda x}$$

…where x >= 0. The *rexp()* function generates random deviates, the *qexp()* function gives the quantile function, and the *pexp()* function gives the distribution function. In the examples below, *pexp()* is applied to the vector *d* of uniform deviates generated in the Uniform section:

```
> rexp(5, rate = 1)
[1] 0.44090566 0.18508747 1.18596321 0.86235515 0.05020871
> pexp(d, rate = 1)
 [1] 0.38571907 0.23414656 0.13950812 0.54148378 0.34837687 0.05119434
 [7] 0.41766053 0.42140839 0.34755644 0.27114996
```
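A short sketch of the relationship between the density formula, *pexp()* and *qexp()*: the CDF is 1 - exp(-rate x), and *qexp()* inverts it. The rate of 2 and the test points are arbitrary choices.

```r
# The exponential CDF has the closed form 1 - exp(-rate * x),
# and qexp() inverts pexp()
rate <- 2
x <- c(0, 0.25, 0.5, 1, 3)

cdf <- pexp(x, rate = rate)
all.equal(cdf, 1 - exp(-rate * x))                 # TRUE
all.equal(qexp(cdf, rate = rate), x)               # TRUE, quantile undoes the CDF

# The median of an exponential distribution is log(2)/rate
all.equal(qexp(0.5, rate = rate), log(2) / rate)   # TRUE

# The sample mean of many deviates should be close to 1/rate = 0.5
mean(rexp(10000, rate = rate))
```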

The exponential distribution functions take one specific argument, ‘rate’:

| Arguments | Description |
|---|---|
| rate | A vector of rates (default 1) |

## Geometric

The geometric distribution density with probability ‘p’ is defined as follows:

$$p(x) = p (1-p)^x$$

…where x = 0, 1, 2, … counts the number of failures before the first success, and 0 < p <= 1. The *rgeom()* function draws 10 observations with probability 0.5, as shown below:

```
> d <- rgeom(10, 0.5)
> d
 [1] 0 1 1 0 0 0 0 0 1 1
```

The *pgeom()* and *qgeom()* functions are based on closed-form formulae and provide the distribution and quantile functions, respectively.

```
> pgeom(d, 0.5)
 [1] 0.50 0.75 0.75 0.50 0.50 0.50 0.50 0.50 0.75 0.75
> qgeom((1:10)/15, prob = 0.2)
 [1] 0 0 0 1 1 2 2 3 4 4
```
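The closed-form behaviour of *pgeom()* can be checked directly. Because x counts the failures before the first success, P[X <= x] = 1 - (1 - p)^(x + 1). A minimal sketch, with p = 0.5 chosen to match the example above:

```r
# With x counting failures before the first success, the geometric CDF has
# the closed form P[X <= x] = 1 - (1 - p)^(x + 1)
p <- 0.5
x <- 0:5

all.equal(pgeom(x, prob = p), 1 - (1 - p)^(x + 1))   # TRUE
all.equal(dgeom(x, prob = p), p * (1 - p)^x)         # TRUE, density formula

# The mean number of failures before the first success is (1 - p)/p
mean(rgeom(10000, prob = p))   # close to 1 when p = 0.5
```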

The argument specific to the geometric distribution is given below:

| Arguments | Description |
|---|---|
| prob | Probability of success in each trial (0 < prob <= 1) |

## Normal

The normal distribution with mean ‘mu’ and standard deviation ‘sigma’ has the following density:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

A sample of ten random values from a normal distribution with mean 0 and standard deviation 1 can be generated using the *rnorm()* function, as follows:

```
> d <- rnorm(10, mean=0, sd=1)
> d
 [1]  0.66512403 -0.43102217  0.50305386  1.55028087  1.20598557  0.33704698
 [7]  0.09200192  0.67090448 -0.24424568 -0.78798191
```

Example uses of the *pnorm()* (distribution function) and *dnorm()* (density) functions are given below:

```
> pnorm(d, mean=0, sd=1)
 [1] 0.7470144 0.3332261 0.6925368 0.9394629 0.8860885 0.6319593 0.5366517
 [8] 0.7488593 0.4035203 0.2153536
> dnorm(d, mean=0, sd=1)
 [1] 0.3197763 0.3635536 0.3515265 0.1199568 0.1927928 0.3769138 0.3972575
 [8] 0.3185439 0.3872184 0.2924691
```
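As a sanity check of the density formula, the sketch below evaluates it by hand and compares the result with *dnorm()*; it also uses *pnorm()* and the quantile function *qnorm()* for two familiar values. The mean of 1 and standard deviation of 2 are arbitrary.

```r
# dnorm() agrees with the density formula
# f(x) = 1/(sqrt(2 pi) sigma) exp(-(x - mu)^2 / (2 sigma^2))
mu    <- 1
sigma <- 2
x <- seq(-3, 5, by = 0.5)

f_formula <- exp(-(x - mu)^2 / (2 * sigma^2)) / (sqrt(2 * pi) * sigma)
all.equal(dnorm(x, mean = mu, sd = sigma), f_formula)   # TRUE

# About 68.3% of the probability lies within one standard deviation of the mean
within_1sd <- pnorm(mu + sigma, mean = mu, sd = sigma) -
              pnorm(mu - sigma, mean = mu, sd = sigma)
within_1sd

# qnorm() is the quantile function; the 97.5% point of N(0, 1) is about 1.96
qnorm(0.975)
```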

The arguments specific to the normal distribution are listed below for your reference.

| Arguments | Description |
|---|---|
| mean | A vector of means (default 0) |
| sd | A vector of standard deviations (default 1) |

## Poisson

The Poisson distribution density is defined as follows:

$$p(x) = \frac{\lambda^x e^{-\lambda}}{x!}$$

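…where x = 0, 1, 2, … Both the mean and the variance are equal to lambda. The *rpois()* function generates random deviates; here, we draw 10 values with lambda = 0.9: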

```
> d <- rpois(10, 0.9)
> d
 [1] 2 2 1 3 1 2 1 1 0 0
```

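The *dpois()* function gives the density (the probability of each value), while the *ppois()* function gives the cumulative distribution function. Setting log = TRUE (for *dpois()*) or log.p = TRUE (for *ppois()*) returns their logarithms instead. A couple of examples are given below: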

```
> dpois(d, 0.9)
 [1] 0.16466071 0.16466071 0.36591269 0.04939821 0.36591269 0.16466071
 [7] 0.36591269 0.36591269 0.40656966 0.40656966
> ppois(d, 0.9)
 [1] 0.9371431 0.9371431 0.7724824 0.9865413 0.7724824 0.9371431 0.7724824
 [8] 0.7724824 0.4065697 0.4065697
```
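The sketch below checks *dpois()* against the formula, shows that *ppois()* is a running sum of *dpois()*, demonstrates log = TRUE, and approximates the mean and variance numerically by truncating the sum at 100 (an assumption that is harmless for lambda = 0.9).

```r
# dpois() agrees with the formula lambda^x exp(-lambda) / x!
lambda <- 0.9
x <- 0:10

all.equal(dpois(x, lambda), lambda^x * exp(-lambda) / factorial(x))   # TRUE

# ppois() is the running sum of dpois()
all.equal(ppois(x, lambda), cumsum(dpois(x, lambda)))                 # TRUE

# log = TRUE returns the natural logarithm of the density
all.equal(dpois(x, lambda, log = TRUE), log(dpois(x, lambda)))        # TRUE

# The mean and variance both equal lambda; the tail beyond 100 is negligible
k <- 0:100
m <- sum(k * dpois(k, lambda))
v <- sum((k - m)^2 * dpois(k, lambda))
c(mean = m, variance = v)   # both approximately 0.9
```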

The argument specific to the Poisson distribution is lambda, as shown below:

| Arguments | Description |
|---|---|
| lambda | A vector of non-negative means |

The Poisson CDF is a step function, so it can be plotted with type = "s", as shown below:

```
> x <- seq(-0.5, 5, 0.05)
> head(x)
[1] -0.50 -0.45 -0.40 -0.35 -0.30 -0.25
> plot(x, ppois(x, 1), type = "s", ylab = "F(x)", main = "Poisson CDF")
```

## Weibull

The Weibull distribution with shape parameter ‘a’ and scale parameter ‘b’ has the following density:

$$f(x) = \frac{a}{b} \left(\frac{x}{b}\right)^{a-1} e^{-(x/b)^a}$$

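…where x > 0. The *dweibull()* function gives the density, while the *rweibull()* function generates random deviates. In the example below, *rlnorm()* supplies log-normal test points, and *all.equal()* confirms that a Weibull distribution with shape 1 (and the default scale of 1) is the same as the exponential distribution with rate 1. The *rweibull()* call then draws 10 deviates with shape 5 and scale 1: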

```
> x <- c(0, rlnorm(10))
> x
 [1] 0.0000000 0.5194988 2.2301132 0.1844523 0.5267321 0.5487738 1.3437871
 [8] 2.0694222 3.6199835 6.3532987 5.9563711
> all.equal(dweibull(x, shape=1), dexp(x))
[1] TRUE
> rweibull(10, 5, 1)
 [1] 0.5197291 0.6851027 1.0279023 1.1397738 0.8810083 0.7945575 0.5939060
 [8] 0.8762381 0.7979457 0.6977511
```
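The Weibull density formula and its closed-form CDF, 1 - exp(-(x/b)^a), can be checked against *dweibull()* and *pweibull()* in the same way. A sketch with shape a = 5 and scale b = 1, matching the *rweibull()* call above; the test points are arbitrary.

```r
# dweibull() agrees with f(x) = (a/b) (x/b)^(a-1) exp(-(x/b)^a), and
# pweibull() agrees with the closed-form CDF 1 - exp(-(x/b)^a)
a <- 5     # shape
b <- 1     # scale
x <- seq(0.1, 2, by = 0.1)

all.equal(dweibull(x, shape = a, scale = b),
          (a / b) * (x / b)^(a - 1) * exp(-(x / b)^a))   # TRUE
all.equal(pweibull(x, shape = a, scale = b),
          1 - exp(-(x / b)^a))                            # TRUE

# For any shape, F(b) = 1 - exp(-1), so the scale is the 63.2% quantile
pweibull(b, shape = a, scale = b)   # about 0.632
```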

The specific parameters for the Weibull distribution are as follows.

| Arguments | Description |
|---|---|
| shape, scale | Shape and scale parameters; scale defaults to 1 |

You are encouraged to read the manual pages for the above R functions (for example, ?dnorm) to learn more about their arguments, options and usage.