# R Series: Optimization

0
324

We have learnt various R profiling techniques in the previous article. In this twentieth article in the R series, we shall learn how to optimise R code.

We will use version 4.2.2 installed on Parabola GNU/Linux-libre (x86-64) for the code snippets.

```\$ R --version
R version 4.2.2 (2022-10-31) -- “Innocent and Trusting”
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.

## Recursive versus non-recursive functions

Consider a recursive function to compute the nth Fibonacci sequence:

```recurFib <- function(n) {
if (n == 0) {0}
else if (n == 1 || n == 2) {1}
else { recurFib(n - 1) + recurFib(n - 2) }
}```

You can verify the output sequence numbers, as shown below:

```> recurFib(0)
[1] 0
> recurFib(1)
[1] 1
> recurFib(2)
[1] 1
> recurFib(3)
[1] 2
> recurFib(4)
[1] 3
> recurFib(5)
[1] 5```

A non-recursive implementation is given by the following fibo() function:

```> fibo <- function(n) {
+ phi <- (sqrt(5) + 1)/2
+ fib <- (phi^(n+1) - (1-phi)^(n+1)) / (2*phi - 1)
+ round(fib)
+ }```

In this example, we are starting the Fibonacci number with 1, as illustrated below:

```> fibo(0)
[1] 1
> fibo(1)
[1] 1
> fibo(2)
[1] 2
> fibo(3)
[1] 3
> fibo(4)
[1] 5
> fibo(5)
[1] 8```

The system.time() function can be used to obtain the computation time taken by both these functions.

```> recurFib(35)
[1] 9227465
> system.time(recurFib(35))
user system elapsed
6.831 0.000 6.834
> fibo(34)
[1] 9227465
> system.time(fibo(34))
user system elapsed
0 0 0```

The recursive function is definitely slower than the non-recursive implementation. Moreover, R uses faster native implementation for computing power of values (^), and hence its usage is recommended.

## Preallocate

You do not need to allocate memory for any object in R. In the following example, the initialisation takes time.

```> v1 <- c()
> system.time(for (i in 1:10000000) {v1[i] <- i;})
user system elapsed
2.143 0.120 2.263```

The length can be specified for a vector and memory can be pre-allocated, so that initialisation can be sped up, as illustrated below:

```> v2 <- c(NA)
> length(v2) <- 10000000
> system.time(for (i in 1:10000000) {v2[i] <- i;})
user system elapsed
0.299 0.020 0.319```

## Matrix multiplication

Consider two matrices, m1 and m2:

```> m1 <-matrix(1:10000, nrow=100)
> m2 <-matrix(10001:20000, nrow=100)```

You can write your own version of matrix multiplication as follows:

```> matrixMul <- function(mat1, mat2) {
rows <- nrow(mat1)
cols <- ncol(mat2)
output <- matrix(nrow = rows, ncol = cols)
for (i in 1:rows) {
for(j in 1:cols) {
vec1 <- mat1[i,]
vec2 <- mat1[,j]
mult1 <- vec1 * vec2
output[i,j] <- sum(mult1)
}
}
return(output)
}```

The time taken for multiplying the two matrices, m1 and m2, using the above function is given below:

```> system.time(matrixMul(m1,m2))
user system elapsed
0.039 0.000 0.039```

The recommended practice is to perform matrix multiplication using the ‘%*%’ operator. The result for the multiplication of matrices m1 and m2 is quite instantaneous.

```> system.time(m1*m2)
user system elapsed
0 0 0```

## Bytecode compiler

The cmpfun() function provides an interface to the byte code compiler for R. It takes the following arguments:

 Argument Description f A closure options Compiler options

The ‘recurFib’ function can be byte-compiled to improve the performance, as shown below:

```> library(compiler)
> compiled.recurFib <- cmpfun(recurFib)
> system.time(compiled.recurFib(34))
user system elapsed
4.296 0.000 4.298```

The compiled function, along with the reference to the bytecode, is shown below:

```> compiled.recurFib
function(n) {
if (n == 0) {0}
else if (n == 1 || n == 2) {1}
else { recurFib(n - 1) + recurFib(n - 2) }
}
<bytecode: 0x55eff06fcba8>```

You can also view the disassembly of the compiled function using the disassemble() function, as follows:

```> disassemble(compiled.recurFib)
list(.Code, list(12L, GETVAR.OP, 1L, LDCONST.OP,3L,EQ.OP, 4L,
BRIFNOT.OP, 5L, 13L, LDCONST.OP, 3L,RETURN.OP, GETVAR.OP,
1L, LDCONST.OP, 8L, EQ.OP, 9L, OR1ST.OP,7L,30L, GETVAR.OP,
1L, LDCONST.OP, 10L, EQ.OP, 11L, OR2ND.OP, 7L, BRIFNOT.OP,
12L, 36L, LDCONST.OP, 8L, RETURN.OP, GETFUN.OP, 14L, MAKEPROM.OP,
17L, CALL.OP, 15L, GETFUN.OP, 14L, MAKEPROM.OP, 19L, CALL.OP,
if (n == 0) {
0
}
else if (n == 1 || n == 2) {
1
}
else {
recurFib(n - 1) + recurFib(n - 2)
}
}, n, structure(c(2L, 3L, 4L, 44L, 3L, 44L, 2L, 4L), srcfile = <environment>, class = “srcref”),
0, n == 0, if (n == 0) {
0
...```

The cmpfile() function can be used to compile a large source .R file. It accepts the following arguments:

 Argument Description infile A closure outfile Compiler options ascii If compiled file should be stored as ascii env Environment for compiling verbose Display compilation output options Compiler options version Workspace format version

## Bigmemory package

The bigmemory package is used to handle large, Big Data matrices using shared memory and memory-mapped files. You can install the package as follows:

```> install.packages(“bigmemory”)
Installing package into ‘/home/guest/R/x86_64-pc-linux-gnu-library/4.1’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
...
Creating a generic function for ‘typeof’ from package ‘base’ in package ‘bigmemory’
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (bigmemory)```

The ‘m’ matrix is created using the ‘matrix’ function and written to a .csv file using the ‘write.table()’ function, as shown below:

```> library(bigmemory)
> m <- matrix(rnorm(30*15000, mean=0, sd=4), 30, 15000)
> write.table(m, file = “/tmp/m1.csv”)
> system.time(write.table(m, file = “/tmp/m1.csv”))
user system elapsed
0.318 0.001 0.318```

You can instead use the big.matrix() function from the bigmemory package to improve the write performance, as demonstrated below:

```> system.time(big.matrix(m, nrow=30, ncol=15000, backingfile=”m2.csv”, type=”double”, descriptorfile=”m2.desc”))
user system elapsed
0.006 0.000 0.006```

The contents of the m2.csv file are given below for reference:

```\$ cat m2.desc
new(“big.matrix.descriptor”, description = list(sharedType = “FileBacked”,
filename = “m2.csv”, dirname = “/home/guest/20-optimization/”,
totalRows = 30L, totalCols = 15000L, rowOffset = c(0, 30),
colOffset = c(0, 15000), nrow = 30, ncol = 15000, rowNames = NULL,
colNames = NULL, type = “double”, separated = FALSE))```

## The R code optimizer

The rco package takes in R code as input, and applies various optimization techniques to improve the code. You can install the package using the following command:

```> install.packages(“rco”)
Installing package into ‘/home/guest/R/x86_64-pc-linux-gnu-library/4.1’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
...
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (rco)```

The formula to convert Fahrenheit to Celsius is given below:

`celsius = (fahrenheit - 32) / 1.8`

The code snippets are stored in a ‘code’ variable, as shown below:

```> library(rco)
> code <- paste(
+ “f <- 100”,
+ “paren <- (f - 32)”,
+ “v <- paren / 1.8”,
+ “c <- v”,
+ “if (c > 30) {“,
+ ‘ print(“It is hot!”)’,
+ “} else {“,
+ ‘ print(“It is pleasant!”)’,
+ “}”,
+ sep=”\n”
+ )```

You can invoke the optimize_code() function on the code and view the optimised code, as illustrated below:

```> pass1 <- optimize_text(code, iterations = 1)
Optimization number 1

> pass1
[1] “f <- 100\nparen <- (100 - 32)\nv <- paren / 1.8\nc <- v\nif (c > 30) {\n print(\”It is hot!\”)\n} else {\n print(\”It is pleasant!\”)\n}”
> pass2 <- optimize_text(pass1, iterations = 1)
Optimization number 1

> pass2
[1] “f <- 100\nparen <- 68\nv <- 68 / 1.8\nc <- v\nif (c > 30) {\n print(\”It is hot!\”)\n} else {\n print(\”It is pleasant!\”)\n}”```

The ‘optimize_text’ function accepts the following arguments:

 Argument Description text Code in character vectors optimizers List of optimizer functions iterations Number of times the optimizer should run