Software updates and all that jazz!

Software updates and all that jazz!

In this month’s column, we will look at the concept of dynamic software update solutions, which allow even an operating system to be updated without requiring a reboot.

In last month’s column, we had looked at software upgrade failures, discussing the kind of failures that can arise during the upgrade process, and how such failures can be reduced.

Complex software upgrades, such as patching an operating system kernel, typically require planned downtime in which the required patches are applied and the system is rebooted. However, even this planned downtime leads to loss of service. There has therefore been considerable research of late on Dynamic Software Update (DSU) solutions, which do not require a system reboot or cause loss of application availability.

In this month’s column, we examine the issues involved in developing software updates/patches that can be dynamically applied; how to ensure the safety of DSUs; and what are the existing DSU solutions.

How does DSU work?

Consider any software update. It consists of changes to parts of an existing software application. In the context of the C language, a software program consists of functions and data. A software update can contain changes (addition/deletion/modification) to functions, to data, or to both.

Let us consider a simple example of a security patch that involves a change to a function foo, whose original version is shown in the snippet below:

int foo(char* app_buffer, char* input_string)
    strcpy(app_buffer, input_string);

Since the unsafe strcpy call can lead to security vulnerabilities like buffer overflows, the developer decided to fix this issue by replacing strcpy with strncpy, which copies only as many characters as allocated for app_buffer, avoiding buffer-overflow errors.

To construct a DSU for a statically patched application executable, we first need to know what the changes are between the old and new versions of the software. In this case, we know that it is the function foo that has been changed. Once the changes are determined, the DSU solution generates the executable (object) code for the changed part (foo).

Now, on production machines that run the application, before the DSU can be applied, we need to determine whether the application is in a state in which it is safe for the update to be done. If any active application threads are executing the old version of “active functions” (functions that have changed), then it is safer to wait until they finish executing these before patching. In our case, let’s wait for any thread executing foo to return from it, before applying the update. Later, we will discuss whether merely waiting for “active functions” to exit constitutes sufficient safety for DSUs.

Once you determine that it’s safe, apply the DSU. One way is to overwrite the entry point of each modified function with an instruction that jumps to the new version of the function. This is what we do. Now, the next thread that tries to execute foo will encounter the jump instruction, and go to the new version.

We have to ensure that the DSU happens atomically — either all “active functions” are updated successfully, or none are. We will discuss DSU atomicity later.

Note that to prevent any threads from entering and executing any “active function” while the DSU is going on, you may have to stall/delay the threads. Hence, it’s important to reduce the time taken to apply the DSU, to avoid degrading the application response time.

DSU factors and issues

In our simple explanation of the DSU mechanism, we glossed over certain important issues, such as how to determine whether a code change is safe to be applied as a DSU; how to determine the application safe-points at which a DSU can be applied; and how to verify that a dynamically patched system is working correctly. These issues need to be addressed robustly for a DSU to be employed in production environments.

There are three important factors for DSUs to become practical in real-life environments:

  • Broad applicability: It should be possible to apply a DSU to a broad set of applications and OS kernels, irrespective of whether single- or multi-threaded, whether written in C/C++ or Java, and whether a single or multiple functions/data structures are changed.
  • Minimal programmer burden: DSUs should impose a minimum additional programming burden on the developer, should not cause major changes in the software development process, and should not impact performance.
  • Safety: Programmers must be able to verify that a dynamically updated software application behaves as expected after the DSU is applied.

These properties are not easy to achieve in practice, making development of DSU systems quite difficult. Recall that the first step in the DSU is to determine code/data changes between old and new versions. In our simple example, the only change in foo was strcpy to strncpy — a trivial change that’s internal to the function. However, this does not mean that the executable containing foo only needs to be patched in one location. Why is this so?

The transformation of source code to object-level code is done by compilers, which perform a number of optimisations on the source code during the process. One of these optimisations is known as “inlining”, where a function’s body is put “in line” into calling code. In our example, if foo was called from another function bar, it’s possible that the compiler may have inlined foo into bar. The same could possibly apply for more functions that call foo. Hence, the DSU patch must ensure that all inlinings of foo into its callers are also correctly updated.

Mere source-code comparison between old and new versions will not be able to detect changes in callers of foo, because to inline or not is a compiler decision. So, a source-code comparison scheme would have to conservatively mark all callers of foo as modified. This can cause false positives in the “active functions” list, increasing the time needed to apply the DSU.

Another major issue is ensuring that the application state is safe to apply the DSU. Our example used a simple criterion: none of the “active functions” are currently active on the stack of any running thread. However, this check may not always be sufficient. Consider the following simple code snippet:

    = lock(&L1);
    = unlock(&L1);

Assume that the code modification was to remove the lock and unlock calls in funcA and funcC. Now, if any thread is executing funcB, as per our safety criterion, none of the threads have the “active functions” on their stack, and hence the system can be patched. Having patched the application, when the thread next enters funcC, it goes to the new version — which does not contain the unlock call. But, this particular thread earlier executed the older version of funcA, and has locked L1. Since there is no unlock call in the new funcC, the lock remains held by the thread — which is an error.

Therefore, our simple safety criterion is not sufficient to guarantee the safety of the DSU. We will continue our discussion of DSUs in next month’s column, focusing on their safety and timeliness.

This month’s takeaway question

Our takeaway problem is also related to the DSU problem. Let us consider the issue of modifications to data structures whose instances have already been created. For instance, consider a linked list of objects, which is created at application startup, and accessed throughout the application lifetime. As part of a defect fix, the developer added a new field to each object, and the DSU contains code that sets/gets the added field. However, this newly added field is not a standalone variable; it is part of an existing data structure, which has already been allocated by the application, and references to which are held by other objects. If necessary, can we dynamically update the application with the modified type for this object? What are the possible techniques?

This question is one to ponder over, rather than a programming problem. Can you think of scenarios in which an application cannot be updated dynamically? Can you write a tool that can verify whether a dynamically updated application is indeed performing correctly?

My must-read book for this month

This month’s must-read book is Algorithms for Interviews by Aziz and Prakash. This is an entertaining book, along the lines of Steve Skiena’s Algorithm Design Manual, discussing difficult programming problems and their algorithmic solutions. However, be aware that this is not meant to be an introduction to algorithms. As the book title suggests, this is a very useful book to browse through if you are preparing for interviews at Google, Microsoft or Amazon.

If you have a favourite programming book/article that you think is a must-read for every programmer, please do send me a note with the book’s name, and a short writeup on why you think it is useful, so I can mention it in the column. This would help many readers who want to improve their coding skills.

If you have any favourite programming puzzles that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com. Till we meet again next month, happy programming, and here’s wishing you the very best!

Feature image courtesy: fsse8info. Reused under the terms of CC-BY-SA 2.0 License.


Please enter your comment!
Please enter your name here