CodeSport (October 2010)

Programming questions

Welcome to CodeSport. Here are some of the solutions to questions we raised in last month’s column.

My last column featured a medley of questions on computer architecture, operating systems and algorithms. Congratulations to four of our readers: K Radha Krishnan, S Priya, P Vasudevan and K Lakshmikanth, for getting most of the answers correct. In this column, we will provide answers to some of the questions, leaving the remaining questions to be addressed next month.

Operating system questions

We are aware that the set of memory locations that a process is allowed to access is known as the address space of the process. Consider a program that reads an array of 10,00,000 integers. Can this program be executed as a process on a machine whose physical memory is only 4,00,000 bytes?

Yes, virtual memory allows an application to pretend that it has access to a much larger memory than what is physically available.

We know that operating systems are typically multi-programmed, which means that multiple processes can be active at the same time. Can there be multiple kernels executing at the same time?

Yes. For example, in a virtual machine, multiple kernels execute at the same time, while there exists a host operating system, which controls access to the physical resources, and allows each operating system to think that it is the only one executing.

Is it possible to support multiple processes without support for virtual memory?

Yes. Since nobody has answered this question correctly till now, I will keep the question open for now. However, the clue is to think of segmented address spaces.

Consider the following code snippet. What signal gets delivered to the process when this code segment is executed?

int main(void)
{
  int* array = (int*) malloc(10);
  array++;
 *array = 120000;
  printf("value is %d\n", *array);
}

This was a trick question, in that this is a correct piece of code, and does not cause any SIGBUS signal. We declare an array of integers, and then increment the array pointer by 1. Originally, array points to the element array[0], and after the increment, it points to array[1]. The question was intended to confuse you about the increment operation of the pointer.

Consider the following code snippet, which shows how a process invokes the fork and exec system calls. Will it output the line “execve succeeded”? If not, why?

int result = fork();
if (result == 0) /* child code */
{
   if (execve("my new program", …) < 0)
   {
     printf("execve failed\n");
   }
   else
     printf("execve succeeded\n");
}

The line “execve succeeded” will not be output, because if the execve call is successful, it will execute the new program, overwriting the text, data, bss segment, and stack of the calling process. Hence, the printf would not be executed.

If a process is multi-threaded, and one of the threads invoke the fork system call, what happens?

The child process is a replica of the calling thread of the parent process, including its entire address space, and including the state of the mutex and all its resources.

If a process is multi-threaded, and one of the threads invokes the exec system call, what happens?

If the calling process is multi-threaded, a call to the exec function will cause all threads and light-weight processes (LWPs) in the calling process to be terminated, and the new executable image to be loaded and executed. No thread-specific data destructor functions are called. If the exec function fails and returns to the caller, the threads and LWPs in the calling process will not be terminated.

What is copy-on-write, and how is it used to reduce the fork-exec overhead?

Copy-on-write is an optimisation used to reduce the overhead of regular fork system calls, in cases where a child is going to invoke an exec system call immediately after the fork system call. In this case, the data and stack regions of the parent’s address space are marked as copy-on-write. The child process shares the data and stack region pages of the parent.

If, after the fork system call, there is any modifying access to the data and stack region pages, then a page protection fault is raised, since these pages are now marked as copy-on-write. The fault handler makes a new, writeable, copy of the page, corresponding to the faulting access, and updates the address mapping for that page in the process requesting the modification.

Thus, with copy-on-write, the data and stack pages of the parent are copied on demand, in a lazy fashion, instead of having to copy the entire address space at the time of the fork itself. If the child process invokes the exec system call, it results in revoking the copy-on-write for the shared data and stack pages of the parent.

Assume that a process detects and handles a stack overflow by protecting its stack region boundary with a write-protected page. Hence, any write to this page due to a stack overflow will result in a SIGSEGV being generated. Since the process stack has already overflowed, how can the signal handler for the SIGSEGV be executed?

Many operating systems provide an alternate signal stack to address this issue. There is a system call known as signalstack, which allows a process to define an alternate signal handling stack for specific signals. Signals that have been explicitly declared to execute on the alternate stack will be delivered on the alternate stack.

Should threads be supported at the user level, at the kernel level, or both? Justify your answer.

Threads can be supported at different levels, either at the application level by the user-level threads library; at the kernel level by means of kernel threads; or a combination of the two. Simply providing user-level threads without the underlying kernel threads prevents parallel execution of multiple user threads at the same time, resulting in loss of concurrency.

On the other hand, providing only kernel threads, without user-level threads, makes it difficult to extract parallelism, since the kernel does not have exposure to internal application-specific knowledge on what operations can be parallelised. Hence, a mix of kernel- and user-level threads is required.

Computer architecture questions

What are write-through and write-back caches? What are their advantages and disadvantages?

In the case of a write-through cache, the processor updates the main memory in sync with the cache entry update. Both the cache entry and the main memory location are updated simultaneously. Every write issued by the processor to the cache results in a corresponding write to the main memory.

In a write-back cache, the processor does not update the main memory immediately, at the time of writing to the corresponding cache entry. Instead, the main memory location is updated only when the cache entry is evicted from the cache. The processor marks the written cache entries using a dirty bit. When a cache entry is selected for replacement, if it’s dirty bit is set, the processor updates the main memory location corresponding to the cache entry.

If a memory location (cache entry) is updated multiple times by a processor before it is evicted, then a write-back cache reduces the memory traffic, since only one write to main memory happens at the time of eviction. On the other hand, when a cache miss occurs in a write-back cache, and a cache entry with dirty bit set is selected for eviction, it first needs to be written back to main memory, before the newly requested data can be brought into the cache. This results in two memory requests: one to update the memory location corresponding to the entry getting evicted, and another to bring back the requested data. This increases the cache miss service time for write-back caches.

What are the advantages and disadvantages of associative caches over direct-mapped caches?

In the case of a direct-mapped cache, there is only one location in the cache where a particular memory address can be mapped. Consider a trivial example of a cache with 16 entries. In this case, there are 16 sets, each containing only one cache entry, numbered 0 to 15. A given memory address is mapped, using a hash function address % 16, to a set in the cache. Since there is only one cache entry in each set, if the application accesses addresses in the order 0, 16, 32, 0, 8, 16 and 32, each will get mapped to set 0. Since set 0 has only one entry, when Address 16 is accessed, it has to evict the data corresponding to Address 0.

Similarly, when Address 32 is accessed, it will again evict the data corresponding to Address 0, since they all get mapped to the same cache set. Now, if Address 16 is accessed again, it will have a miss, since Address 32 has replaced it in the cache. Therefore, a direct-mapped cache can lead to conflicts between different cache entries. Cache misses that occur due to multiple memory addresses mapping to the same cache entry are known as conflict misses.

Set-associative caches were introduced to reduce conflict misses. Consider the above example with a 3-way set-associative cache. In this case, each cache set contains three cache entries. Therefore, all three addresses (0, 16 and 32) can be present simultaneously in the cache, without having to evict each other, resulting in zero conflict misses. Note, however, that set-associative caches can lead to an increase in cache look-up time — since there are k entries where a particular memory location can be mapped, all k entries need to be checked to see if the data is present in the cache or not, leading to a possible increase in cache access time.

What is the difference between little-endian architecture and big-endian architecture?

Little-endian and big-endian refer to the order in which the data bytes are stored in memory. Little-endian stands for “Little End In”. In this architecture, the little end is stored first, i.e., the memory address contains the least significant byte stored first. For example, a data value 0×12345678 stored at location X would be stored as (0×78 0×56 0×34 0×12), with the memory location X containing 0×78, the memory location X+1 holding 0×56, the memory location X+2 storing 0×34, and the memory location X+3 having 0×12. Intel x86 processors follow the little-endian convention.

In big-endian architecture, which stands for “Big End In”, the memory address contains the most significant byte stored first. For example, a data value 0×12345678 stored at location X would be stored as (0×12 0×34 0×56 0×78), with the memory location X containing 0×12,  X+1 storing 0×34, and so on. Both Motorola and HP’s PA-RISC/IA-64 processors follow the big-endian convention.

What is the difference between symmetric multiprocessor (SMP) systems and chip multiprocessor systems?

An SMP system consists of two or more separate processors connected, using a common bus, switch hub, or network, to shared memory and I/O devices. The overall system is usually physically smaller, and uses less power than an equivalent set of uni-processor systems, because physically large components such as memory, hard drives, and power supplies can be shared by some or all of the processors.

So, when faced with the power wall and memory wall, chip designers decided to apply the same principles of scale out from SMP systems, down to the chip level. They introduced chip multiprocessor systems, wherein two or more processor cores are placed on a single chip, and share certain processor resources such as cache.

What is the difference between write-allocate and write-no-allocate cache?

When a processor writes to a memory location, it can either first allocate an entry in the cache and write to the entry in the cache (known as write-allocate), or it can directly write to the main memory without allocating a cache entry (known as write-no-allocate). In case of a write-allocate cache, the processor has a choice of when to update the main memory with the latest value from the cache — for that memory location.

My ‘must read book’ for this month

This month’s book suggestion comes from S Priya. Her recommendation is, Programming Pearls by Jon Bentley. While this book is not meant for a first course in programming, it is a great find for those trying to sharpen their skills in design and coding. The book provides multiple solutions to each of the programming problems, by discussing the pros and cons of each approach. This makes it easier for the reader to appreciate how to apply the techniques mentioned in the book. So do make time to read it.

If you have a favourite programming book that you think is a must-read for every programmer, please do send me a note with the book’s name, and a short note on why you think it is useful. I will mention it in this column. This will help readers who want to improve their coding skills.

Please do send me your answers to this month’s questions. If you have any favourite programming puzzles that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com. Till we meet again next month, happy programming, and wish you the very best!

Feature image courtesy: walknboston. Reused under the terms of CC-BY 2.0 License.

All published articles are released under Creative Commons Attribution-NonCommercial 3.0 Unported License, unless otherwise noted.
Open Source For You is powered by WordPress, which gladly sits on top of a CentOS-based LEMP stack.

Creative Commons License.