Developers

Code Profiling in Linux Using Gprof

June 1, 2011

13421

Today, performance is a pivotal point in the programming world. Programmers constantly strive to make their code run in fewer milliseconds. Performance analysis can be done in various ways, static as well as dynamic. One way to optimise code is to dynamically analyse it for call-time and execution delays in order to find the bottlenecks in the execution. In Linux, Gprof can help to profile the code you compiled by using GCC.

To explain code profiling in Linux for C and C++ code in GCC, I developed a small C program for function calling. This shows the usage of Gprof, the GCC profiling program, and its reporting pattern. Profiling is used in Linux to improve code performance by analysing call times and call chains involved in the operation. You can find out the time taken by functions (which function code took a long time to run) and this can be very useful to identify bottlenecks. There are two types of profiling — the flat or the call-graph variety.

Environment and prerequisites

For the purpose of this article, I am using Ubuntu Server 9.10 in Oracle VirtualBox, with the following VM characteristics:

Virtual hardware: Single CPU, 2,000 MB RAM, I/O APIC enabled; Acceleration: VT-X, Nested Paging, PAE/NX; Virtual hard disk: 8 GB on SATA controller
Software/OS: Linux 2.6.31-14-generic-pae, Ubuntu; GCC 4.4.1 (Ubuntu 4.4.1-4ubuntu9); SAR utility for CPU reporting (sysstat package); HTOP utility

The results may vary in different scenarios. I would recommend running this code in a highly isolated environment, as it can exhaust the CPU in no time. The code is pretty basic and relies only on basic mathematical operations, with a few system calls for the MD5 sum generation. The internal calls of the functions are counted by global counters to verify with the Gprof calls.

The code

A brief explanation about the code: First, create the files to append the hashes and random number. p is the integer assigned; the value of p must be less than 10000. p++ increases the value stored at the p pointer by 1. func1() is called if the value lies within 9600 and 10000; i.e., func1() is called 399 times. func3() is called for the remaining p values, i.e., 9601.

In func1(), j is incremented. When j reaches 66668, it increases the global counter1 to keep track of function calls. In func2(), m is incremented in the for loop. The filegen() function is called thrice. counter2 is incremented to track the filegen() function. Otherwise, it returns to func1().

In filegen(), tempc.txt is created and opened in append mode; 4000 values are added to this file in every filegen() call. Then the tempc.txt file is hashed, and this hash stored in hashes.txt. So for every single call of func2(), three calls of filegen() are made — 3 MD5 hashes are computed for every increment of p. MD5 hashing is used to keep the CPU busy. func3() comprises of the l variable, which is incremented to 50000, and will increment counter3 every time to track the calls. Every function has different time consumptions.

/*
  Name :Subodh Pachghare 
  Author :I am the author!!!
  Description :Function calling test program for Gprof
*/
#include <stdio.h>
int counter1=0;            /*Global Variable*/
int counter2=0;            /*Global Variable*/
int counter3=0;            /*Global Variable*/
int main()
{
int p=0;
system("> /root/hashes.txt");     /*File will be overwritten or Created*/
system("> /root/tempc.txt");      /*File will be overwritten or Created*/
printf("Initial value in Main function= %d\n",p);
while(p<10000)        /*Condition for while loop*/
    {
        if(p>9600 && p<10000)
            {
            func1();
            }
          else
            {
            func3();
            }
        p++;            
    }
printf("Last value in Main function (value(p) incremented for) = %d\n",p);
printf("Func1(),Func2(),Func3() & filegen() called for %d %d %d %d times\n",counter1,counter1,counter3,counter2);
}
func1()
{
int j=0;
while(j<=66667)            /*Another operation to eat CPU cycles*/
    j++;
if(j==66668)                 
    {
    counter1++;
    }
func2();            /*Call to Function func2()*/
}
func2()
{
int m=0;
for (m=1;m<1000000;m++)
    {
    if(m<4)
        {
            filegen();    /*Call to Function filegen()*/
            counter2++;
        }
    else
        {
            return 0;     /*return to the func1()*/
        }
    }
}
filegen()
{
FILE *fo,*fg;
int f=0;
fo=fopen("/root/tempc.txt","a");
while(f<4000)
    {
        fprintf(fo,"%d\n",f);
        f++;
    }
fclose(fo);
system("md5sum /root/tempc.txt >> /root/hashes.txt"); /*Unique MD5 hash generation*/
}
func3()
{
int l=0,b=0;
while(l<=50000)
    {
        b=l;
        l++;
    }
if(l==50001)
    {
    counter3++;
    }
}

Killing code before completion: Note that this code cannot be interrupted with Ctrl+C due to presence of system call. So if you want to stop execution in between, use kill -9 `pidof a.out` from another terminal to send a SIGKILL signal.

Reporting CPU activity

To report CPU utilisation in user-space and kernel-space while executing the code, use the SAR (system activity report) from the sysstat package, which is my favourite.

sar -u 2 400

Here, -u is for CPU reporting; 2 implies a 2-second interval; and 400 is to run it 400 times.

Surprisingly, the code eats lots of cycles for kernel-space, as compared to user-space.

Compile the code using the -pg option to include the profiler code in the executable, as follows:

gcc -pg gprof_test_code_svp.c

This will generate an a.out executable file. You can specify another output filename with -o. Now run the binary with ./a.out. After executing the code, the following output will be generated:

root@ubuntu:~# gcc -pg gprof_test_code_svp.c
root@ubuntu:~# ./a.out
Initial value in Main function= 0
Last value in Main function (value(p) incremented for) = 10000
func1(),func2(),func3() & filegen() called for 399 399 9601 1197 times
root@ubuntu:~#

Flat-profile generation

After successful execution of the program, it will generate the file gmon.out. Use Gprof on this file to generate the graph:

gprof -p a.out gmon.out 

Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative self self total           
 time seconds seconds calls us/call us/call name    
 97.98 0.97 0.97 9601 101.03 101.03 func3
  2.02 0.99 0.02 399 50.13 50.13 func2
  0.00 0.99 0.00 1197 0.00 0.00 filegen
  0.00 0.99 0.00 399 0.00 50.13 func1
<output snipped>

This generates flat-profile analytics, where time calls are given for comparison. Indications from this profile are:
The program took 1.46 minutes of clock time to complete. From the flat-profile, func3 consumed 0.97 seconds per call and was called 9601 times; around 101 milliseconds are spent in this function for every call to func3. The same values in the self us/call and total us/call field indicate that the complete time was spent on the function’s own operations, and not on the children functions.

In func1, which was the last entry presented in the profile, there is practically no time spent in itself (self us/call) but as it calls func2, most of its time is spent there. Hence, the total us/call field shows 50.13 milliseconds — the same as that of func2.

In func2, the self us/call and total us/call field are the same — the time is spent in doing self operations.

There is practically no time spent in the filegen function according to Gprof, as the md5sum is a system call, which is not presented in Gprof.

So, using this, we can clearly look for bottlenecks present in the code consuming CPU time.

Call-graph profile generation

To generate a call-graph profile of the functions used in the program, use gprof with the q switch:

gprof -q a.out gmon.out

Call graph (explanation follows)…

granularity: each sample hit covers 4 byte(s) for 1.01% of 0.99 seconds

index % time self children called name
                                                 <spontaneous>
[1]    100.0 0.00 0.99 main [1]
                0.97 0.00 9601/9601 func3 [2]
                0.00 0.02 399/399 func1 [3]
-----------------------------------------------
                0.97 0.00 9601/9601 main [1]
[2]     98.0 0.97 0.00 9601 func3 [2]
-----------------------------------------------
                0.00 0.02 399/399 main [1]
[3]      2.0 0.00 0.02 399 func1 [3]
                0.02 0.00 399/399 func2 [4]
-----------------------------------------------
                0.02 0.00 399/399 func1 [3]
[4]      2.0 0.02 0.00 399 func2 [4]
                0.00 0.00 1197/1197 filegen [5]
-----------------------------------------------
                0.00 0.00 1197/1197 func2 [4]
[5]      0.0 0.00 0.00 1197 filegen [5]
-----------------------------------------------

<Output Snipped>

Indications from call-graph are:

The number after the name indicates the index numbers. So we can see that main spends 0.99 seconds in the children waiting for func3 to complete, which is in the children field. This is substantiated by the 0.97 seconds spent per call in the self field of func3. This is again followed up by func1.
Every entry has the function name in the middle of it with the same index number; i.e., main with value 1 in index 1 field. So you can easily see the cycle for it, in Index 1 field kernel operations calling main, then main calling func3 and so on. The same follows for the rest of the index.
In Index 3, func1 is called by main with 399 calls; spent 0.02 time in children process (this is substantiated by 0.02 in func2 call). func3 is directly executed from main.
The middle line in each section is referenced by a unique index number. Above the line indicates originating function for the function given in index number and below the line indicates the process call forwarded to the next function. For example, in index 3 section, the call originated from main() function to func1() and then to func2(). Here func1() has index number 3.
<spontaneous> indicates that the parent function cannot be determined, generally due to system-related APIs.
A cumulative time recording implies that 98 per cent of the total time was spent in performing func3 operations, as per index 2.

Function calls substantiate that the filegen() function is called thrice for a single call of func2(). func1() and func2() are called for the same time; the counter values present in the code prove this. Counters are displayed after successful execution of the code. func1() and func2() functions comprise the same timings for calculations. func3 is called for 9601 times. The call-graph gives the call-tree data.

Source code annotations in profiling

To generate annotations in the source code presented, the source code has to be included in the executable while compiling code, using the g switch:

gcc gprof_test_code_svp.c –g -pg -lc

Now run the code normally, and use the source code highlighting option -A in Gprof to have annotations
marked in the source code. This gives line-by-line profiling, which is easy to understand.

gprof a.out gmon.out -A
*** File /root/gprof_test_code_svp.c:
                /*
                  Name :Subodh Pachghare 
                  Author :I am the author!!!
                  Description :Function calling test program for gprof
                */
                #include <stdio.h>
                int counter1=0;            /*Global Variable*/
                int counter2=0;            /*Global Variable*/
                int counter3=0;            /*Global Variable*/
                int main()
       ##### -> {
                int p=0;
                system("> /root/hashes.txt");     /*File will be overwritten or Created*/
                system("> /root/tempc.txt");      /*File will be overwritten or Created*/
                printf("Initial value in Main function= %d\n",p);
                while(p<10000)        /*Condition for while loop*/
                    {
                        if(p>9600 && p<10000)
                            {
                            func1();
                            }
                          else
                            {
                            func3();
                            }
                        p++;            
                    }
                printf("Last value in Main function (value(p) incremented for) = %d\n",p);
                printf("Func1(),Func2(),Func3() & filegen() called for %d %d %d %d times\n",counter1,counter1,counter3,counter2);
                }
                func1()
         399 -> {
                int j=0;
                while(j<=66667)            /*Another operation to eat CPU cycles*/
                    j++;
                if(j==66668)                 
                    {
                    counter1++;
                    }
                func2();            /*Call to Fucntion func2()*/
                }
                func2()
         399 -> {
                int m=0;
                for (m=1;m<1000000;m++)
                    {
                    if(m<4)
                        {
                            filegen();    /*Call to Function filegen()*/
                            counter2++;
                        }
                    else
                        {
                            return 0;     /*return to the func1()*/
                        }
                    }
                }
                filegen()
        1197 -> {
                FILE *fo,*fg;
                int f=0;
                fo=fopen("/root/tempc.txt","a");
                while(f<4000)
                    {
                        fprintf(fo,"%d\n",f);
                        f++;
                    }
                fclose(fo);
                system("md5sum /root/tempc.txt >> /root/hashes.txt");/*Unique MD5 hash generation*/
                }
                func3()
        9601 -> {
                int l=0,b=0;
                while(l<=50000)
                    {
                        b=l;
                        l++;
                    }
                if(l==50001)
                    {
                    counter3++;
                    }
                }


Top 10 Lines:

     Line Count

       72 9601
       59 1197
       32 399
       43 399

Execution Summary:

        5 Executable lines in this file
        5 Lines executed
   100.00 Percent of the file executed

    11596 Total number of line executions
  2319.20 Average executions per line

The timing presented is by user-space; so the system calls generally spend far less time in user-space as compared to kernel-space. That’s why we see the 0.00 in the Time collection of Gprof. From the SAR (system activity report) snapshot attached, it can be seen that system space is using around 94 per cent of the CPU processing, as compared to 5-6 per cent of the user space.

That’s it, guys; happy coding! I will be back with some more cool stuff.

References

“Gprof: a Call Graph Execution Profiler” by Susan L Graham, Peter B Kessler, Marshall K McKusick [PDF]

Feature image courtesy: Tambako the Jaguar. Reused under the terms of CC-BY-ND 2.0 License.

2 COMMENTS

DrGspot Perenti January 9, 2012 At 11:05 AM

ha ha …googleTranslate said that Vlatko is “Wiryana”..rULeZ :-D
***********
At Friday, 12.09.2011. Hall 1 12:00 to 14:00 hours, Wiryana, an IBM security expert in the EMEA region will be part of the course Information Systems Security held a lecture on “Secrets of the source code.”

Summary:

Although the popular penetration testing, reviewing izvornogkoda (<-I src :)) is a different way of finding vulnerabilities, and these two methods complement well. The lecture will introduce you to the basics of the security review of source code, and it will be convenient to show the analysis of open source software.

CV:

Wiryana employed in the department of IBM ISS in position for the information security expert for Europe, Middle East and Africa (EMEA), where help customers achieve the target level of safety. Vlatko is specialized in security and certified by the “ethicalhacking” or penetration tests, continuity of operations (business continuity), increasing levels of information security systems, and the development and monitoring of security according to international standards including ISO / IEC 27001 and PCI DSS. Certifications include PCI QSA, CISSP, CISA, C | EH, LPIC-3, etc. Likewise, Wiryana has written many IBM Redbook titles with topics of Linux, free software, and security. Vlatko also actively develop and contribute software free code. Thus, for example, and Snort, Nessus, nmap, w3af, Metasploit and OpenVAS contain portions of code that are a work of his hands.

http://www.foi.unizg.hr/vijesti/Pozvano-predavanje-Vlatko-Kosturjak

Thilak December 10, 2012 At 4:11 AM

Nice one !Thanks

Environment and prerequisites

The code

Reporting CPU activity

Flat-profile generation

Call-graph profile generation

Source code annotations in profiling

References

2 COMMENTS

LEAVE A REPLY Cancel reply

Thought Leaders

HOW TOs

MOST POPULAR

Open Journey

EDITOR PICKS

POPULAR POSTS

POPULAR CATEGORY