Software automation testing using Sikuli


Sikuli is a scripting tool that automates the testing of graphical user interfaces (GUIs) using screenshot images of the software under test.

Testing of any software project is as important as its development, and covers different aspects such as functionality, security and the database. The testing process can be manual or automated. Manual testing is performed by a person sitting in front of a computer, carefully executing tedious and time-consuming tests. Automated testing uses suitable tools to run the same checks, which usually makes it faster and more repeatable. Whether one opts for manual or automated testing depends on various factors like the project’s requirements, the budget, timelines and the expertise available.

A few major reasons why one should opt for automation testing are listed below:

  • Manual testing is very time consuming when it comes to overall flow testing and covering all scenarios. Regression testing also becomes a very tedious task when done manually as it needs repetition of the same actions/steps.
  • Manual testing becomes physically tiring.
  • Manual testing is also less thorough than automation.
  • To err is human; so manual testing is more error prone.

Automation testing is useful in many cases but, sometimes, it can be overkill and wind up costing more than it is worth. So it is very important to choose the correct automation tool for your project. Freely available tools include Selenium, Robotium, AutoIt, Sahi and Sikuli, while others such as HP Unified Functional Testing (UFT) and Tosca are commercial products that require a paid licence. Choosing open source tools can reduce the project cost; paid tools, on the other hand, often offer more features and dedicated vendor support. So depending upon your project’s requirements, choose the automation tool that best enhances your testing scope and speed.

Let’s now discuss an emerging automation tool called Sikuli.

Introducing Sikuli

Sikuli is an open source automation tool that uses image recognition to identify and control GUI components. It can be integrated with Selenium WebDriver to automate content such as Flash and Java applets, which WebDriver cannot inspect on its own.

According to the official site, in the Huichol Native American language, Sikuli refers to God’s Eye, implying the power to see and understand things unknown. It is a cross-platform software framework released under the MIT licence. It was started in mid-2009 as an open source project by Tom Yeh and Tsung-Hsiang Chang at the User Interface Design Group at MIT, USA. Both developers worked on Sikuli till Sikuli-X-1.0rc3 in 2012. Then Raimund Hocke (aka RaiMan) took over development and support. He developed it further as the SikuliX package (where X denotes eXperimental) and continues to maintain it with the help of the open source community.

Sikuli automates anything you see on the screen of your desktop. It also comes with basic text recognition (OCR) powered by Tesseract, which can be used to search for text in images.

So we can say that using Sikuli is WYSIWYS or What You See Is What You Script.
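To get a feel for what searching the screen for an image involves, here is a toy sketch in plain Python. The screen and the target are tiny grids of numbers, and the hypothetical helper find_all() looks for exact matches only. It illustrates the idea and is not Sikuli's implementation, which scores matches by similarity rather than requiring pixel-perfect equality.

```python
# Toy illustration of image-based search (not Sikuli's implementation).
# The "screen" and the "template" are small grids of pixel values.

def find_all(screen, template):
    """Return the top-left (row, col) of every exact match of template in screen."""
    rows, cols = len(screen), len(screen[0])
    t_rows, t_cols = len(template), len(template[0])
    matches = []
    for r in range(rows - t_rows + 1):
        for c in range(cols - t_cols + 1):
            # Compare the template row by row against the screen slice.
            if all(screen[r + i][c:c + t_cols] == template[i]
                   for i in range(t_rows)):
                matches.append((r, c))
    return matches


# A 4x5 "screen" containing one 2x2 "button" made of the values 1..4.
screen = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 4, 0, 0],
    [0, 0, 0, 0, 0],
]
button = [
    [1, 2],
    [3, 4],
]

print(find_all(screen, button))  # [(1, 1)]
```

Once the position of a match is known, a tool like Sikuli can move the mouse there and click or type, which is what makes the script follow what is visible on screen.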

Sikuli can be used to automate testing through screenshots placed in PowerPoint slides, while code lovers can write scripts in the IDE to extend its functionality. This framework is very useful in many scenarios like the following:

  • It is well suited to Flash applications. Selenium WebDriver needs access to the underlying elements of an application, whereas Sikuli needs only what is visible on screen. For example, if we need to automate the validation of Adobe Photoshop (whether an image got opened or not), Sikuli can do so without using any API.
  • It can be very useful in scenarios where an application has very complex source code yet a very simple visual interface. Without going into the source code or any XPath, we can automate and test the functionality of that application.
  • It is very useful in cases in which the application code gets changed frequently but GUI components remain the same. In such cases, the functionality of an application can be validated using Sikuli.
  • Sikuli can be very useful for game testers as well. Without using an API they can do some sort of testing on it.

The system requirements for Sikuli are:

  1. Windows XP and later, including Windows 8 and 10 (32-bit and 64-bit).
  2. Linux/UNIX systems, depending on what prerequisites are available (32-bit or 64-bit).
  3. Mac OS X 10.5 and later (64-bit only).

Installation of Sikuli on Windows

To download and set up Sikuli, get the Sikuli setup .jar file from the SikuliX download page. Once the download is complete, double-click the sikulixsetup-1.1.1.jar executable file and follow the instructions to install the Sikuli IDE.

Now, to run the Sikuli IDE, open the command prompt, go to the folder where you have just installed Sikuli and run runsikulix.cmd. This will open the SikuliX IDE window.

Sikuli can be divided into two parts:

  1. Integrated Development Environment (IDE): This is used to make scripts by taking screenshots.
  2. API/Sikuli script: This is a Jython and Java library that drives GUI interaction through keyboard and mouse events.

Both these components are part of SikuliX.

Some basic features of Sikuli

Let us go through some basic functions of Sikuli.

type(): The type command is a very basic command, which we can use to enter text:

type("This is Sample text example of type command")

The type command can also be given an image as its first argument. During execution, Sikuli then searches for that image, clicks it to give it focus and types the text there. We can also pass modifier keys as an optional last argument, as shown in the example below:

type("text", KeyModifier.ALT)

wait() and waitVanish(): Both methods pause the script, either until something appears on screen or until something vanishes from it. They take an optional duration parameter, which can be a number of seconds or the global constant FOREVER, which waits until the event happens.
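The waiting behaviour can be pictured as a polling loop with a deadline. The sketch below is plain Python with the hypothetical helpers wait_for() and wait_vanish(). It assumes a condition function in place of a screenshot and simply returns True or False, so it only illustrates the timeout logic and is not Sikuli's own code.

```python
import time


def wait_for(condition, timeout=3.0, interval=0.1):
    """Poll condition() until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def wait_vanish(condition, timeout=3.0, interval=0.1):
    """Poll until condition() becomes False or timeout seconds have passed."""
    return wait_for(lambda: not condition(), timeout, interval)


# Simulated dialog that becomes "visible" on the third poll.
polls = {"count": 0}


def dialog_visible():
    polls["count"] += 1
    return polls["count"] >= 3


print(wait_for(dialog_visible, timeout=1.0, interval=0.01))   # True
print(wait_for(lambda: False, timeout=0.05, interval=0.01))   # False
```

A short timeout keeps a broken test from hanging, while an unbounded wait such as FOREVER only makes sense when the awaited event is certain to happen.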

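find() and findAll(): These are two other common operations in Sikuli, used to search for things and interact with them. find() returns the best match for an image, while findAll() returns every match, which is useful when operating on a bunch of similar items on the screen. We can store the region returned by find() in a variable r, as shown below:

r = find("region.png")  # "region.png" is a placeholder for your own screenshot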

Later, we can call wait(), click(), type() and other functions on that variable, which restricts the search area and hence speeds up the script. Selecting a region and assigning it to a variable also helps when there are multiple similar items on screen and we want to deal with one particular item at a time.
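Why a region helps can be shown with a toy search in plain Python. In this sketch, which is not Sikuli code, the hypothetical find_all() accepts an optional region given as (top, left, height, width). Limiting the search to that region both reduces the work and picks out one of two identical items.

```python
def find_all(screen, template, region=None):
    """Return top-left (row, col) of exact matches of template in screen.

    region is an optional (top, left, height, width) tuple in screen
    coordinates; only matches lying fully inside it are reported.
    """
    top, left, height, width = region or (0, 0, len(screen), len(screen[0]))
    t_rows, t_cols = len(template), len(template[0])
    matches = []
    for r in range(top, top + height - t_rows + 1):
        for c in range(left, left + width - t_cols + 1):
            if all(screen[r + i][c:c + t_cols] == template[i]
                   for i in range(t_rows)):
                matches.append((r, c))
    return matches


# Two identical "checkboxes" (7, 7), one on each side of the screen.
screen = [
    [7, 7, 0, 0, 7, 7],
    [0, 0, 0, 0, 0, 0],
]
checkbox = [[7, 7]]

print(find_all(screen, checkbox))                     # [(0, 0), (0, 4)]

right_half = (0, 3, 2, 3)  # top, left, height, width
print(find_all(screen, checkbox, region=right_half))  # [(0, 4)]
```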

highlight(): This is another basic command, used to draw a box around a particular region.

Flow control technique in Sikuli

Sikuli scripts can use control structures such as a for loop in combination with the findAll() function. A sample for loop is given below:


below_options = find("image1.png")
checkboxes = below_options.findAll("image.png")
for checkbox in checkboxes:
    click(checkbox)  # assumed example action for each match

Similarly, if and while flow control mechanisms can also be used, which allow you to script more complex interactions.

Use of Python in scripting Sikuli

To enhance the scripting capabilities, we can use the entire Python language, as Sikuli scripts run on Jython. As an example, let’s suppose we run our Sikuli script unattended and it fails. By using the captureto() method, we can save screen images that can later be used to debug the failure.
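One way to organise such failure handling is a small wrapper that runs a step and saves evidence if the step raises an exception. The sketch below is plain Python. The names run_step() and fake_capture() are hypothetical, and fake_capture() only writes a placeholder file where a real script would call its screenshot routine.

```python
import os
import tempfile
import time


def run_step(name, step, capture, out_dir):
    """Run step(); if it raises, save evidence via capture(path) and re-raise."""
    try:
        return step()
    except Exception:
        path = os.path.join(out_dir, "%s-%d.png" % (name, int(time.time())))
        capture(path)
        raise


def fake_capture(path):
    # Stand-in for a real screenshot call; it only writes a placeholder file.
    with open(path, "wb") as handle:
        handle.write(b"not a real image")


def failing_step():
    # Simulates a GUI action that cannot find its target.
    raise RuntimeError("button not found")


out_dir = tempfile.mkdtemp()
try:
    run_step("click-save", failing_step, fake_capture, out_dir)
except RuntimeError as err:
    print("step failed:", err)

# One evidence file per failed step is left behind for later inspection.
print(os.listdir(out_dir))
```

Because the wrapper re-raises the exception, the script still stops at the failing step, and the saved file shows what the screen looked like at that moment.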

Accessing Java from Sikuli

To provide some kind of on-screen display, we can use Java classes and, in this way, give a GUI representation to our script. The script starts by importing the Swing classes, which are part of Java’s GUI libraries, and then uses Swing to show a borderless window with the designated text over everything on the screen for the specified number of seconds (the default is 1 second).

Start your first script with Sikuli

To begin with, let’s build a very basic ‘Hello world’ example of a testing script for the WordPad application, to understand what a simple script looks like. Figure 1 shows how our complete script will look.

First of all, start Sikuli, select the editor and write the following line of code:

App.open(r"C:\Program Files\Windows NT\Accessories\wordpad.exe")

This will start the WordPad application using Sikuli.

Note: The character r before the opening quote makes this a raw string, so the backslashes in the Windows path are taken literally instead of being treated as escape characters.

Type ‘wait’ in the editor, then click on the Take Screenshot button and select the area of the screen that you would like Sikuli to wait for.

wait("screenshot1.png")

Now add input text to the WordPad application using type(), as follows:

type("Hello, this is my first Sikuli Code!")

You can check whether the text that you entered appears as expected or not, using the wait() command:

wait("Hello, this is my first Sikuli Code!")

Now save and run the script. It will open WordPad and type the input text into it. In this way, you can start coding and further customise the script according to your requirements.

A trick you can try out

If two similar buttons, like two Save buttons, appear on the same screen and the script is unable to tell which one is to be clicked, capture a larger portion of the screen around the button you want. The extra surrounding context helps Sikuli match the intended button.
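The effect of capturing more context can be seen in a toy example in plain Python. It is a sketch with the hypothetical helper count_matches(), not Sikuli code. The screen is a single row of values in which 9 stands for a Save button that appears twice, each time next to a different label.

```python
def count_matches(row, template):
    """Count the positions where the list template occurs in the list row."""
    width = len(template)
    return sum(1 for c in range(len(row) - width + 1)
               if row[c:c + width] == template)


# 1 and 2 are two different labels; 9 is the Save button next to each.
screen_row = [1, 9, 0, 2, 9]

button_only = [9]          # small capture: just the button
button_with_label = [2, 9]  # larger capture: the button plus its label

print(count_matches(screen_row, button_only))        # 2 (ambiguous)
print(count_matches(screen_row, button_with_label))  # 1 (unique)
```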

Advantages and disadvantages of Sikuli


Advantages

  • Sikuli is an open source tool so, unlike licensed tools such as UFT, it costs nothing to use.
  • It works very well with Flash objects.
  • It is very handy in automation when working with Web elements that have dynamic XPaths and IDs.
  • It supports multiple scripting and programming languages, such as Jython, JRuby and Scala.
  • It is very good with boundary value analysis testing.
  • With proper and smart use of scripting, it is easy to identify application crashes and bugs.
  • The Sikuli set-up and use is very simple and easy.
  • Automation testing of mobile applications can also be done with the help of emulators.
  • Its integration with Selenium makes it worth using. It can solve the browser dialogue box handling problems of Selenium.
  • It can read texts on images with the help of its basic text recognition OCR.
  • It supports almost every platform including Windows XP+, most Linux flavours and Mac OS 10.5+.
  • Sikuli is very useful in functional testing where input and output are predefined; so it can be used efficiently for testing the overall behaviour of applications.


Disadvantages

  • A script cannot run in the background, as it needs a visible application GUI for as long as it is executing.
  • It is platform and resolution dependent.
  • Running multiple scripts automatically, one by one, is very tricky in Sikuli.
  • A slight change in the text label or image of the GUI of the application can result in the failure of the script.
  • Maintaining scripts is very hard if the GUI of the application changes frequently.

