Aadhaar: A Testimony to the Success of FOSS in India!

An identity for the junt

It is heartening to see FOSS powering one of India’s most prestigious and mammoth e-governance initiatives: the Aadhaar project of the Unique Identification Authority of India.

For those who need an introduction: Aadhaar is a 12-digit individual identification number issued by the Unique Identification Authority of India (UIDAI) on behalf of the Government of India. The number will serve as proof of identity and address anywhere in India.

Each Aadhaar number will be unique to an individual and will remain valid for life. By providing clear proof of identity, Aadhaar aims to empower the poor and underprivileged residents of the country to access services such as the formal banking system, and to give them the opportunity to easily avail themselves of various other services provided by the government and the private sector.

More than 6 crore (60 million) UIDs have been issued to date, over a period of 18 months. The UIDAI systems were recently scaled to process up to 1 million UID applications per day. Enrolments are actively under way in many Indian states (refer to the official UIDAI website for daily updates).
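For a rough sense of scale, the figures above can be turned into average rates. This back-of-the-envelope calculation assumes the load is spread evenly over 24 hours and treats 18 months as about 547 days; real enrolment traffic is burstier, so peak rates would be higher, and since "more than" 6 crore UIDs were issued, the second figure is a lower bound:

```latex
\frac{10^{6}\ \text{applications/day}}{86\,400\ \text{s/day}} \approx 11.6\ \text{applications/s},
\qquad
\frac{6 \times 10^{7}\ \text{UIDs}}{\approx 547\ \text{days}} \approx 1.1 \times 10^{5}\ \text{UIDs/day}
```

On these rough numbers, the average issuance rate over the first 18 months is about a tenth of the newly scaled daily processing capacity.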

The project is historic: it is one of the largest e-governance initiatives in India to date, and its scale is unprecedented. What makes it all the more interesting is that the project runs extensively on a FOSS stack.

Where FOSS is the first and foremost choice

The UIDAI technology team indicated its preference for open standards and open source early on, says Regunath Balasubramanian, principal architect of the UIDAI project. “We were prudent in the very early stages of the project. That has continued to be the project’s guiding philosophy. Initially, only the enrolment client application had a dependency on device drivers of a certain operating system, and even that was only for a short while,” he reveals.

Balasubramanian explains why open source software (OSS) became the first choice for the project: “The primary technical requirements of the project were scale and vendor neutrality at all levels. FOSS helped us achieve vendor neutrality in many of our application components, which is very important for an initiative of national importance. The use of open standards has encouraged multi-vendor participation. This has driven costs down and improved quality.

“The per-CPU licence costs normally associated with proprietary software have been largely eliminated. The benefits become more pronounced as the deployment footprint grows to thousands of CPU cores across multiple data centres.

“Access to the source code of the FOSS solutions has also helped us use them more effectively, as we gained a good understanding of how they work internally. The use of FOSS has also aided the project’s growing adoption by registrars and enrolment agencies, some of whom have strict guidelines on the use of proprietary software.”

The strategy of adopting OSS solutions that are mainstream, have strong community backing, and are actively developed and enhanced works well most of the time, affirms Balasubramanian.

But have there been any pitfalls in adopting FOSS? To this question, he replies: “The challenges in using OSS are those of perception, developers’ willingness to explore, and the availability of support for all the solutions out there.”

FOSS tools that are powering the UIDAI project

The UIDAI relies mainly on the following FOSS solutions for the Aadhaar project:

  • RabbitMQ: An AMQP (Advanced Message Queuing Protocol) standards-based messaging solution that serves many of the project’s asynchronous processing needs, with integration and high scalability provided by the SEDA (staged event-driven architecture) implemented in Mule.
  • Mule: A lightweight object broker and SEDA runtime, also used to orchestrate the various stages of batch processing.
  • The Hadoop stack: HDFS (Hadoop Distributed File System) provides high read/write throughput, of the order of many terabytes of data per day. Its distributed architecture lets the system scale out as needed.
  • Hive is used to build the UIDAI data warehouse, HBase for indexed lookup of records across millions of rows, ZooKeeper as a distributed coordination service for server instances, and Pig as an ETL (extract, transform and load) solution for loading data into Hive.
  • Pentaho is used for report generation and OLAP (online analytical processing). It integrates well with the rest of the UIDAI stack.
  • Apache Tomcat is used as a Web container wherever dynamic data must be served over HTTP.
  • MongoDB: This distributed, document-oriented database provides schema flexibility, low-latency data reads and auto-sharding of content.
  • Apache Solr is used for fast full-text search indexing.
  • MySQL, heavily sharded for scale, is the RDBMS store for all relational data that requires ACID (atomicity, consistency, isolation and durability) properties.
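The article does not say how the MySQL shards are keyed. As a minimal sketch, assuming records are partitioned by a stable hash of a hypothetical record key (an enrolment ID, say) over a fixed number of shards, routing a record to its shard could look like this. `shard_for`, `group_by_shard` and `NUM_SHARDS` are illustrative names only, not part of the UIDAI system:

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count, for illustration only


def shard_for(record_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard index using a stable hash.

    Python's built-in hash() is randomised per process for strings, so a
    SHA-256 digest is used instead: every server computes the same shard.
    """
    digest = hashlib.sha256(record_key.encode("utf-8")).digest()
    # Treat the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def group_by_shard(keys, num_shards: int = NUM_SHARDS):
    """Group keys by the shard that would store them (e.g. to batch writes per shard)."""
    groups = {}
    for key in keys:
        groups.setdefault(shard_for(key, num_shards), []).append(key)
    return groups


if __name__ == "__main__":
    # Hypothetical enrolment IDs.
    for key in ["ENR-0001", "ENR-0002", "ENR-0003"]:
        print(key, "-> shard", shard_for(key))
```

Plain modulo hashing is the simplest choice, but changing the shard count remaps most keys; consistent hashing or a shard directory is the usual alternative when shards must be added over time.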

Apart from these, the project uses the Spring framework as an application container and as a custom-built, server-like runtime for configuration, administration, call interception, monitoring and metrics collection. “We also use many of its sub-projects, such as Spring Security, Spring Batch and Spring MVC (model-view-controller),” reveals Balasubramanian.
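In Java, Spring typically performs call interception through AOP proxies. Purely as a hedged illustration of the idea, and not Spring’s API or the UIDAI runtime, a Python decorator can wrap a service call and record call counts, errors and elapsed time. `metered`, `METRICS` and `authenticate` are hypothetical names, and the 12-digit check stands in for real business logic:

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory metrics registry, keyed by function name.
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def metered(func):
    """Intercept calls to func, recording call count, error count and elapsed time."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        entry = METRICS[func.__name__]
        entry["calls"] += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            entry["errors"] += 1
            raise  # re-raise so interception does not change behaviour
        finally:
            entry["total_seconds"] += time.perf_counter() - start
    return wrapper


@metered
def authenticate(uid: str) -> bool:
    """Hypothetical service method: accepts only 12-digit numeric IDs."""
    return len(uid) == 12 and uid.isdigit()


if __name__ == "__main__":
    authenticate("123456789012")
    authenticate("bad")
    print(dict(METRICS["authenticate"]))
```

These counters are not thread-safe; a real interceptor would guard them with a lock or delegate to a metrics library and export them to a monitoring system.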

The biometric devices and scanners used for enrolment are procured from the market by enrolment agencies. Initially, most vendors supplied device drivers only for Windows; drivers are now available for other operating systems, such as Linux. The enrolment client software works flawlessly with these drivers on non-Windows operating systems as well, Balasubramanian affirms.

Since the data gathered for the UIDAI project is sensitive and serves as the access point for many different benefits, it must also be backed up in case of crashes or physical corruption. So how does the UIDAI team ensure data security and disaster recovery? Balasubramanian divulges more details: “The UIDAI has a comprehensive security policy to ensure the safety and integrity of its data. Data in transit is encrypted to protect against tampering. The Aadhaar datastores are secured both physically and electronically, and access is given only to a few select individuals. Data is secured with the best encryption and in a highly secure data vault. All access details are logged. There are strict data retention guidelines that are adhered to, including rules related to redundant copies and multi-data centre deployment. We use FOSS distributed file systems (with data replication) to achieve high-throughput data reads and writes within the data centre.”
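The interview does not detail the cryptography. Tamper detection generally comes from message authentication (a MAC, or an authenticated-encryption mode such as AES-GCM) rather than from encryption alone. A minimal sketch using Python’s standard `hmac` module, assuming a hypothetical key shared by sender and receiver, shows how a receiver can reject a modified payload; `sign` and `verify` are illustrative names:

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, payload: bytes) -> bytes:
    """Return an HMAC-SHA256 tag that travels alongside the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    return hmac.compare_digest(sign(key, payload), tag)


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # hypothetical shared key; real systems use managed keys
    packet = b'{"enrolment_id": "ENR-0001"}'  # hypothetical payload
    tag = sign(key, packet)
    print(verify(key, packet, tag))                      # True
    print(verify(key, packet.replace(b"1", b"2"), tag))  # False: payload was altered
```

The sketch isolates integrity checking; in practice TLS or an AEAD cipher provides confidentiality and integrity together.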

The FOSS advantages

Drawing on his own experience with the open source software stack, Balasubramanian enumerates the key benefits that using and adopting OSS can bring to developers, software product teams and organisations.

“For developers: OSS encourages getting your hands dirty and brings out the geek in you. OSS users therefore tend to have stronger computer science skills than their counterparts who work only with proprietary software. Access to the source of well-written frameworks enhances one’s own design and programming skills. The can-do attitude of OSS users and contributors makes them better suited to solving complex design and programming issues.

“For software product teams: Many OSS frameworks have liberal licensing and distribution terms, which allow product teams to embed them in their proprietary solutions. Product teams that open source their solutions can still benefit from business models built around support, multiple distributions, training, etc.

“Organisations can also save on licence costs by using low-risk, well-written tools such as source code editors. Organisations that have a strong engineering culture and development teams to support their IT solutions are better suited to adopt OSS. Such organisations typically have OSS solutions on their roadmaps and encourage business units to build solutions on them. In general, overall costs can be low when adopting OSS that is stable and has an active user community. Finding skilled resources for mainstream OSS solutions is neither difficult nor expensive.”

A few ‘open’ practices

Balasubramanian has been with the Aadhaar project from the time the work to build the core application was awarded to MindTree. He has since led the architecture, design and development efforts on the project. His team has built the enrolment client, enrolment and authentication servers, the portals and the business intelligence platform for the project.

He shares a few best practices that have helped him and his team on their journey with the project so far: “Some of the best practices that we follow have been driven in part by the chief architect of the project, Dr Pramod Varma. For example, we follow agile development so that features can be released more often and feedback can be incorporated. There is a strong design focus, so no part of the system is tied to a particular product or approach. Fresh thinking is encouraged, and we try to keep our design principles simple and to follow design patterns whenever prudent.”

An ‘open’ platform for future e-governance projects

The Aadhaar project consumes a great deal of FOSS. Broad interest in the project, and in nation building generally, has attracted contributions and support from volunteers, user communities and technology teams. “This collective wisdom has helped in feature design, technology selection and making course corrections, especially when solutions don’t work as required,” reveals Balasubramanian.

But is the project team also contributing back to the community? Balasubramanian affirms: “As far as giving back to the community goes, the UIDAI team interacts with the user and developer communities of certain solutions used in the project, most significantly RabbitMQ, Hadoop and MongoDB. The RabbitMQ developer community was particularly helpful in our early days of adoption and, more recently, the MongoDB community has been.”

Apart from this, the e-governance platform that the team has built is intended for use by other e-governance initiatives in India, says Balasubramanian. “I would recommend the use of FOSS and, in fact, the reuse of the e-governance platform that we have built for the UIDAI project. We have written a few technology components as part of the platform that can integrate with existing frameworks. These may be open sourced when the time is appropriate and their sustenance is taken care of,” he adds.

Given how well FOSS has worked for the Aadhaar project so far, we are sure this success story will inspire many other projects and initiatives to leverage FOSS and enjoy its benefits.

Aadhaar at Open Source India 2011

Regunath Balasubramanian, principal architect of the UIDAI project, was an eminent speaker and guest of honour at the recently concluded Open Source India 2011 convention. He shared with the audience the complexities of a project as huge as Aadhaar and explained in detail the advantages that the OSS stack has brought to the project.
  • sandeep

    I have some queries regarding this LFY article (http://www.opensourceforu.com/2011/12/aadhaar-testimony-to-foss-success-in-india/), specifically the section “FOSS tools that are powering the UIDAI project” (the entries on the Hadoop stack, Pentaho, Apache Tomcat, MongoDB, Apache Solr and MySQL). My queries are:

    1) Is data from MySQL synced back into Hadoop Hive by Hadoop Pig?

    2) Why is Pentaho used in addition to Hadoop, which seems able to do similar work?

    3) What can MongoDB do here that Hadoop/MySQL can’t?

    4) HBase is known as a database, yet here it is said to be used for indexed lookup. Why?

    5) Why so many databases?

All published articles are released under Creative Commons Attribution-NonCommercial 3.0 Unported License, unless otherwise noted.
Open Source For You is powered by WordPress, which gladly sits on top of a CentOS-based LEMP stack.
