“Using open source means you’re hiring the whole world as your support system”

Suman Debnath, Principal Machine Learning Advocate at AWS, has transitioned from having zero knowledge about open source to becoming an assertive community leader in this space. He has delivered over 100 international conference keynotes, driving engagement across tech communities, and attributes this success to his community members. This expansive conversation with OSFY’s Yashasvini Razdan on Suman’s journey in this domain is a testament to the power of open source.

Suman was a newly minted electronics and telecommunications engineering graduate when he first encountered open source and, as with many others, the encounter was driven by necessity. In 2008, as a performance consultant for Hitachi Data Systems (HDS, now Hitachi Vantara), he was benchmarking raw and application performance on the company’s modular storage systems and searching for a tool to help with the job, when he came across the flexible I/O tester, or FIO. “I used it for two weeks but ran into some issues and needed to learn a few more configurations. I was still unaware of the power of the open source community at that time,” he says, recalling the inexperience of his youth with a smile on his face.

When Suman searched for a solution, he realised that many people had similar queries and that many others were resolving them. “That was when I realised what open source meant. I could request feature updates, and experts on GitHub sat down to act on those requests! I was amazed,” he exclaims.

The gateway to open source

From that point on, Suman entered a whole new world of GitHub and began contributing to projects along with 300 other contributors. He went on to work with and contribute to the documentation of ezFIO, a user-friendly graphical interface or wrapper designed to simplify the usage of FIO.

He didn’t stop at benchmarking tools. The world of open source had opened a new gateway. “In 2015, there was a requirement to create a test bed where I had to set up more than 50 VMs and configure them in a certain way. I came across this tool, Ansible, which allowed me to automate and update software across all the machines. I learned that within 10-12 days! I had to share my feat with the world,” he remarks.

At the AWS Gen AI Loft in San Francisco in 2024

With experience in documentation and significant contributions to the tools he had been using, Suman took his findings to the PyCon conference in Delhi and delivered a presentation on infrastructure automation for testing using Ansible and Vagrant. “Before that, whenever I attended a conference, it was always as a representative of a company. This was the first time I was representing myself and my work,” he recalls proudly.

The conference connected Suman with many people from open source domains beyond storage and systems. “I was in Bengaluru engaging with various user groups like the Python community, AWS User Group, and the Docker community. I got to work on numerous open source projects, gaining in-depth learning, with source code and documentation readily available. Gradually, I began participating in more conferences, such as that of the Storage Networking Industry Association (SNIA),” he reveals.

The appeal of open source did not stop at community engagement. It was the learning. “Open source allows you to learn whatever you want, even if it’s not your core area. You ask questions, and engineers from all over the world can answer them. If you use anything proprietary, you’re often bound by SLAs, and you may not even be eligible to raise a request without an enterprise licence; with open source, I have the whole world who can contribute to my idea or product,” he states.

His involvement with open source led him to join AWS as a principal developer advocate for India to create and share content with the community. “When I joined AWS, there were roughly 12 to 15 user groups. In two to three years, we scaled that number to more than 25,” adds Suman.

Speaking about his community engagement, Suman reveals that his speaking engagements at tech conferences have grown manifold since he joined AWS. Be it the Open Data Science Conference, PyCon, or even the Python Software Foundation, Suman has been completely immersed in educating the community about his work in open source, which lately revolves around Python, cloud, and machine learning.

“Today, I’m working on fine-tuning large language models (LLMs) and retrieval-augmented generation (RAG). We use many open source tools, such as LlamaIndex, LangChain, and pgvector (an extension of PostgreSQL). At this point, my whole work revolves around open source tools and other publicly available models,” he reveals.

Advocacy and engagement

With his work around machine learning, Suman advocates the importance of responsible AI. “We need to understand that while working with machine learning, especially with any tool in the generative AI space, we should be responsible and not exploit open source or mix it with anything that has licensing complexities. For this, good education and awareness are crucial,” he advises.

The challenge does not stop at responsible and ethical use of AI in open source. Having worked with the open source community across the globe, Suman believes that sustainability is one of the biggest challenges. “We need to ensure that what we build solves a critical problem. If it is a problem that has already been solved but in a different way, it may not be sustainable, even if you get funding,” he shares.

PyData Dublin 2024

Educating and raising awareness about open source on a global scale is Suman’s enduring commitment. “Forums such as NumFOCUS and the Python Software Foundation contribute to major open source projects around data science and Python in general. Funding and sponsorship for such projects should come from academia, industry, and individuals, in whatever capacity,” he stresses.

The community leader highlights the need to mentor and educate engineers about open source, right from the start. “Had I known about open source and community contributions in college, I would be a different person today. I learned it during my mid-journey, but the earlier you start, the less you have to struggle as you grow,” he shares.

A tale of contributors and contributing

Suman stresses the importance of giving back to the community. “Using open source means you’re hiring the whole world as your support system. Initially, this is a consumer mindset, but once you’re in that system, you will automatically give back. Many organisations contribute back in terms of monetary support or training others. It’s inspiring to see people across the world come together to build something impactful,” he shares.

Suman believes there are many ways of contributing to the community. “People without coding backgrounds, who just use a particular open source package, can always contribute to documentation. Attending community meetups and sharing knowledge is also an indirect way of contributing to the open source world,” he says.

Another interesting way of contributing, and one that Suman is currently involved in, is creating code samples for different books. “Currently, I’m helping build the code samples for a book called ‘Build a Large Language Model (From Scratch)’ by Sebastian Raschka, a renowned professor in machine learning and applied sciences and a researcher at Lightning AI. The author has written the code samples, and others can raise pull requests suggesting additions or modifications. These could be documentation changes or actual code contributions,” he elucidates.

Suman shares that there are many books about machine learning and the community, both in the tech and non-tech fields. A few of his favourites include ‘Probabilistic Machine Learning’ by Kevin Murphy, which he calls the bible for machine learning, and ‘Understanding Deep Learning’ by Simon J.D. Prince. Among non-technical books, he suggests ‘Community in a Box’, authored by his colleague Mark Birch. “It’s about how to build an event-driven professional community, and it’s an insightful book,” he says.

Open Source India Conference 2022

Along with the suggestions, Suman shares a warning for all learners. “We live in a world of reels and short videos, but these kinds of subjects cannot be learned in 30 or 40 seconds. You have to spend time and have patience to learn about these topics,” he cautions.

With such a vast community, Suman has a list of leaders he looks up to, the foremost being Neependra Khare (Docker captain), Ajeet Singh Raina (Docker captain), and Vivek Raja (an AWS hero). “Vivek created the AWS user group in Madurai, building everything from the ground up,” he says, sharing his association with Vivek. “I met him through the community at one of the user group meetups. Vivek was contributing to various meetups and groups, and ended up joining the startup he was contributing to, along with his other contributions to the open source community. He was hosting events and speaking at tech conferences. Today, he’s the vice president of products at that company, working in the healthcare and neuroscience domain, and he’s also an AWS hero. He’s well-known globally in the fields of machine learning and community work,” narrates Suman, highlighting the power of open source, the community, and their role in career growth.

For Suman, open source doesn’t come with a minimum eligibility criterion. “The minute you take the first step and contribute to the ecosystem, you become a community member. Once you are in this kind of ecosystem, you end up elevating your career and your life by sharing and receiving knowledge,” he concludes.
