How to Manage Cloud Costs

Cloud cost management, also known as cloud cost optimisation, is the organisational planning that allows an enterprise to understand and manage the costs and needs associated with its cloud technology. It entails finding the most cost-effective ways to get the most out of the cloud. This involves managing the cost of memory, storage, network traffic, instances, and a variety of other resources.

For effective cost control in cloud computing services, it is important to analyse cloud costs and to use cloud cost management tools to discover the causes of any inefficiencies. Unplanned cloud costs frequently result from a lack of visibility into current consumption patterns and past trends, non-standard deployments that come from unclear or absent development processes, poor organisation, or the absence of automated deployment and configuration tools. Unlike on-premise infrastructure, which is financed by fixed upfront investments, cloud consumption is an ongoing operational expense. This requires a major shift in the approach to operational management, where optimising cloud cost is as important as optimising performance.

Figure 1: Hierarchical teams involved in cloud cost management

There is no shortcut when it comes to managing cloud costs. It requires proper planning, getting the basics right, and involving the staff to ensure that they get it right too.

Why do we need cloud cost management?
More often than not, the cloud follows a decentralised management approach. This reduces cost visibility and makes the costs difficult to understand. Cloud cost includes many components, all of which need separate handling. Cloud waste is enormous and growing rapidly. It occurs for several reasons, most of which come down to poor management. If you don’t want cloud spending to take a toll on your health, it is time to put proper strategies in place.

Life cycle of cloud cost management
Understanding cloud cost management is the key to understanding the operating expenditure in cloud spend, which includes infrastructure costs like servers, network and storage. FinOps is the term commonly used for cloud cost management. It deals not only with operational costs but also with the cloud culture and the governance of the cost management service between the business units of the organisation and the team that manages the cloud control tower setup.

Typically, FinOps can be carried out using a customised framework through cloud service provider (CSP) services like resource tagging. FinOps is considered a discipline and cost analytics solution, which can calculate the cost incurred on cloud services. It helps to plan budgets and forecast cloud consumption spending requirements.

FinOps works on three major principles.

Inform: Track and manage costs through comprehensive reports including data filtering for internal consumption share.

Optimise: Prepare budgetary advice to derive optimised options for effective cost management.

Operate: Provide a governance model for managing expenses in cloud services and infrastructure including financial accountability, real-time monitoring and operational improvement.

FinOps works on fundamental services like cost analysis, real-time decision making in cost optimisation, and resource planning based on historical usage.
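Resource planning based on historical usage can be illustrated with a small sketch. The Python below fits a straight-line trend to past monthly spend and projects the next few months. The spend figures, the function name `forecast_spend` and the choice of a simple linear trend are all assumptions made for illustration; dedicated FinOps tools generally use richer models.

```python
from statistics import mean


def forecast_spend(history, months_ahead=3):
    """Project future monthly spend from a least-squares linear trend.

    history: past monthly costs, oldest first (hypothetical data).
    """
    if len(history) < 2:
        raise ValueError("need at least two months of history")
    xs = list(range(len(history)))
    x_bar, y_bar = mean(xs), mean(history)
    # Slope and intercept of the best-fit line through (month index, cost).
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    start = len(history)
    return [round(intercept + slope * (start + i), 2) for i in range(months_ahead)]


# Hypothetical spend for the last four months, all in one currency.
past = [1000.0, 1100.0, 1200.0, 1300.0]
print(forecast_spend(past))  # [1400.0, 1500.0, 1600.0]
```

A projection like this is only a starting point for a budget discussion, since it ignores seasonality and planned changes to the workload.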

Figure 2: Cloud cost management (FinOps) life cycle

There are various tools available for tracking cost management like Apptio, Cloud Easier, Iota and Cloudsoft.io. These help to prepare the framework and governance structure around cost management for public clouds like AWS, Azure or GCP as well as private cloud environments.

Cost optimisation, high availability, security and scalability are the key expectations or technical drivers for any enterprise taking the cloud route. With the application estate and infrastructure costs growing steadily, deciding how to optimise cloud consumption and cost is always a complex task. Cloud service providers like Azure, AWS or GCP provide advisory services, which help to optimise their services and usage. This may be useful in the case of native services, but when it comes to the IaaS or PaaS models, we may need something more advanced.

How to manage cloud cost
We need to focus on the following points to ensure that cloud cost management is more effective.

  • Understand pricing models: Each cloud vendor has a different pricing model for different services. If you are a public cloud user, be sure to look into the fine print on what the pricing model entails.
  • Identify key cloud cost contributors: Based on your cloud spend, identify services that are key contributors to your monthly spend, and focus on them with a top-to-bottom approach.
  • Analyse workload usage patterns: Detect usage trends for each of the services utilised on the public cloud, paying special attention to the key cloud cost contributors. Then, collect metrics like CPU, memory, disk/storage, API, etc.
  • Identify unused resources: Since every cloud service is available at the click of a button, each one can be provisioned and completely forgotten, leading to orphaned or unused resources that cost thousands of dollars.
  • Excavate archaic data: Along with the key cloud cost contributors, it’s also important to analyse services that have a higher chance of storing purposeless archaic data.
  • Scale down underutilised resources: Based on your usage patterns, slowly scale down resources to the next smallest size, ensuring the size-down doesn’t impact application performance. Continue this exercise until your workload runs at its desired level.
  • Upgrade infrastructure with the latest resources: Public cloud providers keep optimising their services to ensure security, performance and cost are competitive. So, it’s wise to also optimise your infrastructure with those upgrades to stay on top of the game.
  • Leverage on-demand services: Plan to leverage services on demand for workloads that don’t run in production. Since development teams don’t work 24/7, it’s ideal to automate the starting and stopping of these workloads.
  • Consolidate accounts: When running services from public cloud providers on multiple accounts, try to consolidate your accounts. Most providers have a volume-based discount pricing model that can help you optimise cost.
  • Baseline your infrastructure: While going through the analyse and optimise exercises, it’s possible to reach the baseline of your infrastructure, which you need in order to run any workload. There might also be times when your infrastructure needs to be scaled out; so keep those needs in mind.
  • Separate the workloads: Based on the usage patterns and purposes of your workloads, separate them into different categories like stable, variable, long-term and short-term.
  • Reserve resources: Once you’ve identified stable workloads for each environment, reserve these resources. Reservations are smarter financially if the capacity meets different vendors’ pricing models. If you have workloads running across multiple accounts on demand in different time zones, consider reserving some capacity for dynamic workloads.
  • Create business units: Each workload has a specific function. Based on your organisation’s structure, create business units (BUs) and define the owners. Assign these workloads to these business units, ensuring they are responsible for the charge.
  • Identify requirements of each BU: Each BU should provide capacity and scale requirements for their workloads that are stable for the short-term, but flexible enough to meet budgeting demands.
  • Create a budget and chargeback policy: Based on your requirements, define a budget for each BU. Communicate this budget to each BU owner, making sure they are responsible for staying optimised and within budget.
  • Define cloud cost governance policies: Policies vary by organisation. Reviewing cost summaries regularly can lead to immediate savings, and provisioning infrastructure only through automation ensures that key tags are always assigned to each resource (since not everyone has access to infrastructure provisioning).
  • Go ahead with tagging: Public cloud providers support tagging infrastructure, which can help identify workloads for management, segregation and billing purposes. Your cloud management platform can then consume these tags and generate different reports.
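
Several points from the list above (key cost contributors, business units, budgets and chargeback, tagging) can be combined in one small sketch. The Python below groups billing line items by a `business_unit` tag, compares each unit's spend with its budget, and ranks services by cost. The record layout, the tag key and the budget figures are hypothetical and do not follow any provider's billing export format.

```python
from collections import defaultdict

# Hypothetical billing line items; real exports differ by provider.
LINE_ITEMS = [
    {"service": "compute", "cost": 620.0, "tags": {"business_unit": "retail"}},
    {"service": "storage", "cost": 180.0, "tags": {"business_unit": "retail"}},
    {"service": "database", "cost": 340.0, "tags": {"business_unit": "analytics"}},
    {"service": "compute", "cost": 95.0, "tags": {}},  # untagged resource
]

# Hypothetical monthly budgets agreed with each BU owner.
BUDGETS = {"retail": 750.0, "analytics": 500.0}


def chargeback(items, budgets, tag_key="business_unit"):
    """Sum cost per business unit and compare it with the unit's budget."""
    totals = defaultdict(float)
    for item in items:
        owner = item["tags"].get(tag_key, "untagged")
        totals[owner] += item["cost"]
    report = {}
    for owner, spent in totals.items():
        budget = budgets.get(owner)  # None when no budget is defined
        report[owner] = {
            "spent": spent,
            "budget": budget,
            "over_budget": budget is not None and spent > budget,
        }
    return report


def top_contributors(items, count=2):
    """Rank services by total cost, highest first."""
    per_service = defaultdict(float)
    for item in items:
        per_service[item["service"]] += item["cost"]
    ranked = sorted(per_service.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:count]


for owner, row in chargeback(LINE_ITEMS, BUDGETS).items():
    print(owner, row)
print(top_contributors(LINE_ITEMS))
```

Cost that lands in the `untagged` bucket has no owner to charge, which is one reason to enforce tagging at provisioning time.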
Figure 3: Cloud operations and cost economics

Models for cloud cost management
Cost optimisation on the cloud platform can be achieved using any of the three models listed below.

DevOps driven model: This is the low hanging fruit with low maturity. It provides a transparent facility for cost visibility and management with a proper governance structure for service creation and utilisation, alert and event handling, and integration with native and third-party cost management tools and billing APIs. Tools like CloudCheckr and Apptio Cloudability fall in this category.

CloudOps driven model: This is a mid-range maturity model that provides cloud service usage optimisation or consumption gating through infrastructure optimisation, such as stopping idle instances and moving compute instances to a more cost-effective compute type. It uses spot and reserved instances for high-volume and long-term services like data lakes. It also uses infrastructure as code (IaC) templates, for instance, for VM and storage handling. Tools like Matilda, BMC TrueSight or Lightwing fall in this category.
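
The choice between on-demand and reserved capacity comes down to simple arithmetic, sketched below in Python. The hourly rate and the flat monthly reservation fee are hypothetical numbers, not real provider prices, and the model assumes the reservation is billed in full whether or not the instance runs.

```python
def break_even_hours(on_demand_rate, reserved_monthly_fee):
    """Hours per month above which the reservation is the cheaper option."""
    return reserved_monthly_fee / on_demand_rate


def cheaper_option(on_demand_rate, reserved_monthly_fee, hours_used):
    """Compare the two monthly bills for a given number of running hours."""
    on_demand_cost = on_demand_rate * hours_used
    return "reserved" if reserved_monthly_fee < on_demand_cost else "on-demand"


# Hypothetical prices: 0.25 per hour on demand, 100.0 per month reserved.
RATE, FEE = 0.25, 100.0

print(break_even_hours(RATE, FEE))     # 400.0 hours per month
print(cheaper_option(RATE, FEE, 176))  # on-demand (8 hours x 22 working days)
print(cheaper_option(RATE, FEE, 730))  # reserved (running all month)
```

Under these assumed prices, an instance that is stopped outside working hours stays well below the break-even point, while an always-on service clears it comfortably.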

AIOps driven model: This is the high maturity model and long-term solution to enable predictive optimisation based on usage patterns and historical data. It has an ML driven approach to create models for predictive learning and suggesting areas of improvement based on these patterns. It also creates solutions for cost optimisation for non-production environments and production environments separately for better cost benefit with low risk. Tools like Cloudhealth and Densify fall in this category.

During the FinOps life cycle, we need to design the solution to get the cost metrics, which are categorised as business metrics, and technical or usage metrics. Business metrics are a set of parameters that can help to understand the high-level view of cloud cost expense including capex and opex. Here, capex is a combination of one-time sunk cost, integration cost and locked-in cost. Capex is important to calculate the RoI in cloud investment, and both capex and opex are important to understand the total cost to the organisation.

Technically, the FinOps life cycle is handled in two stages.

Tagging and resource management: The cloud service provider offers metadata tagging or labelling for any cloud resource to establish accountability for resources. These tags can be used to filter and group services by attributes such as environment/stage, cost centre, importance of services, application, and more.
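
A tag policy of this kind can be checked with very little code. The sketch below, in Python, reports which required tags a resource is missing and groups resources by a chosen tag. The required tag keys, the resource names and the in-memory inventory are hypothetical; in practice the inventory would come from the provider's resource listing.

```python
# Hypothetical tag policy: every resource must carry these keys.
REQUIRED_TAGS = ("environment", "cost_centre", "application")

# Hypothetical inventory mapping resource name to its tags.
RESOURCES = {
    "vm-web-01": {"environment": "prod", "cost_centre": "cc-100", "application": "shop"},
    "vm-test-07": {"environment": "dev", "application": "shop"},
    "bucket-logs": {"environment": "prod", "cost_centre": "cc-200", "application": "audit"},
}


def missing_tags(tags, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or empty."""
    return [key for key in required if not tags.get(key)]


def group_by_tag(resources, tag_key):
    """Group resource names by the value of one tag."""
    groups = {}
    for name, tags in resources.items():
        groups.setdefault(tags.get(tag_key, "untagged"), []).append(name)
    return groups


for name, tags in RESOURCES.items():
    gaps = missing_tags(tags)
    if gaps:
        print(f"{name} is missing tags: {', '.join(gaps)}")

print(group_by_tag(RESOURCES, "environment"))
```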

Metrics management and usage analysis: In this stage, data analytics is done on patterns of usage over a period of time in order to find the cost optimisation patterns. This includes proactively managing resource utilisation, using historical data for cost optimisation, and integration with cost advisory tools like CloudCheckr.

For cloud cost management, FinOps helps to derive a framework to understand the cost of services utilised, track resource usage and optimise the cloud spend. This starts by onboarding a value stream of applications to the FinOps life cycle, followed by creating resource tagging and tag policies, rule-based event alerts for threshold use of the CPU, memory and storage services, and finally, preparing reporting dashboards for management or technical usage views.
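
The rule-based event alerts mentioned above can be sketched as a comparison of usage samples against fixed thresholds. In the Python below, the threshold values, the metric names and the sample readings are hypothetical; a real setup would read metrics from the provider's monitoring service and send notifications rather than print them.

```python
# Hypothetical alert thresholds, as percentages of capacity.
THRESHOLDS = {"cpu": 80.0, "memory": 85.0, "storage": 90.0}


def evaluate(sample, thresholds=THRESHOLDS):
    """Return an alert message for every metric above its threshold."""
    alerts = []
    for metric, limit in thresholds.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{metric} at {value:.0f}% exceeds {limit:.0f}% threshold")
    return alerts


# Hypothetical reading for one resource.
reading = {"cpu": 91.0, "memory": 60.0, "storage": 95.0}
for message in evaluate(reading):
    print(message)
```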

For reporting dashboards, native dashboards using GCP Data Studio can be employed. The PowerBI tool can be used for easy integration with collaboration platforms like MS Teams or Atlassian Confluence. There is no need then for security activities like roles and permissions.

Cloud governance and cost economics
In the financial management of the cloud, two common heads of accounts are cost management and cost transparency. The FinOps framework implemented as part of the cloud adoption strategy addresses both of these.

There are many best practices defined for cloud cost management and cost transparency, and adopting them as part of the FinOps framework enables better cost management. The top seven ways to optimise cloud usage and control costs are listed below.

  • Shut down unused resources/instances: Though elasticity in cloud resources to scale-in and scale-out gives better cost benefit, a common issue in resource optimisation is handling unused resources or instances, particularly in non-production environments. Monitoring and controlling unused resources gives better cost optimisation.
  • Right-size underused resources: If infrastructure sizing during cloud migration is based on the existing on-premise infrastructure, the actual performance requirements may not be fully known. Therefore, right-sizing resources like VM instances, storage services and database sizes yields better cost benefits.
  • Reserve instances or spot instances for consistent long-term workloads: When adopting the cloud, we may have a clear idea of resources for long-term usage like data lakes or an FTP landing zone. For these services, committing to reserved instances for a multi-year term can give as much as 50 per cent cost benefit. Spot instances are also discounted, but the provider can reclaim them at short notice, so they suit only the parts of such workloads that tolerate interruption.
  • Choose the hybrid cloud approach for reducing migration cost: During cloud migration for large estates, instead of the big-bang approach for all applications at once, staged migration can be done and some key applications can be kept on-premise. A hybrid architecture can help to curtail losses.
  • Use auto-scaling features for required resources: Many cloud services have auto-scale facilities to scale-up when there is higher usage and scale-down when there is lower usage of resources. This kind of elasticity without manual monitoring and maintenance using an auto-scale feature gives cost benefits.
  • Enable budgets for resources and allocate costs: Resources should be budgeted and costs allocated. Any dynamic burst in resource requirements can then be controlled within the allocated budget with the help of alerts and notifications.
  • Choose the right compute services for better performance and costs: Though cloud service providers like Azure, AWS and GCP have multiple compute models and instance types, in many situations we may not know if the right compute size and model is being used. If we optimise this model taking into account performance and technical requirements, we can save costs.
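
The first two practices in the list above, shutting down unused instances and right-sizing underused ones, can be expressed as a simple rule over observed CPU usage. The Python sketch below recommends an action from the peak CPU reading in an observation window. The size ladder and both cut-off percentages are hypothetical, and a real decision would also consider memory, disk and network usage.

```python
# Hypothetical instance size ladder, smallest first.
SIZES = ["small", "medium", "large", "xlarge"]


def recommend(size, cpu_samples, idle_below=5.0, downsize_below=40.0):
    """Suggest an action from peak CPU utilisation (per cent) in a window."""
    peak = max(cpu_samples)
    if peak < idle_below:
        return "shut down"  # effectively unused during the whole window
    index = SIZES.index(size)
    if peak < downsize_below and index > 0:
        # Step down one size only, so the effect can be observed first.
        return f"resize to {SIZES[index - 1]}"
    return "keep"


print(recommend("large", [12.0, 30.5, 22.0]))  # resize to medium
print(recommend("medium", [1.0, 2.5]))         # shut down
print(recommend("xlarge", [75.0, 20.0]))       # keep
```

Stepping down one size at a time mirrors the gradual approach to scaling down, where each change is checked against application performance before the next one.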

Cloud service providers have internal financial governance defined as part of their cloud service expense management (CSEM), which helps in better cost management and transparency. Cloud architects and consultants must enable this while designing cloud adoption.

FinOps adoption enables new models of cloud management by bringing cloud engineers and finance teams together in an agile POD structure. Here, procurement is transparent and efficient through a defined workflow, which enables predictable costs and reasonable budgets.

FinOps also helps to address the cloud demand trade-offs like quality of service, speed in procurement and cost management. According to the FinOps Foundation, the key principles for FinOps adoption are:

  • Create collaborative teams covering finance, architects, business sponsors and operations teams.
  • Decisions for cloud adoption should be driven by business value.
  • Cloud usage is not a single person or team’s responsibility but is everyone’s responsibility.
  • Cost transparency is very important right from the early stage of adoption, and FinOps reports should be available to all stakeholders.
  • A centralised FinOps team should be set up to define the cloud adoption roadmap and the policies.
  • The FinOps framework should use the variable cost model from the cloud service provider (CSP).

Cloud cost management tools
The most effective and popular cloud cost management tools are listed below.

Figure 4: Components of FinOps/cloud cost management

Apache CloudStack: Apache CloudStack is an IaaS cloud computing platform that helps businesses deploy virtual machines and manage geographically distributed data centres on a centralised server. IT professionals can customise the user interface and configure installed applications with features like firewalling, virtual machine (VM) templates, routing, storage replication, dynamic host configuration protocol (DHCP), and more.

Administrators can use Apache CloudStack to manage access permissions, change account passwords and allocate resources to specific domains or users.

With its project management module, enterprises can send project invites to employees and organise users into multiple teams, enabling them to collaborate and share virtual resources such as snapshots, templates, data disks and IP addresses. Managers can verify and authenticate end users via the external Lightweight Directory Access Protocol (LDAP) servers such as Microsoft Active Directory and ApacheDS. Apache CloudStack allows businesses to create new virtual machines and assign them to specific hosts or affinity groups based on individual requirements. Developers can create new network offerings with details, including name, description, data transfer rate, guest type, and virtual private cloud (VPC).

OpenStack: The OpenStack cloud cost management tool includes software tools that help in creating and managing cloud computing services, using pooled virtual resources. It contains ‘projects’, which are tools made using the OpenStack platform. These help to manage core cloud services such as computing, storage, networking, identity, and images. OpenStack tools can help to control large batches of computing, storage and networking, using resources from the data centre. These tools are managed by the OpenStack API or a dashboard.

ManageIQ: ManageIQ is another top open source cloud cost management tool. It helps with managing small and large virtual environments and also supports technologies such as public clouds, virtual machines and containers. With ManageIQ, users can download a virtual appliance and deploy copies of it on virtualisation platforms like VMware or OpenStack.
