Hybrid IT Is Emerging as the Solution to AI’s Rising Cost

As artificial intelligence evolves rapidly, the cost of the compute resources that support it continues to rise sharply. Developers need high-performance, scalable compute and storage to train and test AI models effectively, and that hardware demands high-density power and cooling.
Without careful planning, infrastructure costs can spiral out of control, and the finance team may not know what’s coming on the next bill.
DevOps and finance teams must collaborate closely to solve the challenge of AI infrastructure costs. Only then can they control unpredictable cloud expenses and build a future-proof infrastructure without impeding AI initiatives that allow their enterprises to thrive.
When DevOps teams use public cloud platforms to develop, train, test, and deploy AI models, it’s easy to spin up new compute resources. That’s what public clouds are all about. DevOps teams love this, so why shouldn’t they? They can focus on getting the models to perform at scale, and with new servers just a click away, they get all the firepower their models need.
However, overhead costs can skyrocket if no one watches how the company spins up servers and storage arrays. Finance gets the bill and wonders, “What happened?” Public cloud providers won’t issue a credit just because DevOps didn’t watch resource usage.
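One way to keep that surprise off the bill is a simple guardrail that flags resources nobody is accountable for and teams that have blown past their budgets. The sketch below is a minimal, illustrative example; the team tags, budgets, and resource records are assumptions, and in practice this data would come from your cloud provider’s billing API.

```python
# A minimal cost-guardrail sketch: flag cloud resources that lack a
# cost-center tag or push a team past its monthly budget, so finance
# isn't surprised by the bill. All names and figures are illustrative;
# real data would come from your provider's billing or tagging API.

BUDGETS = {"ml-training": 20_000, "inference": 8_000}  # $/month, assumed caps

resources = [  # (resource id, team tag or None, month-to-date spend in $)
    ("gpu-node-01", "ml-training", 12_500),
    ("gpu-node-02", "ml-training", 9_000),
    ("storage-array-7", None, 2_200),
]

def audit(resources, budgets):
    """Return untagged resources and teams that exceeded their budget."""
    untagged = [rid for rid, team, _ in resources if team is None]
    spend = {}
    for _, team, cost in resources:
        if team:
            spend[team] = spend.get(team, 0) + cost
    over = [team for team, total in spend.items() if total > budgets.get(team, 0)]
    return untagged, over

untagged, over = audit(resources, BUDGETS)
print("untagged:", untagged)     # resources no one is accountable for
print("over budget:", over)      # teams past their monthly cap
```

Run on a schedule against real billing exports, a check like this surfaces runaway spend while it is still a line item, not a crisis.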
The Flexibility of Hybrid IT Environments
Many enterprises turn to the hybrid infrastructure model to address escalating overhead infrastructure costs. Combining on-premises and colocation data centers with public and private cloud environments gives them flexibility in where they run their workloads, balancing performance with cost optimization.
The key is planning usage across all environments and monitoring your activities closely. A gaming company, for example, can take advantage of a multi-cloud strategy. When usage increases, it can spin up additional resources in a public cloud for new game launches or holiday breaks. It can then reserve resources over a term to reduce monthly spending for consistent (baseline) usage.
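The arithmetic behind that reserve-the-baseline strategy is worth seeing. This sketch compares running everything on demand against reserving steady-state capacity and bursting only for peaks; the GPU-hour rates and usage figures are assumptions for illustration, not any provider’s actual pricing.

```python
# Hypothetical cost model: pure on-demand spend vs. a reserved baseline
# plus on-demand burst. Rates and usage figures are illustrative
# assumptions, not real provider pricing.

ON_DEMAND_RATE = 3.00   # $/GPU-hour, assumed on-demand price
RESERVED_RATE = 1.80    # $/GPU-hour, assumed committed-term price

def monthly_cost(baseline_gpus: int, burst_gpus: int, burst_hours: int,
                 hours_in_month: int = 730) -> dict:
    """Return monthly spend under the two strategies.

    baseline_gpus: GPUs running 24/7 (steady-state training/serving)
    burst_gpus:    extra GPUs spun up only during peaks (e.g., a launch)
    burst_hours:   hours per month the burst capacity runs
    """
    all_on_demand = (baseline_gpus * hours_in_month +
                     burst_gpus * burst_hours) * ON_DEMAND_RATE
    reserved_plus_burst = (baseline_gpus * hours_in_month * RESERVED_RATE +
                           burst_gpus * burst_hours * ON_DEMAND_RATE)
    return {"all_on_demand": round(all_on_demand, 2),
            "reserved_plus_burst": round(reserved_plus_burst, 2)}

costs = monthly_cost(baseline_gpus=8, burst_gpus=16, burst_hours=120)
print(costs)
```

With these assumed numbers, reserving the baseline trims thousands off the month while the burst capacity still scales freely; the point is that the savings only materialize when you know which part of your usage is actually consistent.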
If your business handles sensitive data and must comply with strict security regulations, and your environment runs at consistent resource levels, you can partner with a compliance-focused colocation provider that offers a private cloud. When you need more resources, the provider can move you into a multitenant cloud environment where you can easily access additional infrastructure.
A colocation data center is also ideal for disaster recovery. This is perhaps the top use case that no one thinks about — it’s much like paying for insurance. You hope you never need it.
Controlling AI Infrastructure Costs
While the hybrid cloud model gives you greater financial stability, you won’t necessarily see cost savings. That’s because the costs and applications of AI technology are still unpredictable.
The key factor is the nature of your business and its IT requirements: How often do you move your applications, workloads, and servers between environments? If you have an active hybrid environment, costs may not be as predictable as you would like.
To control your AI infrastructure costs as much as possible, weigh the compute resources you plan to add against the new efficiencies they will deliver. When training and testing AI, avoid static IT environments where system configurations, applications, and hardware remain unchanged.
Across all the phases of AI — data collection and preparation as well as model design, testing, training, and inference — consider an infrastructure provider with a complete set of services like those offered by DataBank. Colocation data centers are ideal when you’re not sure how to control AI costs and want to avoid being so cost-conscious that you take away some of the value of training your AI models.
Beware the Cost of Moving Workloads Off Public Clouds
Another way the hybrid environment can help control costs is when you don’t want to commit to a long-term contract with a public cloud provider. This is especially true as AI and its relevant regulations evolve. Who knows what might turn everything upside down regarding expanded use cases or regulatory requirements that could jack up the infrastructure costs?
The controls you put in place also affect how well you can predict costs. Costs are unpredictable if anybody can spin up a server, download data, and port it to another environment. However, if you understand exactly how much you might flex, hybrid can be more predictable with the proper controls and bill monitoring. You will know what it will cost when you flex up, and when you flex down, you will know the savings as well.
Another challenge is knowing the cost of moving workloads. Giving people the flexibility to move applications when they aren’t performing well comes with a cost of its own. If public cloud hyperscalers store your data, for example, and you want to move it to another environment, they levy egress fees that might surprise you.
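Those egress fees are easy to underestimate because hyperscalers typically price data transfer out in tiers. The estimator below is a rough sketch; the tier sizes and per-GB rates are illustrative assumptions, and actual egress pricing varies by provider, region, and volume.

```python
# Rough egress-cost estimator for moving data off a public cloud.
# The tier sizes and $/GB rates are illustrative assumptions; real
# hyperscaler egress pricing varies by provider, region, and volume.

TIERS = [  # (tier size in GB, $/GB) applied progressively
    (10 * 1024, 0.09),      # first ~10 TB
    (40 * 1024, 0.085),     # next ~40 TB
    (100 * 1024, 0.07),     # next ~100 TB
    (float("inf"), 0.05),   # everything beyond that
]

def egress_cost(total_gb: float) -> float:
    """Estimate the one-time fee to move total_gb out of the cloud."""
    cost, remaining = 0.0, total_gb
    for tier_size, rate in TIERS:
        chunk = min(remaining, tier_size)
        cost += chunk * rate
        remaining -= chunk
        if remaining <= 0:
            break
    return round(cost, 2)

# Example: moving a 50 TB training dataset to another environment
print(f"${egress_cost(50 * 1024):,.2f}")
```

Even at these assumed rates, repatriating a modest training dataset runs to thousands of dollars, which is why the exit cost belongs in the plan before the data goes in, not after.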
Customer Use Case for Controlling Infrastructure Costs
A prime example of controlling infrastructure costs is a customer planning to move everything to a hyperscale cloud platform within a few years. As they turn down servers and storage arrays in our colocation facilities, they use our managed services program to plan how to consolidate rack space.
Instead of figuring out how to consolidate their three colocation environments, they tap into our Infrastructure as a Service (IaaS) offering. This allows them to add to their hybrid environment while controlling costs by giving them access to terabytes of storage without buying and installing new arrays.
When ready, they can quickly move to a public cloud provider. In the meantime, we help them manage their three environments and model the spending throughout this journey. Their CFO knows the cost, and the flexibility of our infrastructure allows them to manage the transformation.
Bringing DevOps and Finance Onto the Same Page
When working with a managed service provider like DataBank, your partner can be a “marriage counselor” between your IT and finance teams. When we talk to CFOs, we translate the infrastructure costs. We explain where costs are variable and fixed and what could change the budget.
This can lead to better collaboration with the DevOps team to manage unpredictable AI infrastructure costs and plan together for the unknown, such as the cost of software licenses, hardware, and power. By moving into a hosting situation, you remove some of that unpredictability. The provider can lock you into a fixed monthly fee for many services.
IT can then talk with finance about where the company is and what IT has signed up for over the next 12 months. From there, you can discuss the development costs as you push more money into AI applications that drive the business.
The conversation is more meaningful because you have a known level of spend for the infrastructure you’re locked into. It can change as you grow, but you have a foundation, and the cost is spelled out in your contract.
Setting the Stage to Fail Fast
Controlling AI infrastructure costs on your own is difficult. You don’t know how long you will operate in each AI phase, and you don’t know how soon your AI apps will make money or improve workflow efficiencies.
How do you build your infrastructure with all those unknowns? How do you purchase something and put it inside your data center if you’re unsure where it will be in six months?
You can deal with these unknowns by using a hybrid model and partnering with a host provider. You will pivot more quickly to stay on pace and test your AI thesis. As you manage your infrastructure costs, you will gain the ability to fail fast — one of the hallmarks of eventually succeeding in AI.
This article is part of The New Stack’s contributor network. Have insights on the latest challenges and innovations affecting developers? We’d love to hear from you. Become a contributor and share your expertise by filling out this form or emailing Matt Burns at [email protected].