We work with many companies that come to us after they’ve already moved some (or all) of their on-prem environments to the cloud. The complaint is never about the infrastructure itself.
The questions they bring to us have one thing in common: Cost.
The question is usually some version of: “We moved our servers to the cloud, and we’re not saving money. Can you tell us why?” These companies come to us looking for an objective third party to explain why the bill for cloud servers is as high as it was for on-prem servers.
If you simply make a one-to-one transition, which is what we call a lift-and-shift approach, you are missing critical cost savings and opportunities for innovation and workflow improvements.
Back in the “early” days, engineers provisioned resources for the busiest day or season of the year. But, not every day is Black Friday or Cyber Monday.
That meant buying bigger and heftier servers to manage peak loads. One of the most significant advantages of a cloud provider, though, is they make a huge upfront investment in infrastructure, so you don’t have to.
In the cloud, it’s possible to maximize server utilization on a larger stack of smaller machines. It’s much easier to fine-tune capacity on a granular level using smaller (and therefore less expensive) equipment.
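A quick back-of-envelope sketch makes the point. The numbers below (requests per second, instance capacities) are hypothetical, chosen only to illustrate how one big peak-sized box wastes capacity on a normal day, while a stack of small instances can track actual demand:

```python
import math

PEAK_LOAD = 1000      # hypothetical requests/sec at peak (e.g., Black Friday)
TYPICAL_LOAD = 150    # hypothetical requests/sec on a normal day

def wasted_capacity(provisioned: int, demand: int) -> int:
    """Capacity you pay for but don't use."""
    return provisioned - demand

# On-prem: one big box sized for the peak, running all year.
big_server_capacity = PEAK_LOAD
onprem_waste = wasted_capacity(big_server_capacity, TYPICAL_LOAD)

# Cloud: small instances (50 req/sec each), scaled to today's demand.
SMALL_INSTANCE_CAPACITY = 50
instances_needed = math.ceil(TYPICAL_LOAD / SMALL_INSTANCE_CAPACITY)
cloud_waste = wasted_capacity(
    instances_needed * SMALL_INSTANCE_CAPACITY, TYPICAL_LOAD
)

print(f"on-prem idle capacity: {onprem_waste} req/sec")  # 850 req/sec idle
print(f"cloud idle capacity:   {cloud_waste} req/sec")   # 0 req/sec idle
```

The granularity is the point: with 50-unit building blocks you can land almost exactly on demand, where the peak-sized box idles at 85% most of the year.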
Additionally, not only do you save on the actual cost of depreciating equipment, you don’t have to invest in the human capital to maintain it. How many hours a month are you currently spending on your “people” time to babysit those servers? In the cloud scenario, the cloud providers do most of that heavy lifting for you.
When we work with clients to migrate servers and apps into the cloud, we nearly always find opportunities to implement an Infrastructure-as-Code (IaC) framework. IaC defines the entire environment for an application and makes deploying infrastructure repeatable, consistent, and fast.
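In practice you would use a tool like Terraform or AWS CloudFormation, but the core idea fits in a toy sketch: the environment is described as data, and every environment is rendered from the same template. The dict below is an illustrative stand-in, not a real provider schema:

```python
# Toy illustration of the IaC idea: the whole environment is data, so
# every environment is generated from one declarative template.
# (Real projects would use Terraform, CloudFormation, or similar.)

ENVIRONMENT_TEMPLATE = {
    "app_server": {"count": 4, "size": "m5.large"},
    "web_server": {"count": 2, "size": "t3.medium"},
    "cache":      {"count": 1, "size": "cache.t3.small"},
    "database":   {"count": 1, "size": "db.r5.large"},
}

def render(env_name: str, template: dict) -> list[str]:
    """Expand the template into the concrete resources to provision."""
    resources = []
    for role, spec in template.items():
        for i in range(spec["count"]):
            resources.append(f"{env_name}-{role}-{i} ({spec['size']})")
    return resources

# QA, staging, and production can't drift apart,
# because they all come from the same template.
qa = render("qa", ENVIRONMENT_TEMPLATE)
prod = render("prod", ENVIRONMENT_TEMPLATE)
assert len(qa) == len(prod) == 8
```

The instance sizes are placeholders; the property that matters is that changing the template changes every environment the same way.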
More likely than not, your server patching/update process looks like this: You have an operations team manually logging on to one of many QA servers, making a firewall change or installing a patch, updating Java (or whatever the update is), and then going through the wash/rinse/repeat of that time-consuming process until all of your servers in that environment are at parity.
Then, the operations team asks the QA team to run smoke tests to ensure the app still works as expected under the new hardware config. With a (hopefully) green light from QA, the operation team logs on to the staging/UAT servers. One by one, the operations team goes through the entire wash/rinse/repeat cycle again for this environment.
This wash/rinse/repeat happens for each of your environments (QA, test, staging, UAT, pre-production, production, etc.), with a slight variation in production.
Once the update gets to prod, people are less inclined to initiate the smoke tests as those may write data or trigger orders in your live system. So, everyone generally hopes things are good (I mean, they got the green light in the other environments, right?), and your end-users won’t start submitting tickets about performance issues. The push to prod often involves a lot of wishful thinking rather than confidence.
Sound about right?
What if the team were confident that everything would work right the first time?
IaC changes all of that.
Because the process is automated and templated, the team doesn’t launch updates and cross their fingers. IaC saves operations engineers tons of time and improves their confidence in the update once it goes live.
That wash/rinse/repeat process becomes automatic and is no longer a headache. The critical takeaway, and where the cost reductions come from, is the time saved. That’s where you can find some hefty savings.
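The automated pipeline described above can be sketched in a few lines. The patch and smoke-test bodies here are placeholders (a real pipeline would rebuild servers from an updated template and hit health-check endpoints), but the structure is the point: one templated update function, applied environment by environment, with a test gate before each promotion:

```python
# Sketch of the automated wash/rinse/repeat: one templated update,
# applied to every environment, gated by smoke tests before promotion.
ENVIRONMENTS = ["qa", "staging", "production"]

def apply_patch(env: str, servers: list[str]) -> None:
    # Placeholder: real IaC would rebuild servers from an updated template.
    for server in servers:
        print(f"[{env}] patched {server}")

def smoke_test(env: str) -> bool:
    # Placeholder: a real pipeline would call health-check endpoints here.
    return True

def rollout(servers_by_env: dict[str, list[str]]) -> list[str]:
    promoted = []
    for env in ENVIRONMENTS:
        apply_patch(env, servers_by_env[env])
        if not smoke_test(env):
            break  # stop the pipeline; later environments are never touched
        promoted.append(env)
    return promoted

servers = {env: [f"{env}-app-{i}" for i in range(4)] for env in ENVIRONMENTS}
assert rollout(servers) == ["qa", "staging", "production"]
```

No one logs on to a server; if QA’s smoke test fails, staging and production are never touched, and nothing reaches prod on wishful thinking.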
Picture, if you will, an application environment that requires running four app servers, two web servers, a caching platform, and a database, all with their own respective firewall rules and load balancers in front of them.
Let’s also assume you have three environments where this setup is replicated: QA, staging, and production. Are all of these environments running 24/7? Surely your production environment should, but…does the QA team really work 24/7?
All of the major cloud providers (AWS, Microsoft Azure, and Google Cloud) offer pay-as-you-go models, meaning you pay only for what you use, instead of running an on-prem server 24/7 or paying a third-party hosting provider to keep those servers running around the clock.
Shutting down servers when you’re not using them is central to saving money in the cloud. With an IaC framework, you can spin up your environments at the push of a button. Instead of running the testing environment all the time, think of the savings you could realize if the QA team clicked the IaC button to get those servers up and running when—and only when—they need them.
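The savings are easy to quantify. The hourly rate below is a made-up figure for illustration; the arithmetic is what matters. Running a QA stack only during business hours instead of 24/7 cuts its compute bill by roughly two thirds:

```python
# Back-of-envelope savings from running QA only during working hours.
# The hourly rate is a hypothetical figure for illustration only.
HOURLY_RATE = 0.50          # assumed $/hour for the whole QA stack
HOURS_ALWAYS_ON = 24 * 30   # a 30-day month, running 24/7
HOURS_BUSINESS = 10 * 22    # 10-hour days, 22 working days

always_on_cost = HOURS_ALWAYS_ON * HOURLY_RATE   # $360.00
on_demand_cost = HOURS_BUSINESS * HOURLY_RATE    # $110.00
savings_pct = 100 * (1 - on_demand_cost / always_on_cost)

print(f"always-on: ${always_on_cost:.2f}, on-demand: ${on_demand_cost:.2f}")
print(f"savings: {savings_pct:.0f}%")  # about 69%
```

Multiply that across QA, staging, UAT, and pre-production, and the non-production environments alone can fund a good chunk of the migration.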
Running a serverless application on-prem isn’t typical. After all, apps run on servers, right? Don’t they have to? Not necessarily.
Cloud providers have found a way to offer serverless technologies to the masses. By refactoring your applications to a cloud-native or serverless technology, you will see immediate cost benefits.
Serverless deployments theoretically can scale to infinity. Using serverless, you can scale as quickly as the largest and most well-provisioned companies on the planet, like Netflix, Google, or Amazon. When your system gets hammered, your cloud provider can scale up to handle that inbound demand seamlessly and without any performance issues.
On the flip side, with serverless, you don’t pay anything if an app doesn’t get any requests. You don’t need operations teams to manage that server (what server?). That’s all offloaded to your cloud provider. As a developer, you just focus on your code. You can hand off that code to your provider and ask them to run it whenever it’s requested.
Beyond infinite scaling, serverless deployments are phenomenally cost-effective. Here’s a recent, real-world example.
We converted a data processing application for a particular client to serverless. The application processes 4 TB of data. In the server-based version of this process, the run took 20 hours on 50 Windows-based servers, at a total cost of $1,900 for that compute power.
After we converted to a serverless application, that same amount of data was processed by 7,500 concurrent Lambda functions in only two hours.
The cost? $8.00.
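The figures above came from a real engagement; as a generic illustration, here is how Lambda’s pay-per-use pricing composes. The rates below are AWS’s commonly published us-east-1 list prices (per GB-second of compute plus per invocation) and may differ by region or change over time; the workload numbers are hypothetical:

```python
# Rough AWS Lambda cost model using its published pay-per-use pricing.
# Rates are us-east-1 list prices; check your region's current pricing.
GB_SECOND_RATE = 0.0000166667    # $ per GB-second of compute
REQUEST_RATE = 0.20 / 1_000_000  # $ per invocation

def lambda_cost(invocations: int, avg_duration_s: float,
                memory_gb: float) -> float:
    """Estimated cost: compute (GB-seconds) plus request charges."""
    compute = invocations * avg_duration_s * memory_gb * GB_SECOND_RATE
    requests = invocations * REQUEST_RATE
    return compute + requests

# Hypothetical workload: 100,000 invocations, 2 s each, at 256 MB.
cost = lambda_cost(100_000, 2.0, 0.25)
print(f"${cost:.2f}")  # $0.85
```

You pay for GB-seconds actually consumed, not for servers sitting idle between runs, which is why short, bursty workloads come out so cheap.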
Amazon is one of the most well-provisioned companies on earth and can scale instantly to meet sudden traffic spikes. Instead of paying for hardware, you pay only for the work that actually runs.
These can be abstract concepts if you’re not already experienced with cloud strategies, including AWS. Your engineers may have the skill set to teach themselves how to do this, but you could be paying for their learning curve, pushing the ROI of your cloud investment out even further. Your team may also appreciate the opportunity to work with developers who already know what they’re doing and can pass that knowledge on to them.
Sketch has the tools to help right-size your instances, monitor your environment, and tell you if you’re overprovisioned. We can also find opportunities to scale back to smaller servers and uncover other strategies to reduce cost and increase efficiency. We also have tools to help you automate the shutdown/startup of your systems.
We aren’t just another managed cloud service provider. We have the development expertise to help you refactor and build applications for the cloud and leverage all the best tools the cloud offers.
Let us know if you need help with your existing cloud deployment or someone to manage your cloud transition. We’ve helped dozens of companies with successful cloud migrations, and we’re ready to help you with yours. Contact us today.
For more information on this topic, check out our upcoming webinar below!