A few months ago I had the pleasure of attending KubeCon/CloudNativeCon in Austin, TX and giving a talk about the costs of running cloud-native infrastructure. In my opinion, Austin was a good pick for a conference location - the BBQ scene there is nothing short of fabulous. I had never before been to a city where BBQ ribs and brisket are taken that seriously.
I was glad to see that many people were interested in my talk. For anyone who wants to check out the presentation, it is available on YouTube. A more detailed explanation of the model and its parameters can be found in my previous post. Apparently, a lot of organizations and people are interested in what the bill for infrastructure in the cloud or on-prem can look like, and in understanding where the money goes. However, some of the feedback I got was that I could also have discussed the costs of building cloud-native setups from the ground up. Even though that was a bit outside the scope of the original talk, I decided to write this blog post specifically about it.
I based these estimates on my personal experience, data reported in various discussions on Hacker News, and individual conversations. If any of these numbers seem strange to you, or you have had a drastically different experience, please contact me at @dyachuk or on Slack. I would be happy to hear from you and refine some of these numbers. Also, if you want to receive notifications about our other blog posts, you can sign up for our mailing list.
What do we need to factor in to build a Kubernetes ecosystem on bare-metal servers?
Before I drill into the details of this question, let me state the main assumptions. I assume that we are starting from scratch, meaning there is nothing other than a credit card to pay the bills. I assume that the staff has a good knowledge of Linux, filesystems, and the fundamental principles of distributed computing, networking, etc. This is basically the kind of operations person you can find in most organizations.
All the time estimates you will find below are highly subjective, as different engineers have very different backgrounds. Most of the numbers I used came from personal conversations with people at ground zero. The numbers still varied hugely depending on the company and the specific person. I tried to average them out and stick with the more conservative numbers.
Now let’s focus on what needs to be included in the complete Kubernetes ecosystem. What are the components without which it won’t function?
- Network fabric
- Racks and PDUs
- NAS (optional, ranging from redundant to hyper-converged storage)
- Inventory management
- Time server
- OS provisioning
- Configuration manager
- Container orchestrator
- Software Defined Networking
- Monitoring and alerting
- Converged storage, e.g., Ceph
- Database backups
The following components are non-essential, but they are always useful to have and make life easier:
- Backup servers
- Request tracing
- Service mesh
- Secret storage
- User management
I’ve tried making this list as comprehensive as possible. Now let me dissect and discuss the list of essential components item by item.
Servers
For starters, you are going to need servers. There are many different options available on the market, so you should evaluate a few of them to see which suits your needs best. The selection process might take a few weeks; then you will probably want to run a few performance tests to ensure the performance is up to your standards. I prefer sysbench and some of the Phoronix suites. If you have limited time, focus at least on testing the IOPS of the disks, to make sure you can get close to the fantastic advertised specs.
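If you want a quick sanity check before reaching for the heavier tools, the idea behind a random-read IOPS test can be sketched in a few lines of Python. This is only an illustration: the file size and read count are arbitrary, and the page cache will inflate the numbers, so rely on sysbench or fio (with direct I/O) for real results:

```python
import os
import random
import tempfile
import time

# Tiny random-read probe (POSIX only; os.pread is not available on Windows).
BLOCK = 4096                      # typical 4 KiB I/O size
FILE_SIZE = 16 * 1024 * 1024      # 16 MiB test file (far too small for a real benchmark)
READS = 1000

# Create a throwaway file filled with random data.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(FILE_SIZE))
    path = f.name

fd = os.open(path, os.O_RDONLY)
offsets = [random.randrange(0, FILE_SIZE // BLOCK) * BLOCK for _ in range(READS)]

start = time.perf_counter()
for off in offsets:
    os.pread(fd, BLOCK, off)      # positioned read, no seek bookkeeping
elapsed = time.perf_counter() - start

os.close(fd)
os.unlink(path)

# The page cache makes this wildly optimistic; fio with O_DIRECT gives honest numbers.
print(f"{READS / elapsed:.0f} random reads/s")
```

The point of even a crude probe like this is to catch gross mismatches early, before you commit to a hardware order.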
Network fabric
Connecting the servers and ensuring that packets flow between them smoothly and are not lost on the way can take a fair bit of effort and experience. So unless you already know which network gear you need, allocate at least two weeks for picking hardware, two weeks for testing the configuration, and a few weeks for deploying it and benchmarking its performance.
Racks and PDUs
Choosing racks, ordering them, and installing PDUs is not such a complicated procedure and in total won't take more than a week or two. It is essential to keep track of the power draw of the servers and networking gear you plan to put in those racks. If you don't provision enough power for peak load, you may experience an outage, as the power consumed by the servers grows with their utilization. On the other hand, power is a substantial expense, and over-provisioning can be costly. Maintaining a simple spreadsheet usually does a good enough job of getting it right.
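That spreadsheet logic can be sketched in a few lines; the wattages and quantities below are illustrative assumptions, not vendor figures:

```python
# Minimal power-budget bookkeeping, the kind you would otherwise keep in a spreadsheet.
# Per-device peak draw in watts and quantity -- illustrative numbers, check your vendor specs.
devices = {
    "2U server": (400, 20),   # (peak watts per unit, quantity)
    "ToR switch": (150, 2),
}
headroom = 0.8  # keep peak draw at or below 80% of the PDU rating

peak_watts = sum(watts * qty for watts, qty in devices.values())
pdu_rating_watts = peak_watts / headroom
print(f"peak draw: {peak_watts} W, minimum PDU rating: {pdu_rating_watts:.0f} W")
# prints "peak draw: 8300 W, minimum PDU rating: 10375 W"
```

The headroom factor is the judgment call: too little and a utilization spike trips the PDU, too much and you pay for power capacity you never use.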
NAS
I find using NAS devices to be a rather expensive way of implementing storage for persisting container data. At Pax Automa, we advocate using the available storage capacity of the servers themselves in the form of hyper-converged storage, e.g., Ceph. If you want to run a NAS, it can take quite some time to get in touch with vendors like NetApp or Hitachi, arrange a trial, and then wait for shipping and installation.
Inventory management
Once you have all the hardware in place, you will need an inventory system to record all the serial numbers, warranties, etc. Patchmanager is frequently used for documenting the hardware topology. Foreman has a plugin for inventory management as well. In any case, budget another couple of weeks for assessing the capabilities of that software and configuring it to your liking.
OS provisioning
All of these servers will need to run an operating system of your liking, i.e., Linux. That operating system has to be updated and reconfigured from time to time. The traditional approach is to run tools like Foreman or Cobbler. The latter, for example, lets you PXE boot your servers and install a predefined OS image. Setting up these tools, plus the time needed for assessment and testing, would take another 2-4 weeks.
Time server
This is a small yet critical component which is often forgotten. Keeping time in sync across all your servers is vital for a lot of applications. Drifting clocks can cause all kinds of issues, including breaking applications which rely on timestamps in their logic. Setting up a time server, like chrony, is a pretty straightforward procedure and won't take more than a day.
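For reference, a minimal chrony server configuration is only a few lines. This is an illustrative sketch: the upstream pool and the subnet you serve are placeholders for your own environment:

```
# /etc/chrony.conf -- minimal internal time server (illustrative)
pool 0.pool.ntp.org iburst   # upstream public NTP pool
allow 10.0.0.0/8             # serve time to the internal network
local stratum 10             # keep serving even if upstream is unreachable
```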
Configuration manager
Before we jump into running Kubernetes, we need a way to configure the hosts to run kubelets correctly and provide them with all the necessary certificates. At the moment, most DIY setups use Ansible or Salt to accomplish that. The time needed to set up these tools varies hugely. It can range from one week, if you have done it before, to as long as six or even eight weeks if you are starting from scratch.
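To give a flavor of what that automation looks like, here is a hedged Ansible sketch for preparing a node to run a kubelet. The `kube_nodes` group name and the certificate paths are assumptions for illustration, not part of any standard layout:

```yaml
# playbook.yml -- illustrative sketch, not a complete kubelet rollout
- hosts: kube_nodes
  become: true
  tasks:
    - name: Install kubelet
      package:
        name: kubelet
        state: present

    - name: Distribute the node certificate (path is a placeholder)
      copy:
        src: "certs/{{ inventory_hostname }}.pem"
        dest: /var/lib/kubelet/pki/kubelet.pem
        mode: "0600"
```

A real playbook also handles the container runtime, kernel parameters, and certificate rotation, which is where most of the quoted weeks go.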
Container orchestrator (Kubernetes)
The container orchestrator, being one of the cornerstones of the ecosystem, directs the actions of the underlying components - from storage to networking. It ensures that the application containers are running and healthy, have access to the resources they need, and can be reached by clients. In this post, I focus on Kubernetes only. A proper installation of the orchestrator requires setting up the API server, the scheduler, etcd for persisting the state, and dashboards for management via the UI. You also need to think about backing up your cluster configuration, storing your secrets, and so on. Given that this is a fairly new component, it might take longer than a lot of people expect. The numbers I came across range from a month to several months before you have a production-ready setup which works with all the components.
Software Defined Networking
One of the core ideas of cloud-native infrastructure is abstracting away the physical networking. Calico, Flannel, and Weave are just a few of the many open-source software-defined networking solutions available right now. Choosing between the different options might be time-consuming and require a week. However, once you have selected an SDN to your liking, you still might need another two weeks to deploy and test it to make sure it is ready to serve production traffic.
Monitoring and alerting
Managing any application without proper monitoring is virtually impossible. Collecting, aggregating, and displaying metrics is a solved problem. Several solutions exist: Heapster, Prometheus, and InfluxDB for collecting, aggregating, and storing metrics; Grafana for presenting them in nice dashboards; Riemann and Prometheus can also take care of triggering alerts. You still need to configure proper tagging of the metrics, decide on rules for aggregating them, select the metrics on which alerts should be triggered, build dashboards, and so on. Given the variety of options available and the complexity of their dependencies, it can take from two to six weeks depending on the level of prior expertise.
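To give a concrete flavor of the alerting side, a Prometheus alerting rule might look like the following. The threshold, duration, and labels are illustrative assumptions, and the `node_cpu_seconds_total` metric assumes node_exporter is deployed:

```yaml
# rules.yml -- illustrative alerting rule, thresholds are assumptions
groups:
  - name: node
    rules:
      - alert: HighNodeCPU
        expr: 100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
```

Writing a handful of such rules is quick; agreeing on which metrics deserve to page someone at 3 a.m. is what consumes the estimated weeks.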
Log aggregation
Kubernetes, in its default configuration, has a somewhat limited ability to collect logs. All the logs are stored locally on the hosts; there is no way to archive, analyze, or aggregate them. This kind of logging will probably suffice for nano-sized setups with a handful of hosts. For more substantial installations, you will need a more advanced solution. Elasticsearch + Kibana can be a good choice in the open-source domain. The basic setup is not that complicated; however, the final tweaks - archiving, basic analytics, configuring dashboards and alerts - can take a considerable amount of time. It can be anything from one to four weeks.
Converged storage (Ceph)
As I mentioned earlier, at Pax Automa we advocate hyper-converged storage solutions, e.g., Ceph, as a cost-effective alternative to NAS. The maturity of these solutions is high, and many Fortune 500 companies rely on them to store data for their production environments. Projects like Rook can dramatically simplify the delivery of Ceph storage on Kubernetes clusters; at this point, I haven't thoroughly evaluated Rook and cannot judge its maturity. I am a firm believer that to extract the maximum performance and reliability one should run Ceph on bare metal, with full control over disk partitioning, cache configuration, the ability to use high-performance interconnects such as InfiniBand/RDMA, and so on. Such an approach might be a bit more expensive in terms of manpower, as it can take from one to several months to deliver a production-ready solution.
Cloud-native infrastructure has quite low operational costs, offers a high degree of automation, and provides strong abstractions from the underlying physical infrastructure. Often over-excited about these benefits, managers and engineers overlook the amount of manpower required to assemble it, underestimate the necessary expertise, and forget to scope out the requirements. I address this problem by identifying the essential and optional components of cloud-native infrastructure hosted on premises and estimating the cost of delivering them in a medium-sized organization. The summary breakdown is as follows:
|Task|Assess, w|Testing, w|Deployment, w|Total, w|
|---|---|---|---|---|
|Racks and PDUs|0.5|0.5|0.5|1.5|
|Software Defined Networking|1|1|1|3|
|Monitoring and alerting|2|2|2|6|
|Converged storage, e.g. Ceph|2|4|8|14|
According to this estimate, it will take about 70 weeks' worth of work to assemble the essential parts of a cloud-native infrastructure. If we assume 40 weeks per year of availability for dedicated engineers, that is almost two "person-years".
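The arithmetic behind that claim is simple:

```python
# Back-of-the-envelope conversion from the estimate above.
total_weeks = 70        # summed effort for the essential components
weeks_per_year = 40     # realistic availability of a dedicated engineer
person_years = total_weeks / weeks_per_year
print(f"{person_years:.2f} person-years")  # prints "1.75 person-years"
```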
Stay in touch
If you happen to have had a dramatically different experience building any of these components, or strongly disagree with what I have said here, give me a shout on Twitter at @dyachuk or on Slack. I would love to hear your story and update these numbers.
As usual, stay tuned and follow @pax_automa for blog updates.
At Pax Automa, we understand that a lot of organizations cannot afford to wait that long, do not have the engineers on staff, or simply cannot carry out a project at such a high cost.
That’s precisely why we have created Operos - it provides all the essential components out of the box, and you can install it in a matter of minutes. You can try it now; it is free and open-source. You can download it here.