Using the Kubernetes API Server and Custom Resources to Create an Inventory Management Database
Inventory management is an unglamorous but necessary part of operations work. It answers basic questions such as: Which machines belong to which teams? Which machines are currently in use? How long have they been in service? Getting reliable answers to these questions, however, can be difficult, especially when a team works in a fast-paced, ever-changing environment.
Our solution, which makes use of the Kubernetes API server and custom resource definitions, makes inventory management significantly more robust and scalable. As we'll show, the same features are suitable for a variety of other applications as well.
Staying ahead of an expanding fabric is a difficult task.
Our developers ship code from their laptops into containers running on our platform. The team maintains a large and constantly growing environment that, while not massive by modern enterprise standards, is substantial: we currently manage over 1,000 machines, most of them hosted on physical hardware from three different infrastructure providers, including our own bare-metal data center. In addition, we recently added a dedicated European region to our infrastructure, a necessary move that introduced a new layer of complexity.
As the Container Fabric platform grows, our current approach to inventory management is likely to become an operational bottleneck. Operating across several locations has already increased the number of secure network connections our orchestration services must traverse.
We devised a new inventory management solution that uses the Kubernetes API server's custom resource definition features to create a single source of truth for our orchestration services. Even better, this approach is built on well-known technology, and we have already found ways to apply it to other applications.
The issue with large-scale inventories
The Container Fabric software environment currently runs on Mesosphere DC/OS, an open-source distributed operating system built on Apache Mesos that uses Marathon to deliver container orchestration services.
Container Fabric was created before Kubernetes’ mainstream adoption. We chose DC/OS because it was the most stable and trustworthy option for our needs at the time, and it has performed well for us in terms of operating services. Managing DC/OS, on the other hand, has proven to be a challenge. As a result, the Container Fabric team is working on a Kubernetes migration.
Our previous approach to inventory management used Ansible and Terraform to describe, enumerate, configure, and manage our infrastructure. Setting up a new machine required a Terraform run, which requested provisioning from our infrastructure providers. Once a provider produced the machines we needed, we used Ansible to query that provider for the same resources we had just requested through Terraform, and then to put each host into service.
This technique required complete network connectivity between all of the components. For example, the Ansible instance in charge of orchestration used a VPN to interact with the infrastructure provider, and it used the same VPN to talk to the newly provisioned machines to get them up and running.
Unless we could find a way to simplify tracking and managing our Container Fabric infrastructure, we were looking at a highly complex future.
However, we quickly recognized that Container Fabric's real obstacle was an inventory problem, not a connectivity problem. To keep an always up-to-date picture of what was happening "on the ground" in our infrastructure, we needed current data on our machines: names, locations, and states. The orchestration system could then make the necessary VPN connection to a region and cluster to change machine configuration when needed. Knowing what was running and where it was running was crucial to cutting down on toil as we grew.
Looking for a solution
At this point, we began looking for answers to our inventory problem. In our judgment, the first three options we evaluated were fatally flawed.
Option 1: Build a solution on Terraform state data. Most Terraform users are familiar with the terraform.tfstate file, a custom JavaScript Object Notation (JSON) file that records a mapping from the Terraform resources in your templates to their real-world representations. We looked at Terraform state data as a foundation for a distributed and scalable inventory management solution, but it had at least two major flaws.
First, the file only changes when an on-demand event occurs, such as a Terraform plan or a refresh command. It does not reflect the infrastructure in real-time.
Second, because we’re working with a simple, static text file, any change, no matter how minor, would have necessitated a complete rewrite. This raised the chance of numerous clients stomping on each other while updating.
Option 2: Invest in a commercial DCIM system. Device42 was one of several data center infrastructure management (DCIM) products we evaluated. These are designed for companies that run their own data centers, and we certainly fit the bill.
DCIM products excel at answering "where's my stuff?" questions: Where is a specific machine? Which rack is it in? Which PDUs is it connected to? How many hard disk drives does it have? However, such solutions are less effective at modeling and managing the higher-level abstractions (such as clusters and regions) required in an inventory management solution.
Option 3: Modify DC/OS to accommodate inventory management. Finally, we examined DC/OS to see if any capabilities could be used for an inventory solution. Unfortunately, DC/OS isn’t designed for this job; Mesos, the technology at its core, is designed for a more responsive pattern. Mesos does not keep track of the “desired state” of the computers in a cluster or even the services on a single machine.
It's worth mentioning that DC/OS's default installation includes Apache ZooKeeper, a distributed coordination service, and etcd, a distributed key-value store. We investigated both for implementing the features we needed. However, the resulting solution would not have handled higher-level abstractions, and it would not have offered any advantage over deploying our own installation of these services.
Important lessons learned
Although we were unable to discover a suitable inventory management solution during our initial search, it did assist us in defining the requirements and features we required:
- A single source of truth for inventory data. We would no longer have to split code paths based on infrastructure provider.
- The ability to update from any location without a VPN connection. This was a requirement across all of our data centers.
- Compatibility with Ansible dynamic inventory. We needed this because, while we were looking for a new inventory management solution, we wanted to preserve our current orchestration strategy for the time being.
- Production-grade performance. This was an obvious requirement if we wanted to expand the solution beyond inventory management to monitoring and other activities.
- An object representation that is locally meaningful and extensible. We wanted the flexibility to work with higher-level abstractions beyond the typical "where's my stuff?" questions that DCIM solutions are known for.
Fourth time's the charm: the Kubernetes API server
Kubernetes got on our radar as an inventory management solution because it has something DC/OS doesn't: a built-in, centralized configuration service.
The kube-apiserver is one of the core components of a Kubernetes cluster: it manages the declarative state of all the objects that describe the cluster's behavior. It ships with a library of predefined object types, such as Deployments, StatefulSets, and Services. The kube-apiserver also supports custom resources. These are primarily intended to help deploy more complex applications, but there's no reason they can't be used for other purposes.
Indeed, we discovered that kube-apiserver could handle all of our inventory management needs. However, we would first need to build three additional components:
- An object structure that defines our custom resources for the kube-apiserver; in practice, a blob of YAML.
- A "fetcher": a service that runs inside each infrastructure provider to enumerate the hosts operating there and report them back to the kube-apiserver. This populates the kube-apiserver with host objects that reflect the individual machines running at a particular provider (a rough sketch of such a service appears after this list).
- A dynamic inventory script that lets Ansible interface with the kube-apiserver and consume the host objects that the fetchers keep populated.
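As a rough illustration of the second component, the sketch below shows what a fetcher loop could look like using the Kubernetes Python client. The list_provider_hosts() helper, the inventory.example.com API group, and the field names are hypothetical stand-ins, not our production code.

```python
# Hypothetical fetcher sketch: enumerate one provider's machines and upsert
# matching Host custom objects into the kube-apiserver.
from kubernetes import client, config
from kubernetes.client.rest import ApiException

GROUP, VERSION, NAMESPACE, PLURAL = "inventory.example.com", "v1alpha1", "default", "hosts"

def list_provider_hosts():
    """Placeholder for a call to the infrastructure provider's own API."""
    return [{"name": "web-001", "region": "us-east", "role": "web"}]

def upsert_hosts():
    config.load_kube_config()  # or load_incluster_config() when run in-cluster
    api = client.CustomObjectsApi()
    for machine in list_provider_hosts():
        body = {
            "apiVersion": f"{GROUP}/{VERSION}",
            "kind": "Host",
            "metadata": {
                "name": machine["name"],
                "labels": {"provider": "provider-a", "role": machine["role"]},
            },
            "spec": {"region": machine["region"], "role": machine["role"]},
        }
        try:
            api.create_namespaced_custom_object(GROUP, VERSION, NAMESPACE, PLURAL, body)
        except ApiException as err:
            if err.status == 409:  # the object already exists, so patch it instead
                api.patch_namespaced_custom_object(
                    GROUP, VERSION, NAMESPACE, PLURAL, machine["name"], body)
            else:
                raise

if __name__ == "__main__":
    upsert_hosts()
```

In practice one such service would run per provider, so each region only needs connectivity from the fetcher to its provider's API and to the kube-apiserver.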
Putting our kube-apiserver solution into action
Our deployment was straightforward, and much simpler than a conventional Kubernetes deployment. We didn't use Kubernetes in the usual sense; instead, we ran only the kube-apiserver, kube-controller-manager (for certificate management), and etcd3 (for object storage). We didn't deploy any pods or workloads, and we didn't set up any pod networking. There was less to do, and there were fewer ways for the process to go wrong.
Below are condensed examples of the custom resource definition (CRD) and the Go struct that represent a host in our new inventory system.
In YAML, here’s a condensed version of our host CRD:
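(The version shown here is a reconstructed sketch for illustration: the inventory.example.com API group, version name, and schema fields are simplified placeholders rather than the exact values in use.)

```yaml
# Hypothetical, condensed Host CRD; group name and schema fields are illustrative.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: hosts.inventory.example.com
spec:
  group: inventory.example.com
  scope: Namespaced
  names:
    kind: Host
    plural: hosts
    singular: host
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:              # desired state of the machine
              type: object
              properties:
                provider:
                  type: string
                region:
                  type: string
                role:
                  type: string
            status:            # observed state reported by the fetcher
              type: object
              properties:
                state:
                  type: string
                lastSeen:
                  type: string
```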
And here is how the host looks as a Go struct, which we use for internal development:
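(Again, this is a reconstructed sketch; the field names inside HostSpec and HostStatus are illustrative assumptions that mirror the CRD above.)

```go
package inventory

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// HostSpec holds the desired state of a machine.
type HostSpec struct {
	Provider string `json:"provider"`
	Region   string `json:"region"`
	Role     string `json:"role"`
}

// HostStatus holds the machine's observed state.
type HostStatus struct {
	State    string `json:"state"`
	LastSeen string `json:"lastSeen"`
}

// Host mirrors the CRD: ObjectMeta carries labels and annotations,
// Spec the desired state, and Status the current state.
type Host struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   HostSpec   `json:"spec,omitempty"`
	Status HostStatus `json:"status,omitempty"`
}
```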
The annotations and labels for the CRD are stored in the ObjectMeta data. Spec represents the host object's desired state, while Status represents its current state.
Labels and annotations are essential for operations like searching and filtering; for example, you can see all hosts running as log machines on a particular provider’s infrastructure using the command line or an API request. Annotations are also used to hold data about the host that can be used in later setup scripts.
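As a rough illustration of that kind of query, the sketch below lists Host objects by label using the Kubernetes Python client; the group, plural, and label keys are the same hypothetical placeholders used above.

```python
# Hypothetical label-based query: list all Host objects labeled as log
# machines at a particular provider, equivalent to a kubectl label selector.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()
hosts = api.list_cluster_custom_object(
    group="inventory.example.com",
    version="v1alpha1",
    plural="hosts",
    label_selector="provider=provider-a,role=log",
)
for item in hosts.get("items", []):
    print(item["metadata"]["name"])
```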
Working with the solution
Taken together, the system provides an infrastructure inventory database that can be extended indefinitely.
A database is only useful if it has clients, which in our case means an Ansible dynamic inventory script. Ours, like many others, is written in Python and uses the Kubernetes pip package as its kube-apiserver client.
From our kube-apiserver, the Kubernetes Python library delivers an extensive dictionary of host objects, which we convert into JSON output suited for Ansible. As before, we use Ansible for our orchestration services.
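A condensed sketch of how such a dynamic inventory script could be structured is shown below. The object layout and grouping rules are simplified placeholders, and a complete script would also handle Ansible's --list and --host arguments.

```python
#!/usr/bin/env python3
# Hypothetical Ansible dynamic inventory sketch: read Host custom objects
# from the kube-apiserver and emit Ansible's JSON inventory format.
import json
from collections import defaultdict
from kubernetes import client, config

GROUP, VERSION, PLURAL = "inventory.example.com", "v1alpha1", "hosts"

def build_inventory():
    config.load_kube_config()
    api = client.CustomObjectsApi()
    objs = api.list_cluster_custom_object(GROUP, VERSION, PLURAL)

    inventory = defaultdict(lambda: {"hosts": []})
    hostvars = {}
    for obj in objs.get("items", []):
        name = obj["metadata"]["name"]
        spec = obj.get("spec", {})
        # Group hosts by role and region so playbooks can target them directly.
        inventory[spec.get("role", "ungrouped")]["hosts"].append(name)
        inventory[spec.get("region", "unknown-region")]["hosts"].append(name)
        hostvars[name] = {
            "provider": obj["metadata"].get("labels", {}).get("provider"),
            "annotations": obj["metadata"].get("annotations", {}),
        }

    result = dict(inventory)
    result["_meta"] = {"hostvars": hostvars}
    return result

if __name__ == "__main__":
    print(json.dumps(build_inventory(), indent=2))
```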
Why kube-apiserver performs well at scale
Adapting the kube-apiserver to serve as an inventory management database has significantly improved our operational capabilities. We can declare desired cluster states and build decomposed cluster services: clients of the kube-apiserver don't need to know about each other, even when they're working on the same object at the same time.
As a result, we’re no longer bound by a staged, manual, and highly iterative process involving a great deal of back-and-forth communication. We’re now far better equipped to function at scale and confidently move forward in a multi-regional future.
There are a few more reasons why Kubernetes, particularly kube-apiserver, is well-suited for this task:
- Kubernetes supports mutual TLS authentication. This eliminates the VPN and network connectivity requirements that would have introduced more complexity at scale than we were willing to accept.
- The kube-apiserver manages consistency ordering. If a client tries to update an object that another client has already changed, the kube-apiserver notifies it of the conflict. This lets us make dynamic configuration changes, such as deploying monitoring logic to new hosts as they're configured, without relying on git and Ansible (a sketch of how a client might handle such a conflict follows this list).
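Below is a sketch of how a client could handle that notification, assuming the same hypothetical Host objects used earlier: it re-reads the object and retries when the kube-apiserver reports a conflict.

```python
# Hypothetical sketch of the optimistic-concurrency behavior described above:
# re-read and retry when the kube-apiserver reports that another client has
# already updated the same Host object (HTTP 409 Conflict).
from kubernetes import client, config
from kubernetes.client.rest import ApiException

GROUP, VERSION, NAMESPACE, PLURAL = "inventory.example.com", "v1alpha1", "default", "hosts"

def set_host_role(api, name, role, retries=3):
    for _ in range(retries):
        host = api.get_namespaced_custom_object(GROUP, VERSION, NAMESPACE, PLURAL, name)
        host["spec"]["role"] = role
        try:
            # replace_* sends the object's resourceVersion back; the apiserver
            # rejects the write if someone else changed the object in between.
            return api.replace_namespaced_custom_object(
                GROUP, VERSION, NAMESPACE, PLURAL, name, host)
        except ApiException as err:
            if err.status != 409:
                raise  # only conflicts are retried
    raise RuntimeError(f"could not update {name} after {retries} attempts")

if __name__ == "__main__":
    config.load_kube_config()
    set_host_role(client.CustomObjectsApi(), "web-001", "log")
```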
A solution that will stand the test of time
As previously stated, other applications for our kube-apiserver-based inventory management solution are easy to imagine. We've already built a polling system that generates monitoring configuration from our inventory. As the hosts in our fleet change, we need to dynamically create or destroy monitoring for each host; previously, every change to the fleet required a git commit to add or remove the hostname. Our monitoring system now subscribes to changes in the host objects and creates alert configuration automatically, without our intervention.
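The sketch below illustrates the general pattern, again assuming the hypothetical Host objects from earlier: a watch on the host resources streams change events, and each event triggers regeneration of the alert configuration.

```python
# Hypothetical sketch of subscribing to Host changes: stream add/update/delete
# events and regenerate alert configuration when the fleet changes.
from kubernetes import client, config, watch

GROUP, VERSION, PLURAL = "inventory.example.com", "v1alpha1", "hosts"

def regenerate_alert_config(event_type, host_name):
    """Placeholder for rewriting the monitoring/alerting configuration."""
    print(f"{event_type}: regenerating alerts for {host_name}")

def watch_hosts():
    config.load_kube_config()
    api = client.CustomObjectsApi()
    w = watch.Watch()
    # stream() keeps a long-lived connection open and yields one event per change.
    for event in w.stream(api.list_cluster_custom_object, GROUP, VERSION, PLURAL):
        host = event["object"]
        regenerate_alert_config(event["type"], host["metadata"]["name"])

if __name__ == "__main__":
    watch_hosts()
```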
We took this a step further and built an Ansible change orchestration system. By placing fields in each host's spec stanza, we can have a daemon respond to them on its own: triggering Ansible runs, moving the host in and out of maintenance, or upgrading the operating system. Having one authoritative, universally accessible source for the desired host state gave us the foundation we needed to dramatically change our day-to-day operations with software.
This approach has also served as an excellent springboard for our longer-term plans to move from DC/OS to Kubernetes, a journey that many other operations teams are considering.