Clustering Cinc and Chef Server Part 1
This article is the first in a series about setting up Cinc or Chef as a horizontally scalable service. This first post introduces which parts of the service need to be broken out to form a cluster.
Standing up a standalone Chef/Cinc server is surprisingly simple thanks to the Omnibus deployment. Grab a single command line from the website, pipe it through bash, and in about 5 minutes you have a Chef/Cinc server ready to talk HTTPS.
# curl -L https://omnitruck.cinc.sh/install.sh | sudo bash -s -- -P cinc-server -v 14
# cinc-server-ctl reconfigure
Sure enough, we have a brand new Chef or Cinc server, ready to manage hundreds, or even thousands, of clients. It almost feels too easy, doesn’t it? Let’s take a look at what we have.
From our perspective, it’s just a Chef/Cinc server, taking and serving requests. Under the covers, though, there’s a bit more to it. Things, at least at first, are just fine!
As time passes, it becomes clearer that our entire infrastructure depends on that server never, ever going down. Something as simple as a kernel upgrade on your Chef/Cinc server can leave the entire organization rudderless. Your organization loses the ability to scale out new app servers, because Chef is down. SecOps loses the ability to patch zero-days, because the server is down. Tools that rely upon inventory management lose the ability to watch systems. It can get ugly. It’s our responsibility to avoid single points of failure like these.
Thankfully, there is a way to run as many Cinc (or Chef) servers as we want! The API server itself is stateless, so we can run any number of them as long as we externalize the PostgreSQL database and the OpenSearch cluster. The rest of the stack (nginx, Redis, RabbitMQ) can stay right where it is.
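Because every API server is interchangeable, whatever sits in front of them (a load balancer, or just you at a terminal) needs a way to tell which nodes are healthy. Here is a minimal sketch of such a probe. It assumes each server answers on the `/_status` endpoint with a JSON body whose top-level `status` is `"pong"` when healthy. The hostnames in the usage note are hypothetical, and the function names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch of a health probe for several Cinc/Chef API servers.
# Assumption: a healthy server returns JSON containing "status":"pong"
# from GET /_status. Adjust the match if your version reports differently.

# Reads a /_status response body on stdin; succeeds if it reports "pong".
status_is_pong() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"pong"'
}

# Probes one server (base URL in $1) and prints OK or FAIL for it.
check_server() {
  local url="$1"
  if curl -fsS --max-time 5 "${url}/_status" | status_is_pong; then
    echo "OK   ${url}"
  else
    echo "FAIL ${url}"
    return 1
  fi
}

# Probe every base URL given on the command line (none given: do nothing).
for server in "$@"; do
  check_server "$server" || true
done
```

Saved as, say, check-cinc.sh, it could be run as "bash check-cinc.sh https://cinc1.example.com https://cinc2.example.com". A fresh install typically presents a self-signed certificate, so curl will need to trust that certificate before the probe can succeed.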
Things are slightly more complicated than the above chart, but not by much! First, redundancy for the database server must be addressed, either by using Amazon RDS in Multi-AZ mode (which handles failover for you) or by manually setting up replication and failover for the database server.
Setting up redundancy for OpenSearch is less important, as the data can be regenerated at any time by running “cinc-server-ctl reindex” on any of the Cinc API servers. That said, setting up replication for OpenSearch is rather easy and should be done if you have the money to cover the cost of three systems.
Amazon RDS and Amazon OpenSearch can handle that redundancy for you, but if you want to build a fully redundant architecture build-it-my-own-self style, you’re looking at the following:
This series of articles is going to explain, step by step, how to build and deploy a scalable, clustered Cinc/Chef setup within Kubernetes!
In our next article, we will cover the basic configuration changes needed to externalize the database and the search engine.