Just a quick post this morning to mention that I’m working on a thing about deploying infrastructure with Terraform. Tentatively should have the first installment out next Monday.
For the not-yet-informed. Terraform is an infrastructure deployment tool that takes a text file description of a distributed application infrastructure, compares it to what exists, and changes reality to match the description of what you want. It does this very well, and has a quick learning curve and excellent ease of use. If you’re doing production deploys into AWS or Azure, or even Cisco ACI, this is a tool you should get to know.
If you want to understand how Terraform does it’s thing from an architecture perspective here’s a link to a 30 minute talk that I found enlightening. TL;DR Graph theory is everywhere!
That’s it for this week. May you care for yourself with ease, and may the odds ever be in your favor.
In this post we’re going to do a brief overview of Containers and Kubernetes. My aim here is to set the stage for further exploration and to give a quick back of the napkin understanding of why they’re insanely great.
From the earliest days of computers that had multitasking operating systems applications have been hosted directly on top of the host operating system. The process of deploying, testing and updating applications was slow and cumbersome to say the least. The result: mounting technical debt that eventually consumes more and more resources just to keep legacy applications operating, without any ability to update or augment them.
Virtualization enabled multiple copies of a virtual computer on top of a physical computer. This dramatically improved the efficiently of resource utilization and the speed at which new workloads could be instantiated. It also made it convenient to run a single application or service per machine.
Because of the ease in which new workloads could be deployed, In this era we started to see horizontal scaling of applications as opposed to vertical scaling. This means running multiple copies of a thing and placing the copies behind a load balancer, as opposed to making the things bigger and heavier. This is an example of how introducing a new layer of abstraction results in innovation, which is a hallmark of the history of computers.
However, Some problems emerged with the explosion of virtualized distributed systems. Horizontal scaling techniques add additional layers of complexity, which made deployments slower and more error prone. Operations issues also emerged due to the need to monitor patch and secure these growing fleets of virtual machines. It was two steps forward, one step backward.
Containers are a similar concept to Virtual machines, however the isolation occurs at the operating system level rather than the machine level. There is a single host operating system, and each application runs in isolation.
Containers are substantially smaller and start almost instantly. This allows for containers to be stored in a central repository where they can be pulled down and executed on demand. This new abstraction resulted in the innovation of immutable infrastructure. Immutable infrastructure solves the day 2 operations of care and feeding of the running workloads. However it doesn’t solve for the complexity of deploying the horizontal scaling infrastructure. This is where Kubernetes comes in.
Kubernetes: Container orchestrator
Kubernetes is a collection of components that:
Deploys tightly coupled containers (pods) across the nodes
Scales the number of pods up and down as needed
Provides load balancing (ClusterIP, Ingress controller) to the pods
Defines who the pods can communicate with
Gives pods a way to discover each other
Monitors the health of pods and services
Automatically terminates and replaces unhealthy pods
Gracefully rolls out new versions of a pod
Gracefully rolls back failed deployments
Kubernetes takes an entire distributed application deployment and turns it into an abstraction that is defined in YAML files that can be stored in a version control repository. A single K8s cluster can scale to hundreds of thousands of Pods on thousands of nodes. This abstraction has resulted in an explosion of new design patterns that is changing how software is written and architected, and has led to the rise of Microservices. We’ll cover Microservices in another post.
I hope this was helpful and gave you a useful concept of what Kubernetes is and why it’s such a game changer. Would you like to see more diagrams? Let me know your thoughts.
In the next few PKI for network engineers posts, I’m going to cover Cisco IOS CA. If you’re studying for the CCIE security lab or you’re operating a DMVPN or FlexVPN network, and you’d like to use Digital certificates for authentication, then this series could be very useful for you.
IOS-CA is the Certification Authority that is built into Cisco IOS. While not a full featured enterprise PKI, for the purposes of issuing certificates to routers and firewalls for authenticating VPN connections it’s a fine solution. It’s very easy to configure and supports a variety of deployment options.
Comes with Cisco IOS
Supports enrollment over SCEP (Simple Certificate Enrollment Protocol)
RSA based certificates only
Easy to configure
Network team maintains control over the CA
Solution can scale from tiny to very large networks
IOS-CA has the flexibility to support a wide variety of designs and requirements. The main factors to consider are the size of the network, what kind of transport is involved, how the network is dispersed geographically, and the security needs of the organization. Let’s briefly touch on a few common scenarios to explore the options.
Single issuing Root CA
For a smaller network with a single datacenter or a active/passive datacenter design, a single issuing root may make the most sense. It’s the most basic configuration and it’s easy to administer.
This solution would be appropriate when the sole purpose of the CA is to issue certificates for the purpose of authenticating VPN tunnels for a smaller network of approximately a hundred routers or less. Depending on the precautions taken and the amount of instrumentation on the network, the blast radius of this CA compromise would be relatively small and you could spin up a new CA an enroll the routers to it fairly quickly.
The primary consideration for this option is CA placement. A good choice would be a virtual machine on a protected network with access control lists limiting who can attempt to enroll. The CA is relatively safe from being probed and scanned, and it’s easily backed up.
Offline Root plus Issuing Subordinate CAs
For a network that has multiple datacenters and/or across multiple continents it would make more sense to create a Root CA, then place a subordinate Issuing CA each datacenter that contains VPN hubs. By Default IOS trusts a subordinate CA, meaning the root CA’s certificate and CRL need not be made available to the endpoints to prevent chaining failure.
As in the case with any online/Issuing CA, steps should be taken to use access controls to limit access to HTTP/SCEP
Issuing Root with Registration Authority
In this design, there is still a single issuing root, however the enrollment requests are handled by an RA (Registration Authority). An RA acts as a proxy between the CA and the endpoint. This allows for the CA to have strict access controls yet still be able to process enrollment requests and CRL downloads.
Offline Root & Issuing Subordinate CAs w/RA
The final variation is a multi-level PKI that uses the RA to process enrollment and CRL downloads. This design provides the best combination of security, scaling, and flexibility, but it is also the most complex.
Bootstrapping remote routers
Consider a situation where you’re turning up a remote router and you need to bring up your transport tunnel to the Datacenter. The Certification authority lives in the Datacenter on a limited access network behind a firewall. In order for the remote router to enroll in-band using SCEP, it would need a VPN connection. But our VPN uses digital certificates.
There are three options for solving this:
Sideband tunnel w/pre-shared key
This method involves setting up a separate VPN tunnel that uses a pre-shared key in order to provide connectivity for enrollment. This could be a temporary tunnel that’s removed when enrollment is complete, or it could be shut down and left in place for use at a later time for other management tasks.
A sideband tunnel for performing management tasks that may bring down the production tunnel(s) is useful, making this a good option. It’s main drawback is the amount of configuration work required on the remote router. It also depends on some expertise on the part of the installer, a shortcoming shared with the manual enrollment method.
A Registration Authority is a proxy that relays requests between the Certification Authority and the device requesting enrollment. In this method an RA is enabled on the untrusted network for long enough to process the enrollment request. Once the remote router has been enrolled the the transport tunnels will come up and bootstrapping is complete.
Using a proxy allows in-band enrollment with a minimum amount of configuration on the remote router, making it less burdensome for the on-site field technician. The tradeoff is we’re shifting some of that work to the head end. Because the hub site staff is likely to possess more expertise, this is an attractive trade-off.
In this method, the endpoint produces a Enrollment request on the terminal formatted as base64 PEM (Privacy Encrypted Mail) Blob. The text is copied and pasted into the terminal of the CA. The CA processes the request and outputs the certificate as a PEM file, which is then pasted into the terminal of the Client router.
While this does have the advantage of not requiring network connectivity between the CA and the enrolling router, it does have a couple of drawbacks. Besides being labor intensive and not straightforward for a field technician to work with, endpoints enrolled with the terminal method cannot use the auto-rollover feature, which allows the routers to renew certificates automatically prior to expiration. The author regards this is an option of last resort.
CRL download on spokes problem
The issue here is when the spoke router needs to download the certificate revocation list (CRL) but the CA that has the CRL is reachable only over a VPN tunnel, that cannot come up because the spoke can’t talk to the CA to download a unexpired copy of the CRL in order to validate the certificate of the head end router.
This is actually a pretty easy problem to solve. Disable CRL checking on the spoke routers, but leave it enabled on the hub routers. This way the administrator can revoke a router certificate and that router will not be allowed to join the network because the hub router will see that it’s certificate has been revoked.
Ok, so there are the basics. In the next installment, we’ll step through a minimum working configuration.
So, I’m committed to the study of coding in python for 250 hours a year. And I’m about 12 weeks in.
I have no ambitions of becoming a professional developer. I like being an infrastructure guy. So why in the heck am I making this commitment as part of my professional development?
Here’s a fundamental problem infrastructure designers & engineers of all stripes are faced with today:
The architecture, complexity, and feature velocity of modern applications overwhelms traditional infrastructure service models. The longer we wait to face this head on, the larger the technical debt will grow, and worse, we become a liability to the businesses that depend on us.
Allow me to briefly put some context around this. I’m going use some loose terminology and keep things simple and high level so we can quickly set the stage. Then I’ll explain where python fits in.
Most all software applications have three major components:
user (or application) interface
Overall, the differences in complexity come down to component placement and whether or not each function supports multiple instances for scaling and availability.
A stage 1 or basic client/server application will place all components on a single host. Users access the application using some sort of lightweight client or terminal that provides input/ouput and display operations. If you buy something at Home Depot and glance at the point of sale terminal behind the checkout person, you’re probably looking at this kind of app.
A stage 2 Client/server application will have a database back end, with the processing and user interface running as a monolithic binary running on the user’s computer. This is the bread and butter of custom small business applications. Classic example of this would the program you see the receptionist using at a doctor or dentist’s office when they’re scheduling your next appointment. Suprisingly, this design pattern is still commonly used to develop new applications even today, becuase it’s simple and easy to build.
A stage 3 Client/Server application splits the 3 major components up so that they can run on different compute instances. Each layer can scale independently of the other. In older designs, the user interface is a binary that runs on the end user’s computer. Newer designs used web based user interfaces. These tend to be applications with scaling requirements in the hundreds to tens of thousand of concurrent sessions. These apps tend to be constrained by the database design. A typical SQL database is single master and often the choke point of the app. There are many of examples in manufacturing and ERP systems of 3 layer designs where the ui and data processing is scale-out, backed by a monster SQL Server failover cluster .
All of these design patterns are based on the presumption of fixed, always available compute resources underpinning them. They’re also generally vendor centric technology stacks with long life cycles. They all depend on redundancy underpinning the compute storage and network components for resiliency, with the 3 tier being the most robust as 2 of the layers can tolerate failures and maintain service.
Traditional infrastructure service models are built around supporting these types of application designs and it works just fine.
Then came cloud computing, and the game changed.
Stage 4: Cloud native applications.
Cloud native applications operate as an elastic confederation of resilient services. Let’s unpack that.
Elastic means service instances spin up or down dynamically based upon demand. when an instance has no work to do, it’s destroyed. When additional capacity is required, new instances are dynamically instantiated to accommodate demand
Resilient means the individual services instances can fail and the application will continue running by simply destroying the failed service instance and instantiating a healthy replacement.
Confederation means a collection of services are designed to be callable via discoverable APIs. This means if an application needs a given function that can be offered by an existing service, the developer can simply have his application consume it. This means components of an application can live anywhere, as long as they’re reachable and discoverable over the network.
Because of this modular design, it’s easy to quickly iterate software, adding additional functions and features to make it more valuable to it’s users.
Great! but as an infrastructure person how do we support something like this? The fact is, this is where traditional vendor-centric shrink wrap tool approach breaks down.
Infrastructure delivery and support
Here are the main problems with shrink wrap monolithic tools in context of cloud native application design patterns:
Tool sprawl (ever more tools and interfaces to get tasks done)
Feature velocity quickly renders tool out of date
Graphical tools surface a small subset of a product’s capabilities
Difficult to automate/orchestrate/manage across technology domains
Figure 1 is what the simple task of instantiating a workload looks like on 4 different public cloud providers. Project this sprawl across all the componentry required to turn up an application instance and the scope of the problem starts to come into focus.
So how in the heck do you deal with this? The answer is surprisingly simple: infrastructure tools that mirror the applications the infrastructure needs to support.
TL;DR Learn how to use cloud native tools and toolchains. And this brings us to the title of the blog post.
Python as a baseline competency
Python is the glue that lets you assemble your own collection of services in a way that’s easy for your organization’s developers and support people to consume. The better you are with python, the easier it is to consume new tools and create tool chains. Imagine that you’re at a restaurant and you’re trying to order for yourself and your family. If you can communicate fluently, it’s much easier to get what you want than if you don’t speak the language, and are reduced to pointing at pictures and pantomiming.
Interestingly, I’ve found that many of the principles of good network and systems design are directly applicable to writing a good program. encapsulation, separation, information hiding, discreet building blocks, etc.
Use Case: Creating a Virtual Machine
Let’s take the example of creating a virtual machine.
For creating a VM, we could create a class or classes in python that allows someone to spin up, shut down and check the status of virtual machines on 4 different public cloud providers in a manner that abstracts the specifics of each provider and gives them to the user as a generic service. Then we or anyone else on the team can use it to spin up workloads with a simple call, without having to know the implementation details of each provider.
The power of making tools like this is:
we only have to solve the problem of doing a thing once
we encapsulate it
we make it available to other code
In the next installment of my DevOps journey, I’ll take a stab at writing the python Virtual Machine class described and we’ll see how well it works.