PKI for Network Engineers Ep 10: Cisco IOS CA introduction

Greetings programs!

In the next few PKI for network engineers posts, I’m going to cover Cisco IOS CA. If you’re studying for the CCIE Security lab, or you’re operating a DMVPN or FlexVPN network and you’d like to use digital certificates for authentication, then this series could be very useful for you.


IOS-CA is the Certification Authority built into Cisco IOS. While not a full-featured enterprise PKI, it’s a fine solution for issuing certificates to routers and firewalls for authenticating VPN connections. It’s very easy to configure and supports a variety of deployment options.

Key points

  • Comes with Cisco IOS
  • Supports enrollment over SCEP (Simple Certificate Enrollment Protocol)
  • RSA-based certificates only
  • Easy to configure
  • Network team maintains control over the CA
  • Solution can scale from tiny to very large networks
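To give a sense of how simple it is, here’s a minimal sketch of standing up an IOS CA. The server name, issuer name, and lifetimes are illustrative placeholders, and `grant auto` is a lab convenience; in production you’d grant requests manually or pair auto-grant with strict access controls:

```
! SCEP enrollment reaches the CA over HTTP
ip http server
!
crypto pki server LAB-CA
 issuer-name CN=lab-ca.example.net
 lifetime ca-certificate 3650
 lifetime certificate 730
 ! auto-grant is convenient in a lab; grant manually in production
 grant auto
 no shutdown
```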

Deployment Options

IOS-CA has the flexibility to support a wide variety of designs and requirements. The main factors to consider are the size of the network, what kind of transport is involved, how the network is dispersed geographically, and the security needs of the organization. Let’s briefly touch on a few common scenarios to explore the options.

Single issuing Root CA

For a smaller network with a single datacenter or an active/passive datacenter design, a single issuing root may make the most sense. It’s the most basic configuration and it’s easy to administer.

This solution would be appropriate when the sole purpose of the CA is to issue certificates for authenticating VPN tunnels in a smaller network of approximately a hundred routers or fewer. Depending on the precautions taken and the amount of instrumentation on the network, the blast radius of a CA compromise would be relatively small, and you could spin up a new CA and enroll the routers to it fairly quickly.

The primary consideration for this option is CA placement. A good choice is a virtual machine on a protected network, with access control lists limiting who can attempt to enroll. The CA is relatively safe from being probed and scanned, and it’s easily backed up.

Fig 1. Single issuing root
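As an illustration, an inbound ACL along these lines could restrict who can reach the CA’s HTTP/SCEP service. The subnet, CA address, and interface are placeholders:

```
! Permit SCEP/HTTP to the CA only from the router management subnet
access-list 110 permit tcp 192.0.2.0 0.0.0.255 host 198.51.100.10 eq 80
access-list 110 deny   tcp any host 198.51.100.10 eq 80
access-list 110 permit ip any any
!
interface GigabitEthernet0/0
 ip access-group 110 in
```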

Offline Root plus Issuing Subordinate CAs

For a network that spans multiple datacenters and/or multiple continents, it makes more sense to create a root CA, then place a subordinate issuing CA in each datacenter that contains VPN hubs. By default, IOS trusts a subordinate CA, meaning the root CA’s certificate and CRL need not be made available to the endpoints to prevent chaining failures.

As with any online/issuing CA, access controls should be used to limit who can reach HTTP/SCEP.

Fig 2. Offline Root

Issuing Root with Registration Authority

Fig 3. Single root with Registration authority

In this design, there is still a single issuing root, but enrollment requests are handled by an RA (Registration Authority). An RA acts as a proxy between the CA and the endpoint. This allows the CA to sit behind strict access controls yet still process enrollment requests and CRL downloads.

Offline Root & Issuing Subordinate CAs w/RA

The final variation is a multi-level PKI that uses the RA to process enrollment and CRL downloads. This design provides the best combination of security, scaling, and flexibility, but it is also the most complex.

Bootstrapping remote routers

Consider a situation where you’re turning up a remote router and you need to bring up your transport tunnel to the datacenter. The Certification Authority lives in the datacenter on a limited-access network behind a firewall. In order for the remote router to enroll in-band using SCEP, it would need a VPN connection. But our VPN uses digital certificates.

There are three options for solving this:

Sideband tunnel w/pre-shared key

This method involves setting up a separate VPN tunnel that uses a pre-shared key in order to provide connectivity for enrollment. This could be a temporary tunnel that’s removed when enrollment is complete, or it could be shut down and left in place for use at a later time for other management tasks.

A sideband tunnel is useful for performing management tasks that might bring down the production tunnel(s), which makes this a good option. Its main drawback is the amount of configuration work required on the remote router. It also depends on some expertise on the part of the installer, a shortcoming shared with the manual enrollment method.
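A sketch of what the sideband tunnel might look like on the remote router. The peer address, key, addressing, and interface names are placeholders, and the crypto parameters are one reasonable choice rather than a recommendation:

```
crypto isakmp policy 10
 authentication pre-share
 encryption aes 256
 hash sha256
 group 14
crypto isakmp key B00tstrapKey address 198.51.100.1
!
crypto ipsec transform-set BOOT-TS esp-aes 256 esp-sha256-hmac
 mode tunnel
crypto ipsec profile BOOT-PROF
 set transform-set BOOT-TS
!
! Temporary tunnel used only to reach the CA for SCEP enrollment
interface Tunnel100
 ip address 10.255.255.2 255.255.255.252
 tunnel source GigabitEthernet0/0
 tunnel destination 198.51.100.1
 tunnel mode ipsec ipv4
 tunnel protection ipsec profile BOOT-PROF
```

Once enrollment completes, the tunnel can be removed or shut down and kept for later management use, as described above.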

Registration Authority

A Registration Authority is a proxy that relays requests between the Certification Authority and the device requesting enrollment. In this method, an RA is enabled on the untrusted network just long enough to process the enrollment request. Once the remote router has been enrolled, the transport tunnels will come up and bootstrapping is complete.

Using a proxy allows in-band enrollment with a minimum amount of configuration on the remote router, making it less burdensome for the on-site field technician. The tradeoff is we’re shifting some of that work to the head end. Because the hub site staff is likely to possess more expertise, this is an attractive trade-off.

Manual/Terminal enrollment

In this method, the endpoint produces an enrollment request on the terminal, formatted as a base64 PEM (Privacy-Enhanced Mail) blob. The text is copied and pasted into the terminal of the CA. The CA processes the request and outputs the certificate as a PEM blob, which is then pasted into the terminal of the client router.

While this does have the advantage of not requiring network connectivity between the CA and the enrolling router, it has a couple of drawbacks. Besides being labor intensive and not straightforward for a field technician to work with, endpoints enrolled with the terminal method cannot use the auto-rollover feature, which allows routers to renew certificates automatically prior to expiration. The author regards this as an option of last resort.
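For reference, terminal enrollment on the client router looks roughly like this (trustpoint and subject names are placeholders):

```
crypto pki trustpoint LAB-CA
 enrollment terminal pem
 subject-name CN=spoke1.example.net
 revocation-check none
!
! Paste the CA certificate when prompted:
crypto pki authenticate LAB-CA
! Prints the base64 PEM request to the terminal for pasting into the CA:
crypto pki enroll LAB-CA
! After the CA issues the certificate, paste it back in:
crypto pki import LAB-CA certificate
```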

CRL download on spokes problem

The issue here: the spoke router needs to download the certificate revocation list (CRL), but the CA holding the CRL is reachable only over a VPN tunnel, and that tunnel can’t come up because the spoke can’t talk to the CA to download an unexpired copy of the CRL and validate the certificate of the head-end router.

This is actually a pretty easy problem to solve: disable CRL checking on the spoke routers, but leave it enabled on the hub routers. The administrator can still revoke a router’s certificate, and that router will not be allowed to join the network, because the hub router will see that its certificate has been revoked.
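In trustpoint terms, the difference is a single line (the trustpoint name is a placeholder):

```
! On the spokes: skip CRL checking entirely
crypto pki trustpoint LAB-CA
 revocation-check none
!
! On the hubs: require a current CRL so revoked spokes are rejected
crypto pki trustpoint LAB-CA
 revocation-check crl
```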


Ok, so there are the basics. In the next installment, we’ll step through a minimum working configuration.

The DevOps Chronicles part 1: Why I’m studying python for 5 hours a week.

So, I’ve committed to studying Python for 250 hours a year, and I’m about 12 weeks in.

I have no ambitions of becoming a professional developer. I like being an infrastructure guy. So why in the heck am I making this commitment as part of my professional development?

Here’s a fundamental problem infrastructure designers & engineers of all stripes are faced with today:

The architecture, complexity, and feature velocity of modern applications overwhelms traditional infrastructure service models. The longer we wait to face this head on, the larger the technical debt will grow, and worse, we become a liability to the businesses that depend on us.

Allow me to briefly put some context around this. I’m going to use some loose terminology and keep things simple and high level so we can quickly set the stage. Then I’ll explain where Python fits in.

Application Components

Almost all software applications have three major components:

  1. data store
  2. data processing
  3. user (or application) interface

Overall, the differences in complexity come down to component placement and whether or not each function supports multiple instances for scaling and availability.

Application Architectures

A stage 1, or basic, client/server application places all components on a single host. Users access the application using some sort of lightweight client or terminal that provides input/output and display operations. If you buy something at Home Depot and glance at the point-of-sale terminal behind the checkout person, you’re probably looking at this kind of app.

A stage 2 client/server application has a database back end, with the processing and user interface running as a monolithic binary on the user’s computer. This is the bread and butter of custom small-business applications. A classic example would be the program you see the receptionist using at a doctor’s or dentist’s office when they’re scheduling your next appointment. Surprisingly, this design pattern is still commonly used to develop new applications even today, because it’s simple and easy to build.

A stage 3 client/server application splits the three major components up so that they can run on different compute instances, and each layer can scale independently of the others. In older designs, the user interface is a binary that runs on the end user’s computer; newer designs use web-based user interfaces. These tend to be applications with scaling requirements in the hundreds to tens of thousands of concurrent sessions, and they tend to be constrained by the database design: a typical SQL database is single-master and often the choke point of the app. There are many examples in manufacturing and ERP systems of three-layer designs where the UI and data processing are scale-out, backed by a monster SQL Server failover cluster.

All of these design patterns are based on the presumption of fixed, always-available compute resources underpinning them. They’re also generally vendor-centric technology stacks with long life cycles. They all depend on redundancy in the underlying compute, storage, and network components for resiliency, with the three-tier design being the most robust, since two of the layers can tolerate failures and maintain service.

Traditional infrastructure service models are built around supporting these types of application designs, and that works just fine.

Then came cloud computing, and the game changed.

Stage 4: Cloud native applications.

Cloud native applications operate as an elastic confederation of resilient services. Let’s unpack that.

Elastic means service instances spin up or down dynamically based upon demand. When an instance has no work to do, it’s destroyed. When additional capacity is required, new instances are dynamically instantiated to accommodate demand.

Resilient means individual service instances can fail and the application will continue running; the failed instance is simply destroyed and a healthy replacement instantiated.

Confederation means a collection of services designed to be callable via discoverable APIs. If an application needs a function that an existing service already offers, the developer can simply have the application consume it. Components of an application can therefore live anywhere, as long as they’re reachable and discoverable over the network.

Because of this modular design, it’s easy to iterate quickly, adding functions and features that make the software more valuable to its users.

Great! But as infrastructure people, how do we support something like this? The fact is, this is where the traditional vendor-centric, shrink-wrapped tool approach breaks down.

Infrastructure delivery and support

Here are the main problems with shrink-wrapped monolithic tools in the context of cloud-native application design patterns:

  1. Tool sprawl (ever more tools and interfaces to get tasks done)
  2. Feature velocity quickly renders tools out of date
  3. Graphical tools surface a small subset of a product’s capabilities
  4. Difficult to automate/orchestrate/manage across technology domains

Figure 1 shows what the simple task of instantiating a workload looks like on four different public cloud providers. Project this sprawl across all the componentry required to turn up an application instance, and the scope of the problem starts to come into focus.

Fig 1: Creating a virtual machine on different cloud providers.

So how in the heck do you deal with this? The answer is surprisingly simple: infrastructure tools that mirror the applications the infrastructure needs to support.

TL;DR Learn how to use cloud native tools and toolchains. And this brings us to the title of the blog post.

Python as a baseline competency

Python is the glue that lets you assemble your own collection of services in a way that’s easy for your organization’s developers and support people to consume. The better you are with Python, the easier it is to consume new tools and create toolchains. Imagine you’re at a restaurant trying to order for yourself and your family. If you can communicate fluently, it’s much easier to get what you want than if you don’t speak the language and are reduced to pointing at pictures and pantomiming.

Interestingly, I’ve found that many of the principles of good network and systems design are directly applicable to writing a good program: encapsulation, separation, information hiding, discrete building blocks, etc.

Use Case: Creating a Virtual Machine

Let’s take the example of creating a virtual machine.

For creating a VM, we could write a class or classes in Python that let someone spin up, shut down, and check the status of virtual machines on four different public cloud providers, abstracting away the specifics of each provider and presenting them to the user as a generic service. Then we, or anyone else on the team, can use it to spin up workloads with a simple call, without having to know the implementation details of each provider.

The power of making tools like this is:

  1. we only have to solve the problem of doing a thing once
  2. we encapsulate it
  3. we make it available to other code
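As a rough sketch of the idea: a generic interface, a driver per provider, and a thin wrapper the rest of the team calls. The `FakeProvider` driver and all names here are hypothetical stand-ins; a real driver would wrap a provider SDK such as boto3.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Generic interface that each provider driver implements."""

    @abstractmethod
    def create_vm(self, name: str) -> str: ...

    @abstractmethod
    def destroy_vm(self, name: str) -> None: ...

    @abstractmethod
    def status(self, name: str) -> str: ...


class FakeProvider(CloudProvider):
    """Stand-in driver; a real one would wrap a cloud provider's SDK."""

    def __init__(self) -> None:
        self._vms: dict[str, str] = {}

    def create_vm(self, name: str) -> str:
        self._vms[name] = "running"
        return name

    def destroy_vm(self, name: str) -> None:
        self._vms.pop(name, None)

    def status(self, name: str) -> str:
        return self._vms.get(name, "absent")


class VirtualMachine:
    """Provider-agnostic handle: same calls regardless of which cloud."""

    def __init__(self, provider: CloudProvider, name: str) -> None:
        self.provider = provider
        self.name = name

    def up(self) -> str:
        return self.provider.create_vm(self.name)

    def down(self) -> None:
        self.provider.destroy_vm(self.name)

    def status(self) -> str:
        return self.provider.status(self.name)
```

The caller writes `VirtualMachine(SomeProvider(), "web01").up()` and never touches provider-specific details; swapping clouds means swapping the driver, nothing else.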

In the next installment of my DevOps journey, I’ll take a stab at writing the Python virtual machine class described, and we’ll see how well it works.

Best regards,


Blowing the dust off


Hard to believe it’s been a year since the last update. Time really flies doesn’t it?

I’ve maintained a study schedule, however it’s been oriented around topics that don’t make any sense for a technical blog. That’s going to continue for a time, but I’m also bringing relevant topics back into the mix. This means more regular updates here (yay!).

In addition to straight technical content, I’ll be adding some context and high-level overview type things. This is my way of processing and participating in the ongoing community dialog about learning, skill stacks, and general professional development.

There are a lot of exciting things going on and not enough hours in the day. I feel like a kid in a candy store when I get up every morning. Let’s get to stuffing our pockets shall we?

Best regards,


CCIE Security – Why and How

Greetings programs!

I’ve been quite surprised by the response to the news that I passed the lab.  I suspect a lot of it is due to my friend Katherine who has a sizeable following on account of her excellent blog. If you haven’t checked it out, you should!

Anyhow, from all my new friends, I received a lot of questions.  This post is intended to answer them in one place, as well as supply some additional context, and share some of the key resources that I used in my preparation.

The big question, Why?

For a large undertaking like this, you need to have a pretty strong reason, or reasons. It’s a sizeable commitment, and when things get really hard, you’re likely to give up if there’s not a good reason for doing it.

In my case there were several reasons.  Listed in no particular order:

  • Increase my knowledge level. I like to have a deep understanding of what I’m working with.
  • Increase my income earning capability
  • Prove to myself that passing Route/Switch wasn’t a fluke; that I could put my mind to a goal of similar scope and do it again in an area where I have less background.
  • I enjoy the topics.  These toys are a **lot** of fun to play with.

Having seen mine, hopefully they will help you define yours.

The commitment required

Let’s take a realistic assessment of the level of effort required. Understand this is going to be specific to the individual, based upon:

  • The existing expertise brought to the starting line
  • How much of the blueprint covers work performed in day job
  • Ability to purchase quality training materials
  • Quality of the structures used to facilitate learning process
  • Consistent application of effort (i.e. not engaging in distractions during study time)
  • Capacity to absorb and process the information

In my case, I was already familiar with much of the VPN technology, and I knew my way around ASA firewalls and Firepower, all of which I implement and support in my day job. Additionally, I had recently completed CCIE Route/Switch, so I knew how to structure information for later retrieval and how to establish a sustainable study routine. However, I was a complete newbie to ISE and Cisco wireless networking.

With all that, it took me approximately 19 months and two attempts, studying an average of 20 hours a week, for a total effort of somewhere around 1600-1700 hours.  I reckon the range for the vast majority of candidates would fall somewhere between 1200 and 2500 hours, presuming they average 20 hours a week of quality study time without significant gaps.

Support and buy in from your family

Consider this: you’re already employed with a full-time job, and you have adult obligations to maintain. When you start studying 20+ hours a week, your wife and your kids (if you have them) are going to feel your absence, and it’s not going to be easy for them. There need to be relevant reasons for them as well, because they are going to sacrifice alongside you. It’s a good idea to talk to them, ensure they have a crystal-clear understanding of what the next 18 to 24 months are going to be like, and get their buy-in. This is not worth losing your family over!

I strongly recommend taking one day off a week to spend that quality time with your family. That may not be possible in your final push, but that’s just the end phase. This is a marathon, not a sprint.

Establishing milestones

One thing that derails a lot of candidates is lacking a clear path to the end goal.  You need a sense of urgency and momentum. It’s easy to get lost in the technical deep dives (especially with the larger topics like VPN and ISE), but it’s important to keep moving.

It’s ok to make adjustments as you go along, and there will be setbacks.  The idea is to have something to help you stay on track so at some point you’ve got your ass planted in a seat at a lab location and you’re making it happen.

Example high level outline

Months 1-3:  Pass one through the blueprint

  • Focus on basics
  • 70% theory 30% lab

Months 4-7:  Pass two through the blueprint

  • Go deeper into the details and advanced use cases; read RFCs and advanced docs
  • 50% theory 50% lab

Months 8-9:  Prepare for written

  • Heavy on theory and facts
  • Study written only topics
  • 80% theory 20% lab

End of month 9: take written exam

Months 10-12: Pass three through the blueprint

  • Repeat deep dive, with more emphasis on hands on
  • focus mostly on VPN and ISE
  • 30% theory 70% lab

Months 13-14: multi-topic labs

  • No more deep dives
  • Combine multiple technologies in each labbing session
  • get used to shifting gears
  • Wean off cli.  Build all config in notepad
  • 20% theory 80% lab

Schedule lab for 90 days out

Months 15-17: final push for lab

  • Lab every topic on blueprint during the week
  • Lab all topics in one sitting 1x/week
  • all config in notepad
  • speed and accuracy paramount
  • Stop using debugs
  • Time all exercises
    • include verification in time
  • Get friend to break your labs for troubleshooting practice
  • 10% review, 90% lab

End of month 17: First attempt

The battle of memory

Setting up your notebook and files

In the beginning, you’re not going to have much to look at in terms of accumulated knowledge. That will gradually change. It’s a really good idea to set up a system at the beginning; it’s a whole lot easier before you’ve accumulated a bunch of stuff.

My preference these days is OneNote, but other tools will of course work fine.  Heck even an old 3 ring binder can do the job if that’s what you’re into.

Converting the blueprint into something more practical

When you’re setting up your notebook, the first intuition might be to mirror the blueprint, i.e. a tab called 1.0 Perimeter Security and Intrusion Prevention. Here’s my suggestion: organize by product and technology, with additional tabs for unprocessed notes, running and completed tasks, general info, classes taken, etc. Here’s a screenshot of my tab list for example:


Should you start a blog?

There’s a very effective learning technique called the Feynman Technique. It consists of 5 steps.

  1. Write down the name of a concept
  2. Write out an explanation of the concept as you understand it in plain English.
  3. Review what you’ve written and study the gaps you’ve discovered in your knowledge. Update your explanation.
  4. Review your explanation a final time and look for ways to simplify and improve clarity.
  5. Share this explanation with someone.

It really does work.

Blogging is a perfect vehicle for this sort of thing.  But here’s the rub: Blogging is a skill that takes effort to develop.  In my own experience, my early posts took hours to write and I just about gave up on it due to the amount of time that it took.  But as I kept doing it, I got more efficient, and ultimately I found it gave a good return on the time invested.

It’s not a requirement of course.  It’s just a tool, and there lots of other useful tools to choose from.

Written prep and fact memorization Tools

The written covers things you’ll see in the lab, but it also stresses blueprint items that aren’t testable in hands on environments, such as knowledge about theory, standards and operations, and emerging technologies.

You’ll need to have a lot of facts at instant recall for the written, so you’re going to have to do a lot of memorization. That means good old spaced repetition.

The two products that I’ve used to aid in memorization are Anki and SuperMemo. I have a personal preference for SuperMemo, but it’s a quirky program with a learning curve. Anki is much easier to pick up and start using. Either one of these tools is a good aid for the memorization grind that’s part and parcel of passing a CCIE written exam.

Gathering learning resources

This will be an ever-expanding list as you get deeper into your studies, but before you start, it makes sense to have some material lined up to take you on that first tour through the blueprint. As there is no one-stop guide for the security track, this first step can take a little bit of time. You should have your study material for the first six to nine months lined up before you dive in, to ensure you make productive use of your time. The resource list near the end of this post will hopefully make a good starting point.

Labbing equipment

You will want to invest in some equipment for this track. If you’re serious about doing this, it’s the best way. Yes, you can do quite a bit in GNS3 and EVE-NG, but you’re going to want a lot of seat time with the switch and the firewalls, and you’re going to want large topologies that include resource-hungry appliances. A good strategy is to pool your resources with a couple of other people and host your gear in a cheap colo.

This is the lab gear that I recommend:

  • Server: ESXi host with 128 GB RAM, 12 cores, 1 TB disk
  • Switch: Catalyst 3650 or 3850 is best, a 3750x is a good budget option
  • AP:  3602i or similar
  • IP Phone: Cisco 7965G
  • Firewall: 2xASA 5512-X with clustering enabled.
  • Optional gear for COLO:
    • Terminal Server (I prefer the OpenGear IM series: not the cheapest, but quite secure)
    • Edge firewall:  ASA 5506x

Equipment thoughts and notes

There are lots of good deals on older 1U rack-mount servers on eBay; that’s easy information to research. If you can afford more than 128 GB of RAM, go for it. You’ll use all of it once you start labbing larger topologies.

When shopping for switches on eBay, avoid switches with the LAN Base image, as IP Base is the minimum version required for most features on the blueprint. Also, it never hurts to check the TrustSec compatibility matrix for the specific part you’re looking at buying.

Yes, you want to drop the coin on a pair of ASAs. Just do it. You don’t have to spring for the X series, but you do need multi-context and clustering support, and you need to be able to run version 9.2 of the ASA software; those are your minimums.

My labbing partner and I started out with a cheap terminal server, and the darn thing kept getting DoSed, so we dropped 400 bucks on an older OpenGear terminal server, and that thing is bulletproof. Worth the extra money for out-of-band access to your gear.

Most COLOs, even budget ones, have remote hands to power cycle things for you if they get locked up.  But if you’re not comfortable with that, hooking up a PDU like an old APC AP7931 to your terminal controller will work.  I use one of these for my home rack and they’re great.

About software and platform versions

You want to match the versions on the blueprint as much as you can. Where that’s not realistic, or is a hassle, I strongly urge you to check your syntax and configs against the lab version periodically if possible.

Personal experience: in my environment I labbed on the newer version of the CSR1000v image that has the 1 Mb/sec throughput limitation in demo mode. That’s great, but some of my configuration speed optimizations blew up on lab day. That repeated in several areas, including differences between the 3750X in my lab and the 3850 in the actual lab.

Bottom line: it’s a lot of money and stress to go sit the lab, and when the clock is ticking and you’re running into unexpected platform and software issues, the little bit of money you saved on that screaming eBay deal for an older part will be long forgotten.

About licensing the appliances in your lab.

Most of the stuff is not a problem; you can run it on demo licenses. The exceptions are NGIPS, WSA, and ESA, which are a pain. If you don’t work for a partner or for Cisco, your local Cisco SE should be able to get you 3-month licenses for those appliances to assist with lab prep.

Establishing the daily routine

This is a critical item. It’s just like a fitness routine: consistency is everything. You’re not going to be able to skip studying during the week, binge on weekends, and expect quality results. It just doesn’t work that way.

In my view, the bare minimum to do this in 18 months is 20 hours a week. That’s 20 focused hours, without distractions.

There are going to be days when you *really* don’t want to study. Embrace the suck my friend, and just do it anyway.

Suggested Schedule

  • 3-4 hours a day Monday through Friday
  • 8 hours Saturday
  • Sunday off

If you stick to that week in and week out, you will make good steady progress.

One thing that really helped me is a Pomodoro-style tool called Tomato Timer. The idea is that for 25 minutes you’re hyper-focused, then you take 5 minutes off, then it’s 25 minutes on again. It really helps with procrastination and the urge to check your Twitter feed and all that.

Responding when life gets in the way

FACT:  Things are going to happen that are out of your control.  You’re probably going to get pulled away from your studies for a week or two here and there.

Try to at least read a little bit and stay mentally engaged on some level every day until the clouds part and you’re able to get back on the grind again. The main thing is not to let a short break turn into a long one; that happens very easily, and you’ll be upset with yourself as you retrace your steps to get back to where you were in your progress.

Resource List:

Here’s a list of learning resources that I found to be particularly helpful.  (Not including obvious things like product documentation).



Instructor-led Training


dCloud Labs

  • Security Everywhere
  • IPSEC VPN troubleshooting
  • Web Security Appliance Lab v1

Cisco Live Sessions

Cisco ISE community

Closing thoughts

I hope this answered some of your questions and gave you a sense of a path to get to the CCIE Security lab.  I really enjoyed reliving all the memories creating this post triggered.  It’s been quite an adventure. 🙂

Best regards, and may you care for yourself with ease,



CCIE Security Lab 1, Steve 1 – I passed!

Greetings Programs!

I sat my second attempt at the CCIE security lab on Monday, March 5th, 2018. This time I got the pass. It was not without a bit of drama.


TL;DR: It’s a very tough exam, harder than you think, harder than you remember. Fight with everything you have until the proctor tells you to stop.

The longer version.

Had the usual anxiety thing in the days leading up to lab day. Did what I could to minimize it. Got ok sleep the night before. On Lab day I wasn’t nearly as nervous as I was in January.

Tshoot: I came out swinging and quickly solved most of the tickets. Then I had an issue with ISE that left me dead in the water for half an hour. I left one ticket unsolved.

Diag was like last time: pretty easy, a chance to take a breather and get ready for the main event.

Config started out well; I was flying through the first few tasks. Then I had some annoying issues related to code and platform differences between my personal lab and the actual exam; I don’t recall that happening on the last attempt.

Then the trouble began.

ISE was acting up, so I attempted to restart it. The application shut down dirty, with database errors, and would not start. While all this was going on, I was trying to work on other tasks and keep moving. Eventually the proctor got ISE running somehow, but the database corruption remained an issue: ISE wasn’t adding endpoints to the database, which made several tasks impossible to complete. The lowest point was when he was sending people home and I knew I didn’t have the points. When he said I could continue, I put my head down and kept grinding.

I was just about out on my feet, but I kept battling. When I was told I’d have to stop, I looked at my scorecard to add up the points, and all the tasks that could be completed were done. The scorecard said I had the points, with a small cushion. Would the tasks verify, though?

To my amazement I got the email about 3 hours later with the good news.

Much respect to the proctor who probably stayed at work late in order to let me make up the time lost by the ISE debacle. Those folks have a very tough job putting up with us.

Final thought: the CCIE lab is **always** so much harder than you remember it. You can’t simulate what it’s really like. Even after just a 7-week gap I was going “what the hell!?!”.

Now I get to make things up to my wife and try to lose the 10 lbs of body fat I put on in the final push.

Warmest regards,


CCIE security lab 1, Steve 0

I sat the CCIE security lab on Friday, January 12th. Result was Pass/Pass/Fail. That means overall I failed.

For those who are unaware, there are three sections to the exam: Troubleshooting, Diagnostics, and Configuration.

Here’s my interpretation of those sections:

Troubleshooting is like getting a bunch of 2:30am phone calls where something is broken and you need to hop on and fix it now.

Diagnostics is like getting emails at work where a field tech and/or the Helpdesk are stuck on a problem, and you have to figure it out from a couple of screenshots and some log/device output. You do not get to actually touch the devices.

Config is like being a consultant asked to complete a build on an enterprise network where sales severely underbid the required project hours and you’ve got to make it happen.

With that out of the way, I’d like to break down what went well, what went less than well, and what I need to improve on.

What Went Well

Staying calm

I did a lot of self care in the 36 hours leading up to sitting the lab. Mostly breath meditation, and reliving an amazing road trip I took last summer. I looked at photos, listened to a playlist of music from the trip, closed my eyes and relived moments. That was very effective; it's amazing what we can draw from those times when we need them.


In the run up my waking and sleeping hours were all over the map in a mad dash to get all the labbing in I could, but I needed to rein that in. About a week prior to lab day I started adhering to a rigid routine of going to bed and getting up early. A week was enough to get my body clock reset.

Structured approach to the exam

Keeping a score card, task list, reading questions multiple times to make sure I’m not overlooking small but critical details, and making notes. Using checklists for long tasks. This all worked very well, no problem there.

Keeping all cli config in notepad

Building everything in notepad makes it dead easy to make updates and corrections, and to refer back to what you did earlier. It's also an exam saver if for some reason you have to reinitialize a device. Even the little things that I do CLI-first, I still transfer to notepad.

Quickly discovering relevant information

I didn’t have much trouble quickly isolating the relevant area I needed to look at in troubleshooting or diag, that was a good feeling and gave me confidence.

Theory and technology knowledge

I was pleased that my topic knowledge was pretty much up to snuff. All the hard work of learning the technology paid off when I found the tickets straightforward and I never got lost or confused. There is something related to TS I need to improve to save time for config, I’ll touch on that in the next section.

What Did Not Go Well

Slow at verification

This first showed up in tshoot. I was spending more time verifying corrections than I was fixing the issue. I think there were two causes.

1. I have no field experience with some of the technology because I’ve never seen it on a live network. This is a disadvantage of working in enterprise; I have a narrower cross section of exposure.

2. In my final prep I was using fixed topologies and designs that I knew intimately. Consequently I didn't have to put a lot of effort into verification. On a network you've never seen, where you don't know the underlying details of the routing design other than what's depicted on the topology diagram, it's a completely different situation.

Both of these significantly impacted my speed in tshoot and config. I spent way too much time on verification.

Managing Information Overload

I was feeling pretty good coming out of Troubleshooting and Diag. As I started reading through the config section, I did not have that good feeling. Some of the tasks were very long and packed with small details. There are key details that you will only pick up on if you read carefully and understand the technology. This intimidated me and caused me to worry too much.

I spent way too much time double checking and verifying those details. I think you have to trust yourself enough to bang it out and double check your work at the end of the day; otherwise you won't get enough done.

Not Checking the Preconfigure

There may be stealth faults in the configuration section; not checking the preconfigure can exact a heavy toll in wasted time as you have to reset appliances and clean up the mess. Unfortunately this happened to me. Rookie mistake.

Not enough speed/memorization of cli commands

There is a lot to do in config. You have to be capable of generating large blocks of config in notepad and pasting them into the devices error-free for any of the technologies being tested. For GUI-based task elements, you have to be able to zip through the UI and configure what's needed without any poking around. That's the bar, simple as that. Some things I was at that level with, others I was not.

Knob Recall

This exam does ask for some knob turning (i.e. advanced requirements). That was a bit of a mixed bag for me. I think these are intended to slow you down more than anything, and they will if you don't know where the knob lives.

What I Need To Improve

Verification Speed

That means giving it consistent attention and focus

Configuration Speed

That means being able to create in notepad, cold, any tested CLI-based technology… and have it paste in error-free 10 times out of 10.

For example if you’re asked to spin up flex mesh, you need to be able to look at the requirements given for it in the task, type it up in notepad, paste it into the routers, and have it come up and work right out of the gate. You may be able to get away with checking cli syntax for some of the knob turning, but at a minimum you need to be able to quickly spin up a basic working configuration from memory without errors.

Knob knowledge

Add knob turning elements into practice exercises

Knowledge hole patching

Study a couple of small things I wasn’t knowledgeable enough on.

Wrap Up

Overall it wasn't a bad showing; I would like to have done a bit better, but the results weren't terrible. I'm encouraged that most of what needs to be done is simple repetition to gain speed, rather than correcting issues in fundamental knowledge or approach.

Hopefully I can even the score next time.

Thanks for walking with me.



SEC-1.6 IOS Zone Based Firewall (ZBFW)

Greetings programs!

Today we’re talking about zone based firewall.


Zone based firewall is a stateful firewall available as a feature on Cisco routers running IOS and IOS-XE. It's capable of using NBAR to identify traffic and can perform deep packet inspection (DPI) on a few protocols (the most notable being HTTP). Interestingly, it can use TrustSec Security Group Tags (SGTs) as a matching condition.

It offers a cost effective solution for a couple of common cases. One would be small branch offices using an internet connection for guest internet access and a backup DMVPN tunnel.

ZBFW provides a strong alternative to the care and feeding of a separate firewall in some situations, especially now that IOS-XE is capable of running some pretty cool containerized security apps like Snort and Stealthwatch Learning Network.

There are a couple of tradeoffs to using ZBFW that I noted when I was playing with it. The biggest one is that classes in the service policy cannot be reordered or inserted inline: you have to delete and re-create the policy, then add it back to the zone pair. That could lead to some change management headaches.

With that out of the way, let’s go ahead and take a look at this thing.


What does Stateful mean?

The term stateful in the context of a firewall means the firewall builds and maintains a connection table based on traffic it receives on its interfaces. It uses this information to automatically allow the return traffic when it sees a match in the connection table.

A row in a generic connection table would look something like this:


This would be an HTTP connection from a client to a web server; when the return packet comes back, the firewall sees the match in the connection table and automatically allows the traffic. This neat little trick is what makes a firewall a firewall in the sense that most people understand it.
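To make that concrete, here's a hypothetical connection table entry (the addresses and ports are made up for illustration):

```
Proto  Src IP        Src Port  Dst IP        Dst Port  State
TCP    192.168.1.10  51514     203.0.113.80  80        ESTABLISHED
```

When a packet arrives from 203.0.113.80:80 destined for 192.168.1.10:51514, it matches this entry and is allowed through without needing an explicit rule.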

Basic ZBFW operation

For starters, have a quick look at this cheat sheet that shows a simple use case involving internet access for some corporate computers and guest endpoints.

The way it works at a high level: classes are used to match traffic. A policy calls the classes and sets the action for the matching traffic. Finally, the policy is attached to a zone pair. A zone pair defines the source and destination zones that the policy will be applied to.

Definitions of terms


Zones define interfaces that share a common security policy. Traffic can move freely between interfaces in the same zone.

It’s important to know, Interfaces that belong to security zones cannot communicate with interfaces that do not belong to security zones. This is something to keep in mind when designing the deployment and when troubleshooting.

Classes and policies:

There are two kinds of classes and policies. Layer 3/4 and layer 7.


Layer 3/4 Class maps

Layer 3/4 class maps are traffic selectors. Classes are used to identify what traffic we want to apply an action to. Class maps can match based on protocol (NBAR), access lists, another class map for compound conditions, Security Group Tags (TrustSec), and user groups.

Layer 7 class maps

A layer 7 class map is used for deep packet inspection (DPI). It is most commonly used with HTTP to match attributes or content of HTTP traffic. It's dated since most web traffic is encrypted now, but it's there so I'm mentioning it.

Policy maps

Layer 3 Policy maps

Layer 3 Policy maps contain a list of classes, and the actions we want to perform on the classes.

There is a built-in class called class-default which matches anything not explicitly matched. It can be very useful for troubleshooting to call that class and add the log action to it to see what’s being blocked.
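As a sketch, that troubleshooting trick looks like this (the policy name is mine):

```
policy-map type inspect PM-IN2OUT
 class class-default
  drop log
```

With the log keyword on the drop action, anything falling through to class-default shows up in the log, which makes "why is this being blocked?" questions much easier to answer.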

Layer 7 policy maps

Layer 7 policy maps apply actions to the traffic identified by the layer 7 classes. These classes cannot be called directly from a service policy; they are called under a layer 3/4 class in a layer 3 policy map that identifies the traffic flows we want to perform DPI on.

Here’s a example

class-map type inspect http CM-L7-HTTP-1
 match response header length gt 5000
policy-map type inspect http PM-L7-HTTP-1
 class type inspect http CM-L7-HTTP-1
  reset
class-map type inspect CM1
 match protocol http
policy-map type inspect PM1
 class type inspect CM1
  inspect
  service-policy http PM-L7-HTTP-1


Important note about classes in policies

Note: Classes need to be called in most-to-least-specific order. For example, if a class matching on TCP were called before a class matching on HTTP, the HTTP class would never fire. To re-order the classes, you must delete and re-create the policy, then re-attach it to the zone pair.


There are two types of nesting: nested classes and nested policy maps.

Nested classes are used for creating compound matching conditions, e.g. inspect these protocols for those hosts.

Nested policy maps are used in deep packet inspection: layer 7 policy maps are called under a class in a layer 3 policy map.

Nested class example: this accomplishes the goal of allowing a specific list of protocols for a specific host. It's a basic compound condition.

! the host address in the ACL is illustrative
access-list 100 permit ip host 10.0.0.10 any
class-map type inspect match-any CM1-L4
 match protocol http
 match protocol dns
 match protocol icmp
class-map type inspect match-all CM1-L3
 match access-group 100
 match class-map CM1-L4

Parameter Maps

There are quite a few different kinds of parameter maps. I’m going to cover a couple of them briefly.

Layer 3/4 parameter maps are used primarily for DDoS prevention. They are attached to the action in a policy map, i.e. "inspect PARAM-MAP", where PARAM-MAP is a parameter map.

The most common layer 7 parameter map is the regex type, which is used to match strings in HTTP traffic. It's applied as an argument to a layer 7 class map.

There is also a global inspect parameter map; this is where NBAR2 protocol classification can be enabled, as well as general connection controls.
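As a minimal sketch of a layer 3/4 parameter map, this tightens half-open connection limits and attaches the map to the inspect action (the names and thresholds here are illustrative, not recommendations):

```
parameter-map type inspect PARAM-TCP
 max-incomplete high 1000
 max-incomplete low 800
policy-map type inspect PM1
 class type inspect CM1
  inspect PARAM-TCP
```

When half-open connections exceed the high watermark, the router starts aggressively aging out embryonic sessions until the count drops below the low watermark.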

Zone pairs

A zone pair defines what traffic is allowed to pass from a source zone to a destination zone. Being stateful, the zone pair builds a connection table based on outgoing traffic; return traffic that matches an entry in the connection table is allowed.

The self zone

The self zone is a special built-in zone that's used to control traffic to and from the control plane of the router.

The self zone has some different behaviors and restrictions from normal zones, and it’s worth taking a closer look at it.

Zone pairs involving the self zone are not stateful. They behave a lot more like access control lists. There have been changes to how self works from version to version, so be mindful of that and test your image before deploying the self zone on a live network. For the version of IOS I tested, 15.6(2)T, this is what I found:

By default, traffic is allowed between the control plane and any zone unless the self zone is added to a zone pair and a policy attached to it.

If a zone pair is created using self but no service policy is attached, traffic is still allowed.

If a service policy is attached, traffic in that direction is now restricted to what’s permitted in the policy, however, traffic in the other direction is not affected.

Because the self zone is not stateful, you must use the pass action instead of inspect to allow the traffic, as the inspect action has no meaning there.
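Putting that together, a minimal self-zone policy that permits only SSH to the router from an outside zone might be sketched like this (all zone and map names are mine); note the pass action instead of inspect:

```
class-map type inspect match-any CM-TO-SELF
 match protocol ssh
policy-map type inspect PM-OUT2SELF
 class type inspect CM-TO-SELF
  pass
 class class-default
  drop log
zone-pair security ZP-OUT-SELF source OUTSIDE destination self
 service-policy type inspect PM-OUT2SELF
```

Remember this only restricts traffic in the OUTSIDE-to-self direction; the other direction is unaffected until you create a zone pair with self as the source.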

ZBF configuration workflow

1. diagram out your zones and policies
a. determine if you need nested classes
i. i.e. match x protocols for y hosts
2. define zones
3. define parameter maps if called for (advanced)
4. create traffic selectors (class maps)
a. protocol matching
5. create policy maps
a. call class(es)
b. set action
6. create zone pairs and assign policies
7. assign interfaces to zones
8. test policies
9. review output to verify
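The steps above can be sketched end to end for a simple inside-to-outside policy (all names are illustrative, and the interfaces will vary with your platform):

```
! 2. define zones
zone security INSIDE
zone security OUTSIDE
! 4. create traffic selectors
class-map type inspect match-any CM-IN2OUT
 match protocol http
 match protocol https
 match protocol dns
! 5. create the policy map: call the class, set the action
policy-map type inspect PM-IN2OUT
 class type inspect CM-IN2OUT
  inspect
 class class-default
  drop log
! 6. create the zone pair and assign the policy
zone-pair security ZP-IN2OUT source INSIDE destination OUTSIDE
 service-policy type inspect PM-IN2OUT
! 7. assign interfaces to zones
interface GigabitEthernet0/0
 zone-member security INSIDE
interface GigabitEthernet0/1
 zone-member security OUTSIDE
```

Note the interfaces go into zones last; the moment an interface joins a zone, traffic to unzoned interfaces stops, so having the policy in place first avoids an outage.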

Best practices

1. good naming convention: type, direction.
a. ex: for layer 3/4 class map something like: CM-IN2OUT
b. ex: layer 7 CM-L7-BLOCKED-SITES
2. call class-default with drop log action to catch errors
3. pro tip: you can flip a class between match-any and match-all just by re-entering the command; you don't have to remove and re-add it

Verification commands

1. show zone security – shows zone interface assignment
2. show zone pair security – verify overall configuration
3. show policy-map type inspect zone-pair – shows statistics for zone pair
4. show policy-map type inspect zone-pair sessions | s Established – shows connection table info.
5. show run | s class|policy|parameter|zone – basic dump of the config elements

Challenge lab

This lab topology takes ZBFW and places it in the context of a typical branch office network doing Direct Internet Access (DIA). The zip contains the challenge and a solution. Enjoy!

SEC-4.5.1 Troubleshooting Web Authentication (WebAuth) for ISE

Greetings programs!

In this post, I’m going to go through my troubleshooting workflow for webauth redirect.


Webauth redirect is a core function of providing network access control with Identity Services Engine (ISE). It's used for a number of critical authentication flows, and when it does not work, you will not be able to provide guest access or onboard devices.

Taken as a whole, the configuration and processes between ISE and the network access devices (NADs, i.e. switches and wireless LAN controllers) are quite complex, especially in the case of the switches. Trying to troubleshoot by staring at pages and pages of config and making random Google searches is going to be slow and painful. It's much better to understand the information flow and dependencies, and to use the device output to logically deduce where the problem lies.

Lab Setup

Network Topology

This is the topology we're going to use for our example. It's a cobbled-together lab: ISE is running in a remote location, and the vWLC is running in ESXi on an Intel NUC in my home lab.

ISE Configuration

Our ISE configuration is going to be as simple as I can make it. This will be our policy:

Policy Sets

There will be separate policy sets for wired and wireless for the sake of clarity.

Authentication policy

  • MAC Authentication Bypass (MAB)
    Continue if authentication fails

Authorization Policy

  • IT – MAC addresses in the IT group get full access to the network
  • GUEST – Guest users will have access to the internet only
  • Webauth Redirect – If the connecting device is unknown to ISE, we’ll:
    • Apply an ACL that restricts network access to the bare minimum
    • Redirect the user to a registration portal.

Guest Registration Portal Configuration

To keep things simple, we're going to allow self-registration, which is not something you would normally do. After registering, the MAC address of the user's device will be placed in the GuestEndpoints group, which is the default for ISE.


WebAuth Redirect process flow

At a high level a basic flow works like this:

  1. An unknown device connects to the network
  2. ISE returns a result to the Network Access Device (NAD) with a redirect URL.
  3. When the user tries to connect to a website, the NAD intercepts the HTTP GET request and returns a 302 redirect pointing to the web portal on ISE. The redirect also contains the RADIUS session ID and the original URL the user was trying to reach.
  4. The user and device are onboarded.
    1. In the case of guests, the MAC address of the device is stored in the endpoint database and associated with a guest endpoint group
    2. For BYOD, the device will be issued a certificate for dot1x authentication
  5. After onboarding, ISE will send a Change of Authorization (CoA) to the NAD. This re-triggers the authentication process as if the user had just connected.
  6. Because the device is now known to ISE, it should match an authorization rule, giving the device network access.
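For reference, the redirect URL that ISE hands to the NAD in step 2 typically looks something like the schematic below; the hostname is made up, and the bracketed values are filled in by ISE per session:

```
https://ise.example.com:8443/portal/gateway?sessionId=<audit-session-id>&portal=<portal-id>&action=cwa
```

Seeing this URL in the NAD's session output is a quick way to confirm that the authorization result with the redirect actually reached the device.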

Troubleshooting workflow

To troubleshoot efficiently, it’s important to have a workflow that produces consistent results, then you need to trust it!  It can be tempting to take shots in the dark hoping for the quick fix, but in the long run a process oriented approach will be consistently more productive and less stressful.

Troubleshooting workflow summary:

  1. Trigger the process
    1. connect to network
    2. issue an HTTP GET via web browser
  2. Check the radius log on ISE to verify output
  3. Check the Authentication result on the Switch/WLC to verify output
  4. Determine what stage of the flow the fault is occurring at
  5. Check dependencies for the failed portion of the flow
  6. Correct what appears to be the problem.
  7. Trigger the process again
    1. If it’s still not working did we fix a fault?
      1. if not, put the change back
        1. return to step 1
      2. If still not working but a fault was fixed
        1. go to step 5

Now that we have our troubleshooting flow written down, let’s get started


Wireless WebAuth Redirect

On the WLC, the redirect ACL uses the opposite processing logic from the switches. Permit means "do not redirect this traffic"; all other traffic will be redirected. This means we don't have to bother with referencing an Airespace ACL on the WLC, as the implicit deny will prevent the remaining traffic from going anywhere.

Additionally, ACLs on the WLC have the following differences:

  1. They use natural masks, not wildcard masks
  2. They're bidirectional
    1. you must define the flow in both directions

Dependencies on the WLC

  1. Redirect ACL
    1. must be defined on the WLC
  2. CoA support
    1. disabled by default when defining a radius server
  3. MAC filtering on the layer 2 security configuration

Troubleshooting a wireless problem

Portal page doesn’t load in the browser

Typically a problem with the redirect ACL on the WLC.  Steps to investigate:

  1. Check ISE livelog for webauth redirect status
  2. Check Client Status on WLC
  3. Check webauth ACL.
    1. Make sure DNS is able to resolve hostname in URL
    2. Ensure ACL is permitting the ip address of the ISE node

User does not gain access to the network after logging in to portal

Common causes:

VLAN change in portal config without Java capable browser

Changing VLANs at the end of a guest flow requires the user to download and run the NAC Agent to release and renew the IP address of the client. The NAC Agent is Java-based and not supported in current web browsers. VLAN change for guest flows is not a good idea.

Enabling this setting will trigger a Java applet download to run the NAC Agent when the user completes login. Probably not a good idea.

Change of authorization Failure

CoA is disabled by default in the RADIUS server configuration on the WLC, so it's probably a common fault. Let's use this one as an example of how to look at the device output to work out what's going on.

First place to look is the Radius LiveLog.  Looking at the sequence of events, we can see that the client hit the webauth rule, logged in, and then something went wrong.

We already know from glancing at this that the first few steps worked and the user accessed the portal, but let's take the opportunity to look at output on ISE and the WLC for the webauth redirect step.

Checking the initial log entry on ISE (this is at the very bottom):

there’s the redirect with the with the session id.  The user-name is the mac address of the wireless in the client.

Now let’s go to the WLC and take a look:

The redirect URL and redirect ACL have been applied to the client. The WLC has the client in a Central Web Auth state.

Let’s take a quick look at the redirect ACL.

The redirect ACL allows DNS and traffic to and from ISE. All other non-HTTP/S traffic will be dropped. HTTP/S traffic will be intercepted, and the redirect message with the redirect URL will be sent to the client.

So everything looks good until the end.  Let’s take a look at that last log entry in ISE.

Change of Authorization Failure.

Ok, let’s go take a look at the configuration on the WLC.

Sure enough.  Change of Authorization was disabled.

Ok, let’s enable CoA, delete the endpoint and guest account from ISE, and give it another try.

Let’s break down what’s going on here.

  1. User gets redirected
  2. He signs in
  3. The MAC address of the device is added to the GuestEndpoints identity group
  4. Change of Authorization is sent to the WLC
  5. The WLC authenticates the user against ISE
  6. The authentication request loops back through our authorization policy, only this time it hits our wireless guest rule because the device is now in the GuestEndpoints group.

Ok that was cool right?


Wired WebAuth Redirect

From the perspective of ISE


This screenshot depicts the set of Radius attributes that are sent down to the NAD.  There are 4 items.

  1. Radius Access-Accept
  2. The Downloadable ACL to restrict network access
  3. The redirect URL for guest registration portal
  4. The Redirect ACL

The redirect ACL is always the confusing part. Put simply, the redirect ACL tells the switch what traffic NOT to redirect and what traffic to intercept for redirection. This solves a chicken-and-egg problem: if HTTP GET requests to ISE were intercepted by the switch, you would have a redirect loop and would never reach the portal.

Short answer: Permit means “redirect this traffic”.  Deny means “leave this traffic alone”.

Keep in mind, however, that only HTTP/HTTPS traffic can actually be redirected. The switch uses its built-in web server to create the HTTP redirect message that's sent back to the client web browser.
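A typical switch redirect ACL following that logic might look like this (the ISE PSN address and the ACL name are placeholders; match the name to what's referenced in the ISE authorization profile):

```
ip access-list extended ACL-WEBAUTH-REDIRECT
 ! never redirect DNS
 deny   udp any any eq domain
 ! leave traffic to the ISE PSN alone to avoid a redirect loop
 deny   ip any host 10.10.10.5
 ! intercept and redirect web traffic
 permit tcp any any eq www
 permit tcp any any eq 443
```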

Dependencies on the Switch

In addition to the normal radius and dot1x stuff that’s required, WebAuth requires some additional items.

  1. Redirect ACL
    1. is defined locally on the switch
  2. Authentication proxy
    1. needed to support web authentication feature
  3. CoA Support
    1. needed to initiate re-authentication after registering in the portal
  4. IP Device tracker
    1. Needed for the Enforcement Policy Module (EPM) to have awareness of the endpoint’s ip address.
  5. HTTP/HTTPS server enabled
    1. The switches onboard web server is what intercepts the connection request, handshakes with the client’s web browsers, sends the redirect message.
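Leaving aside the redirect ACL and the base AAA/dot1x config, the remaining dependencies on a classic IOS 15.x switch might be satisfied with something like this (the ISE address and key are placeholders):

```
! 3. CoA support: accept dynamic authorization from ISE
aaa server radius dynamic-author
 client 10.10.10.5 server-key MySharedSecret
! 4. IP device tracking (classic IOS 15.x syntax)
ip device tracking
! 5. onboard web server to intercept and redirect HTTP/HTTPS
ip http server
ip http secure-server
```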

Ok, let’s apply our troubleshooting workflow to a wired problem.

Troubleshooting a Wired problem

User doesn’t get portal page when connecting.

Just like with wireless, we start by looking at log and device output to narrow the scope.

Let’s start with the radius log.  It looks like we’re stopped at webauth redirect.

Let's check the switch.

We can see we're on switchport g1/0/1 and that the port is in the state we expect. So why isn't it working? Let's check the redirect ACL.

The redirect ACL looks good. Hmm. What are our other dependencies? Ah!

The HTTP server is not running. OK, let's fix that and try again.

What? Why is the switch sending a reset instead of redirecting? We know the HTTP server being shut off was a problem. It must mean there's another fault on the switch. Let's check for IP device tracking.


What’s this, command doesn’t work?  Better check the documentation.

Ah, looks like our trusty dot1x template for IOS-XE 15 isn't working so well on IOS 16; now we have to use some different commands to get IP device tracking working. So let's do that:
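On IOS-XE 16.x the legacy `ip device tracking` command is replaced by SISF-based device tracking; a minimal equivalent might look like this (the policy name is mine, and the interface will vary):

```
device-tracking policy IPDT-POLICY
 tracking enable
interface GigabitEthernet1/0/1
 device-tracking attach-policy IPDT-POLICY
```

You can verify the switch is learning endpoint addresses afterward with `show device-tracking database`.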


These are hard problems to solve because there are multiple faults, and when the first thing you find doesn't fix it, it's easy to start doubting yourself: maybe it's a problem with the endpoint, etc. But by trusting the RADIUS logs, the device output, and our knowledge of the process flow, we can deduce where the problem has to be. You have to follow the output where it leads you.

I hope this was helpful.  It’s helpful for me to write it down.


Until next time.