Great treat on a Friday afternoon.
I’ve been quite surprised by the response to the news that I passed the lab. I suspect a lot of it is due to my friend Katherine who has a sizeable following on account of her excellent blog. If you haven’t checked it out, you should!
Anyhow, from all my new friends, I received a lot of questions. This post is intended to answer them in one place, as well as supply some additional context, and share some of the key resources that I used in my preparation.
The big question, Why?
For a large undertaking like this, you need to have a pretty strong reason, or reasons. It’s a sizeable commitment and when things get really hard, you’re likely to give up if there’s not a good reason for doing it.
In my case there were several reasons. Listed in no particular order:
- Increase my knowledge level. I like to have a deep understanding of what I’m working with.
- Increase my income earning capability
- Prove to myself that passing Route/Switch wasn’t a fluke; that I could put my mind to a goal of similar scope and do it again in an area where I have less background.
- I enjoy the topics. These toys are a **lot** of fun to play with.
Having seen mine, hopefully you’ll find it easier to define yours.
The commitment required
Let’s take a realistic assessment of the level of effort required. Understand that this is going to be specific to the individual, based upon:
- The existing expertise brought to the starting line
- How much of the blueprint covers work performed in day job
- Ability to purchase quality training materials
- Quality of the structures used to facilitate learning process
- Consistent application of effort (i.e. not engaging in distractions during study time)
- Capacity to absorb and process the information
In my case, I was already familiar with much of the VPN technology and I knew my way around ASA firewalls and Firepower, all of which I implement and support in my day job. Additionally, I had recently completed CCIE Route/Switch, so I knew how to structure information for later retrieval and how to establish a sustainable study routine. However, I was a complete newbie to ISE and Cisco wireless networking.
With all that, it took me approximately 19 months and two attempts, studying an average of 20 hours a week, for a total effort of somewhere around 1600-1700 hours. I reckon the range for the vast majority of candidates would fall somewhere between 1200 and 2500 hours, presuming they average 20 hours a week of quality study time without significant gaps.
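The numbers above are simple arithmetic, and it’s worth running them for your own pace. Here’s a rough sketch, assuming about 4.33 weeks per month; the function names are mine and purely illustrative.

```python
WEEKS_PER_MONTH = 52 / 12  # approximation: about 4.33 weeks per month


def total_study_hours(months: float, hours_per_week: float) -> float:
    """Rough total study time for a given duration and weekly pace."""
    return months * WEEKS_PER_MONTH * hours_per_week


def months_needed(total_hours: float, hours_per_week: float) -> float:
    """How many months a given hour budget takes at a weekly pace."""
    return total_hours / (hours_per_week * WEEKS_PER_MONTH)


# 19 months at 20 hours a week lands inside the 1600-1700 hour range
print(round(total_study_hours(19, 20)))

# The 1200 to 2500 hour range, expressed as months at 20 hours a week
print(round(months_needed(1200, 20), 1), round(months_needed(2500, 20), 1))
```

Plug in your own weekly hours to see how quickly the calendar stretches if the pace drops.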
Support and buy in from your family
Consider this: You’re already employed with a full-time job and you have adult obligations to maintain. When you start studying 20+ hours a week, your wife and your kids (if you have them) are going to feel your absence, and it’s not going to be easy for them. There need to be relevant reasons for them as well, because they are going to sacrifice alongside you. It’s a good idea to talk to them and ensure they have a crystal clear understanding of what the next 18 to 24 months are going to be like, and get their buy-in. This is not worth losing your family over!
I strongly recommend taking one day off a week to spend that quality time with your family. That may not be possible in your final push, but that’s just the end phase. This is a marathon not a sprint.
One thing that derails a lot of candidates is lacking a clear path to the end goal. You need a sense of urgency and momentum. It’s easy to get lost in the technical deep dives (especially with the larger topics like VPN and ISE), but it’s important to keep moving.
It’s ok to make adjustments as you go along, and there will be setbacks. The idea is to have something to help you stay on track so at some point you’ve got your ass planted in a seat at a lab location and you’re making it happen.
Example high level outline
Months 1-3: Pass one through the blueprint
- Focus on basics
- 70% theory 30% lab
Months 4-7: Pass two through the blueprint
- Go deeper into the details and advanced use cases; read RFCs and advanced docs.
- 50% theory 50% lab
Months 8-9: Prepare for written
- Heavy on theory and facts
- Study written only topics
- 80% theory 20% lab
End of month 9: take written exam
Months 10-12: Pass three through the blueprint
- Repeat deep dive, with more emphasis on hands on
- focus mostly on VPN and ISE
- 30% theory 70% lab
Months 13-14: multi-topic labs
- No more deep dives
- Combine multiple technologies in each labbing session
- get used to shifting gears
- Wean off cli. Build all config in notepad
- 20% theory 80% lab
Schedule lab for 90 days out
Months 15-17: final push for lab
- Lab every topic on blueprint during the week
- Lab all topics in one sitting 1x/week
- all config in notepad
- speed and accuracy paramount
- Stop using debugs
- Time all exercises
- include verification in time
- Get friend to break your labs for troubleshooting practice
- 10% review, 90% lab
End of month 17: First attempt
The battle of memory
Setting up your notebook and files
In the beginning, you’re not going to have much to look at in terms of accumulated knowledge. That will gradually change. It’s a really good idea to set up a system at the beginning; a whole lot easier before you’ve accumulated a bunch of stuff.
My preference these days is OneNote, but other tools will of course work fine. Heck even an old 3 ring binder can do the job if that’s what you’re into.
Converting the blueprint into something more practical
When you’re setting up your notebook, your first instinct might be to mirror the blueprint, e.g. a tab called 1.0 Perimeter Security and Intrusion Protection. Here’s my suggestion: Organize by product and technology, with additional tabs for unprocessed notes, running and completed tasks, general info, classes taken, etc. Here’s a screenshot of my tab list for example:
Should you start a blog?
There’s a very effective learning technique called the Feynman Technique. It consists of 5 steps.
- Write down the name of a concept
- Write out an explanation of the concept as you understand it in plain English.
- Review what you’ve written, study the gaps you’ve discovered in your knowledge. Update your explanation
- Review your explanation a final time and look for ways to simplify and improve clarity.
- Share this explanation with someone.
It really does work.
Blogging is a perfect vehicle for this sort of thing. But here’s the rub: Blogging is a skill that takes effort to develop. In my own experience, my early posts took hours to write and I just about gave up on it due to the amount of time that it took. But as I kept doing it, I got more efficient, and ultimately I found it gave a good return on the time invested.
It’s not a requirement of course. It’s just a tool, and there are lots of other useful tools to choose from.
Written prep and fact memorization Tools
The written covers things you’ll see in the lab, but it also stresses blueprint items that aren’t testable in hands on environments, such as knowledge about theory, standards and operations, and emerging technologies.
You’ll need to have a lot of facts at your recall for the written, so you’re going to have to do a lot of memorization. That means good old spaced repetition.
The two products that I’ve used to aid in memorization are Anki and SuperMemo. I have a personal preference for SuperMemo, but it’s a quirky program with a learning curve. Anki is much easier to pick up and start using. Either one of these tools is a good aid for the memorization grind that’s part and parcel of passing a CCIE written exam.
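If you’re curious what spaced repetition boils down to, here’s a minimal sketch of the idea. To be clear, this is not the scheduling algorithm used by Anki or SuperMemo (both are more sophisticated); it’s a simple doubling rule I’m assuming for illustration, and the `Card`, `review`, and `due_cards` names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Card:
    front: str
    back: str
    interval_days: int = 1
    due: Optional[date] = None  # None means the card was never reviewed


def review(card: Card, remembered: bool, today: date) -> Card:
    """Double the gap after a correct answer; start over after a miss."""
    card.interval_days = card.interval_days * 2 if remembered else 1
    card.due = today + timedelta(days=card.interval_days)
    return card


def due_cards(cards: List[Card], today: date) -> List[Card]:
    """Cards never reviewed, or whose due date has arrived."""
    return [c for c in cards if c.due is None or c.due <= today]


card = Card("IKEv2 UDP ports?", "500, and 4500 for NAT-T")
review(card, remembered=True, today=date(2018, 1, 1))
print(card.interval_days, card.due)
```

The point is that facts you keep getting right show up less and less often, while the ones you miss come back the next day.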
Gathering learning resources
This will be an ever expanding list as you get deeper into your studies, but before you start, it makes sense to have some material lined up to take you on that first tour through the blueprint. As there is not a one stop guide for the security track, this first step can take a little bit of time. You should have your study material for the first six to nine months lined up before you dive in, to ensure you make productive use of your time. The resource list near the end of this post will hopefully make a good starting point.
You will want to invest in some equipment for this track. If you’re serious about doing this, it’s the best way. Yes, you can do quite a bit on GNS3 and EVE-NG, but you’re going to want to have a lot of seat time with the switch and the firewalls, and you’re going to want large topologies that include resource hungry appliances. A good strategy is to pool your resources with a couple of other people and host your gear in a cheap colo.
This is the lab gear that I recommend:
- Server: ESXi host with 128 GB RAM, 12 cores, 1 TB disk
- Switch: Catalyst 3650 or 3850 is best, a 3750x is a good budget option
- AP: 3602i or similar
- IP Phone: Cisco 7965G
- Firewall: 2xASA 5512-X with clustering enabled.
- Optional gear for COLO:
- Terminal Server (I Prefer OpenGear IM series: not cheapest but quite secure)
- Edge firewall: ASA 5506x
Equipment thoughts and notes
There are lots of good deals on older 1U rack mount servers on eBay; that’s easy information to research. If you can afford more than 128 GB of RAM, go for it. You’ll use all of it once you start labbing larger topologies.
When shopping for switches on ebay avoid switches with the LAN Base image, as IP Base is the minimum version required to use most features on the blueprint. Also, it never hurts to check the Trustsec Compatibility Matrix for the specific part you’re looking at buying.
Yes, you want to drop the coin on a pair of ASAs. Just do it. You don’t have to spring for the x series, but you do need multicontext and clustering, and you need to be able to run version 9.2 of the ASA software, those are your minimums.
My labbing partner and I started out with a cheap terminal server and the darn thing kept getting DOSed, so we dropped 400 bucks on an older Opengear terminal server, and that thing is bulletproof. Worth the extra money for out of band access to your gear.
Most COLOs, even budget ones, have remote hands to power cycle things for you if they get locked up. But if you’re not comfortable with that, hooking up a PDU like an old APC AP7931 to your terminal controller will work. I use one of these for my home rack and they’re great.
About software and platform versions
You want to match the versions on the blueprint as much as you can. Where it’s not realistic, or a hassle to do so, I strongly urge you to check your syntax and configs against the lab version periodically if possible.
Personal experience: In my environment I labbed on the newer version of the CSR1kv image that has the 1 Mb/sec throughput limitation in demo mode. That’s great, but some of my configuration speed optimizations blew up on lab day. That repeated in several areas, including differences between the 3750x in my lab and the 3850 in the actual lab.
Bottom line: It’s a lot of money and stress to go sit the lab, and when the clock is ticking and you’re running into unexpected platform and software issues, the little bit of money you saved on that screaming ebay deal for an older part will be long forgotten.
About licensing the appliances in your lab.
Most of the stuff is not a problem; you can run it on demo licenses. The exceptions are NGIPS, WSA, and ESA, which are a pain. If you don’t work for a partner or Cisco, your local Cisco SE should be able to get you 3 month licenses for those appliances to assist with lab prep.
Establishing the daily routine
This is a critical item. It’s just like a fitness routine. Consistency is everything. You’re not going to be able to skip studying during the week and binge on weekends expecting quality results. It just doesn’t work that way.
In my view, the bare minimum to do this in 18 months is 20 hours a week. That’s 20 focused hours without distractions.
There are going to be days when you *really* don’t want to study. Embrace the suck my friend, and just do it anyway.
- 3-4 hours a day Monday through Friday
- 8 hours Saturday
- Sunday off
If you stick to that week in and week out, you will make good steady progress.
One thing that really helped me is a tool called tomato timer. The idea is for 25 minutes you’re hyper focused. Then you take 5 minutes off. Then 25 minutes on again. It really helps with procrastination and the urge to check your twitter feed and all that.
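The 25 on, 5 off rhythm is easy to plan out ahead of a study session. Here’s a small sketch that lays out the focus blocks for a session of a given length; it’s my own illustration, not how any particular timer app works, and `pomodoro_schedule` is a made-up name.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def pomodoro_schedule(
    start: datetime, total_minutes: int, focus: int = 25, rest: int = 5
) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) times of every full focus block that fits."""
    blocks = []
    session_end = start + timedelta(minutes=total_minutes)
    t = start
    while t + timedelta(minutes=focus) <= session_end:
        blocks.append((t, t + timedelta(minutes=focus)))
        t += timedelta(minutes=focus + rest)  # skip over the break
    return blocks


# A 3 hour weekday session starting at 7pm
for begin, end in pomodoro_schedule(datetime(2018, 1, 8, 19, 0), 180):
    print(begin.strftime("%H:%M"), "-", end.strftime("%H:%M"))
```

Knowing up front how many focus blocks a session holds makes it easier to decide what you’re going to lab in each one.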
Responding when life gets in the way
FACT: Things are going to happen that are out of your control. You’re probably going to get pulled away from your studies for a week or two here and there.
Try to at least read a little bit and stay mentally engaged on some level every day until the clouds part and you’re able to get back on the grind again. The main thing is don’t let a short break turn into a long one; it happens very easily and you’ll be upset with yourself as you have to retrace your steps to get back to where you were in your progress.
Here’s a list of learning resources that I found to be particularly helpful. (Not including obvious things like product documentation).
- IKEv2 IPsec Virtual Private Networks – Amjad Inamdar, Graham Bartlett
- Cisco ISE for BYOD & Secure Unified Access – Jamey Heary, Aaron Woland
- PKI Uncovered – Francois Dessart, Srinivas Tenneti, Andre Karamanian
- Windows Server® 2008 PKI and Certificate Security – Brian Komar
- Cisco Firepower Threat Defense (FTD) – Nazmul Rajib
- CCIE Evolving Technologies Study Guide – Nick Russo (essential for the written)
- Labminutes ISE 2.2 video Bundle
- Labminutes ISE 2.0 Video Bundle
- Labminutes FTD 6.1 Video Bundle
- Labminutes FlexVPN Video Bundle
- Khan Academy: Journey into Cryptography
Instructor led Training
- Micronics Mastering ASA
- Micronics Mastering VPN
- Micronics CCIE Security Bootcamp Workbook (comes with class only)
- Security Everywhere
- IPSEC VPN troubleshooting
- Web Security Appliance Lab v1
Cisco Live Sessions
- BRKCCIE-3340 CCIE Security Exam tutorial
- BRKSEC-3121 FTD Capabilities, Deployment, Troubleshooting
- BRKSEC-3005 Cryptographic protocols and Algorithms
- BRKSEC-3054 Advanced FlexVPN
- BRKSEC-3001 Advanced IKEv2
- BRKSEC-3051 Troubleshooting GETVPN Deployments
- BRKSEC-3052 Advanced DMVPN
- BRKSEC-3690 Advanced Security Group Tags
- BRKSEC-3697 Advanced ISE Svcs/Tips/Tricks
- BRKSEC-3771 Web Security Deployment with WSA
- BRKEWN-2014 Wireless Guest Access
- BRKSEC-2501 Anyconnect SSL VPN
- BRKSEC-3033 Advanced Anyconnect Deployment
- BRKSEC-2697 Clientless VPN
Cisco ISE community
- ISE Auth Feature Flows
- Universal IOS Switch Configuration for ISE
- Universal WLC configuration for ISE
- TACACS Configuration for IOS
- TACACS Configuration for ASA
- TACACS Configuration for WLC
I hope this answered some of your questions and gave you a sense of a path to get to the CCIE Security lab. I really enjoyed reliving all the memories creating this post triggered. It’s been quite an adventure. 🙂
Best regards, and may you care for yourself with ease,
I sat my second attempt at the CCIE security lab on Monday, March 5th, 2018. This time I got the pass. It was not without a bit of drama.
TL;DR: It’s a very tough exam, harder than you think, harder than you remember. Fight with everything you have until the proctor tells you to stop.
The longer version.
Had the usual anxiety thing in the days leading up to lab day. Did what I could to minimize it. Got ok sleep the night before. On Lab day I wasn’t nearly as nervous as I was in January.
Tshoot. I came out swinging and quickly solved most of the tickets. Then I had an issue with ISE that left me dead in the water for half an hour. Left one ticket unsolved.
Diag was like last time, pretty easy, chance to take a breather and get ready for the main event.
Config started out well, was flying through the first few tasks. Had some annoying issues related to code and platform differences between my personal lab and the actual exam, don’t recall that happening the last attempt.
Then the trouble began.
ISE was acting up, so I attempted to restart it. Application shut down dirty with database errors and would not start. While all this was going on I was trying to work on other stuff and keep moving. Eventually the proctor got ISE running somehow, but the database corruption was an issue and ISE wasn’t adding endpoints to the database, which made several tasks impossible to complete. The lowest point was when he was sending people home, and I knew I didn’t have the points. When he said I could continue I put my head down and kept grinding.
I was just about out on my feet but I kept battling. When I was told I’d have to stop, I looked at my scorecard to add the points, and all the tasks that could be completed were. Scorecard said I had the points with a small cushion. Would the tasks verify though?
To my amazement I got the email about 3 hours later with the good news.
Much respect to the proctor who probably stayed at work late in order to let me make up the time lost by the ISE debacle. Those folks have a very tough job putting up with us.
Final thought: The CCIE lab is **always** so much harder than you remember it. You can’t simulate what it’s really like. Even after just 7 weeks gap I was going “what the hell!?!”.
Now I get to make things up to my wife and try to lose the 10 lbs of body fat I put on in the final push.
I sat the CCIE security lab on Friday, January 12th. Result was Pass/Pass/Fail. That means overall I failed.
For those who are unaware, there are three sections to the exam: Troubleshooting, Diagnostics, and Configuration.
Here’s my interpretation of those sections:
Troubleshooting is like getting a bunch of 2:30am phone calls where something is broken and you need to hop on and fix it now.
Diagnostics is like getting emails at work where a field tech and/or the Helpdesk are stuck on a problem, and you have to figure it out from a couple of screenshots and some log/device output. You do not get to actually touch the devices.
Config is like being a consultant asked to complete a build on an enterprise network where sales severely underbid the required project hours and you’ve got to make it happen.
With that out of the way, I’d like to break down what went well, what went less than well, and what I need to improve on.
What Went Well
I did a lot of self care in the 36 hours leading up to sitting the lab. Mostly breath meditation, and reliving an amazing road trip I took last summer. Looked at photos, listened to a playlist of music I listened to on the trip, closed my eyes and relived moments. That was very effective, it’s amazing what we can draw from those times when we need them.
In the run up my waking and sleeping hours were all over the map in a mad dash to get all the labbing in I could, but I needed to rein that in. About a week prior to lab day I started adhering to a rigid routine of going to bed and getting up early. A week was enough to get my body clock reset.
Structured approach to the exam
Keeping a score card, task list, reading questions multiple times to make sure I’m not overlooking small but critical details, and making notes. Using checklists for long tasks. This all worked very well, no problem there.
Keeping all cli config in notepad
Building everything in notepad makes it dead easy to make updates and corrections, and to refer back to what you did earlier. It’s also an exam saver if for some reason you have to reinitialize a device. Little things that I do cli first, I will still transfer those to notepad.
Quickly discovering relevant information
I didn’t have much trouble quickly isolating the relevant area I needed to look at in troubleshooting or diag, that was a good feeling and gave me confidence.
Theory and technology knowledge
I was pleased that my topic knowledge was pretty much up to snuff. All the hard work of learning the technology paid off when I found the tickets straightforward and I never got lost or confused. There is something related to TS I need to improve to save time for config, I’ll touch on that in the next section.
What Did Not Go Well
Slow at verification
This first showed up in tshoot. I was spending more time verifying corrections than I was fixing the issue. I think there were two causes.
1. I have no field experience with some of the technology because I’ve never seen it on a live network. This is a disadvantage of working in enterprise; I have a narrower cross section of exposure.
2. In my final prep I was using fixed topologies and designs that I knew intimately. Consequently I didn’t have to put a lot of effort into verification. On a network you’ve never seen, where you don’t know the underlying details of the routing design other than what’s depicted on the topology diagram, it’s a completely different situation.
These significantly impacted my speed in tshoot and config. Spent way too much time on verification.
Managing Information Overload
I was feeling pretty good coming out of Troubleshooting and Diag. As I started reading through the config section, I did not have that good feeling. Some of the tasks were very long and packed with small details. There are key details that you will only pick up on if you read carefully and understand the technology. This intimidated me and caused me to worry too much.
I spent way too much time double checking and verifying those details. I think you have to trust yourself enough to bang it out, and double check your work at the end of day, otherwise you won’t get enough done.
Not Checking the Preconfigure
There may be stealth faults in the configuration section; not checking the preconfigure can extract a heavy toll in wasted time as you have to reset appliances and clean up the mess. Unfortunately this happened to me. Rookie mistake.
Not enough speed/memorization of cli commands
There is a lot to do in config. You have to be capable of generating large blocks of config in notepad and pasting them into the devices error-free for any of the technologies being tested. For gui based task elements, you have to be able to zip through the ui and configure what’s needed without any poking around. That’s the bar, simple as. Some things I was at that level with, others I was not.
This exam does ask for some knob turning (i.e. advanced requirements). That was a bit of a mixed bag for me. I think these are intended to slow you down more than anything, and that they will if you don’t know where the knob lives.
What I Need To Improve
That means giving it consistent attention and focus
That means being able to create in notepad any tested cli based technology cold….and have it paste in error-free 10x out of 10.
For example if you’re asked to spin up flex mesh, you need to be able to look at the requirements given for it in the task, type it up in notepad, paste it into the routers, and have it come up and work right out of the gate. You may be able to get away with checking cli syntax for some of the knob turning, but at a minimum you need to be able to quickly spin up a basic working configuration from memory without errors.
Add knob turning elements into practice exercises
Knowledge hole patching
Study a couple of small things I wasn’t knowledgeable enough on.
Overall it wasn’t a bad showing, would like to have done a bit better. But the results weren’t terrible. Encouraged that most of what needs to be done is simple repetition to gain speed, rather than correcting issues in fundamental knowledge or approach.
Hopefully I can even the score next time.
Thanks for walking with me.
Today we’re talking about zone based firewall.
Zone based firewall is a stateful firewall available as a feature on Cisco routers running IOS and IOS-XE. It’s capable of using NBAR to identify traffic and can perform deep packet inspection (DPI) on a few protocols (the most notable being HTTP). Interestingly, it can use Trustsec Security Group Tags (SGTs) as a matching condition.
It offers a cost effective solution for a couple of common cases. One would be small branch offices utilizing an internet connection for Guest internet and a backup dmvpn tunnel.
ZBFW provides a strong alternative to the care and feeding of a separate firewall in some situations, especially now that IOS-XE is capable of running some pretty cool containerized security apps like Snort and Stealthwatch learning network.
There are a couple of tradeoffs to using ZBFW that I noted when I was playing with it. The biggest one is that classes in the service policy cannot be reordered or inserted inline. You have to delete and re-create the policy, then add it back in to the zone pair. That could lead to some change management headaches.
With that out of the way, let’s go ahead and take a look at this thing.
What does Stateful mean?
The term stateful in the context of a firewall means the firewall builds and maintains a connection table based on traffic it receives on its interfaces. It uses this information to automatically allow the return traffic when it sees the match in the connection table.
A row in a generic connection table would look something like this:
This would be an http connection from 10.1.1.1 to 126.96.36.199. When the return packet comes back, the firewall sees the match in the connection table and automatically allows the traffic. This neat little trick is what makes a firewall a firewall in the sense that most people understand it.
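To make the connection table idea concrete, here’s a toy model in Python. It’s only a sketch of the general concept, not how IOS implements it: real firewalls also track TCP state, timeouts, and more. The `Flow` and `StatefulFirewall` names, the port numbers, and the 203.0.113.10 server address (a documentation range address) are all placeholders I picked for the example.

```python
from typing import NamedTuple, Set


class Flow(NamedTuple):
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def reverse(self) -> "Flow":
        """The same conversation seen from the other direction."""
        return Flow(self.proto, self.dst_ip, self.dst_port,
                    self.src_ip, self.src_port)


class StatefulFirewall:
    def __init__(self) -> None:
        self.connections: Set[Flow] = set()

    def outbound(self, flow: Flow) -> bool:
        """Permitted outbound traffic creates a connection table entry."""
        self.connections.add(flow)
        return True

    def inbound(self, flow: Flow) -> bool:
        """Inbound traffic is allowed only if it answers a known entry."""
        return flow.reverse() in self.connections


fw = StatefulFirewall()
fw.outbound(Flow("tcp", "10.1.1.1", 51000, "203.0.113.10", 80))

reply = Flow("tcp", "203.0.113.10", 80, "10.1.1.1", 51000)
unsolicited = Flow("tcp", "203.0.113.10", 80, "10.1.1.1", 51001)
print(fw.inbound(reply), fw.inbound(unsolicited))
```

The reply gets through because its mirror image is in the table; the unsolicited packet has no matching entry, so it doesn’t.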
Basic ZBFW operation
For starters, have a quick look at this cheat-sheet that shows a simple use case involving internet access for some corporate computers and guest endpoints.
The way it works at a high level is classes are used to match traffic. A policy calls the classes and sets the action for the matching traffic. Finally the policy is attached to a zone pair. A zone pair defines the source and destination zones, and therefore the interfaces, that the policy will be applied to.
Definitions of terms
Zones define interfaces that share a common security policy. Traffic can move freely between interfaces in the same zone.
It’s important to know that interfaces that belong to security zones cannot communicate with interfaces that do not belong to security zones. This is something to keep in mind when designing the deployment and when troubleshooting.
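Those zone rules can be written down as a small decision function, which I find handy as a mental checklist when troubleshooting. This is a simplified sketch: the interface and zone names are made up, a `True` for two different zones only means a zone pair exists (the attached policy still decides what passes), and I’m assuming the usual behavior that two interfaces with no zone membership forward normally.

```python
from typing import Dict, Set, Tuple


def may_forward(
    src_if: str,
    dst_if: str,
    zone_of: Dict[str, str],
    zone_pairs: Set[Tuple[str, str]],
) -> bool:
    """Can traffic from src_if to dst_if be forwarded at all?"""
    src_zone = zone_of.get(src_if)
    dst_zone = zone_of.get(dst_if)
    if src_zone is None and dst_zone is None:
        return True   # assumption: neither interface is zoned, ZBFW not involved
    if src_zone is None or dst_zone is None:
        return False  # zoned and unzoned interfaces cannot communicate
    if src_zone == dst_zone:
        return True   # same zone: traffic moves freely
    return (src_zone, dst_zone) in zone_pairs  # policy on the pair decides


# Hypothetical interface-to-zone assignments
zone_of = {"Gi0/0": "INSIDE", "Gi0/1": "INSIDE", "Gi0/2": "OUTSIDE"}
zone_pairs = {("INSIDE", "OUTSIDE")}

print(may_forward("Gi0/0", "Gi0/1", zone_of, zone_pairs))  # same zone
print(may_forward("Gi0/0", "Gi0/2", zone_of, zone_pairs))  # zone pair exists
print(may_forward("Gi0/2", "Gi0/0", zone_of, zone_pairs))  # no pair this way
print(may_forward("Gi0/0", "Gi0/3", zone_of, zone_pairs))  # Gi0/3 not zoned
```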
Classes and policies:
There are two kinds of classes and policies. Layer 3/4 and layer 7.
Layer 3/4 Class maps
Layer 3/4 class maps are traffic selectors. Classes are used to identify what traffic we want to apply an action to. Class maps can match based on protocol (NBAR), access-lists, another class map for compound conditions, security group tags (trustsec), and user groups.
Layer 7 class maps
A layer 7 class map is used for deep packet inspection (DPI). It’s most commonly used with HTTP to match attributes or content of http traffic. It’s outdated since most web traffic is encrypted now, but it’s there so I’m mentioning it.
Layer 3 Policy maps
Layer 3 Policy maps contain a list of classes, and the actions we want to perform on the classes.
There is a built-in class called class-default which matches anything not explicitly matched. It can be very useful for troubleshooting to call that class and add the log action to it to see what’s being blocked.
Layer 7 policy maps
Layer 7 policy maps apply actions to the traffic identified by the layer 7 classes. These policy maps cannot be called directly from a zone pair service policy. They are called under a layer 3/4 class in a layer 3 policy map that identifies the traffic flows we want to perform DPI on.
Here’s an example
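class-map type inspect http CM-L7-HTTP-1
 match response header length gt 5000
policy-map type inspect http PM-L7-HTTP-1
 class type inspect http CM-L7-HTTP-1
  ! the reset action here is an assumption for illustration
  reset
class-map type inspect CM1
 match protocol http
policy-map type inspect PM1
 class type inspect CM1
  inspect
  service-policy http PM-L7-HTTP-1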
Important note about classes in policies
Note: Classes need to be called in most to least specific order. For example, if there was a class matching on tcp followed by a class matching on http, the http rule would never fire, because http traffic is also tcp and hits the tcp class first. To re-order the classes, you must delete and re-create the policy, then add it back to the zone pair.
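The ordering issue is easier to see with a toy first-match evaluator. This is just a sketch of first-match semantics in general, not IOS internals; the class names and the dictionary packet format are made up for the example.

```python
from typing import Callable, Dict, List, Tuple

Packet = Dict[str, str]
Rule = Tuple[str, Callable[[Packet], bool], str]  # (class name, match, action)


def first_match(policy: List[Rule], packet: Packet) -> Tuple[str, str]:
    """Return the first class that matches; later classes are never consulted."""
    for name, matches, action in policy:
        if matches(packet):
            return name, action
    return "class-default", "drop"


def is_tcp(pkt: Packet) -> bool:
    return pkt["l4"] == "tcp"


def is_http(pkt: Packet) -> bool:
    return pkt["app"] == "http"


http_packet = {"l4": "tcp", "app": "http"}

# Least specific first: the http class is shadowed and never fires
bad_order = [("CM-TCP", is_tcp, "inspect"), ("CM-HTTP", is_http, "drop")]
# Most specific first: the http class gets its chance
good_order = [("CM-HTTP", is_http, "drop"), ("CM-TCP", is_tcp, "inspect")]

print(first_match(bad_order, http_packet))
print(first_match(good_order, http_packet))
```

With the broad class first, the http packet is handled by the tcp class and the more specific rule is dead weight.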
There are two types of nesting: nested classes and nested policy maps.
Nested classes are used for creating compound matching conditions, e.g. inspect these protocols for those hosts.
Nested policy maps are used in deep packet inspection. Layer 7 policy maps are called under a class in a layer 3 policy map.
Nested class example. This accomplishes the goal of allowing a specific list of protocols for a specific host. It’s a basic compound condition.
access-list 100 permit ip host 10.1.1.1 any
class-map type inspect match-any CM1-L4
 match protocol http
 match protocol dns
 match protocol icmp
class-map type inspect match-all CM1-L3
 match access-group 100
 match class-map CM1-L4
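If the match-any inside match-all logic feels abstract, here’s the same compound condition modeled in Python. It’s only an illustration of the boolean logic; the helper names are mine and the packet is a plain dictionary I made up for the example.

```python
from typing import Callable, Dict

Packet = Dict[str, str]
Condition = Callable[[Packet], bool]


def match_any(*conditions: Condition) -> Condition:
    """True if at least one nested condition matches (logical OR)."""
    return lambda pkt: any(cond(pkt) for cond in conditions)


def match_all(*conditions: Condition) -> Condition:
    """True only if every nested condition matches (logical AND)."""
    return lambda pkt: all(cond(pkt) for cond in conditions)


def protocol(name: str) -> Condition:
    return lambda pkt: pkt["protocol"] == name


def from_host(ip: str) -> Condition:
    return lambda pkt: pkt["src"] == ip


# Inner class: any of the listed protocols
cm1_l4 = match_any(protocol("http"), protocol("dns"), protocol("icmp"))
# Outer class: the host AND one of those protocols
cm1_l3 = match_all(from_host("10.1.1.1"), cm1_l4)

print(cm1_l3({"src": "10.1.1.1", "protocol": "dns"}))
print(cm1_l3({"src": "10.1.1.1", "protocol": "ssh"}))
print(cm1_l3({"src": "10.1.1.2", "protocol": "dns"}))
```

Only the first packet matches: right host and an allowed protocol. Either condition failing on its own is enough to miss the class.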
There are quite a few different kinds of parameter maps. I’m going to cover a couple of them briefly.
Layer 3/4 parameter maps are used primarily for DDoS prevention. They are attached to the action in a policy map, i.e. “inspect PARAM-MAP” where PARAM-MAP is a parameter map.
The most common layer 7 parameter map is regex which is used to match strings in http traffic. It’s applied as an argument to a layer 7 class map.
There is a global inspect parameter map, and this is where NBAR2 for protocol classification can be enabled, as well as general connection controls.
A zone pair defines what traffic is allowed to pass from a source zone to a destination zone. Being stateful, the zone pair builds a connection table based on outgoing traffic. Return traffic that matches an entry in the connection table is allowed.
The self zone
The self zone is a special built in zone that’s used to control traffic to and from the control plane of the router.
The self zone has some different behaviors and restrictions from normal zones, and it’s worth taking a closer look at it.
Zone pairs involving the self zone are not stateful. They behave a lot more like access control lists. There have been changes to how self works from version to version, so be mindful of that and test your image before deploying self zone on a live network. For the version of IOS I tested, 15.6(2)T, this is what I found:
By default traffic is allowed to and from the control plane to any zone unless the self zone is added to a zone pair, and a policy attached to it.
If a zone pair is created using self but no service policy is attached, traffic is still allowed.
If a service policy is attached, traffic in that direction is now restricted to what’s permitted in the policy, however, traffic in the other direction is not affected.
Because the self zone is not stateful, you must use the pass argument instead of inspect to allow the traffic, as the inspect directive has no meaning.
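Here’s how I’d summarize those findings as a tiny decision function. It models only the behavior described above for the image I tested, so treat it as a memory aid rather than a statement about every IOS version. The direction labels and the `self_zone_allows` name are made up for the example.

```python
from typing import Dict, Optional, Set


def self_zone_allows(
    direction: str,
    protocol: str,
    policies: Dict[str, Optional[Set[str]]],
) -> bool:
    """policies maps a direction ('to-self' or 'from-self') to the set of
    protocols passed by the attached service policy. A missing entry or
    None means no service policy is attached for that direction."""
    passed = policies.get(direction)
    if passed is None:
        return True  # no policy attached in this direction: still allowed
    return protocol in passed  # policy attached: only what it passes


# Policy attached only for traffic headed to the router itself
policies = {"to-self": {"ssh", "icmp"}}

print(self_zone_allows("to-self", "ssh", policies))       # passed by policy
print(self_zone_allows("to-self", "telnet", policies))    # not in policy
print(self_zone_allows("from-self", "telnet", policies))  # other direction untouched
```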
ZBF configuration workflow
1. diagram out your zones and policies
a. determine if you need nested classes
i. i.e. match x protocols for y hosts
2. define zones
3. define parameter maps if called for (advanced)
4. create traffic selectors
5. create policy maps
a. call class(es)
b. set action
6. create zone pairs and assign policies
7. assign interfaces to zones
8. test policies
9. review output to verify
1. good naming convention: type, direction.
a. ex: for layer 3/4 class map something like: CM-IN2OUT
b. ex: layer 7 CM-L7-BLOCKED-SITES
2. call class-default with drop log action to catch errors
3. pro tip: you can flip a class between match-any and match-all by just re-entering the command. You don’t have to remove and re-add it
1. show zone security – shows zone interface assignment
2. show zone-pair security – verify overall configuration
3. show policy-map type inspect zone-pair – shows statistics for zone pair
4. show policy-map type inspect zone-pair sessions | s Established – shows connection table info.
5. show run | s class|policy|parameter|zone – basic dump of the config elements
This lab topology takes ZBFW and places it in the context of a typical branch office network doing Direct Internet Access (DIA). The zip contains the challenge and a solution. Enjoy!
In this post, I’m going to go through my troubleshooting workflow for webauth redirect.
Webauth redirect is a core function of providing Network Access Control with Identity Services Engine (ISE). It’s used for a number of critical authentication flows, and when it does not work, you will not be able to provide guest access or onboard devices.
Taken as a whole, the configuration and processes between ISE and the Network Access Devices (NADs, which are switches and Wireless LAN Controllers) are quite complex, especially in the case of the switches. Trying to troubleshoot by staring at pages and pages of config and making random Google searches is going to be slow and painful. It’s much better to understand the information flow and dependencies, and use the device output to logically deduce where the problem lies.
This is the topology we’re going to use for our example. It’s a cobbled together lab. ISE is running in a remote location, the vWLC is running in ESXi on an Intel NUC in my home lab.
Our ISE configuration is going to be as simple as I can make it. This will be our policy:
There will be separate policy sets for wired and wireless for the sake of clarity.
- MAC Authentication Bypass (MAB)
Continue if authentication fails
- IT – MAC addresses in the IT group get full access to the network
- GUEST – Guest users will have access to the internet only
- Webauth Redirect – If the connecting device is unknown to ISE, we’ll:
- Apply an ACL that restricts network access to the bare minimum
- Redirect the user to a registration portal.
Guest Registration Portal Configuration
To keep things simple, we’re going to allow self-registration, which is not something you would normally do. After registering, the MAC address of the user’s device will be placed in the GuestEndpoints group, which is the default for ISE.
WebAuth Redirect process flow
At a high level a basic flow works like this:
- Unknown device connects to the network.
- ISE returns a result to the Network Access Device (NAD) with a redirect URL.
- When the user tries to connect to a website, the NAD intercepts the HTTP GET request and returns a 302 redirect pointing towards the web portal on ISE. The redirect also contains the Radius session ID and the original URL the user was trying to reach.
- The user and device are onboarded.
- In the case of guests, the MAC address of the device is stored in the endpoint database and associated with a guest endpoint group.
- For BYOD, the device will be issued a certificate for dot1x authentication.
- After onboarding, ISE will send a Change of Authorization (CoA) to the NAD. This re-triggers the authentication process as if the user had just connected.
- Because the device is now known to ISE, it should match an authorization rule, giving the device network access.
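The redirect step in the flow above can be sketched in a few lines of Python. This only illustrates the idea that the 302 Location carries both the session ID and the original URL. The portal address and the query parameter names (sessionId, redirect) are assumptions made for this sketch, not a statement of the exact format ISE or the NAD uses.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_redirect_location(portal_base: str, session_id: str, original_url: str) -> str:
    """Build the Location value for a 302 redirect (hypothetical parameter names)."""
    query = urlencode({"sessionId": session_id, "redirect": original_url})
    return portal_base + "?" + query


# Hypothetical values for illustration only.
location = build_redirect_location(
    "https://ise.example.com:8443/portal/gateway",
    "0A0A0A0A0000001F",
    "http://example.com/",
)
print(location)

# The portal can recover both pieces of information from the query string.
params = parse_qs(urlparse(location).query)
print(params["sessionId"][0], params["redirect"][0])
```

Carrying the session ID in the URL is what lets the portal tie the browser session back to the Radius session, so the later CoA can target the right client.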
To troubleshoot efficiently, it’s important to have a workflow that produces consistent results. Then you need to trust it! It can be tempting to take shots in the dark hoping for the quick fix, but in the long run a process-oriented approach will be consistently more productive and less stressful.
Troubleshooting workflow summary:
1. Trigger the process
   - connect to the network
   - issue an HTTP GET via web browser
2. Check the Radius log on ISE to verify output
3. Check the authentication result on the switch/WLC to verify output
4. Determine what stage of the flow the fault is occurring at
5. Check dependencies for the failed portion of the flow
6. Correct what appears to be the problem
7. Trigger the process again
8. If it’s still not working, did we fix a fault?
   - if not, put the change back and return to step 1
   - if a fault was fixed but it’s still not working, go to step 5
Now that we have our troubleshooting flow written down, let’s get started.
Wireless WebAuth Redirect
On the WLC, the redirect ACL uses the opposite processing logic from the switches. Permit means “do not redirect this traffic”. All other traffic will be redirected. This means we don’t have to bother with referencing an Airespace ACL on the WLC, as the implicit deny will prevent the remaining traffic from going anywhere.
ACLs on the WLC also differ in the following ways:
- They use natural masks not wildcard masks
- They’re bidirectional
- you must define the flow in both directions
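If you are copying entries from a switch ACL over to the WLC, the mask difference is easy to trip on. Here is a small Python sketch that converts an IOS-style wildcard mask to the natural mask. The function name is my own, and it only uses the standard ipaddress module.

```python
import ipaddress


def wildcard_to_netmask(wildcard: str) -> str:
    """Invert every bit of a wildcard mask to get the natural (subnet) mask."""
    inverted = int(ipaddress.IPv4Address(wildcard)) ^ 0xFFFFFFFF
    return str(ipaddress.IPv4Address(inverted))


print(wildcard_to_netmask("0.0.0.255"))  # 255.255.255.0
print(wildcard_to_netmask("0.0.0.0"))    # 255.255.255.255
```

So a switch entry written against 10.1.1.0 0.0.0.255 would be entered on the WLC with a mask of 255.255.255.0 instead.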
Dependencies on the WLC
- Redirect ACL
- must be defined on the WLC
- CoA support
- disabled by default when defining a radius server
- MAC filtering on the layer 2 security configuration
Troubleshooting a wireless problem
Portal page doesn’t load in the browser
Typically a problem with the redirect ACL on the WLC. Steps to investigate:
- Check ISE livelog for webauth redirect status
- Check Client Status on WLC
- Check webauth ACL.
- Make sure DNS is able to resolve hostname in URL
- Ensure ACL is permitting the ip address of the ISE node
User does not gain access to the network after logging in to portal
VLAN change in portal config without Java capable browser
Changing VLANs at the end of a guest flow requires the user to download and run the NAC Agent to release and renew the IP address of the client. The NAC Agent is Java based and not supported in current web browsers. A VLAN change for guest flows is not a good idea.
Enabling this setting will trigger a Java applet download to run the NAC Agent when the user completes login. Probably not a good idea.
Change of authorization Failure
CoA is disabled by default in the Radius server configuration on the WLC, so it’s probably a common fault. Let’s use this one as an example of how to look at the device output to work out what’s going on.
First place to look is the Radius LiveLog. Looking at the sequence of events, we can see that the client hit the webauth rule, logged in, and then something went wrong.
We already know from glancing at this that the first few steps worked and the user accessed the portal, but let’s take the opportunity to look at output on ISE and the WLC for the webauth redirect step.
Checking the initial log entry on ISE (this is at the very bottom):
There’s the redirect with the session ID. The user-name is the MAC address of the wireless adapter in the client.
Now let’s go to the WLC and take a look:
The redirect URL and redirect ACL have been applied to the client. The WLC has the client in a Central Web Auth state.
Let’s take a quick look at the redirect ACL.
The redirect ACL allows DNS and traffic to and from ISE. All non-HTTP/S traffic will be dropped. HTTP/S traffic will be intercepted and the redirect message with the redirect URL will be sent to the client.
So everything looks good until the end. Let’s take a look at that last log entry in ISE.
Change of Authorization Failure.
Ok, let’s go take a look at the configuration on the WLC.
Sure enough. Change of Authorization was disabled.
Ok, let’s enable CoA, delete the endpoint and guest account from ISE, and give it another try.
Let’s break down what’s going on here.
- User gets redirected
- He signs in
- MAC address of the device is added to the guest endpoints identity group
- Change of Authorization is sent to the WLC
- The WLC authenticates the user against ISE
- The authentication request loops back through our authorization policy, only this time it hits on our wireless guest rule because the device is now in the guest endpoints group.
OK, that was cool, right?
Wired WebAuth Redirect
From the perspective of ISE
This screenshot depicts the set of Radius attributes that are sent down to the NAD. There are 4 items.
- Radius Access-Accept
- The Downloadable ACL to restrict network access
- The redirect URL for guest registration portal
- The Redirect ACL
The redirect ACL is always the confusing part. Put simply, the redirect ACL tells the switch what traffic NOT to redirect and what traffic to intercept for redirection. This solves a chicken and egg problem. If HTTP GET requests to ISE were intercepted by the switch, what you would have is a redirect loop and you would never reach the portal.
Short answer: Permit means “redirect this traffic”. Deny means “leave this traffic alone”.
Keep in mind, however, that only HTTP/HTTPS traffic can actually be redirected. The switch uses the built-in web server to create the HTTP redirect message that’s sent back to the client web browser.
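Since the permit/deny meaning flips between the switch and the WLC, it can help to see the two side by side. The following Python sketch models a first-match ACL and the two interpretations described in this post. The ACL entries, the ISE address (192.0.2.10) and the helper names are all hypothetical, and real ACLs match on far more than destination IP and port.

```python
WEB_PORTS = {80, 443}  # only HTTP/HTTPS can actually be redirected


def first_match(acl, dst_ip, dst_port):
    """Return the action of the first matching entry, or the implicit deny."""
    for action, ip, port in acl:
        if ip in ("any", dst_ip) and port in (None, dst_port):
            return action
    return "deny"


def switch_action(acl, dst_ip, dst_port):
    # Switch: permit means "redirect this traffic", deny means "leave it alone".
    if first_match(acl, dst_ip, dst_port) == "permit" and dst_port in WEB_PORTS:
        return "redirect"
    return "leave alone"


def wlc_action(acl, dst_ip, dst_port):
    # WLC: permit means "do not redirect this traffic".
    if first_match(acl, dst_ip, dst_port) == "permit":
        return "leave alone"
    return "redirect" if dst_port in WEB_PORTS else "drop"


ISE = "192.0.2.10"  # hypothetical ISE node address
switch_acl = [("deny", ISE, None), ("permit", "any", 80), ("permit", "any", 443)]
wlc_acl = [("permit", ISE, None), ("permit", "any", 53)]

print(switch_action(switch_acl, ISE, 8443))          # traffic to ISE is not intercepted
print(switch_action(switch_acl, "198.51.100.7", 80)) # web traffic elsewhere is redirected
print(wlc_action(wlc_acl, ISE, 8443))
print(wlc_action(wlc_acl, "198.51.100.7", 80))
print(wlc_action(wlc_acl, "198.51.100.7", 22))
```

In both cases the entry for ISE is what prevents the redirect loop. It is just written as a deny on the switch and a permit on the WLC.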
Dependencies on the Switch
In addition to the normal radius and dot1x stuff that’s required, WebAuth requires some additional items.
- Redirect ACL
- is defined locally on the switch
- Authentication proxy
- needed to support web authentication feature
- CoA Support
- needed to initiate re-authentication after registering in the portal
- IP device tracking
- Needed for the Enforcement Policy Module (EPM) to have awareness of the endpoint’s IP address.
- HTTP/HTTPS server enabled
- The switch’s onboard web server is what intercepts the connection request, handshakes with the client’s web browser, and sends the redirect message.
Ok, let’s apply our troubleshooting workflow to a wired problem.
Troubleshooting a Wired problem
User doesn’t get portal page when connecting.
Just like with wireless, we start by looking at log and device output to narrow the scope.
Let’s start with the radius log. It looks like we’re stopped at webauth redirect.
Let’s check the switch.
We can see we’re on switchport g1/0/1 and that the port is in the state we expect. So why isn’t it working? Let’s check the redirect ACL.
Redirect ACL looks good. Hmm. What are our other dependencies? Ah!
HTTP server is not running. OK, let’s fix that and try again.
What? Why is the switch sending a reset instead of redirecting? We know the HTTP server being shut off was a problem. It must mean there’s another fault with the switch. Let’s check for IP device tracking.
What’s this, the command doesn’t work? Better check the documentation.
Ah, looks like our trusty dot1x template for IOS 15 isn’t working so well for IOS-XE 16. Now we have to use some different commands to get IP device tracking to work. So let’s do that:
These are hard problems to solve because you have multiple faults, and when the first thing you find doesn’t fix it, it’s easy to start doubting yourself. Maybe it’s a problem with the endpoint, etc. But by trusting the Radius logs, device output, and our knowledge of the process flow, we can deduce where the problem has to be. You have to follow the output where it leads you.
I hope this was helpful. It’s helpful for me to write it down.
Until next time.
This is a lab topology I put together in EVE-NG to help me sharpen up my knowledge and skills with IKEv2/FlexVPN. The baseline configuration uses pre-shared keys and there’s quite a bit of preconfiguration.
Screenshot of topology
Here’s a screenshot of the topology. I’ll give a big shout out to whoever can explain why tunnel 100 between r1 and r2 exists. 🙂
There are two initial versions in the repo.
- There’s the actual challenge lab where I removed some bits of config here and there to make it more fun.
- There’s a partially completed version using pre-shared keys. Flex mesh works, the ASA part works, flex client works. No config on R7. You could use this as a starting point to work on the tasks in the lab.
Fire it up and let me know what you think! If you get everything working, feel free to do a pull, inject some faults and create some troubleshooting labs. Here’s the link to the github repo:
Welcome to part 2 of this series of blog posts covering Network Address Translation on the ASA and Firepower Threat Defense. In this installment, we’re going to cover some terminology.
Notes on (confusing) terminology
Something that has always made NAT somewhat confusing to work with is overlapping terminology. And it’s not just between vendors, or even vendor platforms, but also within the product documentation and the implementation of the same product.
Here’s the most relevant example. The first screenshot is from the ASA 9.6 documentation. CLI book 2, NAT section. The second screenshot is show output from an ASA that has the items referenced in the documentation snippet configured.
If you are brand new to the ASA, you’d never know from looking at these images that they are referencing the exact same thing. When you’re trying to learn and discern what’s important, this terminology discrepancy presents an impediment. This section is intended to help with that as best as I can.
Overall there are two categories where the terminology can be a bit confusing. The first is items referred to with one term in the documentation, and a different one in the device. We’ll call these synonymous terms.
The second is terms for what I would call use cases in the documentation, that don’t have an analog in the device configuration. It’s just a term for a specific use. We’ll call these use case terms.
When discussing configuring the technology where terms are synonymous, I’ll do my best to always use the term that’s referenced in the device configuration as it’s the most relevant.
When discussing use cases, I’ll point out where there’s a specific term defined for that use, but it’s not used anywhere in the device itself.
Auto NAT = Object NAT
They are the same thing. The documentation calls it Object NAT, the command line interface calls it Auto NAT. It’s called Object NAT in the documentation because the translation is always the property of an object.
!-------auto nat example-----!
object network foo
nat (i,o) static 188.8.131.52
Manual NAT = Twice NAT
They are exactly the same thing. It’s called Twice NAT in the documentation to signify that both source and destination fields in the packet can be matched and altered. NOTE: This can also be done with object NAT on a limited basis by using multiple statements, one in each direction.
!-------manual nat example-----!
object network foo
object network MAP-foo
object MAP-puppy 184.108.40.206
nat (i,o) source static foo MAP-foo destination static puppy MAP-puppy
Use Case Terms
Identity NAT is a use of Manual NAT. The firewall is instructed not to translate any of the fields in the flow. This is used in situations where you would be performing NAT for most traffic flows, but you want to exempt specific destinations from translation.
The most common case for Identity NAT is VPN traffic. As NAT is performed prior to encryption, most of the time you’ll want to make sure that the firewall doesn’t translate the source IP of the return packets, which would prevent them from being sent over the tunnel as they would no longer match the crypto ACL. Even if for the sake of argument the packet did get sent over the tunnel, because its source address has been altered, the receiving host would not recognize the packet as part of a flow and it would be dropped.
Here’s an illustrated example:
Policy NAT is a use of manual NAT. The term policy means setting the condition based on a specific destination. Identity NAT does the same thing, but the difference is in the case of policy NAT a translation will be performed.
A common use for Policy NAT is Extranets and Software as a Service (SaaS) providers. These types of connectivity typically call for the customer server to be seen as coming from a specific Mapped address.
For example, let’s say Server-web runs a web app that’s accessed from the Internet with Mapped address of 203.0.113.121. A SaaS provider needs to get information from this server, but the server-web must appear to be at the IP address 198.51.100.111, which has been designated by the SaaS provider.
Here’s a graphical representation:
A strict definition of Destination NAT is Manual Nat where destination address and/or port number is being translated. On the surface this implies the use of manual NAT, as it’s the only form of NAT that allows us to map the destination address or port to a different value.
Consider this for a moment:
object network Server-A-In
 host 10.11.11.11
 nat (i,o) static 220.127.116.11
object network Server-A-Out
 host 18.104.22.168
 nat (o,i) static 10.11.11.11
These two statements produce the exact same end result. The distinction being with the second object, we’re making the public facing address the real address and the internal address the mapped address. So the term destination NAT is really a matter of perspective.
Access Control Rule Processing and NAT
Access control rules in ASA NAT are always applied to the real address. This is really very straightforward, but it can be a bit counter-intuitive at first. Let’s illustrate with a simple example:
Let’s say we have a web server in a DMZ, and we want to allow access to it from the internet.
object network server-web
 host 172.16.1.11
 nat (d,o) static 22.214.171.124
access-list outside-in permit tcp any host 172.16.1.11 eq www
access-group outside-in in interface outside
The intuitive thing would be to consider that an internet user would use the address 22.214.171.124 to access the web server, therefore that would be the destination IP you would use in the access control list. So this can be confusing at first. A good way to think of this is that we’re really applying the ACL to the object. Since the object can only have one real address, but potentially many mapped addresses, it makes sense for the ACL logic to work this way.
In fact, using objects and object groups in Access-control lists instead of bare ip addresses is quite handy. If you get in the mindset of working with objects instead of addresses (which are the properties of an object), the operational logic of the ASA then becomes more intuitive.
Let’s re-write our ACL to use the object instead:
object network server-web
 host 172.16.1.11
 nat (d,o) static 22.214.171.124
access-list outside-in permit tcp any object server-web eq www
access-group outside-in in interface outside
Makes more sense now, right?
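If it helps, here is the same idea as a tiny Python model: the firewall untranslates the destination first, then checks the ACL against the real address. The mapped address used here (203.0.113.11) and all of the names are made up for illustration, and this is a simplification of what the ASA actually does.

```python
# Hypothetical addresses for illustration only.
REAL = "172.16.1.11"
MAPPED = "203.0.113.11"

# Static NAT seen from the outside: mapped address -> real address.
static_nat = {MAPPED: REAL}


def inbound_allowed(acl, dst_ip, protocol, dst_port):
    """Untranslate the destination, then evaluate the ACL on the real address."""
    real_dst = static_nat.get(dst_ip, dst_ip)
    for action, proto, ip, port in acl:
        if (proto, ip, port) == (protocol, real_dst, dst_port):
            return action == "permit"
    return False  # implicit deny


acl_real = [("permit", "tcp", REAL, 80)]      # written against the real address
acl_mapped = [("permit", "tcp", MAPPED, 80)]  # written against the mapped address

# The internet user connects to the mapped address in both cases.
print(inbound_allowed(acl_real, MAPPED, "tcp", 80))
print(inbound_allowed(acl_mapped, MAPPED, "tcp", 80))
```

The rule written against the mapped address never matches, because by the time the ACL is checked the destination is already the real address.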
My goodness, where does the time go? In our next installment we’ll talk about static and dynamic NAT and PAT, services, objects and object groups.
In this series of posts, I’m going to discuss NAT on Firepower and the ASA in the way that I comprehend it. For this post I don’t plan on getting too far into the configuration and verification aspects; we’ll dive into that later.
It’s not as complex as it looks. Really.
In my opinion this is one of those topics where the difficulty level can be measured by the quality of your foundation knowledge. Put plainly – if you can clearly visualize what the traffic flow should look like, then it’s easy. The actual configuration syntax is easy to learn and work with. And although the configuration interface may look different, there is in actuality zero difference between FTD and ASA NAT. So you only have to learn it once for both platforms, nice right?!?
Armed with these insights, let’s do a level set on traffic flows. This baseline will give NAT syntax and semantics their context, so please at least skim it. You can always come back to it as needed.
The 5 tuple flow (and stateful firewalls).
A unique IP traffic flow is defined by 5 fields.
- Source IP address
- Destination IP address
- Transport protocol (Ex: TCP/UDP)
- Source port
- Destination port
For the purpose of working with NAT, I find it helpful to visualize this in a left to right fashion like this:
[203.0.113.111| 10.10.10.10 | TCP | 50922 | 80]
In this example a device at 203.0.113.111 is communicating with a web server at 10.10.10.10
A conversation between two hosts can be seen as two unidirectional flows where the IP addresses and ports are a mirror image in the reverse direction.
Taking our above example, this is what the two flows in the conversation look like:
[203.0.113.111 | 10.10.10.10 | TCP | 50922 | 80]
[10.10.10.10 | 203.0.113.111 | TCP | 80 | 50922]
And that’s pretty much what it looks like in a packet capture:
firewall# capture example int inside
firewall# sh capture example
5 packets captured
1: 00:39:03.453925 203.0.113.111.50922 > 10.10.10.10.80: […] 1520352048:1520352048(0) win 4128 <mss 536>
2: 00:39:03.459921 10.10.10.10.80 > 203.0.113.111.50922: […]
A stateful firewall recognizes these mirror image flows and identifies them as related. This simplifies usage – we only have to define our traffic rules in one direction, and the firewall can imply how the return traffic should be processed.
This logic also applies to NAT. If you define the flow in one direction, the NAT engine in the firewall processes the mirror image packets to look for a match.
Note how the firewall sees the two flows as a single connection:
firewall# sh conn
1 in use, 1 most used
TCP outside 203.0.113.111:50922 inside 10.10.10.10:80, idle 0:00:07, bytes 0, flags UB
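Here is a minimal Python sketch of the mirror image idea, using the same addresses as the example above. The Flow class and connection_key helper are just names I picked for the illustration. A real firewall tracks far more state than this.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int

    def mirror(self) -> "Flow":
        """Return the reverse-direction flow: addresses and ports swapped."""
        return Flow(self.dst_ip, self.src_ip, self.protocol, self.dst_port, self.src_port)


def connection_key(flow: Flow) -> Flow:
    """Both directions of a conversation map to the same key."""
    return min(flow, flow.mirror())


request = Flow("203.0.113.111", "10.10.10.10", "TCP", 50922, 80)
reply = Flow("10.10.10.10", "203.0.113.111", "TCP", 80, 50922)

print(request.mirror() == reply)
print(connection_key(request) == connection_key(reply))
```

Because both directions reduce to one key, a single connection entry is enough to recognize the return traffic, which is the behavior the sh conn output shows.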
Nat rule types
FTD and the ASA have two types of NAT rules: Auto and Manual.
Auto NAT is the simplest and easiest form of NAT to configure. I think of it as the microwave popcorn button of NAT. We define a network object, then attach a NAT statement to the object that tells the firewall what translation we want to perform based on the source and destination interface.
Here is a basic example:
object network Server-A
 host 10.10.10.10
 nat (inside,outside) static 203.0.113.10
This is how the fields relate to the flow:
[10.10.10.10 | 198.51.100.100 | TCP | 80 | 50922]
[203.0.113.10 | 198.51.100.100 | TCP | 80 | 50922]
The individual components:
Host 10.10.10.10 – The firewall will evaluate this against the first interface in the NAT statement. With object NAT this is always going to be the source IP address.
Nat (inside,outside) – The source IP address is coming from the inside interface of the firewall, and the destination IP address is on the outside interface of the firewall. The destination interface is determined by the routing table.
Static – The real and mapped addresses will be linked together in a fixed 1:1 relationship. This is most commonly used for servers that require a fixed public (mapped) IP address. Examples would be a web or email server.
203.0.113.10 – The mapped (public) IP address. If source address, source interface, and destination interface all match, then the firewall will perform the translation.
Here’s what our NAT table looks like with the Auto Nat Rule Configured:
firewall# sh nat
Auto NAT Policies (Section 2)
1 (inside) to (outside) source static Server-A 203.0.113.10
translate_hits = 0, untranslate_hits = 0
Manual NAT is useful for more advanced requirements, such as translating multiple fields in both directions, and conditional translation.
Happily, the way the NAT configuration syntax is structured makes it very easy to work with once one can relate the fields in the flow to how the NAT statements are laid out.
In Manual Nat, the full five tuple flow can be matched and transformed. Be aware that unlike object Nat where the mapped address can be given directly, Manual Nat requires that all addresses/ranges/subnets in the statement be predefined as objects.
Manual NAT basic example:
object network Server-A
 host 10.10.10.10
object network MAP-Server-A
 host 203.0.113.10
nat (inside,outside) source static Server-A MAP-Server-A
As you can see, when you’re doing simple source translation Manual NAT requires more lines of configuration to accomplish the same result as auto nat.
Here’s what our NAT rule looks like for the above manual NAT example:
firewall# sh nat
Manual NAT Policies (Section 1)
1 (inside) to (outside) source static Server-A MAP-Server-A
translate_hits = 0, untranslate_hits = 0
Let’s look at a common use for Manual NAT. We’ll dive deeper in a subsequent post. This is just to give you a starting example.
Let’s say you’ve configured the Auto NAT translation shown earlier. You are then asked to create a site to site VPN with a branch office. In that case, when users from the branch office attempt to connect to Server-A, the auto NAT rule will kick in, translating the source address of the server on the return leg, and traffic will not return over the VPN tunnel.
*source IP in return traffic is translated, breaking the flow
So how do we fix this problem? We use manual NAT to tell the firewall not to translate the address of Server-A when the destination is Branch-PC. Like this:
Object network Srv-A
Object network Br-PC
nat (inside,outside) source static Srv-A Srv-A destination static Br-PC Br-PC
Here is the generalized form of what this statement is doing:
nat (incoming,exit) source static real mapped destination static real mapped
The important thing to grasp is that for both source and destination, we’re setting a condition. Match this. If all conditions match, then change to this.
We’re telling the firewall, if you have this source and destination pair, don’t change anything. This overrides the Auto NAT and allows it and our VPN connection to co-exist. This is a use called Identity NAT.
Nat rule table structure
The NAT rule table has three sections:
- Section 1: Manual NAT (before Auto NAT)
- Section 2: Auto NAT
- Section 3: Manual NAT (after Auto NAT)
The user can set the order of the Manual NAT rules. The Auto Nat rule order is set by the firewall automatically from most to least specific traffic match. i.e. a host object would be ordered before a subnet object.
NAT table with Auto NAT rule, plus the identity nat override
firewall# sh nat
Manual NAT Policies (Section 1)
1 (inside) to (outside) source static Srv-A Srv-A destination static Br-PC Br-PC
translate_hits = 0, untranslate_hits = 0
Auto NAT Policies (Section 2)
1 (inside) to (outside) source static Server-A 203.0.113.10
translate_hits = 0, untranslate_hits = 0
Now the reason to order Manual NAT ahead of Auto NAT makes some sense, right?
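To tie the table order to the VPN example, here is a rough Python sketch of a first-match lookup across the three sections. It is a simplified model under a few assumptions: rules match only on source and destination networks, Auto NAT rules are ordered purely by prefix length, and the branch PC address (10.20.20.20) is invented for the example. The real ordering rules on the firewall have more detail than this.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    name: str
    source: str                 # real source network, CIDR form
    destination: Optional[str]  # None means any destination
    mapped_source: str


def matches(rule: Rule, src: str, dst: str) -> bool:
    if ipaddress.ip_address(src) not in ipaddress.ip_network(rule.source):
        return False
    if rule.destination is None:
        return True
    return ipaddress.ip_address(dst) in ipaddress.ip_network(rule.destination)


def lookup(section1: List[Rule], section2: List[Rule], section3: List[Rule],
           src: str, dst: str) -> Optional[Rule]:
    # Auto NAT (section 2) is ordered most specific first; manual rules keep user order.
    auto = sorted(section2, key=lambda r: ipaddress.ip_network(r.source).prefixlen,
                  reverse=True)
    for rule in [*section1, *auto, *section3]:
        if matches(rule, src, dst):
            return rule  # first match wins
    return None


identity = Rule("identity", "10.10.10.10/32", "10.20.20.20/32", "10.10.10.10")
auto_nat = Rule("auto", "10.10.10.10/32", None, "203.0.113.10")

print(lookup([identity], [auto_nat], [], "10.10.10.10", "10.20.20.20").name)
print(lookup([identity], [auto_nat], [], "10.10.10.10", "198.51.100.100").name)
```

Traffic to the branch PC hits the identity rule in section 1 and is left untranslated, while everything else falls through to the Auto NAT rule.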
Well, that’s quite a few words for an introduction, so let’s stop here. In our next post we’ll go deeper into the terminology and usage examples.