The Subtleties of Enterprise Security Vulnerability Management — at Scale

Thursday, July 11, 2013

Rafal Los


Enterprises face some interesting challenges as they grow. Once you scale to any real size, tasks that once seemed simple become unmanageable, difficult, even chaotic. One of those tasks is vulnerability management, and while it may sound simple and trivial, I assure you it is, indeed, not.


At small scale


When vulnerability management is done at a small scale, before you get into the large enterprise space, it is relatively simple. You fire up your favorite vulnerability scanner, scan your IP space, which is presumably well-defined, and manually validate the results with one of your ace security analysts. Simple.


Managing the whole thing isn’t too tough and can even be done on a spreadsheet (for the ultra-low-budget SMB), or via a pretty dashboard and management interface. These interfaces usually feature re-test capabilities, trend reports, and deeper insight into the types of issues you face. This is, of course, on a small scale.


Even in a situation where your network is segmented, and you have multiple environments, you only need to worry about getting access to the segment and getting permission to scan the various production and non-production systems. Still relatively simple on a small scale.


Factor in the added complexity of having sensitive network segments, such as those governed by PCI, or something similar and still … it’s a matter of getting signatures and executing purposefully and carefully. Again, relatively simple — on a small scale.


At the enterprise scale


Once you hit enterprise scale with hundreds of thousands of nodes, you start running into scaling issues. There are multiple problems with assessing vulnerabilities at a scale that large, but the two most prominent are completeness and certainty.


Sure, you can get technology that scales to a server/sensor model, where you place sensors inside sensitive segments and dedicate sensors to specific IP ranges. SPI Dynamics did this with the AMP Server and WebInspect a decade ago, and many other vendors do it well today, but as with many things in the security space, technology is not the issue.


When I say that completeness is an issue, I don't mean just being able to scan everything in a meaningful amount of time (although that is definitely something to think about). I'm talking about something as seemingly simple as knowing your IP space, and having a good sense of where endpoints and nodes exist, where they are likely to be, and in what quantities. You need to know this so you can appropriately plan resources for scanning.

Completeness also refers to the challenge of scheduling regular, repetitive scanning of the various environments you have. From test, staging, or production environments to general network, application hosting, and third-party space, the challenge of getting permission to scan your space at regular, meaningful intervals is, in itself, a problem. Then there's that stipulation, "in a meaningful amount of time," which we continue to struggle with in the enterprise space.
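One way to make "knowing your IP space" concrete is to diff an authoritative inventory against what a scan actually touched. Here is a minimal, hypothetical sketch (the function and data shapes are mine, not from any particular scanner's API): it reports coverage percentage, known hosts the scan missed, and scanned hosts nobody has on record.

```python
import ipaddress

def coverage_report(inventory_cidrs, scanned_ips):
    """Compare the authoritative IP inventory against what a scan touched."""
    inventory = set()
    for cidr in inventory_cidrs:
        # hosts() excludes network and broadcast addresses
        inventory.update(ipaddress.ip_network(cidr).hosts())
    scanned = {ipaddress.ip_address(ip) for ip in scanned_ips}
    missed = inventory - scanned    # known hosts the scan never reached
    unknown = scanned - inventory   # scanned hosts absent from the inventory
    pct = 100.0 * len(inventory & scanned) / len(inventory) if inventory else 0.0
    return pct, sorted(missed), sorted(unknown)

# Toy example: a /30 has two usable hosts; we scanned one of them,
# plus an address that is not in the inventory at all.
pct, missed, unknown = coverage_report(["10.0.0.0/30"], ["10.0.0.1", "10.0.0.5"])
```

Both output lists are actionable: `missed` feeds the next scan window, and `unknown` is exactly the "where did this node come from?" question that asset management should answer.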


Consider an environment such as a private cloud segment, where in many cases you're likely to see developers set up, use, and tear down virtual servers faster than you can scan them. How you define completeness has everything to do with how you're going to approach that environment, and whether you care about scanning every IP address that shows up.


The other thing you need to give serious consideration to when you hit any serious scale is certainty. When you are scanning 100 IP addresses you can manually go through and validate what your scanner turns up for false positives. When you’re scanning a million IP addresses or more … this gets a little tricky.


On one hand, you can simply trust your automation, and your vendor, that it’s doing its job with a very low false-positive rate (which I’m pretty sure most vendors claim anyway). This is fine until you run into situations where you ask someone to fix a false positive, and they start to question all of your results.


I wish I could say this is uncommon.


On the other hand, you could attempt to manually validate some subset of the total scan results. Doing this on a million IP addresses, even for only critical issues, gets tricky. Especially when "critical" is subjective to specific environments and specific issues on specific systems. Handling this at the enterprise scale is difficult.


Making it work at scale


How does one make this work at scale? Automation and a smart audit strategy is my best enterprise-tested answer.


First, you have to leverage your automation and develop a plan. Decide what your exposure-window tolerance is, and then start to build out your scanning environment with that in mind. If your exposure window is 24 hours, you need to make sure that you have enough automation to complete a scan of any environment it is attached to in roughly 18 hours. This allows for six hours of what I'll simply refer to as wiggle room.

Next, determine how thorough you want your scans to be, and what you'll be scanning for. Each environment will likely have a slightly tailored scan policy, so that you're not scanning for all 50,000 signatures on every node, or you'll never finish, and you're likely to blow up production systems this way. (Not that it's happened to me before ... )

Now that you're scanning all the critical stuff you want to scan (hopefully that's your entire environment, at least at some level) with a meaningful policy, you're going to need to figure out how to scale the certainty of your results.
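The capacity planning above is simple arithmetic, and it helps to write it down. A quick sketch, with the per-sensor throughput number purely illustrative (real throughput depends heavily on scan policy and network conditions):

```python
import math

def sensors_needed(ip_count, ips_per_hour_per_sensor, window_hours, slack_hours=6):
    """Sensors required to finish a scan inside the exposure window, minus wiggle room."""
    usable = window_hours - slack_hours
    if usable <= 0:
        raise ValueError("exposure window leaves no usable scan time")
    return math.ceil(ip_count / (ips_per_hour_per_sensor * usable))

# 1,000,000 IPs, ~2,000 IPs/hour/sensor (assumed), 24h window with 6h wiggle room:
# 1,000,000 / (2,000 * 18) = 27.8, so 28 sensors.
sensors = sensors_needed(1_000_000, 2_000, 24)
```

The useful part is running it in reverse, too: given the sensors you can actually afford, it tells you what exposure window you are really accepting.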


Environments of different criticality levels receive different levels of audit/verification scrutiny. In an environment where you can ill afford to miss something or get a false positive, you'll perhaps want to do a 25 percent random-sample analysis of your critical and high issues; whereas in less critical IP space you may be okay with a random sample of 10 percent on just critical issues. This gives you a chance to plow through your vulnerability library in (you guessed it) a meaningful amount of time so that you can move on to fixing the issues — and there starts an entirely different battle.
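The tiered sampling described above is easy to automate so the validation queue builds itself. A minimal sketch, assuming findings are simple dicts with a `severity` field and using the 25/10 percent rates from the text (the tier names and data shape are my own):

```python
import random

def sample_for_validation(findings, env_criticality, seed=None):
    """Randomly pick findings for manual validation, tiered by environment criticality."""
    rng = random.Random(seed)  # seedable so an audit trail can reproduce the draw
    if env_criticality == "high":
        # Critical environment: 25% sample of critical AND high findings
        pool = [f for f in findings if f["severity"] in ("critical", "high")]
        rate = 0.25
    else:
        # Less critical IP space: 10% sample of critical findings only
        pool = [f for f in findings if f["severity"] == "critical"]
        rate = 0.10
    k = max(1, round(len(pool) * rate)) if pool else 0
    return rng.sample(pool, k)

findings = ([{"severity": "critical"}] * 80
            + [{"severity": "high"}] * 20
            + [{"severity": "low"}] * 50)
queue = sample_for_validation(findings, "high", seed=1)   # 25% of 100 -> 25
```

The seed matters more than it looks: when someone later questions your results, you want to show exactly which findings were drawn for validation and why.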


Hopefully I've given you some insight into this very critical issue, which I know many of you are facing right now but haven't found good, scalable solutions for. As I work with more and more organizations to design, implement, and test these types of strategies, I'll keep sharing (anonymously, of course) these types of lessons learned so that the benefit is maximized.


Good luck, and remember you can always get ahold of me to discuss this issue more in depth, suggest changes/efficiencies, or ask for help.


Cross Posted From Following the Wh1t3 Rabbit

Ian Tibble Yep thanks for sharing.
With some tools, especially unauthenticated VA tools, the false positives will be unmanageable even in much smaller networks.
I think you said it well with "In an environment where you can ill-afford to miss something, or get a false-positive, you’ll want to perhaps do a 25 percent random sample analysis...". This is basically saying "we cannot cover all we need to cover even in our most critical subnets", i.e. there is no solution.

The best thing about this post is that it's honestly stating the challenges, and can be used to help justify internal network access controls — disruptive? Yes. Hard argument? Yes. But this post may help to paint the picture if any help is needed.

The sad thing is, the possibility really does exist to almost completely nail the false-positives problem, and to a much better degree, also the false-negatives problem. But as of 2013 we still are not really there yet. It really is possible to automate this and get workable results that permit an actual VA solution.