Intelligence

While there was a time when people believed they could prevent sophisticated attackers from ever breaching their networks, I think most people have finally come around to the fact that prevention eventually fails. No matter how diligent you are in your prevention efforts, a motivated attacker will eventually gain some level of access to your network. This is exactly why rapid detection and response is a critical part of any network defense strategy.

As a network defender, I think it can be very easy to get discouraged and to feel like you are fighting a losing battle when dealing with sophisticated attackers. Because of that, I’m always searching for the “big wins” that come from detection and response efforts. For that reason, I was thrilled to read this year’s version of Mandiant's annual M-Trends report. I’d like to share a few excerpts from M-Trends 2014: Beyond the Breach that illustrate why.

Last February we released a report that exposed activity linking People’s Liberation Army Unit 61398, which we call APT1, to espionage against private US companies. M-Trends 2014 tells us that this exposure resulted in an operational pause for APT1.

Mandiant’s release of the APT1 report coincided with the end of Golden Week, a seven-day government holiday that follows Chinese New Year. APT1 was inactive for 41 days longer than normal following Golden Week and the release of Mandiant’s report compared to patterns of activity from 2010–2012.

When APT1 did become active again, it operated at lower-than-normal levels before returning to consistent intrusion activity nearly 160 days after its exposure.

APT1 2013 Activity (Source: M-Trends 2014)

This pause in operations indicates that exposing the tools, tactics, and procedures used by APT1 forced the unit to reassess its operations. Beyond this, we find that associated groups sharing similar TTPs, such as APT12, were also forced to temporarily pause or slow their operations.

APT12 briefly resumed operations five days after its exposure in The New York Times article, but did not return to consistent intrusion activity until 81 days later. Even then, APT12 waited until roughly 150 days after the article’s release to resume pre-disclosure levels of activity.

APT12 2013 Activity (Source: M-Trends 2014)

These findings demonstrate that even highly motivated and well-resourced attackers are susceptible to having their operational tempo slowed when their TTPs are exposed. While we will probably never stop attackers of this nature entirely, degrading their effectiveness to this degree is one of the big wins that keeps me motivated as a defender.

In a world where the bad guy always seems to win and we are constantly working in a reactionary state, the ability to share strategic, operational, and tactical threat intelligence gives us a stronger collective ability to rapidly detect and respond to sophisticated attackers. I think there are a few things that need to happen for an organization to do this effectively. First, you have to understand the relationship between the different types of threat intelligence. Next, you should understand the impact that successfully detecting specific indicator types has on the adversary. Then, you should be able to collect, organize, and share your threat intelligence effectively. When you can do these things well, you will be postured to apply threat intelligence to your detection capabilities so that you can really bring the pain to sophisticated adversaries.

You can find some other interesting data points by reading M-Trends 2014: Beyond the Breach.

While signature-based detection isn’t enough on its own to protect a network against structured attackers, it is one of the cornerstones of a successful network security monitoring capability. If you have ever managed a signature-based detection mechanism, then you know that you can’t simply turn the device on and let it work its magic. A signature-based detection mechanism like Snort, Suricata, or various commercial offerings requires careful deployment and tuning of signatures (often called rules) to ensure that you receive reliable, high-quality alerts. The ultimate goal, while never fully achievable, is that every alert generated by these detection mechanisms represents the activity it was designed to detect. That way, every alert is actionable, and you won’t waste a lot of time chasing down false positives.

A key to achieving high fidelity alerting is the ability to answer the question, “Do my signatures effectively detect the activity they are designed to catch?” In order to answer that question, we need a way to track the performance of individual signatures. In the past, organizations have relied on counting false positive alerts to determine how effective a signature is. While this is a step in the right direction, I believe it is one step short of a useful statistic. In this article I will discuss how a statistic called precision can be used to measure the effectiveness of IDS signatures, regardless of the platform you are using.

Definitions

Before we get started, I think it’s necessary to describe a few terms that are used throughout this article. If we are going to play ball, it helps if we are standing on the same field. This article is meant to be platform-agnostic, however, so the equipment you use doesn’t matter.

There are four main data points that are generally referred to when attempting to statistically confirm the effectiveness of a signature: true positives, false positives, true negatives, and false negatives.

 True Positive (TP): An alert that has correctly identified a specific activity. If a signature was designed to detect a certain type of malware, and an alert is generated when that malware is launched on a system, this would be a true positive, which is what we strive for with every deployed signature.

False Positive (FP): An alert that has incorrectly identified a specific activity. If a signature was designed to detect a specific type of malware, and an alert is generated for an instance in which that malware was not present, this would be a false positive.

True Negative (TN): The absence of an alert when the specific activity has not occurred. If a signature was designed to detect a certain type of malware, and no alert is generated when that malware is not present, this is a true negative, which is also desirable. True negatives are difficult, if not impossible, to quantify in terms of NSM detection.

False Negative (FN): The absence of an alert when the specific activity has occurred. If a signature was designed to detect a certain type of malware, and no alert is generated when that malware is launched on a system, this would be a false negative. A false negative means that we aren’t detecting something we should be detecting, which is the worst-case scenario. False negatives aren’t detectable unless you have a secondary detection mechanism or another signature designed to detect the activity that was missed.


The Fallacy of Relying Solely on False Positives

Historically, the effectiveness of an IDS signature was measured by counting its false positives over an arbitrary time period and comparing that count to a threshold. Let’s consider a scenario using this statistic. In this scenario, we’ve deployed signature 100, which is designed to detect the presence of a specific command and control (C2) string in network traffic. Over the course of 24 hours, this signature has generated 500 false positive alerts, an FP rate of roughly 20.8/hour. You have determined that an acceptable threshold for false positives is 0.5/hour, so this signature would be deemed ineffective based upon that threshold.

At face value, this approach may seem effective, but it is flawed. If you remember from a few paragraphs ago, the question we want to answer centers on how well a signature detects the activity it is designed to catch. When we only consider FPs, we are only measuring how often a signature fires incorrectly. While it sounds similar, measuring how often something fails is not the same as measuring how often it succeeds.

Earlier, we stated that signature 100 was responsible for 500 false positives over a 24-hour period, meaning that the signature was not effective. However, what if I told you that this signature was also responsible for 5,000 true positives during the same time period? In that case, the FPs were well worth it in order to catch 5,000 actual infections! This is the key to precision: taking both true positives and false positives into consideration.


Defining Precision

At this point it’s probably important to say that I’m not a statistician. As a matter of fact, I only took the bare minimum of required math courses in college, but fortunately precision isn’t too tricky to understand. In short, precision refers to how reliably a positive result reflects actual activity. Often referred to as positive predictive value, precision is the proportion of true positives among all positive results (both true positives and false positives), calculated with the formula:

Precision = TP / (TP + FP)


This value is expressed as a percentage, and can be used to determine the probability that, given a generated alert, the activity that has been detected has truly occurred. Therefore, if a signature with high precision generates an alert, then the activity has very likely occurred. On the other hand, if a signature with low precision generates an alert, then it is unlikely that the activity you are attempting to detect has actually occurred. Of course, this is an aggregate statistic, so more data points will result in a more reliable precision value.

In the example from the previous section, we can determine that signature 100, which had 500 FPs and 5,000 TPs, has a precision of 90.9%.


5000 / (5000 + 500) = 0.909, or 90.9%


That's pretty good!


Using Precision in the SOC

Now that you understand precision, it is helpful to think about ways it can be used in a SOC environment. After all, you can have the best statistics in the world, but if you don’t use them effectively then they aren’t providing a lot of value. This begins by making the commitment to track a precision value for every signature deployed on a given detection mechanism. Again, it doesn’t matter what detection mechanism you are using. Whenever you deploy a new signature, it is assigned a precision of 0. As analysts review alerts generated by that signature, they mark each alert as either a TP or an FP. As time goes on, these data points determine the signature’s precision value, expressed as a percentage.
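
To make that workflow a bit more concrete, here is a minimal sketch in Python of what per-signature precision tracking could look like as analysts classify alerts. The class and method names are hypothetical, and the counts live in memory only; in a real SOC this data would more likely be maintained in your SIEM or alert management database.

from collections import defaultdict

class SignaturePrecisionTracker:
    """Tracks analyst TP/FP verdicts per signature and computes precision."""

    def __init__(self):
        # signature id -> [true positive count, false positive count]
        self.counts = defaultdict(lambda: [0, 0])

    def record_verdict(self, sig_id, is_true_positive):
        """Record an analyst's classification of a single alert."""
        if is_true_positive:
            self.counts[sig_id][0] += 1
        else:
            self.counts[sig_id][1] += 1

    def precision(self, sig_id):
        """Precision = TP / (TP + FP), or 0.0 for a newly deployed signature."""
        tp, fp = self.counts[sig_id]
        total = tp + fp
        return tp / total if total else 0.0

# Example: the hypothetical signature 100 discussed earlier in this article.
tracker = SignaturePrecisionTracker()
for _ in range(5000):
    tracker.record_verdict(100, True)    # analyst marks alert as a TP
for _ in range(500):
    tracker.record_verdict(100, False)   # analyst marks alert as an FP

print(f"Signature 100 precision: {tracker.precision(100):.1%}")   # 90.9%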


Precision for Signature Tuning

First and foremost, precision provides a mechanism for tracking the effectiveness of individual signatures. This provides quite a bit of flexibility that you can tune to your own staffing level. If you are a small organization with only a single analyst, you may choose to keep only signatures with a high precision value, such as >80%. If you have a larger staff with more resources to devote to alert review, or if you simply have a low risk tolerance, you can choose to keep signatures with lower precision values, such as >30%. In some instances, you can even define your precision threshold based upon the nature of the threat you are dealing with. Signatures for unstructured or commodity threats might be required to maintain a precision >90%, while signatures for structured or targeted threats might be considered effective at >20% precision.

There are a few different paths you can take when you determine that a signature is ineffective based upon the precision standards you have set. One path is to spend more time researching whatever it is you are trying to detect and attempt to strengthen the signature by making its content more specific. Alternatively, you can configure low precision signatures to simply log instead of alert (an option in a lot of popular IDS software), or you can configure your analysis console (SIEM, etc.) to hide alerts from signatures below your acceptable precision threshold.
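
As a rough illustration of how those paths could be automated, the sketch below applies an example precision threshold to suggest a disposition for each signature. The threshold, the minimum alert count, and the disposition labels are all assumptions chosen for illustration rather than features of any particular IDS or SIEM.

def suggest_disposition(precision, alert_count, threshold=0.30, min_alerts=50):
    """Suggest what to do with a signature given its observed precision.

    The 30% threshold and 50-alert minimum are illustrative values; tune
    them to your staffing level and risk tolerance.
    """
    if alert_count < min_alerts:
        return "keep alerting"        # not enough data points to judge yet
    if precision >= threshold:
        return "keep alerting"        # meets the precision standard
    if precision >= threshold / 2:
        return "rewrite signature"    # close; try making the content more specific
    return "switch to log-only"       # hide from analysts, but keep the data

# Hypothetical signatures: (signature id, observed precision, alerts reviewed).
examples = [(100, 0.909, 5500), (200, 0.18, 400), (300, 0.05, 1200)]
for sig_id, prec, count in examples:
    print(sig_id, suggest_disposition(prec, count))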


Precision for Signature Comparison

On occasion, you may run into situations where you have more than one signature capable of detecting similar activity. When that happens, precision provides a really useful statistic for comparing those signatures. While some instances might warrant deploying both signatures, resources on detection systems can be scarce, so it might be better for performance to enable only the most useful signature. In a scenario where signature 1 has a precision of 85% and signature 2 has a precision of 22%, signature 1 would be the better candidate for deployment.
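
With precision values already being tracked, that comparison is trivial to automate. The snippet below uses hypothetical signature names and the precision values from the scenario above.

# Hypothetical precision values observed for two overlapping signatures.
precision = {"signature_1": 0.85, "signature_2": 0.22}

# Enable only the signature that has historically been the most precise.
best = max(precision, key=precision.get)
print(f"Deploy {best}")   # Deploy signature_1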


Precision for Analyst Confidence

Whenever you write a signature, you should consider the analyst who will be reviewing the alerts it generates. After all, they are the ultimate consumer of the product you are creating, so any context or help you can provide the analyst in relation to the signature is valuable.

One item in particular that can help analysts is the confidence rating associated with a signature. A lot of detection mechanisms allow their signatures to include some form of confidence rating. In some cases, this is automatically calculated based on magic, which is how a lot of commercial SIEMs operate. In other cases, it might be a blank field where you can input a numeric value, or a simple low/medium/high option. I would propose that a good value for this field is the precision statistic. While precision might not be the only factor that goes into a confidence calculation, I certainly believe it should be one of them. Another approach might be to maintain both a signature developer’s subjective confidence rating (human confidence) and a precision-based confidence rating (calculated confidence).

A confidence rating based on precision is useful because it helps analysts direct their efforts while reviewing alerts. When an alert is generated and the analyst is on the fence about whether it is a TP or an FP, they can rely on that confidence value to help determine how much investigative effort should go into classifying the alert. If they aren’t finding much supporting evidence that proves the alert is a TP and it has a precision-based confidence rating of 10%, then it is likely an FP and it might not be worth spending too much time on. On the other hand, if it has a rating of 98%, then the analyst should spend a great deal of time and perform thorough due diligence to determine whether the alert is indeed a true positive.
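
If your platform only accepts a simple low/medium/high confidence value, a precision-based rating can still be folded in with a small mapping like the sketch below; the bucket boundaries are arbitrary examples and should be adjusted to whatever thresholds make sense for your team.

def confidence_label(precision):
    """Map a precision value (0.0 to 1.0) to a simple confidence rating.

    The bucket boundaries here are arbitrary examples; adjust them to match
    whatever confidence scale your detection platform exposes.
    """
    if precision >= 0.80:
        return "high"
    if precision >= 0.40:
        return "medium"
    return "low"

print(confidence_label(0.98))   # "high": worth thorough due diligence
print(confidence_label(0.10))   # "low": probably not worth much digging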


Conclusion

Signature management can be a tricky business, but it is a critical part of NSM detection and analysis. Because of that, it should be a heavily tracked and constantly improving process. While the precision statistic is by no means a new concept, it is one that I think most people are unaware of as it relates to signature management. Having seen precision used very successfully to help manage IDS signatures in a few different, larger organizations, I think it is a statistic that could find a home in many SOC environments.

I’ve had the opportunity to directly and indirectly lead teams of talented individuals while working for the Department of Defense in various SOC leadership roles. Anybody who has worked for or with me in those roles knows about my “dirty words” list. Now, these aren’t the typical seven dirty words that the FCC will fine you for if you happen to let one slip on network television, but rather, a series of buzzwords and phrases relevant to information security that tend to be inappropriately applied to certain scenarios or used in the wrong context.

You probably already know about some of these words. For instance, the most notorious among security practitioners is probably “Advanced Persistent Threat”, which every security appliance vendor on the planet now claims to be able to detect or prevent, even if they can’t clearly define it. Two more favorites are “sophisticated” and “motivated.” These terms are often used to describe attacks without acknowledging that the degree of difficulty involved in an attack is relative to the audience analyzing it. While a skilled defender might not consider an attack sophisticated, the attack may still be very advanced to a non-technical person. Furthermore, an attacker is only as sophisticated or motivated as their objective requires. If their tactics allow them to achieve their goals, then the attacker was motivated and sophisticated enough.

Unfortunately, “intelligence” is becoming one of these dirty words. You don’t have to look far to find a company or product that claims to provide “incredible insight through advanced network intelligence” or “the answer to network defense through thorough threat intelligence.” However, even though intelligence has become the latest major buzzword in network defense, I think the term is important when used appropriately. After all, intelligence IS a crucial part of a network defense strategy.

So, how do we get away from using “intelligence” as a dirty word? I think the answer lies in carefully identifying what types of intelligence we are producing.

Intelligence has many definitions depending on the application. The definition that most closely aligns with information security is drawn from Department of Defense Joint Publication 1-02, which says that “intelligence is a product resulting from the collection, processing, integration, evaluation, analysis, and interpretation of available information concerning foreign nations, hostile or potentially hostile forces or elements, or areas of actual or potential operations.”

While this definition might not fit perfectly in all instances (particularly the part about information concerning foreign nations since an attacker might be domestic), it does provide the all-important framing required to begin thinking about generating intelligence. The key component of this definition is that intelligence is a product. This doesn’t mean that it is bought or sold for profit, but more specifically, that it is produced from collected data, based upon a specific requirement. This means that an IP address, or the registered owner of that address, or the common characteristics of the network traffic generated by that IP address are not intelligence products. When those things are combined with context through the analysis process and delivered to meet a specific requirement, they become an intelligence product.

In information security, we are generally most concerned with developing threat intelligence products. These products combine collected data into a form that can be used to make determinations about the nature of a threat. What is lost on most people is that there are actually three major subsets of threat intelligence: strategic, operational, and tactical intelligence.

Strategic Intelligence is information related to the strategy, policy, and plans of an attacker at a high level. Typically, intelligence collection and analysis at this level is performed only by government or military organizations in response to threats from other governments or militaries. With that said, larger organizations are now developing these capabilities, and some of them now sell strategic intelligence as a service. Strategic intelligence focuses on the long-term goals of the force supporting the individual attacker or unit. Artifacts of this type of intelligence can include policy documents, war doctrine, position statements, and government, military, or group objectives.

Operational Intelligence is information related to how an attacker or group of attackers plans and supports the operations that serve their strategic objectives. This is different from strategic intelligence because it focuses on narrower goals, often short-term objectives that are only a part of the big picture. While this is, once again, usually more within the purview of government or military organizations, individual organizations commonly fall victim to attackers who are performing actions aimed at satisfying operational goals. Because of this, some public organizations will have visibility into these attacks, and with it an ability to generate operational intelligence. Artifacts of this type of intelligence are similar to, but often more focused versions of, the artifacts used in the creation of strategic intelligence.

Tactical Intelligence refers to the information regarding specific actions taken in conducting operations at the mission or task level. This is where we dive into the tools, tactics, and procedures used by an attacker, and where 99% of information security practitioners will focus their efforts. It is here that the individual actions of an attacker or group of attackers are analyzed and collected. This often includes artifacts such as indicators of compromise (IP addresses, file names, text strings) or listings of attacker specific tools. This intelligence is the most transient, and becomes outdated quickly.

The discussion of these types of threat intelligence naturally leads us to another recently popularized dirty word, “attribution.”

Attribution occurs when the actions of an adversary are tied back to a physical person or group. The issue with this word arises when information security practitioners attempt to perform attribution as a sole function of intrusion detection, without the right resources. It is important to realize that detection and attribution aren’t the same thing, and because of this, detection indicators and attribution indicators aren’t the same thing either. Detection involves discovering incidents, whereas attribution involves tying those incidents back to an actual person or group. While attribution is most certainly a positive thing, it cannot be done successfully without the correlation of strategic, operational, and tactical threat intelligence data.

Generally speaking, this type of intelligence collection and analysis capability is not present within most private sector organizations without an incredibly large amount of visibility or data sharing from other organizations. Collecting indicators of compromise from multiple network attacks to generate tactical intelligence is an achievable goal. However, collecting and analyzing data from other traditional sources such as human intelligence (HUMINT), signals intelligence (SIGINT), and geospatial intelligence (GEOINT) isn’t within the practical capability of most businesses. Furthermore, even organizations that might have this capability are often limited in their actions by law. Of course, there are some companies producing high-quality attribution intelligence, so there are exceptions to the rule.

Intelligence is a tremendously valuable thing, and when it is used in the proper context, it shouldn’t have to be a dirty word. The key to not misusing this word in your organization is to ensure that you are focused on intelligence that you actually have the capability to collect, analyze, and utilize.


** Note: This content originally appeared on the InGuardians Labs blog. I'm reposting it here since I've changed employment.

Just as I was about to pack up from my home office, walk downstairs, and go to bed last night, I happened to stray onto Twitter and find that Mandiant had released its detailed report on the Chinese espionage group it is calling APT1. The excitement overwhelmed me, and I wound up staying up for a few more hours to read the entire report, check some of the indicators against other indicators I had, and read some of the reactions on Twitter.

First of all, I have to tip my hat to Mandiant on a really well-put-together report. Before joining InGuardians, I spent several years of my career in the DoD, and have read a lot of intelligence reports. I've also had the pleasure (misfortune) to handle my fair share of Chinese-related incidents. With that in mind, I can assert that the APT1 report is top notch. As an organization, and as individuals, Mandiant and its employees are exposing themselves to a great deal of risk by publishing this data, which I'm sure they aren't taking lightly.

The success of Mandiant in creating this intelligence product is evident, but as an industry, now is not the time to rest on our laurels and bask in the glory of exposing PLA Unit 61398. The information published in the report isn't very useful if it isn't made actionable. With that in mind, if you are responsible for network security monitoring in your organization, how can you make use of these indicators?

Making Intelligence Actionable with the Intrusion Kill Chain

In order to effectively utilize indicators of compromise (IOCs), I turn to the framework provided by the Intrusion Kill Chain and US DoD Joint Publication 3-13 on Information Operations. There has been a significant focus on the application of the intrusion kill chain in the past year or so, and while it's certainly not a silver bullet, it is a nice tool for determining how defensive technologies can be used and where indicators should be deployed.

This framework focuses on specific defensive capabilities. I won't rehash them fully (you can read the docs for that, linked at the bottom of this post), but briefly, they are:

  • Detect: Can you see/find it?
  • Deny: Can you stop it from happening?
  • Disrupt: Can you stop it while it’s happening?
  • Degrade: Can you make it not worth it?
  • Deceive: Can you trick them [the adversary]?
  • Destroy: Can you blow it up?

The framework argues that the capabilities of detect, deny, disrupt, degrade, deceive, and destroy can be mapped to different phases of a network attack to form a course of actions matrix. The common phases of attack are recon, delivery, exploitation, installation, C2, and actions on objectives. Recognizing that different organizations use different models to represent the phases of a network attack, you can plug any model into this framework to generate actions. For instance, you could also use the attack phases mentioned in the APT1 report, which are initial recon, initial compromise, establish foothold, escalate privileges, internal recon, move laterally, and maintain presence.

The result of the course of actions matrix is a mapping of the defensive mechanisms you can use to employ indicators. Given any particular attack technique or piece of malware, you should be able to come up with something mirroring the following to determine what courses of action are available to you, and where IOCs related to that technique or malware can be deployed.

Table 1: Course of Actions Matrix

In this matrix, defensive capabilities are shown on the top horizontal axis, and the phases of an attack are shown on the left vertical axis. At each point where a capability intersects with a phase, an action that applies that capability to that attack phase is identified. For example, we see that in the recon phase, a firewall ACL could be used to deny the adversary’s goal, which might be an attempted connection to a specific server. In another example, we see that a DEP solution could be used to disrupt an exploitation attempt, denying the adversary the ability to reach their ultimate objective, which might be to exfiltrate data from the targeted network.
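
If you would rather work with the course of actions matrix programmatically than in a spreadsheet, one simple representation is a nested mapping keyed by attack phase and defensive capability. The sketch below is only partially populated, using the two examples just discussed; the entries are illustrative and would be replaced with the tools and data sources available in your own environment.

# A partial course of actions matrix: attack phase -> capability -> action.
# Only the two illustrative examples from the text are filled in; extend the
# mapping with the tools and data sources available in your own environment.
courses_of_action = {
    "recon": {
        "deny": "firewall ACL",
    },
    "exploitation": {
        "disrupt": "DEP",
    },
}

def available_actions(phase):
    """Return the capability -> action pairs defined for a given attack phase."""
    return courses_of_action.get(phase, {})

# Example: what courses of action do we have against reconnaissance activity?
print(available_actions("recon"))   # {'deny': 'firewall ACL'}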

NOTE: Although the intrusion kill chain mentions the destroy capability, it has been left out of this post. The destroy capability falls beyond the scope of NSM, unless of course, you have your own fleet of Predator drones configured to act in harmony with your IDS alerts.

Keep in mind that the table above is meant to be an example that lists a variety of different defensive technologies, some of which can be used along with indicators. The course of actions matrix isn’t meant to be a solitary entity that defines the scope of every possible attack, but rather a framework that can be used to assess what actions you can take to respond to various threats based upon the intelligence you have at hand. You will note that some areas have multiple actions available, and the actions you take will depend upon the tools and data you have at your disposal. In addition, there may be instances where no actions exist for certain capabilities within the kill chain. Specifically, it’s very common to only be able to find detect or deny actions, without being able to develop anything to disrupt, degrade, or deceive. In a perfect world, everything would be denied and detection mechanisms would only be employed as a backup, so you should always aim to satisfy the detect and deny capabilities first.

Conclusion

Ultimately, the value of an intelligence product isn't realized if no one takes action on it. If you are in an organization that is concerned it may be a target of APT1, then you should read the Mandiant report and use a framework like the intrusion kill chain to determine how you can best make use of the indicators they have provided. Actionable intelligence isn't the answer to the problem; it's merely a mechanism used to achieve a goal. In this case, that goal is protecting your network and the information assets within it.


References