The following is a guest post from my good friend and colleague Patrick Perry. Patrick makes a lot of great points here about the value of triage, recognizing cognitive bias, and the importance of questioning. These are all framed through the lens of emergency medicine and a recent experience of his. You can find Patrick on Twitter at @pjbperry.  - CS


I read Chris Sanders' recent blog post on investigations and prospective data collection with great interest. Before I explain why this is I should reveal some of my biases. I think transparency is something that is always important in understanding what place or frame of mind someone is coming from and we all have biases. First, Chris is my boss, a mentor and someone I consider a friend. In fairness to the reader, this is just something I think they should know. Second, I was an emergency medical technician (EMT) in college and volunteered several hours a week on a very active rescue squad ambulance. This is not to confuse me with an ER doctor. I went on calls ranging from stabbings to motor vehicle accidents to people in cardiac arrest. In this way I have a different perception of emergency medicine than most people. Finally, I have worked as a security analyst in one form or another for several years now, doing a bit of everything from intel to incident response to thinking a lot about triage in a console. These biases help me form the following opinions. Chris makes a very good argument for what he calls prospective collection and how to better apply a medical practitioner's approach to what we do as security analysts. I think he is on to something and the blog post made me think in great detail about something I went through recently where I felt the emergency medical system could learn a lot from a SOC. While Chris focused more broadly on medical care I want to focus on emergency medical care.

My story really starts just over six weeks ago. I am a heavy sleeper so it was very strange to wake up around 6AM on a Sunday with difficulty breathing. Not just the stuffy nose kind of thing but the panic inducing feeling that my throat was closing on me. I tried to take some ibuprofen but was unable to swallow because of the level of constriction. All I could wonder was if I had been stung by something in my sleep and had developed an anaphylactic reaction. I live about 35 minutes away from a pretty good hospital so my wife and I got in the car and headed to the ER. On the way my ability to speak continued to worsen and my voice had physically changed such that it sounded like I had something in my throat. Upon arrival we got in line at the first check in station. I was now gasping for air like Lando Calrissian when Chewie was choking him out in The Empire Strikes Back. For some reason I started to think about myself as an alert and how a SOC would handle something like this. It was immediately outrageous to me that while I was struggling with breathing there were people in front of me with a sore arm and another with an upset stomach. I had to wait while insurance information was collected and people were asked why they were there. By the time I got to the desk my wife was speaking for me and I was given paperwork to fill out and told to wait. It occurred to me this was my first interaction with the hospital and they weren't doing any sort of triage at this point. They were just collecting payment information. In the infosec world if you have a queue of alerts and you approach them serially you are doing something wrong. A high fidelity exfil alert should generally take precedence over an alert for a policy violation. As a responder I want to start distinguishing between priorities as soon as possible. This particular ER could improve here.
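
To make the queue point concrete, here is a minimal Python sketch of handling alerts by severity instead of arrival order. The alert names and the three-level severity scale are invented for illustration and do not come from any particular SOC console.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical severity scale: a lower number is handled sooner.
SEVERITY = {"exfiltration": 0, "malware": 1, "policy_violation": 2}


@dataclass(order=True)
class Alert:
    priority: int
    arrival: int                      # tie-breaker: earlier alerts first
    name: str = field(compare=False)  # not used for ordering


def triage(alerts):
    """Return alert names in handling order: severity first, then arrival."""
    heap = []
    for arrival, (name, kind) in enumerate(alerts):
        heapq.heappush(heap, Alert(SEVERITY[kind], arrival, name))
    return [heapq.heappop(heap).name for _ in range(len(heap))]


# Alerts listed in the order they arrived.
queue = [
    ("USB storage policy violation", "policy_violation"),
    ("Known-bad hash on workstation", "malware"),
    ("Large outbound transfer to rare host", "exfiltration"),
]

for name in triage(queue):
    print(name)
```

The exfil alert arrived last but is handled first, which is the ordering the check-in desk failed to apply.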

I then was able to make my way to the actual triage nurse after the people that were in front of me. I was asked a couple of questions and I conveyed that I was having difficulty breathing but still had an open airway. I was asked if I could swallow. I said I could swallow water but not food. I was also asked if I had been sick. I told the nurse I had a sore throat for the past three days. This is where I believe my experience as an EMT and the nurse's own biases started to work against me. Having been an EMT, even though I was panicked about my airway, I kept trying to remain calm. I think the nurse mistook this calm for "this is no big deal". I also think he immediately latched on to the sore throat being related to whatever was affecting me, thus resulting in less concern. These are mistakes I see many junior analysts make as well as managers with no experience in the trenches. It is the mistake of judging a situation based on people's reactions rather than the evidence at hand. I told the nurse calmly that I felt like my airway was closing and I was losing my ability to breathe. As an EMT treating this it would immediately get my attention regardless of the patient's level of calmness. An inability to breathe is a serious problem, just like one of those handful of alerts seasoned analysts see and immediately get a bad feeling about because it almost always means you have caught an actor mid-exfil or something of that level. New analysts are prone to not worry about more serious alerts and instead be more concerned with less worrisome ones where they have been conditioned to be upset by the environment. An example of this might be a concerned user contacting the CIRT because they noticed an unfamiliar icon on their desktop and are convinced based on a news report that this must have been maliciously placed there by an attacker. They happen to speak to a junior analyst or manager with no experience who hears the user's panic. They in turn panic because this must be really serious or why else would this user be so upset. Finally, an investigation reveals that an admin had installed Chrome for the user the day before trying to be helpful while the user was out. People are bad judges of lots of things. Let evidence lead you, not emotion. Good analysts know this and I wish the triage nurse in my experience also did.

Finally, I was in a hospital bed where another nurse (after another 45 minutes, I mean I only had a sore throat, right?) administered steroids via inhaler and liquid ibuprofen drip to help any swelling in the throat. I started to feel like I could breathe again normally within a few minutes and then I waited and waited ...and waited for a doctor to show up. After about two hours a resident (this is a teaching hospital) came in to see me. I told her I was feeling better now but did not know what the root cause was. I was asked a surprisingly low number of questions. Not where had I been or what had I been doing recently? "Eat anything strange?", I was asked. The problem with this is you are relying on the patient to tell you what is strange. That is tantamount to asking a user in the HR department with a compromised system if they noticed any anomalous CPU usage recently. The doctors need to find out everything I ate recently and tell me if it is strange. I had a cantaloupe late the night before and later learned that can cause anaphylaxis in rare cases but I didn't mention it because I was unaware that might be a strange thing to eat. I knew shellfish could cause anaphylaxis and I had eaten some two days prior so I mentioned that even though it seemed to me to be too far away to be causing my current condition. The resident immediately latched onto this and assured me I probably had a small piece of shell stuck in my throat that had probably caused the whole episode but as I was feeling fine now there wasn't much else to do. This immediately reminded me of a junior analyst. Find something that might fit the description of what could have occurred and call it a day. As an analyst you want to be sure you have identified and then collected all of the relevant data sources that can help you come to a proper conclusion. In this case, the resident failed to adequately collect information and then seized on the first possibility. It also did not go unnoticed by me that by treating me with steroids and ibuprofen drip before ever looking in my throat the ER had committed the cardinal sin in a SOC of just running an AV scan on a system that is acting suspiciously enough that further investigation is warranted. It would now be much harder to know for sure what had been going on in my throat.

At about this time the supervising ER doctor came in. I immediately noticed more probing questions being asked. Additionally, this doctor wanted to see my throat so I got endoscoped (scope through the nostril down to the top of the larynx) twice. Why twice? Because the resident wanted a turn as she had never done it before. It was as enjoyable as it sounds. I am a sucker for learning experiences. The supervising doctor of course saw no problem with my airway as I had already been treated but did notice some white areas around the base of my tongue he was concerned about. He ultimately suggested a diagnosis of cancer before referring me to an ear nose and throat (ENT) doctor to check it out. After a couple of weeks of panic the ENT was able to see me and saw nothing he considered abnormal. The specialist asked even further probing questions and in the end thinks that as I had mowed a two acre hay field the night before my original episode that I had experienced an extreme reaction to a pollen allergy. While this was a relief to hear this was information neither the resident in the ER nor the ER doctor ever got because they did not ask enough questions. Ultimately, my impression from the final ER doctor was that a diagnosis was needed so he saw something strange in an area of which he was not an expert and thought it was the worst thing. While maybe being great at emergency medicine it seems as though he was acting as a junior analyst in his capacity to examine my throat. This is understandable. If you are unsure of what is going on though whether it be in a medical setting or in an active IR, suggesting it is the worst thing imaginable to concerned parties is probably not a responsible decision.

In summary, I understand this same situation could have played out differently at a different ER but I bet it also could have gone similarly at a lot of different ERs. Emergency medicine can at least be reminded by a good CIRT of the importance of collecting appropriate data, letting data lead the investigation, following up on those loose strings to pull, remaining calm, being honest about your assessment abilities in a specialized investigation, and finally getting the triage process better up front. It is an emergency after all.

**TL;DR - You can download FlowBAT at **

Above all else, we know that network visibility is critical in the modern threat landscape. In a perfect world organizations could collect and store mountains of full packet capture data for long periods of time. Unfortunately, storing packet data for an extended duration doesn't scale well, and it can be cost prohibitive even for small networks. Even if you can afford to store some level of packet data, parsing and filtering through it to perform network or security analysis can be incredibly time consuming.

Network Flow data is ideal because it provides a significant amount of context with minimal storage overhead. This means that it can be stored for an extended amount of time, providing historical data that can account for every connection into and out of your network. The storage footprint is so minimal, that most organizations measure the amount of flow data they store by years rather than by hours or days. This provides an unbelievable amount of flexibility while investigating events or breaches that have occurred in the past.



Even though flow data is so versatile, its adoption has been slowed because most of the tools available for performing flow data analysis can be challenging to use. These tools are often command-line based and lack robust analysis features. After all, spending all day examining data that looks like this isn't always the most efficient:


We developed the Flow Basic Analysis Tool (FlowBAT) to address this need by providing an analyst-focused graphical interface for analyzing flow data. FlowBAT was designed by analysts, for analysts and provides a feature set that is applicable to many use cases, including Network Security Monitoring, Intrusion Detection, Incident Response, Network Forensics, System and Network Troubleshooting, and Compliance Auditing.



FlowBAT Features

FlowBAT has several features that make it applicable for analysts with multiple goals operating in a wide array of environments. This includes:

Multiple Deployment Scenarios

FlowBAT can be deployed in an existing SiLK environment or as a part of a new installation. You can deploy FlowBAT in two ways: local or remote. A local FlowBAT installation requires that you install FlowBAT on the same system as your SiLK database. This method is fastest as it doesn't have to traverse the network to query flow data. A remote FlowBAT installation allows you to install FlowBAT on a system separate from your SiLK database. In this scenario, FlowBAT queries flow data by utilizing the SSH capability of an existing server running SiLK. This allows FlowBAT to transmit queries and receive data securely with minimal additional setup. You can even deploy FlowBAT on a cloud based system as long as it can reach your SiLK database over SSH. In either deployment scenario, FlowBAT can be up and running in a matter of minutes.

Quick Query Interface

Analysis is all about getting data and getting it quickly. While we have included an interface that makes this easy for seasoned flow analysis pros, we also provide a query interface designed to present all of the possible data retrieval options to analysts who might not be as experienced, or who simply want a more visual way of getting the data they want. The quick query interface allows the analyst to iteratively build data queries and easily tweak them after the query's initial execution. This means that you don't have to spend a ton of time looking up commands to get the exact data you want.


Rapid Data Pivoting

When you are hunting through large amounts of data, you need to move quickly. Using traditional analysis techniques this requires a lot of typing, multiple open terminals, and constantly copying and pasting commands. With FlowBAT, you can simply click on field values in a set of query results to add additional parameters to your existing query or to create a new query. For example, while looking at a series of flow records associated with an individual service on a specific port, you can click on a specific IP address and pivot to a data set showing all communication to and from that host. From there, you can click on a timestamp from an individual flow record and automatically retrieve flow records occurring five minutes before and five minutes after that time. This can all occur within a matter of seconds. This same workflow using traditional command-line analysis tools could easily take several minutes or more.
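
The timestamp pivot described above comes down to a simple window calculation. The Python sketch below is only an illustration of that idea, not FlowBAT's implementation; the timestamp format string and the function names are assumptions.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout for this sketch; real flow output may differ.
FMT = "%Y/%m/%dT%H:%M:%S"


def pivot_window(timestamp, minutes=5):
    """Return (start, end) covering `minutes` before and after a flow record."""
    ts = datetime.strptime(timestamp, FMT)
    delta = timedelta(minutes=minutes)
    return ts - delta, ts + delta


def in_window(record_time, window):
    """True if a record's start time falls inside the pivot window."""
    start, end = window
    return start <= datetime.strptime(record_time, FMT) <= end


window = pivot_window("2014/02/06T00:02:00")
print(window[0].strftime(FMT), "to", window[1].strftime(FMT))
print(in_window("2014/02/05T23:58:30", window))
```

Note that the window can cross a date boundary, which is easy to get wrong when adjusting start and end dates by hand on the command line.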

Saved Queries and Dashboards

Analysts often find queries they like and will reuse them constantly. In the past, this resulted in dozens of text files thrown haphazardly in multiple directories that contain commonly used queries. Using FlowBAT's saved queries feature, you can store these queries right in the tool and execute them with a single click. Furthermore, if you use these saved queries very often, you can save them to an interactive dashboard and schedule them to periodically update over set time intervals. Using this mechanism you can stay constantly up to date on specific activity on your network. For instance, you can configure a saved query that is used to identify web servers on your network. With this query configured to execute on a periodic basis, you will be the first to know if an unexpected device starts receiving data on a common HTTP port on your network.


Graphing and Statistical Capability

One of the most powerful features of flow data is the power to generate statistics from aggregated data. This can yield very powerful detection capabilities such as:

  • Calculating Device Throughput
  • Identifying Top Talking Devices
  • Identifying Odd Inbound/Outbound Traffic Ratios
  • Examining Throughput Distribution Across Network Segments
  • Locating Unusual Periodic and Repetitive Traffic Patterns
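
As a rough illustration of two of the statistics listed above, the following Python sketch computes top talkers and an outbound/inbound byte ratio from a handful of made-up flow records. The record layout and the addresses are invented for the example; in practice these numbers would come from aggregated flow data.

```python
from collections import Counter

# Hypothetical flow records: source IP, destination IP, byte count.
flows = [
    {"sip": "10.0.0.5", "dip": "203.0.113.9", "bytes": 1200},
    {"sip": "203.0.113.9", "dip": "10.0.0.5", "bytes": 300},
    {"sip": "10.0.0.7", "dip": "198.51.100.4", "bytes": 50},
    {"sip": "10.0.0.5", "dip": "198.51.100.4", "bytes": 800},
]


def top_talkers(flows, n=2):
    """Rank source addresses by total bytes sent."""
    sent = Counter()
    for flow in flows:
        sent[flow["sip"]] += flow["bytes"]
    return sent.most_common(n)


def out_in_ratio(flows, host):
    """Bytes sent by a host divided by bytes it received."""
    out_bytes = sum(f["bytes"] for f in flows if f["sip"] == host)
    in_bytes = sum(f["bytes"] for f in flows if f["dip"] == host)
    return out_bytes / in_bytes if in_bytes else float("inf")


print(top_talkers(flows))
print(round(out_in_ratio(flows, "10.0.0.5"), 2))
```

A host that sends far more than it receives is not automatically suspicious, but an unusual ratio is a cheap signal worth a closer look.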

While some of these statistics are best interpreted as text, sometimes it becomes easier to interpret statistical data when it is presented visually. FlowPlotter allows you to send statistical data to a graphing engine to automatically generate bar, line, column, and pie charts. This level of visualization is useful for analysis, and for helping to provide visual examples of flow data in various forms of reporting that may be required as a part of your analysis duties.


Flexible Data Display

Every analyst processes and interprets information differently. As analysts, one thing we hate is when a tool locks you into viewing data in a very specific manner. With FlowBAT, we designed the display of flow data so that it is extremely customizable to each analyst's needs. With this in mind, you can rearrange, sort, and add/remove columns as needed. This provides an analysis experience that can be customized to your personal taste, as well as to specific scenarios.


FlowBAT Demos

CLI Query Mode


Guided Query Mode


Manipulating Data


Pivoting with Data


Downloading and Installing FlowBAT

We've spent quite a bit of time making sure that FlowBAT is easy to install and get running. You can download FlowBAT on the FlowBAT Downloads page, and you can find explicit installation instructions on the FlowBAT Installation page. General support links and a user manual (still being written) can be found at We are excited to see what you think of FlowBAT, so please give it a try and let us know what you think!

Jason and I recently had the opportunity and pleasure to speak at MIRCon 2014. The topic of the presentation was "Applied Detection and Analysis with Flow Data." We had a great time talking about effective ways to use flow data for NSM, as well as introducing the world to FlowBAT.


You can view the slides from this presentation here:

I recently gave a presentation at BSides Augusta on the topic "Defeating Cognitive Bias and Developing Analytic Technique".


At the center of many defensive processes is human analysis. While we spend a lot of time performing analysis, we don’t spend nearly enough time thinking about how we perform analysis. The human mind is poorly wired to deal with most complex analysis scenarios effectively. This can be attributed to the inherent complexity of solving technical issues where so many uncertainties exist, and also to the cognitive and unmotivated biases that humans unknowingly apply to their analysis. All of these things can diminish our ability to get from alert to diagnosis quickly and effectively.

In this presentation, I plan to discuss the mental challenges associated with technical defensive analysis by leveraging research associated with traditional intelligence analysis. I will discuss how complexity can overwhelm analysis, how cognitive bias can negatively influence analysis, and techniques for recognizing and overcoming these limiting factors. This will include a few fun mental exercises, as well as an overview of several strategic questioning techniques including analysis of competing hypotheses, red cell analysis, and “what if” analysis. Finally, I will discuss several structured analysis techniques, including two different techniques that can be used specifically for NSM analysis: relational investigation and differential diagnosis.


The video for this presentation can be found here:

The slides for this presentation can be found here:

In my previous post where I introduced FlowPlotter I showed you some of the basics behind generating data from SiLK and turning it into a nice-looking visual representation that leveraged Google's Visualization API. That is all wrapped up into an easy to run tool called "FlowPlotter" that runs directly off of data piped from rwfilter to SiLK via stdout. Today I'm pleased to reveal some of the D3 visualizations I promised earlier. Previously I had planned to replicate the Google visualizations with D3 to provide an alternative look, but instead I've gone forward to fill in some of the features that Google Charts is lacking.

D3.js is a popular JavaScript library that is often used to streamline the way people make visualizations with their data. These visualizations are usually friendly with most browsers and can be augmented to do just about anything you can think of. For the integration of these new graphs, I've gone with the same "template" approach as I did with the Google charts, and again with the data being embedded into the HTML file that is generated. In most D3 visualizations people will host the data on a web server and reference that data directly with d3.js. But, due to how the data is used here, it is more useful for us to be able to generate it for immediate use. In theory you can reference files on the local system, but most browsers (like Chrome) have strict security settings that prohibit this. Don't expect other more forgiving browsers (Firefox) to allow this down the road either. For all of those reasons, FlowPlotter D3 templates have been designed to contain all data within a single HTML file.

Force Directed Link Graphs

So far I've generated two new chart types in D3. The first is a force-directed link graph similar to what is generated with Afterglow. We discussed in the last FlowPlotter post that the purpose behind FlowPlotter is to have a streamlined way of generating visualizations for all of your data and having it easily available to anyone that needs to view it. While Afterglow is great to work with, it doesn't really "flow" with the rest of what we're generating, and streamlining it for wide use can be challenging. The D3 force-directed link graph that I've integrated with SiLK shows the relationship between 2 nodes, with links between of varying opacity based on the value of a 3rd value. Simply put, it is created from a 3 column CSV with a source, target, and value in that order. The overall design was borrowed partly from the work of d3noob.
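
To illustrate the 3 column input, here is a small Python sketch that reads source, target, value rows and derives a link opacity from the value. The linear scaling and the `min_opacity` floor are assumptions made for this example; the actual FlowPlotter template may map values to opacity differently.

```python
import csv
import io

# Example 3 column CSV in source, target, value order (made-up addresses).
SAMPLE = """source,target,value
10.0.0.5,203.0.113.9,10
10.0.0.5,198.51.100.4,5
10.0.0.7,198.51.100.4,1
"""


def links_from_csv(text, min_opacity=0.1):
    """Turn source,target,value rows into link dicts with a scaled opacity."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Guard against an all-zero value column.
    top = max(float(row["value"]) for row in rows) or 1.0
    links = []
    for row in rows:
        value = float(row["value"])
        # Assumed mapping: scale linearly so the largest value is fully opaque.
        opacity = min_opacity + (1 - min_opacity) * (value / top)
        links.append({
            "source": row["source"],
            "target": row["target"],
            "value": value,
            "opacity": round(opacity, 2),
        })
    return links


for link in links_from_csv(SAMPLE):
    print(link)
```

The floor keeps low-value links faintly visible so that weak relationships do not disappear from the graph entirely.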

Figure 1: Force-Directed Link Graph showing all traffic to or from a specific country

To create a force-directed link graph directly from rwfilter output, you'll need to run the following:

rwfilter ../Sampledata/ --scc=kr --proto=0- --type=all --pass=stdout | ./ forceopacity sip dip distinct:dport 100 > forcetest.html

You'll notice that there are 4 options after forceopacity. This chart module relies on rwstats to generate this data. The communication you see in Figure 1 represents the top 100 sip-dip pairs for all traffic from South Korea based on total amount of distinct destination port communication attempts. The greater the number, the darker the link between nodes. Similarly you could do something more practical like generating a force-directed link graph between the top 100 sip-dport pairs for outgoing web traffic to South Korea:
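
The aggregation described above, ranking sip-dip pairs by the number of distinct destination ports, can be sketched in a few lines of Python. This only illustrates the counting logic on made-up tuples; it is not how rwstats is implemented.

```python
from collections import defaultdict

# Hypothetical (sip, dip, dport) tuples standing in for flow records.
records = [
    ("10.0.0.5", "203.0.113.9", 80),
    ("10.0.0.5", "203.0.113.9", 443),
    ("10.0.0.5", "203.0.113.9", 80),    # repeated port, counted once
    ("10.0.0.7", "203.0.113.9", 22),
    ("10.0.0.5", "198.51.100.4", 53),
    ("10.0.0.5", "203.0.113.9", 8080),
]


def top_pairs_by_distinct_dport(records, n):
    """Rank (sip, dip) pairs by how many distinct destination ports they used."""
    ports = defaultdict(set)
    for sip, dip, dport in records:
        ports[(sip, dip)].add(dport)
    ranked = sorted(ports.items(), key=lambda item: len(item[1]), reverse=True)
    return [(sip, dip, len(seen)) for (sip, dip), seen in ranked[:n]]


for sip, dip, count in top_pairs_by_distinct_dport(records, 2):
    print(f"{sip},{dip},{count}")
```

The printed rows follow the same source, target, value order that the force-directed link graph consumes.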

rwfilter ../Sampledata/ --dcc=kr --proto=0- --type=outweb --pass=stdout | ./ forceopacity sip dport bytes 100 > forceport.html

There is still a lot I would like to do with FlowPlotter's force-directed link graph. For one, it currently does not replace the usefulness of Afterglow as the big benefit of Afterglow lies in being able to specify "rules" about how your data will be displayed, as seen in ANSM. Down the road I would like to be able to create something similar, however it isn't something that is viable from the current stdout data output. Another feature that is currently only in testing is the opacity options of links between nodes. They are representative of the values in the 3rd column for the connection between source and target node, but they could be much more. The combination of these would require a nice legend to read, but would allow for some genuinely useful visualizations.

Automated Asset Discovery

With this FlowPlotter update I'm also showing off the new Asset Discovery addition. In chapter 14 of ANSM we discuss methods of gathering "friendly intelligence" and how the importance of having friendly intelligence can't be overstated. Friendly intelligence is information related to the assets that you are tasked with protecting. In almost all environments, the overall picture of those assets frequently changes, so gathering friendly intelligence is a continuous effort. Since flow data provides a very good high-level view of the communications between all devices, it is ideal for creating a general idea of the assets you are monitoring on a given network. It also is easily queried, giving you the ability to keep a constant vigilance on new assets. To generate an asset model from SiLK data, I leverage the friendly intel script provided in chapter 14 of ANSM. To stay with the template-based approach to FlowPlotter, I've made the script a standalone part of FlowPlotter that is referenced within The premise behind doing this is to ensure that changes and additions to the asset model script can be easily made without extreme additions to Like the previous graph, I examined many d3 galleries out there to find the best examples of great ways to visualize parent-child relationships, with the final design borrowing from the collapsible tree layout in the d3 galleries. Whereas the force-directed link graph leveraged a CSV as input, I am using JSON as input for the asset discovery module. This is mainly because extensive parent-child relationships aren't easy to represent with CSV data, but instead I can literally specify the data and fields in a JSON tree, with a good idea of how the data will turn out in the collapsible tree layout results. Also, I like JSON because my name is Jason, and it just feels right.
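
To show why JSON suits parent-child data, here is a minimal Python sketch that nests services and their server addresses under a single root. The "name" and "children" keys follow the convention commonly used in D3 tree examples; the exact fields FlowPlotter emits are not shown in this post, so treat the layout as an assumption.

```python
import json

# Hypothetical asset model: service name mapped to server addresses.
assets = {
    "HTTP": ["10.0.0.20", "10.0.0.10"],
    "DNS": ["10.0.0.53"],
}


def asset_tree(assets, root="Assets"):
    """Nest services and their servers under one root node."""
    return {
        "name": root,
        "children": [
            {
                "name": service,
                "children": [{"name": ip} for ip in sorted(ips)],
            }
            for service, ips in sorted(assets.items())
        ],
    }


print(json.dumps(asset_tree(assets), indent=2))
```

The same structure in CSV would need a parent column on every row and a second pass to rebuild the hierarchy, which is the awkwardness the JSON tree avoids.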

Figure 2: Auto-generated Asset List based on one hour of traffic.

To create an asset model you can either do it from new data piped in via stdout like other FlowPlotter modules or you can create it based on existing rwfilter files. The important thing to remember is that the dataset that you are looking at should be fairly large, or at least representative of normal network traffic over time. For steady network traffic with production systems, an hour might be fine. For some large enterprise environments, shorter timespans will work.

To generate the data from an rwfilter stdout, do the following:

rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout | ./ assetdiscovery > assetlist.html

Alternatively you can create the rwfilter file, then generate the asset model from that:

rwfilter --start-date=2014/02/06 --proto=0- --type=all

cat | ./ assetdiscovery > assetlist.html

The asset model currently attempts to identify various service types (HTTP, DNS, VPN, etc) but is receiving regular updates to include more data that could benefit from regular continuous regeneration. This is currently limited to friendly intelligence, but soon I will be including some additional less friendly detections from some of the statistical approaches discussed in ANSM and using some of the less frequently discussed SiLK features. These will all be wrapped up within FlowPlotter as usual. Be sure to digest the README for FlowPlotter to understand the additions a little better. For instance, you may notice that perhaps you're not seeing EVERY "asset" from your network for a given service. That might be due to thresholding which is set by default to look at the "Servers" that display at least 1% of the total "Server-like" traffic representative of a particular service. This default can be changed easily with options located in the README. As always, you should be aware of how far abstracted from your data you really are when generating visualizations like this.
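
The thresholding idea can be sketched as follows. This Python example keeps only hosts that account for at least 1% of the traffic for a service; the byte counts, the function name, and the use of bytes as the measure are assumptions for illustration, so check the README for how the real option behaves.

```python
# Hypothetical "server-like" byte totals per host for one service.
http_bytes = {
    "10.0.0.10": 9000,
    "10.0.0.11": 950,
    "10.0.0.12": 50,   # 0.5% of the total, below the default threshold
}


def servers_above_threshold(bytes_by_host, threshold=0.01):
    """Keep hosts responsible for at least `threshold` of the service's traffic."""
    total = sum(bytes_by_host.values())
    if total == 0:
        return []
    return sorted(
        host for host, count in bytes_by_host.items()
        if count / total >= threshold
    )


print(servers_above_threshold(http_bytes))
print(servers_above_threshold(http_bytes, threshold=0.001))
```

Lowering the threshold surfaces quieter hosts at the cost of more noise, which is why a host you expect may be missing from the default view.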

As usual, I'm open to any and all requests for additions or advice on optimizing anything that currently exists in FlowPlotter.

In Applied NSM we wrote quite a bit about both LogStash and Snorby. Recently, a reader of this blog posed the question if there is a way to pivot from Snorby events to your Bro logs in Logstash. Well, it is actually quite easy.

To start, you'll obviously need to have a functional instance of Snorby and Logstash with Bro logs (or any other relevant parsed PSTR-type data) feeding in to it. In this case, we'll assume that to reach our logstash dashboard, we go to

If you've played around with Snorby, you've probably noticed the lookup sources when you click on an IP in an event. There are default sources, but you can also add additional sources. Those are configured from the Administration tab at the top right.

Figure 1: Lookup Sources Option in Snorby

At this time you're only allowed to use two different variables, ${ip} and ${port}, and from my testing, you can only use them once in a given lookup URL. Normally this isn't an issue if, for instance, you are doing research on an IP from your favorite intel source and you're feeding the IP in as a variable on the URL. However, if for some reason you're needing to feed it in twice, referencing ${ip} will only fill the initial variable, and leave the second blank. This becomes an issue with parsed Bro logs in Logstash.
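
The substitution behavior I observed can be modeled with a short Python sketch. This is only a model of what I saw in testing, not Snorby's code, and the query template is a hypothetical example.

```python
def fill_lookup_source(template, ip, port):
    """Model the observed behavior: each variable is filled only once."""
    filled = template.replace("${ip}", ip, 1)    # first ${ip} gets the value
    filled = filled.replace("${ip}", "")         # any repeat is left blank
    filled = filled.replace("${port}", str(port), 1)
    filled = filled.replace("${port}", "")
    return filled


# Hypothetical query that needs the IP twice.
template = 'id.orig_h:"${ip}" OR id.resp_h:"${ip}"'
print(fill_lookup_source(template, "192.0.2.10", 80))
```

The second clause ends up with an empty value, so the search no longer covers the responder address.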

Though not immediately obvious, Logstash allows for you to control the search from the URL as such:

Test it out!

In Snorby, the lookup source for this would be:${ip}

However, let's assume you wanted to find logs where existed as either the source or destination address:

That search is perfectly valid for use in Logstash, and the URL functions as expected. However, if you make the lookup source in Snorby match that by using ${ip} twice, you'll notice that only the initial use of ${ip} is completed, and the use on id.resp_h is left blank. For that reason, I would recommend applying the simpler method of just using the message field (the unanalyzed raw log essentially) in the query. We'll also add in ${port} to narrow down more.${ip} AND ${port})

That lookup source will look for any instance of the ip and matching port within the log. Word of warning that there is a small chance you'll get an unexpected blip somewhere with this method, as it is literally looking for that IP address and that port number as any unique string within the message, in any order. Hypothetically, you could have an odd log that has the selected IP, but the port actually is the response_body_len or some other integer field, though that would be extremely unlikely.
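
The caveat above is easier to see with a concrete log line. The Python sketch below approximates the message search as simple token matching, which is a simplification of how the search backend actually tokenizes text, and the log line and its field order are made up for the example.

```python
import re

# Made-up log line: ts, orig_h, orig_p, resp_h, resp_p, response_body_len
LOG = "1391731200.000000\t192.0.2.10\t49152\t198.51.100.7\t443\t80"
FIELDS = ["ts", "id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p", "response_body_len"]


def message_matches(message, ip, port):
    """Approximate the message search: both strings appear, in any order."""
    tokens = set(re.split(r"\s+", message.strip()))
    return ip in tokens and str(port) in tokens


def field_matches(message, ip, port):
    """Stricter check against the parsed address and port fields only."""
    record = dict(zip(FIELDS, message.split("\t")))
    ips = {record["id.orig_h"], record["id.resp_h"]}
    ports = {record["id.orig_p"], record["id.resp_p"]}
    return ip in ips and str(port) in ports


print(message_matches(LOG, "192.0.2.10", 80))
print(field_matches(LOG, "192.0.2.10", 80))
```

The message search hits because 80 happens to be the body length, while the field-level check correctly finds no port 80 in the connection.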

You'll notice that the lookup source is defaulting to the past 24 hours, and is displaying the entire log by default. If we want to change this, we're going to have to utilize a slightly different method by using a "scripted dashboard". The two Logstash searches below search for traffic that contains "" and "80". However, there are a few differences.,id.orig_h,id.orig_p,id.resp_h,id.resp_p,_type,host,uri

In testing both of these, the difference is immediate. You have the ability to customize the output fields, which is essential. You'll also notice that on the second URL we're looking at /dashboard/script/logstash.js instead of /dashboard/file/logstash.json. "Scripted Dashboards" are entirely JavaScript, thus allowing for full control over the output. While we're adding custom fields, let's go ahead and also say that we want to look at the past 7 days (&from=7d) along with referencing the 7 days based on timestamp collected instead of ingestion time (&timefield=@timestamp).,id.orig_h,id.orig_p,id.resp_h,id.resp_p,_type,host,uri

Like we did before, let's go ahead and add that as a lookup source with the following URL in Snorby:${ip}%20AND%20${port})&from=7d&timefield=@timestamp&fields=@timestamp,id.orig_h,id.orig_p,id.resp_h,id.resp_p,_type,host,uri
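
If you prefer to assemble a lookup source like this programmatically, the following Python sketch shows one way to do it. The dashboard address is a placeholder, and the `query` parameter name and the `message:(...)` wrapper are assumptions for this example; only the `from`, `timefield`, and `fields` parameters and the /dashboard/script/logstash.js path come from the text above.

```python
from urllib.parse import quote

FIELDS = ["@timestamp", "id.orig_h", "id.orig_p", "id.resp_h",
          "id.resp_p", "_type", "host", "uri"]


def lookup_source(dashboard_url, query, days=7, fields=FIELDS):
    """Build a scripted-dashboard URL with a time range and custom fields."""
    # Keep Snorby's ${...} placeholders and the query punctuation readable;
    # spaces become %20.
    encoded = quote(query, safe="${}():")
    params = [
        "query=" + encoded,              # assumed parameter name
        "from=" + str(days) + "d",
        "timefield=@timestamp",
        "fields=" + ",".join(fields),
    ]
    return dashboard_url + "?" + "&".join(params)


# Placeholder address; substitute the address of your own dashboard.
base = "http://logstash.example/#/dashboard/script/logstash.js"
print(lookup_source(base, "message:(${ip} AND ${port})"))
```

Building the string from a field list makes it less error prone to add or drop columns than editing the long URL by hand.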


Figure 2: Note the Special URL Syntax in this LogStash Example
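To make the placeholder substitution concrete, here is a rough Python sketch of what happens when the ${ip} and ${port} variables in a lookup source are filled in for a specific alert. The host name, the `index.html#` prefix, and the `query` parameter name are assumptions for illustration only. The script path, the message field, and the `from`, `timefield`, and `fields` parameters come from the text above.

```python
from string import Template

# Hypothetical base URL and "query" parameter name; adjust for your own install.
LOOKUP = Template(
    "http://logstash.example.local/index.html#/dashboard/script/logstash.js"
    "?query=message:(${ip}%20AND%20${port})"
    "&from=7d&timefield=@timestamp"
    "&fields=@timestamp,id.orig_h,id.orig_p,id.resp_h,id.resp_p,_type,host,uri"
)


def expand_lookup(ip: str, port: int) -> str:
    """Fill the ${ip} and ${port} placeholders for a single alert."""
    return LOOKUP.substitute(ip=ip, port=port)


url = expand_lookup("192.0.2.10", 80)
print(url)
```

Because each placeholder appears only once in this template, the partial-substitution problem described earlier does not come up.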

To summarize, here are some of the possible lookup sources I've mentioned, with the advanced lookup being my recommendation for these purposes:

Figure 3: Three Possible Lookup Sources for Pivoting to Logstash

While a time existed when people believed they could prevent sophisticated attackers from breaching advanced networks, I think most people are finally coming around to the fact that prevention eventually fails. No matter how diligent you are in your prevention effort, eventually a motivated attacker can gain some level of access to your network. This is exactly why rapid detection and response is a critical part of network defense strategy.

As a network defender, I think it can be very easy to get discouraged and to feel like you are fighting a losing battle when dealing with sophisticated attackers. Because of that, I’m always searching for the “big wins” that come from detection and response efforts. For that reason, I was thrilled to read this year’s version of Mandiant's annual M-Trends report. I’d like to share a few excerpts from M-Trends 2014: Beyond the Breach that illustrate why.

Last February we released a report that exposed activity that linked People’s Liberation Army Unit 61398, which we call APT1, with espionage against private US companies. M-Trends 2014 tells us that this resulted in an operational pause for APT1.

Mandiant’s release of the APT1 report coincided with the end of Golden Week, a seven-day government holiday that follows Chinese New Year. APT1 was inactive for 41 days longer than normal following Golden Week and the release of Mandiant’s report compared to patterns of activity from 2010–2012.

When APT1 did become active again, it operated at lower-than-normal levels before returning to consistent intrusion activity nearly 160 days after its exposure.

APT1 2013 Activity (Source: M-Trends 2014)

This pause in operations indicates that exposing the tools, tactics, and procedures used by APT1 forced the unit to reassess its operations. Beyond this, we find that associated groups that share similar TTPs, such as APT12, were also forced to temporarily pause or slow their operations.

APT12 briefly resumed operations five days after its exposure in The New York Times article, but did not return to consistent intrusion activity until 81 days later. Even then, APT12 waited until roughly 150 days after the article’s release to resume pre-disclosure levels of activity.

APT12 2013 Activity (Source: M-Trends 2014)

These findings show that even heavily motivated and highly resourced attackers are susceptible to having their operational tempo slowed when their TTPs are exposed. While we will probably never stop attackers of this nature, degrading their effectiveness to this degree is one of the big wins that keep me motivated as a defender.

In a world where the bad guy always seems to win and we are constantly working in a reactionary state, the ability to share strategic, operational, and tactical threat intelligence gives us a stronger collective ability to rapidly detect and respond to sophisticated attackers. I think there are a few things that need to happen in order for an organization to be able to do this effectively. First, you have to understand the relationship between the different types of threat intelligence. Next, you should understand the impact on the adversary of successfully detecting specific indicator types. Then, you should be able to collect, organize, and share your threat intelligence effectively. When you can do these things well you will be postured to effectively apply threat intelligence to your detection capabilities so that you can really bring the pain to sophisticated adversaries.

You can find some other interesting data points by reading M-Trends 2014: Beyond the Breach.

While signature-based detection isn’t enough on its own to protect a network against structured attackers, it is one of the cornerstones of a successful network security monitoring capability. If you have ever managed a signature-based detection mechanism then you know that you can’t simply turn the device on and let it work its magic. A signature-based detection mechanism like Snort, Suricata, or various commercial offerings requires careful deployment and tuning of signatures (often called rules) to ensure that you receive reliable high quality alerts. The ultimate goal, while never fully achievable, is that every alert that is generated by these detection mechanisms represents the activity it was designed to detect. This means that every alert will be actionable, and you won’t waste a lot of time chasing down false positives.

A key to achieving high-fidelity alerting is to have the ability to answer the question, “Do my signatures effectively detect the activity they are designed to catch?” In order to answer that question, we need a way to track the performance of individual signatures. In the past, organizations have relied on counting false positive alerts in order to determine how effective a signature is. While this is a step in the right direction, I believe that it is one step short of a useful statistic. In this article I will discuss how a statistic called precision can be used to measure the effectiveness of IDS signatures, regardless of the platform that you are using.


Before we get started, I think it’s necessary to describe a few terms that are discussed throughout this article. If we are going to play ball, it helps if we are standing on the same field. This article is meant to be platform-agnostic, however, so the equipment you use doesn’t matter.

There are four main data points that are generally referred to when attempting to statistically confirm the effectiveness of a signature: true positives, false positives, true negatives, and false negatives.

True Positive (TP): An alert that has correctly identified a specific activity. If a signature was designed to detect a certain type of malware, and an alert is generated when that malware is launched on a system, this would be a true positive, which is what we strive for with every deployed signature.

False Positive (FP): An alert has incorrectly identified a specific activity. If a signature was designed to detect a specific type of malware, and an alert is generated for an instance in which that malware was not present, this would be a false positive.

True Negative (TN): An alert has correctly not been generated when a specific activity has not occurred. If a signature was designed to detect a certain type of malware, and no alert is generated without that malware being launched, then this is a true negative, which is also desirable. This is difficult, if not impossible, to quantify in terms of NSM detection.

False Negative (FN): An alert has incorrectly not been generated when a specific activity has occurred. If a signature was designed to detect a certain type of malware, and no alert is generated when that malware is launched on a system, this would be a false negative. A false negative means that we aren’t detecting something we should be detecting, which is the worst-case scenario. False negatives aren’t detectable unless you have a secondary detection mechanism or signature designed to detect that activity that was missed.


The Fallacy of Relying Solely on False Positives

Historically, the effectiveness of IDS signatures was measured by comparing the count of false positives over an arbitrary time period to a certain threshold. Let’s consider a scenario for this statistic. In this scenario, we’ve deployed signature 100, which is designed to detect the presence of a specific command and control (C2) string in network traffic. Over the course of 24 hours, this signature has generated 500 false positive alerts, which results in an FP rate of 20.8/hour. You have determined that an acceptable threshold for false positives is 0.5/hour, so this signature would be deemed ineffective based upon that threshold.

At face value, this approach may seem effective, but it is flawed. If you remember from a few paragraphs ago, the question we want to answer centers on how well a signature can detect the activity it is designed to catch. When we only consider FPs, we are actually only considering how well a signature DOESN’T detect what it is designed to catch. While it sounds similar, detecting whether or not something succeeds is not the same as detecting whether or not it fails.

Earlier, we stated that signature 100 was responsible for 500 false positives over a 24-hour period, meaning that the signature was not effective. However, what if I told you that this signature was also responsible for 5,000 true positives during the same time period? In this case, the FPs were well worth it in order to catch 5,000 other actual infections! This is the key to precision -- taking both false positives and true positives into consideration.


Defining Precision

At this point it’s probably important to say that I’m not a statistician. As a matter of fact, I only took the bare minimum required math courses in college, but fortunately precision isn’t too tricky to understand. In short, precision refers to the ability to identify positive results. Often referred to as positive predictive value, precision is shown by determining the proportion of true positives against all positive results (both true positives and false positives) with the formula:

Precision = TP / (TP + FP)


This value is expressed as a percentage, and can be used to determine the probability that, given an alert being generated, the activity that has been detected has truly occurred. Therefore, if a signature has a high precision and an alert is generated, then the activity has very likely occurred. On the other hand, if a signature with low precision generates an alert, then it is unlikely that the activity you are attempting to detect has actually occurred. Of course, this is an aggregate statistic, so more data points will result in a more reliable precision stat.

In the example we identified in the previous section, we could determine that signature 100, which had 500 FPs and 5,000 TPs, has a precision of 90.9%.


5000 / (5000 + 500) = 0.909, or 90.9%


That's pretty good!
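For anyone who would rather see the arithmetic as code, here is a minimal Python sketch that computes both statistics for signature 100. The function names are my own for illustration, and returning 0 when no alerts have been reviewed is an assumption rather than a rule.

```python
def fp_rate_per_hour(false_positives: int, hours: float) -> float:
    """False positives per hour over the observation window."""
    return false_positives / hours


def precision(true_positives: int, false_positives: int) -> float:
    """Precision = TP / (TP + FP), returned as a fraction between 0 and 1."""
    total = true_positives + false_positives
    if total == 0:
        return 0.0  # assumption: no reviewed alerts yet means a precision of 0
    return true_positives / total


# Signature 100 from the scenario: 500 FPs and 5,000 TPs over 24 hours.
rate = fp_rate_per_hour(500, 24)
p = precision(5000, 500)

print(f"FP rate:   {rate:.1f}/hour")  # 20.8/hour, far above a 0.5/hour threshold
print(f"Precision: {p:.1%}")          # 90.9%
```

The same signature fails the FP-rate threshold badly and still has a high precision, which is the whole point of looking at both numbers.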


Using Precision in the SOC

Now that you understand precision, it is helpful to think about ways it can be used in a SOC environment. After all, you can have the best statistics in the world but if you don’t use them effectively then they aren’t providing a lot of value. This begins by actually making the commitment to track a precision value for each signature being deployed for a given detection mechanism. Again, it doesn’t matter what detection mechanism you are using. Whenever you deploy a new signature it would be assigned a precision of 0. As analysts review alerts generated by signatures, they will mark the alerts as either a TP or FP. As time goes on, these data points help to determine the precision value, expressed as a percentage.
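As a rough sketch of what that tracking could look like, the Python below keeps a running TP/FP count per signature and recomputes precision as analysts classify alerts. The class and method names are hypothetical, and a real deployment would store these counts in whatever console or database you already use.

```python
from collections import defaultdict


class PrecisionTracker:
    """Keeps per-signature TP/FP counts as analysts classify alerts."""

    def __init__(self) -> None:
        self._counts = defaultdict(lambda: {"TP": 0, "FP": 0})

    def mark(self, sig_id: int, verdict: str) -> None:
        """Record one analyst verdict ("TP" or "FP") for a signature."""
        if verdict not in ("TP", "FP"):
            raise ValueError("verdict must be 'TP' or 'FP'")
        self._counts[sig_id][verdict] += 1

    def precision(self, sig_id: int) -> float:
        """Current precision; a signature with no reviewed alerts starts at 0."""
        counts = self._counts.get(sig_id, {"TP": 0, "FP": 0})
        total = counts["TP"] + counts["FP"]
        return counts["TP"] / total if total else 0.0


tracker = PrecisionTracker()
print(tracker.precision(100))  # newly deployed signature, nothing reviewed yet

for verdict in ("TP", "TP", "FP", "TP"):
    tracker.mark(100, verdict)

print(f"{tracker.precision(100):.0%}")  # three of four reviewed alerts were TPs
```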


Precision for Signature Tuning

First and foremost, precision provides a mechanism that can be used to track the effectiveness of individual signatures. This provides quite a bit of flexibility that you can tune to your own staffing level. If you are a small organization with only a single analyst you can choose to only keep signatures with a high precision, like >80%. If you have a larger staff and have more resources to devote to alert review, or if you simply have a low risk tolerance, you can choose to keep signatures with lower precision values like >30%. In some instances, you can even define your precision threshold based upon the nature of the threat you are dealing with. Unstructured or commodity signatures could allow for precision >90% but structured or targeted signatures might only be considered effective if they are >20% precision.

There are a few different paths that can be taken when you determine that a signature is ineffective based upon the precision standards you have set. One path would be to spend more time researching whatever it is that you are trying to detect with the signature, and attempt to strengthen it by making your search more specific, or adding more specifics to the content of the signature. Alternatively, you can configure low precision signatures to simply log instead of alert (an option in a lot of popular IDS software), or you can configure your analysis console (SIEM, etc.) to hide alerts below your acceptable precision threshold.
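One way to picture the threshold idea is a small lookup that decides whether a signature should alert or simply log. This is only a sketch in Python: the category names and the "alert"/"log_only" labels are hypothetical, and the threshold values are the example numbers used above rather than recommendations.

```python
# Example thresholds from the discussion above, expressed as fractions.
THRESHOLDS = {
    "commodity": 0.90,  # unstructured threats: demand high precision
    "targeted": 0.20,   # structured threats: tolerate far more false positives
}


def disposition(precision: float, category: str) -> str:
    """Return 'alert' if precision exceeds the category threshold, else 'log_only'."""
    return "alert" if precision > THRESHOLDS[category] else "log_only"


# The same 45% precision leads to different decisions depending on the threat type.
print(disposition(0.45, "commodity"))
print(disposition(0.45, "targeted"))
```

A signature sitting at 45% would be demoted to logging if it covers commodity malware, but kept as an alert if it covers a targeted threat.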


Precision for Signature Comparison

On occasion, you may run into situations where you have more than one signature capable of detecting similar activity. When that happens, precision provides a really useful statistic for comparing those signatures. While some instances might warrant deployment of both signatures, resources on detection systems can be scarce at times. As such, it might be helpful for performance to only enable the most useful signature. In a scenario where signature 1 has a precision of 85% and signature 2 has a precision of 22%, signature 1 would be the best candidate for deployment.


Precision for Analyst Confidence

Whenever you write a signature, you should consider the analyst who will be reviewing alerts that are generated from that signature. After all, they are the ultimate consumer for the product you are generating, so any context or help you can provide the analyst in relation to the signature is helpful.

One item in particular that can help the analysts is the confidence rating associated with a signature. A lot of detection mechanisms will allow their signatures to include some form of confidence rating. In some cases, this is automatically calculated based on magic, which is how a lot of commercial SIEMs operate. In other scenarios, it might be a blank field where you can input a numeric value, or a simple low/medium/high option. I would propose that a good value for this field would be the precision statistic. While precision might not be the only factor that goes into a confidence calculation, I certainly believe it has to be one of them. Another approach might be to have both a signature developer’s subjective confidence rating (human confidence) and a precision-based confidence rating (calculated confidence).

A confidence rating based on precision is useful for an analyst because it helps them direct their efforts while analyzing alerts. That way, when an alert is generated and the analyst is on the fence regarding whether it might be a TP or FP, they can rely somewhat on that confidence value to help them determine how much investigative effort should be put into attempting to classify the alert. If they aren’t finding much supporting evidence that proves the alert is a TP and it has a precision-based confidence rating of 10%, then it is likely a FP and it might not be worth it to spend too much time on the alert. On the other hand, if it has a rating of 98%, then the analyst should spend a great deal of time and perform thorough due diligence to determine if the alert is indeed a true positive.



Signature management can be a tricky business, but it is such a critical part of NSM detection and analysis. Because of that it should be a heavily tracked and constantly improving process. While the precision statistic is by no means a new concept, it is one that I think most people are unaware of as it relates to signature management. Having seen precision used very successfully to help manage IDS signatures in a few varying and larger organizations, I think it is a statistic that could find a home in many SOC environments.

In Applied NSM, I write about the importance of creating a culture of learning in a SOC. This type of culture goes well beyond simply sending analysts to training or buying a few books here and there. It requires dedication to the concepts of mutual education, shared success, and servant leadership. It’s all about every single moment in a SOC being spent teaching or learning, no exceptions. While most analysts live for the thrill of hunting adversaries, the truth is that the majority of an analyst’s time will be spent doing less exciting tasks such as reviewing benign alerts, analyzing log data, and building detection signatures. Because of this, it can be difficult to find ways to foster teaching and learning during these times. I’ve struggled with this personally as an analyst and as a technical manager leading analyst teams. In this article, I’m going to talk about an item that I’ve used to successfully enhance the culture of learning in SOC environments I’ve worked in: a spiral notebook.



At some point while I was working at the Bowling Green, KY enclave of the Army Research Laboratory I realized that I had a lot of sticky notes lying around. These sticky notes contained items that you might expect analysts to write down during the course of an investigation: IP addresses, domain names, strings, etc. I decided that I should really keep my desk a bit cleaner and organize my notes better in case I needed to go back to them for any reason. I figured the best way to do this was to just put them in a notebook that I kept with me, so I walked to the Dollar General next door and bought a college-ruled spiral notebook for 89 cents. From then on, any notes I took while performing analysis stayed in this notebook.

Over time, I began to expand the use of my notebook. Instead of just scribbling down notes, I started writing down more information. This included things like hypotheses related to alerts I was currently investigating and notes about limitations of tools that I experienced during an investigation. I became aware of the value of this notebook pretty quickly. As a senior analyst on staff, one of my responsibilities was to help train our entry-level analysts along with my normal analyst duties. Invariably, these analysts would run into some of the same alerts that I had already looked at. I found that when this happened and these analysts had questions, I could quickly look back at my notebook and explain my investigation of the event as it occurred. The notebook had become an effective teaching tool.

Fast forward a little bit, and I had been promoted to the lead of the intrusion detection team. The first thing I did was to walk down to the Dollar General and buy a couple dozen notebooks for all of my analysts. Let’s talk about a few reasons why.


The Analyst Notebook for Learning and Teaching

As an analyst, I am constantly striving for knowledge. I want to learn new things so that I can enhance my skill and refine my processes so that I am better equipped to detect the adversary when they are attacking my network. This isn’t unique to me; it is a quality present in all NSM analysts to some degree. This is so important to some analysts that they will seek new employment if they feel that they aren’t in a learning environment or being given an adequate opportunity to grow their skills. I surveyed 30 of my friends and colleagues who had left an analysis job to pursue a similar job at another employer within the past five years. I asked them what it was that ultimately caused them to leave. The most logical guess would be that the analysts were following a bigger paycheck or a promotion. Believe it or not, that was true for only 23% of respondents. However, an overwhelming 63% of those surveyed cited a lack of educational opportunity as the main reason they left their analysis job.



Figure 1: Survey Results for Why Analysts Leave Their Jobs


This statistic justifies a need for a culture of learning. I think that the analyst notebook can be a great way to foster that learning environment because I know that it has been a great learning tool for me. This really clicked for me when I started asking a very important question as I was performing analysis.




This led to questions like these:

  • Why does it take so long to determine if a domain is truly malicious?
  • Why do IP addresses in this friendly range always seem to generate these types of alerts?
  • Why do I rarely ever use this data type?
  • Why don’t I have a data type that lets me do this?
  • Why does this detection method never seem to do what it is supposed to do?
  • Why don’t I have any additional intelligence sources that can help with an investigation like this one?
  • Why don’t I have more context with this indicator?
  • Why do I need to keep referencing these old case numbers? Is there a relationship there?
  • Why do I keep seeing this same indicator across multiple attacks? Is this tied to a single adversary?


These questions are very broad, but they are all about learning your processes and generating ideas. These ideas can lead to conversations, and those conversations can lead to change that helps you more effectively perform the task at hand. Small scribbles in a notebook can lead to drastic changes in how an organization approaches its collection, detection, and analysis processes. In the Applied NSM book, I write about two different analysis methods called the Differential Diagnosis and Relational Investigation. These are methods that I use and teach, and they both started from notes in my notebook. As a matter of fact, a lot of the concepts I describe in Applied NSM can be found in a series of analyst notebooks that I’ve written in over the years. As an example, Figure 2 shows an old analyst notebook of mine that contains a note that led to the concept of Sensor Visibility Diagrams, which I described in Chapter 3 of Applied NSM and implemented in almost every place I’ve worked since then.



Figure 2: A Note that Led to the Development of Sensor Visibility Maps


I think the formula is pretty simple. Write down notes as you are doing investigations, regularly question your investigative process by revisiting those notes, and write down the ideas you generate from that questioning. Eventually, you can flesh those ideas out more individually or in a group setting. You will learn more about yourself, your environment, and the process of NSM analysis.

Analyst Notebook Best Practices

If I’ve done a good job so far, then maybe I’ve already convinced you that you need to walk down to the store and buy a bunch of notebooks for you and all of your friends. Before you get started using your notebook, I want to share a few “best practices” for keeping an analyst notebook. Of course, these are based upon my experience and have worked for the kind of culture I’ve wanted to create (and be a part of). Those things might be different for you, so your mileage may vary.

Let’s start with a few ground rules for how the notebook should be used. These are very broad, but I think they hold true to most scenarios for effective use.

  1. The Analyst Notebook should always be at your desk when you are. If it isn’t, then you won’t write in it while you are performing analysis, which is the whole point.
  2. The Analyst Notebook should go to every meeting with you. If an analyst is in a meeting then there is a good chance they will have to discuss a specific investigation, their analysis process, or the tools they use. Having the notebook handy is important so that relevant notes can be analyzed.
  3. The Analyst Notebook should never leave the office. This is for two reasons. First, this tends to result in the notebook being left at home by accident. Second and most important, I believe strongly in a separation of work and home life. There is nothing wrong with putting in a few extra hours here and there, but all work and no play ultimately leads to burnout. This is a serious problem in our industry where it seems as though people are expected to devote 80+ hours a week to their craft. Being an analyst is what I do, but it isn’t who I am. The analyst notebook stays at work. When you go home, focus on your family and other hobbies.
  4. Every entry in the Analyst Notebook should be dated. Doing this consistently will ensure that you can piece together items from different dates when you are trying to reconstruct a long-term stream of events. It will also allow you to tie specific notes (whether they are detailed or just scribbles of IP addresses) to case numbers.
  5. An analyst must write something in the notebook every day. In general, the investigative process should lend itself to plenty of notes. If you find that isn’t the case, then start daydreaming a bit. What do you wish one of your tools could do that it can’t? What type of data do you wish you had? How much extra time did you spend on a task because of a process inefficiency? These things can come in handy later when you are trying to justify a request to management or senior analysts. This is hard to get in the groove of at first, but it is a habit that can be developed.
  6. The analyst notebook should be treated as a sensitive document. The notebook will obviously contain information that could cause an issue for you or your constituents if a party with malicious intent obtained it. Accordingly, the notebook should be protected at all times.  This means you shouldn’t forget it on the subway or leave it sitting on the table at Chick-Fil-A while you go to the bathroom.


Effectively Using an Analyst Notebook

Finally, let’s look at some strategies for effective analyst notebook use that I think are applicable to people of different experience levels. My goal is for this article to be valuable to new analysts, senior analysts, and analyst managers alike. With that in mind, this section is broken into a section for each group.


I’m a New Analyst!

Because new analysts are often overwhelmed by the amount of data and the number of tools they have to work with, I encourage you to write down every step you take during an investigation so you can look back and review the process holistically. While this does take a bit of time, it will eventually result in time savings by making your analysis process more efficient overall. This isn’t meant to describe why you took the actions you took or to be overly specific, but it should help you replay what steps you took so you can piece together your process. This might look like Figure 3.

Figure 3: A Note Detailing the Analysis Steps Taken

This exercise becomes more useful when you are paired with more senior analysts so that they can review the investigation that was completed. This provides the opportunity to walk the senior analyst through your thought process and how you arrived at your conclusion. This also provides the senior analyst with the ability to describe what they would have done differently.

This type of pairing is a valuable tool for overcoming some of the initial process hurdles that can trip up new analysts. For instance, I’ve written at length about how most new analysts tend to operate with a philosophy that all network traffic is malicious unless you can prove it is not. As most experienced analysts know, this isn’t a sustainable philosophy, and in truth all network traffic should be treated as inherently good unless you can prove it is malicious. I’ve noticed that by having new analysts take detailed notes and then review those notes and their process with a more experienced analyst, they get over this hump quicker.


I’m a Senior Analyst!

As a more experienced analyst, it is likely that you’ve already refined your analysis technique quite a bit. Because of this, in addition to general analysis duties you are likely going to be tasked with bigger picture thinking, such as helping to define how collection, detection, and analysis can be improved. In order to help with this, I recommend writing down items relevant to these processes for later review. This can include things like tool deficiencies, new tool ideas, data collection gaps, and rule/signature tweak suggestions.

As an example, consider a scenario where you are performing analysis of an event and notice that a user workstation that normally acts as a consumer of data has recently become a producer of data. This means that a device that normally downloads much more than it uploads from external hosts has now begun doing the opposite, and is uploading much more than it downloads. This might eventually lead you to find that this host is participating in commodity malware C2 or is being used to exfiltrate data. In this case, you may have stumbled upon this host because of an IDS alert or through manual hunting activities. When the investigation heats up you probably aren’t going to have time to flesh out your notes on how you can identify gaps in your detection capability, but you can quickly use an analyst notebook to jot down a note about how you think there might be room to develop a detection capability that looks for changes in the producer/consumer (upload/download) ratio.


Figure 4: A Note Detailing a Potential Detection Scenario

You may not yet realize it, but you’ve identified a use case for a new statistical detection capability. Now you can go back later and flesh this idea out and then present it to your peers and superiors for detection planning purposes and possible capability development. This could result in the development of a new script that works off of flow data, a new Bro script that detects this scenario outright, or some other type of statistical detection capability.
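To give a feel for how small that first script could be, here is a rough Python sketch of the idea working from per-host byte totals that you might pull out of flow data. Everything here is an assumption for illustration: the function names, the use of "share of bytes uploaded" as the measure, and the 50% cut-off between consumer and producer.

```python
def upload_share(bytes_up: int, bytes_down: int) -> float:
    """Fraction of a host's total bytes that were uploaded (0.0 to 1.0)."""
    total = bytes_up + bytes_down
    return bytes_up / total if total else 0.0


def became_producer(baseline: tuple, current: tuple, cutoff: float = 0.5) -> bool:
    """True if a host that was a consumer in the baseline is now a producer.

    baseline and current are (bytes_up, bytes_down) totals for two time windows.
    """
    was_consumer = upload_share(*baseline) < cutoff
    is_producer = upload_share(*current) >= cutoff
    return was_consumer and is_producer


# Hypothetical workstation: mostly downloads last week, mostly uploads today.
last_week = (120_000_000, 2_400_000_000)
today = (900_000_000, 150_000_000)

print(became_producer(last_week, today))
```

A host that flips like this is not proof of anything, but it is the kind of lead that is worth a closer look.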


I’m an Analyst Manager!

As a manager of analysts, you are probably responsible for general analysis duties, helping to refine the SOC processes, and for facilitating training amongst your analysts. While I still recommend keeping an analyst notebook at this level for the reasons already discussed, the real value of the analyst notebook here is your ability to leverage the fact that all of the analysts you manage are keeping their notebooks. In short, it is your responsibility to ensure that the notes your analysts keep in their notebooks become useful by providing them opportunities to share their thoughts. I think there are a couple of ways to do this.

The first way to utilize the notebooks kept by your analysts is through periodic case review meetings. I think there are several ways to do this, but one method I’ve grown to like is to borrow from medical practitioners and have Morbidity and Mortality (M&M) style case reviews. I’ve written about this topic quite extensively, and you can read more about this here or in Chapter 15 of the Applied NSM book. These meetings are especially important for junior level analysts who are just getting their feet wet.

Another avenue for leveraging your analysts and their notebooks is through periodic collection and detection planning meetings. In general, organizations tasked with NSM missions should be doing this regularly, and I believe that analysts should be highly involved with the process. This gives your senior level analyst an avenue to share their ideas based upon their work in the trenches. I speak to collection planning and the “Applied Collection Framework” in Chapter 2 of the Applied NSM book, and I speak to detection planning a bit here while discussing ways to effectively use APT1 indicators:



I sincerely believe that a simple spiral notebook can be an analyst’s best tool for professional growth. If you are a junior analyst, use it as a tool to develop your analytic technique. If you are a senior analyst, use it as a tool to refine NSM-centric processes in your organization. If you are responsible for leading a team of analysts, ensure that your team is provided the opportunity to use their notebooks effectively to better themselves and your mission. An 89-cent notebook can be more powerful than you’d think.