**TL;DR - You can download FlowBAT at **

Above all else, we know that network visibility is critical in the modern threat landscape. In a perfect world organizations could collect and store mountains of full packet capture data for long periods of time. Unfortunately, storing packet data for an extended duration doesn't scale well, and it can be cost prohibitive for even for small networks. Even if you can afford to store some level of packet data, parsing and filtering through it to perform network or security analysis can be incredibly time consuming.

Network Flow data is ideal because it provides a significant amount of context with minimal storage overhead. This means that it can be stored for an extended amount of time, providing historical data that can account for every connection into and out of your network. The storage footprint is so minimal, that most organizations measure the amount of flow data they store by years rather than by hours or days. This provides an unbelievable amount of flexibility while investigating events or breaches that have occurred in the past.



Even though flow data is so versatile, its adoption has been slowed because most of the tools available for performing flow data analysis can be challenging to use. These tools are often command-line based and lack robust analysis features. After all, spending all day examining data that looks like this isn't always the most efficient:


We developed the Flow Basic Analysis Tool (FlowBAT) to address this need by providing an analyst-focused graphical interface for analyzing flow data. FlowBAT was designed by analysts, for analysts and provides a feature set that is applicable to many use cases, including Network Security Monitoring, Intrusion Detection, Incident Response, Network Forensics, System and Network Troubleshooting, and Compliance Auditing.



FlowBAT Features

FlowBAT has several features that make it applicable for analysts with multiple goals operating in a wide array of environments. This includes:

Multiple Deployment Scenarios

FlowBAT can be deployed in an existing SiLK environment or as a part of a new installation. You can deploy SiLK in two ways: local or remote. A local FlowBAT installation requires that you install FlowBAT on the same system as your SiLK database. This method is fastest as it doesn't have to traverse the network to query flow data. A remote FlowBAT installation allows you to install SiLK on a system separate from your SiLK database. In this scenario, FlowBAT queries flow data by utilizing the SSH capability of an existing server running SiLK. This allows FlowBAT to transmit queries and receive data securely with minimal additional setup. You can even deploy FlowBAT on a cloud based system as long as it can reach your SiLK database over SSH. In either deployment scenario, FlowBAT can be up and running in a matter of minutes.

Quick Query Interface

Analysis is all about getting data and getting it quickly. While we have included an interface that makes this easy for seasoned flow analysis pros, we also provide a query interface designed to present all of the possible data retrieval options to analysts who might not be as experienced, or who simply want a more visual way of getting the data they want. The quick query interface allows the analyst to iteratively build data queries and easy tweak them after the queries initial execution. This means that you don't have to spend a ton of time looking up commands to get the exact data you want.


Rapid Data Pivoting

When you are hunting through large amounts of data, you need to move quick. Using traditional analysis techniques this requires a lot of typing, multiple open terminals, and constantly copying and pasting commands. With FlowBAT, you can simply click on field values in a set of query results to add additional parameters to your existing query or to create a new query. For example, while looking at a series of flow records associated with an individual service on a specific port, you can click on a specific IP address and pivot to a data set showing all communication to and from that host. From there, you can click on a timestamp from an individual flow record and automatically retrieve flow records occurring five minutes before and five minutes after that time frame. This can all occur within a matter of seconds. This same workflow using traditional command-line analysis tools could easily take several minutes or more.

Saved Queries and Dashboards

Analysts often find queries they like and will reuse them constantly. In the past, this resulted in dozens of text files thrown haphazardly in multiple directories that contain commonly used queries. Using FlowBAT's saved queries feature, you can store these queries right in the tool and execute them with a single click. Furthermore, if you use these saved queries very often, you can save them to an interactive dashboard and schedule them to periodically update over set time intervals. Using this mechanism you can stay constantly up to date on specific activity on your network. For instance, you can configure a saved query that is used to identify web servers on your network. With this query configured to execute on a periodic basis, you will be the first to know if an unexpected device starts receiving data on a common HTTP port on your network.


Graphing and Statistical Capability

One of the most powerful features of flow data is the power to generate statistics from aggregated data. This can yield very powerful detection capabilities such as:

  • Calculating Device Throughput
  • Identifying Top Talking Devices
  • Identifying Odd Inbound/Outbound Traffic Rations
  • Examining Throughput Distribution Across Network Segments
  • Locating Unusual Periodic and Repetitive Traffic Patterns

While some of these statistics are best interpreted as text, sometimes it becomes easier to interpret statistical data when it is presented visually. FlowPlotter allows you to send statistical data to a graphing engine to automatically generate bar, line, column, and pie charts. This level of visualization is useful for analysis, and for helping to provide visual examples of flow data in various forms of reporting that may be required as a part of your analysis duties.


Flexible Data Display

Every analysts processes and interprets information differently. As analysts, one thing we hate is when a tool locks you into viewing data in a very specific manner. With FlowBAT, we designed the display of flow data so that it is extremely customizable to each analysts needs. With this in mind, you can rearrange, sort, and add/remove columns as needed. This provides an analysis experience that can be customized to your personal taste, as well as to specific scenarios.


FlowBAT Demos

 CLI Query Mode


Guided Query Mode


Manipulating Data


Pivoting with Data


Downloading and Installing FlowBAT

We've spent quite a bit of time making sure that FlowBAT is easy to install and getting running with. You can download FlowBAT on the FlowBAT Downloads pages, and you can find explicit installation instructions on the FlowBAT Installation page. General support links and a user manual (still being written) can be found at We are excited to see what you think of FlowBAT, so please give it a try and let us know what you think!

Jason and I recently had the opportunity and pleasure to speak at MIRCon 2014. The topic of the presentation was "Applied Detection and Analysis with Flow Data." We had a great time talking about effective ways to use flow data for NSM, as well as introducing the world to FlowBAT.


You can view the slides from this presentation here:

While signature-based detection isn’t enough on its own to protect a network against structured attackers, it is one of the cornerstones of a successful network security monitoring capability. If you have ever managed a signature-based detection mechanism then you know that you can’t simply turn the device on and let it work its magic. A signature-based detection mechanism like Snort, Suricata, or various commercial offerings requires careful deployment and tuning of signatures (often called rules) to ensure that you receive reliable high quality alerts. The ultimate goal, while never fully achievable, is that every alert that is generated by these detection mechanisms represents the activity it was designed to detect. This means that every alert will be actionable, and you won’t waste a lot of time chasing down false positives.

A key to achieving high fidelity alerting is to have the ability to answer the question, “Do my signatures effectively detect that activity they are designed to catch?” In order to answer that question, we need a way to track the performance of individual signatures. In the past, organizations have relied on counting false positive alerts in order to determine how effective a signature is. While this is a step in the right direction, I believe that it is one step short of a useful statistic. In this article I will discuss how a statistic called precision can be used to measure the effectiveness of IDS signatures, regardless of the platform that you are using.


Before we get started, I think its necessary to describe a few terms that are discussed throughout this article. If we are going to play ball, it helps if we are standing on the same field. This article is meant to be platform-agnostic however, so the equipment you use doesn’t matter.

There are four main data points that are generally referred to when attempting to statistically confirm the effectiveness of a signature: true positives, false positives, true negatives, and false negatives.

 True Positive (TP): An alert that has correctly identified a specific activity. If a signature was designed to detect a certain type of malware, and an alert is generated when that malware is launched on a system, this would be a true positive, which is what we strive for with every deployed signature.

False Positive (FP): An alert has incorrectly identified a specific activity. If a signature was designed to detect a specific type of malware, and an alert is generated for an instance in which that malware was not present, this would be a false positive.

True Negative (TN): An alert has correctly not been generated when a specific activity has not occurred. If a signature was designed to detect a certain type of malware, and no alert is generated without that malware being launched, then this is a true negative, which is also desirable. This is difficult, if not impossible, to quantify in terms of NSM detection.

False Negative (FN): An alert has incorrectly not been generated when a specific activity has occurred. If a signature was designed to detect a certain type of malware, and no alert is generated when that malware is launched on a system, this would be a false negative. A false negative means that we aren’t detecting something we should be detecting, which is the worst-case scenario. False negatives aren’t detectable unless you have a secondary detection mechanism or signature designed to detect that activity that was missed.


The Fallacy of Relying Solely on False Positives

Historically, the measure of success for measuring the effectiveness for IDS signatures was the count of false positives over an arbitrary time period compared to a certain threshold. Let’s consider a scenario for this statistic. In this scenario, we’ve deployed signature 100, which is designed to detect the presence of a specific command and control (C2) string in network traffic. Over the course of 24 hours, this signature has generated 500 false positive alerts, which results in an FP rate of 20.8/hour. You have determined that an acceptable threshold for false positives is 0.5/hour, so this signature would be deemed ineffective based upon that threshold.

At face value, this approach may seem effective, but it is flawed. If you remember from a few paragraphs ago, the question we want to answer centers on how well a signature can effectively detect the activity it is designed to catch. When we only consider FP’s, then we are actually only considering how well a signature DOESN’T detect what it is designed to catch. While it sounds similar, detecting whether or not something succeeds is not the same as detecting whether or not it fails.

Earlier, we stated that signature 100 was responsible for 500 false positives over a 24-hour period, meaning that the signature was not effective. However, what if I told you that this signature was also responsible for 5,000 true positives during the same time period? In this case, the FPs were well worth it in order to catch 5,000 other actual infections! This is the key to precision -- taking both false positives and true positives into consideration.


Defining Precision

At this point it’s probably important to say that I’m not a statistician. As a matter of fact, I only took the bare minimum required math courses in college, but fortunately precision isn’t too tricky to understand. In short, precision refers to the ability to identify positive results. Often referred to as positive predictive value, precision is shown by determining the proportion of true positives against all positive results (both true positives and false positives) with the formula:

Precision = TP / (TP + FP)


This value is expressed as a percentage, and can be used to determine the probability that, given an alert being generated, the activity that has been detected has truly occurred. Therefore, if a signature has a high precision and an alert is generated, then the activity has very likely occurred. On the other hand, if a signature with low precision generates an alert, then it is unlikely that the activity you are attempting to detect has actually occurred. Of course, this is an aggregate statistic, so more data points will generate result in a more reliable precision stat.

In the example we identified in the previous section, we could determine that signature 100, which had 500 FPs and 5000 TPs has a precision of 90.9%.


5000 / (5000 + 500) = 90.9


That's pretty good!


Using Precision in the SOC

Now that you understand precision, it is helpful to think about ways it can be used in a SOC environment. After all, you can have the best statistics in the world but if you don’t use them effectively then they aren’t providing a lot of value. This begins by actually making the commitment to track a precision value for each signature being deployed for a given detection mechanism. Again, it doesn’t matter what detection mechanism you are using. Whenever you deploy a new signature it would be assigned a precision of 0. As analysts review alerts generated by signatures, they will mark the alerts as either a TP or FP. As time goes on, these data points help to determine the precision value, expressed as a percentage.


Precision for Signature Tuning

First and foremost, precision provides a mechanism that can be used to track the effectiveness of individual signatures. This provides quite a bit of flexibility that you can tune to the sensitive of your own staffing. If you are a small organization with only a single analyst you can choose to only keep signatures with a precision greater than a high value like >80%. If you have a larger staff and have more resources to devote to alert review, or if you simply have a low risk tolerance, you can choose to keep signatures with lower precision values like >30%. In some instances, you can even define your precision threshold based upon the nature of the threat you are dealing with. Unstructured or commodity signatures could allow for precision >90% but structured or targeted signatures might only be considered effective if they are >20% precision.

There are a few different paths that can be taken when you determine that a signature is ineffective based upon the precision standards you have set. One path would be to spend more time researching whatever it is that you are trying to detect with the signature, and attempt to strengthen it by making your search more specific, or adding more specifics to the content of the signature. Alternatively, you can configure low precision signatures to simply log instead of an alert (an option in a lot of popular IDS software), or you can configure your analysis console (SIEM, etc.) to hide alerts below your acceptable precision threshold.


Precision for Signature Comparison

On occasion, you may run into situations where you have more than one signature capable of detecting similar activity. When that happens, precision provides a really useful statistic for comparing those signatures. While you some instances might warrant deployment of both signatures, resources on detection systems can be scarce at times. As such, it might be helpful for performance to only enable to most useful signature. In a scenario where signature 1 has a precision of 85% and signature 2 has a precision of 22%, then signature 1 would be the best candidate for deployment.


Precision for Analyst Confidence

Whenever you write a signature, you should consider the analyst who will be reviewing alerts that are generated from that signature. After all, they are the ultimate consumer for the product you are generating, so any context or help you can provide the analyst in relation to the signature is helpful.

One item in particular that can help the analysts is the confidence rating associated with a signature. A lot of detection mechanisms will allow their signature to include some form of confidence rating. In some cases, this is automatically calculated based on magic, which is how a lot of commercial SIEMs operate. In other scenarios, it might be a blank field where you can input a numeric value, or a simple low/medium/high option. I would propose that a good value for this field would be the precision statistic. While precision might not be the only factor that goes into a confidence calculation, I certainly believe it has to be one of them. Another approach might be to have a signature developer subjective confidence rating (human confidence) and a precision-based confidence rating (calculated confidence.)

A confidence rating based on precision is useful for an analyst because it helps them direct their efforts while analyzing alerts. That way, when an alert is generated and the analyst is on the fence regarding whether it might be a TP or FP, they can rely somewhat on that confidence value to help them determine how much investigative effort should be put into attempting to classify the alert. If they aren’t finding much supporting evidence that provides the alert is a TP and it has a precision-based confidence rating of 10%, then it is likely a FP and it might not be worth it to spend too much time on the alert. On the other hand, if it has a rating of 98%, then the analyst should spend a great deal of time and perform thorough due diligence to determine if the alert is indeed a true positive.



Signature management can be a tricky business, but it is such a critical part of NSM detection and analysis. Because of that it should be a heavily tracked and constantly improving process. While the precision statistic is by no means a new concept, it is one that I think most people are unaware of as it relates to signature management. Having seen precision used very successfully to help manage IDS signatures in a few varying and larger organizations, I think it is a statistic that could find a home in many SOC environments.

notebook-angle1In Applied NSM, I write about the importance of creating a culture of learning in a SOC. This type of culture goes well beyond simply sending analysts to training or buying a few books here and there. It requires dedication to the concepts of mutual education, shared success, and servant leadership. It’s all about every single moment in a SOC being spent teaching or learning, no exceptions. While most analysts live for the thrill of hunting adversaries, the truth is that the majority of an analysts time will be spent doing less exciting tasks such as reviewing benign alerts, analyzing log data, and building detection signatures. Because of this, it can be difficult to find ways to foster teaching and learning during these times. I’ve struggled with this personally as an analyst and as a technical manager leading analyst teams. In the article, I’m going to talk about an item that I’ve use to successfully enhance the culture of learning in SOC environments I’ve worked in: a spiral notebook.



At some point while I was working at the Bowling Green, KY enclave of the Army Research Laboratory I realized that I had a lot of sticky notes laying around. These sticky notes contained items that you might expect analysts to write down during the course of an investigation: IP addresses, domain names, strings, etc. I decided that I should really keep my desk a bit cleaner and organize my notes better in case I needed to go back to them for any reason. I figured the best way to do this was to just put them in a notebook that I kept with me, so I walked to the Dollar General next door and bought a college-ruled spiral notebook for 89 cents. Henceforth, any notes I took while performing analysis stayed in this notebook.

Over time, I began to expand the use of my notebook. Instead of just scribbling down notes, I started writing down more information. This included things like hypotheses related to alerts I was currently investigating and notes about limitations of tools that I experienced during an investigation. I became aware of the value of this notebook pretty quickly. As a senior analyst on staff, one of my responsibilities was to help train our entry-level analysts along with my normal analyst duties. Invariably, these analysts would run into some of the same alerts that I had already looked at. I found that when this happened and these analysts had questions, I could quickly look back at my notebook and explain my investigation of the event as it occurred. The notebook had become an effective teaching tool.

Fast forward a little bit, and I had been promoted to the lead of the intrusion detection team. The first thing I did was to walk down to the Dollar General and buy a couple dozen notebooks for all of my analysts. Let’s talk about a few reasons why.


The Analyst Notebook for Learning and Teaching

As an analyst, I am constantly striving for knowledge. I want to learn new things so that I can enhance my skill and refine my processes so that I am better equipped to detect the adversary when they are attacking my network. This isn’t unique to me; it is a quality present in all NSM analysts to some degree. This is so important to some analysts that they will seek new employment if they feel that they aren’t in a learning environment or being given an adequate opportunity to grow their skills. I surveyed 30 of my friends and colleagues who had left an analysis job to pursue a similar job at another employer within the past five years. I asked them what was it that ultimately caused them to leave. The most logical guess would be that the analysts were following a bigger paycheck or a promotion. Believe it or not, that was true for only 23% of respondents. However, an overwhelming 63% of those surveyed cited a lack of educational opportunity as the main reason they left their current analysis job.



Figure 1: Survey Results for Why Analysts Leave Their Jobs


This statistic justifies a need for a culture of learning. I think that the analyst notebook can be a great way to foster that learning environment because I know that it has been a great learning tool for me. This really clicked for me when I started asking a very important question as I was performing analysis.




This lead to questions like this:

  • Why does it take so long to determine if a domain is truly malicious?
  • Why do IP addresses in this friendly range always seem to generate these types of alerts?
  • Why do I rarely ever use this data type?
  • Why don’t I have a data type that lets me do this?
  • Why does this detection method never seem to do what it is supposed to di?
  • Why don’t I have any additional intelligence sources that can help with an investigation like this one?
  • Why don’t I have more context with this indicator?
  • Why do I need to keep referencing these old case numbers? Is there a relationship there?
  • Why do I keep seeing this same indicator across multiple attacks? Is this tied to a single adversary?


These questions are very broad, but they are all about learning your processes and generating ideas. These ideas can lead to conversations, and those conversations can lead to change that helps you more effectively perform the task at hand. Small scribbles in a notebook can lead to drastic changes in how an organization approaches their collection, detection, and analysis processes.  In the Applied NSM book, I write about two different analysis methods called the Differential Diagnosis and Relational Investigation. These are methods that I use and teach, and they both started from notes in my notebook. As a matter of fact, a lot of the concepts I describe in Applied NSM can be found in a series of analyst notebook that I’ve written in over the years. As an example, Figure 2 shows an old analyst notebook of mine that contains a note that led to the concept of Sensor Visibility Diagrams, which I described in Chapter 3 of Applied NSM and implemented in most every place I’ve worked since then.



Figure 2: A Note that Led to the Development of Sensor Visibility Maps


I think the formula is pretty simple. Write down notes as you are doing investigations, regularly question your investigative process by revisiting those notes, and write down the ideas you generate from that questioning. Eventually, you can flesh those ideas out more individually or in a group setting. You will learn more about yourself, your environment, and the process of NSM analysis.

Analyst Notebook Best Practices

If I’ve done a good job so far, then maybe I’ve already convinced you that you need to walk down to the store and buy a bunch of notebooks for you and all of your friends. Before you get started using your notebook, I want to share a few “best practices” for keeping an analyst notebook. Of course, these are based upon my experience and have worked for the kind of culture I’ve wanted to create (and be a part of). Those things might be different for you, so your mileage may vary.

Let’s start with a few ground rules for how the notebook should be used. These are very broad, but I think they hold true to most scenarios for effective use.

  1. The Analyst Notebook should always be at your desk when you are. If it isn’t, then you won’t write in it while you performing analysis, which is the whole point.
  2. The Analyst Notebook should go to every meeting with you. If an analyst is in a meeting then there is a good chance they will have to discuss a specific investigation, their analysis process, or the tools they use. Having the notebook handy is important so that relevant notes can be analyzed.
  3. The Analyst Notebook should never leave the office.  This is for two reasons. First, this tends to result in the notebook being left at home on accident. Second and most important, I believe strongly in a separation of work and home life. There is nothing wrong with putting in a few extra hours here and there, but all work and no play ultimately lead to burnout. This is a serious problem in our industry where it seems as though people are expected to devote 80+ hours a week to their craft. Being an analyst is what I do, but isn’t who I am. The analyst notebook stays at work. When you go home, focus on your family and other hobbies.
  4. Every entry in the Analyst Notebook should be dated. Doing this consistently will ensure that you can piece together items from different dates when you are trying to reconstruct a long-term stream of events. It will also allow you to tie specific notes (whether they are detailed or just scribbles of IP addresses) to case numbers.
  5. An analyst must write something in the notebook every day. In general, the investigative process should yield itself to plenty of notes. If you find that isn’t the case, then start daydreaming a bit. What do you wish one of your tools could do that it can’t? What type of data do you wish you had? How much extra time did you spend on a task because of a process inefficiency? These things can come in handy later when you are trying to justify a request to management or senior analysts. This is hard to get in the groove of at first, but it is a habit that can be developed.
  6. The analyst notebook should be treated as a sensitive document. The notebook will obviously contain information that could cause an issue for you or your constituents if a party with malicious intent obtained it. Accordingly, the notebook should be protected at all times.  This means you shouldn’t forget it on the subway or leave it sitting on the table at Chick-Fil-A while you go to the bathroom.


Effectively Using an Analyst Notebook

Finally, let’s look at some strategies for effective analyst notebook use that I think are applicable to people of different experience levels. My goal is for this article to be valuable to new analysts, senior analysts, and analyst managers alike. With that in mind, this section is broken into a section for each group.


I’m a New Analyst!

Because new analysts are often overwhelmed by the amount of data and the number of tools they have to work with, I encourage you to write down every step they take during an investigation so you can look back and review the process holistically. While this does take a bit of time, it will eventually result in time savings by making your analysis process more efficient overall. This isn’t meant to describe why you took the actions you took and be overly specific, but should help you replay the what steps you took so you can piece together your process. This might look like Figure 3.

 notebook-Figure3Figure 3: A Note Detailing the Analysis Steps Taken

This exercise becomes more useful when you are paired with more senior analysts so that they can review the investigation that was completed. This provides the opportunity to walk the senior analyst through your thought process and how you arrived at your conclusion. This also provides the senior analyst with the ability to describe what they would have done differently.

This type of pairing is a valuable tool for overcoming some of the initial process hurdles that can trip up new analysts. For instance, I’ve written at length about how most new analysts tend to operate with a philosophy that all network traffic is malicious unless you can prove it is not. As most experienced analysts know, this isn’t a sustainable philosophy, and in truth all network traffic should be treated as inherently good unless you can prove it is malicious. I’ve noticed that by having new analysts take detailed notes and then review those notes and their process with a more experienced analyst, they get over this hump quicker.


I’m a Senior Analyst!

As a more experienced analyst, it is likely that you’ve already refined your analysis technique quite a bit. Because of this, in addition to general analysis duties you are likely going to be tasked with bigger picture thinking, such as helping to define how collection, detection, and analysis can be improved. In order to help with this, I recommend writing down items relevant to these processes for later review. This can include things like tool deficiencies, new tool ideas, data collection gaps, and rule/signature tweak suggestions.

As an example, consider a scenario where you are performing analysis of an event and notice that a user workstation that normally acts as a consumer of data has recently become a producer of data. This means that a device that normally downloads much more than it uploads from external hosts has now begun doing the opposite, and is uploading much more than it downloads. This might eventually lead you to find that this host is participating in commodity malware C2 or is being used to exfiltrate data. In this case, you may have stumbled upon this host because of an IDS alert or through manual hunting activities. When the investigation heats up you probably aren’t going to have time to flesh out your notes on how you can identify gaps in your detection capability, but you can quickly use an analyst notebook to jot down a note about how you think there might be room to develop a detection capability associated with detecting changing in producer/consumer (upload/download) ratio.


Figure 4: A Note Detailing a Potential Detection Scenario

You may not yet realize it but you’ve identified a use case for a new statistical detection capability. Now you can go back later and flesh this idea out and then present it to your peers and superiors for detection planning purposes and possible capability development. This could result in the development of a new script that works off of flow data, a new Bro script that detects this scenario out right, or some other type of statistical detection capability.


I’m an Analyst Manager!

As a manager of analysts, you are probably responsible for general analysis duties, helping to refine the SOC processes, and for facilitating training amongst your analysts. While I still recommend keeping an analyst notebook at this level for the reasons already discussed, the real value of the analyst notebook here is your ability to leverage the fact that all of the analysts you manage are keeping their notebooks. In short, it is your responsibility to ensure that the notes your analysts keep in their notebooks become useful by providing them opportunities to share their thoughts. I think there are a couple of ways to do this.

The first way to utilize the notebooks kept by your analysts is through periodic case review meetings. I think there are several ways to do this, but one method I’ve grown to like is to borrow from medical practitioners and have Morbidity and Mortality (M&M) style case reviews. I’ve written about this topic quite extensively, and you can read more about this here ( or in Chapter 15 of the Applied NSM book. These meetings are especially important for junior level analysts who are just getting their feet wet.

Another avenue for leveraging your analysts and their notebooks is through periodic collection and detection planning meetings. In general, organizations tasked with NSM missions should be doing this regularly, and I believe that analysts should be highly involved with the process. This gives your senior level analyst an avenue to share their ideas based upon their work in the trenches. I speak to collection planning and the “Applied Collection Framework” in Chapter 2 of the Applied NSM book, and I speak to detection planning a bit here while discussing ways to effectively use APT1 indicators:



I sincerely believe that a simple spiral notebook can be an analyst’s best tool for professional growth. If you are a junior analyst, use it as a tool to develop your analytic technique. If you are a senior analyst, use it as a tool to refine NSM-centric processes in your organization. If you are responsible for leading a team of analysts, ensure that your team is provided the opportunity to use their notebook effectively to better themselves, and your mission. An $0.89 cent notebook can be more powerful than you’d think.




Session data is the summary of the communication between two network devices. Also known as a conversation or a flow, this summary data is one of the most flexible and useful forms of NSM data. If you were to consider full packet capture equivalent to having a recording of every phone conversation someone makes from a their mobile phone, then you might consider session data to be equivalent having a copy of the call log on the bill associated with that mobile phone. Session data doesn’t give you the “What”, but it does give you the “Who, Where, and When”.

When session or flow records are generated, at minimum, the record will usually include the standard 5-tuple: source IP address and port, the destination IP address and port, and the protocol being used. In addition to this, session data will also usually provide a timestamp of when the communication began and ended, and the amount of data transferred between the two devices. The various forms of session data such as NetFlow v5/v9, IPFix, and jFlow can include other information, but these fields are generally common across all implementations of session data.

There are a few different applications that have the ability to collect flow data and provide tools for the efficient analysis of that data. My personal favorite is the System for Internet-Level Knowledge (SiLK), from the folks at CERT NetSA ( In Applied NSM we use SiLK pretty extensively.

One of the best ways to learn about different NSM technologies is the Security Onion distribution, which is an Ubuntu-based distribution designed for quick deployment of all sorts of NSM collection, detection, and analysis technologies. This includes popular tools like Snort, Suricata, Sguil, Squert, Snorby, Bro, NetworkMiner, Xplico, and more. Unfortunately, SiLK doesn’t currently come pre-packaged with Security Onion. The purpose of this guide is to describe how you can get SiLK up and running on a standalone Security Onion installation.



To follow along with this guide, you should have already installed and configured Security Onion, and ensured that NSM services are already running. This guide will assume you’ve deployed a standalone installation. If you need help installing Security Onion, this installation guide should help:

For the purposes of this article, we will assume this installation has access to two network interfaces. The interface at eth0 is used for management, and the eth1 interface is used for data collection and monitoring.

Now is a good time to go ahead and download the tools that will be needed. Do this by visiting this URL and downloading the following

  • SiLK (3.7.2 as of this writing)
  • YAF (2.4.0 as of this writing)
  • Fixbuf (1.30 as of this writing)

Alternatively, you can download the packages directly from the command line with these commands:


This guide reflects the current stable releases of each tool. You will want to ensure that you place the correct version numbers in the URLs above when using wget to ensure that you are getting the most up to date version.


The analysis of flow data requires a flow generator and a collector. So, before we can begin collecting and analyzing session data with SiLK we need to ensure that we have data to collect. In this case, we will be installing the YAF flow generation utility. YAF generates IPFIX flow data, which is quite flexible. Collection will be handled by the rwflowpack component of SiLK, and analysis will be provided through the SiLK rwtool suite.

SiLK Workflow

Figure 1: The SiLK Workflow


To install these tools, you will need a couple of prerequisites. You can install these in one fell swoop by running this command:

sudo apt-get install glib2.0 libglib2.0-dev libpcap-dev g++ python-dev

With this done, you can install fixbuf using these steps:

1. Extract the archive and go to the newly extracted folder

tar –xvzf libfixbuf-1.3.0.tar.gz

cd libfixbuf-1.3.0/

2. Configure, make, and install the package

sudo make install


Now you can install YAF with these steps:

1. Extract the archive and go to the newly extracted folder

tar –xvzf yaf-2.4.0.tar.gz
cd yaf-2.4.0/

2. Export the PKG configuration path

export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

3. Configure with applabel enabled

./configure --enable-applabel

4. Make and install the package

sudo make install

If you try to run YAF right now, you’ll notice an error. We need to continue the installation process before it will run properly. This process continues by installing SiLK with these steps:

1. Extract the archive and go to the newly extracted folder

tar –xvzf silk-3.7.2.tar.gz
cd silk-3.7.2/

2. Configure with a specified fixbuf path and python enabled

./configure --with-libfixbuf=/usr/local/lib/pkgconfig/ --with-python

3. Make and install the package

sudo make install

With everything installed, you need to make sure that all of the libraries we need are linked properly so that the LD_LIBRARY_PATH variable doesn’t have to be exported each time you use SiLK. This is can be done by creating a file named silk.conf in the /etc/ directory with the following contents:


To apply this change, run:

sudo ldconfig

Configuring SiLK

With everything installed, now we have to configure SiLK to use rwflowpack to collect the flow data we generate. We need three files to make this happen: silk.conf, sensors.conf, and rwflowpack.conf.


We will start by creating the silk.conf site configuration file. This file controls how SiLK parses data, and contains a list of sensors. It can be found in the previously unzipped SiLK installation tarball at silk-3.7.2/site/twoway/silk.conf. We will copy it to a directory that Security Onion uses to store several other configuration files:

sudo cp silk-3.7.2/site/twoway/silk.conf /etc/nsm/<$SENSOR-$INTERFACE>/

The site configuration file should work just fine for the purposes of this guide, so we won’t need to modify it.


The sensor configuration file sensors.conf is used to define the sensors that will be generating session data, and their characteristics. This file should be created at /etc/nsm/<$SENSOR-$INTERFACE>/sensors.conf. For this example, our sensors.conf will look like this:

probe S0 ipfix
  listen-on-port 18001
  protocol tcp
end probe
group my-network
end group
sensor S0
  ipfix-probes S0
  internal-ipblocks @my-network
  external-ipblocks remainder
end sensor

This sensors.conf has three different sections: probe, group, and sensor.

The probe section tells SiLK where to expect to receive data from for the identified sensor. Here, we’ve identified sensor S0, and told SiLK to expect to receive ipfix data from this sensor via the TCP protocol over port 18001. We’ve also defined the IP address of the sensor as the local loopback address, In a remote sensor deployment, you would use the IP address of the sensor that is actually transmitting the data.

The group section allows us to create a variable containing IP Blocks. Because of the way SiLK bins flow data, it is important to define internal and external network ranges on a per sensor basis so that your queries that are based upon flow direction (inbound, outbound, inbound web traffic, outbound web traffic, etc.) are accurate. Here we’ve defined a group called my-network that has two ipblocks, and You will want to customize these values to reflect your actual internal IP ranges.

The last section is the sensor section, which we use to define the characteristics of the S0 sensor. Here we have specified that the sensor will be generating IPFIX data, and that my-network group defines the internal IP ranges for the sensor, with the remainder being considered external.

Be careful if you try to rename your sensors here, because the sensor names in this file must match those in the site configuration file silk.conf. If a mismatch occurs, then rwflowpack will fail to start. In addition to this, if you want to define custom sensors names then I recommend starting by renaming S1. While it might make sense to start by renaming S0, I’ve seen instances where this can cause odd problems.


The last configuration step is to modify rwflowpack.conf, which is the configuration file for the rwflowpack process that listens for and collects flow records. This file can be found at /usr/local/share/silk/etc/rwflowpack.conf. First, we need to copy this file to /etc/nsm/<$SENSOR-$INTERFACE>/

sudo cp /usr/local/share/silk/etc/rwflowpack.conf /etc/nsm/<$SENSOR-$INTERFACE>/

Now we need to change seven values in the newly copied file:


This will enable rwflowpack


A convenience variable used for setting the location of other various SiLK files and folders


This will allow for the creation of specified data subdirectories


The path to the sensor configuration file


The base directory for SiLK data storage


The path to the site configuration file


Sets the logging format to legacy


The path for log storage

Finally, we need to copy rwflowpack startup script into init.d so that we can start it like a normal service. This command will do that:

sudo cp /usr/local/share/silk/etc/init.d/rwflowpack /etc/init.d

Once you’ve copied this file, you need to change one path in it. Open the file, and change the SCRIPT_CONFIG_LOCATION variable from “/usr/local/etc/” to “/etc/nsm/<$SENSOR-$INTERFACE>/”

Starting Everything Up

Now that everything is configured, we should be able to start rwflowpack and YAF and begin collecting data.

First, we can start rwflowpack by simply typing the following:

sudo service rwflowpack start

If everything went well, you should see a success message, as shown in Figure 2:

 Starting rwflowpack

Figure 2: Successfully Starting rwflowpack

If you want to ensure that rwflowpack runs at startup, you can do so with the following command:

sudo update-rc.d rwflowpack start 20 3 4 5 .

Now that our collector is waiting for data, we can start YAF to begin generating flow data. If you’re using “eth1” as the sensors monitoring interface as we are in this guide, that command will look like this:

sudo nohup /usr/local/bin/yaf --silk --ipfix=tcp --live=pcap  --out= --ipfix-port=18001 --in=eth1 --applabel --max-payload=384 &

You’ll notice that several of the arguments we are calling in this YAF execution string match values we’ve configured in our SiLK configuration files.

You can verify that everything started up correctly by running ps to make sure that the process is running, as is shown in Figure 3. If YAF doesn’t appear to be running, you can check the nohup.out file for any error messages that might have been generated.

 Checking YAF

Figure 3: Using ps to Verify that YAF is Running

That’s it! If your sensor interface is seeing traffic, then YAF should begin generating IPFIX flow data and sending it to rwflowpack for collection. You can verify this by running a basic rwfilter query, but first we have to tell the SiLK rwtools where the site configuration file is. This can be done by exporting the SILK_CONFIG_FILE variable.

export SILK_CONFIG_FILE=/etc/nsm/<$SENSOR-$INTERFACE>/silk.conf
export SILK_DATA_ROOTDIR=/nsm/sensor_data/<$SENSOR-$INTERFACE>/silk/

If you don’t want to have to do this every time you log into this system, you can place these lines in your ~/.bashrc file.

You should be able to use rwfilter now. If everything is setup correctly and you are capturing data, you should see some output from this command:

rwfilter --sensor=S0 --proto=0-255 --type=all  --pass=stdout | rwcut

If you aren’t monitoring a busy link, you might need to ping something from a monitored system (or from the sensor itself) to generate some traffic.

Figure 4 shows an example of SiLK flow records being output to the terminal.

Flow Data

Figure 4: Flow Records Means Everything is Working

Keep in mind that it may take several minutes for flow records to actual become populated in the SiLK database. If you run into any issues, you can start to diagnose them by accessing the rwflowpack logs in /var/log/.

Monitoring SiLK Services

If you are deploying SiLK in production, then you will want to make sure that the services are constantly running. One way to do this might be to leverage the Security Onion “watchdog” scripts that are used to manage other NSM services, but if you modify those scripts then you run the risk of wiping out your changes any time you update your SO installation. Because of this, the best idea might be to run separate watchdog scripts to monitor these services.

This script can be used to monitor Yaf to ensure that it is always running:

function SiLKSTART {
  sudo nohup /usr/local/bin/yaf --silk --ipfix=tcp --live=pcap --out= --ipfix-port=18001 –in=eth1 --applabel --max-payload=384 --verbose --log=/var/log/yaf.log &

function watchdog {
  pidyaf=$(pidof yaf)
  if [ -z “$pidyaf” ]; then
    echo “YAF is not running.”

This script can be used to monitor rwflowpack to ensure that it is always running:

pidrwflowpack=$(pidof rwflowpack)
if [ -z “$pidrwflowpack” ]; then

  echo “rwflowpack is not running.”
  sudo pidof rwflowpack | tr ’ ’ ’\n’ | xargs -i
  sudo kill -9 {} sudo service rwflowpack restart


These scripts can be set to run automatically at startup for ensured success


I always tell people that session data is the best “bang for your buck” data type you will find. If you just want to play around with SiLK, then installing it on Security Onion is a good way to get your feet wet. Even better, if you are using Security Onion in production on your network, it is a great platform to use for getting up and running with session data in addition to the many other data types. If you want to learn more about using SiLK for NSM detection and analysis, I recommend checking out Applied NSM when it comes out December 15th, or to sink your teeth into session data sooner, check out their excellent documentation (which includes use cases) at

Recently, Liam published a great tutorial on syntax highlighting for bro. We all recalled the excellent emacs addition that Scott Runnels posted on his github and thought about how much more accessible this makes Bro for the average user who will find himself scripting with BNPL.

For many people, myself included, nano is the preferred text editor due to its extreme simplicity and usability. Nano is quite bare in that it isn't nearly as pretty as something like Sublime Text 2, and it doesn't have quite the editing power of VIM. However, it is easy to become fond of nano as a beginner due to its layout, and it seems to stick on people quite well. In an effort to bring bro to aspiring data parsers and analysts who might not be comfortable with the cold nature of VIM, I present BNPL syntax highlighting in nano.

First off, if your favored distro of linux does not include nano out of the box, it is supplied in all base repositories. Depending on your linux flavor, install nano with;

sudo yum install nano
sudo apt-get install nano

The first thing that will be needed is the bro.nanorc file that nano will reference for syntax highlighting. You can download that here. The attached bro.nanorc uses the regular expressions from the emacs example that Scott Runnels posted.  Small changes in escape characters were required to make the regular expressions compatible, but otherwise all syntax highlighting should remain consistent with all other BNPL highlighting mechanisms in previously posted editors.

Once you've downloaded this file, it should be placed in /usr/share/nano/

In order to configure nano to utilize this syntax highlighting for *.bro files, it must be enabled in the /etc/nanorc file. For our purposes, we will need to add the following lines to this file:

## bro files
include "/usr/share/nano/bro.nanorc"

Once this change is saved, you should be ready to rock.



I am excited to see so many people starting to take an interest in Bro-IDS and the Bro Network Programming Language (BNPL). The BNPL is a remarkably easy language to program in and while I have spent my fair share of time in emacs and vi sometimes it makes life a little bit easier to have a light weight code editor. So today we're going to walk through the quick and easy way to get syntax highlighting setup using Sublime Text 2. If you have never used Sublime Text 2 before it is quite elegant- with builds for Linux, Mac OS/X and Windows the cross platform scripting, libraries of existing code, and intuitive features should help you to get started in the BNPL a bit faster.

Download Sublime:
tar -xvjpf Sublime Text 2.0.1 x64.tar.bz2

Go ahead and start Sublime for the first time:

cd Sublime Text 2/
./sublime_text to launch

Enter the Sublime control windowwith the shortcut CONTROL+TICK; the tick: ` is in the top left corner under the tilda: ~:


Install package control window just cut and paste the following:

import urllib2,os; pf='Package Control.sublime-package'; ipp=sublime.installed_packages_path(); os.makedirs(ipp) if not os.path.exists(ipp) else None; urllib2.install_opener(urllib2.build_opener(urllib2.ProxyHandler())); open(os.path.join(ipp,pf),'wb').write(urllib2.urlopen(''+pf.replace(' ','%20')).read()); print 'Please restart Sublime Text to finish installation'

Restart Sublime Text 2 for the changes to take effect. If you do not laready have Git installed you can do so quickly with:

sudo apt-get install git

Let's go ahead and add the bro.tmbundle out of Seth Halls Github repository.

cd $HOME
cd .config/sublime-text-2/Packages/
mkdir Bro
cd Bro
git clone

Dop Comments that:

For the OSX people out there, that config directory will be “~/Library/Application Support/Sublime Text 2/Packages”

Restart Sublime Text 2 and all of your ".bro" files should now have some simple syntax highlighting.If you are new to Sublime Text 2 they have a some great tutorials and documentation listed right off of their webpage.

Sublime Text 2 with Bro Syntax Highlighting

Sublime Text 2 with Bro Network Programming Language Syntax Highlighting

If you would rather stay in the shell and you haven't seen it yet Scott Runnels of Mandiant has posted his Bro Scripting Language Major Mode for emacs right up on his github.

An added bonus, if you have not seen it yet- Bro super star Matthias Vallentin recently posted his BNPL Cheat Sheet to help you get started- you can find that an a whole lot more at his Github Repo.