In Applied NSM, I write about the importance of creating a culture of learning in a SOC. This type of culture goes well beyond simply sending analysts to training or buying a few books here and there. It requires dedication to the concepts of mutual education, shared success, and servant leadership. It’s all about every single moment in a SOC being spent teaching or learning, no exceptions. While most analysts live for the thrill of hunting adversaries, the truth is that the majority of an analyst’s time will be spent doing less exciting tasks such as reviewing benign alerts, analyzing log data, and building detection signatures. Because of this, it can be difficult to find ways to foster teaching and learning during these times. I’ve struggled with this personally as an analyst and as a technical manager leading analyst teams. In this article, I’m going to talk about an item that I’ve used to successfully enhance the culture of learning in SOC environments I’ve worked in: a spiral notebook.

 

Background

At some point while I was working at the Bowling Green, KY enclave of the Army Research Laboratory, I realized that I had a lot of sticky notes lying around. These sticky notes contained items that you might expect analysts to write down during the course of an investigation: IP addresses, domain names, strings, etc. I decided that I should really keep my desk a bit cleaner and organize my notes better in case I needed to go back to them for any reason. I figured the best way to do this was to just put them in a notebook that I kept with me, so I walked to the Dollar General next door and bought a college-ruled spiral notebook for 89 cents. Henceforth, any notes I took while performing analysis stayed in this notebook.

Over time, I began to expand the use of my notebook. Instead of just scribbling down notes, I started writing down more information. This included things like hypotheses related to alerts I was currently investigating and notes about limitations of tools that I experienced during an investigation. I became aware of the value of this notebook pretty quickly. As a senior analyst on staff, one of my responsibilities was to help train our entry-level analysts along with my normal analyst duties. Invariably, these analysts would run into some of the same alerts that I had already looked at. I found that when this happened and these analysts had questions, I could quickly look back at my notebook and explain my investigation of the event as it occurred. The notebook had become an effective teaching tool.

Fast forward a little bit, and I had been promoted to the lead of the intrusion detection team. The first thing I did was to walk down to the Dollar General and buy a couple dozen notebooks for all of my analysts. Let’s talk about a few reasons why.

 

The Analyst Notebook for Learning and Teaching

As an analyst, I am constantly striving for knowledge. I want to learn new things so that I can enhance my skill and refine my processes so that I am better equipped to detect the adversary when they are attacking my network. This isn’t unique to me; it is a quality present in all NSM analysts to some degree. This is so important to some analysts that they will seek new employment if they feel that they aren’t in a learning environment or being given an adequate opportunity to grow their skills. I surveyed 30 of my friends and colleagues who had left an analysis job to pursue a similar job at another employer within the past five years. I asked them what it was that ultimately caused them to leave. The most logical guess would be that the analysts were following a bigger paycheck or a promotion. Believe it or not, that was true for only 23% of respondents. However, an overwhelming 63% of those surveyed cited a lack of educational opportunity as the main reason they left their analysis job.

 


Figure 1: Survey Results for Why Analysts Leave Their Jobs

 

This statistic justifies a need for a culture of learning. I think that the analyst notebook can be a great way to foster that learning environment because I know that it has been a great learning tool for me. This really clicked for me when I started asking a very important question as I was performing analysis.

 

Why?

 

This led to questions like these:

  • Why does it take so long to determine if a domain is truly malicious?
  • Why do IP addresses in this friendly range always seem to generate these types of alerts?
  • Why do I rarely ever use this data type?
  • Why don’t I have a data type that lets me do this?
  • Why does this detection method never seem to do what it is supposed to do?
  • Why don’t I have any additional intelligence sources that can help with an investigation like this one?
  • Why don’t I have more context with this indicator?
  • Why do I need to keep referencing these old case numbers? Is there a relationship there?
  • Why do I keep seeing this same indicator across multiple attacks? Is this tied to a single adversary?

 

These questions are very broad, but they are all about learning your processes and generating ideas. These ideas can lead to conversations, and those conversations can lead to change that helps you more effectively perform the task at hand. Small scribbles in a notebook can lead to drastic changes in how an organization approaches its collection, detection, and analysis processes. In the Applied NSM book, I write about two different analysis methods called the Differential Diagnosis and Relational Investigation. These are methods that I use and teach, and they both started from notes in my notebook. As a matter of fact, a lot of the concepts I describe in Applied NSM can be found in a series of analyst notebooks that I’ve written in over the years. As an example, Figure 2 shows an old analyst notebook of mine that contains a note that led to the concept of Sensor Visibility Diagrams, which I described in Chapter 3 of Applied NSM and have implemented in nearly every place I’ve worked since then.

 


Figure 2: A Note that Led to the Development of Sensor Visibility Maps

 

I think the formula is pretty simple. Write down notes as you are doing investigations, regularly question your investigative process by revisiting those notes, and write down the ideas you generate from that questioning. Eventually, you can flesh those ideas out more individually or in a group setting. You will learn more about yourself, your environment, and the process of NSM analysis.

Analyst Notebook Best Practices

If I’ve done a good job so far, then maybe I’ve already convinced you that you need to walk down to the store and buy a bunch of notebooks for you and all of your friends. Before you get started using your notebook, I want to share a few “best practices” for keeping an analyst notebook. Of course, these are based upon my experience and have worked for the kind of culture I’ve wanted to create (and be a part of). Those things might be different for you, so your mileage may vary.

Let’s start with a few ground rules for how the notebook should be used. These are very broad, but I think they hold true to most scenarios for effective use.

  1. The Analyst Notebook should always be at your desk when you are. If it isn’t, then you won’t write in it while you are performing analysis, which is the whole point.
  2. The Analyst Notebook should go to every meeting with you. If an analyst is in a meeting then there is a good chance they will have to discuss a specific investigation, their analysis process, or the tools they use. Having the notebook handy is important so that relevant notes can be analyzed.
  3. The Analyst Notebook should never leave the office. This is for two reasons. First, this tends to result in the notebook being left at home by accident. Second and most important, I believe strongly in a separation of work and home life. There is nothing wrong with putting in a few extra hours here and there, but all work and no play ultimately leads to burnout. This is a serious problem in our industry where it seems as though people are expected to devote 80+ hours a week to their craft. Being an analyst is what I do, but it isn’t who I am. The analyst notebook stays at work. When you go home, focus on your family and other hobbies.
  4. Every entry in the Analyst Notebook should be dated. Doing this consistently will ensure that you can piece together items from different dates when you are trying to reconstruct a long-term stream of events. It will also allow you to tie specific notes (whether they are detailed or just scribbles of IP addresses) to case numbers.
  5. An analyst must write something in the notebook every day. In general, the investigative process should lend itself to plenty of notes. If you find that isn’t the case, then start daydreaming a bit. What do you wish one of your tools could do that it can’t? What type of data do you wish you had? How much extra time did you spend on a task because of a process inefficiency? These things can come in handy later when you are trying to justify a request to management or senior analysts. This is hard to get in the groove of at first, but it is a habit that can be developed.
  6. The analyst notebook should be treated as a sensitive document. The notebook will obviously contain information that could cause an issue for you or your constituents if a party with malicious intent obtained it. Accordingly, the notebook should be protected at all times. This means you shouldn’t forget it on the subway or leave it sitting on the table at Chick-fil-A while you go to the bathroom.

 

Effectively Using an Analyst Notebook

Finally, let’s look at some strategies for effective analyst notebook use that I think are applicable to people of different experience levels. My goal is for this article to be valuable to new analysts, senior analysts, and analyst managers alike. With that in mind, this section is broken into a section for each group.

 

I’m a New Analyst!

Because new analysts are often overwhelmed by the amount of data and the number of tools they have to work with, I encourage you to write down every step you take during an investigation so you can look back and review the process holistically. While this does take a bit of time, it will eventually result in time savings by making your analysis process more efficient overall. These notes don’t need to be overly specific or explain why you took each action; they should simply let you replay the steps you took so you can piece together your process. This might look like Figure 3.

Figure 3: A Note Detailing the Analysis Steps Taken

This exercise becomes more useful when you are paired with more senior analysts so that they can review the investigation that was completed. This provides the opportunity to walk the senior analyst through your thought process and how you arrived at your conclusion. This also provides the senior analyst with the ability to describe what they would have done differently.

This type of pairing is a valuable tool for overcoming some of the initial process hurdles that can trip up new analysts. For instance, I’ve written at length about how most new analysts tend to operate with a philosophy that all network traffic is malicious unless you can prove it is not. As most experienced analysts know, this isn’t a sustainable philosophy, and in truth all network traffic should be treated as inherently good unless you can prove it is malicious. I’ve noticed that by having new analysts take detailed notes and then review those notes and their process with a more experienced analyst, they get over this hump quicker.

 

I’m a Senior Analyst!

As a more experienced analyst, it is likely that you’ve already refined your analysis technique quite a bit. Because of this, in addition to general analysis duties you are likely going to be tasked with bigger picture thinking, such as helping to define how collection, detection, and analysis can be improved. In order to help with this, I recommend writing down items relevant to these processes for later review. This can include things like tool deficiencies, new tool ideas, data collection gaps, and rule/signature tweak suggestions.

As an example, consider a scenario where you are performing analysis of an event and notice that a user workstation that normally acts as a consumer of data has recently become a producer of data. This means that a device that normally downloads much more from external hosts than it uploads to them has now begun doing the opposite, and is uploading much more than it downloads. This might eventually lead you to find that this host is participating in commodity malware C2 or is being used to exfiltrate data. In this case, you may have stumbled upon this host because of an IDS alert or through manual hunting activities. When the investigation heats up you probably aren’t going to have time to flesh out your notes on how you can identify gaps in your detection capability, but you can quickly use an analyst notebook to jot down a note about how you think there might be room to develop a detection capability associated with detecting changes in the producer/consumer (upload/download) ratio.


Figure 4: A Note Detailing a Potential Detection Scenario

You may not yet realize it, but you’ve identified a use case for a new statistical detection capability. Now you can go back later, flesh this idea out, and then present it to your peers and superiors for detection planning purposes and possible capability development. This could result in the development of a new script that works off of flow data, a new Bro script that detects this scenario outright, or some other type of statistical detection capability.
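As a rough illustration of how a note like the one in Figure 4 might turn into a prototype, the sketch below flags hosts whose upload-to-download byte ratio exceeds a threshold. Everything specific here is an assumption made for illustration: the comma-delimited `host,bytes_out,bytes_in` input (something you would have to derive from your own flow data), the function name `flag_producers`, the default threshold of 2.0, and the sample numbers. It is not part of any existing tool.

```shell
#!/usr/bin/env bash
# Sketch: flag hosts that upload far more than they download.
# Assumed input format (hypothetical): host,bytes_out,bytes_in  one host per line.

flag_producers() {
  # $1 is the ratio threshold; 2.0 is an arbitrary illustrative default.
  awk -F',' -v t="${1:-2.0}" '
    NF == 3 {
      # Treat zero inbound bytes as one byte so the host is still scored
      # instead of causing a division by zero.
      inbound = ($3 > 0) ? $3 : 1
      ratio = $2 / inbound
      if (ratio > t + 0) printf "%s %.2f\n", $1, ratio
    }'
}

# Made-up sample data: the first host is a normal consumer, the other two
# upload more than twice what they download.
flag_producers 2.0 <<'EOF'
192.0.2.10,1000,50000
192.0.2.20,900000,30000
192.0.2.30,500,0
EOF
```

A fixed threshold is the simplest possible choice. In practice you would more likely compare each host against its own historical ratio, since the point of the note is a change in behavior rather than a high ratio on its own.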

 

I’m an Analyst Manager!

As a manager of analysts, you are probably responsible for general analysis duties, helping to refine the SOC processes, and for facilitating training amongst your analysts. While I still recommend keeping an analyst notebook at this level for the reasons already discussed, the real value of the analyst notebook here is your ability to leverage the fact that all of the analysts you manage are keeping their notebooks. In short, it is your responsibility to ensure that the notes your analysts keep in their notebooks become useful by providing them opportunities to share their thoughts. I think there are a couple of ways to do this.

The first way to utilize the notebooks kept by your analysts is through periodic case review meetings. I think there are several ways to do this, but one method I’ve grown to like is to borrow from medical practitioners and have Morbidity and Mortality (M&M) style case reviews. I’ve written about this topic quite extensively, and you can read more about this here (http://chrissanders.org/2012/08/information-security-incident-morbidity-and-mortality/) or in Chapter 15 of the Applied NSM book. These meetings are especially important for junior level analysts who are just getting their feet wet.

Another avenue for leveraging your analysts and their notebooks is through periodic collection and detection planning meetings. In general, organizations tasked with NSM missions should be doing this regularly, and I believe that analysts should be highly involved with the process. This gives your senior-level analysts an avenue to share their ideas based upon their work in the trenches. I speak to collection planning and the “Applied Collection Framework” in Chapter 2 of the Applied NSM book, and I speak to detection planning a bit here while discussing ways to effectively use APT1 indicators: http://www.appliednsm.com/making-mandiant-apt1-report-actionable/.

 

Conclusion

I sincerely believe that a simple spiral notebook can be an analyst’s best tool for professional growth. If you are a junior analyst, use it as a tool to develop your analytic technique. If you are a senior analyst, use it as a tool to refine NSM-centric processes in your organization. If you are responsible for leading a team of analysts, ensure that your team is provided the opportunity to use their notebooks effectively to better themselves and your mission. An 89-cent notebook can be more powerful than you’d think.

 

 

 

I’ve had the opportunity to directly and indirectly lead teams of talented individuals while working for the Department of Defense in various SOC leadership roles. Anybody who has worked for or with me in those roles knows about my “dirty words” list. Now, these aren’t the typical seven dirty words that the FCC will fine you for if you happen to let one slip on network television, but rather, a series of buzzwords and phrases relevant to information security that tend to be inappropriately applied to certain scenarios or used in the wrong context.

You probably already know about some of these words. For instance, the most revered amongst security practitioners is probably “Advanced Persistent Threat”, which every security appliance vendor on the planet now claims to be able to detect or prevent, even if they can’t clearly define it. Two more favorites are “sophisticated” and “motivated.” These terms are used often to describe attacks, without honoring the fact that the degree of difficulty involved in an attack is very relative to the audience who is analyzing it. While a skilled defender might not consider an attack sophisticated, the attack may still be very advanced for a non-technical person. Furthermore, an attacker is only as sophisticated or motivated as their objective requires. If their tactics allow them to achieve their goals, then the attacker was motivated and sophisticated enough.

Unfortunately, “intelligence” is becoming one of these dirty words. You don’t have to look far to find a company or product that claims to provide “incredible insight through advanced network intelligence” or “the answer to network defense through thorough threat intelligence.” However, even though intelligence has become the latest major buzzword in network defense, I think that it is important when used appropriately. After all, intelligence IS a crucial part of network defense strategy.

So, how do we get away from using “intelligence” as a dirty word? I think the answer lies in carefully identifying what types of intelligence we are producing.

Intelligence has many definitions depending on the application. The definition that most closely aligns to information security is drawn from Department of Defense Joint Publication 1-02, and says that “intelligence is a product resulting from the collection, processing, integration, evaluation, analysis, and interpretation of available information concerning foreign nations, hostile or potentially hostile forces or elements, or areas of actual or potential operations.”

While this definition might not fit perfectly in all instances (particularly the part about information concerning foreign nations since an attacker might be domestic), it does provide the all-important framing required to begin thinking about generating intelligence. The key component of this definition is that intelligence is a product. This doesn’t mean that it is bought or sold for profit, but more specifically, that it is produced from collected data, based upon a specific requirement. This means that an IP address, or the registered owner of that address, or the common characteristics of the network traffic generated by that IP address are not intelligence products. When those things are combined with context through the analysis process and delivered to meet a specific requirement, they become an intelligence product.

In information security, we are generally most concerned with the development of threat intelligence products. These products seek to gather data to support the creation of an intelligence product that can be used to make determinations about the nature of a threat. What is lost on most is that there are actually three major subsets of threat intelligence: strategic, operational, and tactical intelligence.

Strategic Intelligence is information related to the strategy, policy, and plans of an attacker at a high level. Typically, intelligence collection and analysis at this level only occurs by government or military organizations in response to threats from other governments or militaries. With that said, larger organizations are now developing these capabilities, and some of these organizations now sell strategic intelligence as a service. This is focused on the long-term goals of the force supporting the individual attacker or unit. Artifacts of this type of intelligence can include policy documents, war doctrine, position statements, and government, military, or group objectives.

Operational Intelligence is information related to how an attacker or group of attackers plans and supports the operations that support strategic objectives. This is different from strategic intelligence because it focuses on narrower goals, often timed for short-term objectives that are only a part of the big picture. While this is, once again, usually more within the purview of government or military organizations, it is common that individual organizations will fall victim to attackers who are performing actions aimed at satisfying operational goals. Because of this, some public organizations will have visibility into these attacks, with an ability to generate operational intelligence. Artifacts of this type of intelligence are similar to, but often more focused versions of, the artifacts used for the creation of strategic intelligence.

Tactical Intelligence refers to the information regarding specific actions taken in conducting operations at the mission or task level. This is where we dive into the tools, tactics, and procedures used by an attacker, and where 99% of information security practitioners will focus their efforts. It is here that the individual actions of an attacker or group of attackers are analyzed and collected. This often includes artifacts such as indicators of compromise (IP addresses, file names, text strings) or listings of attacker specific tools. This intelligence is the most transient, and becomes outdated quickly.

The discussion of these types of threat intelligence naturally leads us to another recently popularized dirty word, “attribution.”

Attribution occurs when the actions of an adversary are actually tied back to a physical person or group. The issue with this word arises when information security practitioners attempt to perform attribution as a sole function of intrusion detection without the right resources. It is important to realize that detection and attribution aren’t the same thing, and because of this, detection indicators and attribution indicators aren’t the same thing. Detection involves discovering incidents, whereas attribution involves tying those incidents back to an actual person or group. While attribution is most certainly a positive thing, it cannot be done successfully without the correlation of strategic, operational, and tactical threat intelligence data.

Generally speaking, this type of intelligence collection and analysis capability is not present within most private sector organizations without an incredibly large amount of visibility or data sharing from other organizations. The collection of indicators of compromise from multiple network attacks to generate tactical intelligence is an achievable goal. However, collecting and analyzing data from other traditional sources such as human intelligence (HUMINT), signals intelligence (SIGINT), and geospatial intelligence (GEOINT) isn’t within the practical capability of most businesses. Furthermore, even organizations that might have this capability are often limited in their actions by law. Of course, some companies do produce high quality attribution intelligence, so there are exceptions to the rule.

Intelligence is a tremendously valuable thing, and when it is used in the proper context, it shouldn’t have to be a dirty word. The key to not misusing this word in your organization is to ensure that you are focused on intelligence that you actually have the capability to collect, analyze, and utilize.

 

** Note: This content originally appeared on the InGuardians Labs blog. I'm reposting it here since I've changed employment.

TLDR; Check out FlowPlotter on GitHub

If you're like me, it generally makes you happy to see graphs, charts, and tables to aid in network data analysis. This can be useful for analysis of already detected events, or for detection while hunting for evil. You probably also love session (also often called flow) data. Unfortunately, it isn't always easy to generate useful visualizations from flow data. This typically involves multiple steps such as moving data around between different tools, formatting the output of the data to match the format of whatever graphing tool you might be using, and then generating the graph output and making sure it is useful for your goals. Some might turn to the graphing capabilities of spreadsheet applications because of their simplicity, but those can't really handle a large data set like we might see with flow data. With that said, it is still pretty hard to find truly useful network visualizations for NSM detection and analysis.

 

Because of this, I set out to make visualizations from flow data easy and accessible, without the need for several steps between viewing the raw data and having the ready-made chart. The result of this was a tool called FlowPlotter, which we are going to discuss in this article. We will talk about how FlowPlotter came into existence, and its current workflow. FlowPlotter works from NetFlow records viewed with SiLK, so before moving forward, be sure to check out Chris's instructions on setting SiLK up on Security Onion so that you can easily test the tool if you don't already have a SiLK environment available. You can also go ahead and grab FlowPlotter from GitHub so that we can jump into making graphs. To generate graphs, you will only need SiLK and FlowPlotter. If the nitty gritty details bore you and you want to get into FlowPlotter right away with examples, skip this API stuff and scroll down below.

 

Background Mechanics

We will begin by talking about how FlowPlotter works. For the purpose of portability across multiple platforms, it makes sense that we should be viewing this kind of visualization in a web friendly format, rather than just a jpeg or Excel graph. To do this, FlowPlotter uses an implementation of Google Charts. Google Charts offers an extensive API that allows for the creation of just about any chart you can imagine. While many of these make more sense with only a few rows of data (pie charts, for example), some benefit from having as much data as can be shoved into them (line charts representing bins of data). Google provides thorough explanations on how to create these charts using your own data, but of course, the formatting of the data is still up to you. See their chart gallery for more examples on the structure of these charts.

 

In Applied Network Security Monitoring, we discuss several ways of seeing the "big picture" when it comes to visualizing data of many types. We also go into deep detail on using SiLK to get the most from your flow data. Here I'd like to present a simple method of going from using SiLK to parse data with rwtools, straight to using some bash kung-fu to streamline our way to having an interactive Google Chart full of flow data in your browser. The eventual goal of this exercise is to run rwfilter commands and pipe the binary data straight to a script which will take arguments to generate the data you want. This results in having a tool that we can pass data to that will generate charts using the Google API. Before we can run, though, we need to walk. We'll start by creating a chart that requires fewer data points, such as a bar chart representing the top 10 country codes talking to your network by bytes over the course of a day. The first thing we want to do is generate the SiLK data that we eventually want to plot. Since the goal of this is to make a top-10 list, we'll use rwstats and rwfilter.

 

rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout | rwstats --top --count=10 --fields=dcc --value=bytes


The rwfilter command above states that you wish to filter down to all data that passes the filter --proto=0-255 (shorthand is 0-) that occurred on 2014/02/06. These options can be stated in any order, and many people like to type them out as they would verbally say them. For instance, on the rwstats command, we're literally looking for the "top" "10" "destination country codes" by "bytes". Many of the people I teach SiLK to end up thinking too rigidly about "what SiLK wants", instead of just writing down what they want and then converting that directly into a query.

 

Now we're going to try to make that rwstats output appear in a bar graph. FlowPlotter, which I talk about in more detail below, works by using templates into which the formatted data is inserted, yielding a complete chart. Let's look at the most basic template for a bar chart. This is taken directly from Google's visualization playground, with their data substituted out for "dataplaceholder" around the middle of the code. You'll even notice that for now, the title is still "Company Performance".

<html>
<head>
<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
google.load("visualization", "1", {packages:["corechart"]});
google.setOnLoadCallback(drawChart);
function drawChart() {
var data = google.visualization.arrayToDataTable([
dataplaceholder
]);

var options = {
title: 'Company Performance',
vAxis: {title: 'Year',  titleTextStyle: {color: 'red'}}
};

var chart = new google.visualization.BarChart(document.getElementById('chart_div'));
chart.draw(data, options);
}
</script>
</head>
<body>
<div id="chart_div" style="width: 900px; height: 500px;"></div>
</body>
</html>

You'll notice the "dataplaceholder" that exists where there should be a table of data. In the place of that, we should be inserting something that looks like the following, which was created from our rwstats command:

['dcc', 'Bytes'],
['--', 26478355345],
['us', 706854881],
['ca', 8665204],
['no', 1893193],
['nl', 1416293],
['bg', 1101223],
['ch', 1092811],
['de', 202948],
['se', 169036],
['gb', 117399]

You'll notice that "--" also appears as a country code here, representing PRIVATE/EXPERIMENTAL addresses, which we'll leave in for the sake of discussing additional chart features and manipulations later. In the meantime, how did I streamline the creation of that data to replace dataplaceholder? First, it is important to add the "--delimited=," option to rwstats to ease the post-processing a bit. After that, I used a mix of cut, sed, and grep to wrap it all into a one-liner:

rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout | rwstats --top --count=10 --fields=dcc --value=bytes --delimited=, | cut -d "," -f1,2 |grep ,| sed "s/\(.*\),\(.*\)/['\1', \2],/g"|sed '$s/,$//'| sed "s/, \([A-Za-z].*\)],/, '\1'],/g" | grep ","

There are probably better ways of generating that data, and I welcome them in comments or variations of FlowPlotter, but for the sake of this particular exercise, this will get you by. The general idea is that the rwstats command output is piped to cut, which keeps only the first two columns, then grep is used to keep only the lines containing data fields and titles. The sed commands that follow apply the proper formatting, first by building the table rows, then by removing the trailing comma from the last line, and finally by quoting the column title on the first line. Now that we have the basic template and some example data, you can manually throw the data into the template in place of dataplaceholder and change some of the obvious things such as the title so that the HTML looks like the following:

<html>
<head>
<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
google.load("visualization", "1", {packages:["corechart"]});
google.setOnLoadCallback(drawChart);
function drawChart() {
var data = google.visualization.arrayToDataTable([
['dcc', 'Bytes'],
['--', 26478355345],
['us', 706854881],
['ca', 8665204],
['no', 1893193],
['nl', 1416293],
['bg', 1101223],
['ch', 1092811],
['de', 202948],
['se', 169036],
['gb', 117399]
]);

var options = {
title: 'Destination Country Code by Bytes',
vAxis: {title: 'Country Codes', titleTextStyle: {color: 'black'}}
};

var chart = new google.visualization.BarChart(document.getElementById('chart_div'));
chart.draw(data, options);
}
</script>
</head>
<body>
<div id="chart_div" style="width: 900px; height: 500px;"></div>
</body>
</html>

Here you can see the code we've generated.

Screen Shot 2014-02-13 at 8.33.18 PM

Notice that while mouse-over works (at the link) and overall it is a decent looking graph, the scale is being ruined by our large amount of internal destination addresses. There are two options to fix this: one that is obvious but not great, and one that is not obvious but is by far the better method. Either we can remove the internal destinations from the data manually, or we can accept them and change the chart to be more forgiving of large outliers. To do the latter, we need to edit the var options in our code. We're going to add an hAxis option as seen below. This will make the horizontal axis work on a logarithmic scale instead of scaling linearly between the minimum and maximum data values.

<html>
<head>
<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
google.load("visualization", "1", {packages:["corechart"]});
google.setOnLoadCallback(drawChart);
function drawChart() {
var data = google.visualization.arrayToDataTable([
['dcc', 'Bytes'],
['--', 26478355345],
['us', 706854881],
['ca', 8665204],
['no', 1893193],
['nl', 1416293],
['bg', 1101223],
['ch', 1092811],
['de', 202948],
['se', 169036],
['gb', 117399]
]);

var options = {
title: 'Destination Country Code by Bytes',
vAxis: {title: 'Country Codes', titleTextStyle: {color: 'black'}},
hAxis: {logScale: true}
};

var chart = new google.visualization.BarChart(document.getElementById('chart_div'));
chart.draw(data, options);
}
</script>
</head>
<body>
<div id="chart_div" style="width: 900px; height: 500px;"></div>
</body>
</html>

Our new graph looks like this.

Screen Shot 2014-02-13 at 8.33.38 PM

In testing changes like this, I highly recommend playing around in the Google Charts Playground, as it can streamline the debugging of small changes.
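
To see why the logarithmic axis helps, here is a rough calculation in Python using a few of the byte counts from the chart data. It is a simplification: it assumes the axis runs from 1 up to the largest value, whereas Google Charts chooses its own axis range, so treat the numbers as an intuition aid rather than what the chart literally draws.

```python
import math

# Byte counts copied from the chart data above.
rows = [("--", 26478355345), ("us", 706854881), ("ca", 8665204), ("gb", 117399)]


def linear_fraction(value, largest):
    """Share of the axis a bar covers on a linear scale."""
    return value / largest


def log_fraction(value, largest):
    """Share of the axis a bar covers on a log scale that starts at 1."""
    return math.log10(value) / math.log10(largest)


largest = max(value for _, value in rows)
for code, value in rows:
    print("%-3s linear=%.6f log=%.3f" % (
        code, linear_fraction(value, largest), log_fraction(value, largest)))
```

On the linear scale the smallest bar covers a vanishing fraction of the axis, while on the log scale it covers close to half of it, which is why the smaller countries become readable.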

 

FlowPlotter

Now that you've got an idea of how to manually generate these graphs, we can talk about FlowPlotter in a more official capacity. FlowPlotter is a scripted approach to generating graphs from SiLK data using templates, which keeps it modular enough to accept new graphs with relative ease. In short, it automates everything we just did in the previous example. The only requirement is that you provide an rwfilter command and pipe it to flowplotter.sh with a chart name and its independent and dependent variables as arguments. From there, FlowPlotter will make the HTML page for you, complete with titles and ideal scaling options. For instance, to generate the previous graph, you would simply run the following from the FlowPlotter root directory:

/Flowplotter$ rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout | ./flowplotter.sh barchart dcc bytes
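
The template idea itself is simple enough to sketch in Python. This is not FlowPlotter's actual code (FlowPlotter is a shell script): the trimmed template, the fill_template function, and the titleplaceholder marker are all hypothetical, and only the dataplaceholder marker comes from the template discussed earlier.

```python
# Hypothetical, trimmed-down template; real FlowPlotter templates may differ.
TEMPLATE = """var data = google.visualization.arrayToDataTable([
dataplaceholder
]);
var options = { title: 'titleplaceholder' };"""


def fill_template(template, rows, title):
    """Swap the placeholder markers for generated rows and a chart title."""
    return (template
            .replace("dataplaceholder", rows)
            .replace("titleplaceholder", title))


page = fill_template(
    TEMPLATE,
    "['dcc', 'Bytes'],\n['us', 706854881]",
    "Destination Country Code by Bytes",
)
print(page)
```

Keeping the chart definition in a template and the data generation in a separate step is what makes it cheap to add a new chart type: only a new template is needed.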

Here is the current usage page for FlowPlotter:

rwfilter [filter] | flowplotter.sh [charttype] [independent variable] [dependent variable]

Currently you must run a SiLK rwfilter command, pipe it to flowplotter.sh, and specify the options as arguments. The following chart types are currently functional:

geomap

  • independent variable = Must specify an rwstats compatible field for country type (scc or dcc).
  • dependent variable = Must specify an rwstats compatible value (Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:)

linechart

  • independent variable = Must specify a bin-size that the dependent variable will be calculated by. For example, if you want "Records per Minute", this variable will be 60.
  • dependent variable = Must specify an rwcount compatible value (Records,Packets,Bytes).

treemap

  • independent variable = Must specify an rwstats compatible field.
  • dependent variable = Must specify an rwstats compatible value (Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:)

timeline

  • independent variable = Must specify an rwcut compatible field.
  • dependent variable = Must specify an rwcut compatible field.

piechart

  • independent variable = Must specify an rwstats compatible field.
  • dependent variable = Must specify an rwstats compatible value (Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:)

barchart

  • independent variable = Must specify an rwstats compatible field.
  • dependent variable = Must specify an rwstats compatible value (Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:)

columnchart

  • independent variable = Must specify an rwstats compatible field.
  • dependent variable = Must specify an rwstats compatible value (Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:)
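
The rules in the usage page can be summarized as a small table-driven check. The Python below is a sketch of those rules only, not something FlowPlotter ships: check_args is a made-up helper, it assumes values are matched case-insensitively (the examples pass lowercase "bytes"), and it does not try to validate rwstats or rwcut field names.

```python
RWSTATS_VALUES = {"records", "packets", "bytes", "sip-distinct", "dip-distinct"}
RWCOUNT_VALUES = {"records", "packets", "bytes"}
RWSTATS_CHARTS = {"geomap", "treemap", "piechart", "barchart", "columnchart"}


def check_args(chart, independent, dependent):
    """Return None if the arguments look plausible, else an error string."""
    dep = dependent.lower()
    if chart == "linechart":
        # The independent variable is a bin size, e.g. 60 for per-minute bins.
        if not independent.isdigit():
            return "linechart needs a numeric bin size, e.g. 60"
        if dep not in RWCOUNT_VALUES:
            return "linechart needs Records, Packets, or Bytes"
    elif chart == "timeline":
        # Both variables are rwcut fields; this sketch does not check them.
        pass
    elif chart in RWSTATS_CHARTS:
        if chart == "geomap" and independent not in ("scc", "dcc"):
            return "geomap needs a country code field (scc or dcc)"
        if dep not in RWSTATS_VALUES and not dep.startswith("distinct:"):
            return "unsupported rwstats value: %s" % dependent
    else:
        return "unknown chart type: %s" % chart
    return None


print(check_args("barchart", "dcc", "bytes"))   # arguments accepted
print(check_args("geomap", "sip", "bytes"))     # rejected: not scc or dcc
```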

 

As you can see, FlowPlotter doesn't just support bar charts. It currently supports numerous Google Charts. The charts below were all generated using the queries you see accompanying them.

Geomaps

rwfilter --start-date=2013/12/27 --proto=0- --type=all --pass=stdout | ./flowplotter.sh geomap dcc bytes > geomap.html

Screen Shot 2014-02-11 at 5.36.15 PM

 

Linecharts

rwfilter --start-date=2013/12/27 --proto=0- --type=all --pass=stdout | ./flowplotter.sh linechart 60 bytes > linechart.html

Screen Shot 2014-02-11 at 5.47.29 PM

 

Treemaps

rwfilter --start-date=2013/12/27 --sport=1025- --dport=1025- --not-daddress=192.168.1.0/24 --proto=0- --type=all --pass=stdout | ./flowplotter.sh treemap dip records > treemap.html

Screen Shot 2014-02-11 at 5.52.04 PM

 

Timelines

rwfilter --start-date=2013/12/27 --proto=0- --type=out,outweb --dcc=us,-- --fail=stdout | ./flowplotter.sh timeline sip dip > timeline.html

Screen Shot 2014-02-11 at 5.54.16 PM

 

Pie Charts

rwfilter --start-date=2013/12/27 --sport=1025- --dport=1025- --not-daddress=192.168.1.0/24 --proto=0- --type=all --pass=stdout | ./flowplotter.sh piechart dport bytes > piechart.html

Screen Shot 2014-02-11 at 5.55.37 PM

 

Bar Charts

rwfilter --start-date=2013/12/27 --sport=1025- --dport=1025- --not-daddress=192.168.1.0/24 --proto=0- --type=all --pass=stdout | ./flowplotter.sh barchart dport bytes > barchart.html

Screen Shot 2014-02-11 at 6.01.00 PM

 

Column Charts

rwfilter --start-date=2013/12/27 --sport=1025- --dport=1025- --not-daddress=192.168.1.0/24 --proto=0- --type=all --pass=stdout | ./flowplotter.sh columnchart dip bytes > columnchart.html

Screen Shot 2014-02-11 at 6.04.19 PM

Next Steps

FlowPlotter currently only supports charts in the Google Visualizations Chart library, but as time goes by, I'd like to add some sources outside of Google, even if they duplicate similar Google graphs. I have the project on GitHub and welcome any comments, ideas, and improvements that you might have. It has examples that you can use, but I encourage the use of any kind of rwfilter input you can think of. If you come across some great visualizations that you think are repeatable by others, post the rwfilter | flowplotter command as a comment and I'll add it to the examples!

*** Edit 12/30 - The contest is now over, and winners have been notified. Thanks to all of the 100+ folks who entered! ***

We are giving away two FREE signed copies of Applied NSM for the holidays. If you haven't bought a copy yet, you can enter the contest by sending an e-mail to chris@chrissanders.org with "Applied NSM Giveaway" in the subject line. I'll pick winners on December 30th, so you can submit your entry up until midnight the night before. Since I'll be mailing physical copies of the book, only individuals with US shipping addresses are eligible to win.

I was absolutely thrilled by a special delivery I got yesterday. The print copies of Applied NSM have arrived! I've also seen that a few folks who have preordered have gotten their copies, and it looks like it is now available on the Elsevier bookstore and for Prime shipping from Amazon. A big thanks to all that have preordered and are planning to order!

 book1

It's no secret that I'm a big fan of Wireshark. While it isn't always the best tool for every job, it is the best graphical packet analysis application you will find, and is a must have for NSM analysis. I wanted to share a quick tip that I use nearly every time I'm using Wireshark for analysis.

Most people know that Wireshark will do host name resolution. As a matter of fact, I generally recommend disabling this feature so that your analysis doesn't put additional traffic on the wire when the machine running Wireshark starts generating DNS queries for the hosts in your capture file. However, what a lot of people don't know is that you can create a hosts file just for use by Wireshark so that you can easily identify certain IP addresses.

To do this, let's start with a basic capture file. In Figure 1, there is some traffic being transmitted between a few different hosts.

WS-HostFile-Figure1

Figure 1: Traffic Between A Lot of Hosts

It's pretty common in analysis to be required to examine packet captures that contain traffic from multiple hosts. When this happens, it can be confusing remembering which IP address is what. In this case, let's say that we know that 192.168.3.35 is our friendly host, and 188.124.5.107 is the hostile host we are concerned about. Since there is a lot of other traffic to be found here, it would be nice if we could easily identify these hosts without committing these IP addresses to memory. I don't know about you, but I'm horrible at remembering IP addresses. Especially when I'm having to juggle what may be multiple compromised systems or track down a web of systems involved in a compromise.

Let's remedy this by creating a Wireshark host file. First, we need to tell Wireshark to perform name resolution for IP addresses from a host file. To do this, open Wireshark's preference window (Edit -> Preferences on Windows or Wireshark -> Preferences on OS X). Then make sure that "Resolve network (IP) addresses" and "Only use the profile "hosts" file" are enabled. Also, disable "Use an external name resolver." This is shown on an OS X system (running the latest dev version of Wireshark) in Figure 2.

WS-Hostname-Figure2

Figure 2: Enabling Host File Name Resolution

Now we need to create a host file. This file takes the same form as a Windows or Linux hosts file. In our case, we will create the following hosts file:

192.168.3.35     FRIENDLY
188.124.5.107    HOSTILE

The file should be saved in the following location depending on your operating system:

  • Windows: %USERPROFILE%\Application Data\Wireshark\hosts
  • OS X: /Users/username/.wireshark/hosts
  • Linux: /home/username/.wireshark/hosts

Now, all we have to do is relaunch Wireshark and our capture file is appropriately populated with names for the devices we are examining. This is shown in Figure 3.

WS-Hostname-Figure3

Figure 3: Our Traffic is Easier to Identify

There are a number of strategies you can use for labeling hosts. For instance, you can label hosts by whether they are internal or external to the network as we did here, or you can label them by role (web server 1, dns server 2, known botnet C&C, etc).

This is a pretty simple trick, but it saves me a lot of time and frustration. It also helps the accuracy of my analysis, because I'm less likely to confuse IP addresses this way. You can even create large hosts files that can be used to automatically label known entities on your network.
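
For larger label sets, the hosts file can be generated rather than typed by hand. The Python below is a minimal sketch under that assumption: build_hosts is a hypothetical helper, the labels are the two from this example, and names are restricted to a single word because the hosts format separates fields with whitespace.

```python
import ipaddress


def build_hosts(labels):
    """Render {ip: name} pairs as hosts-file text, one entry per line."""
    lines = []
    for address, name in labels.items():
        # Raises ValueError if the address is mistyped.
        ipaddress.ip_address(address)
        if not name or any(ch.isspace() for ch in name):
            raise ValueError("label must be a single word: %r" % name)
        lines.append("%-16s %s" % (address, name))
    return "\n".join(lines) + "\n"


labels = {"192.168.3.35": "FRIENDLY", "188.124.5.107": "HOSTILE"}
print(build_hosts(labels), end="")
```

Writing the returned text to the hosts path for your platform listed above gives the same result as creating the file manually.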

Bro is one of the best things to happen to network security monitoring in a long time. However, the ability to parse and view Bro logs in most organizations isn't always ideal. One option is to peruse Bro logs via something like Splunk, but with high throughput you'll be paying a pretty penny, since Splunk is priced based upon the amount of data ingested. Another popular (and free) solution is Elsa. However, while Elsa is extremely fast at data ingestion and searches, it currently has limitations on the number of fields that can be parsed due to its use of Sphinx. On top of that, Elsa requires searches with very specific terminology, and doesn't easily do wildcard searches without additional transforms. This is where Logstash comes in. Logstash is an excellent tool for managing any type of event or log, and can easily parse just about anything you can throw at it. I say "easily" because once you're over the learning curve of generating your first Logstash configuration, creating additional configurations comes much more easily. In this guide I will talk about how you can use Logstash to parse logs from Bro 2.2. The examples shown here will only demonstrate parsing methods for the http.log and ssl.log files, but the download links at the end of the post will provide files for parsing all of Bro's log types.

If you want to follow along, then know that this guide assumes a few things. First, we'll be parsing "out-of-the-box" Bro 2.2 logs, which means you'll need an "out-of-the-box" Bro 2.2 installation. If you don't already have a Bro system then the easiest route to get up and running would normally be to use Security Onion, but as of this writing, Security Onion currently uses Bro 2.1 (although I'm sure this will change soon). In the meantime, reference Bro.org's documentation on installation and setup. Next, you'll need to download the latest version of Logstash, which I tested at version 1.2.2 for this article. We tested these steps using Logstash and Bro on a single Ubuntu 12.04 system.

TLDR: You can download a complete Logstash configuration file for all Bro 2.2 log files and fields here.

Creating a Logstash Configuration

Let's get started by creating a master configuration file. Logstash relies on this file to decide how logs should be handled. For our purposes, we will create a file called bro-parse.conf, which should be placed in the same directory as the Logstash JAR file. It is made up of three main sections: input, filter, and output. Below is the basic outline for a Logstash configuration file:

input {
  ...
}

filter {
  ...
}

output {
  ...
}

Input

The input section of the Logstash configuration determines what logs should be ingested, and the ingestion method. There are numerous plug-ins that can be used to ingest logs, such as a TCP socket, standard input from a terminal, a Twitter API feed, and more. We are going to use the "file" plug-in to ingest Bro logs. This plug-in constantly reads in a log file line-by-line in near real time. By default, it checks the files it is watching for new lines about once per second and looks for new files matching the path every 15 seconds, and both intervals are configurable.

With the ingestion method identified, we need to provide the path to the log files we want to parse in the "path" field as well as a unique name for them in the "type" field. With the following configuration, the input section of the Logstash configuration is complete and will ingest "/opt/bro2/logs/current/http.log" and "/opt/bro2/logs/current/ssl.log", and will give them "type" names appropriately.

input {
  file {
    type => "BRO_httplog"
    path => "/opt/bro2/logs/current/http.log"
  }  
  file {
    type => "BRO_SSLlog"
    path => "/opt/bro2/logs/current/ssl.log"
  }
}

 

Filter

The filter section is where you'll need to get creative. This section of the Logstash configuration takes the log data from the input section and decides how that data is parsed. This allows the user to specify what log lines to keep, which to discard, and how to identify the individual fields in each log file.  We will use conditionals as the framework for creating these filters, which are essentially just if-then-else statements.

if EXPRESSION {
  ...
} else if EXPRESSION {
  ...
} else {
  ...
}

For this filter, we're going to use a nested conditional statement. First, we want to discard the first few lines of the Bro log files since this is just header information that we don't need. These lines begin with the "#" sign, so we can configure our conditional to discard any log line beginning with "#" using the "drop" option. That part is trivial, but then it gets tricky, because we have to instruct Logstash on how to recognize each field in the log file. This can involve a bit of legwork since you will need to analyze the log format and determine what the fields will be called and what delimiters are used. Luckily, I've done a lot of that legwork for you. Continuing with our example, we can begin by looking at the Bro 2.2 http.log and ssl.log files, which contain 27 and 19 fields to parse respectively, delimited by tabs:

 brohttpfields

Figure 1: Bro 2.2 http.log

 brosslfields

Figure 2: Bro 2.2 ssl.log

The manner by which these fields are parsed can affect performance as the amount of data you are collecting scales upward, but depending on your hardware, that is usually an extreme case. For the sake of guaranteeing that all fields are parsed correctly, I used non-greedy regular expressions. Logstash allows for "Grok" regular expressions, but I've found that there are bugs when using specific or repetitive Grok patterns. Instead, I've taken the regex translation of the Grok patterns and written it directly in Oniguruma syntax. In testing, these have proven to be much more reliable, creating no "random" errors. The resulting filter looks like this:

filter {

if [message] =~ /^#/ {
  drop {  }
} else {  

# BRO_httplog ######################
  if [type] == "BRO_httplog" {
      grok { 
        match => [ "message", "(?<ts>(.*?))\t(?<uid>(.*?))\t(?<id.orig_h>(.*?))\t(?<id.orig_p>(.*?))\t(?<id.resp_h>(.*?))\t(?<id.resp_p>(.*?))\t(?<trans_depth>(.*?))\t(?<method>(.*?))\t(?<host>(.*?))\t(?<uri>(.*?))\t(?<referrer>(.*?))\t(?<user_agent>(.*?))\t(?<request_body_len>(.*?))\t(?<response_body_len>(.*?))\t(?<status_code>(.*?))\t(?<status_msg>(.*?))\t(?<info_code>(.*?))\t(?<info_msg>(.*?))\t(?<filename>(.*?))\t(?<tags>(.*?))\t(?<username>(.*?))\t(?<password>(.*?))\t(?<proxied>(.*?))\t(?<orig_fuids>(.*?))\t(?<orig_mime_types>(.*?))\t(?<resp_fuids>(.*?))\t(?<resp_mime_types>(.*))" ]
      }
  }
# BRO_SSLlog ######################
  if [type] == "BRO_SSLlog" {
    grok { 
      match => [ "message", "(?<ts>(.*?))\t(?<uid>(.*?))\t(?<id.orig_h>(.*?))\t(?<id.orig_p>(.*?))\t(?<id.resp_h>(.*?))\t(?<id.resp_p>(.*?))\t(?<version>(.*?))\t(?<cipher>(.*?))\t(?<server_name>(.*?))\t(?<session_id>(.*?))\t(?<subject>(.*?))\t(?<issuer_subject>(.*?))\t(?<not_valid_before>(.*?))\t(?<not_valid_after>(.*?))\t(?<last_alert>(.*?))\t(?<client_subject>(.*?))\t(?<client_issuer_subject>(.*?))\t(?<cert_hash>(.*?))\t(?<validation_status>(.*))" ]
    }
  }
 }
}

As you can see in the filter, I've taken each field (starting with the timestamp, ts) and generated an expression that matches it. For the sake of making sure that all fields are captured correctly, I've used the general non-greedy regex ".*?". After each field, I have a "\t", representing the tab delimiter that exists between fields. This can be optimized by making more specific field declarations with more precise regular expressions. For instance, an epoch timestamp will never contain letters, so why should you use a wildcard that matches them? Once you have the filter complete, you can move on to the easy part, the output.
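
To make the filter logic concrete outside of Logstash, here is a small Python sketch of the same two steps: skip the "#" header lines, then split each remaining line on tabs. It is an illustration rather than a replacement for the configuration above. It takes the field names from the "#fields" header line that Bro writes at the top of its logs, the sample lines are invented and heavily shortened, and EPOCH_TS is one possible example of the tighter timestamp pattern just mentioned.

```python
import re


def parse_bro_log(lines):
    """Parse tab-delimited Bro log lines into a list of dicts."""
    fields = []
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # Header line: remember the field names, discard everything else.
            if line.startswith("#fields\t"):
                fields = line.split("\t")[1:]
            continue
        events.append(dict(zip(fields, line.split("\t"))))
    return events


# A stricter pattern for an epoch timestamp than a generic wildcard.
EPOCH_TS = re.compile(r"^\d+\.\d+$")

# Invented, shortened sample; a real http.log has 27 fields.
sample = [
    "#separator \\x09",
    "#fields\tts\tuid\tid.orig_h\tid.orig_p",
    "1384543820.123456\tCxyz123\t192.168.3.35\t49152",
]
events = parse_bro_log(sample)
print(events[0]["id.orig_h"], bool(EPOCH_TS.match(events[0]["ts"])))
```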

Output

The output section of the Logstash configuration determines where ingested events are supposed to go. There are many output options in Logstash, but we are going to be sending them to Elasticsearch. Elasticsearch is the powerful search and analytics platform behind Logstash. To specify the output, we'll just add the following at the end of the Logstash configuration:

output {
elasticsearch { embedded => true }
}

That concludes how to build a Logstash configuration that will ingest your Bro logs, exclude the lines we don't want, parse the individual data fields correctly, and output them to Elasticsearch for Logstash. The only thing left to do is get them on the screen. To do that, we'll launch Logstash by entering the following command in a terminal, specifying the Logstash JAR file and the configuration file we just created:

java -jar logstash-1.2.2-flatjar.jar agent -f bro-parse.conf -- web

That might take a few seconds. To verify that everything is running correctly, open another terminal and run:

netstat -l | grep 9292

Once you can see that port 9292 is listening, that means that Logstash should be ready to rock.

netstat9292

Figure 3: Verifying Logstash is Running
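
If you would rather script this check than read netstat output, a connection attempt does the same job. The Python below is a minimal sketch using only the standard library; is_listening is a made-up helper name, and 127.0.0.1:9292 is the address used in this guide.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: nothing is answering there.
        return False


print(is_listening("127.0.0.1", 9292))
```

It prints True once the Logstash web interface is accepting connections and False while it is still starting up.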

Now you should be able to open a web browser and go to http://127.0.0.1:9292. Once there you'll probably only see the Kibana dashboard, but from there you can open the pre-built Logstash dashboard and see your Bro logs populating!

Screenshot from 2013-11-15 14:13:48

Figure 4: Bro Logs in Logstash

Logstash uses the Kibana GUI for browsing logs. The combination of Elasticsearch, Logstash, and Kibana in one package makes for the easiest Bro logging solution you can find. The most basic function that we now have is search. Searches allow for the use of wildcards or entire search terms. For instance, searching for "oogle.com" will probably give you 0 results. However, searching for "*oogle.com" is likely to give you exactly what you expect: any visits to Google hosted domains. Search will also find full search terms (single terms or uniquely grouped terms between specific delimiters) without the need for a wildcard. For instance, if you want to search specifically for "plus.google.com", that is likely to return results as you would expect.

To specify the logs you'd like to view by timestamp, there is a "timepicker" at the top right.

timepicker

Figure 5: Logstash Timepicker

You can take advantage of the parsing of individual fields by generating statistics for the unique values associated with each field. This can be done by simply viewing a Bro log and clicking a field name in the left column of the screen. You can also see more complete visualizations from that window by clicking "terms". For example, the pie chart below is one that I generated that indicates how many records exist in each of the Bro logs I'm parsing.

Screenshot from 2013-11-15 14:11:05

Figure 6: Examining Bro Log Sums

As another example, let's filter down to just SSL logs. Under the "Fields" panel, click "type" to reveal the variations of log types. Then, click the magnifying glass on "BRO_SSLlog". Now you have only Bro SSL logs, as well as a new field list representing only fields seen in the SSL events that are currently present. If you only want to see certain fields displayed, you can click their check boxes in the order you want them displayed. If you want to rearrange them afterward, just move them with the left and right arrows in the event columns on the event display. Below is an example of sorting those SSL logs by timestamp, where the fields displayed are ts, server_name, uid, issuer_subject, and subject.

Screenshot from 2013-11-15 14:48:57

Figure 7: Sorting Bro SSL Logs

To remove the BRO_SSLlog filter, you can open up the "filtering" panel at the top of the page and remove that additional filter. Doing so will revert back to all data types, but with the fields still selected.

This guide only scratches the surface of the types of analysis you can do with Logstash. When you combine a powerful network logging tool like Bro and a powerful log analysis engine like Logstash, the possibilities are endless. I suggest you play around with customizing the front end and perusing the logs. If you somehow mess up badly enough or need to "reset" your data, you can stop Logstash in the terminal and remove the data/ directory that was created in the same location as the Logstash JAR file. I've created a config file that you can use to parse all of the Bro 2.2 log files. You can download that file here.

UPDATE - December 18, 2013

As per G Porter's request, I've generated a new Logstash Bro configuration that is tailored to work with the most recent Security Onion update. That update marked the deployment of Bro 2.2 to Security Onion, and if you compare it to an "out-of-the-box" Bro 2.2 deployment, there are a few additions that I've accounted for.

You can download the Security Onion specific Logstash Bro 2.2 configuration here.

Session data is the summary of the communication between two network devices. Also known as a conversation or a flow, this summary data is one of the most flexible and useful forms of NSM data. If you were to consider full packet capture equivalent to having a recording of every phone conversation someone makes from their mobile phone, then you might consider session data to be equivalent to having a copy of the call log on the bill associated with that mobile phone. Session data doesn’t give you the “What”, but it does give you the “Who, Where, and When”.

When session or flow records are generated, at minimum, the record will usually include the standard 5-tuple: source IP address and port, destination IP address and port, and the protocol being used. In addition to this, session data will also usually provide a timestamp of when the communication began and ended, and the amount of data transferred between the two devices. The various forms of session data such as NetFlow v5/v9, IPFIX, and jFlow can include other information, but these fields are generally common across all implementations of session data.
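
As a rough illustration of how packets collapse into session records, the Python sketch below groups packets by their 5-tuple and keeps the start time, end time, packet count, and byte count for each group. It is deliberately simplified: the packet dictionaries and addresses are invented, each direction is treated as its own flow, and a real flow generator such as YAF also relies on timeouts and TCP state to decide when a flow ends.

```python
from collections import namedtuple

# The standard 5-tuple that identifies a flow.
FlowKey = namedtuple("FlowKey", "sip sport dip dport proto")


def summarize(packets):
    """Group packets by 5-tuple and keep start, end, packets, and bytes."""
    flows = {}
    for pkt in packets:
        key = FlowKey(pkt["sip"], pkt["sport"], pkt["dip"],
                      pkt["dport"], pkt["proto"])
        flow = flows.setdefault(key, {"start": pkt["time"], "end": pkt["time"],
                                      "packets": 0, "bytes": 0})
        flow["start"] = min(flow["start"], pkt["time"])
        flow["end"] = max(flow["end"], pkt["time"])
        flow["packets"] += 1
        flow["bytes"] += pkt["size"]
    return flows


# Invented packets: two belong to one TCP flow, one to a UDP flow.
packets = [
    {"time": 100.0, "sip": "192.168.1.10", "sport": 49152,
     "dip": "203.0.113.5", "dport": 80, "proto": 6, "size": 60},
    {"time": 100.5, "sip": "192.168.1.10", "sport": 49152,
     "dip": "203.0.113.5", "dport": 80, "proto": 6, "size": 1500},
    {"time": 101.0, "sip": "192.168.1.10", "sport": 53000,
     "dip": "203.0.113.9", "dport": 53, "proto": 17, "size": 70},
]
flows = summarize(packets)
for key, flow in flows.items():
    print(key, flow)
```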

There are a few different applications that have the ability to collect flow data and provide tools for the efficient analysis of that data. My personal favorite is the System for Internet-Level Knowledge (SiLK), from the folks at CERT NetSA (http://www.cert.org/netsa/). In Applied NSM we use SiLK pretty extensively.

One of the best ways to learn about different NSM technologies is the Security Onion distribution, which is an Ubuntu-based distribution designed for quick deployment of all sorts of NSM collection, detection, and analysis technologies. This includes popular tools like Snort, Suricata, Sguil, Squert, Snorby, Bro, NetworkMiner, Xplico, and more. Unfortunately, SiLK doesn’t currently come pre-packaged with Security Onion. The purpose of this guide is to describe how you can get SiLK up and running on a standalone Security Onion installation.

 

Preparation

To follow along with this guide, you should have already installed and configured Security Onion, and ensured that NSM services are already running. This guide will assume you’ve deployed a standalone installation. If you need help installing Security Onion, this installation guide should help: https://code.google.com/p/security-onion/wiki/Installation.

For the purposes of this article, we will assume this installation has access to two network interfaces. The interface at eth0 is used for management, and the eth1 interface is used for data collection and monitoring.

Now is a good time to go ahead and download the tools that will be needed. Do this by visiting http://tools.netsa.cert.org/index.html# and downloading the following:

  • SiLK (3.7.2 as of this writing)
  • YAF (2.4.0 as of this writing)
  • Fixbuf (1.3.0 as of this writing)

Alternatively, you can download the packages directly from the command line with these commands:

wget http://tools.netsa.cert.org/releases/silk-3.7.2.tar.gz
wget http://tools.netsa.cert.org/releases/yaf-2.4.0.tar.gz
wget http://tools.netsa.cert.org/releases/libfixbuf-1.3.0.tar.gz

This guide reflects the current stable releases of each tool. You will want to ensure that you place the correct version numbers in the URLs above when using wget to ensure that you are getting the most up to date version.
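
Since only the version numbers change between releases, the download URLs can be assembled from a small table. The Python below is a sketch that assumes the URL pattern of the three wget commands above continues to hold for newer releases; release_url is a made-up helper, and the versions are the ones current as of this writing.

```python
BASE = "http://tools.netsa.cert.org/releases"

# Update these to the current stable versions before downloading.
VERSIONS = {"silk": "3.7.2", "yaf": "2.4.0", "libfixbuf": "1.3.0"}


def release_url(name, version):
    """Build a tarball URL following the pattern of the wget commands above."""
    return "%s/%s-%s.tar.gz" % (BASE, name, version)


for name, version in VERSIONS.items():
    print("wget " + release_url(name, version))
```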

Installation

The analysis of flow data requires a flow generator and a collector. So, before we can begin collecting and analyzing session data with SiLK we need to ensure that we have data to collect. In this case, we will be installing the YAF flow generation utility. YAF generates IPFIX flow data, which is quite flexible. Collection will be handled by the rwflowpack component of SiLK, and analysis will be provided through the SiLK rwtool suite.

SiLK Workflow

Figure 1: The SiLK Workflow

 

To install these tools, you will need a couple of prerequisites. You can install these in one fell swoop by running this command:

sudo apt-get install glib2.0 libglib2.0-dev libpcap-dev g++ python-dev

With this done, you can install fixbuf using these steps:

1. Extract the archive and go to the newly extracted folder

tar -xvzf libfixbuf-1.3.0.tar.gz

cd libfixbuf-1.3.0/

2. Configure, make, and install the package

./configure
make
sudo make install

 

Now you can install YAF with these steps:

1. Extract the archive and go to the newly extracted folder

tar -xvzf yaf-2.4.0.tar.gz
cd yaf-2.4.0/

2. Export the PKG configuration path

export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

3. Configure with applabel enabled

./configure --enable-applabel

4. Make and install the package

make
sudo make install

If you try to run YAF right now, you’ll notice an error. We need to continue the installation process before it will run properly. This process continues by installing SiLK with these steps:

1. Extract the archive and go to the newly extracted folder

tar -xvzf silk-3.7.2.tar.gz
cd silk-3.7.2/

2. Configure with a specified fixbuf path and python enabled

./configure --with-libfixbuf=/usr/local/lib/pkgconfig/ --with-python

3. Make and install the package

make
sudo make install

With everything installed, you need to make sure that all of the libraries we need are linked properly so that the LD_LIBRARY_PATH variable doesn’t have to be exported each time you use SiLK. This can be done by creating a file named silk.conf in the /etc/ld.so.conf.d/ directory with the following contents:

/usr/local/lib
/usr/local/lib/silk

To apply this change, run:

sudo ldconfig

Configuring SiLK

With everything installed, now we have to configure SiLK to use rwflowpack to collect the flow data we generate. We need three files to make this happen: silk.conf, sensors.conf, and rwflowpack.conf.

Silk.conf

We will start by creating the silk.conf site configuration file. This file controls how SiLK parses data, and contains a list of sensors. It can be found in the previously unzipped SiLK installation tarball at silk-3.7.2/site/twoway/silk.conf. We will copy it to a directory that Security Onion uses to store several other configuration files:

sudo cp silk-3.7.2/site/twoway/silk.conf /etc/nsm/<$SENSOR-$INTERFACE>/

The site configuration file should work just fine for the purposes of this guide, so we won’t need to modify it.

Sensors.conf

The sensor configuration file sensors.conf is used to define the sensors that will be generating session data, and their characteristics. This file should be created at /etc/nsm/<$SENSOR-$INTERFACE>/sensors.conf. For this example, our sensors.conf will look like this:

probe S0 ipfix
  listen-on-port 18001
  protocol tcp
  listen-as-host 127.0.0.1
end probe
group my-network
  ipblocks 192.168.1.0/24
  ipblocks 172.16.0.0/16
  ipblocks 10.0.0.0/8 
end group
sensor S0
  ipfix-probes S0
  internal-ipblocks @my-network
  external-ipblocks remainder
end sensor

This sensors.conf has three different sections: probe, group, and sensor.

The probe section tells SiLK where to expect to receive data from for the identified sensor. Here, we’ve identified sensor S0, and told SiLK to expect to receive IPFIX data from this sensor via the TCP protocol over port 18001. We’ve also defined the IP address of the sensor as the local loopback address, 127.0.0.1. In a remote sensor deployment, you would use the IP address of the sensor that is actually transmitting the data.

The group section allows us to create a variable containing IP blocks. Because of the way SiLK bins flow data, it is important to define internal and external network ranges on a per-sensor basis so that queries based upon flow direction (inbound, outbound, inbound web traffic, outbound web traffic, etc.) are accurate. Here we’ve defined a group called my-network that has three ipblocks: 192.168.1.0/24, 172.16.0.0/16, and 10.0.0.0/8. You will want to customize these values to reflect your actual internal IP ranges.

The last section is the sensor section, which we use to define the characteristics of the S0 sensor. Here we have specified that the sensor will be generating IPFIX data, and that my-network group defines the internal IP ranges for the sensor, with the remainder being considered external.
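To make the effect of these ranges concrete, here is a minimal sketch of the kind of decision that depends on them. This is not SiLK’s code: the function names are made up for illustration, and it only covers the four basic cases (in, out, int2int, ext2ext), ignoring the web and other variants that the site file also defines. The blocks are the three from the my-network group above.

```shell
#!/bin/bash
# Illustration only: classify a flow by whether each end is internal.
# INTERNAL_BLOCKS mirrors the my-network group from sensors.conf.
INTERNAL_BLOCKS="192.168.1.0/24 172.16.0.0/16 10.0.0.0/8"

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed when address $1 falls inside CIDR block $2.
in_block() {
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Succeed when address $1 is in any internal block.
is_internal() {
  for block in $INTERNAL_BLOCKS; do
    in_block "$1" "$block" && return 0
  done
  return 1
}

# Print in, out, int2int, or ext2ext for source $1 and destination $2.
flow_type() {
  if is_internal "$1"; then src=int; else src=ext; fi
  if is_internal "$2"; then dst=int; else dst=ext; fi
  case "$src$dst" in
    extint) echo in ;;
    intext) echo out ;;
    intint) echo int2int ;;
    extext) echo ext2ext ;;
  esac
}
```

If an internal range is left out of the group, its hosts are treated as external here, which is the kind of misclassification that makes direction-based queries unreliable.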

Be careful if you try to rename your sensors here, because the sensor names in this file must match those in the site configuration file silk.conf. If a mismatch occurs, then rwflowpack will fail to start. In addition to this, if you want to define custom sensor names then I recommend starting by renaming S1. While it might make sense to start by renaming S0, I’ve seen instances where this can cause odd problems.
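One way to catch a name mismatch before starting rwflowpack is to compare the two files directly. The sketch below is a rough check rather than a full parser: the function names are hypothetical, and it assumes the site file declares sensors on lines of the form "sensor <id> <name>" and that sensors.conf opens each sensor block with "sensor <name>".

```shell
#!/bin/bash
# Print the sensor names that open a block in sensors.conf ("sensor NAME").
sensors_conf_names() {
  awk '$1 == "sensor" && NF == 2 { print $2 }' "$1" | sort -u
}

# Print the sensor names declared in the site file ("sensor ID NAME ...").
site_conf_names() {
  awk '$1 == "sensor" && $2 ~ /^[0-9]+$/ { print $3 }' "$1" | sort -u
}

# missing_sensors SENSORS_CONF SITE_CONF
# Print each sensor used in sensors.conf that the site file does not declare.
# Returns 0 when every name is declared and 1 otherwise.
missing_sensors() {
  site_names=$(site_conf_names "$2")
  status=0
  for name in $(sensors_conf_names "$1"); do
    if ! printf '%s\n' "$site_names" | grep -qxF -- "$name"; then
      printf '%s\n' "$name"
      status=1
    fi
  done
  return $status
}
```

Calling missing_sensors with the paths to sensors.conf and silk.conf prints nothing when the names line up, and prints the offending names otherwise.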

Rwflowpack.conf

The last configuration step is to modify rwflowpack.conf, which is the configuration file for the rwflowpack process that listens for and collects flow records. This file can be found at /usr/local/share/silk/etc/rwflowpack.conf. First, we need to copy this file to /etc/nsm/<$SENSOR-$INTERFACE>/:

sudo cp /usr/local/share/silk/etc/rwflowpack.conf /etc/nsm/<$SENSOR-$INTERFACE>/

Now we need to change eight values in the newly copied file:

ENABLED=yes

This will enable rwflowpack

statedirectory=/nsm/sensor_data/<$SENSOR-$INTERFACE>/silk

A convenience variable used for setting the location of other various SiLK files and folders

CREATE_DIRECTORIES=yes

This will allow for the creation of specified data subdirectories

SENSOR_CONFIG=/etc/nsm/<$SENSOR-$INTERFACE>/sensors.conf

The path to the sensor configuration file

DATA_ROOTDIR=/nsm/sensor_data/<$SENSOR-$INTERFACE>/silk/

The base directory for SiLK data storage

SITE_CONFIG=/etc/nsm/<$SENSOR-$INTERFACE>/silk.conf

The path to the site configuration file

LOG_TYPE=legacy

Sets the logging format to legacy

LOG_DIR=/var/log/

The path for log storage
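If you would rather script these edits than make them by hand, something like the following could work. It is a sketch under a few assumptions: set_conf_value is a helper name invented here, the file uses plain KEY=value lines, and the values contain no "|", "&", or backslash characters. Because the file is rewritten through a temporary copy, run it as a user who can write to the directory, and check the file’s permissions afterwards.

```shell
#!/bin/bash
# set_conf_value FILE KEY VALUE
# Rewrite the line "KEY=..." in FILE, or append it if no such line exists.
# Assumes VALUE contains no "|", "&", or backslash (true for plain paths).
set_conf_value() {
  file=$1
  key=$2
  value=$3
  if grep -q "^${key}=" "$file"; then
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "${file}.tmp" &&
      mv "${file}.tmp" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example calls, where CONF holds the path to the copied rwflowpack.conf:
#   set_conf_value "$CONF" ENABLED yes
#   set_conf_value "$CONF" CREATE_DIRECTORIES yes
#   set_conf_value "$CONF" LOG_TYPE legacy
#   set_conf_value "$CONF" LOG_DIR /var/log/
```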

Finally, we need to copy the rwflowpack startup script into init.d so that we can start it like a normal service. This command will do that:

sudo cp /usr/local/share/silk/etc/init.d/rwflowpack /etc/init.d

Once you’ve copied this file, you need to change one path in it. Open the file, and change the SCRIPT_CONFIG_LOCATION variable from “/usr/local/etc/” to “/etc/nsm/<$SENSOR-$INTERFACE>/”

Starting Everything Up

Now that everything is configured, we should be able to start rwflowpack and YAF and begin collecting data.

First, we can start rwflowpack by simply typing the following:

sudo service rwflowpack start

If everything went well, you should see a success message, as shown in Figure 2:

Figure 2: Successfully Starting rwflowpack

If you want to ensure that rwflowpack runs at startup, you can do so with the following command:

sudo update-rc.d rwflowpack start 20 3 4 5 .

Now that our collector is waiting for data, we can start YAF to begin generating flow data. If you’re using “eth1” as the sensor’s monitoring interface as we are in this guide, that command will look like this:

sudo nohup /usr/local/bin/yaf --silk --ipfix=tcp --live=pcap  --out=127.0.0.1 --ipfix-port=18001 --in=eth1 --applabel --max-payload=384 &

You’ll notice that several of the arguments we are calling in this YAF execution string match values we’ve configured in our SiLK configuration files.
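To keep the two in sync, the export arguments can be derived from the probe block rather than typed twice. This is only a sketch: yaf_export_args is a hypothetical helper, it handles just the three probe keys used in this guide, and it assumes YAF should send to the address given in listen-as-host, which holds when both run on the same system.

```shell
#!/bin/bash
# yaf_export_args SENSORS_CONF PROBE_NAME
# Print the --ipfix, --out, and --ipfix-port arguments that match the
# named probe block. Fails if the probe or any of the three keys is missing.
yaf_export_args() {
  awk -v probe="$2" '
    $1 == "probe" && $2 == probe         { in_probe = 1; next }
    in_probe && $1 == "end" && $2 == "probe" { in_probe = 0 }
    in_probe && $1 == "listen-on-port"   { port = $2 }
    in_probe && $1 == "protocol"         { proto = $2 }
    in_probe && $1 == "listen-as-host"   { host = $2 }
    END {
      if (port == "" || proto == "" || host == "") exit 1
      printf "--ipfix=%s --out=%s --ipfix-port=%s\n", proto, host, port
    }
  ' "$1"
}
```

For the sensors.conf shown earlier, calling it with the file path and S0 prints the same three export arguments used in the YAF command above.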

You can verify that everything started up correctly by running ps to make sure that the process is running, as is shown in Figure 3. If YAF doesn’t appear to be running, you can check the nohup.out file for any error messages that might have been generated.

Figure 3: Using ps to Verify that YAF is Running

That’s it! If your sensor interface is seeing traffic, then YAF should begin generating IPFIX flow data and sending it to rwflowpack for collection. You can verify this by running a basic rwfilter query, but first we have to tell the SiLK rwtools where the site configuration file and the flow data are. This can be done by exporting the SILK_CONFIG_FILE and SILK_DATA_ROOTDIR variables.

export SILK_CONFIG_FILE=/etc/nsm/<$SENSOR-$INTERFACE>/silk.conf
export SILK_DATA_ROOTDIR=/nsm/sensor_data/<$SENSOR-$INTERFACE>/silk/

If you don’t want to have to do this every time you log into this system, you can place these lines in your ~/.bashrc file.
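Since a missing or mistyped variable is an easy mistake to make, a small preflight check can save some head-scratching. The function below is a hypothetical helper, not part of SiLK; it only confirms that both variables are set and point at something that exists.

```shell
#!/bin/bash
# Report whether the two SiLK environment variables are usable.
# Returns 0 when both are set and valid, 1 otherwise.
check_silk_env() {
  status=0
  if [ -z "${SILK_CONFIG_FILE:-}" ] || [ ! -f "$SILK_CONFIG_FILE" ]; then
    echo "SILK_CONFIG_FILE is unset or does not point to a file"
    status=1
  fi
  if [ -z "${SILK_DATA_ROOTDIR:-}" ] || [ ! -d "$SILK_DATA_ROOTDIR" ]; then
    echo "SILK_DATA_ROOTDIR is unset or does not point to a directory"
    status=1
  fi
  return $status
}
```

You could run check_silk_env before any rwfilter query, so that a bad path is reported up front.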

You should be able to use rwfilter now. If everything is set up correctly and you are capturing data, you should see some output from this command:

rwfilter --sensor=S0 --proto=0-255 --type=all  --pass=stdout | rwcut

If you aren’t monitoring a busy link, you might need to ping something from a monitored system (or from the sensor itself) to generate some traffic.

Figure 4 shows an example of SiLK flow records being output to the terminal.

Figure 4: Flow Records Means Everything is Working

Keep in mind that it may take several minutes for flow records to actually become populated in the SiLK database. If you run into any issues, you can start to diagnose them by accessing the rwflowpack logs in /var/log/.

Monitoring SiLK Services

If you are deploying SiLK in production, then you will want to make sure that the services are constantly running. One way to do this might be to leverage the Security Onion “watchdog” scripts that are used to manage other NSM services, but if you modify those scripts then you run the risk of wiping out your changes any time you update your SO installation. Because of this, the best idea might be to run separate watchdog scripts to monitor these services.

This script can be used to monitor YAF to ensure that it is always running:

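#!/bin/bash

# Start YAF in the background and send IPFIX data to the rwflowpack collector.
# --out must be the address rwflowpack listens on (127.0.0.1 in the local setup above).
function SiLKSTART {
  sudo nohup /usr/local/bin/yaf --silk --ipfix=tcp --live=pcap --out=192.168.1.10 --ipfix-port=18001 --in=eth1 --applabel --max-payload=384 --verbose --log=/var/log/yaf.log &
}

# Start YAF if no yaf process is found.
function watchdog {
  pidyaf=$(pidof yaf)
  if [ -z "$pidyaf" ]; then
    echo "YAF is not running."
    SiLKSTART
  fi
}
watchdog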

This script can be used to monitor rwflowpack to ensure that it is always running:

#!/bin/bash
pidrwflowpack=$(pidof rwflowpack)
if [ -z "$pidrwflowpack" ]; then
  echo "rwflowpack is not running."
  # Kill any leftover rwflowpack processes, then restart the service.
  sudo pidof rwflowpack | tr ' ' '\n' | xargs -i sudo kill -9 {}
  sudo service rwflowpack restart
fi

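These scripts can be set to run automatically at startup, and scheduled to run at regular intervals (for example, from cron), so that a stopped service is restarted without manual intervention.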
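Both watchdogs follow the same pattern, so they could share one helper. The version below is a sketch of that idea; ensure_running is a name invented here, and it relies on pidof just as the scripts above do.

```shell
#!/bin/bash
# ensure_running NAME COMMAND [ARGS...]
# Run COMMAND only when pidof finds no process called NAME.
ensure_running() {
  name=$1
  shift
  if [ -z "$(pidof "$name")" ]; then
    echo "$name is not running; starting it."
    "$@"
  fi
}

# Example calls (SiLKSTART is the function from the YAF script above):
#   ensure_running yaf SiLKSTART
#   ensure_running rwflowpack sudo service rwflowpack start
```

If you save the checks as a script, a crontab entry along the lines of "*/5 * * * * /usr/local/bin/silk-watchdog.sh" would run them every five minutes; that path is only a placeholder.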

Conclusion

I always tell people that session data is the best “bang for your buck” data type you will find. If you just want to play around with SiLK, then installing it on Security Onion is a good way to get your feet wet. Even better, if you are using Security Onion in production on your network, it is a great platform to use for getting up and running with session data in addition to the many other data types. If you want to learn more about using SiLK for NSM detection and analysis, I recommend checking out Applied NSM when it comes out December 15th. To sink your teeth into session data sooner, check out the excellent SiLK documentation (which includes use cases) at http://tools.netsa.cert.org/silk/docs.html.

It's been a long time since this blog has been updated because we've been hard at work finishing the book. Fortunately, I'm glad to say that we should be finished with the content in less than a week, which means that we are on track for a December 15th release date, as planned.

In preparation for this release, we've updated the book's Table of Contents to reflect changes made during production. We've also updated the list of Contributors to include all of the folks who provided some type of contribution to the book. It wouldn't have been possible without them.

If you'd like to pre-order the book, you can do so at Amazon or at the Syngress website. Once we've finished the final book editing, we plan to begin releasing related content on this blog, so stay tuned!

When you set up a new sensor, it is likely that you will choose to utilize either a SPAN (mirrored) port or a network tap in order to get packets to the collection interface of the sensor. Most enterprise-level switches support port mirroring. However, if you are deploying an NSM capability into a small-office or home (SOHO) environment, you might not be able to spend the type of money required to purchase these expensive switches.

Fortunately, there are a few SOHO level switches that support port mirroring. Now, not all hardware vendors allow you to query products based upon this feature, and it often doesn't appear on the product packaging. This can make it difficult to locate affordable hardware with this feature. One resource I've really grown to love is this listing of switches that support port mirroring, from Miarec. The listing includes a lot of the major brands like Cisco, HP, Dell, and more so that you can find a model from a manufacturer whose products you like.

Personally, I've always had really good luck with Cisco, D-Link, and Netgear SOHO switches for use with NSM sensors. Regardless, there are plenty of options that will allow you to have port mirroring functionality for less than $100.