SiLK

**TL;DR: You can download FlowBAT at http://www.flowbat.com**

Above all else, we know that network visibility is critical in the modern threat landscape. In a perfect world, organizations could collect and store mountains of full packet capture data for long periods of time. Unfortunately, storing packet data for an extended duration doesn't scale well, and it can be cost prohibitive even for small networks. Even if you can afford to store some level of packet data, parsing and filtering through it to perform network or security analysis can be incredibly time consuming.

Network flow data is ideal because it provides a significant amount of context with minimal storage overhead. This means that it can be stored for an extended amount of time, providing historical data that can account for every connection into and out of your network. The storage footprint is so minimal that most organizations measure the amount of flow data they store in years rather than in hours or days. This provides an incredible amount of flexibility when investigating events or breaches that occurred in the past.

Introducing FlowBAT

[Image: FlowBAT logo]

Even though flow data is versatile, its adoption has been slowed because most of the tools available for performing flow data analysis can be challenging to use. These tools are often command-line based and lack robust analysis features. After all, spending all day examining data that looks like this isn't always the most efficient use of your time:

[Figure: example of raw command-line flow data output]

We developed the Flow Basic Analysis Tool (FlowBAT) to address this need by providing an analyst-focused graphical interface for analyzing flow data. FlowBAT was designed by analysts, for analysts, and provides a feature set that is applicable to many use cases, including Network Security Monitoring, Intrusion Detection, Incident Response, Network Forensics, System and Network Troubleshooting, and Compliance Auditing.

[Screenshot: the FlowBAT home screen]

 

FlowBAT Features

FlowBAT has several features that make it useful for analysts with different goals operating in a wide array of environments. These include:

Multiple Deployment Scenarios

FlowBAT can be deployed in an existing SiLK environment or as part of a new installation. You can deploy FlowBAT in two ways: local or remote. A local FlowBAT installation requires that you install FlowBAT on the same system as your SiLK database. This method is fastest, as queries don't have to traverse the network to reach the flow data. A remote FlowBAT installation allows you to install FlowBAT on a system separate from your SiLK database. In this scenario, FlowBAT queries flow data over SSH to an existing server running SiLK. This allows FlowBAT to transmit queries and receive data securely with minimal additional setup. You can even deploy FlowBAT on a cloud-based system as long as it can reach your SiLK database over SSH. In either deployment scenario, FlowBAT can be up and running in a matter of minutes.
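
For remote deployments, it's worth making sure SSH connectivity to the SiLK system works before pointing FlowBAT at it. A quick sanity check from the FlowBAT host might look like this (the username and hostname are placeholders for your own environment):

# Hypothetical check: confirm the SiLK box is reachable over SSH and has rwfilter available
ssh analyst@silk-server.example.com "which rwfilter && rwfilter --version"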

Quick Query Interface

Analysis is all about getting data and getting it quickly. While we have included an interface that makes this easy for seasoned flow analysis pros, we also provide a query interface designed to present all of the possible data retrieval options to analysts who might not be as experienced, or who simply want a more visual way of getting the data they want. The quick query interface allows the analyst to iteratively build data queries and easily tweak them after the query's initial execution. This means that you don't have to spend a ton of time looking up commands to get the exact data you want.

[Screenshot: the quick query interface]

Rapid Data Pivoting

When you are hunting through large amounts of data, you need to move quickly. Using traditional analysis techniques, this requires a lot of typing, multiple open terminals, and constant copying and pasting of commands. With FlowBAT, you can simply click on field values in a set of query results to add additional parameters to your existing query or to create a new query. For example, while looking at a series of flow records associated with an individual service on a specific port, you can click on a specific IP address and pivot to a data set showing all communication to and from that host. From there, you can click on a timestamp from an individual flow record and automatically retrieve flow records occurring five minutes before and five minutes after that time frame. This can all occur within a matter of seconds. The same workflow using traditional command-line analysis tools could easily take several minutes or more.
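
For comparison, here is a sketch of that same pivot using the traditional command-line tools (the address, date, and time window are hypothetical):

# Pivot 1: all traffic to or from a host of interest
rwfilter --start-date=2014/02/06 --any-address=192.168.1.50 --proto=0- --type=all --pass=stdout | rwcut

# Pivot 2: narrow to five minutes on either side of an interesting timestamp
rwfilter --start-date=2014/02/06 --any-address=192.168.1.50 --proto=0- --type=all \
  --stime=2014/02/06:13:25:00-2014/02/06:13:35:00 --pass=stdout | rwcut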

Saved Queries and Dashboards

Analysts often find queries they like and will reuse them constantly. In the past, this resulted in dozens of text files full of commonly used queries thrown haphazardly across multiple directories. Using FlowBAT's saved queries feature, you can store these queries right in the tool and execute them with a single click. Furthermore, if you use a saved query often, you can add it to an interactive dashboard and schedule it to update periodically over set time intervals. Using this mechanism, you can stay constantly up to date on specific activity on your network. For instance, you can configure a saved query that identifies web servers on your network, as sketched below. With this query executing on a periodic basis, you will be the first to know if an unexpected device starts receiving data on a common HTTP port.

[Screenshot: a FlowBAT dashboard with saved queries]
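
To make the web server example concrete, the underlying query might boil down to something like the following sketch (the ports and netblock are assumptions; adjust them for your network):

# Hosts on the local /24 receiving TCP traffic on common HTTP ports, ranked by flow records
rwfilter --start-date=2014/02/06 --proto=6 --dport=80,8080 --daddress=192.168.1.0/24 \
  --type=all --pass=stdout | rwstats --top --count=20 --fields=dip --value=records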

Graphing and Statistical Capability

One of the most powerful features of flow data is the ability to generate statistics from aggregated data. This can yield very powerful detection capabilities, such as:

  • Calculating Device Throughput
  • Identifying Top Talking Devices
  • Identifying Odd Inbound/Outbound Traffic Ratios
  • Examining Throughput Distribution Across Network Segments
  • Locating Unusual Periodic and Repetitive Traffic Patterns

While some of these statistics are best interpreted as text, it often becomes easier to interpret statistical data when it is presented visually. FlowBAT allows you to send statistical data to a graphing engine to automatically generate bar, line, column, and pie charts. This level of visualization is useful for analysis, and for providing visual examples of flow data in the various forms of reporting that may be required as part of your analysis duties.

[Screenshot: a chart generated from flow data in FlowBAT]
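
Since FlowBAT sits on top of SiLK, each of the statistics listed above maps back to a SiLK query. Two sketches, with dates and addresses as placeholders:

# Device throughput: records, bytes, and packets per 60-second bin for a single host
rwfilter --start-date=2014/02/06 --any-address=192.168.1.50 --proto=0- --type=all --pass=stdout \
  | rwcount --bin-size=60

# Top talkers: the ten source addresses responsible for the most bytes
rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout \
  | rwstats --top --count=10 --fields=sip --value=bytes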

Flexible Data Display

Every analyst processes and interprets information differently. As analysts, one thing we hate is when a tool locks you into viewing data in a very specific manner. With FlowBAT, we designed the display of flow data to be extremely customizable to each analyst's needs. With this in mind, you can rearrange, sort, and add or remove columns as needed. This provides an analysis experience that can be customized to your personal taste, as well as to specific scenarios.

 

FlowBAT Demos

CLI Query Mode

[Animated demo: CLI query mode]

Guided Query Mode

[Animated demo: guided query mode]

Manipulating Data

[Animated demo: manipulating data]

Pivoting with Data

[Animated demo: pivoting with data]

Downloading and Installing FlowBAT

We've spent quite a bit of time making sure that FlowBAT is easy to install and get running with. You can download FlowBAT on the FlowBAT Downloads page, and you can find explicit installation instructions on the FlowBAT Installation page. General support links and a user manual (still being written) can be found at http://www.flowbat.com. We are excited to see what you think of FlowBAT, so please give it a try and let us know what you think!

Jason and I recently had the opportunity and pleasure to speak at MIRCon 2014. The topic of the presentation was "Applied Detection and Analysis with Flow Data." We had a great time talking about effective ways to use flow data for NSM, as well as introducing the world to FlowBAT.

 

You can view the slides from this presentation here:

In my previous post, where I introduced FlowPlotter, I showed you some of the basics behind generating data with SiLK and turning it into nice-looking visual representations that leverage Google's Visualization API. That is all wrapped up into an easy-to-run tool called "FlowPlotter" that runs directly off of SiLK data piped from rwfilter via stdout. Today I'm pleased to reveal some of the D3 visualizations I promised earlier. Previously I had planned to replicate the Google visualizations with D3 to provide an alternative look, but instead I've gone forward to fill in some of the features that Google Charts is lacking.

D3.js is a popular JavaScript library that is often used to streamline the way people make visualizations with their data. These visualizations are usually friendly with most browsers and can be augmented to do just about anything you can think of. For the integration of these new graphs, I've gone with the same "template" approach as I did with the Google charts, again with the data being embedded into the HTML file that is generated. In most D3 visualizations, people host the data on a web server and reference that data directly with d3.js. But due to how the data is used here, it is more useful for us to be able to generate it for immediate use. In theory you can reference files on the local system, but most browsers (like Chrome) have strict security settings that prohibit this. Don't expect other, more forgiving browsers (Firefox) to allow this down the road either. For all of those reasons, FlowPlotter D3 templates have been designed to contain all data within a single HTML file.

Force Directed Link Graphs

So far I've generated two new chart types in D3. The first is a force-directed link graph similar to what is generated with Afterglow. We discussed in the last FlowPlotter post that the purpose behind FlowPlotter is to have a streamlined way of generating visualizations for all of your data and having them easily available to anyone that needs to view them. While Afterglow is great to work with, it doesn't really "flow" with the rest of what we're generating, and streamlining it for wide use can be challenging. The D3 force-directed link graph that I've integrated with SiLK shows the relationship between two nodes, with links of varying opacity based on a third value. Simply put, it is created from a three-column CSV with a source, target, and value, in that order. The overall design was borrowed partly from the work of d3noob.
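
To give you an idea of the input format, the CSV behind a chart like Figure 1 is just rows of source, target, and value. These example rows are made up; FlowPlotter generates the real thing for you from rwstats output:

source,target,value
192.168.1.50,58.120.1.10,42
192.168.1.50,58.120.2.20,3
192.168.1.88,58.120.1.10,17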


Figure 1: Force-Directed Link Graph showing all traffic to or from a specific country

To create a force-directed link graph directly from rwfilter output, you'll need to run the following:

rwfilter ../Sampledata/sample.rw --scc=kr --proto=0- --type=all --pass=stdout | ./flowplotter.sh forceopacity sip dip distinct:dport 100 > forcetest.html

You'll notice that there are 4 options after forceopacity. This chart module relies on rwstats to generate this data. The communication you see in Figure 1 represents the top 100 sip-dip pairs for all traffic from South Korea based on total amount of distinct destination port communication attempts. The greater the number, the darker the link between nodes. Similarly you could do something more practical like generating a force-directed link graph between the top 100 sip-dport pairs for outgoing web traffic to South Korea:

rwfilter ../Sampledata/sample.rw --dcc=kr --proto=0- --type=outweb --pass=stdout | ./flowplotter.sh forceopacity sip dport bytes 100 > forceport.html
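
Since this chart module relies on rwstats, the data feeding a chart like the one above is roughly what you'd get from a query like this sketch (the exact flags flowplotter.sh passes may differ):

rwfilter ../Sampledata/sample.rw --dcc=kr --proto=0- --type=outweb --pass=stdout \
  | rwstats --top --count=100 --fields=sip,dport --value=bytes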

There is still a lot I would like to do with FlowPlotter's force-directed link graph. For one, it currently does not replace the usefulness of Afterglow, as the big benefit of Afterglow lies in being able to specify "rules" about how your data will be displayed, as seen in ANSM. Down the road I would like to create something similar; however, it isn't something that is viable from the current stdout data output. Another feature that is currently only in testing is the opacity of links between nodes. Right now the opacity represents the value in the third column for the connection between source and target node, but it could convey much more. The combination of these would require a nice legend to read, but would allow for some genuinely useful visualizations.

Automated Asset Discovery

With this FlowPlotter update I'm also showing off the new Asset Discovery addition. In chapter 14 of ANSM we discuss methods of gathering "friendly intelligence" and how its importance can't be overstated. Friendly intelligence is information related to the assets that you are tasked with protecting. In almost all environments, the overall picture of those assets frequently changes, so gathering friendly intelligence is a continuous effort. Since flow data provides a very good high-level view of the communications between all devices, it is ideal for creating a general idea of the assets you are monitoring on a given network. It is also easily queried, giving you the ability to keep constant vigilance for new assets.

To generate an asset model from SiLK data, I leverage the friendly intel script provided in chapter 14 of ANSM. To stay with the template-based approach of FlowPlotter, I've made the script a standalone part of FlowPlotter that is referenced within flowplotter.sh. The premise behind doing this is to ensure that changes and additions to the asset model script can be made without extensive changes to flowplotter.sh.

As with the previous graph, I examined many D3 galleries out there to find the best examples of visualizing parent-child relationships, with the final design borrowing from the collapsible tree layout in the D3 galleries. Whereas the force-directed link graph leveraged a CSV as input, I am using JSON as input for the asset discovery module. This is mainly because extensive parent-child relationships aren't easy to represent with CSV data; with JSON I can literally specify the data and fields in a tree, with a good idea of how the data will turn out in the collapsible tree layout. Also, I like JSON because my name is Jason, and it just feels right.
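
To give you an idea of the shape of that input, the collapsible tree layout consumes a nested name/children structure along these lines (a hand-written sketch, not actual FlowPlotter output):

{
  "name": "Assets",
  "children": [
    { "name": "HTTP Servers",
      "children": [ { "name": "192.168.1.50" }, { "name": "192.168.1.88" } ] },
    { "name": "DNS Servers",
      "children": [ { "name": "192.168.1.2" } ] }
  ]
}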


Figure 2: Auto-generated Asset List based on one hour of traffic.

To create an asset model, you can either generate it from new data piped in via stdout, like other FlowPlotter modules, or create it from existing rwfilter files. The important thing to remember is that the dataset you are looking at should be fairly large, or at least representative of normal network traffic over time. For steady network traffic with production systems, an hour might be fine. For some large enterprise environments, shorter timespans will work.

To generate the data from an rwfilter stdout, do the following:

rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout | ./flowplotter.sh assetdiscovery > assetlist.html

Alternatively you can create the rwfilter file, then generate the asset model from that:

rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=sample.rw

cat sample.rw | ./flowplotter.sh assetdiscovery > assetlist.html

The asset model currently attempts to identify various service types (HTTP, DNS, VPN, etc.) and is receiving regular updates to include more data that benefits from continuous regeneration. This is currently limited to friendly intelligence, but soon I will be including some additional, less friendly detections based on some of the statistical approaches discussed in ANSM and on some of the less frequently discussed SiLK features. These will all be wrapped up within FlowPlotter as usual. Be sure to digest the README for FlowPlotter to understand the additions a little better. For instance, you may notice that you're not seeing EVERY "asset" from your network for a given service. That might be due to thresholding, which by default only displays "servers" that account for at least 1% of the total "server-like" traffic for a particular service. This default can be changed easily with options described in the README. As always, you should be aware of how far abstracted from your data you really are when generating visualizations like this.

As usual, I'm open to any and all requests for additions or advice on optimizing anything that currently exists in FlowPlotter.

TL;DR: Check out FlowPlotter on GitHub

If you're like me, it generally makes you happy to see graphs, charts, and tables that aid in network data analysis. This can be useful for analysis of already detected events, or for detection while hunting for evil. You probably also love session data (often called flow data). Unfortunately, it isn't always easy to generate useful visualizations from flow data. Doing so typically involves multiple steps: moving data around between different tools, formatting the output to match whatever graphing tool you might be using, and then generating the graph and making sure it is useful for your goals. Some might turn to the graphing capabilities of spreadsheet applications because of their simplicity, but those can't really handle a data set as large as we might see with flow data. With that said, it is still pretty hard to find genuinely useful network visualizations for NSM detection and analysis.

 

Because of this, I set out to make visualizations from flow data easy and accessible, without the need for several steps between viewing the raw data and having the ready-made chart. The result was a tool called FlowPlotter, which we are going to discuss in this article. We will talk about how FlowPlotter came into existence and its current workflow. FlowPlotter works from NetFlow records viewed with SiLK, so before moving forward, be sure to check out Chris's instructions on setting up SiLK on Security Onion so that you can easily test the tool if you don't already have a SiLK environment available. You can also go ahead and grab FlowPlotter from GitHub so that we can jump into making graphs. To generate graphs, you will only need SiLK and FlowPlotter. If the nitty-gritty details bore you and you want to get into FlowPlotter right away with examples, skip this API stuff and scroll down to the examples below.

 

Background Mechanics

We will begin by talking about how FlowPlotter works. For portability across multiple platforms, it makes sense to view this kind of visualization in a web-friendly format, rather than just a JPEG or Excel graph. To do this, FlowPlotter uses an implementation of Google Charts. Google Charts offers an extensive API that allows for the creation of just about any chart you can imagine. While many of these make more sense with only a few rows of data (pie charts, for example), others benefit from as much data as can be shoved into them (line charts representing bins of data). Google provides thorough explanations of how to create these charts using your own data, but of course, the formatting of the data is still up to you. See their chart gallery for more examples of the structure of these charts.

 

In Applied Network Security Monitoring, we discuss several ways of seeing the "big picture" when it comes to visualizing data of many types. We also go into deep detail on using SiLK to get the most from your flow data. Here I'd like to present a simple method of going from using SiLK to parse data with rwtools, straight to using some bash kung-fu to streamline our way to an interactive Google Chart full of flow data in your browser. The eventual goal of this exercise is to run rwfilter commands and pipe the binary data straight to a script that takes arguments to generate the data you want. The result is a tool that we can pass data to that will generate charts using the Google API. Before we can run, though, we need to walk. We'll start by creating a chart that requires fewer data points, such as a bar chart representing the top 10 country codes talking to your network by bytes over the course of a day. The first thing we want to do is generate the SiLK data that we eventually want to plot. Since the goal is to make a top-10 list, we'll use rwstats and rwfilter.

 

rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout | rwstats --top --count=10 --fields=dcc --value=bytes

[Screenshot: rwstats output showing the top 10 destination country codes by bytes]

The rwfilter command above states that you wish to filter down to all data that passes the filter --proto=0-255 (shorthand is 0-) and that occurred on 2014/02/06. These options can be stated in any order, and many people like to type them out as they would verbally say them. For instance, in the rwstats command, we're literally looking for the "top" "10" "destination country codes" by "bytes". Many of the people I teach SiLK to get stuck thinking too rigidly about "what SiLK wants" instead of just writing down what they want and converting that directly into a query.
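
To practice that, say what you want out loud and then transcribe it. For example, "show me the top 20 source addresses by packets for that same day" becomes:

rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout | rwstats --top --count=20 --fields=sip --value=packets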

 

Now we're going to make that rwstats output appear in a bar graph. FlowPlotter, which I talk about in more detail below, works by using templates into which the formatted data is inserted, yielding a complete chart. Let's look at the most basic template for a column chart. This is taken directly from Google's visualization playground, with their data substituted out for "dataplaceholder" around the middle of the code. You'll even notice that for now, the title is still "Company Performance".

<html>
<head>
<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
google.load("visualization", "1", {packages:["corechart"]});
google.setOnLoadCallback(drawChart);
function drawChart() {
var data = google.visualization.arrayToDataTable([
dataplaceholder
]);

var options = {
title: 'Company Performance',
vAxis: {title: 'Year',  titleTextStyle: {color: 'red'}}
};

var chart = new google.visualization.BarChart(document.getElementById('chart_div'));
chart.draw(data, options);
}
</script>
</head>
<body>
<div id="chart_div" style="width: 900px; height: 500px;"></div>
</body>
</html>

You'll notice the "dataplaceholder" that exists where there should be a table of data. In the place of that, we should be inserting something that looks like the following, which was created from our rwstats command:

['dcc', 'Bytes'],
['--', 26478355345],
['us', 706854881],
['ca', 8665204],
['no', 1893193],
['nl', 1416293],
['bg', 1101223],
['ch', 1092811],
['de', 202948],
['se', 169036],
['gb', 117399]

You'll notice that "- -" is also a country code here, representing PRIVATE/EXPERIMENTAL addresses, which we'll leave in for the sake of discussing additional chart features and manipulations later. In the meantime, how did I streamline the creation of that data to replace dataplaceholder? First, it is important to add the "--delimited=," option to rwstats to ease the post-processing a bit. After that, I used a mix of cut, sed, and grep to wrap it all into a one-liner:

rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout | rwstats --top --count=10 --fields=dcc --value=bytes --delimited=, | cut -d "," -f1,2 |grep ,| sed "s/\(.*\),\(.*\)/['\1', \2],/g"|sed '$s/,$//'| sed "s/, \([A-Za-z].*\)],/, '\1'],/g" | grep ","
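
(As an aside, if long sed chains aren't your thing, an awk version of the same transformation is arguably easier to follow. This is just a sketch that produces the same bracketed rows:)

rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout \
  | rwstats --top --count=10 --fields=dcc --value=bytes --delimited=, \
  | awk -F, -v q="'" 'NF >= 2 {
      # Quote the second column unless it is numeric (this handles the title row)
      val = ($2 ~ /^[0-9]+$/) ? $2 : q $2 q
      printf "[%s%s%s, %s],\n", q, $1, q, val
    }' \
  | sed '$s/,$//'   # strip the trailing comma from the last row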

There are probably better ways of generating that data, and I welcome them in comments or variations of FlowPlotter, but for the sake of this particular exercise, this will get you by. The general idea is that the rwstats command output is piped to cut, which extracts the first two columns, then grep is used to filter down to only data fields and titles. The sed commands that follow sort everything out so that it all has the proper formatting: first by building the table rows, then by fixing the final line, and then the first column-identifier line. Now that we have the basic template and some example data, you can manually throw the data into the template in place of dataplaceholder and change some of the obvious things, such as the title, so that the HTML looks like the following:

<html>
<head>
<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
google.load("visualization", "1", {packages:["corechart"]});
google.setOnLoadCallback(drawChart);
function drawChart() {
var data = google.visualization.arrayToDataTable([
['dcc', 'Bytes'],
['--', 26478355345],
['us', 706854881],
['ca', 8665204],
['no', 1893193],
['nl', 1416293],
['bg', 1101223],
['ch', 1092811],
['de', 202948],
['se', 169036],
['gb', 117399]
]);

var options = {
title: 'Destination Country Code by Bytes',
vAxis: {title: 'Country Codes', titleTextStyle: {color: 'black'}}
};

var chart = new google.visualization.BarChart(document.getElementById('chart_div'));
chart.draw(data, options);
}
</script>
</head>
<body>
<div id="chart_div" style="width: 900px; height: 500px;"></div>
</body>
</html>

Here you can see the result of the code we've generated.

[Screenshot: bar chart of destination country codes by bytes, linear scale]

Notice that while mouse-over works (at the link) and overall it is a decent looking graph, the scale is being ruined by our large amount of internal destination traffic. There are two ways to fix this: one that is obvious but not great, and one that is less obvious but by far the better method. Either we can manually remove the internal destinations from the data, or we can accept them and change the chart to be more forgiving of large outliers. To do the latter, we need to edit the var options in our code. We're going to add in an hAxis option, as seen below. This will make the horizontal axis work on a logarithmic scale instead of scaling according to maximum and minimum data values.

<html>
<head>
<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
google.load("visualization", "1", {packages:["corechart"]});
google.setOnLoadCallback(drawChart);
function drawChart() {
var data = google.visualization.arrayToDataTable([
['dcc', 'Bytes'],
['--', 26478355345],
['us', 706854881],
['ca', 8665204],
['no', 1893193],
['nl', 1416293],
['bg', 1101223],
['ch', 1092811],
['de', 202948],
['se', 169036],
['gb', 117399]
]);

var options = {
title: 'Destination Country Code by Bytes',
vAxis: {title: 'Country Codes', titleTextStyle: {color: 'black'}},
hAxis: {logScale: true}
};

var chart = new google.visualization.BarChart(document.getElementById('chart_div'));
chart.draw(data, options);
}
</script>
</head>
<body>
<div id="chart_div" style="width: 900px; height: 500px;"></div>
</body>
</html>

Our new graph looks like this.

[Screenshot: the same bar chart with a logarithmic horizontal axis]

In testing changes like this, I highly recommend playing around in the Google Charts playground, as it can streamline the debugging of small changes.

 

FlowPlotter

Now that you've got an idea of how to manually generate these graphs, we can talk about FlowPlotter in a more official capacity. FlowPlotter is a scripted approach to generating graphs from SiLK data by using templates, making it modular enough to accept new graphs with relative ease. In short, it automates everything we just did in the previous example. The only requirement is that you provide an rwfilter command and send its output to flowplotter.sh with a chart name and its independent and dependent variables as options. From there, FlowPlotter will make the HTML page for you, complete with titles and sensible scaling options. For instance, to generate the previous graph, you would simply run the following from the FlowPlotter root directory:

/Flowplotter$ rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout | ./flowplotter.sh barchart dcc bytes > barchart.html

Here is the current usage page for FlowPlotter:

rwfilter [filter] | flowplotter.sh [charttype] [independent variable] [dependent variable]

Currently you must run a SiLK rwfilter command, pipe it to flowplotter.sh, and specify various options as arguments. The following chart types are currently functional:

geomap

  • independent variable = Must specify an rwstats compatible field for country type (scc or dcc).
  • dependent variable = Must specify an rwstats compatible value (Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:[field])

linechart

  • independent variable = Must specify a bin-size that the dependent variable will be calculated by. For example, if you want "Records per Minute", this variable will be 60.
  • dependent variable = Must specify an rwcount compatible value (Records, Packets, Bytes).

treemap

  • independent variable = Must specify an rwstats compatible field.
  • dependent variable = Must specify an rwstats compatible value (Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:[field])

timeline

  • independent variable = Must specify an rwcut compatible field.
  • dependent variable = Must specify an rwcut compatible field.

piechart

  • independent variable = Must specify an rwstats compatible field.
  • dependent variable = Must specify an rwstats compatible value (Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:[field])

barchart

  • independent variable = Must specify an rwstats compatible field.
  • dependent variable = Must specify an rwstats compatible value (Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:[field])

columnchart

  • independent variable = Must specify an rwstats compatible field.
  • dependent variable = Must specify an rwstats compatible value (Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:[field])

 

As you can see, FlowPlotter doesn't just support bar charts. It currently supports numerous Google Charts. The charts below were all generated using the queries you see accompanying them.

Geomaps

rwfilter --start-date=2013/12/27 --proto=0- --type=all --pass=stdout | ./flowplotter.sh geomap dcc bytes > geomap.html

[Screenshot: geomap of traffic by destination country]

 

Linecharts

rwfilter --start-date=2013/12/27 --proto=0- --type=all --pass=stdout | ./flowplotter.sh linechart 60 bytes > linechart.html

[Screenshot: line chart of bytes per minute]

 

Treemaps

rwfilter --start-date=2013/12/27 --sport=1025- --dport=1025- --not-daddress=192.168.1.0/24 --proto=0- --type=all --pass=stdout | ./flowplotter.sh treemap dip records > treemap.html

[Screenshot: treemap of destination IPs by records]

 

Timelines

rwfilter --start-date=2013/12/27 --proto=0- --type=out,outweb --dcc=us,-- --fail=stdout | ./flowplotter.sh timeline sip dip > timeline.html

[Screenshot: timeline of source/destination IP pairs]

 

Pie Charts

rwfilter --start-date=2013/12/27 --sport=1025- --dport=1025- --not-daddress=192.168.1.0/24 --proto=0- --type=all --pass=stdout | ./flowplotter.sh piechart dport bytes > piechart.html

[Screenshot: pie chart of destination ports by bytes]

 

Bar Charts

rwfilter --start-date=2013/12/27 --sport=1025- --dport=1025- --not-daddress=192.168.1.0/24 --proto=0- --type=all --pass=stdout | ./flowplotter.sh barchart dport bytes > barchart.html

[Screenshot: bar chart of destination ports by bytes]

 

Column Charts

rwfilter --start-date=2013/12/27 --sport=1025- --dport=1025- --not-daddress=192.168.1.0/24 --proto=0- --type=all --pass=stdout | ./flowplotter.sh columnchart dip bytes > columnchart.html

[Screenshot: column chart of destination IPs by bytes]

Next Steps

FlowPlotter currently only supports charts in the Google Visualization Chart library, but as time goes by, I'd like to add sources beyond just Google, even if they are duplicates of similar Google graphs. I have the project on GitHub and welcome any comments, ideas, and improvements that you might have. It has examples that you can use, but I encourage the use of any kind of rwfilter input you can think of. If you come across some great visualizations that you think are repeatable by others, post the rwfilter | flowplotter command as a comment and I'll add it to the examples!

Session data is a summary of the communication between two network devices. Also known as a conversation or a flow, this summary data is one of the most flexible and useful forms of NSM data. If you consider full packet capture equivalent to having a recording of every phone conversation someone makes from their mobile phone, then you might consider session data equivalent to having a copy of the call log on the bill associated with that phone. Session data doesn’t give you the “What”, but it does give you the “Who, Where, and When”.

When session or flow records are generated, the record will usually include, at minimum, the standard 5-tuple: source IP address and port, destination IP address and port, and the protocol being used. In addition, session data will usually provide a timestamp of when the communication began and ended, and the amount of data transferred between the two devices. The various forms of session data, such as NetFlow v5/v9, IPFIX, and jFlow, can include other information, but these fields are generally common across all implementations of session data.

There are a few different applications that have the ability to collect flow data and provide tools for the efficient analysis of that data. My personal favorite is the System for Internet-Level Knowledge (SiLK), from the folks at CERT NetSA (http://www.cert.org/netsa/). In Applied NSM we use SiLK pretty extensively.
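
Once SiLK is installed, viewing those common fields is a one-liner. For example, this sketch prints the standard 5-tuple plus timing and volume for whatever records match a filter (field names are as documented for SiLK's rwcut):

rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout \
  | rwcut --fields=sip,sport,dip,dport,protocol,stime,dur,bytes,packets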

One of the best ways to learn about different NSM technologies is the Security Onion distribution, which is an Ubuntu-based distribution designed for quick deployment of all sorts of NSM collection, detection, and analysis technologies. This includes popular tools like Snort, Suricata, Sguil, Squert, Snorby, Bro, NetworkMiner, Xplico, and more. Unfortunately, SiLK doesn’t currently come pre-packaged with Security Onion. The purpose of this guide is to describe how you can get SiLK up and running on a standalone Security Onion installation.

 

Preparation

To follow along with this guide, you should have already installed and configured Security Onion, and ensured that NSM services are already running. This guide will assume you’ve deployed a standalone installation. If you need help installing Security Onion, this installation guide should help: https://code.google.com/p/security-onion/wiki/Installation.

For the purposes of this article, we will assume this installation has access to two network interfaces. The interface at eth0 is used for management, and the eth1 interface is used for data collection and monitoring.

Now is a good time to go ahead and download the tools that will be needed. Do this by visiting http://tools.netsa.cert.org/index.html# and downloading the following:

  • SiLK (3.7.2 as of this writing)
  • YAF (2.4.0 as of this writing)
  • Fixbuf (1.3.0 as of this writing)

Alternatively, you can download the packages directly from the command line with these commands:

wget http://tools.netsa.cert.org/releases/silk-3.7.2.tar.gz
wget http://tools.netsa.cert.org/releases/yaf-2.4.0.tar.gz
wget http://tools.netsa.cert.org/releases/libfixbuf-1.3.0.tar.gz

This guide reflects the current stable releases of each tool. You will want to ensure that you place the correct version numbers in the URLs above when using wget to ensure that you are getting the most up to date version.

Installation

The analysis of flow data requires a flow generator and a collector. So, before we can begin collecting and analyzing session data with SiLK we need to ensure that we have data to collect. In this case, we will be installing the YAF flow generation utility. YAF generates IPFIX flow data, which is quite flexible. Collection will be handled by the rwflowpack component of SiLK, and analysis will be provided through the SiLK rwtool suite.


Figure 1: The SiLK Workflow

 

To install these tools, you will need a couple of prerequisites. You can install these in one fell swoop by running this command:

sudo apt-get install glib2.0 libglib2.0-dev libpcap-dev g++ python-dev

With this done, you can install fixbuf using these steps:

1. Extract the archive and go to the newly extracted folder

tar -xvzf libfixbuf-1.3.0.tar.gz

cd libfixbuf-1.3.0/

2. Configure, make, and install the package

./configure
make
sudo make install

 

Now you can install YAF with these steps:

1. Extract the archive and go to the newly extracted folder

tar -xvzf yaf-2.4.0.tar.gz
cd yaf-2.4.0/

2. Export the PKG configuration path

export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

3. Configure with applabel enabled

./configure --enable-applabel

4. Make and install the package

make
sudo make install

If you try to run YAF right now, you’ll notice an error. We need to continue the installation process before it will run properly. This process continues by installing SiLK with these steps:

1. Extract the archive and go to the newly extracted folder

tar -xvzf silk-3.7.2.tar.gz
cd silk-3.7.2/

2. Configure with a specified fixbuf path and python enabled

./configure --with-libfixbuf=/usr/local/lib/pkgconfig/ --with-python

3. Make and install the package

make
sudo make install

With everything installed, you need to make sure that all of the libraries we need are linked properly so that the LD_LIBRARY_PATH variable doesn’t have to be exported each time you use SiLK. This can be done by creating a file named silk.conf in the /etc/ld.so.conf.d/ directory with the following contents:

/usr/local/lib
/usr/local/lib/silk

To apply this change, run:

sudo ldconfig

Configuring SiLK

With everything installed, now we have to configure SiLK to use rwflowpack to collect the flow data we generate. We need three files to make this happen: silk.conf, sensors.conf, and rwflowpack.conf.

Silk.conf

We will start by creating the silk.conf site configuration file. This file controls how SiLK parses data, and contains a list of sensors. It can be found in the previously unzipped SiLK installation tarball at silk-3.7.2/site/twoway/silk.conf. We will copy it to a directory that Security Onion uses to store several other configuration files:

sudo cp silk-3.7.2/site/twoway/silk.conf /etc/nsm/<$SENSOR-$INTERFACE>/

The site configuration file should work just fine for the purposes of this guide, so we won’t need to modify it.

Sensors.conf

The sensor configuration file sensors.conf is used to define the sensors that will be generating session data, and their characteristics. This file should be created at /etc/nsm/<$SENSOR-$INTERFACE>/sensors.conf. For this example, our sensors.conf will look like this:

probe S0 ipfix
  listen-on-port 18001
  protocol tcp
  listen-as-host 127.0.0.1
end probe
group my-network
  ipblocks 192.168.1.0/24
  ipblocks 172.16.0.0/16
  ipblocks 10.0.0.0/8 
end group
sensor S0
  ipfix-probes S0
  internal-ipblocks @my-network
  external-ipblocks remainder
end sensor

This sensors.conf has three different sections: probe, group, and sensor.

The probe section tells SiLK where to expect data from for the identified sensor. Here, we’ve identified sensor S0 and told SiLK to expect to receive IPFIX data from this sensor via the TCP protocol over port 18001. We’ve also defined the IP address of the sensor as the local loopback address, 127.0.0.1. In a remote sensor deployment, you would use the IP address of the sensor that is actually transmitting the data.

The group section allows us to create a variable containing IP blocks. Because of the way SiLK bins flow data, it is important to define internal and external network ranges on a per-sensor basis so that queries based upon flow direction (inbound, outbound, inbound web traffic, outbound web traffic, etc.) are accurate. Here we’ve defined a group called my-network that has three ipblocks: 192.168.1.0/24, 172.16.0.0/16, and 10.0.0.0/8. You will want to customize these values to reflect your actual internal IP ranges.

The last section is the sensor section, which we use to define the characteristics of the S0 sensor. Here we have specified that the sensor will be generating IPFIX data, and that my-network group defines the internal IP ranges for the sensor, with the remainder being considered external.

Be careful if you try to rename your sensors here, because the sensor names in this file must match those in the site configuration file silk.conf. If a mismatch occurs, then rwflowpack will fail to start. In addition to this, if you want to define custom sensors names then I recommend starting by renaming S1. While it might make sense to start by renaming S0, I’ve seen instances where this can cause odd problems.

Rwflowpack.conf

The last configuration step is to modify rwflowpack.conf, which is the configuration file for the rwflowpack process that listens for and collects flow records. This file can be found at /usr/local/share/silk/etc/rwflowpack.conf. First, we need to copy this file to /etc/nsm/<$SENSOR-$INTERFACE>/:

sudo cp /usr/local/share/silk/etc/rwflowpack.conf /etc/nsm/<$SENSOR-$INTERFACE>/

Now we need to change seven values in the newly copied file:

ENABLED=yes

This will enable rwflowpack

statedirectory=/nsm/sensor_data/<$SENSOR-$INTERFACE>/silk

A convenience variable used for setting the location of other various SiLK files and folders

CREATE_DIRECTORIES=yes

This will allow for the creation of specified data subdirectories

SENSOR_CONFIG=/etc/nsm/<$SENSOR-$INTERFACE>/sensors.conf

The path to the sensor configuration file

DATA_ROOTDIR=/nsm/sensor_data/<$SENSOR-$INTERFACE>/silk/

The base directory for SiLK data storage

SITE_CONFIG=/etc/nsm/<$SENSOR-$INTERFACE>/silk.conf

The path to the site configuration file

LOG_TYPE=legacy

Sets the logging format to legacy

LOG_DIR=/var/log/

The path for log storage

Finally, we need to copy the rwflowpack startup script into init.d so that we can start it like a normal service. This command will do that:

sudo cp /usr/local/share/silk/etc/init.d/rwflowpack /etc/init.d

Once you’ve copied this file, you need to change one path in it. Open the file and change the SCRIPT_CONFIG_LOCATION variable from "/usr/local/etc/" to "/etc/nsm/<$SENSOR-$INTERFACE>/".
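
If you'd rather script that edit, a sed one-liner along these lines should work (this assumes the variable assignment sits at the beginning of a line; verify the file afterward):

sudo sed -i 's|^SCRIPT_CONFIG_LOCATION=.*|SCRIPT_CONFIG_LOCATION=/etc/nsm/<$SENSOR-$INTERFACE>/|' /etc/init.d/rwflowpack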

Starting Everything Up

Now that everything is configured, we should be able to start rwflowpack and YAF and begin collecting data.

First, we can start rwflowpack by simply typing the following:

sudo service rwflowpack start

If everything went well, you should see a success message, as shown in Figure 2:


Figure 2: Successfully Starting rwflowpack

If you want to ensure that rwflowpack runs at startup, you can do so with the following command:

sudo update-rc.d rwflowpack start 20 3 4 5 .

Now that our collector is waiting for data, we can start YAF to begin generating flow data. If you’re using “eth1” as the sensor’s monitoring interface, as we are in this guide, that command will look like this:

sudo nohup /usr/local/bin/yaf --silk --ipfix=tcp --live=pcap --out=127.0.0.1 --ipfix-port=18001 --in=eth1 --applabel --max-payload=384 &

You’ll notice that several of the arguments we are calling in this YAF execution string match values we’ve configured in our SiLK configuration files.

You can verify that everything started up correctly by running ps to make sure that the process is running, as is shown in Figure 3. If YAF doesn’t appear to be running, you can check the nohup.out file for any error messages that might have been generated.


Figure 3: Using ps to Verify that YAF is Running

That’s it! If your sensor interface is seeing traffic, then YAF should begin generating IPFIX flow data and sending it to rwflowpack for collection. You can verify this by running a basic rwfilter query, but first we have to tell the SiLK rwtools where the site configuration file and data store are located. This can be done by exporting the SILK_CONFIG_FILE and SILK_DATA_ROOTDIR variables:

export SILK_CONFIG_FILE=/etc/nsm/<$SENSOR-$INTERFACE>/silk.conf
export SILK_DATA_ROOTDIR=/nsm/sensor_data/<$SENSOR-$INTERFACE>/silk/

If you don’t want to have to do this every time you log into this system, you can place these lines in your ~/.bashrc file.

You should be able to use rwfilter now. If everything is set up correctly and you are capturing data, you should see some output from this command:

rwfilter --sensor=S0 --proto=0-255 --type=all --pass=stdout | rwcut

If you aren’t monitoring a busy link, you might need to ping something from a monitored system (or from the sensor itself) to generate some traffic.

Figure 4 shows an example of SiLK flow records being output to the terminal.


Figure 4: Flow Records Mean Everything is Working

Keep in mind that it may take several minutes for flow records to actually become populated in the SiLK database. If you run into any issues, you can start to diagnose them by accessing the rwflowpack logs in /var/log/.

Monitoring SiLK Services

If you are deploying SiLK in production, then you will want to make sure that the services are constantly running. One way to do this might be to leverage the Security Onion “watchdog” scripts that are used to manage other NSM services, but if you modify those scripts then you run the risk of wiping out your changes any time you update your SO installation. Because of this, the best idea might be to run separate watchdog scripts to monitor these services.

This script can be used to monitor YAF to ensure that it is always running:

#!/bin/bash
function SiLKSTART {
  # The --out address should match the collector address configured in sensors.conf
  sudo nohup /usr/local/bin/yaf --silk --ipfix=tcp --live=pcap --out=127.0.0.1 --ipfix-port=18001 --in=eth1 --applabel --max-payload=384 --verbose --log=/var/log/yaf.log &
}

function watchdog {
  pidyaf=$(pidof yaf)
  if [ -z "$pidyaf" ]; then
    echo "YAF is not running."
    SiLKSTART
  fi
}
watchdog

This script can be used to monitor rwflowpack to ensure that it is always running:

#!/bin/bash
pidrwflowpack=$(pidof rwflowpack)
if [ -z "$pidrwflowpack" ]; then
  echo "rwflowpack is not running."
  # Kill any lingering rwflowpack processes, then restart the service
  sudo pidof rwflowpack | tr ' ' '\n' | xargs -i sudo kill -9 {}
  sudo service rwflowpack restart
fi

These scripts can be set to run automatically at regular intervals to ensure that the services stay up.
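
One simple way to do that is cron. Here is a sketch assuming you've saved the watchdog scripts to hypothetical paths like /usr/local/bin/yaf_watchdog.sh and /usr/local/bin/rwflowpack_watchdog.sh:

# Run 'sudo crontab -e' and add entries like these to check both services every five minutes
*/5 * * * * /usr/local/bin/yaf_watchdog.sh
*/5 * * * * /usr/local/bin/rwflowpack_watchdog.sh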

Conclusion

I always tell people that session data is the best “bang for your buck” data type you will find. If you just want to play around with SiLK, then installing it on Security Onion is a good way to get your feet wet. Even better, if you are using Security Onion in production on your network, it is a great platform for getting up and running with session data in addition to the many other data types it supports. If you want to learn more about using SiLK for NSM detection and analysis, I recommend checking out Applied NSM when it comes out December 15th, or, to sink your teeth into session data sooner, check out the SiLK team's excellent documentation (which includes use cases) at http://tools.netsa.cert.org/silk/docs.html.