All posts by Jason Smith

In my previous post introducing FlowPlotter, I showed you some of the basics behind generating data with SiLK and turning it into nice-looking visual representations that leverage Google's Visualization API. That is all wrapped up into an easy-to-run tool called "FlowPlotter" that works directly on SiLK data piped from rwfilter via stdout. Today I'm pleased to reveal some of the D3 visualizations I promised earlier. Previously I had planned to replicate the Google visualizations with D3 to provide an alternative look, but instead I've gone ahead and filled in some of the features that Google Charts is lacking.

D3.js is a popular JavaScript library that is often used to streamline the way people build visualizations from their data. These visualizations are friendly to most browsers and can be augmented to do just about anything you can think of. For the integration of these new graphs, I've gone with the same "template" approach that I used with the Google charts, again with the data embedded into the generated HTML file. In most D3 visualizations, people host the data on a web server and reference it directly with d3.js. But, due to how the data is used here, it is more useful for us to be able to generate it for immediate use. In theory you can reference files on the local system, but most browsers (like Chrome) have strict security settings that prohibit this, and don't expect the more forgiving browsers (Firefox) to keep allowing it down the road either. For all of those reasons, FlowPlotter D3 templates have been designed to contain all data within a single HTML file.

Force Directed Link Graphs

So far I've built two new chart types in D3. The first is a force-directed link graph similar to what is generated with Afterglow. We discussed in the last FlowPlotter post that the purpose behind FlowPlotter is to have a streamlined way of generating visualizations for all of your data and making them easily available to anyone who needs to view them. While Afterglow is great to work with, it doesn't really "flow" with the rest of what we're generating, and streamlining it for wide use can be challenging. The D3 force-directed link graph that I've integrated with SiLK shows the relationship between two nodes, with links of varying opacity based on a third value. Simply put, it is created from a three-column CSV with a source, target, and value, in that order. The overall design was borrowed partly from the work of d3noob.
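To give a feel for that input, here is a purely illustrative sketch of the three-column format (hypothetical addresses and values; the actual rows are built by FlowPlotter from the rwstats output described below):

source,target,value
192.168.1.50,10.10.10.23,37
192.168.1.50,10.10.10.87,4
192.168.1.66,10.10.10.23,12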


Figure 1: Force-Directed Link Graph showing all traffic to or from a specific country

To create a force-directed link graph directly from rwfilter output, you'll need to run the following:

rwfilter ../Sampledata/sample.rw --scc=kr --proto=0- --type=all --pass=stdout | ./flowplotter.sh forceopacity sip dip distinct:dport 100 > forcetest.html

You'll notice that there are four options after forceopacity. This chart module relies on rwstats to generate its data. The communication you see in Figure 1 represents the top 100 sip-dip pairs for all traffic from South Korea, ranked by the number of distinct destination ports involved in each pair's communication attempts. The greater the number, the darker the link between nodes. Similarly, you could do something more practical, like generating a force-directed link graph of the top 100 sip-dport pairs for outgoing web traffic to South Korea:

rwfilter ../Sampledata/sample.rw --dcc=kr --proto=0- --type=outweb --pass=stdout | ./flowplotter.sh forceopacity sip dport bytes 100 > forceport.html

There is still a lot I would like to do with FlowPlotter's force-directed link graph. For one, it doesn't yet replace Afterglow; the big benefit of Afterglow lies in being able to specify "rules" about how your data will be displayed, as seen in ANSM. Down the road I would like to create something similar, but it isn't viable from the current stdout data alone. Another feature that is still in testing is the opacity of links between nodes. Link opacity currently represents the third-column value for the connection between source and target node, but it could convey much more. Combining these ideas would require a proper legend to read, but it would allow for some genuinely useful visualizations.

Automated Asset Discovery

With this FlowPlotter update I'm also showing off the new Asset Discovery addition. In chapter 14 of ANSM we discuss methods of gathering "friendly intelligence" and how its importance can't be overstated. Friendly intelligence is information related to the assets that you are tasked with protecting. In almost all environments, the overall picture of those assets changes frequently, so gathering friendly intelligence is a continuous effort. Since flow data provides a very good high-level view of the communications between all devices, it is ideal for building a general picture of the assets on a given network. It is also easily queried, giving you the ability to keep constant vigilance for new assets.

To generate an asset model from SiLK data, I leverage the friendly intelligence script provided in chapter 14 of ANSM. To stay with FlowPlotter's template-based approach, I've made the script a standalone part of FlowPlotter that is referenced within flowplotter.sh. The premise behind doing this is to ensure that changes and additions to the asset model script can be made easily without major additions to flowplotter.sh. Like the previous graph, I examined many d3 galleries to find great examples of visualizing parent-child relationships, with the final design borrowing from the collapsible tree layout in the d3 galleries. Whereas the force-directed link graph takes a CSV as input, the asset discovery module uses JSON. This is mainly because extensive parent-child relationships aren't easy to represent in CSV; with JSON I can literally specify the data and fields as a tree, with a good idea of how the data will come out in the collapsible tree layout. Also, I like JSON because my name is Jason, and it just feels right.
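For a sense of what that input looks like, here is a purely illustrative sketch using the name/children structure from the standard d3 collapsible tree examples (the actual node names and nesting that FlowPlotter generates will differ):

{
  "name": "Assets",
  "children": [
    {
      "name": "192.168.1.10",
      "children": [
        { "name": "HTTP Server" },
        { "name": "DNS Server" }
      ]
    },
    {
      "name": "192.168.1.22",
      "children": [
        { "name": "VPN Server" }
      ]
    }
  ]
}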


Figure 2: Auto-generated Asset List based on one hour of traffic.

To create an asset model, you can either pipe in new data via stdout like other FlowPlotter modules or build it from existing rwfilter files. The important thing to remember is that the dataset you are looking at should be fairly large, or at least representative of normal network traffic over time. For steady network traffic with production systems, an hour might be fine. For some large enterprise environments, shorter timespans will work.

To generate the asset model from rwfilter output piped via stdout, do the following:

rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout | ./flowplotter.sh assetdiscovery > assetlist.html

Alternatively you can create the rwfilter file, then generate the asset model from that:

rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=sample.rw

cat sample.rw | ./flowplotter.sh assetdiscovery > assetlist.html

The asset model currently attempts to identify various service types (HTTP, DNS, VPN, etc.) and is receiving regular updates to include more data that benefits from continuous regeneration. This is currently limited to friendly intelligence, but soon I will be including some additional, less friendly detections based on the statistical approaches discussed in ANSM and on some of the less frequently discussed SiLK features. These will all be wrapped up within FlowPlotter as usual. Be sure to digest the README for FlowPlotter to understand the additions a little better. For instance, you may notice that you're not seeing EVERY "asset" on your network for a given service. That might be due to thresholding, which by default only reports "servers" that account for at least 1% of the total server-like traffic for a particular service. This default can be changed easily with options described in the README. As always, you should be aware of how far abstracted from your data you really are when generating visualizations like this.

As usual, I'm open to any and all requests for additions or advice on optimizing anything that currently exists in FlowPlotter.

In Applied NSM we wrote quite a bit about both Logstash and Snorby. Recently, a reader of this blog asked whether there is a way to pivot from Snorby events to your Bro logs in Logstash. Well, it is actually quite easy.

To start, you'll obviously need a functional instance of Snorby and of Logstash with Bro logs (or any other relevant parsed PSTR-type data) feeding into it. In this case, we'll assume that to reach our Logstash dashboard, we go to http://192.168.1.12:9292/index.html#/dashboard/file/logstash.json.

If you've played around with Snorby, you've probably noticed the lookup sources that appear when you click on an IP in an event. There are default sources, but you can also add more of your own; these are configured from the Administration tab at the top right.


Figure 1: Lookup Sources Option in Snorby

At this time you're only allowed to use two different variables, ${ip} and ${port}, and from my testing, you can only use each of them once in a given lookup URL. Normally this isn't an issue if, for instance, you are researching an IP with your favorite intel source and feeding the IP in as a variable on the URL. However, if for some reason you need to feed it in twice, referencing ${ip} will only fill the first occurrence and leave the second blank. This becomes an issue with parsed Bro logs in Logstash.

Though not immediately obvious, Logstash allows you to control the search from the URL, like so:

http://192.168.1.12:9292/index.html#/dashboard/file/logstash.json?query=id.orig_h:192.168.1.75

Test it out!

In Snorby, the lookup source for this would be:

http://192.168.1.12:9292/index.html#/dashboard/file/logstash.json?query=id.orig_h:${ip}

However, let's assume you wanted to find logs where 192.168.1.75 exists as either the source or destination address:

http://192.168.1.12:9292/index.html#/dashboard/file/logstash.json?query=id.orig_h:192.168.1.75%20OR%20id.resp_h:192.168.1.75

That search is perfectly valid in Logstash, and the URL functions as expected. However, if you build a matching lookup source in Snorby by using ${ip} twice, you'll notice that only the first use of ${ip} is filled in, and the one on id.resp_h is left blank. For that reason, I recommend the simpler method of just querying the message field (essentially the unanalyzed raw log). We'll also add in ${port} to narrow things down further.

http://192.168.1.12:9292/index.html#/dashboard/file/logstash.json?query=message:(${ip} AND ${port})

That lookup source will look for any instance of the IP and the matching port within the log. A word of warning: there is a small chance you'll get an unexpected blip somewhere with this method, since it is literally looking for that IP address and that port number as strings anywhere within the message, in any order. Hypothetically, you could have an odd log that contains the selected IP but where the "port" is actually the response_body_len or some other integer field, though that would be extremely unlikely.

You'll notice that the lookup source defaults to the past 24 hours and displays the entire log. If we want to change this, we have to use a slightly different method: a "scripted dashboard". The two Logstash searches below both look for traffic containing "91.189.92.152" and "80", but there are a few differences between them.

http://192.168.1.12:9292/index.html#/dashboard/file/logstash.json?query=message:(91.189.92.152%20AND%2080)

http://192.168.1.12:9292/index.html#/dashboard/script/logstash.js?query=message:(91.189.92.152%20AND%2080)&fields=@timestamp,id.orig_h,id.orig_p,id.resp_h,id.resp_p,_type,host,uri

In testing both of these, the difference is immediate. The second gives you the ability to specify custom output fields, which is essential. You'll also notice that the second URL uses /dashboard/script/logstash.js instead of /dashboard/file/logstash.json. "Scripted dashboards" are entirely JavaScript, allowing full control over the output. While we're adding custom fields, let's also say that we want to look at the past 7 days (&from=7d) and to base those 7 days on the timestamp collected rather than ingestion time (&timefield=@timestamp).

http://192.168.1.12:9292/index.html#/dashboard/script/logstash.js?query=message:(91.189.92.152%20AND%2080)&from=7d&timefield=@timestamp&fields=@timestamp,id.orig_h,id.orig_p,id.resp_h,id.resp_p,_type,host,uri

Like we did before, let's go ahead and add that as a lookup source in Snorby with the following URL:

http://192.168.1.12:9292/index.html#/dashboard/script/logstash.js?query=message:(${ip}%20AND%20${port})&from=7d&timefield=@timestamp&fields=@timestamp,id.orig_h,id.orig_p,id.resp_h,id.resp_p,_type,host,uri

 


Figure 2: Note the Special URL Syntax in this LogStash Example

To summarize, here are some of the possible lookup sources I've mentioned, with the advanced lookup being my recommendation for these purposes:


Figure 3: Three Possible Lookup Sources for Pivoting to Logstash

TL;DR: Check out FlowPlotter on GitHub

If you're like me, it generally makes you happy to see graphs, charts, and tables that aid in network data analysis. These can be useful for analyzing already-detected events, or for detection while hunting for evil. You probably also love session data (often called flow data). Unfortunately, it isn't always easy to generate useful visualizations from flow data. Doing so typically involves multiple steps: moving data around between different tools, formatting the output to match whatever graphing tool you might be using, and then generating the graph and making sure it is useful for your goals. Some might turn to the graphing capabilities of spreadsheet applications because of their simplicity, but those can't really handle a large data set like we might see with flow data. With that said, it is still pretty hard to find truly useful network visualizations for NSM detection and analysis.

 

Because of this, I set out to make visualizations from flow data easy and accessible, without the need for several steps between viewing the raw data and having a ready-made chart. The result was a tool called FlowPlotter, which we are going to discuss in this article. We will talk about how FlowPlotter came into existence and its current workflow. FlowPlotter works from NetFlow records viewed with SiLK, so before moving forward, be sure to check out Chris's instructions on setting up SiLK on Security Onion so that you can easily test the tool if you don't already have a SiLK environment available. You can also go ahead and grab FlowPlotter from GitHub so that we can jump into making graphs. To generate graphs, you will only need SiLK and FlowPlotter. If the nitty-gritty details bore you and you want to get into FlowPlotter right away with examples, skip the API discussion and scroll down.

 

Background Mechanics

We will begin by talking about how FlowPlotter works. For portability across multiple platforms, it makes sense to view this kind of visualization in a web-friendly format, rather than as just a JPEG or an Excel graph. To do this, FlowPlotter uses an implementation of Google Charts. Google Charts offers an extensive API that allows for the creation of just about any chart you can imagine. While many of these make more sense with only a few rows of data (pie charts, for example), some benefit from having as much data as can be shoved into them (line charts representing bins of data). Google provides thorough explanations on how to create these charts using your own data, but of course, formatting the data is still up to you. See their chart gallery for more examples of the structure of these charts.

 

In Applied Network Security Monitoring, we discuss several ways of seeing the "big picture" when it comes to visualizing data of many types. We also go into deep detail on using SiLK to get the most from your flow data. Here I'd like to present a simple method of going from parsing data with SiLK's rwtools straight to having an interactive Google Chart full of flow data in your browser, using some bash kung-fu to streamline the process. The eventual goal of this exercise is to run rwfilter commands and pipe the binary data straight to a script that takes arguments to generate the data you want. The result is a tool we can pass data to that will generate charts using the Google API. Before we can run, though, we need to walk. We'll start by creating a chart that requires fewer data points, such as a bar chart representing the top 10 country codes talking to your network by bytes over the course of a day. The first thing we want to do is generate the SiLK data that we eventually want to plot. Since the goal is to make a top-10 list, we'll use rwstats and rwfilter.

 

rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout | rwstats --top --count=10 --fields=dcc --value=bytes


The rwfilter command above states that you wish to keep all data that passes the filter --proto=0-255 (written 0- for short) and that occurred on 2014/02/06. These options can be stated in any order, and many people like to type them out as they would verbally say them. For instance, with the rwstats command, we're literally asking for the "top" "10" "destination country codes" by "bytes". Many of the people I teach SiLK to end up thinking too rigidly about "what SiLK wants" instead of just writing down what they want and then converting that directly into a query.
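To drive that point home, "show me the top 10 source addresses by record count for that same day" converts almost word for word into a query (a sketch against the same sample data; swap the date, fields, and value for whatever question you're actually asking):

rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout | rwstats --top --count=10 --fields=sip --value=records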

 

Now we're going to make that rwstats output appear in a bar graph. FlowPlotter, which I talk about in more detail below, works by using templates into which the formatted data is inserted, yielding a complete chart. Let's look at the most basic template for a bar chart. This is taken directly from Google's visualization playground, with their data swapped out for "dataplaceholder" near the middle of the code. You'll even notice that, for now, the title is still "Company Performance".

<html>
  <head>
    <script type="text/javascript" src="https://www.google.com/jsapi"></script>
    <script type="text/javascript">
      google.load("visualization", "1", {packages:["corechart"]});
      google.setOnLoadCallback(drawChart);
      function drawChart() {
        var data = google.visualization.arrayToDataTable([
          dataplaceholder
        ]);

        var options = {
          title: 'Company Performance',
          vAxis: {title: 'Year', titleTextStyle: {color: 'red'}}
        };

        var chart = new google.visualization.BarChart(document.getElementById('chart_div'));
        chart.draw(data, options);
      }
    </script>
  </head>
  <body>
    <div id="chart_div" style="width: 900px; height: 500px;"></div>
  </body>
</html>

You'll notice the "dataplaceholder" where there should be a table of data. In its place, we should insert something that looks like the following, which was created from our rwstats command:

['dcc', 'Bytes'],
['--', 26478355345],
['us', 706854881],
['ca', 8665204],
['no', 1893193],
['nl', 1416293],
['bg', 1101223],
['ch', 1092811],
['de', 202948],
['se', 169036],
['gb', 117399]

You'll notice that "- -" also appears as a country code here, representing PRIVATE/EXPERIMENTAL addresses; we'll leave it in for the sake of discussing additional chart features and manipulations later. In the meantime, how did I streamline the creation of that data to replace dataplaceholder? First, it is important to add the "--delimited=," option to rwstats to ease the post-processing a bit. After that, I used a mix of cut, sed, and grep to wrap it all into a one-liner:

rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout | rwstats --top --count=10 --fields=dcc --value=bytes --delimited=, | cut -d "," -f1,2 |grep ,| sed "s/\(.*\),\(.*\)/['\1', \2],/g"|sed '$s/,$//'| sed "s/, \([A-Za-z].*\)],/, '\1'],/g" | grep ","
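If that one-liner is hard to read, here is the exact same pipeline spread across lines purely for readability (the behavior is unchanged; the comments describe what each stage does):

# rwstats emits comma-delimited rows (dcc,Bytes,...).
# cut keeps the first two columns; grep keeps only lines containing a comma,
# which drops the rwstats summary lines; the first sed wraps each row as
# ['col1', col2], ; the second sed strips the trailing comma from the last row;
# the third sed quotes the second field when it starts with a letter, turning
# the header row into ['dcc', 'Bytes'],
rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout \
  | rwstats --top --count=10 --fields=dcc --value=bytes --delimited=, \
  | cut -d "," -f1,2 \
  | grep , \
  | sed "s/\(.*\),\(.*\)/['\1', \2],/g" \
  | sed '$s/,$//' \
  | sed "s/, \([A-Za-z].*\)],/, '\1'],/g" \
  | grep ","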

There are probably better ways of generating that data, and I welcome them in comments or as variations of FlowPlotter, but for the sake of this particular exercise, this will get you by. The general idea is that the rwstats output is piped to cut, which keeps the first two columns, then grep is used to keep only the data fields and titles. The sed commands that follow sort everything into the proper formatting: first building the table rows, then fixing the final line, then the column-identifier line. Now that we have the basic template and some example data, you can manually drop the data into the template in place of dataplaceholder and change some of the obvious things, such as the title, so that the HTML looks like the following:

<html>
  <head>
    <script type="text/javascript" src="https://www.google.com/jsapi"></script>
    <script type="text/javascript">
      google.load("visualization", "1", {packages:["corechart"]});
      google.setOnLoadCallback(drawChart);
      function drawChart() {
        var data = google.visualization.arrayToDataTable([
          ['dcc', 'Bytes'],
          ['--', 26478355345],
          ['us', 706854881],
          ['ca', 8665204],
          ['no', 1893193],
          ['nl', 1416293],
          ['bg', 1101223],
          ['ch', 1092811],
          ['de', 202948],
          ['se', 169036],
          ['gb', 117399]
        ]);

        var options = {
          title: 'Destination Country Code by Bytes',
          vAxis: {title: 'Country Codes', titleTextStyle: {color: 'black'}}
        };

        var chart = new google.visualization.BarChart(document.getElementById('chart_div'));
        chart.draw(data, options);
      }
    </script>
  </head>
  <body>
    <div id="chart_div" style="width: 900px; height: 500px;"></div>
  </body>
</html>

Here you can see the result of the code we've generated.


Notice that while mouse-over works (at the link) and it is overall a decent-looking graph, the scale is being ruined by our large amount of internal destination traffic. There are two ways to fix this: one that is obvious but not great, and one that is less obvious but far better. Either we can manually remove the internal destinations from the data, or we can accept them and make the chart more forgiving of large outliers. To do the latter, we need to edit the var options in our code. We're going to add an hAxis option, as seen below. This makes the horizontal axis work on a logarithmic scale instead of scaling to the maximum and minimum data values.

<html>
  <head>
    <script type="text/javascript" src="https://www.google.com/jsapi"></script>
    <script type="text/javascript">
      google.load("visualization", "1", {packages:["corechart"]});
      google.setOnLoadCallback(drawChart);
      function drawChart() {
        var data = google.visualization.arrayToDataTable([
          ['dcc', 'Bytes'],
          ['--', 26478355345],
          ['us', 706854881],
          ['ca', 8665204],
          ['no', 1893193],
          ['nl', 1416293],
          ['bg', 1101223],
          ['ch', 1092811],
          ['de', 202948],
          ['se', 169036],
          ['gb', 117399]
        ]);

        var options = {
          title: 'Destination Country Code by Bytes',
          vAxis: {title: 'Country Codes', titleTextStyle: {color: 'black'}},
          hAxis: {logScale: true}
        };

        var chart = new google.visualization.BarChart(document.getElementById('chart_div'));
        chart.draw(data, options);
      }
    </script>
  </head>
  <body>
    <div id="chart_div" style="width: 900px; height: 500px;"></div>
  </body>
</html>

Our new graph looks like this.


In testing changes like this, I highly recommend playing around in the Google Charts playground, as it can streamline the debugging of small changes.

 

FlowPlotter

Now that you've got an idea of how to manually generate these graphs, we can talk about FlowPlotter in a more official capacity. FlowPlotter is a scripted approach to generating graphs from SiLK data using templates, modular enough to accept new graph types with relative ease. In short, it automates everything we just did in the previous example. The only requirement is that you provide an rwfilter command and pipe its output to flowplotter.sh with a chart name and its independent and dependent variables as options. From there, FlowPlotter builds the HTML page for you, complete with titles and sensible scaling options. For instance, to generate the previous graph, you would simply run the following from the FlowPlotter root directory:

/Flowplotter$ rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout | ./flowplotter.sh barchart dcc bytes

Here is the current usage page for FlowPlotter:

rwfilter [filter] | flowplotter.sh [charttype] [independent variable] [dependent variable]

Currently you must run a SiLK rwfilter command and pipe it to flowplotter.sh, specifying various options as arguments. The following chart types are currently functional:

geomap

  • independent variable = Must specify an rwstats compatible field for country type (scc or dcc).
  • dependent variable = Must specify an rwstats compatible value (Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:[field])

linechart

  • independent variable = Must specify a bin-size that the dependent variable will be calculated by. For example, if you want "Records per Minute", this variable will be 60.
  • dependent variable = Must specify an rwcount compatible value (Records, Packets, Bytes).

treemap

  • independent variable = Must specify an rwstats compatible field.
  • dependent variable = Must specify an rwstats compatible value (Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:[field])

timeline

  • independent variable = Must specify an rwcut compatible field.
  • dependent variable = Must specify an rwcut compatible field.

piechart

  • independent variable = Must specify an rwstats compatible field.
  • dependent variable = Must specify an rwstats compatible value (Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:[field])

barchart

  • independent variable = Must specify an rwstats compatible field.
  • dependent variable = Must specify an rwstats compatible value (Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:[field])

columnchart

  • independent variable = Must specify an rwstats compatible field.
  • dependent variable = Must specify an rwstats compatible value (Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, or Distinct:[field])

 

As you can see, FlowPlotter doesn't just support bar charts. It currently supports numerous Google Charts. The charts below were all generated using the queries you see accompanying them.

Geomaps

rwfilter --start-date=2013/12/27 --proto=0- --type=all --pass=stdout | ./flowplotter.sh geomap dcc bytes > geomap.html


 

Linecharts

rwfilter --start-date=2013/12/27 --proto=0- --type=all --pass=stdout | ./flowplotter.sh linechart 60 bytes > linechart.html


 

Treemaps

rwfilter --start-date=2013/12/27 --sport=1025- --dport=1025- --not-daddress=192.168.1.0/24 --proto=0- --type=all --pass=stdout | ./flowplotter.sh treemap dip records > treemap.html


 

Timelines

rwfilter --start-date=2013/12/27 --proto=0- --type=out,outweb --dcc=us,-- --fail=stdout | ./flowplotter.sh timeline sip dip > timeline.html


 

Pie Charts

rwfilter --start-date=2013/12/27 --sport=1025- --dport=1025- --not-daddress=192.168.1.0/24 --proto=0- --type=all --pass=stdout | ./flowplotter.sh piechart dport bytes > piechart.html


 

Bar Charts

rwfilter --start-date=2013/12/27 --sport=1025- --dport=1025- --not-daddress=192.168.1.0/24 --proto=0- --type=all --pass=stdout | ./flowplotter.sh barchart dport bytes > barchart.html


 

Column Charts

rwfilter --start-date=2013/12/27 --sport=1025- --dport=1025- --not-daddress=192.168.1.0/24 --proto=0- --type=all --pass=stdout | ./flowplotter.sh columnchart dip bytes > columnchart.html


Next Steps

FlowPlotter currently only supports charts from the Google Visualization chart library, but as time goes by I'd like to add sources beyond Google, even if they duplicate similar Google graphs. I have the project on GitHub and welcome any comments, ideas, and improvements you might have. It has examples that you can use, but I encourage you to try any kind of rwfilter input you can think of. If you come across some great visualizations that you think are repeatable by others, post the rwfilter | flowplotter command as a comment and I'll add it to the examples!

Bro is one of the best things to happen to network security monitoring in a long time. However, the ability to parse and view Bro logs in most organizations isn't always ideal. One option is to peruse Bro logs with something like Splunk, but with high throughput you'll be paying a pretty penny, since Splunk is priced by the amount of data ingested. Another popular (and free) solution is Elsa. However, while Elsa is extremely fast at data ingestion and searches, it currently limits the number of fields that can be parsed due to its use of Sphinx. On top of that, Elsa requires searches with very specific terminology and doesn't easily do wildcard searches without additional transforms. This is where Logstash comes in. Logstash is an excellent tool for managing any type of event or log, and it can easily parse just about anything you can throw at it. I say "easily" because once you're over the learning curve of generating your first Logstash configuration, creating additional configurations comes much more easily. In this guide I will talk about how you can use Logstash to parse logs from Bro 2.2. The examples shown here will only demonstrate parsing methods for the http.log and ssl.log files, but the download links at the end of the post provide files for parsing all of Bro's log types.

If you want to follow along, this guide assumes a few things. First, we'll be parsing "out-of-the-box" Bro 2.2 logs, which means you'll need an "out-of-the-box" Bro 2.2 installation. If you don't already have a Bro system, the easiest route to get up and running would normally be Security Onion, but as of this writing Security Onion uses Bro 2.1 (although I'm sure this will change soon). In the meantime, reference Bro.org's documentation for installation and setup. Next, you'll need to download the latest version of Logstash, which I tested at version 1.2.2 for this article. We tested these steps with Logstash and Bro on a single Ubuntu 12.04 system.

TL;DR: You can download a complete Logstash configuration file for all Bro 2.2 log files and fields here.

Creating a Logstash Configuration

Let's get started by creating a master configuration file. Logstash relies on this file to decide how logs should be handled. For our purposes, we will create a file called bro-parse.conf, which should be placed in the same directory as the Logstash JAR file. It is made up of three main sections: input, filter, and output. Below is the basic outline of a Logstash configuration file:

input {
  ...
}

filter {
  ...
}

output {
  ...
}

Input

The input section of the Logstash configuration determines what logs should be ingested and the ingestion method. There are numerous plug-ins that can be used to ingest logs, such as a TCP socket, terminal stdin, a Twitter API feed, and more. We are going to use the "file" plug-in to ingest Bro logs. This plug-in constantly reads a log file line by line in near real time. By default, it checks the file for new lines every 15 seconds, but this is configurable.

With the ingestion method identified, we need to provide the path to the log files we want to parse in the "path" field, as well as a unique name for each in the "type" field. With the following configuration, the input section of the Logstash configuration is complete; it will ingest "/opt/bro2/logs/current/http.log" and "/opt/bro2/logs/current/ssl.log" and give each an appropriate "type" name.

input {
  file {
    type => "BRO_httplog"
    path => "/opt/bro2/logs/current/http.log"
  }  
  file {
    type => "BRO_SSLlog"
    path => "/opt/bro2/logs/current/ssl.log"
  }
}

 

Filter

The filter section is where you'll need to get creative. This section of the Logstash configuration takes the log data from the input section and decides how that data is parsed. It lets you specify which log lines to keep, which to discard, and how to identify the individual fields in each log file. We will use conditionals as the framework for creating these filters, which are essentially just if-then-else statements:

if EXPRESSION {
  ...
} else if EXPRESSION {
  ...
} else {
  ...
}

For this filter, we're going to use a nested conditional statement. First, we want to discard the first few lines of the Bro log files, since they are just header information that we don't need. These lines begin with the "#" sign, so we can configure our conditional to discard any log line beginning with "#" using the "drop" option. That part is trivial, but then it gets tricky, because we have to instruct Logstash on how to recognize each field in the log file. This can involve a bit of legwork, since you need to actually analyze the log format and determine what the fields will be called and what delimiters are used. Luckily, I've done a lot of that legwork for you. Continuing with our example, we can begin by looking at the Bro 2.2 http.log and ssl.log files, which contain 27 and 19 fields to parse respectively, delimited by tabs:

Figure 1: Bro 2.2 http.log


Figure 2: Bro 2.2 ssl.log
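If you want to enumerate those field names yourself rather than squint at the screenshots, Bro's ASCII logs list them on a "#fields" header line. A quick sketch, assuming the default log location used later in this post:

# print the field names of the current http.log, one per line
grep "^#fields" /opt/bro2/logs/current/http.log | tr '\t' '\n' | tail -n +2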

The manner in which these fields are parsed can affect performance as the amount of data you collect scales upward, but depending on your hardware, that is usually an extreme case. For the sake of guaranteeing that all fields are parsed correctly, I used non-greedy regular expressions. Logstash allows for "Grok" regular expressions, but I've found that there are bugs when using specific or repetitive Grok patterns. Instead, I've taken the regex translation of the Grok patterns and used Oniguruma named-capture syntax. In testing, these have proven much more reliable, producing no "random" errors. The resulting filter looks like this:

filter {

if [message] =~ /^#/ {
  drop {  }
} else {  

# BRO_httplog ######################
  if [type] == "BRO_httplog" {
      grok { 
        match => [ "message", "(?<ts>(.*?))\t(?<uid>(.*?))\t(?<id.orig_h>(.*?))\t(?<id.orig_p>(.*?))\t(?<id.resp_h>(.*?))\t(?<id.resp_p>(.*?))\t(?<trans_depth>(.*?))\t(?<method>(.*?))\t(?<host>(.*?))\t(?<uri>(.*?))\t(?<referrer>(.*?))\t(?<user_agent>(.*?))\t(?<request_body_len>(.*?))\t(?<response_body_len>(.*?))\t(?<status_code>(.*?))\t(?<status_msg>(.*?))\t(?<info_code>(.*?))\t(?<info_msg>(.*?))\t(?<filename>(.*?))\t(?<tags>(.*?))\t(?<username>(.*?))\t(?<password>(.*?))\t(?<proxied>(.*?))\t(?<orig_fuids>(.*?))\t(?<orig_mime_types>(.*?))\t(?<resp_fuids>(.*?))\t(?<resp_mime_types>(.*))" ]
      }
  }
# BRO_SSLlog ######################
  if [type] == "BRO_SSLlog" {
    grok { 
      match => [ "message", "(?<ts>(.*?))\t(?<uid>(.*?))\t(?<id.orig_h>(.*?))\t(?<id.orig_p>(.*?))\t(?<id.resp_h>(.*?))\t(?<id.resp_p>(.*?))\t(?<version>(.*?))\t(?<cipher>(.*?))\t(?<server_name>(.*?))\t(?<session_id>(.*?))\t(?<subject>(.*?))\t(?<issuer_subject>(.*?))\t(?<not_valid_before>(.*?))\t(?<not_valid_after>(.*?))\t(?<last_alert>(.*?))\t(?<client_subject>(.*?))\t(?<client_issuer_subject>(.*?))\t(?<cert_hash>(.*?))\t(?<validation_status>(.*))" ]
    }
  }
 }
}

As you can see in the filter, I've taken each field (starting with the timestamp, ts) and generated an expression that matches it. For the sake of making sure that all fields are captured correctly, I've used the generic non-greedy regex ".*?". After each field, there is a "\t" representing the tab delimiter that separates the fields. This can be optimized by making more specific field declarations with more precise regular expressions. For instance, an epoch timestamp will never contain letters, so why use a wildcard that allows them? Once you have the filter complete, you can move on to the easy part, the output.
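As a small aside, here is a sketch of what that kind of tightening could look like for the leading timestamp field, assuming Bro's default epoch-style ts values (for example, 1385068289.227889). In each match string above, the generic first capture

(?<ts>(.*?))

could be replaced with a stricter pattern such as

(?<ts>\d+\.\d+)

while the rest of the expression stays the same.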

Output

The output section of the Logstash configuration determines where ingested events go. There are many output options in Logstash, but we are going to send ours to Elasticsearch, the powerful search and analytics platform behind Logstash. To specify the output, we'll just add the following at the end of the Logstash configuration:

output {
  elasticsearch { embedded => true }
}

That concludes how to build a Logstash configuration that will ingest your Bro logs, exclude the lines we don't want, parse the individual data fields correctly, and output them to Elasticsearch. The only thing left to do is get them on the screen. To do that, we'll launch Logstash by entering the following command in a terminal, specifying the Logstash JAR file and the configuration file we just created:

java -jar logstash-1.2.2-flatjar.jar agent -f bro-parse.conf -- web

That might take a few seconds. To verify that everything is running correctly, open another terminal and run:

netstat -l | grep 9292

Once you can see that port 9292 is listening, Logstash should be ready to rock.


Figure 3: Verifying Logstash is Running

Now you should be able to open a web browser and go to http://127.0.0.1:9292. Once there you'll probably only see the Kibana dashboard, but from there you can open the pre-built Logstash dashboard and see your Bro logs populating!


Figure 4: Bro Logs in Logstash

Logstash uses the Kibana GUI for browsing logs. The combination of Elasticsearch, Logstash, and Kibana in one package makes for the easiest Bro logging solution you can find. The most basic function we now have is search. Searches allow for the use of wildcards or entire search terms. For instance, searching for "oogle.com" will probably give you zero results. However, searching for "*oogle.com" is likely to give you exactly what you expect: any visits to Google-hosted domains. Search will also find full search terms (single terms, or uniquely grouped terms between specific delimiters) without the need for a wildcard. For instance, searching specifically for "plus.google.com" is likely to return results as you would expect.

To specify the logs you'd like to view by timestamp, there is a "timepicker" at the top right.


Figure 5: Logstash Timepicker

You can take advantage of the parsed individual fields by generating statistics for the unique values associated with each field. This can be done by simply viewing a Bro log and clicking a field name in the left column of the screen. You can also see more complete visualizations from that window by clicking "terms". For example, the pie chart below is one I generated that indicates how many records exist in each of the Bro logs I'm parsing.


Figure 6: Examining Bro Log Sums

As another example, let's filter down to just SSL logs. Under the "Fields" panel, click "type" to reveal the various log types. Then, click the magnifying glass next to "Bro_SSLlog". Now you have only Bro SSL logs, as well as a new field list representing only the fields seen in the SSL events currently present. If you only want certain fields displayed, click their check boxes in the order you want them shown. If you want to rearrange them later, just move them with the left and right arrows in the event columns of the event display. Below is an example of sorting those SSL logs by timestamp, where the fields displayed are ts, server_name, uid, issuer_subject, and subject.


Figure 7: Sorting Bro SSL Logs

To remove the Bro_SSLlog filter, you can open the "filtering" panel at the top of the page and remove that additional filter. Doing so reverts to all data types, but with the fields still selected.

This guide only scratches the surface of the types of analysis you can do with Logstash. When you combine a powerful network logging tool like Bro with a powerful log analysis engine like Logstash, the possibilities are endless. I suggest you play around with customizing the front end and perusing the logs. If you somehow mess up badly enough or need to "reset" your data, you can stop Logstash in the terminal and remove the data/ directory that was created in the same location as the Logstash JAR file. I've created a config file that you can use to parse all of the Bro 2.2 log files. You can download that file here.

UPDATE - December 18, 2013

As per G Porter's request, I've generated a new Logstash Bro configuration that is tailored to work with the most recent Security Onion update. That update marked the deployment of Bro 2.2 to Security Onion, and if you compare it to an "out-of-the-box" Bro 2.2 deployment, there are a few additions that I've accounted for.

You can download the Security Onion specific Logstash Bro 2.2 configuration here.

Recently, Liam published a great tutorial on syntax highlighting for Bro. We all recalled the excellent emacs addition that Scott Runnels posted on his GitHub and thought about how much more accessible this makes Bro for the average user who will find himself scripting with BNPL.

For many people, myself included, nano is the preferred text editor due to its extreme simplicity and usability. Nano is quite bare in that it isn't nearly as pretty as something like Sublime Text 2, and it doesn't have quite the editing power of Vim. However, it is easy to become fond of nano as a beginner due to its layout, and it tends to stick with people quite well. In an effort to bring Bro to aspiring data parsers and analysts who might not be comfortable with the cold nature of Vim, I present BNPL syntax highlighting in nano.

First off, if your favored Linux distro does not include nano out of the box, it is available in all the base repositories. Depending on your Linux flavor, install nano with:

sudo yum install nano
 or
sudo apt-get install nano

The first thing you will need is the bro.nanorc file that nano will reference for syntax highlighting. You can download that here. The attached bro.nanorc uses the regular expressions from the emacs example that Scott Runnels posted. Small changes in escape characters were required to make the regular expressions compatible, but otherwise all syntax highlighting should remain consistent with the other BNPL highlighting mechanisms in previously posted editors.

Once you've downloaded this file, it should be placed in /usr/share/nano/.
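For example, assuming you downloaded bro.nanorc to your current working directory:

sudo cp bro.nanorc /usr/share/nano/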

In order to configure nano to utilize this syntax highlighting for *.bro files, it must be enabled in the /etc/nanorc file. For our purposes, we will need to add the following lines to this file:

## bro files
include "/usr/share/nano/bro.nanorc"

Once this change is saved, you should be ready to rock.



Enjoy!