In my previous post introducing FlowPlotter, I showed you some of the basics behind generating data from SiLK and turning it into nice-looking visual representations that leverage Google's Visualization API. That is all wrapped up into an easy-to-run tool called "FlowPlotter" that runs directly off of data piped from SiLK's rwfilter via stdout. Today I'm pleased to reveal some of the D3 visualizations I promised earlier. I had originally planned to replicate the Google visualizations with D3 to provide an alternative look, but instead I've gone ahead and filled in some of the features that Google Charts lacks.
Force-Directed Link Graphs
So far I've built two new chart types in D3. The first is a force-directed link graph similar to what Afterglow generates. As we discussed in the last FlowPlotter post, the purpose behind FlowPlotter is to provide a streamlined way of generating visualizations for all of your data and making them easily available to anyone who needs to view them. While Afterglow is great to work with, it doesn't really "flow" with the rest of what we're generating, and streamlining it for wide use can be challenging. The D3 force-directed link graph that I've integrated with SiLK shows the relationship between pairs of nodes, with the link between each pair drawn at an opacity based on a third value. Simply put, it is created from a three-column CSV containing a source, a target, and a value, in that order. The overall design was borrowed partly from the work of d3noob.
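To make the three-column input concrete, here is a minimal Python sketch of turning source,target,value rows into the nodes-and-links structure a D3 force layout typically consumes. This is not FlowPlotter's actual code: the `csv_to_force_graph` name, the headerless CSV, and the choice to reference nodes by index inside each link are assumptions made for illustration.

```python
import csv
import io
import json


def csv_to_force_graph(csv_text):
    """Convert headerless source,target,value CSV text into a
    {"nodes": [...], "links": [...]} dict for a D3 force layout.

    Every distinct source or target becomes exactly one node; each link
    refers to its endpoints by their index in the nodes list.
    """
    nodes = []      # node dicts, in first-seen order
    index_of = {}   # node name -> position in `nodes`
    links = []

    def node_index(name):
        # Add the node the first time we see it, then reuse its index.
        if name not in index_of:
            index_of[name] = len(nodes)
            nodes.append({"name": name})
        return index_of[name]

    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # skip blank or short lines
        source, target = row[0].strip(), row[1].strip()
        links.append({
            "source": node_index(source),
            "target": node_index(target),
            "value": float(row[2]),
        })
    return {"nodes": nodes, "links": links}


if __name__ == "__main__":
    sample = "10.0.0.1,10.0.0.2,40\n10.0.0.1,10.0.0.3,10\n10.0.0.3,10.0.0.2,25\n"
    print(json.dumps(csv_to_force_graph(sample), indent=2))
```

Deduplicating nodes this way means a host involved in many pairs shows up as a single hub with many links rather than as repeated copies.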
To create a force-directed link graph directly from rwfilter output, you'll need to run the following:
rwfilter ../Sampledata/sample.rw --scc=kr --proto=0- --type=all --pass=stdout | ./flowplotter.sh forceopacity sip dip distinct:dport 100 > forcetest.html
You'll notice that there are four options after forceopacity: the source field, the target field, the value, and the number of top pairs to display. This chart module relies on rwstats to generate its data. The communication you see in Figure 1 represents the top 100 sip-dip pairs for all traffic from South Korea, ranked by the number of distinct destination ports contacted between each pair. The greater the number, the darker the link between nodes. Similarly, you could do something more practical, like generating a force-directed link graph of the top 100 sip-dport pairs, ranked by bytes, for outgoing web traffic to South Korea:
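FlowPlotter's actual rwstats invocation isn't shown here, but the four options line up naturally with rwstats' `--fields`, `--values`, and `--count` switches. The sketch below is a hypothetical mapping: `build_rwstats_args` and `trim_to_three_columns` are illustrative names, and it assumes rwstats is run with `--no-titles` and `--delimited=,` so that each output line starts with the two key fields followed by the value, with any trailing percentage columns dropped.

```python
import shlex


def build_rwstats_args(source_field, target_field, value, count):
    """Hypothetical mapping of forceopacity's four options onto rwstats.

    Example: ("sip", "dip", "distinct:dport", 100) becomes
    rwstats --fields=sip,dip --values=distinct:dport --count=100 ...
    """
    return [
        "rwstats",
        f"--fields={source_field},{target_field}",
        f"--values={value}",
        f"--count={count}",
        "--no-titles",      # assumed: suppress the header line
        "--delimited=,",    # assumed: comma-separated, unpadded output
    ]


def trim_to_three_columns(rwstats_lines):
    """Reduce delimited rwstats output to source,target,value rows.

    Assumes the two key fields come first and the value third; any later
    columns (percentages, a trailing delimiter) are discarded.
    """
    rows = []
    for line in rwstats_lines:
        fields = [f.strip() for f in line.strip().split(",")]
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        rows.append(",".join(fields[:3]))
    return rows


if __name__ == "__main__":
    # Print the command line rather than running it; rwfilter output
    # would be fed to it on stdin.
    print(shlex.join(build_rwstats_args("sip", "dip", "distinct:dport", 100)))
```

For the second example above, `build_rwstats_args("sip", "dport", "bytes", 100)` would describe the equivalent sip/dport query ranked by bytes.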
rwfilter ../Sampledata/sample.rw --dcc=kr --proto=0- --type=outweb --pass=stdout | ./flowplotter.sh forceopacity sip dport bytes 100 > forceport.html
There is still a lot I would like to do with FlowPlotter's force-directed link graph. For one, it does not yet replace Afterglow, whose big benefit lies in being able to specify "rules" about how your data will be displayed, as seen in ANSM. Down the road I would like to build something similar, but that isn't viable with the current stdout-based data output. Another feature that is still only in testing is the set of opacity options for links between nodes. Link opacity currently represents the value in the third column for the connection between the source and target nodes, but it could encode much more. Combining these features would require a good legend to read, but would allow for some genuinely useful visualizations.
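One way the opacity options could work is by offering more than one scaling. The sketch below is an assumption about how such options might look rather than what FlowPlotter currently does: the hypothetical `link_opacity` maps a third-column value to an opacity using either linear or logarithmic scaling, with a minimum opacity so faint links stay visible, and `legend_entries` samples values to label a legend.

```python
import math


def link_opacity(value, max_value, scale="linear", floor=0.1):
    """Map a link's third-column value to an opacity in [floor, 1.0].

    "linear" is proportional to value / max_value; "log" compresses large
    values so a few very heavy links don't make everything else look faint.
    """
    if max_value <= 0 or value <= 0:
        return floor
    if scale == "log":
        ratio = math.log1p(value) / math.log1p(max_value)
    else:
        ratio = value / max_value
    ratio = min(ratio, 1.0)  # clamp values larger than max_value
    return floor + (1.0 - floor) * ratio


def legend_entries(max_value, steps=4, scale="linear"):
    """Return (value, opacity) pairs at evenly spaced values for a legend."""
    entries = []
    for i in range(1, steps + 1):
        value = max_value * i / steps
        entries.append((value, round(link_opacity(value, max_value, scale), 3)))
    return entries


if __name__ == "__main__":
    for scale in ("linear", "log"):
        print(scale, legend_entries(100, scale=scale))
```

Logarithmic scaling is exactly the kind of choice that makes a legend necessary, since the darkness of a link no longer grows in proportion to its value.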
Automated Asset Discovery
With this FlowPlotter update I'm also showing off the new Asset Discovery addition. In chapter 14 of ANSM we discuss methods of gathering "friendly intelligence" and why its importance can't be overstated. Friendly intelligence is information related to the assets that you are tasked with protecting. In almost all environments, the overall picture of those assets changes frequently, so gathering friendly intelligence is a continuous effort. Since flow data provides a very good high-level view of the communication between all devices, it is ideal for building a general picture of the assets you are monitoring on a given network. It is also easily queried, which lets you keep constant watch for new assets.
To generate an asset model from SiLK data, I leverage the friendly intel script provided in chapter 14 of ANSM. To stay with FlowPlotter's template-based approach, I've made the script a standalone part of FlowPlotter that is referenced from flowplotter.sh. The idea is that changes and additions to the asset model script can then be made easily without extensive changes to flowplotter.sh.
As with the previous graph, I examined many D3 galleries to find good ways to visualize parent-child relationships, and the final design borrows from the collapsible tree layout in those galleries. Whereas the force-directed link graph takes a CSV as input, the asset discovery module takes JSON. This is mainly because extensive parent-child relationships aren't easy to represent in CSV, whereas in a JSON tree I can specify the data and fields directly and have a good idea of how they will turn out in the collapsible tree layout. Also, I like JSON because my name is Jason, and it just feels right.
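For a sense of what such a JSON tree can look like, the collapsible tree examples in the D3 galleries commonly use nested objects with `name` and `children` keys. The sketch below assumes that shape and a hypothetical list of (service, host) pairs; `build_asset_tree` is an illustrative name, and this is not the output format of the ANSM friendly intel script or of FlowPlotter itself.

```python
import json
from collections import defaultdict


def build_asset_tree(assets, root_name="Network"):
    """Nest (service, host) pairs into a name/children tree.

    `assets` is an iterable of (service, host) tuples, e.g. hosts that
    were identified as HTTP or DNS servers. Duplicates are collapsed, and
    services and hosts are sorted (as strings) so output is stable.
    """
    by_service = defaultdict(set)
    for service, host in assets:
        by_service[service].add(host)

    return {
        "name": root_name,
        "children": [
            {
                "name": service,
                "children": [{"name": host} for host in sorted(hosts)],
            }
            for service, hosts in sorted(by_service.items())
        ],
    }


if __name__ == "__main__":
    found = [("HTTP", "192.168.1.10"), ("DNS", "192.168.1.2"),
             ("HTTP", "192.168.1.11"), ("DNS", "192.168.1.2")]
    print(json.dumps(build_asset_tree(found), indent=2))
```

Adding another level (for example, ports under each host) is just one more layer of `children`, which is the kind of nesting that would be awkward to express in a flat CSV.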
To create an asset model, you can either build it from new data piped in via stdout, as with other FlowPlotter modules, or from an existing rwfilter output file. The important thing to remember is that the dataset you are looking at should be fairly large, or at least representative of normal network traffic over time. For a network with steady traffic from production systems, an hour might be fine. For some large enterprise environments, shorter timespans will work.
To generate the asset model from rwfilter's stdout, do the following:
rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=stdout | ./flowplotter.sh assetdiscovery > assetlist.html
Alternatively, you can write the rwfilter output to a file and then generate the asset model from that file:
rwfilter --start-date=2014/02/06 --proto=0- --type=all --pass=sample.rw
cat sample.rw | ./flowplotter.sh assetdiscovery > assetlist.html
The asset model currently attempts to identify various service types (HTTP, DNS, VPN, etc.) and is receiving regular updates to include more data that benefits from continuous regeneration. For now it is limited to friendly intelligence, but soon I will be including some additional, less friendly detections based on some of the statistical approaches discussed in ANSM and on some of the less frequently discussed SiLK features. These will all be wrapped up within FlowPlotter as usual. Be sure to read the FlowPlotter README to understand the additions a little better. For instance, you may notice that you're not seeing EVERY "asset" on your network for a given service. That may be due to thresholding: by default, the model only shows "servers" that account for at least 1% of the total "server-like" traffic for a particular service. This default can easily be changed with the options described in the README. As always, you should be aware of how far abstracted from your data you really are when generating visualizations like this.
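To illustrate how that kind of thresholding hides small servers, here is a minimal sketch of a per-service share filter. The input layout (a service name mapped to per-host traffic volumes) and the `apply_server_threshold` name are hypothetical, and the 1% test is computed within each service as described above; the README is the authority on how FlowPlotter actually applies it.

```python
def apply_server_threshold(traffic_by_service, threshold=0.01):
    """Keep hosts carrying at least `threshold` of a service's traffic.

    `traffic_by_service` maps a service name to {host: volume}, where
    volume is whatever "server-like" measure is counted (e.g. bytes or
    flows). This data layout is assumed for illustration only.
    """
    kept = {}
    for service, hosts in traffic_by_service.items():
        total = sum(hosts.values())
        if total <= 0:
            kept[service] = {}  # nothing to compare against
            continue
        kept[service] = {
            host: volume
            for host, volume in hosts.items()
            if volume / total >= threshold
        }
    return kept


if __name__ == "__main__":
    observed = {"HTTP": {"10.0.0.5": 990, "10.0.0.9": 9, "10.0.0.7": 1}}
    print(apply_server_threshold(observed))
    print(apply_server_threshold(observed, threshold=0.005))
```

Lowering `threshold` surfaces smaller servers at the cost of a busier tree.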
As usual, I'm open to any and all requests for additions or advice on optimizing anything that currently exists in FlowPlotter.