Suricata

In Applied NSM we wrote quite a bit about both Logstash and Snorby. Recently, a reader of this blog asked whether there is a way to pivot from Snorby events to your Bro logs in Logstash. It is actually quite easy.

To start, you'll obviously need a functional instance of Snorby and of Logstash with Bro logs (or any other relevant parsed PSTR-type data) feeding into it. In this case, we'll assume that to reach our Logstash dashboard, we go to http://192.168.1.12:9292/index.html#/dashboard/file/logstash.json.

If you've played around with Snorby, you've probably noticed the lookup sources that appear when you click on an IP in an event. There are default sources, but you can also add your own. Those are configured from the Administration tab at the top right.


Figure 1: Lookup Sources Option in Snorby

At this time you're only allowed to use two variables, ${ip} and ${port}, and from my testing, you can only use each of them once in a given lookup URL. Normally this isn't an issue if, for instance, you are researching an IP with your favorite intel source and feeding the IP in as a variable on the URL. However, if for some reason you need to feed it in twice, referencing ${ip} will only fill the first occurrence and leave the second blank. This becomes an issue with parsed Bro logs in Logstash.

Though not immediately obvious, Logstash allows you to control the search from the URL, like so:

http://192.168.1.12:9292/index.html#/dashboard/file/logstash.json?query=id.orig_h:192.168.1.75

Test it out!

In Snorby, the lookup source for this would be:

http://192.168.1.12:9292/index.html#/dashboard/file/logstash.json?query=id.orig_h:${ip}

However, let's assume you wanted to find logs where 192.168.1.75 existed as either the source or destination address:

http://192.168.1.12:9292/index.html#/dashboard/file/logstash.json?query=id.orig_h:192.168.1.75%20OR%20id.resp_h:192.168.1.75

That search is perfectly valid in Logstash, and the URL functions as expected. However, if you build a matching lookup source in Snorby by using ${ip} twice, you'll notice that only the first use of ${ip} is filled in, and the one on id.resp_h is left blank. For that reason, I recommend the simpler method of querying the message field (essentially the unanalyzed raw log). We'll also add in ${port} to narrow things down further.

http://192.168.1.12:9292/index.html#/dashboard/file/logstash.json?query=message:(${ip} AND ${port})

That lookup source will look for any instance of the IP and matching port within the log. A word of warning: there is a small chance you'll get an unexpected blip somewhere with this method, as it is literally looking for that IP address and that port number as separate strings anywhere within the message, in any order. Hypothetically, you could have an odd log that contains the selected IP, but where the port number actually matches the response_body_len or some other integer field, though that would be extremely unlikely.

You'll also notice that the lookup source defaults to the past 24 hours and displays the entire log. If we want to change this, we'll have to use a slightly different method: a "scripted dashboard". The two Logstash searches below look for traffic that contains "91.189.92.152" and "80". However, there are a few differences.

http://192.168.1.12:9292/index.html#/dashboard/file/logstash.json?query=message:(91.189.92.152%20AND%2080)

http://192.168.1.12:9292/index.html#/dashboard/script/logstash.js?query=message:(91.189.92.152%20AND%2080)&fields=@timestamp,id.orig_h,id.orig_p,id.resp_h,id.resp_p,_type,host,uri

In testing both of these, the difference is immediate. You have the ability to customize the output fields, which is essential. You'll also notice that the second URL points at /dashboard/script/logstash.js instead of /dashboard/file/logstash.json. "Scripted dashboards" are entirely JavaScript, allowing for full control over the output. While we're adding custom fields, let's also say that we want to look at the past 7 days (&from=7d), and that the 7 days should be based on the timestamp in the log rather than ingestion time (&timefield=@timestamp).

http://192.168.1.12:9292/index.html#/dashboard/script/logstash.js?query=message:(91.189.92.152%20AND%2080)&from=7d&timefield=@timestamp&fields=@timestamp,id.orig_h,id.orig_p,id.resp_h,id.resp_p,_type,host,uri

Like we did before, let's add that as a lookup source in Snorby with the following URL:

http://192.168.1.12:9292/index.html#/dashboard/script/logstash.js?query=message:(${ip}%20AND%20${port})&from=7d&timefield=@timestamp&fields=@timestamp,id.orig_h,id.orig_p,id.resp_h,id.resp_p,_type,host,uri

 


Figure 2: Note the Special URL Syntax in this Logstash Example
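If you want to double-check what Snorby will actually request once it fills in the variables, you can build the URL yourself. Below is a small, illustrative Python sketch (it isn't part of Snorby or Logstash, and the helper name is made up); it just drops an IP and port into the lookup source template above and encodes the spaces as %20:

# Illustrative only: preview the URL the Snorby lookup source will produce
BASE = "http://192.168.1.12:9292/index.html#/dashboard/script/logstash.js"
FIELDS = "@timestamp,id.orig_h,id.orig_p,id.resp_h,id.resp_p,_type,host,uri"

def build_lookup_url(ip, port):
    # Only the spaces in the Lucene query need encoding (%20),
    # matching the URLs shown earlier in this post
    query = "message:({} AND {})".format(ip, port).replace(" ", "%20")
    return "{}?query={}&from=7d&timefield=@timestamp&fields={}".format(BASE, query, FIELDS)

print(build_lookup_url("91.189.92.152", 80))

Running it with 91.189.92.152 and 80 should produce the same scripted dashboard URL shown earlier, which makes it easy to confirm the lookup source template is doing what you expect.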

To summarize, here are some of the possible lookup sources I've mentioned, with the advanced lookup being my recommendation for these purposes:


Figure 3: Three Possible Lookup Sources for Pivoting to Logstash

While signature-based detection isn’t enough on its own to protect a network against structured attackers, it is one of the cornerstones of a successful network security monitoring capability. If you have ever managed a signature-based detection mechanism then you know that you can’t simply turn the device on and let it work its magic. Signature-based detection mechanisms like Snort, Suricata, and various commercial offerings require careful deployment and tuning of signatures (often called rules) to ensure that you receive reliable, high quality alerts. The ultimate goal, while never fully achievable, is that every alert generated by these detection mechanisms represents the activity it was designed to detect. This means that every alert will be actionable, and you won’t waste a lot of time chasing down false positives.

A key to achieving high fidelity alerting is the ability to answer the question, “Do my signatures effectively detect the activity they are designed to catch?” In order to answer that question, we need a way to track the performance of individual signatures. In the past, organizations have relied on counting false positive alerts to determine how effective a signature is. While this is a step in the right direction, I believe it stops one step short of a useful statistic. In this article I will discuss how a statistic called precision can be used to measure the effectiveness of IDS signatures, regardless of the platform you are using.

Definitions

Before we get started, I think it’s necessary to define a few terms that are used throughout this article. If we are going to play ball, it helps if we are standing on the same field. This article is meant to be platform-agnostic, however, so the equipment you use doesn’t matter.

There are four main data points that are generally referred to when attempting to statistically confirm the effectiveness of a signature: true positives, false positives, true negatives, and false negatives.

True Positive (TP): An alert has correctly identified a specific activity. If a signature was designed to detect a certain type of malware, and an alert is generated when that malware is launched on a system, this is a true positive, which is what we strive for with every deployed signature.

False Positive (FP): An alert has incorrectly identified a specific activity. If a signature was designed to detect a specific type of malware, and an alert is generated for an instance in which that malware was not present, this would be a false positive.

True Negative (TN): An alert has correctly not been generated when a specific activity has not occurred. If a signature was designed to detect a certain type of malware, and no alert is generated without that malware being launched, then this is a true negative, which is also desirable. This is difficult, if not impossible, to quantify in terms of NSM detection.

False Negative (FN): An alert has incorrectly not been generated when a specific activity has occurred. If a signature was designed to detect a certain type of malware, and no alert is generated when that malware is launched on a system, this is a false negative. A false negative means that we aren’t detecting something we should be detecting, which is the worst-case scenario. False negatives aren’t detectable unless you have a secondary detection mechanism or signature designed to catch the activity that was missed.
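To recap the four definitions above, here is a minimal sketch (my own illustration, not tied to any particular detection platform) that maps the two things we know about an event, whether the activity occurred and whether an alert fired, to one of the four outcomes:

def classify_outcome(activity_occurred, alert_generated):
    # activity_occurred: the condition the signature was written to catch really happened
    # alert_generated: the signature fired
    if alert_generated:
        return "TP" if activity_occurred else "FP"
    return "FN" if activity_occurred else "TN"

print(classify_outcome(True, True))    # TP
print(classify_outcome(False, True))   # FP
print(classify_outcome(True, False))   # FN
print(classify_outcome(False, False))  # TN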

 

The Fallacy of Relying Solely on False Positives

Historically, the effectiveness of IDS signatures has been measured by counting false positives over an arbitrary time period and comparing that count to a threshold. Let’s consider a scenario for this statistic. In this scenario, we’ve deployed signature 100, which is designed to detect the presence of a specific command and control (C2) string in network traffic. Over the course of 24 hours, this signature has generated 500 false positive alerts, an FP rate of 20.8/hour. You have determined that an acceptable threshold for false positives is 0.5/hour, so this signature would be deemed ineffective based upon that threshold.

At face value, this approach may seem effective, but it is flawed. If you remember from a few paragraphs ago, the question we want to answer centers on how well a signature detects the activity it is designed to catch. When we only consider FPs, we are actually only measuring how well a signature DOESN’T detect what it is designed to catch. While it sounds similar, measuring whether something succeeds is not the same as measuring whether it fails.

Earlier, we stated that signature 100 was responsible for 500 false positives over a 24-hour period, meaning that the signature was not effective. However, what if I told you that this signature was also responsible for 5,000 true positives during the same time period? In that case, the FPs were well worth it in order to catch 5,000 actual infections! This is the key to precision: taking both false positives and true positives into consideration.

 

Defining Precision

At this point it’s probably important to say that I’m not a statistician. As a matter of fact, I only took the bare minimum required math courses in college, but fortunately precision isn’t too tricky to understand. In short, precision refers to how reliably a positive result indicates that the condition of interest is actually present. Often referred to as positive predictive value, precision is calculated as the proportion of true positives among all positive results (both true positives and false positives), using the formula:

Precision = TP / (TP + FP)

 

This value is expressed as a percentage, and can be used to determine the probability that, given a generated alert, the activity that has been detected has truly occurred. Therefore, if a signature with high precision generates an alert, then the activity has very likely occurred. On the other hand, if a signature with low precision generates an alert, then it is unlikely that the activity you are attempting to detect has actually occurred. Of course, this is an aggregate statistic, so more data points will result in a more reliable precision value.

In the example from the previous section, we can determine that signature 100, which had 500 FPs and 5,000 TPs, has a precision of 90.9%.

 

5,000 / (5,000 + 500) = 0.909, or 90.9%

 

That's pretty good!
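If you prefer to see the formula as code, here is a tiny sketch. The guard for signatures with no reviewed alerts yet is my own convention, and it matches the idea (discussed below) of starting a new signature at a precision of 0:

def precision(tp, fp):
    # Precision = TP / (TP + FP), expressed here as a percentage
    if tp + fp == 0:
        return 0.0  # no classified alerts yet
    return 100.0 * tp / (tp + fp)

print(round(precision(5000, 500), 1))  # 90.9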

 

Using Precision in the SOC

Now that you understand precision, it is helpful to think about ways it can be used in a SOC environment. After all, you can have the best statistics in the world, but if you don’t use them effectively then they aren’t providing a lot of value. This begins by making the commitment to track a precision value for each signature deployed on a given detection mechanism. Again, it doesn’t matter which detection mechanism you are using. Whenever you deploy a new signature, it is assigned a precision of 0. As analysts review alerts generated by signatures, they mark each alert as either a TP or FP. As time goes on, these data points determine the precision value, expressed as a percentage.
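As a rough sketch of what that tracking might look like, consider the following. The signature ID and verdict labels are hypothetical, and in a real SOC this data would more likely live in your SIEM or ticketing system than in a script:

from collections import defaultdict

# Running TP/FP tallies per signature, updated as analysts classify alerts
verdicts = defaultdict(lambda: {"TP": 0, "FP": 0})

def record_verdict(sig_id, verdict):
    # verdict is "TP" or "FP", as marked by the reviewing analyst
    verdicts[sig_id][verdict] += 1

def signature_precision(sig_id):
    tp, fp = verdicts[sig_id]["TP"], verdicts[sig_id]["FP"]
    # same formula as above; new signatures with no reviewed alerts start at 0
    return 0.0 if tp + fp == 0 else 100.0 * tp / (tp + fp)

record_verdict(100, "TP")
record_verdict(100, "TP")
record_verdict(100, "FP")
print(round(signature_precision(100), 1))  # 66.7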

 

Precision for Signature Tuning

First and foremost, precision provides a mechanism for tracking the effectiveness of individual signatures. This gives you quite a bit of flexibility to tune to the realities of your own staffing. If you are a small organization with only a single analyst, you can choose to only keep signatures with a high precision, say >80%. If you have a larger staff and more resources to devote to alert review, or if you simply have a low risk tolerance, you can choose to keep signatures with lower precision values, such as >30%. In some instances, you can even define your precision threshold based upon the nature of the threat you are dealing with. Signatures for unstructured or commodity threats might be required to achieve precision >90%, while signatures for structured or targeted threats might be considered effective at >20% precision.

There are a few different paths you can take when you determine that a signature is ineffective based upon the precision standards you have set. One path is to spend more time researching whatever it is that you are trying to detect and attempt to strengthen the signature, either by making your search more specific or by adding more specific content matches to the signature. Alternatively, you can configure low precision signatures to simply log instead of alert (an option in a lot of popular IDS software), or you can configure your analysis console (SIEM, etc.) to hide alerts from signatures below your acceptable precision threshold.
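Continuing that sketch, a simple tuning pass might look something like the following. The threshold and the suggested actions are examples only and should reflect your own staffing and risk tolerance:

def tuning_action(precision_pct, threshold=30.0):
    # threshold is in percent; see the discussion of thresholds above
    if precision_pct >= threshold:
        return "keep alerting"
    return "rework the signature, switch it to log-only, or hide it below the threshold"

print(tuning_action(85.0))  # keep alerting
print(tuning_action(12.0))  # rework the signature, switch it to log-only, or hide it below the threshold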

 

Precision for Signature Comparison

On occasion, you may run into situations where you have more than one signature capable of detecting similar activity. When that happens, precision provides a really useful statistic for comparing those signatures. While some instances might warrant deploying both signatures, resources on detection systems can be scarce, so it might help performance to only enable the most useful one. In a scenario where signature 1 has a precision of 85% and signature 2 has a precision of 22%, signature 1 would be the better candidate for deployment.

 

Precision for Analyst Confidence

Whenever you write a signature, you should consider the analyst who will be reviewing the alerts it generates. After all, they are the ultimate consumer of the product you are creating, so any context or help you can provide the analyst in relation to the signature is valuable.

One item in particular that can help the analyst is the confidence rating associated with a signature. A lot of detection mechanisms allow their signatures to include some form of confidence rating. In some cases, this is automatically calculated based on magic, which is how a lot of commercial SIEMs operate. In other scenarios, it might be a blank field where you can input a numeric value, or a simple low/medium/high option. I would propose that the precision statistic makes a good value for this field. While precision might not be the only factor that goes into a confidence calculation, I certainly believe it should be one of them. Another approach might be to maintain both a signature developer’s subjective confidence rating (human confidence) and a precision-based confidence rating (calculated confidence).
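For platforms that only accept a low/medium/high confidence field, one possible mapping from the precision percentage might look like this; the bucket boundaries are entirely arbitrary examples:

def confidence_label(precision_pct):
    # Bucket boundaries are examples; adjust them to your environment
    if precision_pct >= 75.0:
        return "high"
    if precision_pct >= 40.0:
        return "medium"
    return "low"

print(confidence_label(90.9))  # high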

A confidence rating based on precision is useful for an analyst because it helps them direct their efforts while analyzing alerts. When an alert is generated and the analyst is on the fence about whether it is a TP or FP, they can lean on that confidence value to decide how much investigative effort should go into classifying the alert. If they aren’t finding much supporting evidence that the alert is a TP and it has a precision-based confidence rating of 10%, then it is likely an FP and it might not be worth spending too much time on the alert. On the other hand, if it has a rating of 98%, then the analyst should spend a great deal of time and perform thorough due diligence to determine whether the alert is indeed a true positive.

 

Conclusion

Signature management can be a tricky business, but it is a critical part of NSM detection and analysis. Because of that, it should be a heavily tracked and constantly improving process. While the precision statistic is by no means a new concept, I think most people are unaware of it as it relates to signature management. Having seen precision used very successfully to help manage IDS signatures in a few large and varied organizations, I think it is a statistic that could find a home in many SOC environments.

“How do I find bad stuff on the network?”

The path to knowledge for the practice of NSM almost always begins with that question. It’s because of that question that we refer to NSM as a practice, and to a paid professional in this field as a practitioner of NSM.

Scientists are often referred to as practitioners because of the evolving state of their science. As recently as the mid 1900s, medical science believed that milk was a valid treatment for ulcers. As time progressed, it was found that ulcers were caused by a bacterium called Helicobacter pylori, and that dairy products could actually further aggravate an ulcer. Perceived facts change because, although we would like to believe most sciences are exact, they simply aren’t. All scientific knowledge is based upon educated guesses that utilize the best available data at the time. As more data becomes available, answers to old questions change, and this redefines things that were once considered facts. This is true for doctors as practitioners of medical science, and it is true for us as practitioners of NSM.

Unfortunately, when I started practicing NSM there weren’t a lot of reference materials available on the topic. Quite honestly, there still aren’t. Aside from the occasional blog postings of industry pioneers and a few select books, most individuals seeking to learn more about this field are left to their own devices. I feel that it is pertinent to clear up one very important misconception to eliminate potential confusion regarding my previous statement. There are plenty of books available on the topics of TCP/IP, packet analysis, and various intrusion detection systems. Although the concepts presented in those texts are important facets of NSM, they don’t constitute the practice of NSM as a whole. That would be like saying a book about wrenches teaches you how to diagnose a car that won’t start.

With that in mind, my co-authors and I are incredibly excited to announce our newest project, a book entitled "Applied Network Security Monitoring". This book is dedicated to the practice of NSM. Rather than simply providing an overview of the tools or individual components of NSM, we will speak to the process of NSM and how those tools and components support the practice.

Audience

This book is intended to be a training manual on how to become an NSM analyst. If you’ve never performed NSM analysis, this book is designed to provide the baseline skills necessary to begin performing those duties. If you are already a practicing analyst, my hope is that this book will provide a foundation that allows you to grow your analytic technique and become much more effective at the job you are already doing. We’ve worked with several good analysts who became great analysts because they enhanced their effectiveness with some of the techniques presented here.

The effective practice of NSM requires a certain level of adeptness with a variety of tools. As such, the book will also discuss several of these tools, including the Snort, Bro, and Suricata IDS platforms; the SiLK and Argus netflow analysis tool sets; and other tools such as Snorby and Security Onion.

This book focuses almost entirely on free and open source tools. This is an effort to appeal to the larger group of individuals who may not have the budget to purchase commercial analytic tools such as NetWitness or ArcSight, and also to demonstrate that effective NSM can be achieved without a large budget. Ultimately, talented individuals are what make an NSM program successful. In addition, these open source tools often provide more transparency in how they interact with data, which is incredibly beneficial to the analyst working with data at an intimate level.

Table of Contents

Chapter 1: The Practice of Network Security Monitoring

The first chapter is devoted to defining network security monitoring and its relevance in the modern security landscape. It discusses a lot of the core terminology and assumptions that will be used and referenced throughout the book.

Part 1: Collection

Chapter 2: Driving Data Collection

The first chapter in the Collection section of ANSM provides an introduction to data collection and an overview of its importance. This chapter provides a framework for making decisions regarding what data should be collected using a risk-based approach.

Chapter 3: The Sensor Platform
This chapter introduces the most critical piece of hardware in an NSM deployment, the sensor. This includes a brief overview of the various NSM data types, followed by important considerations for purchasing and deploying sensors. The chapter then covers the placement of NSM sensors on the network, including a primer on creating network visibility maps for analyst use. It also introduces Security Onion, which will be referenced throughout the book as our lab environment.

Chapter 4: Full Packet Capture Data
This chapter begins with an overview of the importance of full packet capture data. It examines use cases that demonstrate its usefulness, and then demonstrates several methods of capturing and storing PCAP data with tools such as Netsniff-NG, Daemonlogger, and OpenFPC.

Chapter 5: Session Data
This chapter discusses the importance of session data, along with a detailed overview of Argus and the SiLK toolset for the collection and analysis of netflow data.

Chapter 6: Protocol Metadata
This chapter looks at methods for generating metadata from other data sets, and the usefulness of integrating it into the NSM analytic process. This includes coverage of the packet string (PSTR) data format, as well as other tools used to create protocol metadata.

Chapter 7: Statistical Data
The final data type that will be examined is statistical data. This chapter will discuss use cases for this data type, and provide some effective methods for its creation and storage. Tools such as rwstats, treemap, and gnuplot will be used.

Part 2: Detection

Chapter 8: Indicators of Compromise
This chapter examines the importance of Indicators of Compromise (IOC), how they can be logically organized, and how they can be effectively managed for incorporation into an NSM program. This also includes a brief overview of the intelligence cycle, and threat intelligence.

Chapter 9: Target Based Detection
The first detection type that will be discussed is target based detection. This will include basic methods for detecting communication with certain hosts within the context of the previously discussed data types.

Chapter 10: Signature Based Detection with Snort
The most traditional form of intrusion detection is signature based. This chapter provides a primer on this type of detection and discusses the usage of the Snort IDS, including a detailed discussion of the creation of Snort signatures. Several practical examples and case scenarios are presented in this chapter.

Chapter 11: Signature Based Detection with Suricata
This chapter will provide a primer on signature based detection with Suricata. This will include several practical examples and use cases.

Chapter 12: Anomaly Based Detection with Bro
Anomaly based identification is an area that has gotten quite a bit more attention in recent years. This chapter will cover Bro, one of the more popular anomaly based detection solutions. This will cover a detailed review of the Bro architecture, the Bro language, and several use cases.

Chapter 13: Early Warning AS&W with Canary Honeypots
Previously only used for research purposes, operational honeypots can be used as an effective means for attack sense and warning. This chapter will provide examples of how this can be done, complete with code samples and deployment case scenarios.

Part 3: Analysis

Chapter 14: Packet Analysis
The most critical skill in NSM is packet analysis. This chapter covers the analysis of packet data with Tcpdump and Wireshark. It also covers basic to advanced packet filtering.

Chapter 15: Friendly Intelligence
This chapter focuses on performing research related to friendly devices. This includes a framework for creating an asset model, and a friendly host characteristics database.

Chapter 16: Hostile Intelligence
This chapter focuses on performing research related to hostile devices. This includes strategies for performing open source intelligence (OSINT) research.

Chapter 17: Differential Diagnosis of NSM Events
This is the first chapter of the book that focuses on a diagnostic method of analysis. Using the same differential technique used by physicians, NSM analysts can be much more effective in the analysis process.

Chapter 18: Incident Morbidity and Mortality
Once again borrowing from the medical community, the concept of incident morbidity and mortality can be used to continually refine the analysis process. This chapter explains techniques for accomplishing this.

Chapter 19: Malware Analysis for NSM
This isn’t a malware analysis book by any stretch of the imagination, but this chapter focuses on methods an NSM analyst can use to determine whether or not a file is malicious.

Authors

Chris Sanders, Lead Author

Chris Sanders is an information security consultant, author, and researcher originally from Mayfield, Kentucky. That’s thirty miles southwest of a little town called Possum Trot, forty miles southeast of a hole in the wall named Monkey's Eyebrow, and just north of a bend in the road that really is named Podunk.

Chris is a Senior Security Analyst with InGuardians. He has extensive experience supporting multiple government and military agencies, as well as several Fortune 500 companies. In multiple roles with the US Department of Defense, Chris helped to significantly further the role of the Computer Network Defense Service Provider (CNDSP) model, and helped to create several NSM and intelligence tools currently being used to defend the interests of the nation.

Chris has authored several books and articles, including the international best seller "Practical Packet Analysis" from No Starch Press, currently in its second edition. Chris currently holds several industry certifications, including the CISSP, GCIA, GPEN, GCIH, and GREM.

In 2008, Chris founded the Rural Technology Fund. The RTF is a 501(c)(3) non-profit organization designed to provide scholarship opportunities to students from rural areas pursuing careers in computer technology. The organization also promotes technology advocacy in rural areas through various support programs.

When Chris isn't buried knee-deep in packets, he enjoys watching University of Kentucky Wildcat basketball, amateur drone building, BBQing, and spending time at the beach. Chris currently resides in Charleston, South Carolina.

Liam Randall, Co-Author

Liam Randall is a principal security consultant with Cincinnati, OH based GigaCo. Originally from Louisville, KY, he worked his way through school as a sysadmin while getting his Bachelor's in Computer Science at Xavier University. He got his start in high security environments writing device drivers and XFS-based software for automated teller machines.

Presently he consults on high volume security solutions for the Fortune 500, research and education networks, various branches of the armed services, and other security focused groups. As a contributor to the open source SecurityOnion distribution and the Berkeley-based Bro-IDS network security package, you can frequently find him speaking about cutting edge blue team tactics on the conference circuit.

A father and a husband, Liam spends his weekends fermenting wine, working in his garden, restoring gadgets, or making cheese. With a love of the outdoors, he and his wife compete in triathlons, swim long distances, and enjoy their community.

Jason Smith, Co-Author

Jason Smith is an intrusion detection analyst by day and junkyard engineer by night. Originally from Bowling Green, Kentucky, Jason started his career mining large data sets and performing finite element analysis as a budding physicist. By dumb luck, his love for data mining led him to information security and network security monitoring, where he developed a fascination with data manipulation and automation.

Jason has a long history of assisting state and federal agencies with hardening their defensive perimeters, and currently works as an Information Security Analyst with the Commonwealth of Kentucky. As part of his development work, he has created several open source projects, many of which have become "best-practice" tools for the DISA CNDSP program.

Jason regularly spends weekends in the garage building anything from arcade cabinets to open wheel racecars. Other hobbies include home automation, firearms, Monopoly, playing guitar, and eating. Jason has a profound love of rural America, a passion for driving, and an unrelenting desire to learn. Jason currently lives in Frankfort, Kentucky.

Release Date

The tentative release date for Applied NSM is during the third quarter of 2013.