Introduction to Zeek Log Analysis

This webcast was originally published on December 19, 2024.

In this video, Troy Wojewoda discusses the intricacies of Zeek log analysis, focusing on how this network security monitoring system can be used to understand traffic and analyze logs effectively. Troy provides insights into different log types, explains the unique identifiers used by Zeek, and shares tips on how to leverage these logs for network forensics and threat detection. Whether you’re new to Zeek or looking to deepen your understanding, this webcast aims to shed light on how Zeek can enhance your network security monitoring strategy.

  • Zeek is a powerful network security monitoring tool that provides detailed insights into network traffic and can be used for intrusion detection, forensic analysis, and auditing.
  • Placement of network security monitoring tools like Zeek is critical for capturing accurate and useful data, with different network positions offering unique visibility into potential threats.
  • Zeek logs a wide range of network protocols and events, offering extensive data for analysis, including connection attempts, file transfers, and protocol-specific activities such as DNS and HTTP.

Transcript

Jason Blanchard

Hello, everybody, and welcome to today’s Black Hills Information Security webcast. My name is Jason Blanchard, and I am the content community director here at Black Hills. And today we got Troy Wojewoda. I almost said it wrong.

Troy Wojewoda. And so Troy's gonna be talking about Zeek analysis, right?

Troy Wojewoda

Log analysis.

Jason Blanchard

Yeah, Zeek log analysis. So the way that this works is we reach out to all the testers and technical people on our team and we say, hey, what would you like to share with the community? Here’s some open time slots.

And so Troy wanted to talk about this. And so Troy has a passion for this. He’s excited about this. And so for the next 50 minutes or so, he’s going to talk about all the things that he thinks would be valuable for you to know.

If you have questions at any time, feel free to ask them in Zoom, or you can ask them in Discord in the live chat section. That way, the community can potentially answer your question before we ever get a chance to get to it.

It takes a community for us all to come together. And then lastly, if you ever need a pen test, Active SOC, ANTISOC, continuous pen testing, or incident response, or if you need anything related to cybersecurity in any way whatsoever, you can reach out to Black Hills Information Security.

And we would love to talk to you about that. There's no pressure. We just want to talk to you and find out what you need. Now with that, Troy, it is all yours.

I’m going to head backstage. If you need anything at any time, I will pop back in and help. And to the rest of the community, thanks so much for joining us today and I’ll see you in a little bit.

Troy Wojewoda

Thanks, Jason. And welcome. Welcome to the last webcast of 2024 for Black Hills Information Security. My name is Troy Wojewoda, and as Jason said, I'm going to be talking about Zeek logs, the analysis of Zeek logs, and really an introduction to the logging framework that Zeek, the network security monitoring system, produces.

And so I actually got the idea to put this webcast together from one of the SOC analysts. We were talking and analyzing all the data and telemetry coming into the Active SOC that BHIS has.

One of the analysts was basically saying it would be really nice to have some introductory material for going over Zeek logs. And I thought, this is a perfect opportunity to do this webcast for those of you who are maybe familiar with Zeek, maybe hearing about Zeek for the first time, or maybe have used Zeek for a while but haven't really gotten a handle on the log analysis portion of what this network security monitoring solution does.

And so this webcast is meant to shed light on how Zeek analyzes traffic, but most importantly on the logs that it produces, along with some tips and tricks for analyzing those logs.

And so without further ado, let's jump right into it. Zeek is actually a technology that's been around for a very long time; it's coming up on 30 years, believe it or not. Vern Paxson, who developed the original network security monitoring solution known as Bro, or the Bro IDS, started it in 1995 in Berkeley, California, and it occupied the academic space for a very long time.

And it wasn't until, say, the 2010s that the technology started coming into, not so much the commercial space, but the hands of security practitioners who were using it to monitor and protect their network environments.

And it's gone through a bit of a name change. Bro was what it was originally called, and I think it was around 2018 that the developers of the technology relabeled it as Zeek.

And a lot of the core technology in Zeek, and in Bro before it, hasn't really changed. There has been some adaptation and some improvement over time, and it continues to improve.

There's actually a commercial version of the Zeek platform called Corelight, built by the same developers who open-source the technology as Zeek.

So I do like to at least give them credit for that. And for those of you who want to look toward a commercial product that has this technology built in, Corelight would be the way to go.

However, those developers also open-sourced it, so you can go to GitHub or to their documentation and use this technology yourself.

So I also like this quote from Richard Bejtlich, who is now with Corelight. When he made this quote, it resonated with me very much.

And that is, even though Bro, or Zeek today, was labeled an intrusion detection system, it's a lot different from something like Snort or Suricata, which are traditional network intrusion detection systems.

It does a lot more, and what resonated with me in that quote is that it does this rich data extraction, this metadata extraction of all the network objects that the sensor sees.

And while it's doing so, it's logging that activity. So I look at it more as a source of network forensics, or simply a way of capturing those network events of interest.

And we can go back and look at them, hunt against them, and look for all kinds of things. I'll show some examples toward the end of this presentation where we can use the platform not only to look for intrusions and threats in our environments, but also to do auditing and look for vulnerabilities, just by looking at the active state of network connections.

And so I really do believe it resonates more with the forensic side of the house than with intrusion detection, although we can definitely use it to create intrusion detection signatures and look for threats in near real time as well.

So just real quick, before we jump into the actual analysis of the logs, I wanted to talk a little bit about network security monitoring placement. Placement actually does matter.

And it matters when you're analyzing the logs, believe it or not. Wherever the network security monitoring stack sits, logs will be produced, but the directionality of the artifacts we're looking at depends on that position.

Consider this basic network diagram, where we have an Internet point of presence, say a perimeter firewall, and then a DMZ and an inside network.

If you had network security monitoring placed on the outside of your environment, say either inline with your perimeter firewall or above it, at your Internet service provider connection or something like that, and NAT (network address translation) was in play,

then all the connections you see originating from your environment will have your own public IP addresses as the source IPs going to their destinations, and vice versa coming back.

All the inbound connections will be aimed at the IP address space provided by your Internet service provider. So you get visibility into the threats that are sourced from the Internet coming inbound, and you also have the ability to see data leaving your environment.

Although, with network address translation in place, it would be a little more difficult to determine which internal host or hosts are ultimately creating those connections.

So the trade-off you get by being positioned on the outside is a better vantage point on attacks that are sourced from the outside, coming in and hitting your perimeter. That's definitely important in this day and age; in 2024 a number of articles and blogs have been put out about activity exploiting perimeter-facing devices,

such as the Palo Alto incident that happened at the beginning of the year with that 0-day. We see VPN concentrators come up time and time again, where 0-days are discovered and threat actors leverage them to gain access to environments. If you're not positioned on the outside of those gateways or concentrators, you're not going to see that traffic, and it's going to be very difficult to detect from a network security monitoring perspective.

Conversely, if you're looking from the inside, you have the opportunity to look at things like beaconing. From a use-case perspective, you can see the source IPs, which would be your internal host addresses, making connections outbound.

So I think both of these vantage points matter, and I give two different examples here. The first example would be an outbound Internet connection originating from inside your environment.

As you can see down below, you have 192.168.55.101 with an ephemeral port going out over port 22 to that 3.171 address. From the inside, you would see that and know that the internal host was making that connection, or at least attempting to make it.

And from the outside, you would see the NATed IP as the source address going to that same destination. The second scenario would be some external IP making, or attempting to make, an inbound connection over port 443, aimed at, say, an SSL/TLS-based VPN concentrator or a website that you have in your DMZ.

Obviously, if you're positioned on the outside, you can see those traffic attempts, whether or not they're successful. But if a perimeter device does get compromised from the outside, it's going to be very difficult to see how any of that traffic makes it internally.

That would be very difficult to decipher. So if you're positioned in just one of these places, you get good visibility into the traffic from that perspective, but you lose the opportunities on the other side.

And when it comes to Zeek itself, it really has two different modes of operation. The first one is what predominantly gets used when we talk about Zeek: you install Zeek on a Linux-based system and feed it from one or more spans.

You can get these via port mirroring on a switch, or you can use inline taps or something similar. You position a tap at the choke point you're monitoring and have that traffic copied to an interface on the system Zeek is installed on, whether you use an inline tap or a span.

Either way, Zeek is passively receiving the network traffic, and that's an important distinction from some other IDSs and IPSs, where it's common to place the device inline so it can stop or prevent threat-based network traffic. Zeek is an entirely passive solution.

Zeek was never designed to be an intrusion prevention system; it was only designed to be passive. So in that first mode of operation, you would have one or more spans or mirrored traffic feeds going to your sensor, and Zeek producing logs from that data as it comes in. The other use case, which I find very useful, is when you have saved PCAPs. I use this a lot in forensic or IR scenarios, where you can go get tactical packet captures.

You can take those packet captures and run them through Zeek. You can also do this to test your Zeek sensor and to create custom signatures, custom logging, and so on, by taking PCAPs and running them with the Zeek software.

So you don't necessarily have to have Zeek listening on a network interface. You can simply take saved PCAPs, run Zeek on them, and it will produce the same logs as if it were sniffing that traffic on one of its network interfaces.
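For the offline mode, here is a minimal Python sketch of driving Zeek against a saved capture. It assumes a `zeek` binary is on the PATH; `-r` (read packets from a pcap file) and `-C` (ignore checksum errors) are standard Zeek command-line options, while the helper names and file paths are hypothetical.

```python
import subprocess
from pathlib import Path


def zeek_offline_command(pcap, ignore_checksums=True):
    """Build the argument list for running Zeek over a saved capture file.

    -r reads packets from a pcap instead of sniffing a live interface.
    -C ignores bad TCP/IP checksums, which are common in captures taken
    on hosts where the NIC offloads checksum calculation.
    """
    cmd = ["zeek"]
    if ignore_checksums:
        cmd.append("-C")
    cmd += ["-r", str(pcap)]
    return cmd


def run_zeek_offline(pcap, out_dir):
    """Run Zeek on a pcap and return the names of the .log files produced.

    Zeek writes its logs into its working directory, so we run it from
    out_dir and pass the pcap as an absolute path.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pcap_path = Path(pcap).resolve()
    subprocess.run(zeek_offline_command(pcap_path), cwd=out, check=True)
    return sorted(p.name for p in out.glob("*.log"))
```

Calling `run_zeek_offline("incident.pcap", "zeek_out")` (hypothetical paths) should leave conn.log plus whichever protocol logs apply in `zeek_out`, the same kinds of logs a live sensor would write.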

By default, Zeek produces its logs in tab-separated values (TSV) format. What we're seeing here are two different examples of the same traffic, just output in two different log formats.

Zeek also has the ability to create the logs in JSON format. I'm a proponent of the tab-separated values format.

If I'm operating on the sensor where Zeek is producing the logs, it's more of a personal preference for me, but it just makes it easier to manipulate field values and handle them on the command line.

However, JSON is definitely a very popular format. One of the benefits of using JSON shows up if you're collecting these logs and want to pump them into some type of centralized SIEM.

If you look at both screenshots here, in the first one you see that each field is separated by a tab, and at the very top there are header lines.

The line that says types tells you the actual type of each field. That's because Zeek actually is a programming language.

Believe it or not, if you didn't know that already, Zeek is a programming language built around network artifacts. What's interesting about that is the different fields we see on the first line: ts stands for timestamp, uid for unique ID.

And I'll talk about these fields in a minute here. But the types line, which is the next line down, tells you the actual type of each field. When you get into programming languages and different data types, that's what these types represent.

So ts is of type time, uid is a string, and so on and so forth. But you can see there's addr and port, and in the Zeek programming language, addr (the address type) is a type and port is a type.
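Because the `#fields` and `#types` header lines line up column for column, a script can recover each field's Zeek type directly from a TSV log. A minimal sketch, assuming the standard tab-separated header layout; the function name is hypothetical and the sample header is trimmed to a few conn.log columns.

```python
def parse_zeek_header(lines):
    """Map each field name to its Zeek type using the #fields and #types lines."""
    fields, types = None, None
    for line in lines:
        if line.startswith("#fields\t"):
            fields = line.rstrip("\n").split("\t")[1:]
        elif line.startswith("#types\t"):
            types = line.rstrip("\n").split("\t")[1:]
        elif not line.startswith("#"):
            break  # the header is over once data rows begin
    if fields is None or types is None or len(fields) != len(types):
        raise ValueError("missing or mismatched #fields/#types header")
    return dict(zip(fields, types))


header = [  # trimmed, illustrative conn.log header
    "#separator \\x09\n",
    "#path\tconn\n",
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto\n",
    "#types\ttime\tstring\taddr\tport\taddr\tport\tenum\n",
]
schema = parse_zeek_header(header)
print(schema["ts"], schema["id.orig_h"], schema["id.resp_p"])  # time addr port
```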

So it's an interesting approach to a programming language, geared toward network traffic and network artifacts. And if we look at the JSON output, we see the same data.

But notice that on each line, every field name is stored alongside its field value. By nature of doing that, the JSON output is always going to be larger than tab-separated output.

So one of the trade-offs is that JSON makes it easier to pump into a SIEM, because every single field value comes with its field name, which makes parsing that data pretty trivial.

However, it takes up more disk space. So if you're saving these logs locally on a host or system, or shipping them off somewhere for long-term storage, you're going to have much larger files

for the same amount of data being presented. However, both formats do compress very well, so there is that.
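To make the size trade-off concrete, this sketch renders the same hypothetical conn.log record in both layouts and gzips each. The field subset and values are made up; repeating one identical row exaggerates how well both compress, so real logs will shrink less.

```python
import gzip
import json

FIELDS = ["ts", "uid", "id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p", "proto"]
# Hypothetical sample record, for illustration only.
RECORD = [1702900000.123456, "CHhAvVGS1DHFjwGM9", "10.0.0.5", 49152, "203.0.113.10", 443, "tcp"]


def as_tsv(values):
    """One TSV row: values only, field names live once in the header."""
    return "\t".join(str(v) for v in values) + "\n"


def as_json(values):
    """One JSON line: every value carries its field name."""
    return json.dumps(dict(zip(FIELDS, values)), separators=(",", ":")) + "\n"


tsv = as_tsv(RECORD) * 1000  # the same 1,000 records in each format
jsn = as_json(RECORD) * 1000
for name, text in (("tsv", tsv), ("json", jsn)):
    raw = text.encode()
    print(name, "raw bytes:", len(raw), "gzipped bytes:", len(gzip.compress(raw)))
```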

When we're dealing with something like JSON, one of the utilities I like to recommend is jq. One interesting thing about Zeek logs is that, even though JSON is a supported format, if you've worked with JSON before and know that JSON data can be represented in nested relationships, you don't have to worry about that.

With Zeek logging, pretty much everything is flat in that respect. The screenshot here on the right shows an example of how we can cat out the contents of the HTTP log, pipe that to jq, and give jq parameters telling it we want the originating host and the responding host.

And I'm going to talk a little more about these fields and what they actually represent, but this is a way we can pick out specific field values and print them to standard out.

What's interesting with jq is that you can also put in arbitrary text. The screenshot to the left, which is the second command shown below, shows how you can print out messages such as "these are my host ports" and "this is the client request and server response."
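The same field picking can be sketched in Python for anyone scripting outside jq. Zeek's JSON writer flattens the connection tuple into dotted key names such as `id.orig_h`, so they are literal keys rather than nested objects (in jq itself they need quoting, for example `.["id.orig_h"]`). The helper name and the sample http.log lines are hypothetical.

```python
import json


def pluck(json_lines, *keys):
    """Yield a tuple of the requested keys from each JSON log line.

    Missing keys come back as "-", mirroring Zeek's unset-field marker.
    """
    for line in json_lines:
        if line.strip():
            rec = json.loads(line)
            yield tuple(rec.get(k, "-") for k in keys)


http_log = [  # hypothetical http.log entries in Zeek JSON style
    '{"ts":1702900000.1,"uid":"C1","id.orig_h":"10.0.0.5","id.resp_h":"203.0.113.10","method":"GET","uri":"/"}',
    '{"ts":1702900000.4,"uid":"C1","id.orig_h":"10.0.0.5","id.resp_h":"203.0.113.10","method":"GET","uri":"/index.html"}',
]
for orig, resp, method in pluck(http_log, "id.orig_h", "id.resp_h", "method"):
    # Arbitrary text around the values, like the second jq example.
    print(f"client {orig} sent {method} to server {resp}")
```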

So it is kind of interesting that you can use it for that purpose. But the default format Zeek writes its logs in is tab-separated.

There are actually two different utilities you can use when you're dealing with Zeek logs on a Zeek sensor. One is zeek-cut, which comes with the Zeek installation.

You can see in the first command there, we're catting out the contents of the conn log and piping that to /opt/zeek/bin/zeek-cut. Wherever your primary Zeek binary is, that's where zeek-cut is going to exist as well.

And the nice part about zeek-cut is that you can target the field names at the top there: id.orig_h, proto, service, duration, or whichever ones we want.

You just name the fields you want to pull out. If you know the names of the fields, you pass them as arguments to zeek-cut and it'll cut out those fields for you.

You can also use the native Linux utility cut. Because the log is already tab-separated, you don't have to give it a special delimiter; you just tell cut which fields you want by number.

Numbering starts from one for the first column, and so on and so forth. So, to show an example of cutting out the same contents with two different utilities: with one, we specify the column names using zeek-cut.

With the other, we specify the actual column numbers starting from the left side, where the first column is column one. So we're cutting out columns 10, 11, 18 and 20.

The advantage of something like zeek-cut is that you can specify the fields in different orders. Say you wanted the resp_ip_bytes column first.

You could specify that column first, in whichever order you want it displayed on standard out. With the cut utility, even if you specify the numbers in a different order, say 18, 10, 11, 20, it still presents them in the order in which the columns appear in the file.

It will still present them as 10, 11, 18 and 20. There are other Linux utilities you can use to rearrange the order, but that's just off the cuff.
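The ordering difference between the two tools can be mimicked in a few lines of Python. This is a sketch assuming the default conn.log column order for the first 20 fields (newer Zeek versions may add fields, which is one reason selecting by name is safer than by number); the function names and row values are hypothetical.

```python
def by_name(header_fields, row, names):
    """zeek-cut style: pick columns by field name, in the order requested."""
    index = {name: i for i, name in enumerate(header_fields)}
    return [row[index[n]] for n in names]


def by_number(row, numbers):
    """cut -f style: 1-based column numbers, always emitted in file order."""
    return [row[n - 1] for n in sorted(set(numbers))]


CONN_FIELDS = [
    "ts", "uid", "id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p", "proto",
    "service", "duration", "orig_bytes", "resp_bytes", "conn_state",
    "local_orig", "local_resp", "missed_bytes", "history", "orig_pkts",
    "orig_ip_bytes", "resp_pkts", "resp_ip_bytes",
]
row = ["1702900000.123456", "C1", "10.0.0.5", "49152", "203.0.113.10", "443",
       "tcp", "ssl", "1.52", "517", "4321", "SF", "T", "F", "0", "ShADadFf",
       "9", "893", "8", "4653"]  # hypothetical conn.log row

print(by_number(row, [18, 10, 11, 20]))  # still columns 10, 11, 18, 20
print(by_name(CONN_FIELDS, row, ["resp_ip_bytes", "orig_bytes"]))  # as requested
```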

So zeek-cut has a bit of an advantage if you want the columns displayed in a different order. And when Zeek creates logs, it creates a crap ton of logs.

One of the things I wanted to highlight here is what I think is the most recent list of all the different logs that Zeek creates. There's a cheat sheet from Corelight, and I'll share its location a little bit later.

That cheat sheet shows the different log types, all the different fields, and what those values represent. What I wanted to show here are some of the more common ones that you're almost always going to see,

almost no matter where you place your network security monitoring sensor. As long as it's seeing network traffic in your environment, it's likely that you're going to see the ones highlighted here in yellow.

And I'm going to start with the conn log, because pretty much everything starts from the conn log. There are a couple of exceptions, but for the most part, when we're dealing with network connections, there will always be a conn log entry.

Even network connections that never get established will exist in the conn log, so connection attempts will be in the conn log too.
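One way to pull those attempts out is the conn log's `conn_state` field: in Zeek's documentation, `S0` means a connection attempt was seen with no reply and `REJ` means the attempt was rejected. A minimal sketch over JSON-style records; the helper name and sample entries are hypothetical.

```python
# Zeek conn_state values for connections whose handshake never completed.
ATTEMPT_STATES = {
    "S0": "connection attempt seen, no reply",
    "REJ": "connection attempt rejected",
}


def connection_attempts(conn_records):
    """Yield (uid, responder, port, meaning) for unanswered or rejected attempts."""
    for rec in conn_records:
        meaning = ATTEMPT_STATES.get(rec.get("conn_state"))
        if meaning:
            yield rec["uid"], rec["id.resp_h"], rec["id.resp_p"], meaning


conn_log = [  # hypothetical conn.log entries
    {"uid": "C1", "id.resp_h": "203.0.113.10", "id.resp_p": 443, "conn_state": "SF"},
    {"uid": "C2", "id.resp_h": "203.0.113.20", "id.resp_p": 22, "conn_state": "S0"},
    {"uid": "C3", "id.resp_h": "203.0.113.30", "id.resp_p": 3389, "conn_state": "REJ"},
]
for attempt in connection_attempts(conn_log):
    print(attempt)
```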

Furthermore, if Zeek identifies any network protocol that it understands, such as the protocols displayed in front of us here, there will be an associated log for that protocol, tied to the connection events already logged in the conn log.

That may sound like a lot, but to summarize: if Zeek sees a connection over a protocol that it understands, the overall connection will be logged in the conn log, and the associated application events will be logged in their respective application logs.

So if there's DNS activity and it gets picked up by Zeek, the overall connection will be logged in the conn log, but the specifics of the DNS application activity will be logged in the DNS log.
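Since both logs carry the same uid, stitching a connection to its application events is a simple join. A minimal sketch assuming JSON-style records; `query` and `qtype_name` are dns.log fields, while the helper name and sample values are hypothetical.

```python
from collections import defaultdict


def join_on_uid(conn_records, app_records):
    """Pair each conn.log record with the application-log records sharing its uid."""
    by_uid = defaultdict(list)
    for rec in app_records:
        by_uid[rec["uid"]].append(rec)
    for conn in conn_records:
        yield conn, by_uid.get(conn["uid"], [])


conn_log = [  # hypothetical entries
    {"uid": "CdA1", "proto": "udp", "service": "dns", "id.resp_p": 53},
    {"uid": "CtB2", "proto": "tcp", "service": "ssl", "id.resp_p": 443},
]
dns_log = [{"uid": "CdA1", "query": "www.example.com", "qtype_name": "A"}]

for conn, dns in join_on_uid(conn_log, dns_log):
    queries = [d["query"] for d in dns] or ["(no DNS events)"]
    print(conn["uid"], conn["service"], queries)
```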

The files log is a little bit different in that it's not purely based on connections, although everything originates from connections.

If Zeek sees any file object being transferred, whether it's being sent out or received, irrespective of direction, it'll record that event in the files log and tell you which connections it saw that file associated with.

The HTTP log, obviously, is where HTTP events are logged when Zeek sees HTTP data. And then there's the SSL log, which may be a little misleading to some folks who would say, well, the SSL protocol is dead and has evolved into TLS.

Yes and no. The developers of Zeek call both SSL and TLS "SSL," so it all goes in the SSL log; TLS, in this context, is just an evolution of SSL.

So TLS events, even for the latest versions of TLS, will be recorded in the SSL log.

And then there are some newer types of logs we're seeing. Like I said, the developers of Zeek are continually improving it and adding capability.

I started using Zeek, or Bro at the time, in 2013; I think it was Bro version 2-point-something. And then we saw the evolution into 3.x, 4.x and 5.x.

And now we're at 7.1, which was the latest version of Zeek when I put these slides together. We can see there's a PostgreSQL log now, and in version 6 they released three different log types: two different LDAP log types and a QUIC log.

In case you didn't know, QUIC is the transport protocol that HTTP/3 runs on. HTTP has gone through a couple of different evolutions over time.

HTTP/1.0 and 1.1 are the legacy versions of HTTP that we typically associate with opening HTTP traffic in Wireshark, and then header compression and some other compression formats came with HTTP/2.

And then HTTP/3 was built on top of QUIC, which is essentially the evolution of HTTP into version 3.

Just a quick note on that: HTTP/3 actually communicates over UDP, on UDP port 443.

So if you see that traffic in your environment and you're curious what it is, it may well be HTTP/3 traffic, whereas HTTP/1 and HTTP/2 traditionally operate over TCP.
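A quick way to surface that traffic is to filter conn records for UDP flows to port 443 and look at whatever Zeek put in `service`. This is a sketch: the helper name and sample entries are hypothetical, and the exact service label a QUIC-aware Zeek writes (shown here as "quic") should be treated as version-dependent.

```python
def udp_443_connections(conn_records):
    """Yield (uid, service) for UDP flows to port 443, likely QUIC / HTTP/3."""
    for rec in conn_records:
        if rec.get("proto") == "udp" and rec.get("id.resp_p") == 443:
            yield rec["uid"], rec.get("service") or "-"


conn_log = [  # hypothetical conn.log entries
    {"uid": "Cq1", "proto": "udp", "id.resp_p": 443, "service": "quic"},
    {"uid": "Ct2", "proto": "tcp", "id.resp_p": 443, "service": "ssl"},
    {"uid": "Cd3", "proto": "udp", "id.resp_p": 53, "service": "dns"},
]
for uid, service in udp_443_connections(conn_log):
    print(uid, service)
```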

That's a huge leap from what we're used to with a lot of these traditional protocols, which don't really switch from TCP to UDP.

Well, that has happened with HTTP/3, and now Zeek has a log for it. And looking at this list, we see a lot of other log types as well.

So you can imagine how powerful this could be. Depending on where your network security monitoring platform is positioned, some of these will likely not be applicable.

Hopefully they won't be applicable. Especially if you have a network security monitoring stack on the outside of your environment, you shouldn't really be seeing traffic like SMB, RDP, or other data associated with internal communications.
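On an outside sensor, that expectation can be turned into a simple check. A minimal sketch: the watch list is a hypothetical starting point to adjust for your environment, and it relies on conn.log's `service` field, which can hold several comma-separated analyzer names.

```python
# Hypothetical watch list: services you would not expect to see on a sensor
# positioned outside the perimeter. Adjust to your environment.
INTERNAL_ONLY = {"smb", "rdp", "dce_rpc", "ntlm", "kerberos"}


def unexpected_on_perimeter(conn_records, watch=INTERNAL_ONLY):
    """Yield (uid, orig, resp, matched services) for internal-style traffic."""
    for rec in conn_records:
        # service may be missing, "-", or a comma-separated list.
        services = {s for s in (rec.get("service") or "").split(",") if s}
        hits = services & watch
        if hits:
            yield rec["uid"], rec["id.orig_h"], rec["id.resp_h"], sorted(hits)


conn_log = [  # hypothetical outside-sensor entries
    {"uid": "C1", "id.orig_h": "198.51.100.7", "id.resp_h": "203.0.113.5", "service": "ssl"},
    {"uid": "C2", "id.orig_h": "198.51.100.9", "id.resp_h": "203.0.113.5", "service": "gssapi,smb,ntlm"},
    {"uid": "C3", "id.orig_h": "198.51.100.9", "id.resp_h": "203.0.113.6", "service": "rdp"},
]
for finding in unexpected_on_perimeter(conn_log):
    print(finding)
```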

And the other thing I wanted to point out here is that you might be looking at some of these protocols thinking, well, there's encryption in play, right? There's encryption with SSL, with SSH, with RDP, and so on and so forth.

And that is true; encryption is in play. But even with these encrypted protocols, there's an opportunity to gather some data

that is presented unencrypted. You have the connections occurring between IPs and the ports associated with them. And the negotiation of the crypto algorithms, how the two sides are going to connect up cryptographically, is typically transferred in the clear.

And so when those things happen, that data can be collected and logged. So we have a lot of opportunities to gain visibility into network traffic, even if it happens to be encrypted.
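For TLS, the SSL log keeps handshake metadata such as `version`, `cipher`, and `server_name` (the SNI from the ClientHello). As a sketch of the auditing idea, this counts negotiated versions and lists legacy ones; which versions count as "legacy" is an assumption of this example, and the sample records are hypothetical.

```python
from collections import Counter

# Zeek ssl.log version strings treated as legacy in this sketch.
LEGACY_VERSIONS = {"SSLv2", "SSLv3", "TLSv10", "TLSv11"}


def tls_overview(ssl_records):
    """Count negotiated versions and list legacy handshakes with their SNI."""
    versions = Counter(rec.get("version", "-") for rec in ssl_records)
    legacy = [
        (rec["id.resp_h"], rec.get("server_name", "-"), rec["version"])
        for rec in ssl_records
        if rec.get("version") in LEGACY_VERSIONS
    ]
    return versions, legacy


ssl_log = [  # hypothetical ssl.log entries
    {"id.resp_h": "203.0.113.10", "server_name": "www.example.com", "version": "TLSv13"},
    {"id.resp_h": "203.0.113.20", "server_name": "vpn.example.com", "version": "TLSv13"},
    {"id.resp_h": "203.0.113.40", "server_name": "legacy.example.com", "version": "TLSv10"},
]
versions, legacy = tls_overview(ssl_log)
print(dict(versions))
print(legacy)
```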

So let's jump into the actual logs themselves. As I mentioned before, every time Zeek sees a connection, or even a connection attempt, that connection will be logged in the conn log.

What we're looking at right here are three different recorded events. The first one at the top is the event captured in the conn log, which gives us a summary of the connection. The first column in that log, and in many of the logs we see, is the ts field, or timestamp.

Zeek records these timestamps as Unix epoch time, which is the number of seconds since January 1, 1970 (UTC), and that's why you just see this numerical value, right?

And then after the dot we have fractional seconds, down to the microsecond, so we can get very granular in the timestamp being recorded.

You can also see the connection ID. I haven't really talked about the uniqueness of this connection ID yet, but when Zeek sees a connection, it creates a unique connection ID for it.

And that unique identifier follows through all the various Zeek logs associated with that connection. So what we're looking at here is a conn log entry at the very top saying we saw this IP and port talk to this other IP and port.

The transport layer, or layer 4, protocol was TCP, and the application layer protocol was HTTP. And the next value after that is duration.

And an important thing I wanted to highlight here about timestamps is that we don't have a timestamp for when the connection ended. We have a timestamp for when the connection started, but not for when it ended.

But we do have duration. So we can take the timestamp of when the connection started, add the duration, which is also in seconds, and calculate when the connection was last seen, or when the connection ended from Zeek's point of view.
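That arithmetic is simple to script: convert the epoch `ts` to a UTC datetime and add `duration`. A minimal sketch; the function name and sample values are hypothetical, and since Zeek leaves duration unset ("-") for some connections, callers are assumed to map that to `None`.

```python
from datetime import datetime, timedelta, timezone


def conn_window(ts, duration):
    """Return (start, end) UTC datetimes from a conn.log ts and duration.

    ts is seconds since 1970-01-01 UTC and duration is also in seconds.
    An unset duration (passed as None) is treated as zero.
    """
    start = datetime.fromtimestamp(ts, tz=timezone.utc)
    end = start + timedelta(seconds=duration or 0)
    return start, end


start, end = conn_window(1702900000.25, 1.5)  # hypothetical conn.log values
print(start.isoformat(), "->", end.isoformat())
```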

But what's interesting is that when we look at the HTTP log, which is the next two rows, we see that the connection ID is the same. That makes sense, right?

Zeek saw a connection over HTTP and logged the HTTP events in the HTTP log. But the timestamps are a little bit different. When I first discovered this, I thought something was wrong with my Zeek deployment.

I thought there was some error somewhere. It didn't make sense to me, because I thought, well, everything goes back to the connection, so why isn't the timestamp the same as when the connection started?

And then it dawned on me: this is a TCP connection, so it takes some time to establish the three-way handshake. There's the SYN, SYN-ACK, ACK, which has nothing to do with the HTTP protocol at that point; it only has to do with establishing a TCP session.

What follows afterwards is the application layer protocol, in this case HTTP. So Zeek is telling us a few things here: not only did it record that one connection, or session, over HTTP,

it also logged the HTTP events. There were actually two different HTTP events over that same connection. So where the connection ID matches, the timestamps are different, because each event's timestamp is when that event actually happened.

So even though the connection's original timestamp is logged in the conn log, the first event over that session has its own timestamp, and the second event has another.

If we look a little further toward the right of these HTTP events, there was a GET, which is the HTTP method issued by the client to the server there at 205.185, with a URI.

And then you can see there was another GET request over that same TCP connection, to /File/911DLL.

The trans_depth field doesn't really get talked about much. It's the transaction depth of that event: if multiple events happen over the same connection in the same application protocol, the trans_depth will increment.

So, hypothetically, if there were a third request, a fourth, and a fifth, the trans_depth number would increment for each one. The first one seen is 1, the second is 2, and so on and so forth.
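To reconstruct the sequence of requests on each connection, group http.log records by uid and order them by trans_depth. A minimal sketch; the helper name and sample entries are hypothetical.

```python
from collections import defaultdict


def transactions_by_connection(http_records):
    """Group http.log records by connection uid, ordered by trans_depth."""
    grouped = defaultdict(list)
    for rec in http_records:
        grouped[rec["uid"]].append(rec)
    return {uid: sorted(recs, key=lambda r: r["trans_depth"])
            for uid, recs in grouped.items()}


http_log = [  # hypothetical entries: two requests over one connection
    {"uid": "C1", "ts": 1702900000.42, "trans_depth": 2, "method": "GET", "uri": "/second"},
    {"uid": "C1", "ts": 1702900000.31, "trans_depth": 1, "method": "GET", "uri": "/"},
    {"uid": "C2", "ts": 1702900003.10, "trans_depth": 1, "method": "POST", "uri": "/login"},
]
for uid, txns in transactions_by_connection(http_log).items():
    for t in txns:
        print(uid, t["trans_depth"], t["ts"], t["method"], t["uri"])
```

Each request keeps its own `ts`, which is why these timestamps differ from the connection's start time in the conn log.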

Not all application protocols carry this trans_depth field. The other one I'm aware of is SMTP, for exchanging email.

It's not very common, but the protocol supports it: once an SMTP connection is set up with the SYN, SYN-ACK, ACK three-way handshake, the MTAs talk to each other over that TCP session and transfer email.

And within that TCP session, the protocol allows the sending MTA, the sending mail transfer agent, to send multiple messages over the same session.

If that were the case, each message sent would be another transaction, and the trans_depth would increment. So there's a lot going on there with timestamps and with the events occurring over, and linked back to, that unique connection ID.

And speaking of unique IDs, we've hit on the connection ID a lot; that gets established for every connection. However, there are a couple of other IDs associated with Zeek logs.

one is the IDs that are associated with the host, the originating and responding hosts and their associated ports. I’ll talk a little bit more about originator, and responder in a minute.

but just think of that as like the traditional source and destination addresses and ports that are being that are seen by Zeek. And then there’s the fuid, which is the file’s unique identifier.

So when, when, as I said earlier, when Zeke sees a file get trans, file object gets transferred over the wire, irrespective of the direction that that that file occurred, that they’ll, it’ll create a file ID unique to that.

And then there’ll be additional things that we could, we can gain and glean off of for, for that given file and that file id. And then subsequently there’s additional opportunities to find that file id.

So for example, if the file was transferred over the HTTP, protocol, the HTTP log will carry will, will. Will log that file ID as either the originator or the responder.

So which, which host actually sent the file or transmitted the file, will, will show up in that log. there’s also this, somewhat of a newer log, I think it came out, either in version four or five of Zeek.

but it’s the PE log. So Portable executable. So Windows Portable executable file. If Zeek sees that over the wire, it’ll log that in the PE log and it’ll also associate the file ID that was associated to it.

And then there’s this concept of the parent fuid, which is if Zeek see something like a container file, like an archive, like a zip or something like that, there’ll be a file ID associated to that archive.

But then Zeek will then identify the files that are with, inside that container object. and those have, they’ll have their own FU id.

So going, linking back to the parent up uid. So then you can connect the dots and for a little bit of visual. How this looks is two different examples here.

In the first one, we have a connection. It’s logged in the conn log, it has a uid (the connection ID), and it’s of type DNS.

So there’ll be an associated DNS log entry with that same connection ID, which tells us more about what happened over DNS. Then we have an example that’s a little more involved: a connection ID associated with HTTP traffic,

like we saw a few slides ago. In that HTTP traffic, one or more files were observed, so the files log will also contain the ID of the connection the files came from.

And if one of those was a PE file, a Windows executable, you’ll have its fuid in the PE log, so you can correlate back to the files log and ultimately to the connection it came from.
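The PE-to-files-to-connection chain described above is essentially two lookups. Here is a minimal Python sketch that walks it with in-memory records, as you might get from JSON-formatted logs. It assumes a recent Zeek where files.log carries a single `uid` column, and that pe.log stores the file ID in a column named `id` (worth checking against your version); the records themselves are hypothetical and heavily trimmed.

```python
def trace_pe_files(conn_log, files_log, pe_log):
    """Walk pe.log -> files.log -> conn.log using fuid and uid as join keys."""
    conns = {c["uid"]: c for c in conn_log}
    files = {f["fuid"]: f for f in files_log}
    chains = []
    for pe in pe_log:
        f = files.get(pe["id"])  # assumption: pe.log keeps the fuid in "id"
        if f is None:
            continue  # PE record whose files.log entry we don't have
        c = conns.get(f["uid"])  # files.log points back at the connection uid
        chains.append((f["uid"], c["service"] if c else None, f["fuid"], f["mime_type"]))
    return chains

# Hypothetical records: one HTTP connection that carried one Windows executable.
conn_log = [{"uid": "CHttp01", "service": "http", "id.resp_h": "203.0.113.50"}]
files_log = [{"fuid": "FPe001", "uid": "CHttp01", "mime_type": "application/x-dosexec"}]
pe_log = [{"id": "FPe001", "is_exe": True}]

for chain in trace_pe_files(conn_log, files_log, pe_log):
    print(chain)
```

Building dictionaries keyed on uid and fuid first keeps each lookup constant-time, which matters once these logs hold millions of rows.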

I also have the notice log to the side here. Zeek has this capability where, if certain events happen, you can say: if XYZ happens, send that data to the notice log.

There are some things Zeek does out of the box. If you’ve used Zeek for a while, you’ve probably noticed that the majority of events in your notice log are SSL certificate validation errors.

That’s because certificate validation is one of the things Zeek does out of the box, and it has to trust certificate CA stores and so on; you have to add all of that to Zeek for it to know whether a certificate is trusted.

So those are typically the noisiest things that come out of the box. But the notice log exists, and we can leverage it if we need to use Zeek more as an intrusion detection system, or if we’re just interested in certain events or sequences of events. We can have those events written to the notice log, and that can help us too.

For instance, say we can’t send all of our Zeek data to our SIEM, but we could send just our notice log. In that case, we might want to record more events to the notice log.
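Since certificate-validation notices tend to dominate, a quick breakdown by notice type helps separate them from anything you added yourself. This is a minimal Python sketch over in-memory notice.log records; it treats `SSL::Invalid_Server_Cert` as known noise, while `MySite::Suspicious_Download` is a hypothetical custom notice type used only for illustration.

```python
from collections import Counter

# Notice types treated as known noise. SSL::Invalid_Server_Cert is Zeek's
# certificate-validation notice; extend this set for your own site.
NOISY_NOTES = {"SSL::Invalid_Server_Cert"}

def notice_breakdown(notices, ignore=NOISY_NOTES):
    """Count notice.log records by 'note' and split them into (signal, noise)."""
    ranked = Counter(n["note"] for n in notices).most_common()
    signal = [(note, k) for note, k in ranked if note not in ignore]
    noise = [(note, k) for note, k in ranked if note in ignore]
    return signal, noise

# Hypothetical records; the MySite:: type stands in for a custom notice.
notices = [
    {"note": "SSL::Invalid_Server_Cert"},
    {"note": "SSL::Invalid_Server_Cert"},
    {"note": "SSL::Invalid_Server_Cert"},
    {"note": "MySite::Suspicious_Download"},
]
signal, noise = notice_breakdown(notices)
print("signal:", signal)
print("noise:", noise)
```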

And so we can operationalize that in our security operations centers.

The SSL log, as I mentioned earlier, contains not just traditional SSL but also TLS traffic. Zeek observes the handshake and the communication that occurs in the clear, and logs that to the SSL log.

The X509 log contains the events generated from the certificates that get transferred when SSL or TLS connections occur.

Those used to link back to the files log. However, they removed that in a more recent version, and now they fingerprint the X509 event.

That fingerprint is recorded and can be found in the SSL log as well as the files log. That’s one of the things with Zeek: once you get to the point where you think you’ve figured everything out, the developers do like to make changes, so you really have to stay on top of it.

You have to watch for iterative changes, because although they’re doing some really cool and fascinating things, it can be somewhat dynamic, and it can throw you for a loop if you depended on certain artifacts or attributes existing and then all of a sudden they’re gone or replaced.

So just take that as a caveat. There’s a lot of other data in the connection log; I was only showing you a snippet of it.

We already talked about the timestamps and these unique IDs, but I want to take a quick minute to go over some of the other fields. There’s the field called proto, which is short for protocol, and per the documentation it’s the transport layer protocol, kind of, sort of. I say kind of, sort of because if you ever have ICMP traffic on your network, instead of TCP or UDP, it’ll say ICMP.

ICMP is a layer three protocol, so even though the field is supposed to be layer four, the transport layer, it can hold a lower-level protocol like ICMP. Then the service field contains the application layer protocol, if Zeek knows about it: things like DNS, HTTP, and SSL. And a quick tip: if you’re looking in your conn log and you see values like DNS, HTTP, SSL, SMTP, you name the protocol,

then there should be an associated log for that traffic. If you’re seeing HTTP in that column, that same connection summarized in the conn log should show up in the HTTP log, with much more data about the HTTP traffic. We talked about duration already: the length in seconds that the connection existed.
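The service-column tip above can be turned into a simple consistency check: for every connection whose service names a protocol, confirm that its uid also appears in that protocol’s log. A minimal Python sketch, assuming you have already loaded the uids from each protocol log into sets; the records are hypothetical. The service field can list more than one protocol, comma-separated, and `-` is Zeek’s placeholder for an unset field.

```python
def missing_protocol_entries(conn_log, uids_by_log):
    """Find conn.log rows whose service names a protocol log lacking their uid.

    uids_by_log maps a service name (e.g. "http") to the set of uids found in
    that protocol's log. Services without an entry in the map are skipped.
    """
    missing = []
    for conn in conn_log:
        for service in conn["service"].split(","):  # may list several services
            seen = uids_by_log.get(service)
            if seen is not None and conn["uid"] not in seen:
                missing.append((conn["uid"], service))
    return missing

# Hypothetical data: C2 claims DNS, but dns.log has no matching entry.
conn_log = [
    {"uid": "C1", "service": "http"},
    {"uid": "C2", "service": "dns"},
    {"uid": "C3", "service": "-"},  # unset: Zeek didn't identify a service
]
uids_by_log = {"http": {"C1"}, "dns": set()}
print(missing_protocol_entries(conn_log, uids_by_log))
```

A non-empty result is a hint worth investigating (log rotation boundaries, dropped packets, or a disabled script) rather than proof of a problem.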

Then there are the originator and responder bytes: the number of payload bytes that were sent. A little lower down, there’s the concept of the originating and responding IP bytes, which will always be a little larger than the payload bytes,

because they contain the size of the payload plus the headers that come with the traffic at the IP layer and above. The overhead, if you will.
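As a quick worked example of that difference, here is a minimal Python sketch that subtracts payload bytes from IP-level bytes for each direction of one conn.log record. The field names `orig_bytes`, `orig_ip_bytes`, `resp_bytes`, and `resp_ip_bytes` are the conn.log columns; the values are hypothetical.

```python
def header_overhead(conn):
    """Per-direction header overhead: IP-level bytes minus payload bytes."""
    def num(value):
        return 0 if value == "-" else int(value)  # "-" marks an unset TSV field
    return {
        "orig": num(conn["orig_ip_bytes"]) - num(conn["orig_bytes"]),
        "resp": num(conn["resp_ip_bytes"]) - num(conn["resp_bytes"]),
    }

# Hypothetical conn.log values for one short TCP session.
conn = {"orig_bytes": "120", "orig_ip_bytes": "440",
        "resp_bytes": "1500", "resp_ip_bytes": "1800"}
print(header_overhead(conn))
```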

So it includes the overhead of the IP connection plus the payload. Then there’s the concept of local: whether the originator or the responder is local, as a Boolean value.

That’s something you set in your local variables, where you say whether or not traffic is local to your environment. So there’s a Boolean value that lets you determine whether those hosts are local to your environment or remote.

Expanding on a couple of these fields: this is actually taken from that cheat sheet. So if you want to take a screenshot or grab that link, that Corelight cheat sheet poster is the latest and greatest they have to offer.

There’s a lot in there, but they expand on conn_state as well as history, both in the conn log. Like I mentioned before, just because an event is logged in your connection log doesn’t mean that the connection actually occurred.

If your sensor is positioned at the outside of your environment, you’re definitely going to see a lot of attempts trying to come into the environment. If your firewall or your ISP is dropping them, wherever you’re positioned, that traffic never establishes a connection, but it’ll still show up in the conn log with a state of S0:

connection attempt seen, no reply. So keep that in mind when you’re looking at traffic. A normal, healthy connection should be of type SF: Zeek saw normal establishment and termination of the communication, with a byte count greater than zero.
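Here is a minimal Python sketch of those two checks: a lookup table of conn_state codes (paraphrasing the descriptions in Zeek’s conn.log documentation), a filter for unanswered S0 attempts, and the “SF with a nonzero byte count” rule of thumb. The sample records are hypothetical.

```python
CONN_STATES = {
    "S0": "Connection attempt seen, no reply",
    "S1": "Connection established, not terminated",
    "SF": "Normal establishment and termination",
    "REJ": "Connection attempt rejected",
    "S2": "Established, originator close attempt seen, no reply from responder",
    "S3": "Established, responder close attempt seen, no reply from originator",
    "RSTO": "Established, originator aborted (sent a RST)",
    "RSTR": "Responder sent a RST",
    "RSTOS0": "Originator sent a SYN then a RST, no SYN-ACK seen",
    "RSTRH": "Responder sent a SYN-ACK then a RST, no SYN seen",
    "SH": "Originator sent a SYN then a FIN, no SYN-ACK seen",
    "SHR": "Responder sent a SYN-ACK then a FIN, no SYN seen",
    "OTH": "No SYN seen, just midstream traffic",
}

def num(value):
    return 0 if value == "-" else int(value)  # "-" marks an unset field

def unanswered(conn_log):
    """uids of attempts that never got a reply (conn_state S0)."""
    return [c["uid"] for c in conn_log if c["conn_state"] == "S0"]

def is_healthy(conn):
    """SF plus a byte count greater than zero."""
    return conn["conn_state"] == "SF" and num(conn["orig_bytes"]) + num(conn["resp_bytes"]) > 0

# Hypothetical conn.log rows.
conn_log = [
    {"uid": "C1", "conn_state": "S0", "orig_bytes": "0", "resp_bytes": "0"},
    {"uid": "C2", "conn_state": "SF", "orig_bytes": "512", "resp_bytes": "4096"},
    {"uid": "C3", "conn_state": "REJ", "orig_bytes": "0", "resp_bytes": "0"},
]
for c in conn_log:
    print(c["uid"], c["conn_state"], CONN_STATES[c["conn_state"]], is_healthy(c))
```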

So it saw data, saw a healthy connection come up, and then go down. Then you have the history field. If you think about TCP and all your TCP flags, it tracks the different flags that get sent.

All the flags associated with the originating side will be in uppercase, and all of the responder’s will be lowercase. So when you see those different combinations of values, that’s what they mean.
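To read a history string such as `ShADadFf` at a glance, here is a minimal Python decoder. The letter meanings paraphrase Zeek’s conn.log documentation (uppercase for the originator, lowercase for the responder, `^` when Zeek flipped the direction); it maps each character independently and ignores details such as repeated letters being logged only at certain counts.

```python
HISTORY_EVENTS = {
    "s": "SYN without ACK",
    "h": "SYN+ACK (handshake)",
    "a": "pure ACK",
    "d": "packet with payload (data)",
    "f": "packet with FIN",
    "r": "packet with RST",
    "c": "packet with bad checksum",
    "g": "content gap",
    "t": "retransmitted payload",
    "w": "zero-window advertisement",
    "i": "inconsistent packet (e.g. FIN+RST)",
    "q": "multi-flag packet (SYN+FIN or SYN+RST)",
}

def decode_history(history):
    """Turn a conn.log history string into (side, event) pairs."""
    steps = []
    for ch in history:
        if ch == "^":
            steps.append(("n/a", "direction flipped by Zeek's heuristic"))
            continue
        side = "originator" if ch.isupper() else "responder"
        steps.append((side, HISTORY_EVENTS.get(ch.lower(), "unknown")))
    return steps

for side, event in decode_history("ShADadFf"):
    print(f"{side:<10} {event}")
```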

Going back to when I first got introduced to Zeek: when I looked at the traffic, I thought something was broken in the environment. It brought me back to an experience I had where one of our network spans was broken and was only capturing traffic in one direction.

If you look at network traffic the way we traditionally have, with something like Wireshark, you see the source and destination show up, and over on the right you can see that it looks like HTTP traffic.

Wireshark is identifying it as HTTP traffic. However, we’re only seeing the source on one side and the destination on the other. We’re not seeing the 205.185 address as the source sending traffic back.

So we’re not seeing bidirectional traffic in this case; we’re only seeing one direction. Looking at traffic going back and forth like that had become almost second nature to me.

And when I see something like this, which is what I’m used to looking at, it looks a lot more like real visibility into network traffic.

That makes more sense, right? We’re seeing bidirectional traffic. It’s the same traffic, but now we see the source and destination addresses flip-flop. I wanted to show this because when I first looked at Zeek, this is what I saw,

and I thought something was broken. It’s not broken. This is a healthy connection log: these are events in the conn log that Zeek is recording. But you see the originator is all on one side and the responder is all on the other.

That’s what Zeek calls them. It made more sense once I realized that Zeek does not label these connections with source and destination. Source and destination implies more of a flow-type record.

Whereas Zeek sees the entire connection. It says: this is the host that originated the connection, and this is the host that responded to it. Our client-server minds think, okay, the originator is the client and the responder is the server.

But really, that’s arbitrary, right? Think about which host started the connection: that’s the originator. The host it’s sending to, and maybe getting responses back from,

is the responding host. Once I looked at it that way, it made sense why Zeek presents it the way it does.

As I mentioned a little while ago, there’s the concept of local versus remote. There are Boolean fields for this: local_orig, for the originator, and local_resp, for the responder.

Zeek looks at those addresses and, based on how your local nets are defined, says either true, the originating host is local, or false.

You can see here that all the originating hosts are the same host, so it makes sense that they’re all true. And on the right, the only responders that are true are the ones associated with the 10.9 IPs; all the others are labeled false.

Zeek used to rely entirely on your defined network blocks and the local nets variable.

However, I think in version four or five (I can’t remember which, but it was a couple of versions back), they made a change

so that anything in IANA private IP space now defaults to local. If you want to change that, you’d have to redefine your local nets variable, like the example shown there.

The networks config file is where you would define more networks and give each one a name. In this case, I’m defining a /24 in the 192.168 range as my VPC subnet, and telling Zeek that it’s my local subnet.
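To see how local_orig and local_resp get their values, here is a minimal Python approximation using the standard `ipaddress` module. It is not Zeek’s implementation: `LOCAL_NETS` is a hypothetical stand-in for your local nets definition, seeded with the RFC 1918 private ranges to mimic the newer private-space-is-local default, and you would append your own blocks (such as a VPC subnet) to it.

```python
import ipaddress

# Hypothetical local nets: RFC 1918 private space, approximating the newer
# default. Append your own blocks (e.g. a VPC subnet) to mirror your Zeek config.
LOCAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_local(addr, nets=LOCAL_NETS):
    """True if addr falls inside any configured local network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in nets)  # mismatched IP versions never match

def tag_locality(conn, nets=LOCAL_NETS):
    """Compute values analogous to conn.log's local_orig and local_resp."""
    return {
        "local_orig": is_local(conn["id.orig_h"], nets),
        "local_resp": is_local(conn["id.resp_h"], nets),
    }

print(tag_locality({"id.orig_h": "10.9.1.5", "id.resp_h": "203.0.113.20"}))
```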

Then there’s the concept of stateful versus stateless. I always like to bring this up. Zeek does a good job with connections over stateless protocols like UDP, for example DNS.

It looks for the returning traffic and puts it all in one log entry. Traditionally, when we open up a PCAP, we see the DNS request go out and then ask: where’s the DNS response? Let me go find it.

What answers came back? Well, Zeek tracks all of that: it maps the DNS request to the response and puts it all on one line.

It does this for all the protocols it understands and can log. Here, I’m using the zeek-cut command again to grab the timestamp, the unique ID, the originating host, the responding host, and the responding port,

the query itself, the answer that came back, the DNS record type (here, an A record), and the TTL, which is the time to live of that DNS answer.

So I’m just pulling those fields out of the DNS log and printing them to standard out. You can also see how long that DNS connection lasted.

This is the conn log. You can see that it happened over UDP, DNS was observed in the service field, and the duration of the DNS traffic was a little over one second.
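If zeek-cut isn’t available where you’re working, the core idea is easy to reproduce: find the `#fields` header, then print the requested columns by name. This is a rough Python stand-in for that behavior, not a reimplementation of every zeek-cut option; the dns.log sample is hypothetical and trimmed to the columns mentioned above.

```python
def zeek_cut(lines, *columns):
    """Rough stand-in for `zeek-cut col1 col2 ...` on a tab-separated Zeek log."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            row = dict(zip(fields, line.split("\t")))
            yield tuple(row.get(col, "-") for col in columns)  # "-" if absent

# Hypothetical, trimmed dns.log.
DNS_SAMPLE = [
    "#separator \\x09",
    "#fields\tts\tuid\tid.orig_h\tid.resp_h\tid.resp_p\tquery\tqtype_name\tanswers\tTTLs",
    "#types\ttime\tstring\taddr\taddr\tport\tstring\tstring\tvector[string]\tvector[interval]",
    "1734600000.000000\tCdns01\t10.9.1.5\t10.9.0.2\t53\texample.com\tA\t203.0.113.10\t300.000000",
]

for row in zeek_cut(DNS_SAMPLE, "ts", "uid", "id.orig_h", "id.resp_h",
                    "id.resp_p", "query", "answers", "qtype_name", "TTLs"):
    print("\t".join(row))
```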

Then there are files. We mentioned files being transferred over the wire. Another change Zeek has made over the years: they used to log the host that transmitted the file and the host that received it as TX and RX.

They removed that. Now it’s back to the originating host and the responding host, like we see in the conn log. But you have this unique file ID, and the file that gets transferred is recorded in the files log, along with the protocol it was transferred over.

We can see that certain analyzers are attached in the files log. In this case, the analyzers are SHA1, PE, and MD5.

And you can tune that; for example, you can say you want to log SHA256 instead of SHA1. Here, it’s telling us which analyzers were attached to this traffic. I’ve also given you an example of how you can use the traditional cut command to get these same values out.

But a really cool thing is that when Zeek sees a file being transferred over the wire, it tries to identify the MIME type from the magic bytes of the file content, and it records that if it has a mapping. It’s just like running the file command on the command line: it looks at the binary makeup of the file and its signature,

the magic bytes, and says, okay, this is a Windows Portable Executable, or this is an application type like a Word document, or whatever the MIME type is.

If Zeek has a signature for it, it records that MIME type. Something I use a lot when I’m hunting is looking for unique MIME types, or MIME types that really shouldn’t be transferred over the wire.

Executables, for example, should always pique a little bit of interest when you’re threat hunting. The other thing I wanted to mention about files being transferred over the wire is that file names may exist in the files log, but keep in mind that a file name doesn’t necessarily travel with the file content itself.
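Here is a minimal Python sketch of that hunting idea: flag MIME types on a watch list and surface MIME types seen only rarely. It assumes files.log records with `fuid`, `mime_type`, and `filename` fields; `application/x-dosexec` is the label Zeek uses for Windows executables, while the watch list, the rarity threshold, and the sample records are hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical watch list; application/x-dosexec is Zeek's label for Windows PE files.
WATCH = {"application/x-dosexec"}

def hunt_mime_types(files_log, watch=WATCH, rare_threshold=1):
    """Return (watched files, MIME types seen at most rare_threshold times)."""
    counts = Counter(f["mime_type"] for f in files_log)
    watched = [(f["fuid"], f["mime_type"], f.get("filename", "-"))
               for f in files_log if f["mime_type"] in watch]
    rare = sorted(m for m, n in counts.items() if n <= rare_threshold)
    return watched, rare

# Hypothetical files.log records; F4 arrived without a file name ("-").
files_log = [
    {"fuid": "F1", "mime_type": "text/html", "filename": "-"},
    {"fuid": "F2", "mime_type": "text/html", "filename": "-"},
    {"fuid": "F3", "mime_type": "image/png", "filename": "logo.png"},
    {"fuid": "F5", "mime_type": "image/png", "filename": "-"},
    {"fuid": "F4", "mime_type": "application/x-dosexec", "filename": "-"},
]
watched, rare = hunt_mime_types(files_log)
print(watched)
print(rare)
```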

A file name is just an attribute associated with file content when it’s saved to a local disk, where a file system can point to the contents of that file.

When file objects are transferred over the wire, there’s no name unless the application protocol transferring the file tells the receiving end what the file is called.

We see that with some application protocols, like SMTP and sometimes HTTP, but the file name doesn’t have to exist. If the file name does exist in the transfer, Zeek will record it.

If not, it’ll be blank. Some other cool things you can do to get more out of Zeek: there’s the package manager. You can go to packages.zeek.org and see all the different packages.

One common one is the JA3 package, which provides TLS handshake hashes for both the client side and the server side of the TLS connection.

Then there’s Community ID. That’s not something you necessarily have to turn on in Zeek. Community ID does a calculation that creates a hash, or signature, of the connection using the connection’s five-tuple.
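To show what that five-tuple hash involves, here is a minimal Python sketch of Community ID version 1 for TCP and UDP flows, written from our reading of the public Community ID specification: the two endpoints are ordered so both directions hash the same, then a 2-byte seed, both addresses, the protocol number, a zero pad byte, and both ports are SHA1-hashed and base64-encoded with a `1:` prefix. ICMP needs extra port-mapping rules that this sketch does not handle; before relying on it, compare its output with a reference implementation.

```python
import base64
import hashlib
import ipaddress
import struct

def community_id_v1(saddr, daddr, sport, dport, proto, seed=0):
    """Community ID v1 sketch for TCP (proto 6) and UDP (proto 17) only."""
    src = ipaddress.ip_address(saddr).packed
    dst = ipaddress.ip_address(daddr).packed
    # Order endpoints (address first, then port) so both directions match.
    if (src, sport) > (dst, dport):
        src, dst, sport, dport = dst, src, dport, sport
    data = (struct.pack("!H", seed) + src + dst
            + struct.pack("!BB", proto, 0)    # protocol number, one zero pad byte
            + struct.pack("!HH", sport, dport))
    return "1:" + base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")

forward = community_id_v1("10.9.1.5", "203.0.113.20", 51514, 443, 6)
reverse = community_id_v1("203.0.113.20", "10.9.1.5", 443, 51514, 6)
print(forward, forward == reverse)
```

Because the value depends only on the flow, the same connection seen by Zeek and by another tool that implements the spec should produce the same ID, which is what makes it useful as a pivot in a SIEM.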

Your SIEM can actually create that too if it’s ingesting these logs, but I’ve seen where you can add the Community ID to the conn log as well. Next, operationalizing: some quick examples of how you can use this data for your security operations center or whatever your day-to-day operational needs are. I mentioned alerting: you can write custom detections, or get the logs into your SIEM and use it to write custom detections.

You can use the notice framework to write those events to the notice log. Or you can just go hunting old school through the logs, right? Just go looking for various interesting things.

One very common thing you can do with your Zeek deployment is use it from an auditing perspective. I remember back in the days of Heartbleed and some of the other really prolific SSL/TLS vulnerabilities, IT organizations were really trying to go after applications that were using weaker cipher suites, cipher suites that are extremely vulnerable.

The experience I had was: okay, the organization high-fives each other. We deployed these GPOs, we disabled all the weak cipher suites. Then we go look at our Zeek sensors’ logs and see that there’s still TLS 1.0 and 1.1 in the environment.

That’s the version that not only gets negotiated but gets accepted and used to create those TLS connections. So you can audit your environment and say, well, we have these weaker protocols, or weaker versions of protocols, in use on our environment right now.
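A TLS version audit like that boils down to counting the `version` column of the SSL log and listing who is still using legacy versions. Here is a minimal Python sketch over in-memory ssl.log records; Zeek writes versions as strings such as `TLSv10`, `TLSv12`, and `TLSv13`, while the legacy set and the sample records are hypothetical choices you would adapt to your own policy.

```python
from collections import Counter

# Versions treated as legacy for this audit (adjust to your policy).
LEGACY_VERSIONS = {"SSLv2", "SSLv3", "TLSv10", "TLSv11"}

def tls_audit(ssl_log):
    """Count negotiated versions and list (client, server, version) for legacy ones."""
    versions = Counter(r["version"] for r in ssl_log)
    offenders = sorted({(r["id.orig_h"], r["id.resp_h"], r["version"])
                        for r in ssl_log if r["version"] in LEGACY_VERSIONS})
    return versions, offenders

# Hypothetical ssl.log records.
ssl_log = [
    {"id.orig_h": "10.9.1.5", "id.resp_h": "203.0.113.20", "version": "TLSv12"},
    {"id.orig_h": "10.9.1.5", "id.resp_h": "203.0.113.20", "version": "TLSv10"},
    {"id.orig_h": "10.9.1.7", "id.resp_h": "198.51.100.9", "version": "TLSv13"},
]
versions, offenders = tls_audit(ssl_log)
print(dict(versions))
print(offenders)
```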

Same thing for SSH: you can look for SSH version one versus version two, and the SSH log will record that. There’s also a software log that picks up signatures of known software it sees

and pulls that out. For SNMP, it logs the different versions of SNMP. That’s not so much the case for versions of SMB, but SMB version 3 supports encryption,

so if you’re seeing SMB traffic in your environment and it’s not encrypted, you’re obviously not running SMB version 3 in an encrypted form. And then there are things like FTP. Do you have FTP in your environment?

Are you using FTP? It’s a cleartext protocol. If it’s being used in your environment and your network security monitoring sensor is positioned to see that traffic, it’s going to pick it up and log it.

Here’s an example of what the TLS log looks like; sorry, the SSL log. In this case it’s logging TLS version 1.2, and it also logs the cipher suite that gets negotiated and selected,

and then the server name. In the TLS SNI extension, there’s the server name: the host name of the server the client wishes to communicate with over encryption. If that SNI name isn’t encrypted, Zeek grabs it and puts it in the SSL log as well.

So you’ll actually see the host name the client is negotiating and forming a TLS connection with. With that said, we’re five minutes from the top of the hour, so I don’t know if…

Jason, do I have to say your name three times or will you just appear?

Jason Blanchard

You just want to keep going.

Troy Wojewoda

I’m there. I’m at the end. Okay, yeah.

Jason Blanchard

Okay. Okay.

Troy Wojewoda

So, I mean, I could keep going, but as Bill said, I probably can keep going into 2025.

Jason Blanchard

Yeah.

Troy Wojewoda

I’ve only scratched the surface a little bit. Yeah.

Jason Blanchard

Well, you do have a training class. This isn’t a promo for it, but there’s the training class, and it’s available on demand.

Troy Wojewoda

I do. In that class, I go over examples of something I didn’t really talk about today: you can create custom Zeek scripts. Zeek operates a lot on scripts, and there are already pre-written scripts that exist.

But I show the students how they can create their own scripts for various reasons, to add data to their logs or to get more forensic artifacts out of the network traffic. So, yeah, okay, I’m trying.

Jason Blanchard

Well done, Ryan. Are you showing Discord there? Is that what’s going on? So, reminder, everyone: if you haven’t checked in yet for Hackett, just engaging at all in the live chat during the webcast counts.

Or you can always go to the Hackett check-in channel and post in there, and we’ll give you credit for attending today. If you’re new to Black Hills Information Security and you’ve never attended a webcast before, we have a Discord server you can join for free, be part of this community, and engage with the community, where you can share your knowledge

and the community can share their knowledge with you. We also have this thing called Hackett, which is based on Pizza Hut’s BOOK IT! program from the ’80s and ’90s. If you weren’t alive then, no worries.

Back then, you could read 10 books, have the teacher sign off on it, and get a free personal pan pizza when you went to Pizza Hut, and it was amazing. So, for nostalgia’s sake, we brought it back.

If you attend 10 Black Hills webcasts, you get a reward, and you get rewards at 20, 30, 40, and 50 too. The thing is, the store is closed right now until the beginning of January.

So if you hit 10, 20, 30, 40, or 50 today, you can just claim it when you come back in January. All right, Troy, if you could sum up everything.

Well, actually, let’s do questions first. Are there any questions that you feel really have to be asked this morning?

Troy Wojewoda

No, just kind of opinion questions and things like that. I did see one question go by in Discord about recommended hardware, and that gets a little complicated.

It really depends on what you’re trying to monitor, right? If you’re trying to monitor a link that’s well under a gigabit per second, pretty much any commodity hardware will do.

But if you start getting into the major leagues of monitoring, like 5 or 10 gig links, then you might want to start looking at dedicated NICs and things like that.

That might get a little expensive, but for the most part, if you’re just getting started, find some tactical choke points, something that’s not too busy, and almost any commodity hardware running the latest version of Ubuntu should be good.

Jason Blanchard

Yeah. All right. If you normally attend the Antisyphon Anti-Casts on Wednesdays, those are now moving to the Black Hills Discord server, and they will count for Hackett.

If you watch the news on Mondays, that will now count for Hackett too. We’re trying to find ways to help you engage with the community and the community engage with you. The reason why is that together we can answer all of our questions.

But a lot of us have questions that we would love answered, but we can’t find a way to get that question answered. And so a community of people helping to answer each other’s questions is how we can all get better.

Troy, if you were going to sum up everything you talked about today in one final thought, what would it be?

Troy Wojewoda

If you’re not using Zeek, start using Zeek.

Jason Blanchard

Nice. That’s very concise. Deb, if you had one final thought today, what would it be?

Troy Wojewoda

Happy 2024, and thank you for joining us for all the weeks that you did. We will see you next year.

Jason Blanchard

Yeah. All right. To all of you who have joined us here on Discord or at any of our webcasts in 2024 or before that, thank you so much. Thank you for being part of this community.

You give purpose to this community, and this community gives purpose to us. Right? It gives us an opportunity to share our knowledge and for knowledge to be shared. This is the last webcast of 2024.

We will be gone for the next two weeks, and as soon as we get back, we have an entire year scheduled and planned. We’re also going to change things up, make some improvements, do things differently, and make it a little more engaging and interactive.

So you’ll get a chance in 2025 to increase your knowledge and abilities in cybersecurity. If you ever need an active SOC, pen testing, threat hunting, red teaming, or any of the other services we provide,

you always know where to find us. Hopefully, if you ever need our services in 2025, you’ll reach out. And with that, Ryan, for the last time in 2024, kill it with fire.

Sir. Kill it, Ryan.


