Strelka is a real-time file scanning system used for threat hunting, threat detection, and incident response. Based on the design established by Lockheed Martin’s Laika BOSS and similar projects, Strelka’s purpose is to perform file extraction and metadata collection at huge scale.
Strelka differs from its sibling projects in a few significant ways:
- Codebase is Python 3 (minimum supported version is 3.6)
- Designed for non-interactive, distributed systems (network security monitoring sensors, live response scripts, disk/memory extraction, etc.)
- Supports direct and remote file requests (Amazon S3, Google Cloud Storage, etc.) with optional encryption and authentication
- Uses widely supported networking, messaging, and data libraries/formats (ZeroMQ, protocol buffers, YAML, JSON)
- Built-in scan result logging and log management (compatible with Filebeat/ElasticStack, Splunk, etc.)
Strelka’s architecture allows clients to submit file requests to a single intake server (“broker”), which distributes the requests as tasks to multiple processing servers (“workers”). A set of workers connected to a broker forms a “cluster.” During file processing, files are sent through a series of metadata and file extraction modules (“scanners”) via a user-defined distribution system (“tastes” and “flavors”); file scan results are logged to disk and can be sent to downstream analytics platforms (e.g., ElasticStack, Splunk).
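To make the idea of a user-defined distribution system more concrete, here is a minimal sketch of how flavors might route a file to scanners. It assumes flavors are plain string labels (MIME types here) and that each scanner lists the flavors it accepts. The scanner names, the `*` catch-all, and the `assign_scanners` helper are hypothetical and are not Strelka's actual configuration format.

```python
from typing import Dict, List

# Hypothetical distribution table: scanner name -> flavors it accepts.
# "*" is treated as a catch-all in this sketch only.
SCANNER_FLAVORS: Dict[str, List[str]] = {
    "example_hash_scanner": ["*"],
    "example_zip_scanner": ["application/zip"],
    "example_pdf_scanner": ["application/pdf"],
}


def assign_scanners(flavors: List[str]) -> List[str]:
    """Return the scanners whose accepted flavors overlap the file's flavors."""
    assigned = []
    for scanner, accepted in SCANNER_FLAVORS.items():
        if "*" in accepted or any(flavor in accepted for flavor in flavors):
            assigned.append(scanner)
    return sorted(assigned)
```

In this sketch, a file tasted as `application/zip` would be sent to the catch-all scanner and the zip scanner, while a file with no recognized flavor would only reach the catch-all scanner.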
This architecture makes the following deployments possible:
- 1-to-1 cluster (one client to one worker)
- 1-to-N cluster (one client to N workers)
- N-to-1 cluster (N clients to one worker)
- N-to-N cluster (N clients to N workers)
The most practical deployment is an N-to-N cluster: it creates a fully scalable deployment that can be modified in-place without requiring cluster downtime. Clients, brokers, and workers communicate over TCP sockets using the ZeroMQ (ZMQ) networking library.
File requests are encoded as protocol buffers (protobuf). Protobuf messages have a maximum size of 2GB, so any attempt to send a file request larger than that will fail, and we have observed inconsistent behavior with direct file requests larger than 1.5GB. We do not recommend scanning extremely large files (>1GB), but if you must, we suggest using remote file requests to do so.
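A client could apply the size guidance above before submitting a file. The following is a minimal sketch, assuming "GB" is interpreted as 2^30 bytes and that the client can choose between the two request types; `choose_request_type` and the constants are hypothetical names, not part of Strelka.

```python
GIB = 1024 ** 3

# Thresholds taken from the guidance above (GB read as GiB, an assumption).
PROTOBUF_MAX_BYTES = 2 * GIB     # requests above this size fail outright
RECOMMENDED_MAX_BYTES = 1 * GIB  # files above this size are discouraged


def choose_request_type(size_bytes: int) -> str:
    """Return "direct" for ordinary files and "remote" for very large ones."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes > RECOMMENDED_MAX_BYTES:
        # Large files should not be embedded in the protobuf message itself.
        return "remote"
    return "direct"
```

Routing everything above 1GB to remote requests keeps direct requests well below both the 1.5GB range where behavior was inconsistent and the 2GB hard limit.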
Why would I want a file scanning system?
File metadata is an additional pillar of data (alongside network, endpoint, authentication, and cloud) that is effective in enabling threat hunting, threat detection, and incident response. It can help event analysts and incident responders bridge visibility gaps in their environment. This type of system is especially useful for identifying threat actors during phases 3 (delivery) and 7 (actions on objectives) of the Cyber Kill Chain (KC3 and KC7).
Should I switch from my current file scanning system to Strelka?
It depends — we recommend reviewing the features of each and choosing the most appropriate tool for your needs. We believe the most significant motivating factors for switching to Strelka are:
- Modern codebase (Python 3.6+)
- More scanners (40+ at release) and file types (60+ at release) than related projects
- Supports direct and remote file requests
- Built-in encryption and authentication for client connections
- Built using libraries and formats that allow cross-platform, cross-language support
Is Strelka an intrusion detection system (IDS)?
Strelka shouldn’t be thought of as an IDS, but it can be used for threat detection through YARA rule matching and downstream metadata interpretation. Strelka’s design follows the philosophy established by other popular metadata collection systems (Bro, Sysmon, Volatility, etc.): it extracts data and leaves the decision-making up to the user.
Does it work at scale?
Everyone has their own definition of “at scale,” but we have been using Strelka and systems like it to scan up to 100 million files each day for over a year and have never reached a point where the system could not scale to our needs. As file volume and diversity increase, horizontally scaling the system should allow you to scan any number of files.
Doesn’t this use a lot of bandwidth?
Yep! Strelka isn’t designed to operate in limited bandwidth environments, but we have experimented with solutions to this and there are tricks you can use to reduce bandwidth. These are what we’ve found most successful:
- Reduce the total volume of files sent to Strelka
- Use a tracking system to only send unique files to Strelka (networked Redis servers are especially useful for this)
- Use traffic control (tc) to shape connections to Strelka
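The unique-file tracking idea can be sketched as follows. This is a minimal illustration, assuming files are identified by their SHA-256 digest; an in-memory set stands in for a shared store such as Redis, and the `SeenFiles` class is a hypothetical helper, not part of Strelka.

```python
import hashlib
from typing import Set


class SeenFiles:
    """In-memory stand-in for a shared tracking store such as Redis."""

    def __init__(self) -> None:
        self._digests: Set[str] = set()

    def should_send(self, data: bytes) -> bool:
        """Return True the first time a file's content is seen, else False."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._digests:
            return False
        self._digests.add(digest)
        return True
```

A client would call `should_send` before submitting each file and skip the request when it returns `False`. In a multi-sensor deployment the set would need to live in a store that all clients can reach, which is where a networked tracking server comes in.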