Yara Malware Scanner

Thursday, Oct 30, 2025 | Tags: perl, yara

DISCLAIMER: Image is generated using ChatGPT.


 1. Introduction

 2. What is YaraFFI?

 3. Installing YARA and YaraFFI

 4. Basic Usage

 5. Writing Simple YARA Rule

 6. Scanning Memory Buffer

 7. Scanning File

 8. Callback Events

 9. Advanced Event Types

10. Limitations

11. Example


Introduction



YARA is a battle-tested malware pattern matching engine relied on by reverse engineers, Digital Forensics and Incident Response (DFIR) analysts and modern Security Operations Center (SOC) pipelines. It lets you express malware characteristics in readable rules and scan files or memory for matches. YaraFFI brings this capability into Perl by binding directly to the libyara C library via FFI::Platypus, so there is no shelling out to the yara binary and no temporary files.


What is YaraFFI?


YaraFFI is a minimal, modern Perl interface to the libyara C engine using Foreign Function Interface (FFI). Instead of relying on XS or spawning the yara CLI tool, it talks to YARA in‑process. This makes it ideal for embedding in automation, stream scanners, build pipelines and malware research scripts.


Installing YARA and YaraFFI


You need the native libyara shared library installed first.


$ sudo apt install libyara-dev yara
$ yara -v
4.5.0

Then install the Perl module YaraFFI from CPAN:


$ cpanm -vS YaraFFI

Basic Usage


Let’s start with a minimal real scan.


File: ex-1.pl

use YaraFFI;

my $rules = <<'YARA';
rule HelloWorld {
  strings:
    $a = "hello" ascii
  condition:
    $a
}
YARA

my $yara = YaraFFI->new;
$yara->compile($rules) or die "compile failed";

$yara->scan_buffer("hello hacker", sub {
    my ($event) = @_;
    print "Matched rule: $event\n";
}, emit_string_events => 0);

Output


$ perl ex-1.pl
Matched rule: HelloWorld

Writing Simple YARA Rule


In YARA a rule is divided into meta, strings and condition sections. For a practical first example we’ll detect the ASCII word "test".


my $rules = <<'YARA';
rule TestRule {
  meta:
    description = "Detect the literal string 'test'"
    author = "you@example.com"

  strings:
    $t1 = "test" ascii

  condition:
    $t1
}
YARA

Scanning Memory Buffer


scan_buffer is the most common in-process operation: it lets you scan arbitrary byte buffers (Perl scalars) directly. This is ideal for scanning network captures, unpacked payloads, or API-returned blobs without touching disk.


File: ex-2.pl

use YaraFFI;

my $rules = <<'YARA';
rule TestRule {
  meta:
    description = "Detect the literal string 'test'"
    author = "you@example.com"

  strings:
    $t1 = "test" ascii

  condition:
    $t1
}
YARA

my $yara = YaraFFI->new;
$yara->compile($rules) or die "compile failed";

my $payload = "this is a test payload";

$yara->scan_buffer($payload, sub {
    my ($event) = @_;
    print "Matched: $event\n";
    print "Event type: " . $event->{event}, "\n";
});

Output


$ perl ex-2.pl
Matched: TestRule
Event type: rule_match
Matched: TestRule
Event type: string_match

Important Practical Notes:


  • Binary data & binmode:

    Ensure any data read from files or sockets is treated as raw bytes (Perl’s binmode or read_file(..., binmode => ':raw')). scan_buffer expects a Perl scalar containing the bytes to scan.

  • Large buffers:

    Scanning a very large buffer in one call holds the whole input in memory and may be slow. For very large inputs consider chunking and scanning each chunk with scan_buffer, overlapping consecutive chunks so that a pattern straddling a chunk boundary is not missed. Because YaraFFI does not currently report match offsets by default, if you need exact byte positions you must enable the experimental enable_offsets option and track chunk offsets yourself.

  • Callback behaviour:

    The callback is invoked for rule_match and string_match events by default. The supplied event object stringifies to the rule name but also contains a hash-like structure e.g. { event => 'rule_match', rule => 'RuleName' }.

  • Concurrency:

    FFI::Platypus closures capture Perl state, and concurrency models vary, so prefer process-level parallelism for heavy scanning workloads until you have tested threads in your environment.
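
To illustrate the binary-data note above, here is a minimal sketch of reading a file as raw bytes using only core Perl. The helper name slurp_raw is hypothetical and is not part of YaraFFI. It only shows one way to produce a scalar suitable for scan_buffer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of YaraFFI): return a file's contents
# as raw bytes, with no CRLF translation and no character decoding.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "open $path: $!";
    local $/;                       # slurp mode: read up to end of file
    my $bytes = <$fh>;
    close $fh;
    return defined $bytes ? $bytes : '';   # an empty file yields ''
}

# Usage, assuming $yara is a compiled YaraFFI object and $callback a code ref:
#   $yara->scan_buffer(slurp_raw($path), $callback);
```

The ':raw' layer in the open mode does the same job as calling binmode on the handle afterwards.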


Collecting matches into an array:


File: ex-3.pl

use YaraFFI;

my $rules = <<'YARA';
rule TestRule {
  meta:
    description = "Detect the literal string 'test'"
    author = "you@example.com"

  strings:
    $t1 = "test" ascii

  condition:
    $t1
}
YARA

my $yara = YaraFFI->new;
$yara->compile($rules) or die "compile failed";

my $payload = "this is a test payload";

my @hits;
$yara->scan_buffer($payload, sub {
    my ($event) = @_;
    push @hits, $event;
});

print scalar(@hits) . " matches found.\n";
foreach my $e (@hits) {
    print "- " . $e->{rule} . " (" . $e->{event} . ")\n";
}

Output


$ perl ex-3.pl
2 matches found.
- TestRule (rule_match)
- TestRule (string_match)

Scanning File


scan_file in YaraFFI is a convenience wrapper that reads the file into memory and calls scan_buffer. For small-to-medium files this is usually the easiest option.


File: ex-4.pl

use YaraFFI;

die "Usage: $0 <file>\n" unless @ARGV == 1;
my $path = $ARGV[0];

my $rules = <<'YARA';
rule TestRule {
  meta:
    description = "Detect the literal string 'test'"
    author = "you@example.com"

  strings:
    $t1 = "test" ascii

  condition:
    $t1
}
YARA

my $yara = YaraFFI->new;
$yara->compile($rules) or die "compile failed";

$yara->scan_file($path, sub {
    my ($event) = @_;
    print "[event=$event->{event}] rule=$event->{rule}\n";
});

Let’s first create a demo file, malicious.bin: 64 KiB of random bytes with the string "test" written at byte offset 16384.


$ dd if=/dev/urandom of=malicious.bin bs=1K count=64 2>/dev/null
$ printf 'test' | dd of=malicious.bin bs=1 seek=16384 conv=notrunc 2>/dev/null

Output


$ perl ex-4.pl malicious.bin
[event=rule_match] rule=TestRule
[event=string_match] rule=TestRule

Practical considerations:


  • Large files: scan_file slurps the whole file. For very large files read the file in chunks and call scan_buffer per chunk while tracking chunk offsets externally.

  • Binary mode: scan_file uses binary read (:raw). If you implement your own reader, open files with the :raw layer or call binmode on the handle, so that CRLF translation on Windows cannot alter the bytes being scanned.

  • Directory scanning: To scan many files, use File::Find or Path::Tiny to iterate and call scan_file for each regular file.
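
As a rough sketch of the directory-scanning idea, the core File::Find module can collect the regular files under a directory before each one is handed to scan_file. The helper name regular_files is an assumption made for this example.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper: return every regular file under $dir, sorted by path.
# Symlinks are skipped so the same target is not scanned twice.
sub regular_files {
    my ($dir) = @_;
    my @files;
    find({
        no_chdir => 1,    # keep $_ as the full path instead of chdir-ing
        wanted   => sub { push @files, $File::Find::name if -f $_ && !-l $_ },
    }, $dir);
    my @sorted = sort @files;
    return @sorted;
}

# Usage, assuming $yara is a compiled YaraFFI object and $callback a code ref:
#   $yara->scan_file($_, $callback) for regular_files('/path/to/samples');
```

Collecting the paths first keeps the traversal separate from the scan, which makes it easy to split the list across worker processes later.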


Chunked scanning pattern:


File: ex-5.pl

use YaraFFI;

die "Usage: $0 <file>\n" unless @ARGV == 1;
my $path = $ARGV[0];

my $rules = <<'YARA';
rule TestRule {
  meta:
    description = "Detect the literal string 'test'"
    author = "you@example.com"

  strings:
    $t1 = "test" ascii

  condition:
    $t1
}
YARA

my $yara = YaraFFI->new;
$yara->compile($rules) or die "compile failed";

open my $fh, '<:raw', $path or die "open $path: $!";

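my $chunk_size = 1024;  # bytes read from the file per iteration
my $offset     = 0;     # file offset of the chunk about to be read
my $overlap    = 4096;  # bytes carried into the next scan; larger than
                        # $chunk_size here, so each byte is scanned up to five times
my $carry      = '';    # tail of the previous window, prepended to the next read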

while (1) {
    my $buf;
    my $read = read($fh, $buf, $chunk_size);
    last unless $read;

    my $to_scan = $carry . $buf;
    my $chunk_start = $offset - length($carry);

    $yara->scan_buffer($to_scan, sub {
        my ($event) = @_;
        print "[event=$event->{event}] rule=$event->{rule} (chunk start: $chunk_start)\n";
    }, emit_string_events => 1);

    if (length($to_scan) > $overlap) {
        $carry = substr($to_scan, -$overlap);
    } else {
        $carry = $to_scan;
    }

    $offset += $read;
}

close $fh;

Output


$ perl ex-5.pl malicious.bin
[event=rule_match] rule=TestRule (chunk start: 12288)
[event=string_match] rule=TestRule (chunk start: 12288)
[event=rule_match] rule=TestRule (chunk start: 13312)
[event=string_match] rule=TestRule (chunk start: 13312)
[event=rule_match] rule=TestRule (chunk start: 14336)
[event=string_match] rule=TestRule (chunk start: 14336)
[event=rule_match] rule=TestRule (chunk start: 15360)
[event=string_match] rule=TestRule (chunk start: 15360)
[event=rule_match] rule=TestRule (chunk start: 16384)
[event=string_match] rule=TestRule (chunk start: 16384)
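
The repeated matches in the output above come from the 4096-byte carry being larger than the 1024-byte read, so the bytes at offset 16384 are rescanned in five consecutive windows. The sketch below shows the same carry arithmetic with a tighter overlap of pattern length minus one byte, which is enough to catch a match straddling a boundary without seeing any complete match twice. It uses Perl's index() as a stand-in for the scanner so that it runs without libyara. The helper name find_offsets is hypothetical, and with real YARA rules the overlap would have to cover the longest pattern in the rule set.

```perl
use strict;
use warnings;

# Hypothetical helper: walk $data in chunks of $chunk_size bytes and return
# the absolute offset of every occurrence of $pattern. index() stands in
# for the real scanner.
sub find_offsets {
    my ($data, $pattern, $chunk_size) = @_;
    die "empty pattern\n" unless length $pattern;

    my $overlap = length($pattern) - 1;   # shortest carry that is still safe
    my $carry   = '';
    my @found;

    for (my $offset = 0; $offset < length($data); $offset += $chunk_size) {
        my $buf          = substr($data, $offset, $chunk_size);
        my $window       = $carry . $buf;
        my $window_start = $offset - length($carry);   # absolute offset of $window

        my $pos = 0;
        while (($pos = index($window, $pattern, $pos)) >= 0) {
            push @found, $window_start + $pos;
            $pos++;
        }

        # Keep only the tail that could begin a match completed by the next chunk.
        if    ($overlap == 0)              { $carry = '' }
        elsif (length($window) > $overlap) { $carry = substr($window, -$overlap) }
        else                               { $carry = $window }
    }
    return @found;
}

# "test" at offset 1022 straddles the first 1024-byte boundary.
my $data = ('x' x 1022) . 'test' . ('y' x 2000) . 'test';
print join(' ', find_offsets($data, 'test', 1024)), "\n";   # prints "1022 3026"
```

Because the carry is one byte shorter than the pattern, it can never contain a whole match by itself, which is what prevents duplicate reports.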

Callback Events


YaraFFI exposes matches to your Perl code via a small, friendly event object (the YaraFFI::Event class). The object is blessed but intentionally minimal: it stringifies to the rule name, so simple test scripts can interpolate $event and get readable output. It also behaves like a hashref for more detailed inspection in tests or tooling.

Typical events you’ll observe with this minimal binding:

  • rule_match: indicates a rule matched. The object has at least { event => 'rule_match', rule => 'RuleName' }.

  • string_match: a lightweight stand-in for when a string inside a rule matched; it carries { event => 'string_match', rule => 'RuleName', string_id => '$...' }.


The goal of YaraFFI is simplicity and predictability. Instead of exposing the full complex YR_RULE struct and offsets (which differ across libyara versions), YaraFFI maps the most useful information into a stable Perl object you can inspect or stringify.
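
For readers curious how an object can both stringify to a name and be read like a hashref, here is a minimal sketch using Perl's core overload pragma. Demo::Event is a hypothetical class written for this illustration. It is not the YaraFFI::Event source, whose internals may differ.

```perl
package Demo::Event;    # hypothetical class, for illustration only
use strict;
use warnings;
use overload
    '""'     => sub { $_[0]{rule} },   # string context yields the rule name
    fallback => 1;                     # eq, ne and . fall back to the string form

sub new {
    my ($class, %fields) = @_;
    return bless {%fields}, $class;    # a blessed hashref, so ->{key} still works
}

package main;

my $event = Demo::Event->new(event => 'rule_match', rule => 'TestRule');
print "Matched: $event\n";             # prints "Matched: TestRule"
print "Type: $event->{event}\n";       # prints "Type: rule_match"
```

The design keeps quick scripts short while still letting tooling inspect individual fields.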


Advanced Event Types


YaraFFI now supports additional event types beyond the basic rule_match and string_match events. These advanced events provide more detailed scanning information and can be enabled on demand.


Available Event Types


rule_not_match

Emitted when a rule does not match the scanned data. This is useful for understanding which rules were evaluated but did not trigger.

$yara->scan_buffer($data, sub {
    my ($event) = @_;
    if ($event->{event} eq 'rule_not_match') {
        print "Rule $event->{rule} did not match\n";
    }
}, emit_not_match_events => 1);

import_module

Emitted when a YARA module is imported during rule compilation or scanning.

$yara->scan_buffer($data, sub {
    my ($event) = @_;
    if ($event->{event} eq 'import_module') {
        print "Module imported: $event->{module_name}\n";
    }
}, emit_import_events => 1);

scan_finished

Emitted when the scanning operation completes. This event is always the last one emitted and can be used to trigger post-scan actions.

$yara->scan_buffer($data, sub {
    my ($event) = @_;
    if ($event->{event} eq 'scan_finished') {
        print "Scanning completed\n";
    }
}, emit_finished_events => 1);

Event Configuration Options


All advanced event types are disabled by default to maintain backward compatibility. You can enable them individually:

$yara->scan_buffer($data, $callback,
    emit_string_events    => 1,  # default: 1
    emit_not_match_events => 1,  # default: 0
    emit_import_events    => 1,  # default: 0
    emit_finished_events  => 1,  # default: 0
);

Complete Example with All Event Types


File: ex-6.pl

use YaraFFI;

my $rules = <<'YARA';
rule MatchingRule {
    strings:
        $s = "malware"
    condition:
        $s
}

rule NonMatchingRule {
    strings:
        $x = "benign"
    condition:
        $x
}
YARA

my $yara = YaraFFI->new;
$yara->compile($rules) or die "compile failed";

my $data = "This contains malware signature";

$yara->scan_buffer($data, sub {
    my ($event) = @_;

    if ($event->{event} eq 'rule_match') {
        print "[MATCH] Rule: $event->{rule}\n";
    }
    elsif ($event->{event} eq 'rule_not_match') {
        print "[NO MATCH] Rule: $event->{rule}\n";
    }
    elsif ($event->{event} eq 'string_match') {
        print "[STRING] Rule: $event->{rule}, String: $event->{string_id}\n";
    }
    elsif ($event->{event} eq 'import_module') {
        print "[IMPORT] Module: $event->{module_name}\n";
    }
    elsif ($event->{event} eq 'scan_finished') {
        print "[FINISHED] Scan completed\n";
    }
},
    emit_not_match_events => 1,
    emit_import_events    => 1,
    emit_finished_events  => 1
);

Output


$ perl ex-6.pl
[MATCH] Rule: MatchingRule
[STRING] Rule: MatchingRule, String: $
[NO MATCH] Rule: NonMatchingRule
[FINISHED] Scan completed

Event Order Guarantee


When multiple event types are enabled, events are emitted in the following order:

  1. rule_match or rule_not_match (one per rule evaluated)
  2. string_match (if emit_string_events is enabled, follows each rule_match)
  3. import_module (if any modules are imported)
  4. scan_finished (always last if enabled)
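
The per-type branching in ex-6.pl can also be written as a dispatch table, which keeps each handler small as more event types are enabled. The sketch below is fed with plain hashrefs standing in for real events, so it runs without libyara. The stand-in events and the %seen counter are assumptions made for this example.

```perl
use strict;
use warnings;

my %seen;    # event type => number of times it was observed

# One handler per event type described in this article.
my %handler = (
    rule_match     => sub { my $e = shift; print "[MATCH] Rule: $e->{rule}\n" },
    rule_not_match => sub { my $e = shift; print "[NO MATCH] Rule: $e->{rule}\n" },
    string_match   => sub { my $e = shift; print "[STRING] Rule: $e->{rule}, String: $e->{string_id}\n" },
    import_module  => sub { my $e = shift; print "[IMPORT] Module: $e->{module_name}\n" },
    scan_finished  => sub { print "[FINISHED] Scan completed\n" },
);

# The callback looks up the handler by event type and ignores unknown types.
my $callback = sub {
    my ($event) = @_;
    my $type = $event->{event};
    $seen{$type}++;
    my $code = $handler{$type} or return;
    $code->($event);
};

# Stand-in events, in the same order as the ex-6.pl output.
$callback->($_) for (
    { event => 'rule_match',     rule => 'MatchingRule' },
    { event => 'string_match',   rule => 'MatchingRule', string_id => '$' },
    { event => 'rule_not_match', rule => 'NonMatchingRule' },
    { event => 'scan_finished' },
);
```

With a real scan, the same $callback would be passed as the second argument to scan_buffer together with the emit_* options.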

Limitations


This module is intentionally minimal. That keeps the API stable and easy to understand but it also means a number of features are not present yet. Be aware of these before you build on YaraFFI:

  • Limited match offsets and metadata - by default, exact byte offsets and rule metadata are not extracted. These features are available experimentally via enable_offsets and enable_metadata options, but they are disabled by default due to YARA version compatibility concerns.

  • No modules / external variables - YARA modules (e.g. PE, ELF) and providing external variables to rules are not implemented.

  • Assumes libyara ABI compatibility — the callback currently probes the YR_RULE structure at runtime to find the identifier pointer. This is fragile across very old/new YARA versions; test with your target libyara version.

  • Scan flags hardcoded to 0 — no mechanism exposed yet to change YARA scan flags (e.g., SCAN_FLAGS_PROCESS_MEMORY in some deployments).


Why Does This Design Matter?


Embedding YARA with FFI is about speed and control. Calling the CLI in a subprocess works, but it costs process startup time and complicates embedding in long-running daemons. The small, well-defined surface area of YaraFFI keeps it practical for automation tasks like pre-commit scans, CI checks or evidence triage.


Troubleshooting


  • If yr_initialize() fails, ensure libyara is installed and the shared library (libyara.so, libyara.dylib) is on your library path.

  • If compile returns false, double-check your rule syntax with the yara CLI (e.g. yara myrules.yar /dev/null, which compiles the rules and scans an empty target) to rule out syntax errors.

  • On mismatched libyara versions you may see the warning DEBUG: Could not find valid rule name - this is the callback failing to locate the identifier field in the YR_RULE structure.


Example


File: ex-7.pl

use YaraFFI;
use File::Slurp qw(read_file);

die "Usage: $0 <file>\n" unless @ARGV == 1;
my ($file) = @ARGV;

my $rules = <<'YARA';
rule DemoRule {
    strings:
        $s1 = "test" ascii
    condition:
        $s1
}
YARA

my $y = YaraFFI->new;
$y->compile($rules) or die "Failed to compile rules";

my $content = read_file($file, binmode => ':raw');

my $matches = 0;
$y->scan_buffer($content, sub {
    my ($event) = @_;
    print "Match: $event\n";
    $matches++;
});

print "Total matches: $matches\n";

Output


$ perl ex-7.pl malicious.bin
Match: DemoRule
Match: DemoRule
Total matches: 2

Happy Hacking !!!
