DISCLAIMER: Image is generated using ChatGPT.
1. Introduction
2. Storable
3. Sereal
4. File-based Serialisation
5. In-memory Serialisation
6. Benchmark
Introduction
Serialisation is the process of converting complex Perl data structures (hashes, arrays, objects) into a format that can be stored or transmitted and later reconstructed.
Two major Perl serialisation modules on CPAN: Storable and Sereal
Storable – A core Perl module for fast serialisation.
Sereal – A high-performance alternative with better speed and compression.
Local System Configuration
$ sudo lshw -short | grep -E 'processor|memory|storage|volume'
/0/0 memory 15GiB System memory
/0/1 processor 13th Gen Intel(R) Core(TM) i9-13900HX
/0/3 storage Virtio 1.0 console
/0/7 scsi0 storage
/0/7/0.0.0 /dev/sda volume 388MiB Virtual Disk
/0/7/0.0.1 /dev/sdb volume 4GiB Virtual Disk
/0/7/0.0.2 /dev/sdc volume 1TiB Virtual Disk
Storable
Written by Raphael Manfredi in the mid-1990s and part of the Perl core since Perl 5.8.0 (2002).
It uses a binary format for compact storage.
Storable is one of Perl's core modules for serialisation and deserialisation of Perl data structures.
It can handle nested data structures, objects, and references.
It provides both functional and OO interfaces.
It generally performs better than text-based serialisation formats such as YAML and JSON.
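$Storable::forgive_me and $Storable::canonical do not make thawing untrusted input safe: forgive_me only makes Storable warn instead of dying when it meets something it cannot store (such as a GLOB), and canonical sorts hash keys so that equal data always produces identical output. Storable's own documentation advises against accepting Storable data from untrusted sources.
lock_store and lock_retrieve are not a security feature either: they work like store and retrieve but hold an flock() on the file, so concurrent processes never read or write a half-finished file.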
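What these settings actually do is easy to check. This is a minimal sketch: the 26-key hashes and the `locked_$$.dat` file name are made up for illustration. It shows `$Storable::canonical` making two equal hashes freeze to identical bytes, and `lock_store`/`lock_retrieve` being called exactly like `store`/`retrieve`.

```perl
use strict;
use warnings;
use Storable qw(freeze lock_store lock_retrieve);

# canonical: sort hash keys before storing, so equal hashes give equal bytes
$Storable::canonical = 1;

my %first = map { $_ => 1 } 'a' .. 'z';
my %second;
$second{$_} = 1 for reverse 'a' .. 'z';   # same contents, built in reverse order

my $same = freeze(\%first) eq freeze(\%second);
print "identical frozen strings: ", ($same ? "yes" : "no"), "\n";

# lock_store/lock_retrieve: same calls as store/retrieve, plus an flock()
# on the file while it is written or read (hypothetical file name)
my $file = "locked_$$.dat";
lock_store(\%first, $file);
my $copy = lock_retrieve($file);
print "keys restored: ", scalar(keys %$copy), "\n";
unlink $file;
```

Canonical output is mainly useful when you compare or hash frozen strings; it costs a sort per hash, so leave it off when you don't need deterministic bytes.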
Example
A simple example with three kinds of data: an array, a nested hash and an object.
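use strict;
use warnings;
use Storable qw(store retrieve freeze thaw);

# Placeholder class so the example runs; replace with your own
package YourClass { sub new { my ($class) = @_; return bless { created => time }, $class } }

my $data = {
    numbers => [1..1000],
    nested  => { a => 1, b => 2, c => 3 },
    object  => YourClass->new,
};

# File-based: write to disk, then read it back
store($data, 'storable.dat');
my $deserialised = retrieve('storable.dat');

# In-memory: serialise to a string, then rebuild from it
my $frozen = freeze($data);
my $thawed = thaw($frozen);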
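Because thaw() rebuilds the structure from scratch, a freeze/thaw round trip is effectively a deep copy, and Storable's dclone() performs that round trip in a single call. A small sketch with made-up data:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw dclone);

my $data = { numbers => [1 .. 5], nested => { a => 1, b => 2 } };

# thaw(freeze(...)) builds a brand new structure: a deep copy
my $copy  = thaw(freeze($data));
# dclone() does the same freeze/thaw round trip in one call
my $clone = dclone($data);

# Changing the copies leaves the original untouched
push @{ $copy->{numbers} }, 6;
$clone->{nested}{a} = 100;

print "original numbers: @{ $data->{numbers} }\n";   # 1 2 3 4 5
print "original nested a: $data->{nested}{a}\n";     # 1
```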
Arbitrary Code Execution Risk
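Deserialising data you don't control can lead to arbitrary code execution if you aren't careful.
The example below shows how Storable can run dangerous code during deserialisation when proper precautions aren't taken.
use strict;
use warnings;
use Storable qw(freeze thaw);

package Malicious;
# Storable only calls STORABLE_thaw for objects frozen through STORABLE_freeze
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return 'payload';
}
sub STORABLE_thaw {
    my ($self, $cloning, $serialised) = @_;
    system("echo 'running malicious code!!' > danger.txt");
    return;
}

package main;
my $malicious = bless {}, 'Malicious';
my $frozen    = freeze($malicious);
my $decoded   = thaw($frozen);   # STORABLE_thaw runs here
After the run you should find danger.txt in the current directory.
Note that the code lives in the Malicious class of the process calling thaw(): the frozen string only names the class, so the danger comes from classes that are already loaded, or loadable, on the receiving side.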
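How can we protect ourselves from this?
Storable has no lock_thaw() function: lock_store and lock_retrieve only add file locking. Instead, recent Storable releases (3.x) accept a flags argument, and 0 clears the BLESS_OK and TIE_OK bits so that Storable doesn't bless objects or tie variables while thawing.
use Storable qw(thaw);
my $safe_data = thaw($frozen, 0);
Or you can clear the flags globally. This is still risky: Storable's own documentation advises against accepting Storable data from untrusted sources at all.
$Storable::flags = 0;
my $safe_data = thaw($frozen);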
STORABLE_thaw
It is a special predefined method name in Storable's serialisation protocol.
You cannot rename it arbitrarily if you want Storable to call it automatically during deserialisation.
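Storable looks for this exact method name during deserialisation: for every object that was frozen through a STORABLE_freeze hook, thaw() finds STORABLE_thaw in the object's class (trying to require the module first if the method isn't loaded yet) and calls it automatically.
package Malicious;
# Illustration only: never run this
sub STORABLE_thaw {
    system("rm -rf /");
}
Attackers can't ship code inside Storable data, but they do choose the class names in it. If any class your process can load has a STORABLE_thaw hook (or a DESTROY method) that does something dangerous with its input, a crafted payload can trigger it.
Storable calls STORABLE_thaw for every hooked object it thaws, so the safest defence is to stop it creating objects in the first place.
Clearing $Storable::flags turns off blessing (and tying) while thawing; local keeps the change scoped to this one call:
use Storable qw(thaw);
my $data = do { local $Storable::flags = 0; thaw($frozen) };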
Its counterpart is STORABLE_freeze, the other half of Storable's object-serialisation hooks.
It lets a class decide what gets serialised when freeze(), store() or dclone() is called on one of its objects.
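A minimal sketch of the hook pair, using a hypothetical Point class: whatever STORABLE_freeze returns is what gets stored, and STORABLE_thaw receives it back on an empty object that Storable has already blessed into the class. Here the hooks deliberately drop a cache that shouldn't be persisted.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

package Point;   # hypothetical example class

sub new {
    my ($class, %args) = @_;
    return bless { x => $args{x}, y => $args{y}, cache => {} }, $class;
}

# Called by freeze()/store()/dclone(). The first return value is a string
# Storable saves; any further return values must be references.
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return join ',', $self->{x}, $self->{y};   # the cache is not saved
}

# Called by thaw()/retrieve()/dclone() with the string from STORABLE_freeze
sub STORABLE_thaw {
    my ($self, $cloning, $serialised) = @_;
    @{$self}{qw(x y)} = split /,/, $serialised;
    $self->{cache} = {};
    return;
}

package main;

my $p = Point->new(x => 3, y => 4);
$p->{cache}{length} = 5;
my $back = thaw(freeze($p));
print ref($back), " at ($back->{x}, $back->{y}), cache keys: ",
    scalar(keys %{ $back->{cache} }), "\n";   # Point at (3, 4), cache keys: 0
```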
If it's so dangerous, why is it there in the first place?
It is needed when a class has to rebuild itself in a special way.
Here are some common use cases:
1. Database Connection
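Storable can't serialise a live database connection.
However, you can save the DSN and use it to recreate the connection when deserialisation happens:
package DatabaseHandle;
use DBI;

sub new {
    my ($class, $dsn) = @_;
    my $dbh = DBI->connect($dsn) or die $DBI::errstr;
    # Keep the DSN so the connection can be rebuilt after thawing
    bless { dsn => $dsn, dbh => $dbh }, $class;
}

sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    # Serialise only the DSN; a live DBI handle cannot be stored
    return $self->{dsn};
}

sub STORABLE_thaw {
    my ($self, $cloning, $dsn) = @_;
    $self->{dsn} = $dsn;
    if ($cloning) {
        # During dclone(): don't open a second connection
        $self->{dbh} = undef;
    }
    else {
        # During thaw()/retrieve(): reconnect with the saved DSN
        $self->{dbh} = DBI->connect($dsn) or die $DBI::errstr;
    }
    return;
}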
2. Validation Wrapper
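package SecureData;
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return ('', { %$self });   # the data travels as an extra reference
}
sub STORABLE_thaw {
    my ($self, $cloning, $serialised, $data) = @_;
    # validate() stands for your own checks
    die "Invalid data" unless validate($data);
    %$self = %$data;
}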
3. Version Compatibility
package Employee;
sub STORABLE_thaw {
    # $data is the extra reference returned by STORABLE_freeze
    my ($self, $cloning, $serialised, $data) = @_;
    # Backward compatibility: the old format was [name, id]
    if (ref $data eq 'ARRAY') {
        ($self->{name}, $self->{id}) = @$data;
    }
    # New format: a hash reference
    elsif (ref $data eq 'HASH') {
        %$self = %$data;
    }
}
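The three patterns above can be tried without a real database. This is a minimal sketch: FakeConnection is a hypothetical stand-in for DatabaseHandle (its "connection" is just a counter), and VersionedEmployee is a hypothetical class whose `$FORMAT` switch simulates old and new writers. It shows the `$cloning` flag telling thaw() and dclone() apart, and extra references returned by STORABLE_freeze arriving as extra arguments to STORABLE_thaw, which is also how a validation hook can receive the data it checks.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw dclone);

# Hypothetical stand-in for DatabaseHandle: "connecting" just bumps a counter
package FakeConnection;
my $connects = 0;
sub connect_to { my ($dsn) = @_; $connects++; return "handle for $dsn" }
sub connects   { $connects }
sub new {
    my ($class, $dsn) = @_;
    return bless { dsn => $dsn, dbh => connect_to($dsn) }, $class;
}
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return $self->{dsn};                                  # only the DSN is serialised
}
sub STORABLE_thaw {
    my ($self, $cloning, $dsn) = @_;
    $self->{dsn} = $dsn;
    $self->{dbh} = $cloning ? undef : connect_to($dsn);   # reconnect on thaw only
    return;
}

# Hypothetical versioned class: extra references from STORABLE_freeze
# come back as extra arguments to STORABLE_thaw
package VersionedEmployee;
our $FORMAT = 'new';                                      # simulates old and new writers
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return ('', [ @{$self}{qw(name id)} ]) if $FORMAT eq 'old';
    return ('', { %$self });
}
sub STORABLE_thaw {
    my ($self, $cloning, $serialised, $data) = @_;
    if    (ref $data eq 'ARRAY') { @{$self}{qw(name id)} = @$data }   # old format
    elsif (ref $data eq 'HASH')  { %$self = %$data }                  # new format
    return;
}

package main;

my $db     = FakeConnection->new('dbi:SQLite:dbname=app.db');
my $thawed = thaw(freeze($db));   # $cloning is false: reconnects
my $cloned = dclone($db);         # $cloning is true: no new connection
printf "thawed dbh: %s\ncloned dbh: %s\nconnections: %d\n",
    $thawed->{dbh} // 'undef', $cloned->{dbh} // 'undef', FakeConnection::connects();

my $emp = bless { name => 'Alice', id => 1001 }, 'VersionedEmployee';
$VersionedEmployee::FORMAT = 'old';
my $old_blob = freeze($emp);
$VersionedEmployee::FORMAT = 'new';
my $new_blob = freeze($emp);
my @restored = map { my $e = thaw($_); "$e->{name} ($e->{id})" } $old_blob, $new_blob;
print "$_\n" for @restored;       # Alice (1001), twice
```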
Sereal
Developed by Steffen Müller and colleagues at Booking.com in 2012 as a faster, more efficient serialisation format.
It is used by Facebook, Booking.com and others for high-performance Perl applications.
It is a binary serialisation format designed primarily for Perl, with implementations in other languages such as Go.
Some of the main benefits:
- typically 2-5x faster than Storable
- typically 20-50% smaller output
- it never serialises or evals code, and the decoder can be told to refuse objects entirely
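Sereal::Decoder has options aimed at data you don't fully trust. A sketch, assuming a Sereal::Decoder recent enough to support the `refuse_objects` and `no_bless_objects` options; `Some::Class` is a made-up class name that never needs to exist.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

# A payload containing an object of a made-up class
my $blob = Sereal::Encoder->new->encode({ user => bless({ id => 1 }, 'Some::Class') });

# refuse_objects: decoding throws an exception instead of creating objects
my $refused = !eval { Sereal::Decoder->new({ refuse_objects => 1 })->decode($blob); 1 };
print $refused ? "refused: $@" : "decoded\n";

# no_bless_objects: objects come back as plain, unblessed data
my $plain = Sereal::Decoder->new({ no_bless_objects => 1 })->decode($blob);
print "user is a ", ref($plain->{user}), "\n";
```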
Example
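use strict;
use warnings;
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder qw(decode_sereal);

# Placeholder class so the example runs; replace with your own
package YourClass { sub new { my ($class) = @_; return bless { created => time }, $class } }

my $data = {
    numbers => [1..1000],
    nested  => { a => 1, b => 2, c => 3 },
    object  => YourClass->new,
};

# Sereal output is binary, so write and read the file in :raw mode
open my $fh, '>:raw', 'sereal.dat' or die $!;
print $fh encode_sereal($data);
close $fh;

open $fh, '<:raw', 'sereal.dat' or die $!;
my $serialised = do { local $/; <$fh> };
close $fh;

my $decoded = decode_sereal($serialised);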
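When many values are encoded with the same settings, you can build one Sereal::Encoder and one Sereal::Decoder object up front and reuse them instead of calling the functional wrappers each time. A minimal sketch; the records are made up, and `compress => 1` (Snappy) simply mirrors the option used elsewhere in this post.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

# Configure once, reuse for every value
my $encoder = Sereal::Encoder->new({ compress => 1 });   # 1 = Snappy
my $decoder = Sereal::Decoder->new;

my @records = map { +{ id => $_, name => "Employee_$_" } } 1 .. 3;

my @decoded;
for my $rec (@records) {
    my $blob = $encoder->encode($rec);
    push @decoded, $decoder->decode($blob);
}
print "$_->{id}: $_->{name}\n" for @decoded;
```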
File-based Serialisation
Using Storable
#!/usr/bin/env perl
use v5.38;
use Storable qw(store);

# `use v5.38` doesn't enable the experimental `class` feature, and Storable
# can't serialise its objects anyway, so this is a classic blessed-hash class.
package Employee {
    sub new ($class, %args) {
        return bless { id => $args{id}, name => $args{name}, age => $args{age} }, $class;
    }
    sub info ($self) { "ID: $self->{id}, Name: $self->{name}, Age: $self->{age}" }
}

my $alice = Employee->new(id => 1001, name => 'Alice', age => 30);
my $bob   = Employee->new(id => 1002, name => 'Bob',   age => 25);
store([$alice, $bob], 'storable.dat');
say "Saved employees to storable.dat";

#!/usr/bin/env perl
use v5.38;
use Test::More;
use Storable qw(retrieve);

# The class must be loaded before calling methods on retrieved objects
package Employee {
    sub new ($class, %args) {
        return bless { id => $args{id}, name => $args{name}, age => $args{age} }, $class;
    }
    sub info ($self) { "ID: $self->{id}, Name: $self->{name}, Age: $self->{age}" }
}

my $employees = retrieve('storable.dat');
isa_ok($employees, 'ARRAY', 'Loaded array');
is(scalar @$employees, 2, 'Found 2 employees');
is($employees->[0]->info, "ID: 1001, Name: Alice, Age: 30", 'Alice data intact');
is($employees->[1]->info, "ID: 1002, Name: Bob, Age: 25", 'Bob data intact');
done_testing;
Using Sereal
#!/usr/bin/env perl
use v5.38;
use Sereal::Encoder qw(encode_sereal);

# `use v5.38` doesn't enable the experimental `class` feature, and Sereal
# can't serialise its objects anyway, so this is a classic blessed-hash class.
package Employee {
    sub new ($class, %args) {
        return bless { id => $args{id}, name => $args{name}, age => $args{age} }, $class;
    }
    sub info ($self) { "ID: $self->{id}, Name: $self->{name}, Age: $self->{age}" }
}

my $alice = Employee->new(id => 1001, name => 'Alice', age => 30);
my $bob   = Employee->new(id => 1002, name => 'Bob',   age => 25);

# Sereal output is binary, so write it in :raw mode
open my $fh, '>:raw', 'sereal.dat' or die $!;
print $fh encode_sereal([$alice, $bob]);
close $fh;
say "Saved employees to sereal.dat";

#!/usr/bin/env perl
use v5.38;
use Test::More;
use Sereal::Decoder qw(decode_sereal);

# The class must be loaded before calling methods on decoded objects
package Employee {
    sub new ($class, %args) {
        return bless { id => $args{id}, name => $args{name}, age => $args{age} }, $class;
    }
    sub info ($self) { "ID: $self->{id}, Name: $self->{name}, Age: $self->{age}" }
}

open my $fh, '<:raw', 'sereal.dat' or die $!;
my $serialised = do { local $/; <$fh> };
close $fh;

my $employees = decode_sereal($serialised);
isa_ok($employees, 'ARRAY', 'Loaded array');
is(scalar @$employees, 2, 'Found 2 employees');
is($employees->[0]->info, "ID: 1001, Name: Alice, Age: 30", 'Alice data');
is($employees->[1]->info, "ID: 1002, Name: Bob, Age: 25", 'Bob data');
done_testing;
In-memory Serialisation
Using Storable
#!/usr/bin/env perl
use v5.38;
use Test::More;
use Storable qw(freeze thaw);

# `use v5.38` doesn't enable the experimental `class` feature, and Storable
# can't serialise its objects anyway, so this is a classic blessed-hash class.
package Employee {
    sub new ($class, %args) {
        return bless { id => $args{id}, name => $args{name}, age => $args{age} }, $class;
    }
    sub info ($self) { "ID: $self->{id}, Name: $self->{name}, Age: $self->{age}" }
}

my $alice = Employee->new(id => 1001, name => 'Alice', age => 30);
my $bob   = Employee->new(id => 1002, name => 'Bob',   age => 25);
my $employees = [$alice, $bob];

my $serialised   = freeze($employees);
my $deserialised = thaw($serialised);

is(scalar @$deserialised, 2, "Correct number of employees");
is($deserialised->[0]->info, $alice->info, "Alice data matches");
is($deserialised->[1]->info, $bob->info, "Bob data matches");
done_testing;
Using Sereal
#!/usr/bin/env perl
use v5.38;
use Test::More;
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder qw(decode_sereal);

# `use v5.38` doesn't enable the experimental `class` feature, and Sereal
# can't serialise its objects anyway, so this is a classic blessed-hash class.
package Employee {
    sub new ($class, %args) {
        return bless { id => $args{id}, name => $args{name}, age => $args{age} }, $class;
    }
    sub info ($self) { "ID: $self->{id}, Name: $self->{name}, Age: $self->{age}" }
}

my $alice = Employee->new(id => 1001, name => 'Alice', age => 30);
my $bob   = Employee->new(id => 1002, name => 'Bob',   age => 25);
my $employees = [$alice, $bob];

my $serialised   = encode_sereal($employees, { compress => 1 });   # 1 = Snappy compression
my $deserialised = decode_sereal($serialised);

is(scalar @$deserialised, 2, "Correct number of employees");
is($deserialised->[0]->info, $alice->info, "Alice data matches");
is($deserialised->[1]->info, $bob->info, "Bob data matches");
done_testing;
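To check the size claims against your own data, compare the lengths of the serialised strings directly. A sketch with 10,000 deterministic, made-up records; the actual numbers depend on your data and module versions, so none are quoted here.

```perl
use strict;
use warnings;
use Storable qw(freeze);
use Sereal::Encoder qw(encode_sereal);

# Deterministic sample data so runs are comparable
my @employees = map { +{ id => $_, name => "Employee_$_", age => 20 + $_ % 40 } } 1 .. 10_000;

my %size = (
    storable      => length(freeze(\@employees)),
    sereal        => length(encode_sereal(\@employees)),
    sereal_snappy => length(encode_sereal(\@employees, { compress => 1 })),
);
printf "%-14s %9d bytes\n", $_, $size{$_} for sort keys %size;
```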
Benchmark
#!/usr/bin/env perl
use v5.38;
use Benchmark qw(cmpthese);
use Storable qw(freeze thaw);
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder qw(decode_sereal);
my $data = {
array => [1..1000],
hash => { map { $_ => $_ * 2 } 1..100 },
nested => { map { $_ => [$_ x 5] } 'a'..'z' }
};
cmpthese(-1, {
storable => sub {
my $frozen = freeze($data);
my $thawed = thaw($frozen);
},
sereal => sub {
my $encoded = encode_sereal($data);
my $decoded = decode_sereal($encoded);
},
});
Result
           Rate storable   sereal
storable 1428/s       --     -66%
sereal   4166/s     192%       --
Decode the Numbers
How much slower is Storable than Sereal?
(1428 - 4166)/4166 × 100 ≈ -66%
How much faster is Sereal than Storable?
(4166 - 1428)/1428 × 100 ≈ 192%, i.e. Sereal completes about 2.9 times as many freeze/thaw round trips per second.
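The same arithmetic in Perl, using the two rates from the cmpthese table, so it is easy to re-run with your own numbers:

```perl
use strict;
use warnings;

my ($storable, $sereal) = (1428, 4166);   # rates from the cmpthese table

my $slower = ($storable - $sereal) / $sereal   * 100;   # Storable relative to Sereal
my $faster = ($sereal - $storable) / $storable * 100;   # Sereal relative to Storable
my $ratio  = $sereal / $storable;

printf "Storable: %.0f%%\n", $slower;   # -66%
printf "Sereal:   %.0f%%\n", $faster;   # 192%
printf "Ratio:    %.2fx\n", $ratio;     # 2.92x
```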
Large Datasets Benchmark
In the example below, we serialise and deserialise one million employee records.
File: serialisation_benchmark.pl
#!/usr/bin/env perl
use v5.38;
use Storable qw(freeze thaw);
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder qw(decode_sereal);
use Memory::Usage;
use Devel::Size qw(size total_size);
use Benchmark qw(timethis :hireswallclock);
sub generate_data {
return map {
{
id => $_,
name => "Employee_$_",
age => 20 + int(rand(40)),
}
} 1..1_000_000;
}
my @employees = generate_data();
my $mu = Memory::Usage->new();
$mu->record('Before serialisation');
my $storable_data;
timethis(20, sub {
$storable_data = freeze(\@employees);
}, 'Storable serialise');
$mu->record('After Storable serialise');
my $storable_out;
timethis(20, sub {
$storable_out = thaw($storable_data);
}, 'Storable deserialise');
$mu->record('After Storable deserialise');
my $sereal_data;
timethis(20, sub {
# compress => 1 selects Snappy; compress_level only applies to zlib and zstd
$sereal_data = encode_sereal(\@employees, {compress => 1, compress_level => 9});
}, 'Sereal serialise');
$mu->record('After Sereal serialise');
my $sereal_out;
timethis(20, sub {
$sereal_out = decode_sereal($sereal_data);
}, 'Sereal deserialise');
$mu->record('After Sereal deserialise');
say "\n\nSize";
printf "Storable: %.2f MB\n", length($storable_data)/(1024**2);
printf "Sereal: %.2f MB\n", length($sereal_data)/(1024**2);
say "\n\nMemory Usage";
$mu->dump();
Result
$ perl serialisation_benchmark.pl
Storable serialise: 6.13806 wallclock secs ( 4.96 usr + 0.89 sys = 5.85 CPU) @ 3.42/s (n=20)
Storable deserialise: 6.48468 wallclock secs ( 5.71 usr + 0.48 sys = 6.19 CPU) @ 3.23/s (n=20)
Sereal serialise: 1.73747 wallclock secs ( 1.58 usr + 0.07 sys = 1.65 CPU) @ 12.12/s (n=20)
Sereal deserialise: 3.94233 wallclock secs ( 4.98 usr + 0.14 sys = 5.12 CPU) @ 3.91/s (n=20)
Size
Storable: 52.35 MB
Sereal: 10.20 MB
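The large-dataset figures can be reduced to ratios the same way. This uses only the rates and sizes printed above; keep in mind that the Sereal run used `compress => 1` (Snappy) while the Storable run used no compression, so the size gap includes the effect of compression.

```perl
use strict;
use warnings;

# Rates (runs per second) and sizes copied from the output above
my %rate = (
    storable_serialise => 3.42, storable_deserialise => 3.23,
    sereal_serialise   => 12.12, sereal_deserialise  => 3.91,
);
my %size_mb = (storable => 52.35, sereal => 10.20);

printf "serialise:   Sereal %.2fx Storable\n",
    $rate{sereal_serialise} / $rate{storable_serialise};       # 3.54x
printf "deserialise: Sereal %.2fx Storable\n",
    $rate{sereal_deserialise} / $rate{storable_deserialise};   # 1.21x
printf "size:        Sereal output is %.1f%% of Storable's\n",
    100 * $size_mb{sereal} / $size_mb{storable};               # 19.5%
```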
Memory Usage
time      vsz (diff)         rss (diff)        shared (diff)  code (diff)   data (diff)
   0   475704 (475704)    471808 (471808)   5376 (5376)    1868 (1868)    466692 (466692)  Before serialisation
   7   842504 (366800)    838808 (367000)   5376 (0)       1868 (0)       833492 (366800)  After Storable serialise
  13  1409224 (566720)   1404952 (566144)   5376 (0)       1868 (0)      1400212 (566720)  After Storable deserialise
  15  1453768 (44544)    1446684 (41732)    5376 (0)       1868 (0)      1444756 (44544)   After Sereal serialise
  19  1822108 (368340)   1818140 (371456)   5376 (0)       1868 (0)      1813096 (368340)  After Sereal deserialise
Happy Hacking !!!