DISCLAIMER: Image is generated using ChatGPT
.
1. Introduction
2. Storable
3. Sereal
4. File-based Serialisation
5. In-memory Serialisation
6. Benchmark
Introduction
Serialisation
is the process of converting complex Perl
data structures (hashes, arrays, objects) into a format that can be stored or transmitted and later reconstructed.
Two major Perl
serialisation CPAN
modules: Storable and Sereal
Storable – A core Perl module for fast serialisation.
Sereal – A high-performance alternative with better speed and compression.
Local System Configuration
$ sudo lshw -short | grep -E 'processor|memory|storage|volume'
/0/0 memory 15GiB System memory
/0/1 processor 13th Gen Intel(R) Core(TM) i9-13900HX
/0/3 storage Virtio 1.0 console
/0/7 scsi0 storage
/0/7/0.0.0 /dev/sda volume 388MiB Virtual Disk
/0/7/0.0.1 /dev/sdb volume 4GiB Virtual Disk
/0/7/0.0.2 /dev/sdc volume 1TiB Virtual Disk
Storable
Introduced in Perl 5.0
in 1994
as a core module.
It uses binary format for compact storage.
Storable
is one of Perl
's core modules for serialisation and deserialisation of Perl
data structures.
It can handle nested data structures, objects, and references.
It provides both functional and OO interfaces.
It generally performs better than text-based serialisation formats e.g YAML
, JSON
etc.
Always use $Storable::forgive_me = 1
and $Storable::canonical = 1
for security when processing untrusted input.
Consider using the lock_store
and lock_retrieve
functions from Storable v2.53+
which are safer by default.
Example
A very simple example where we have three different types of data.
use Storable qw(store retrieve freeze thaw);
my $data = {
numbers => [1..1000],
nested => { a => 1, b => 2, c => 3 },
object => YourClass->new
};
store($data, 'storable.dat');
my $deserialised = retrieve('storable.dat');
my $frozen = freeze($data);
my $thawed = thaw($frozen);
Arbitrary Code Execution Risk
There is always a risk of arbitrary code execution, if you aren’t carefull.
This example shows how Storable
can execute dangerous code during deserialisation if proper precautions aren’t taken.
use Storable qw(freeze thaw);
package Malicious;
sub STORABLE_thaw {
my ($self, $cloning, $serialised) = @_;
system("echo 'running malicious code!!' > danger.txt");
return $self;
}
my $malicious = bless {}, 'Malicious';
my $frozen = freeze($malicious);
my $decoded = thaw($frozen);
You should now have danger.txt
created after the run.
How can we protect ourself from this?
Use lock_thaw()
instead of just thaw
as below.
You need Perl v5.30+
for this feature.
use Storable qw(lock_thaw);
my $safe_data = lock_thaw($frozen);
Or you can disable the dangerous feaure, still risky.
$Storable::forgive_me = 1;
$Storable::canonical = 1;
my $safe_data = thaw($frozen);
STORABLE_thaw
It is a special predefined method name in Storable
's serialisation protocol.
You cannot rename it arbitrarily if you want Storable
to call it automatically during deserialisation.
STORABLE_thaw
is a fixed hook name.
Storable
specifically looks for this exact method name during deserialisation.
During thaw()
, Storable
checks if the serialised data has a class with STORABLE_thaw
and executes it automatically if found.
package Malicious;
sub STORABLE_thaw {
system("rm -rf /");
}
Attackers can exploit this fixed hook name to embed malicious payloads.
Storable
always calls STORABLE_thaw
if present.
There is no way to disable it without lock_thaw
.
This would die
if your data contains object
.
use Storable qw(thaw);
$Storable::forbid_objects = 1;
thaw($data);
Similarly there is STORABLE_freeze
method in the Storable
module’s object serialisation mechanism.
It allows you to define how an object should be serialised when Storable::freeze
is called on it.
If it’s so dangerous then why it’s there in the first place?
It is needed when the class needs to rebuild itself in a special way.
Please find below some of common use case:
1. Database Connection
Storable
can’t serialise the db connection
.
However you can save the dsn
and this can be used to recreate the db connection when deserialisation happens:
package DatabaseHandle;
sub new {
my ($class, $dsn) = @_;
my $dbh = DBI->connect($dsn);
bless { dbh => $dbh }, $class;
}
sub STORABLE_freeze {
my ($self) = @_;
return ($self->{dsn});
}
sub STORABLE_thaw {
my ($self, $cloning, $dsn) = @_;
if ($cloning) {
# During dclone
$self->{dbh} = undef; # Safety measure
} else {
# During thaw
$self->{dbh} = DBI->connect($dsn) or die;
}
}
2. Validation Wrapper
package SecureData;
sub STORABLE_thaw {
my ($self, $cloning, $serialised) = @_;
die "Invalid data" unless validate($serialised);
%$self = %$serialised;
}
3. Version Compatibility
package Employee;
sub STORABLE_thaw {
my ($self, $cloning, $serialised) = @_;
# Backward compatibility
if (ref $serialised eq 'ARRAY') {
($self->{name}, $self->{id}) = @$serialised;
}
# New format
elsif (ref $serialised eq 'HASH') {
%$self = %$serialised;
}
}
Sereal
Developed by Steffen Müller
in 2012
as a faster, more efficient serialisation format.
It is used by Facebook
, Booking.com
and others for high-performance Perl
applications.
It is a binary serialisation format designed specifically for Perl
.
Some of the main benefits:
- 2-5x faster than Storable
- 20-50% smaller output
- no arbitrary code execution by default
Example
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder qw(decode_sereal);
my $data = {
numbers => [1..1000],
nested => { a => 1, b => 2, c => 3 },
object => YourClass->new
};
open my $fh, '>', 'sereal.dat' or die $!;
print $fh encode_sereal($data);
close $fh;
open $fh, '<', 'sereal.dat' or die $!;
my $serialised = do { local $/; <$fh> };
close $fh;
my $decoded = decode_sereal($serialised);
File-based Serialisation
Using Storable
#!/usr/bin/env perl
use v5.38;
use Storable qw(store);
class Employee {
field $id :param;
field $name :param;
field $age :param;
method info { "ID: $id, Name: $name, Age: $age" }
}
my $alice = Employee->new(id => 1001, name => 'Alice', age => 30);
my $bob = Employee->new(id => 1002, name => 'Bob', age => 25);
store([$alice, $bob], 'storable.dat');
say "Saved employees to storable.dat";
#!/usr/bin/env perl
use v5.38;
use Test::More;
use Storable qw(retrieve);
class Employee {
field $id :param;
field $name :param;
field $age :param;
method info { "ID: $id, Name: $name, Age: $age" }
}
my $employees = retrieve('storable.dat');
isa_ok($employees, 'ARRAY', 'Loaded array');
is(scalar @$employees, 2, 'Found 2 employees');
is($employees->[0]->info, "ID: 1001, Name: Alice, Age: 30", 'Alice data intact');
is($employees->[1]->info, "ID: 1002, Name: Bob, Age: 25", 'Bob data intact');
done_testing;
Using Sereal
#!/usr/bin/env perl
use v5.38;
use Sereal::Encoder qw(encode_sereal);
class Employee {
field $id :param;
field $name :param;
field $age :param;
method info { "ID: $id, Name: $name, Age: $age" }
}
my $alice = Employee->new(id => 1001, name => 'Alice', age => 30);
my $bob = Employee->new(id => 1002, name => 'Bob', age => 25);
open my $fh, '>', 'sereal.dat' or die $!;
print $fh encode_sereal([$alice, $bob]);
close $fh;
say "Saved employees to sereal.dat";
#!/usr/bin/env perl
use v5.38;
use Test::More;
use Sereal::Decoder qw(decode_sereal);
class Employee {
field $id :param;
field $name :param;
field $age :param;
method info { "ID: $id, Name: $name, Age: $age" }
}
open my $fh, '<', 'sereal.dat' or die $!;
my $serialised = do { local $/; <$fh> };
close $fh;
my $employees = decode_sereal($serialised);
isa_ok($employees, 'ARRAY', 'Loaded array');
is(scalar @$employees, 2, 'Found 2 employees');
is($employees->[0]->info, "ID: 1001, Name: Alice, Age: 30", 'Alice data');
is($employees->[1]->info, "ID: 1002, Name: Bob, Age: 25", 'Bob data');
done_testing;
In-memory Serialisation
Using Storable
#!/usr/bin/env perl
use v5.38;
use Test::More;
use Storable qw(freeze thaw);
class Employee {
field $id :param;
field $name :param;
field $age :param;
method info { "ID: $id, Name: $name, Age: $age" }
}
my $alice = Employee->new(id => 1001, name => 'Alice', age => 30);
my $bob = Employee->new(id => 1002, name => 'Bob', age => 25);
my $employees = [$alice, $bob];
my $serialised = freeze($employees);
my $deserialised = thaw($serialised);
is(scalar @$deserialised, 2, "Correct number of employees");
is($deserialised->[0]->info, $alice->info, "Alice data matches");
is($deserialised->[1]->info, $bob->info, "Bob data matches");
done_testing;
Using Sereal
#!/usr/bin/env perl
use v5.38;
use Test::More;
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder qw(decode_sereal);
class Employee {
field $id :param;
field $name :param;
field $age :param;
method info { "ID: $id, Name: $name, Age: $age" }
}
my $alice = Employee->new(id => 1001, name => 'Alice', age => 30);
my $bob = Employee->new(id => 1002, name => 'Bob', age => 25);
my $employees = [$alice, $bob];
my $serialised = encode_sereal($employees, { compress => 1 });
my $deserialised = decode_sereal($serialised);
is(scalar @$deserialised, 2, "Correct number of employees");
is($deserialised->[0]->info, $alice->info, "Alice data matches");
is($deserialised->[1]->info, $bob->info, "Bob data matches");
done_testing;
Benchmark
#!/usr/bin/env perl
use v5.38;
use Benchmark qw(cmpthese);
use Storable qw(freeze thaw);
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder qw(decode_sereal);
my $data = {
array => [1..1000],
hash => { map { $_ => $_ * 2 } 1..100 },
nested => { map { $_ => [$_ x 5] } 'a'..'z' }
};
cmpthese(-1, {
storable => sub {
my $frozen = freeze($data);
my $thawed = thaw($frozen);
},
sereal => sub {
my $encoded = encode_sereal($data);
my $decoded = decode_sereal($encoded);
},
});
Result
Rate storable sereal
storable 1428/s -- -66%
sereal 4166/s 192% --
Decode the Numbers
How much slower Storable
is compared to Sereal
?
(1428 - 4166)/4166 × 100 ≈ -66%
How much faster Sereal
is compared to Storable
?
(4166 - 1428)/1428 × 100 ≈ 192%
Large Datasets Benchmark
In the example below, we are processing one million employees record.
File: serialisation_benchmark.pl
#!/usr/bin/env perl
use v5.38;
use Storable qw(freeze thaw);
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder qw(decode_sereal);
use Memory::Usage;
use Devel::Size qw(size total_size);
use Benchmark qw(timethis :hireswallclock);
sub generate_data {
return map {
{
id => $_,
name => "Employee_$_",
age => 20 + int(rand(40)),
}
} 1..1_000_000;
}
my @employees = generate_data();
my $mu = Memory::Usage->new();
$mu->record('Before serialisation');
my $storable_data;
timethis(20, sub {
$storable_data = freeze(\@employees);
}, 'Storable serialise');
$mu->record('After Storable serialise');
my $storable_out;
timethis(20, sub {
$storable_out = thaw($storable_data);
}, 'Storable deserialise');
$mu->record('After Storable deserialise');
my $sereal_data;
timethis(20, sub {
$sereal_data = encode_sereal(\@employees, {compress => 1, compress_level => 9});
}, 'Sereal serialise');
$mu->record('After Sereal serialise');
my $sereal_out;
timethis(20, sub {
$sereal_out = decode_sereal($sereal_data);
}, 'Sereal deserialise');
$mu->record('After Sereal deserialise');
say "\n\nSize";
printf "Storable: %.2f MB\n", length($storable_data)/(1024**2);
printf "Sereal: %.2f MB\n", length($sereal_data)/(1024**2);
say "\n\nMemory Usage";
$mu->dump();
Result
$ perl serialisation_benchmark.pl
Storable serialise: 6.13806 wallclock secs ( 4.96 usr + 0.89 sys = 5.85 CPU) @ 3.42/s (n=20)
Storable deserialise: 6.48468 wallclock secs ( 5.71 usr + 0.48 sys = 6.19 CPU) @ 3.23/s (n=20)
Sereal serialise: 1.73747 wallclock secs ( 1.58 usr + 0.07 sys = 1.65 CPU) @ 12.12/s (n=20)
Sereal deserialise: 3.94233 wallclock secs ( 4.98 usr + 0.14 sys = 5.12 CPU) @ 3.91/s (n=20)
Size
Storable: 52.35 MB
Sereal: 10.20 MB
Memory Usage
time vsz (diff) rss (diff) shared (diff) code (diff) data (diff)
0 475704 (475704) 471808 (471808) 5376 (5376) 1868 (1868) 466692 (466692) Before serialisation
7 842504 (366800) 838808 (367000) 5376 (0) 1868 (0) 833492 (366800) After Storable serialise
13 1409224 (566720) 1404952 (566144) 5376 (0) 1868 (0) 1400212 (566720) After Storable deserialise
15 1453768 (44544) 1446684 (41732) 5376 (0) 1868 (0) 1444756 (44544) After Sereal serialise
19 1822108 (368340) 1818140 (371456) 5376 (0) 1868 (0) 1813096 (368340) After Sereal deserialise
Happy Hacking !!!