Using Sereal in the way that best fits your use case can make a significant difference in performance.
Broadly speaking, there are two classes of tweaks you can make: choosing the right options during
encoding (sometimes trading output size against speed) and calling the Sereal encode/decode functions in
the most efficient way.
If you are not yet using reusable Sereal::Encoder and Sereal::Decoder objects, fix that before reading any
further. By switching from the "encode_sereal" and "decode_sereal" functions to either the OO interface or
the advanced functional interface, you will get a noticeable speed boost because the encoder and decoder
structures can be reused. This matters most for the encoder, which can reuse its output buffer. In some
cases, such a warmed-up encoder can avoid most memory allocations.
I repeat, if you care about performance, then do not use the "encode_sereal" and "decode_sereal" interface.
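A minimal sketch of the reuse pattern, assuming Sereal::Encoder and Sereal::Decoder are installed from
CPAN. The record layout is made up for illustration. The point is only that the encoder and decoder are
constructed once, outside the loop, so their internal state can be reused on every call.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

# Build the encoder and decoder once and keep them around; constructing them
# inside the loop would discard the reusable buffers on every iteration.
my $encoder = Sereal::Encoder->new;
my $decoder = Sereal::Decoder->new;

# Hypothetical sample records, for illustration only.
my @records = map { +{ id => $_, tags => ["t$_"] } } 1 .. 1000;

my @round_tripped;
for my $record (@records) {
    my $srl_doc = $encoder->encode($record);    # same encoder object every time
    push @round_tripped, $decoder->decode($srl_doc);
}
print scalar(@round_tripped), " records round-tripped\n";
```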
The exact performance in time and space depends heavily on the data structure being (de-)serialized.
Often there is a trade-off between space and time. If in doubt, do your own testing and, most importantly,
ALWAYS TEST WITH REAL DATA. If you care purely about speed at the expense of output size, you can use the
"no_shared_hashkeys" option for a small speed-up (see below). If you need smaller output at the cost of
higher CPU load and more memory used during encoding/decoding, try the "dedupe_strings" option and enable
Snappy compression.
For ready-made comparison scripts, see the author_tools/bench.pl and author_tools/dbench.pl programs that
are part of this distribution. Suffice it to say that this library is easily competitive with the best
alternatives in both time and space efficiency.
If switching to the OO interface is not enough, you may consider switching to the advanced functional
interface, which avoids method lookup overhead and, because it is inlined as custom Perl OPs (Perl 5.14
and up), may also avoid some of the Perl function call overhead. This additional speed-up is only a
constant saving per call, from avoiding said method/function call, rather than a speed-up of the encoding
itself, so it matters most when you are working with very small data sets.
"sereal_encode_with_object" and "sereal_decode_with_object" are optionally exported from the Sereal
module (or from "Sereal::Encoder" and "Sereal::Decoder", respectively). They work the same as the
object-oriented interface except that they are invoked differently:
    $srl_doc = $encoder->encode($data);
becomes
    $srl_doc = sereal_encode_with_object($encoder, $data);
and
    $data = $decoder->decode($srl_doc);
becomes
    $data = sereal_decode_with_object($decoder, $srl_doc);
On Perl versions before 5.14, this will be marginally faster than the OO interface as it avoids method
lookup. This should rarely matter. On Perl versions starting from 5.14, the function call to
"sereal_encode_with_object" or "sereal_decode_with_object" will also be replaced with a custom Perl OP,
thus avoiding most of the function call overhead as well.
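To see what the interface choice costs on your own machine, the core Benchmark module can compare the
three calling styles side by side. This is a sketch, not a reproduction of the author_tools scripts: the
tiny hash is an arbitrary stand-in chosen so that per-call overhead dominates, and the numbers you get
will depend on your hardware, Perl version and Sereal version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Sereal::Encoder qw(encode_sereal sereal_encode_with_object);
use Sereal::Decoder qw(decode_sereal sereal_decode_with_object);

my $encoder = Sereal::Encoder->new;
my $decoder = Sereal::Decoder->new;

# Deliberately small structure, so call overhead rather than encoding dominates.
my $small = { id => 1, name => 'widget' };

# A negative count runs each sub for at least that many CPU seconds;
# cmpthese prints a comparison table and also returns it as an array ref.
my $table = cmpthese(-1, {
    functions   => sub { my $d = decode_sereal(encode_sereal($small)) },
    oo          => sub { my $d = $decoder->decode($encoder->encode($small)) },
    with_object => sub {
        my $d = sereal_decode_with_object($decoder,
            sereal_encode_with_object($encoder, $small));
    },
});
```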
Tuning the "Sereal::Encoder"
Several of the "Sereal::Encoder" options add or remove useful behaviour, and some of them come at a
runtime performance cost.
"no_shared_hashkeys"
By default, Sereal will emit a "repetition" marker for hash keys that it has already encountered.
Depending on your data structure, this can save quite a bit of space in the generated document; consider,
for example, encoding an array of many objects of the same class. But it may not save anything if you
don't have many repeated hash keys or don't encode any hashes to begin with.
In those cases, you can turn this feature off with the "no_shared_hashkeys" option for a small but
measurable speed-up.
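A quick way to judge whether shared hash keys pay off for you is to encode the same data both ways and
compare sizes. In this sketch the records, with their fairly long and highly repeated keys, are
hypothetical and chosen to favour the default behaviour; with few or short repeated keys the difference
may vanish.

```perl
use strict;
use warnings;
use Sereal::Encoder;

# Hypothetical records: many hashes sharing the same (long-ish) keys.
my $records = [ map { +{ customer_name => "name$_", customer_id => $_ } } 1 .. 200 ];

my $shared   = Sereal::Encoder->new->encode($records);    # default: repeated keys shared
my $unshared = Sereal::Encoder->new({ no_shared_hashkeys => 1 })->encode($records);

printf "shared keys: %d bytes, no_shared_hashkeys: %d bytes\n",
    length $shared, length $unshared;
```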
"dedupe_strings"
If set, this option extends the de-duplication logic, which by default is applied only to hash keys, to
all strings. This can be quite expensive in both memory and performance. The same is true for
"aliased_dedupe_strings".
"snappy" and "snappy_incr"
Enabling Snappy compression can (but doesn't have to) make your Sereal documents significantly smaller.
How effective this compression is for you depends entirely on the nature of your data. Snappy
compression is designed to be very fast. The additional space savings are very often worth the small
overhead.
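A sketch of checking what Snappy buys you, assuming a Sereal::Encoder build that accepts the "snappy"
option named above (newer releases may document a "compress" option in its place). The highly repetitive
payload is an artificial best case; run the same comparison on your real data before deciding.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

# Artificial, highly compressible payload (well above any small-document threshold).
my $data = { payload => 'abc' x 5_000 };

my $plain  = Sereal::Encoder->new->encode($data);
my $snappy = Sereal::Encoder->new({ snappy => 1 })->encode($data);

printf "plain: %d bytes, snappy: %d bytes\n", length $plain, length $snappy;

# The decoder reads the compression setting from the document itself.
my $copy = Sereal::Decoder->new->decode($snappy);
```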
"freeze_callbacks"
Using custom Perl "FREEZE" callbacks is very expensive. If this option is enabled, the encoder has to do
a method lookup at least once for each class of object being serialized. If a "FREEZE" hook actually
exists, calling it is even more expensive. If you care about ultimate performance, use with care.
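For reference, this is roughly what a class with hooks looks like. "Point" is a hypothetical example
class; the sketch assumes the FREEZE/THAW conventions described in the Sereal::Encoder documentation
(FREEZE receives the object and the serializer name, THAW receives the class, the serializer name and what
FREEZE returned). Every object of such a class costs an extra Perl method call on encode and on decode.

```perl
use strict;
use warnings;
use Sereal::Encoder;
use Sereal::Decoder;

{
    # Hypothetical example class with serialization hooks.
    package Point;

    sub new {
        my ($class, %args) = @_;
        return bless { x => $args{x}, y => $args{y} }, $class;
    }

    # Invoked by the encoder when freeze_callbacks is enabled.
    sub FREEZE {
        my ($self, $serializer) = @_;
        return [ $self->{x}, $self->{y} ];
    }

    # Invoked by the decoder to rebuild the object from FREEZE's output.
    sub THAW {
        my ($class, $serializer, $data) = @_;
        return $class->new(x => $data->[0], y => $data->[1]);
    }
}

my $encoder = Sereal::Encoder->new({ freeze_callbacks => 1 });
my $decoder = Sereal::Decoder->new;

my $restored = $decoder->decode($encoder->encode(Point->new(x => 3, y => 4)));
print ref($restored), " at ($restored->{x}, $restored->{y})\n";
```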
"sort_keys"
This option forces the encoder to always "sort" the entries of a hash by key before writing them to the
Sereal document. This can be somewhat expensive for large hashes.
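What the sorting pays for is reproducible output: hashes with the same contents encode to the same bytes
regardless of insertion order, which helps when you compare or checksum documents. A small sketch with
made-up keys:

```perl
use strict;
use warnings;
use Sereal::Encoder;

my @keys = qw(alpha beta gamma delta epsilon);

# Two hashes with identical contents, built in opposite insertion orders.
my (%first, %second);
$first{$_}  = length $_ for @keys;
$second{$_} = length $_ for reverse @keys;

my $encoder = Sereal::Encoder->new({ sort_keys => 1 });
my $doc1 = $encoder->encode(\%first);
my $doc2 = $encoder->encode(\%second);

print $doc1 eq $doc2 ? "identical documents\n" : "documents differ\n";
```

If you do not need byte-identical output, leaving "sort_keys" off avoids the sorting cost.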
General Considerations
Perl variables (scalars specifically) can hold multiple representations of the same data at the same
time. If you create an integer and use it as a string, its string form will be cached. Sereal attempts to
detect the most compact of these representations for encoding, but cannot always succeed. For example, if
a data structure was previously traversed by certain other serialization modules (such as Storable), then
the scalars in the structure may have been irrevocably upgraded to a more complex (and bigger) type. This
is only an issue in crude benchmarks. So if you plan to benchmark serialization, do not reuse the same
test data structure across serializers; otherwise your results may depend on the order in which the
serializers ran.
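One simple way to follow that advice is to build a fresh copy of the test data for every serializer. In
this sketch "build_test_data" is a hypothetical helper and the data it returns is arbitrary; substitute a
loader for your real data.

```perl
use strict;
use warnings;
use Storable qw(nfreeze);
use Sereal::Encoder;

# Hypothetical builder: returns a brand-new structure on every call, so no
# serializer sees scalars that an earlier serializer may have upgraded.
sub build_test_data {
    return { ids => [ 1 .. 500 ], labels => [ map { "label$_" } 1 .. 500 ] };
}

my $sereal = Sereal::Encoder->new;
my %serializers = (
    storable => sub { nfreeze($_[0]) },
    sereal   => sub { $sereal->encode($_[0]) },
);

my %size;
for my $name (sort keys %serializers) {
    my $data = build_test_data();    # fresh copy for each serializer
    $size{$name} = length $serializers{$name}->($data);
}
printf "%-8s %6d bytes\n", $_, $size{$_} for sort keys %size;
```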