Data is encoded in nodes, which can be of several types. The types fall into two categories:
Scalar types
Scalars encode a single simple value: a number, a boolean, or the NULL value. Each of the five integer
types also has a negative counterpart: NegativeByte, NegativeMedium and so on.
Byte - 8 bit integer
Short - 16 bit integer
Medium - 24 bit integer
Long - 32 bit integer
Huge - 64 bit integer
Float64 - 64 bit IEEE754 double-precision
Null
True
False
Collection types
Collections encode multiple values.
Text
Encodes a list of characters - that is, a string.
Array
Encodes a list of nodes, which can themselves be of any type.
Dictionary
Encodes a list of key-value pairs, the keys being strings or numbers and the values being nodes
of any type. The keys must be stored in ASCIIbetical order. Note that while the use of numeric
types is *permitted* for keys, it is not recommended: floating point keys can be hard to find
because of the usual floating point imprecision issues.
Each node is encoded as a type header occupying from 1 to 9 bytes, followed by data if necessary.
NODE TYPE HEADERS
The type header consists of a type specifier followed by up to 8 bytes telling us how much data is in the
node. The type specifier is a bit field. The first two bits tell us whether the node is a collection
and, if so, which kind:
0b00 - Text node
0b01 - Array node
0b10 - Dictionary node
0b11 - it's not a collection, it's a scalar node
The next four bits give the type of a scalar node. For collection nodes they instead give the type used
to encode the collection's length; only Byte, Short, Medium, and Long are valid for lengths.
0b0000 - Byte   (valid as a length)    0b0001 - NegativeByte
0b0010 - Short  (valid as a length)    0b0011 - NegativeShort
0b0100 - Medium (valid as a length)    0b0101 - NegativeMedium
0b0110 - Long   (valid as a length)    0b0111 - NegativeLong
0b1000 - Huge                          0b1001 - NegativeHuge
0b1010 - Null
0b1011 - Float64
0b1100 - True
0b1101 - False
Any unspecified bits or combinations of bits are reserved for future use. Unspecified bits should be set
to zero if you want your data to be compatible with future versions.
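The bit layout above can be sketched as a small decoder. This is an illustrative sketch, not the module's own code: the names decode_type_specifier, CATEGORIES, SUBTYPES and VALID_LENGTH_TYPES are hypothetical, "first two bits" is taken to mean the most significant bits, and rejecting non-zero reserved bits is a choice made here rather than something the format requires of readers.

```python
# Hypothetical helper names; only the bit values come from the tables above.
CATEGORIES = {0b00: "Text", 0b01: "Array", 0b10: "Dictionary", 0b11: "Scalar"}

SUBTYPES = {
    0b0000: "Byte",   0b0001: "NegativeByte",
    0b0010: "Short",  0b0011: "NegativeShort",
    0b0100: "Medium", 0b0101: "NegativeMedium",
    0b0110: "Long",   0b0111: "NegativeLong",
    0b1000: "Huge",   0b1001: "NegativeHuge",
    0b1010: "Null",   0b1011: "Float64",
    0b1100: "True",   0b1101: "False",
}

VALID_LENGTH_TYPES = {"Byte", "Short", "Medium", "Long"}


def decode_type_specifier(byte):
    """Split a type specifier byte into (category, subtype)."""
    category = CATEGORIES[byte >> 6]          # top two bits
    subtype_bits = (byte >> 2) & 0b1111       # next four bits
    if subtype_bits not in SUBTYPES:
        raise ValueError("reserved type bits: 0b{:04b}".format(subtype_bits))
    subtype = SUBTYPES[subtype_bits]
    if category != "Scalar" and subtype not in VALID_LENGTH_TYPES:
        raise ValueError(subtype + " is not valid as a collection length")
    if byte & 0b11:                           # last two bits are unspecified
        raise ValueError("reserved low bits are set")
    return category, subtype


print(decode_type_specifier(0b01001000))      # Array whose length is a Short
```

A more lenient reader could ignore the two low bits instead of raising, at the cost of silently misreading data written by a future version of the format.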
NODE DATA
NUMERIC NODES
The header is followed by the appropriate number of bytes of data.
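As a rough sketch, decoding the data bytes of a numeric node might look like the following. Two things here are assumptions not stated in this document: that multi-byte values are stored big-endian, and that the Negative types store the magnitude of the number, with the sign implied by the type. The names DATA_BYTES and decode_numeric are hypothetical.

```python
import struct

# Data bytes following the header, from the bit widths listed under "Scalar types".
DATA_BYTES = {"Byte": 1, "Short": 2, "Medium": 3, "Long": 4, "Huge": 8}


def decode_numeric(type_name, data):
    """Decode the data bytes of a numeric node (big-endian assumed)."""
    if type_name == "Float64":
        if len(data) != 8:
            raise ValueError("Float64 needs 8 bytes")
        return struct.unpack(">d", data)[0]   # IEEE754 double, big-endian assumed
    negative = type_name.startswith("Negative")
    base = type_name[len("Negative"):] if negative else type_name
    if len(data) != DATA_BYTES[base]:
        raise ValueError("{} needs {} bytes".format(type_name, DATA_BYTES[base]))
    magnitude = int.from_bytes(data, "big")   # byte order is an assumption
    return -magnitude if negative else magnitude  # sign-from-type is an assumption


print(decode_numeric("Medium", b"\x01\x00\x00"))
print(decode_numeric("NegativeShort", b"\x00\x2a"))
```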
NULL, TRUE and FALSE NODES
These are just a header.
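Because these nodes carry no data, each one is a single byte that can be worked out from the type specifier tables. A small sketch (scalar_header is a hypothetical helper; the two unspecified low bits are left as zero):

```python
def scalar_header(type_bits):
    """Build a one-byte header: 0b11 (scalar), four type bits, two zero bits."""
    return bytes([(0b11 << 6) | (type_bits << 2)])


NULL_NODE = scalar_header(0b1010)
TRUE_NODE = scalar_header(0b1100)
FALSE_NODE = scalar_header(0b1101)

for name, node in (("Null", NULL_NODE), ("True", TRUE_NODE), ("False", FALSE_NODE)):
    print(name, "0x{:02X}".format(node[0]))
```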
TEXT NODES
The header is followed by the appropriate number of bytes to encode the text's length, followed by that
many bytes of text. Note that the length is counted in bytes, not characters, and that text is encoded
in UTF-8. So the 3 character string "北京市" is stored as the 9 bytes:
北: 0xE5 0x8C 0x97 京: 0xE4 0xBA 0xAC 市: 0xE5 0xB8 0x82
and the entire node would be the 11 bytes:
0b00-0000-00: this is a Text node, with the length stored in a Byte
0x09: the length of the text
0xE5 ... 0x82: nine bytes of text
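The worked example above can be reproduced with a short encoder. This is a sketch under stated assumptions: encode_text_node and LENGTH_TYPES are hypothetical names, the smallest length type that fits is chosen (the format does not say it must be), and multi-byte lengths are assumed to be big-endian. The one-byte-length case shown in the example does not depend on byte order.

```python
# (type bits, size in bytes) for the four types that are valid as lengths.
LENGTH_TYPES = [(0b0000, 1), (0b0010, 2), (0b0100, 3), (0b0110, 4)]


def encode_text_node(text):
    """Encode a string as header + byte length + UTF-8 bytes."""
    payload = text.encode("utf-8")
    for type_bits, size in LENGTH_TYPES:
        if len(payload) < 1 << (8 * size):
            header = bytes([(0b00 << 6) | (type_bits << 2)])  # 0b00 = Text node
            # Big-endian length is an assumption for sizes above one byte.
            return header + len(payload).to_bytes(size, "big") + payload
    raise ValueError("text is too long for a Long length")


node = encode_text_node("北京市")
print(len(node), node.hex(" "))
```

The length written is len(payload), the UTF-8 byte count, not len(text), which is why the three-character string gets a length of 9.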
ARRAY NODES
The header is followed by the appropriate number of bytes to encode the number of elements in the array,
"N". Zero obviously means an empty array. That is immediately followed by "N" pointers of the size
specified in the database header. Each pointer is the location in the file of another node, which can be
of any type.
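Reading an Array node could be sketched as below. The pointer size comes from the database header, which is described elsewhere, so it is passed in here as a parameter. Big-endian byte order for the element count and the pointers is an assumption, and read_array_pointers and LENGTH_SIZES are hypothetical names.

```python
# Size in bytes of each type that is valid as a length, keyed by its type bits.
LENGTH_SIZES = {0b0000: 1, 0b0010: 2, 0b0100: 3, 0b0110: 4}


def read_array_pointers(buf, offset, ptr_size):
    """Return the file offsets of the elements of the Array node at `offset`."""
    header = buf[offset]
    if header >> 6 != 0b01:
        raise ValueError("not an Array node")
    type_bits = (header >> 2) & 0b1111
    if type_bits not in LENGTH_SIZES:
        raise ValueError("invalid length type for a collection")
    length_size = LENGTH_SIZES[type_bits]
    pos = offset + 1
    count = int.from_bytes(buf[pos:pos + length_size], "big")  # "N"
    pos += length_size
    return [
        int.from_bytes(buf[pos + i * ptr_size:pos + (i + 1) * ptr_size], "big")
        for i in range(count)
    ]


# An Array with a Byte length of 2 and two 2-byte pointers.
example = bytes([0b01000000, 2, 0x00, 0x10, 0x00, 0x20])
print(read_array_pointers(example, 0, ptr_size=2))
```

Each returned offset would then be decoded as a node in its own right, since elements can be of any type.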
DICTIONARY NODES
The header is followed by the appropriate number of bytes to encode the number of elements in
the dictionary, "N". Zero means an empty dictionary. That is immediately followed by "N" pairs of pointers
of the size specified in the database header. The first pointer in each pair must point to a Text or
numeric node which will be used as a key for looking up values. The second pointer in each pair points to
the value, which can be any type of node. The pointers to keys must list them in ASCIIbetical order. If
they are out of order, some elements may not be found.
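To illustrate why the ordering matters, here is a sketch that assumes a reader which binary-searches the key list; the document does not say how lookups are actually implemented. It handles Text keys only and works on already-decoded key strings rather than on pointers. ASCIIbetical order is taken to mean comparison by encoded byte value, so uppercase letters sort before lowercase ones. The names asciibetical and find_key are hypothetical.

```python
def asciibetical(keys):
    """Sort text keys by their UTF-8 byte values."""
    return sorted(keys, key=lambda k: k.encode("utf-8"))


def find_key(keys, wanted):
    """Binary search; returns the index of `wanted`, or None if not found."""
    target = wanted.encode("utf-8")
    lo, hi = 0, len(keys)
    while lo < hi:
        mid = (lo + hi) // 2
        candidate = keys[mid].encode("utf-8")
        if candidate == target:
            return mid
        if candidate < target:
            lo = mid + 1
        else:
            hi = mid
    return None


ordered = asciibetical(["banana", "apple", "Zebra"])
print(ordered)                       # "Zebra" sorts first: 'Z' is 0x5A, 'a' is 0x61
print(find_key(ordered, "Zebra"))    # found

unordered = ["banana", "apple", "Zebra"]
print(find_key(unordered, "Zebra"))  # missed, although the key is present
```

With the keys out of order the search discards the half of the list that actually contains the key, which is the failure described above.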
perl v5.36.0 2023-10-30 Data::CompactReadonly::V0::Format(3pm)