============
CBOR binding
============

Overview
========

C functions to encode/decode values in CBOR format, and a simple command
line utility to convert between JSON and CBOR.

To integrate CBOR into your application:

* Call ``duk_cbor_encode()`` and ``duk_cbor_decode()`` directly if a C API
  is enough.

* Call ``duk_cbor_init()`` to register a global ``CBOR`` object with
  ECMAScript bindings ``CBOR.encode()`` and ``CBOR.decode()``, roughly
  matching https://github.com/paroga/cbor-js.

Basic usage of the ``jsoncbor`` conversion tool::

    $ make jsoncbor
    [...]
    $ cat test.json | ./jsoncbor -e  # writes CBOR to stdout
    $ cat test.cbor | ./jsoncbor -d  # writes JSON to stdout

CBOR objects are decoded into ECMAScript objects, with non-string keys
coerced into strings.

Direct support for CBOR is likely to be included in the Duktape API in the
future. This extra will then become unnecessary.

CBOR
====

CBOR is a standard format for JSON-like binary interchange. It is
generally faster to process and more compact than JSON, and it can encode
more data types. In particular, binary data can be serialized directly,
without an intermediate text encoding such as base-64. These properties
make it useful for storing state files, IPC, etc.

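
To make the point about binary data concrete, here is a minimal,
self-contained sketch of how a CBOR byte string (major type 2) is laid out.
The helpers ``cbor_write_bytes()`` and ``base64_encoded_size()`` are
hypothetical names used only for illustration; they are not part of this
extra, and the sketch only handles lengths up to 65535.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the Duktape CBOR extra: writes a CBOR
 * byte string (major type 2) holding 'len' raw bytes into 'out' and returns
 * the number of bytes written.  Only lengths up to 65535 are handled to
 * keep the sketch short; 'out' must have room for len + 3 bytes.
 */
static size_t cbor_write_bytes(uint8_t *out, const uint8_t *data, size_t len) {
    size_t n = 0;

    assert(out != NULL && len <= 0xffffU);
    assert(data != NULL || len == 0);

    if (len < 24) {
        out[n++] = (uint8_t) (0x40 + len);  /* length fits in initial byte */
    } else if (len <= 0xffU) {
        out[n++] = 0x58;                    /* 1-byte length follows */
        out[n++] = (uint8_t) len;
    } else {
        out[n++] = 0x59;                    /* 2-byte big endian length follows */
        out[n++] = (uint8_t) (len >> 8);
        out[n++] = (uint8_t) (len & 0xffU);
    }
    if (len > 0) {
        memcpy(out + n, data, len);         /* payload is copied verbatim */
    }
    return n + len;
}

/* Size of the same payload as padded base-64 text, excluding JSON quotes. */
static size_t base64_encoded_size(size_t len) {
    return 4 * ((len + 2) / 3);
}
```

With these helpers a 300-byte payload takes 303 bytes as a CBOR byte string,
while the padded base-64 form needs 400 characters before any JSON quoting.
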
Some CBOR shortcomings for preserving information:

* No property attribute or inheritance support.

* No DAGs or looped graphs.

* Array objects with properties lose their non-index properties.

* Array objects with gaps lose their gaps, because the gaps read back as
  ``undefined``.

* Buffer objects and views lose much of their detail besides the raw data.

* ECMAScript strings cannot be fully represented; strings must be UTF-8.

* Functions and native objects lose most of their detail.

* CBOR tags are useful to provide soft decoding information, but the tags
  are just integers from an IANA controlled space with no range set aside
  for custom tags, so they cannot easily be used for private, application
  specific purposes. IANA does allow reserving custom tags with little
  effort, however; see
  https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml.

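
The remark that tags are "just integers" can be illustrated with a small
sketch of the tag head encoding (major type 6). The function name
``cbor_write_tag()`` is hypothetical and not part of this extra, and only
tag numbers up to 32 bits are handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Duktape CBOR extra: writes the head
 * of a CBOR tag (major type 6) for 'tag' into 'out' and returns the number
 * of bytes written (at most 5).  The tagged data item must follow directly.
 */
static size_t cbor_write_tag(uint8_t *out, uint32_t tag) {
    size_t n = 0;

    assert(out != NULL);

    if (tag < 24) {
        out[n++] = (uint8_t) (0xc0 + tag);  /* tag number in initial byte */
    } else if (tag <= 0xffUL) {
        out[n++] = 0xd8;                    /* 1-byte tag number follows */
        out[n++] = (uint8_t) tag;
    } else if (tag <= 0xffffUL) {
        out[n++] = 0xd9;                    /* 2-byte tag number follows */
        out[n++] = (uint8_t) (tag >> 8);
        out[n++] = (uint8_t) (tag & 0xffUL);
    } else {
        out[n++] = 0xda;                    /* 4-byte tag number follows */
        out[n++] = (uint8_t) (tag >> 24);
        out[n++] = (uint8_t) ((tag >> 16) & 0xffUL);
        out[n++] = (uint8_t) ((tag >> 8) & 0xffUL);
        out[n++] = (uint8_t) (tag & 0xffUL);
    }
    return n;
}
```

The encoded tag carries nothing but the number itself, so two applications
that pick the same unregistered number for different purposes cannot be told
apart by a decoder.
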
Future work
===========

General:

* Add flags to control encode/decode behavior.

* Allow decoding with a trailer so that stream parsing is easier.
  A similar change would be useful for JSON decoding.

* Reserve a CBOR tag for a missing value.

* Reserve other necessary CBOR tags.

* Explicit support for encoding with and without side effects (e.g.
  skipping Proxy traps and getters).

* JSON encoding supports ``.toJSON()``, maybe something like ``.toCBOR()``?

* Optimize encoding and decoding more.

Encoding:

* Tagging of typed arrays:
  https://datatracker.ietf.org/doc/draft-ietf-cbor-array-tags/.
  Mixed endian encode must convert to e.g. little endian because
  no mixed endian tag exists.

* Encoding typed arrays as integer arrays instead?

* Float16Array encoding support (once/if supported by main engine).

* Tagging of array gaps, once IANA reservation is complete:
  https://github.com/svaarala/duktape/blob/master/doc/cbor-missing-tag.rst.

* Support 64-bit integers when encoding, e.g. up to 2^53?

* Definite-length object encoding even when an object has more than 23 keys.

* Map/Set encoding (once supported in the main engine), maybe tagged
  so they decode back into Map/Set.

* Bigint encoding (once supported in the main engine), as tagged byte
  strings like in Python CBOR.

* String encoding options: combining surrogate pairs, tagging non-UTF-8
  byte strings so they decode back to string, using U+FFFD replacement,
  etc.

* Detection of Symbols, encoding them in a useful tagged form.

* Better encoding of functions.

* Hook for serialization, to allow caller to serialize values (especially
  objects) in a context specific manner (e.g. serialize functions with
  IPC metadata to allow them to be called remotely). Such a hook should
  be able to emit tag(s) to mark custom values for decode processing.

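
As a rough illustration of two items above (64-bit integers and
definite-length objects with more than 23 keys), the following sketch writes
a generic CBOR head for major types 0 to 6 using the shortest form.
``cbor_write_head()`` is a hypothetical helper and does not describe how
this extra is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Duktape CBOR extra: writes a CBOR
 * head for 'major' (0..6) with argument 'value' using the shortest form.
 * Returns the number of bytes written (at most 9).
 */
static size_t cbor_write_head(uint8_t *out, unsigned major, uint64_t value) {
    uint8_t base;
    size_t n = 0;
    int nbytes;
    int i;

    assert(out != NULL && major <= 6);
    base = (uint8_t) (major << 5);

    if (value < 24) {
        out[n++] = (uint8_t) (base + value);  /* argument in initial byte */
        return n;
    } else if (value <= 0xffULL) {
        out[n++] = (uint8_t) (base + 24);
        nbytes = 1;
    } else if (value <= 0xffffULL) {
        out[n++] = (uint8_t) (base + 25);
        nbytes = 2;
    } else if (value <= 0xffffffffULL) {
        out[n++] = (uint8_t) (base + 26);
        nbytes = 4;
    } else {
        out[n++] = (uint8_t) (base + 27);
        nbytes = 8;
    }

    /* The argument follows in big endian byte order. */
    for (i = nbytes - 1; i >= 0; i--) {
        out[n++] = (uint8_t) ((value >> (8 * i)) & 0xffULL);
    }
    return n;
}
```

For example, a map (major type 5) with 24 key/value pairs gets the two-byte
head ``b8 18``, and the unsigned integer 2^53 (major type 0) is encoded as
``1b 00 20 00 00 00 00 00 00``.
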
Decoding:

* Typed array decoding support. Should decoder convert to host
  endianness?

* Float16Array decoding support (once/if supported by main engine).

* Decoding objects with non-string keys, could be represented as a Map.

* Use bare objects and arrays when decoding?

* Use a Map rather than a plain object when decoding, which would allow
  non-string keys.

* Bigint decoding (once supported in the main engine).

* Decoding of non-BMP codepoints into surrogate pairs.

* Decoding of Symbols when call site indicates it is safe.

* Hooking for revival, to allow caller to revive objects in a context
  specific manner (e.g. revive serialized function objects into IPC
  proxy functions). Such a hook should have access to encoding tags,
  so that revival can depend on tags present.

* Option to compact decoded objects and arrays.

* Improve fastint decoding support, e.g. decode non-optimally encoded
  integers as fastints, decode compatible floating point values as
  fastints.

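
The item on non-BMP codepoints refers to the standard UTF-16 surrogate pair
split. A minimal sketch is shown here; ``codepoint_to_utf16()`` is a
hypothetical helper name, not an existing function in this extra.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Duktape CBOR extra: converts one
 * Unicode codepoint into UTF-16 code units.  Returns the number of code
 * units written to 'out' (1 for BMP codepoints, 2 for a surrogate pair).
 */
static int codepoint_to_utf16(uint32_t cp, uint16_t out[2]) {
    assert(out != NULL && cp <= 0x10ffffUL);

    if (cp < 0x10000UL) {
        out[0] = (uint16_t) cp;
        return 1;
    }

    cp -= 0x10000UL;                               /* 20 bits remain */
    out[0] = (uint16_t) (0xd800UL + (cp >> 10));      /* high surrogate */
    out[1] = (uint16_t) (0xdc00UL + (cp & 0x3ffUL));  /* low surrogate */
    return 2;
}
```

For instance, U+1F600 becomes the pair ``D83D DE00``, while a BMP codepoint
such as U+0041 is passed through as a single code unit.
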