============
CBOR binding
============

Overview
========

C functions to encode and decode values in CBOR format, and a simple
command-line utility to convert between JSON and CBOR.

To integrate CBOR into your application:

* Call ``duk_cbor_encode()`` and ``duk_cbor_decode()`` directly if a C API
  is enough.

* Call ``duk_cbor_init()`` to register a global ``CBOR`` object with
  ECMAScript bindings ``CBOR.encode()`` and ``CBOR.decode()``, roughly
  matching https://github.com/paroga/cbor-js.

Basic usage of the ``jsoncbor`` conversion tool::

    $ make jsoncbor
    [...]
    $ cat test.json | ./jsoncbor -e   # writes CBOR to stdout
    $ cat test.cbor | ./jsoncbor -d   # writes JSON to stdout

CBOR maps are decoded into ECMAScript objects, with non-string keys
coerced into strings.

Direct support for CBOR is likely to be included in the Duktape API in the
future.  This extra will then become unnecessary.

CBOR
====

CBOR is a standard binary format (RFC 8949) for JSON-like data interchange.
Encoded data is typically smaller and faster to parse than the equivalent
JSON, and CBOR can represent more data types.  In particular, binary data
can be serialized directly, without a text encoding such as Base64.  These
properties make it useful for storing state files, IPC, etc.
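
The size difference for binary data can be estimated with a little
arithmetic.  The following is a minimal sketch, not part of ``duk_cbor.c``;
the function names are hypothetical, and the JSON side assumes padded Base64
inside a quoted string:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a CBOR head (initial byte plus argument) carrying 'n'. */
size_t cbor_head_size(uint64_t n) {
    if (n < 24) return 1;            /* argument fits in the initial byte */
    if (n <= 0xffU) return 2;        /* 1-byte argument */
    if (n <= 0xffffU) return 3;      /* 2-byte argument */
    if (n <= 0xffffffffUL) return 5; /* 4-byte argument */
    return 9;                        /* 8-byte argument */
}

/* Size of 'len' raw bytes encoded as a CBOR byte string (major type 2). */
size_t cbor_bytes_size(size_t len) {
    return cbor_head_size((uint64_t) len) + len;
}

/* Size of the same bytes as a padded Base64 JSON string, quotes included
 * (hypothetical JSON representation, chosen for comparison only).
 */
size_t json_base64_size(size_t len) {
    return 2 + 4 * ((len + 2) / 3);
}
```

For a 1000-byte payload this gives 1003 bytes of CBOR against 1338 bytes of
JSON text.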

Some CBOR shortcomings for preserving information:

* No property attribute or inheritance support.

* No support for shared references (DAGs) or cyclic object graphs.

* Array objects with properties lose their non-index properties.

* Array objects with gaps lose them: missing elements read back as undefined.

* Buffer objects and views lose much of their detail besides the raw data.

* ECMAScript strings cannot be fully represented; strings must be UTF-8.

* Functions and native objects lose most of their detail.

* CBOR tags are useful for providing soft decoding information, but tags
  are just integers from an IANA-controlled registry with no range set
  aside for private use, so they cannot easily be used for application
  specific purposes.  IANA does allow registering custom tags with little
  effort, however; see
  https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml.
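
To make the "tags are just integers" point concrete, here is a minimal
sketch of how a tag is written on the wire.  It is independent of
``duk_cbor.c``, and the names ``cbor_put_head()`` and ``cbor_put_tag()`` are
hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Write a CBOR head: major type (0..7) in the top 3 bits, followed by the
 * shortest big-endian argument that holds 'value'.  'out' must have room
 * for 9 bytes.  Returns the number of bytes written.
 */
size_t cbor_put_head(uint8_t *out, unsigned major, uint64_t value) {
    uint8_t ib = (uint8_t) (major << 5);
    int nbytes, i;

    if (value < 24) {
        out[0] = (uint8_t) (ib | value);  /* argument inside initial byte */
        return 1;
    } else if (value <= 0xffU) {
        out[0] = (uint8_t) (ib | 24); nbytes = 1;
    } else if (value <= 0xffffU) {
        out[0] = (uint8_t) (ib | 25); nbytes = 2;
    } else if (value <= 0xffffffffUL) {
        out[0] = (uint8_t) (ib | 26); nbytes = 4;
    } else {
        out[0] = (uint8_t) (ib | 27); nbytes = 8;
    }
    for (i = 0; i < nbytes; i++) {
        out[1 + i] = (uint8_t) (value >> (8 * (nbytes - 1 - i)));
    }
    return 1 + (size_t) nbytes;
}

/* A tag is major type 6 with the tag number as its argument; the tagged
 * data item follows immediately after these bytes.
 */
size_t cbor_put_tag(uint8_t *out, uint64_t tag) {
    return cbor_put_head(out, 6, tag);
}
```

For example, the registered tag 55799 (self-described CBOR) is written as
the three bytes ``d9 d9 f7``.  Nothing in the encoding distinguishes a
registered tag number from an unregistered one.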

Future work
===========

General:

* Add flags to control encode/decode behavior.

* Allow decoding with a trailer so that stream parsing is easier.
  A similar change would be useful for JSON decoding.

* Reserve a CBOR tag for a missing value.

* Reserve other necessary CBOR tags.

* Explicit support for encoding with and without side effects (e.g.
  skipping Proxy traps and getters).

* JSON encoding supports ``.toJSON()``; maybe add something like
  ``.toCBOR()``?

* Optimize encoding and decoding more.
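
As an illustration of the "decoding with a trailer" item, the sketch below
measures how many bytes the first complete CBOR item occupies, so a caller
could treat the rest of the buffer as trailing data.  It is a standalone
sketch under stated assumptions, not existing ``duk_cbor`` behavior: the
function names are hypothetical, and the walker is lenient (it checks
framing only, not chunk types or UTF-8 validity):

```c
#include <stddef.h>
#include <stdint.h>

/* Return a pointer just past the item starting at 'p', or NULL if the
 * input is truncated, malformed, or nested too deeply.
 */
static const uint8_t *cbor_skip(const uint8_t *p, const uint8_t *end, int depth) {
    unsigned major, info;
    uint64_t arg = 0, count;
    int nbytes;

    if (p == NULL || p >= end || depth > 64) return NULL;
    major = (unsigned) (*p >> 5);
    info = (unsigned) (*p & 0x1f);
    p++;

    if (info == 31) {
        /* Indefinite length: only valid for strings, arrays and maps. */
        if (major < 2 || major > 5) return NULL;
        while (p < end && *p != 0xff) {
            p = cbor_skip(p, end, depth + 1);
            if (p == NULL) return NULL;
        }
        return (p < end) ? p + 1 : NULL;  /* step over the 0xff break */
    }
    if (info >= 28) return NULL;  /* 28..30 are reserved */
    if (info < 24) {
        arg = info;
    } else {
        nbytes = 1 << (info - 24);  /* 1, 2, 4 or 8 argument bytes */
        if ((size_t) (end - p) < (size_t) nbytes) return NULL;
        while (nbytes-- > 0) arg = (arg << 8) | *p++;
    }

    switch (major) {
    case 2: case 3:  /* byte/text string: 'arg' payload bytes follow */
        return ((uint64_t) (end - p) >= arg) ? p + arg : NULL;
    case 4: case 5:  /* array of 'arg' items, map of 'arg' key/value pairs */
        if (major == 5 && arg > UINT64_MAX / 2) return NULL;
        count = (major == 5) ? arg * 2 : arg;
        for (; count > 0; count--) {
            p = cbor_skip(p, end, depth + 1);
            if (p == NULL) return NULL;
        }
        return p;
    case 6:  /* tag: exactly one tagged item follows */
        return cbor_skip(p, end, depth + 1);
    default:  /* 0, 1: integers; 7: simple values and floats */
        return p;
    }
}

/* Length in bytes of the first item in buf[0..len), or 0 on error. */
size_t cbor_item_length(const uint8_t *buf, size_t len) {
    const uint8_t *next;

    if (buf == NULL) return 0;
    next = cbor_skip(buf, buf + len, 0);
    return next ? (size_t) (next - buf) : 0;
}
```

A stream reader could call ``cbor_item_length()`` on its buffer, hand that
many bytes to the decoder, and keep the remainder for the next item.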

Encoding:

* Tagging of typed arrays:
  https://datatracker.ietf.org/doc/draft-ietf-cbor-array-tags/.
  Mixed-endian data must be converted on encode, e.g. to little endian,
  because no mixed-endian tag exists.

* Encoding typed arrays as integer arrays instead?

* Float16Array encoding support (once/if supported by main engine).

* Tagging of array gaps, once IANA reservation is complete:
  https://github.com/svaarala/duktape/blob/master/doc/cbor-missing-tag.rst.

* Support 64-bit integers when encoding, e.g. up to 2^53?

* Definite-length object encoding even when object has more than 23 keys.

* Map/Set encoding (once supported in the main engine), maybe tagged
  so they decode back into Map/Set.

* Bigint encoding (once supported in the main engine), as tagged byte
  strings like in Python CBOR.

* String encoding options: combining surrogate pairs, tagging non-UTF-8
  byte strings so they decode back to string, using U+FFFD replacement,
  etc.

* Detection of Symbols, encode them in a useful tagged form.

* Better encoding of functions.

* Hook for serialization, to allow caller to serialize values (especially
  objects) in a context specific manner (e.g. serialize functions with
  IPC metadata to allow them to be called remotely).  Such a hook should
  be able to emit tag(s) to mark custom values for decode processing.
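
For the 64-bit integer item, one possible approach is sketched below: an
ECMAScript number that is integral and within ±2^53 is written as a CBOR
integer, and everything else as a 64-bit float.  This is an illustration
only, not the current encoder's behavior; the function names are
hypothetical, and the code assumes IEEE 754 binary64 doubles:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a CBOR head with the shortest argument; 'out' needs 9 bytes. */
static size_t put_head(uint8_t *out, unsigned major, uint64_t value) {
    unsigned info;
    int nbytes, i;

    if (value < 24) {
        out[0] = (uint8_t) ((major << 5) | value);
        return 1;
    }
    if (value <= 0xffU) { info = 24; nbytes = 1; }
    else if (value <= 0xffffU) { info = 25; nbytes = 2; }
    else if (value <= 0xffffffffUL) { info = 26; nbytes = 4; }
    else { info = 27; nbytes = 8; }
    out[0] = (uint8_t) ((major << 5) | info);
    for (i = 0; i < nbytes; i++) {
        out[1 + i] = (uint8_t) (value >> (8 * (nbytes - 1 - i)));
    }
    return 1 + (size_t) nbytes;
}

/* Encode a double as a CBOR integer when that is lossless, otherwise as
 * a 64-bit float (initial byte 0xfb).  'out' needs room for 9 bytes.
 */
size_t cbor_put_number(uint8_t *out, double d) {
    const double limit = 9007199254740992.0;  /* 2^53 */
    uint64_t bits;
    int i;

    memcpy(&bits, &d, sizeof bits);  /* assumes IEEE 754 binary64 */

    /* NaN and infinities fail the range check; negative zero is kept as
     * a float so that its sign survives a round trip.
     */
    if (d >= -limit && d <= limit &&
        bits != UINT64_C(0x8000000000000000) &&
        (double) (int64_t) d == d) {
        if (d >= 0.0) {
            return put_head(out, 0, (uint64_t) d);
        }
        return put_head(out, 1, (uint64_t) (-d) - 1);  /* value is -1 - n */
    }

    out[0] = 0xfb;
    for (i = 0; i < 8; i++) {
        out[1 + i] = (uint8_t) (bits >> (8 * (7 - i)));
    }
    return 9;
}
```

The 2^53 bound matters because above it a double can no longer represent
every integer, so treating the value as an exact integer would be
misleading.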

Decoding:

* Typed array decoding support.  Should decoder convert to host
  endianness?

* Float16Array decoding support (once/if supported by main engine).

* Decoding objects with non-string keys; these could be represented as a Map.

* Use bare objects and arrays when decoding?

* Use a Map rather than a plain object when decoding, which would allow
  non-string keys.

* Bigint decoding (once supported in the main engine).

* Decoding of non-BMP codepoints into surrogate pairs.

* Decoding of Symbols when call site indicates it is safe.

* Hooking for revival, to allow caller to revive objects in a context
  specific manner (e.g. revive serialized function objects into IPC
  proxy functions).  Such a hook should have access to encoding tags,
  so that revival can depend on tags present.

* Option to compact decoded objects and arrays.

* Improve fastint decoding support, e.g. decode non-optimally encoded
  integers as fastints, decode compatible floating point values as
  fastints.
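
For the surrogate pair item, the arithmetic involved is small.  The
following is a minimal standalone sketch with a hypothetical function name;
it shows only the codepoint split and makes no assumption about how Duktape
would store the resulting code units internally:

```c
#include <stdint.h>

/* Split a Unicode codepoint into UTF-16 code units as seen by ECMAScript
 * code.  Returns the number of units written to 'out' (1 or 2), or 0 for
 * values above U+10FFFF.
 */
int codepoint_to_utf16(uint32_t cp, uint16_t out[2]) {
    if (cp > 0x10ffffUL) {
        return 0;
    }
    if (cp < 0x10000UL) {
        out[0] = (uint16_t) cp;  /* BMP codepoint: a single code unit */
        return 1;
    }
    cp -= 0x10000UL;  /* 20 bits remain, split 10 + 10 */
    out[0] = (uint16_t) (0xd800U + (cp >> 10));     /* high surrogate */
    out[1] = (uint16_t) (0xdc00U + (cp & 0x3ffU));  /* low surrogate */
    return 2;
}
```

For example, U+1F600 splits into the pair ``D83D DE00``.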