String buffers, part 2e: add serialization string dictionary.
Sponsored by fmad.io.
Mike Pall committed Jun 7, 2021
1 parent 4216bdf commit ac02a12
Showing 10 changed files with 214 additions and 65 deletions.
70 changes: 63 additions & 7 deletions doc/ext_buffer.html
@@ -175,14 +175,19 @@ <h3 id="buffer_overview">Buffer Method Overview</h3>

<h2 id="create">Buffer Creation and Management</h2>

-<h3 id="buffer_new"><tt>local buf = buffer.new([size])</tt></h3>
+<h3 id="buffer_new"><tt>local buf = buffer.new([size [,options]])<br>
+local buf = buffer.new([options])</tt></h3>
<p>
Creates a new buffer object.
</p>
<p>
The optional <tt>size</tt> argument ensures a minimum initial buffer
-size. This is strictly an optimization for cases where the required
-buffer size is known beforehand.
+size. This is strictly an optimization when the required buffer size is
+known beforehand. The buffer space will grow as needed, in any case.
</p>
+<p>
+The optional table <tt>options</tt> sets various
+<a href="#serialize_options">serialization options</a>.
+</p>

<h3 id="buffer_reset"><tt>buf = buf:reset()</tt></h3>
@@ -205,7 +210,7 @@ <h3 id="buffer_free"><tt>buf = buf:free()</tt></h3>

<h2 id="write">Buffer Writers</h2>

-<h3 id="buffer_put"><tt>buf = buf:put([str|num|obj] [, ...])</tt></h3>
+<h3 id="buffer_put"><tt>buf = buf:put([str|num|obj] [,&hellip;])</tt></h3>
<p>
Appends a string <tt>str</tt>, a number <tt>num</tt> or any object
<tt>obj</tt> with a <tt>__tostring</tt> metamethod to the buffer.
@@ -217,7 +222,7 @@ <h3 id="buffer_put"><tt>buf = buf:put([str|num|obj] [, ...])</tt></h3>
writes to use a single buffer.
</p>

-<h3 id="buffer_putf"><tt>buf = buf:putf(format, ...)</tt></h3>
+<h3 id="buffer_putf"><tt>buf = buf:putf(format, &hellip;)</tt></h3>
<p>
Appends the formatted arguments to the buffer. The <tt>format</tt>
string supports the same options as <tt>string.format()</tt>.
@@ -298,7 +303,7 @@ <h3 id="buffer_length"><tt>len = #buf</tt></h3>
Returns the current length of the buffer data in bytes.
</p>

-<h3 id="buffer_concat"><tt>res = str|num|buf .. str|num|buf [...]</tt></h3>
+<h3 id="buffer_concat"><tt>res = str|num|buf .. str|num|buf [&hellip;]</tt></h3>
<p>
The Lua concatenation operator <tt>..</tt> also accepts buffers, just
like strings or numbers. It always returns a string and not a buffer.
@@ -319,7 +324,7 @@ <h3 id="buffer_skip"><tt>buf = buf:skip(len)</tt></h3>
length of the buffer data.
</p>

-<h3 id="buffer_get"><tt>str, ... = buf:get([len|nil] [,...])</tt></h3>
+<h3 id="buffer_get"><tt>str, &hellip; = buf:get([len|nil] [,&hellip;])</tt></h3>
<p>
Consumes the buffer data and returns one or more strings. If called
without arguments, the whole buffer data is consumed. If called with a
@@ -444,6 +449,56 @@ <h3 id="buffer_decode"><tt>obj = buffer.decode(str)<br>
any left-over data in the buffer.
</p>

+<h3 id="serialize_options">Serialization Options</h3>
+<p>
+The <tt>options</tt> table passed to <tt>buffer.new()</tt> may contain
+the following members (all optional):
+</p>
+<ul>
+<li>
+<tt>dict</tt> is a Lua table holding a <b>dictionary of strings</b> that
+commonly occur as table keys of objects you are serializing. These keys
+are compactly encoded as indexes during serialization. A well chosen
+dictionary saves space and improves serialization performance.
+</li>
+</ul>
+<p>
+<tt>dict</tt> needs to be an array of strings, starting at index 1 and
+without holes (no <tt>nil</tt> inbetween). The table is anchored in the
+buffer object and internally modified into a two-way index (don't do
+this yourself, just pass a plain array). The table must not be modified
+after it has been passed to <tt>buffer.new()</tt>.
+</p>
+<p>
+The <tt>dict</tt> tables used by the encoder and decoder must be the
+same. Put the most common entries at the front. Extend at the end to
+ensure backwards-compatibility &mdash; older encodings can then still be
+read. You may also set some indexes to <tt>false</tt> to explicitly drop
+backwards-compatibility. Old encodings that use these indexes will throw
+an error when decoded.
+</p>
+<p>
+Note: parsing and preparation of the options table is somewhat
+expensive. Create a buffer object only once and recycle it for multiple
+uses. Avoid mixing encoder and decoder buffers, since the
+<tt>buf:set()</tt> method frees the already allocated buffer space:
+</p>
+<pre class="code">
+local options = {
+dict = { "commonly", "used", "string", "keys" },
+}
+local buf_enc = buffer.new(options)
+local buf_dec = buffer.new(options)
+
+local function encode(obj)
+return buf_enc:reset():encode(obj):get()
+end
+
+local function decode(str)
+return buf_dec:set(str):decode()
+end
+</pre>
+
<h3 id="serialize_stream">Streaming Serialization</h3>
<p>
In some contexts, it's desirable to do piecewise serialization of large
@@ -536,6 +591,7 @@ <h3 id="serialize_format">Serialization Format Specification</h3>
complex → 0x12 re.L im.L // FFI complex

string → (0x20+len).U len*char.B
+| 0x0f (index-1).U // Dict entry

.B = 8 bit
.I = 32 bit little-endian
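Note on the `0x0f (index-1).U` rule added in the hunk above: the following is a minimal C sketch of how a table key could be written either as a dictionary reference or as an inline string. It is not LuaJIT's implementation. `put_u` and `put_key` are hypothetical names, the linear search stands in for the two-way index table, and the `.U` prefix encoding is assumed from the part of the format specification that this excerpt does not show (values below 0xe0 take one byte, values below 0x1fe0 take two bytes, anything larger takes 0xff followed by 32 bits little-endian).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write n in the ".U" prefix encoding. Returns the bytes written (1, 2 or 5). */
static size_t put_u(uint8_t *p, uint32_t n)
{
  if (n < 0xe0) {
    p[0] = (uint8_t)n;
    return 1;
  } else if (n < 0x1fe0) {
    n -= 0xe0;
    p[0] = (uint8_t)(0xe0 | (n >> 8));
    p[1] = (uint8_t)(n & 0xff);
    return 2;
  } else {
    p[0] = 0xff;
    p[1] = (uint8_t)(n & 0xff);          /* 32 bit little-endian. */
    p[2] = (uint8_t)((n >> 8) & 0xff);
    p[3] = (uint8_t)((n >> 16) & 0xff);
    p[4] = (uint8_t)((n >> 24) & 0xff);
    return 5;
  }
}

/* Encode a string key. dict is a plain C array of ndict strings, so C index i
** corresponds to Lua index i+1. A NULL slot models an entry set to false.
** Returns the number of bytes written to p.
*/
static size_t put_key(uint8_t *p, const char *s,
                      const char *const *dict, uint32_t ndict)
{
  size_t len = strlen(s), n;
  uint32_t i;
  for (i = 0; i < ndict; i++) {  /* Stand-in for the two-way index lookup. */
    if (dict[i] && strcmp(dict[i], s) == 0) {
      p[0] = 0x0f;                   /* Dict entry tag. */
      return 1 + put_u(p + 1, i);    /* i is already (index-1). */
    }
  }
  n = put_u(p, 0x20 + (uint32_t)len);  /* Inline string: (0x20+len).U */
  memcpy(p + n, s, len);               /* followed by len*char.B */
  return n + len;
}
```

With the dictionary from the documentation example, the key "used" (Lua index 2) would take two bytes (0x0f 0x01) instead of five for the inline form.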
60 changes: 42 additions & 18 deletions src/lib_buffer.c
@@ -29,9 +29,7 @@
#include "lj_serialize.h"
#include "lj_lib.h"

-/* ------------------------------------------------------------------------ */
-
-#define LJLIB_MODULE_buffer_method
+/* -- Helper functions ---------------------------------------------------- */

/* Check that the first argument is a string buffer. */
static SBufExt *buffer_tobuf(lua_State *L)
@@ -49,11 +47,16 @@ static LJ_AINLINE SBufExt *buffer_tobufw(lua_State *L)
return sbx;
}

+#define buffer_toudata(sbx) ((GCudata *)(sbx)-1)
+
+/* -- Buffer methods ------------------------------------------------------ */
+
+#define LJLIB_MODULE_buffer_method
+
LJLIB_CF(buffer_method_free)
{
SBufExt *sbx = buffer_tobuf(L);
-lj_bufx_free(G(L), sbx);
-lj_bufx_init(L, sbx);
+lj_bufx_free(L, sbx);
L->top = L->base+1; /* Chain buffer object. */
return 1;
}
@@ -83,6 +86,7 @@ LJLIB_CF(buffer_method_skip)
LJLIB_CF(buffer_method_set)
{
SBufExt *sbx = buffer_tobuf(L);
+GCobj *ref;
const char *p;
MSize len;
#if LJ_HASFFI
@@ -98,9 +102,11 @@ LJLIB_CF(buffer_method_set)
p = strdata(str);
len = str->len;
}
-lj_bufx_free(G(L), sbx);
-lj_bufx_init_cow(L, sbx, p, len);
-setgcref(sbx->cowref, gcV(L->base+1));
+lj_bufx_free(L, sbx);
+lj_bufx_set_cow(L, sbx, p, len);
+ref = gcV(L->base+1);
+setgcref(sbx->cowref, ref);
+lj_gc_objbarrier(L, buffer_toudata(sbx), ref);
L->top = L->base+1; /* Chain buffer object. */
return 1;
}
@@ -249,8 +255,7 @@ LJLIB_CF(buffer_method_decode)
LJLIB_CF(buffer_method___gc)
{
SBufExt *sbx = buffer_tobuf(L);
-lj_bufx_free(G(L), sbx);
-lj_bufx_init(L, sbx);
+lj_bufx_free(L, sbx);
return 0;
}

@@ -272,24 +277,41 @@ LJLIB_CF(buffer_method___len)
LJLIB_PUSH("buffer") LJLIB_SET(__metatable)
LJLIB_PUSH(top-1) LJLIB_SET(__index)

-/* ------------------------------------------------------------------------ */
+/* -- Buffer library functions -------------------------------------------- */

#define LJLIB_MODULE_buffer

LJLIB_PUSH(top-2) LJLIB_SET(!) /* Set environment. */

LJLIB_CF(buffer_new)
{
-MSize sz = L->base == L->top ? 0u :
-(MSize)lj_lib_checkintrange(L, 1, 0, LJ_MAX_BUF);
-GCtab *env = tabref(curr_func(L)->c.env);
-GCudata *ud = lj_udata_new(L, sizeof(SBufExt), env);
-SBufExt *sbx = (SBufExt *)uddata(ud);
+MSize sz = 0;
+int targ = 1;
+GCtab *env, *dict = NULL;
+GCudata *ud;
+SBufExt *sbx;
+if (L->base < L->top && !tvistab(L->base)) {
+targ = 2;
+if (!tvisnil(L->base))
+sz = (MSize)lj_lib_checkintrange(L, 1, 0, LJ_MAX_BUF);
+}
+if (L->base+targ-1 < L->top) {
+GCtab *options = lj_lib_checktab(L, targ);
+cTValue *opt_dict = lj_tab_getstr(options, lj_str_newlit(L, "dict"));
+if (opt_dict && tvistab(opt_dict)) {
+dict = tabV(opt_dict);
+lj_serialize_dict_prep(L, dict);
+}
+}
+env = tabref(curr_func(L)->c.env);
+ud = lj_udata_new(L, sizeof(SBufExt), env);
ud->udtype = UDTYPE_BUFFER;
/* NOBARRIER: The GCudata is new (marked white). */
setgcref(ud->metatable, obj2gco(env));
setudataV(L, L->top++, ud);
+sbx = (SBufExt *)uddata(ud);
lj_bufx_init(L, sbx);
+setgcref(sbx->dict, obj2gco(dict));
if (sz > 0) lj_buf_need2((SBuf *)sbx, sz);
return 1;
}
@@ -298,7 +320,8 @@ LJLIB_CF(buffer_encode)
{
cTValue *o = lj_lib_checkany(L, 1);
SBufExt sbx;
-lj_bufx_init_borrow(L, &sbx, &G(L)->tmpbuf);
+memset(&sbx, 0, sizeof(SBufExt));
+lj_bufx_set_borrow(L, &sbx, &G(L)->tmpbuf);
lj_serialize_put(&sbx, o);
setstrV(L, L->top++, lj_buf_str(L, (SBuf *)&sbx));
lj_gc_check(L);
@@ -309,7 +332,8 @@ LJLIB_CF(buffer_decode)
{
GCstr *str = lj_lib_checkstrx(L, 1);
SBufExt sbx;
-lj_bufx_init_cow(L, &sbx, strdata(str), str->len);
+memset(&sbx, 0, sizeof(SBufExt));
+lj_bufx_set_cow(L, &sbx, strdata(str), str->len);
/* No need to set sbx.cowref here. */
setnilV(L->top++);
lj_serialize_get(&sbx, L->top-1);
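Note on the reworked argument handling in `buffer_new`: it accepts both `buffer.new([size [,options]])` and `buffer.new([options])`. The dispatch can be modelled in a few lines of standalone C. This is only a sketch: `Arg`, `NewArgs` and `parse_new_args` are hypothetical names, Lua values are reduced to a type tag plus a size, and error checking is omitted (the real code validates the size with `lj_lib_checkintrange` and the options with `lj_lib_checktab`).

```c
#include <stddef.h>

/* Hypothetical stand-in for a Lua argument: just a type tag and a number. */
typedef enum { ARG_NIL, ARG_NUMBER, ARG_TABLE } ArgType;
typedef struct { ArgType type; unsigned size; } Arg;

typedef struct {
  unsigned size;    /* Requested minimum buffer size, 0 if none was given. */
  int options_pos;  /* 1-based position of the options argument, 0 if absent. */
} NewArgs;

static NewArgs parse_new_args(const Arg *args, int nargs)
{
  NewArgs r = { 0, 0 };
  int targ = 1;  /* Position where the options table is expected. */
  if (nargs > 0 && args[0].type != ARG_TABLE) {
    targ = 2;  /* First argument is the size (or nil), options come second. */
    if (args[0].type != ARG_NIL)
      r.size = args[0].size;
  }
  if (targ <= nargs)       /* Mirrors L->base+targ-1 < L->top. */
    r.options_pos = targ;  /* The real code requires a table at this position. */
  return r;
}
```

A table in the first position is always treated as the options table, so a size can only be combined with options by passing it first; `nil` in the first position leaves the size at its default.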
16 changes: 9 additions & 7 deletions src/lj_buf.h
@@ -27,6 +27,7 @@ typedef struct SBufExt {
MRef bsb; /* Borrowed string buffer. */
};
char *r; /* Read pointer. */
+GCRef dict; /* Serialization string dictionary table. */
int depth; /* Remaining recursion depth. */
} SBufExt;

@@ -114,19 +115,17 @@ static LJ_AINLINE void lj_bufx_init(lua_State *L, SBufExt *sbx)
setsbufXL(sbx, L, SBUF_FLAG_EXT);
}

-static LJ_AINLINE void lj_bufx_init_borrow(lua_State *L, SBufExt *sbx, SBuf *sb)
+static LJ_AINLINE void lj_bufx_set_borrow(lua_State *L, SBufExt *sbx, SBuf *sb)
{
-memset(sbx, 0, sizeof(SBufExt));
setsbufXL(sbx, L, SBUF_FLAG_EXT | SBUF_FLAG_BORROW);
setmref(sbx->bsb, sb);
sbx->r = sbx->w = sbx->b = sb->b;
sbx->e = sb->e;
}

-static LJ_AINLINE void lj_bufx_init_cow(lua_State *L, SBufExt *sbx,
-const char *p, MSize len)
+static LJ_AINLINE void lj_bufx_set_cow(lua_State *L, SBufExt *sbx,
+const char *p, MSize len)
{
-memset(sbx, 0, sizeof(SBufExt));
setsbufXL(sbx, L, SBUF_FLAG_EXT | SBUF_FLAG_COW);
sbx->r = sbx->b = (char *)p;
sbx->w = sbx->e = (char *)p + len;
@@ -142,9 +141,12 @@ static LJ_AINLINE void lj_bufx_reset(SBufExt *sbx)
sbx->r = sbx->w = sbx->b;
}

-static LJ_AINLINE void lj_bufx_free(global_State *g, SBufExt *sbx)
+static LJ_AINLINE void lj_bufx_free(lua_State *L, SBufExt *sbx)
{
-if (!sbufiscow(sbx)) lj_mem_free(g, sbx->b, sbufsz(sbx));
+if (!sbufiscow(sbx)) lj_mem_free(G(L), sbx->b, sbufsz(sbx));
+setsbufXL(sbx, L, SBUF_FLAG_EXT);
+setgcrefnull(sbx->cowref);
+sbx->r = sbx->w = sbx->b = sbx->e = NULL;
}

/* Low-level buffer put operations */
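Note on the `lj_bufx_init_*` to `lj_bufx_set_*` rename: one reading of this change is that the setters no longer `memset` the whole `SBufExt`, so a `dict` reference stored at creation time survives `buf:set()`, and that `lj_bufx_free` now leaves the buffer empty but reusable, which is why the callers dropped their follow-up `lj_bufx_init` call. The sketch below models that interpretation with a hypothetical `MiniBuf` struct and plain `malloc`/`free`; it is an illustration, not LuaJIT code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, much reduced stand-in for SBufExt. */
typedef struct MiniBuf {
  char *b, *r, *w, *e;  /* Base, read, write and end pointers. */
  int cow;              /* Non-zero if b points to borrowed (copy-on-write) data. */
  const void *dict;     /* Set once at creation, must survive free and set. */
} MiniBuf;

/* Release owned memory and return to the empty state. dict is left alone. */
static void mini_free(MiniBuf *sb)
{
  if (!sb->cow) free(sb->b);  /* Borrowed data is never freed here. */
  sb->cow = 0;
  sb->r = sb->w = sb->b = sb->e = NULL;
}

/* Point the buffer at borrowed data without clearing the other fields. */
static void mini_set_cow(MiniBuf *sb, const char *p, size_t len)
{
  sb->cow = 1;
  sb->r = sb->b = (char *)p;
  sb->w = sb->e = (char *)p + len;
}
```

Because `mini_free` resets every pointer, calling it twice in a row is harmless, and a later `mini_set_cow` starts from a clean state while `dict` stays attached.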
2 changes: 2 additions & 0 deletions src/lj_errmsg.h
@@ -182,8 +182,10 @@ ERRDEF(FFI_NYICALL, "NYI: cannot call this C function (yet)")

#if LJ_HASBUFFER
/* String buffer errors. */
+ERRDEF(BUFFER_BADOPT, "bad options table")
ERRDEF(BUFFER_BADENC, "cannot serialize " LUA_QS)
ERRDEF(BUFFER_BADDEC, "cannot deserialize tag 0x%02x")
+ERRDEF(BUFFER_BADDICTX, "cannot deserialize dictionary index %d")
ERRDEF(BUFFER_DEPTH, "too deep to serialize")
ERRDEF(BUFFER_DUPKEY, "duplicate table key")
ERRDEF(BUFFER_EOB, "unexpected end of buffer")
5 changes: 3 additions & 2 deletions src/lj_gc.c
@@ -67,9 +67,10 @@ static void gc_mark(global_State *g, GCobj *o)
gc_markobj(g, tabref(gco2ud(o)->env));
if (LJ_HASBUFFER && gco2ud(o)->udtype == UDTYPE_BUFFER) {
SBufExt *sbx = (SBufExt *)uddata(gco2ud(o));
-if (sbufiscow(sbx) && gcref(sbx->cowref) != NULL) {
+if (sbufiscow(sbx) && gcref(sbx->cowref))
gc_markobj(g, gcref(sbx->cowref));
-}
+if (gcref(sbx->dict))
+gc_markobj(g, gcref(sbx->dict));
}
} else if (LJ_UNLIKELY(gct == ~LJ_TUPVAL)) {
GCupval *uv = gco2uv(o);
Expand Down
2 changes: 1 addition & 1 deletion src/lj_obj.h
@@ -923,7 +923,7 @@ static LJ_AINLINE void setgcV(lua_State *L, TValue *o, GCobj *v, uint32_t it)
}

#define define_setV(name, type, tag) \
-static LJ_AINLINE void name(lua_State *L, TValue *o, type *v) \
+static LJ_AINLINE void name(lua_State *L, TValue *o, const type *v) \
{ \
setgcV(L, o, obj2gco(v), tag); \
}