Chapter 16: Advanced Blob Features
There are two large features that involve complex uses of blobs that are supported in the default runtime. These are Blob Storage and Backed Tables, the latter of which uses key and value blobs for generic schema.
Blob Storage
The general idea here is that you might want to take arbitrary data that is in a cursor, which is to say any database shape at all, and convert it into a single blob. You could use this blob to store composite data in a single column in the database or to send your data over some kind of wire transport in a generic fashion. The idea of blob storage is to provide a reslient way to do this that is platform neutral (i.e. you can store the blob on one system and recover it on another system with different endian order). Blob storage allows limited extension of the blob shape over time, allowing you to add new nullable columns at the end of your shape and still recover your data from older blobs.
Defining a shape for Blob Storage
In SQL/CQL, the main way you define structures, especially those that you want to maintain, is with tables. Hence we introduce notation like this:
[[blob_storage]]
create table news_info(
who text,
what text,
`when` long -- timestamp of some kind
);
The blob_storage attribute
indicates that the table we’re about
to define here is not really going to be a materialized table. As a
result, you will not be able to (e.g.) DROP
the table or SELECT
from it, and there will be no schema upgrade for it should you request
one. However, the usual schema maintenance rules still apply (See
Chapter 10
and
Chapter 11
) which help you to
create compatible versions of this structure. For instance, new columns
can be added only at the end, and only if they are nullable. Here we add
source
to the schema in a hypothetical “version 6”.
[[blob_storage]]
create table news_info(
who text,
what text,
`when` long -- timestamp of some kind
source text @create(6)
);
NOTE: schema versions move forward globally in the schema, not locally in one table; this implies there are versions 1-5 elsewhere, not shown.
Additionally, since the storage is not backed by SQLite and therefore
lacks its constraint system, default values and constraints are
not allowed in a table marked with cql:blob_storage
; it’s just
data. Similarly, triggers, views, and indices may not use the table
marked as blob storage. It isn’t really a table.
Declaring Blob Storage
Blob storage goes in a blob field, but recall CQL has discriminated types so we can use a form like this:
create table info(
id long primary key,
news_info blob<news_info>
);
From a SQL perspective news_info
is just a blob, you can only apply
blob operations to it in the context of a query. This means WHERE
clauses that partially constraint the blob contents are not generally
possible (though you could do it if you write suitable UDFs and
Backed Tables
actually generalize this if generic schema support
is what is desired). Blob storage is really about moving the whole blob
around so it’s apprpropriate when you want to crack the blob in code,
and not so much in database operations.
Creating Blobs with Blob Storage
You can use the to_blob
pipeline function on a cursor to get a blob.
In the below C:to_blob
becomes simply cql_cursor_to_blob(C)
.
create proc make_blob(like news_info, out result blob<news_info>)
begin
-- declare the cursor
cursor C like news_info;
-- load it from loose arguments
fetch c from arguments;
-- set the blob variable
set result := C:to_blob;
END;
The above declares a cursor, loads it from argument values, and converts it to a blob. Of course any of the usual cursor building forms can be used to power your blob creation, you just do one serialization at the end. The above is assembling a blob from arguments but you could equally make the blob from data.
create proc get_news_info(id_ long not null, out result blob<news_info>)
begin
-- use our @columns sugar syntax for getting just news_info columns from
-- a table with potentially lots of stuff (or an error if it's missing columns)
cursor C for
select @columns(like news_info) from some_source_of_info where info.id = id_;
fetch c;
set result := C:to_blob;
END;
There are many cursor fetch forms, including dummy data forms and other interesting bits of sugar. You can fetch a cursor from arguments, from other cursors, and even combinations. Because cursors are the source of new blobs, any of these data acquistion forms are viable and convenient sources of data with which to make blobs.
Unpacking Blob Storage
Again, the normal way that you work with records in CQL is by creating suitable cursors. Such cursors can be economically accessed on a field-by-field basis. What we need is a way to easily recreate a cursor from a blob so we can read the data values. To do this use this form:
-- get the blob from somewhere (b will be of type blob<news_info>)
let b := (select news_info from info where id = id_ if nothing null);
-- create a suitable cursor with the same shape
cursor C like b;
-- load the cursor (note that this can throw an exception if the blob is corrupt)
C:from_blob(b);
-- now use C.who, C.what, etc.
Once the data is loaded in a cursor it is very economical to access on a field-by-field basis, and, since the deserialization of the blob happened all at once, that can also be economical.
NOTE: The runtime cannot assume that the blob is well formed, it could be coming from anywhere. If only for security reasons it must assume the blob is “hostile”. Hence the decoding validates the shape, internal lengths, and so forth. Therefore there are many ways conversion might fail.
Once you have the cursor you can do any of the usual data operations; you could
even make new blobs with different combinations by slicing the cursor fields
using the LIKE
operator. You can return the cursor with OUT
, or OUT UNION
,
or pass the blob fields as arguments to functions using the FROM
forms. The cracked blob is fully usable for all the usual CQL things you might
need to do. Importantly, blob storage can contain other blobs so you can nest
shapes as needed.
Blob Storage Representation
The blob data must be able to evolve over time, so each blob has to be self-describing. We must also be able to throw an exception if an incorrect or invalid blob is used when loading a cursor, so the blob has to contain the following:
- the number of columns in the blob data type when it was created, the count is inferred from the field types which are null terminated
- the type of each field (this is encoded as a single plain-text character)
- the types are bool, int, long, (double) real, (string) text, blob;
- we use ‘f’ (flag) for bools, hence the codes are “fildsb”
- these are encoded with one letter each, upper case meaning ’not null’ so the storage might be “LFss”
- the blob begins with a null terminated string that serves for both the count and the types
- Each nullable field may be present or null; 1 bit is used to store this fact. The bits are in an array of bytes that comes immediately after the type info (the type info implicitly tells us the size of this array)
- Boolean values are likewise encoded as bits within the same array, so the total number of bits stored is nullables plus booleans (nullable booleans use 2 bits, even if null both bits are stored)
- When reading a newer version of a record from an older piece of data that is missing a column then the column is assumed to be NULL
- Any columns added after the initial version (using @create) must be nullable; this is normal for adding columns to existing schema
- Integers and longs are stored in a
varint
/VLQ
format afterzigzag
encoding (same article) - Text is stored inline in null terminated strings (embedded nulls are not allowed in CQL text)
- Nested blobs are stored inline, with a length prefix encoded like any other int (varint zigzag)
- Floating point is stored in IEEE 754 format which is already highly portable
- None of these encodings have endian issues, they are fully specified byte orders
Blob Storage Customization
As with many other features, it’s possible to replace the (de)serialization with code of your choice by supplying your own runtime methods.
Storage types that are going to be persisted in the database or go over a wire-protocol should be managed like schema with the usual validation rules. On the other hand, formats that will be used only transiently in memory can be changed at whim from version to version. As mentioned above, the design specifically considers cases where a new client discovers an old-format blob (with fewer columns) and, the reverse, cases where an old client recieves a datagram from a new client with too many columns. Any customizations should consider these same cases.
Blob Storage Example
The code samples below illustrate some of the more common blob operations that are likely to come up.
[[blob_storage]]
create table news_info(
who text,
what text,
`when` long -- timestamp of some kind
);
-- a place where the blob appears in storage
create table some_table(
x integer,
y integer,
news_blob blob<news_info>
);
-- a procedure that creates a news_info blob from loose args
create proc make_blob(like news_info, out result blob<news_info>)
begin
cursor C like news_info;
fetch C from arguments;
result := C:to_blob;
end;
-- a procedure that cracks the blob, creating a cursor
create proc crack_blob(data blob<news_info>)
begin
cursor C like news_info;
C:from_blob(data);
out C;
end;
-- A procedure that cracks the blob into loose args if needed
-- the OUT statement was created specifically to allow you to
-- avoid this sort mass OUT awfulness consider that the blob
-- might have dozens of columns, this quickly gets unweildy.
-- But, as an illustration, here is explicit extraction.
create proc crack_blob_to_vars(
data blob<news_info>,
out who text,
out what text,
out `when` long) -- when is a keyword, `when` is not.
begin
cursor C like news_info;
C:from_blob(data);
who := C.who;
what := C.what;
`when` := C.`when`;
end;
-- we're going to have a result with x/y columns in it
-- this shape lets us select those out easily using LIKE
interface my_basic_columns (
x int,
y int
);
-- this defines a shape for the result we want
-- we're never actually defining this procedure
interface my_result_shape (
like my_basic_columns,
like news_info
);
create proc select_and_crack(whatever bool not null)
begin
cursor C for select * from some_table where whatever;
loop fetch c
begin
-- crack the blob in c into a new cursor 'news'
declare news cursor like news_info;
fetch news from blob c.news_blob;
-- assemble the result we want from the parts we have
declare result cursor like my_result_shape;
fetch result from values (from c like my_basic_columns, from news);
-- emit one row, some came from the original data some cracked
-- callers will never know that some data was encoded in a blob
out union result;
end;
end;
Deprecated Syntax
These forms are no longer supported:
-- loading a blob from a cursor
set a_blob from cursor C;
-- new supported forms, these are all the same thing
let b := C:to_blob;
C:to_blob(b);
let b := cql_cursor_to_blob(C);
cql_cursor_to_blob(C, b);
-- loading a cursor from a blob
fetch C from blob b;
-- new supported forms, these are all the same thing
C:from_blob(b);
cql_cursor_from_blob(C, b);
Backed Tables
Most production databases include some tables that are fairly generic, they use maybe a simple key-value combination to store some simple settings or something like that. In the course of feature development this kind of thing comes up pretty often and in large client applications there are many small features that need a little bit of state.
It’s easy enough to model whatever state you need with a table or two but this soon results in an explosion of tiny tables. In some cases there are only a few rows of configuration data and indeed the situation can be so bad that the text of the schema for the little state table is larger than the sum of all the data you will ever store there. This situation is a bit tragic because SQLite has initialization cost associated with each table. So these “baby tables” are really not paying for themselves at all.
What we’d like to do is use some kind of generic table as the backing store for many of these small tables while preserving type safety. The cost of access might be a bit higher but since data volumes are expected to be low anyway this would be a good trade-off. And we can have as many as we like. In some cases the state doesn’t even need to be persisted, so we’re talking about tables in an in-memory database. Here low cost of initialization is especially important. And lastly, if your project has dozens or even hundreds of small features like this, the likelihood that all of them are even used in any one session is quite low and so again, having a low fixed cost for the schema is a good thing. No need to create a few hundred in-memory tables on the off chance that they are needed.
Defining Backed Tables
First you need a place to store the data. We using “backing” tables to store the data for “backed” tables. And we define a backing table in the usual way. A simple backing table is just a key/value store and looks like this:
[[backing_table]]
CREATE TABLE backing(
k BLOB PRIMARY KEY,
v BLOB NOT NULL
);
The backing_table
attribute indicates that the table we’re about to
define is to be used for backing storage. At present it is signficantly
restricted. It has to have exactly two columns, both of which are blobs,
one is the key and one is the value. It should be either “baseline”
schema (no @create annotation) or annotated with @create
as it is
expected to be precious data. @recreate
is an error. If it’s an
in-memory table then versioning is somewhat moot but really the backing
store schema is not supposed to change over time, that’s the point.
As showwn below there are additional attributes that can be put on the backing table to define the functions that will be used to store data there.
In future versions we expect to allow some number of additional physical columns which can be used by the backed tables (discussed below) but for now only the simple key/value pattern is allowed. The columns can have any names but as they will be frequently used short names like “k” and “v” are recommended.
Backed tables look like this:
[[backed_by=backing]]
CREATE TABLE backed(
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
bias REAL
);
[[backed_by=backing]]
CREATE TABLE backed2(
id INTEGER PRIMARY KEY,
name TEXT NOT NULL
);
The backed_by
attribute indicates that the table we’re about to define
is not really going to be its own table. As a result, you will not be
able to (e.g.) DROP
the table or CREATE INDEX
or CREATE TRIGGER
on it, and there will be no schema upgrade for it should you request one
with --rt schema_upgrade
. The table may not contain constraints as
there would be no way to enforce them, but they may have default values.
As compensation for these restrictions, backed tables can be changed
freely and have no associated physical schema cost.
NOTE: Adding new not null columns creatives effectively a new backed table, any previous data will seem “lost”. See below.
Reading Data From Backed Tables
To understand how reading works, imagine that we had a VIEW for each backed table which simply reads the blobs out of the backing store and then extracts the backed columns using some blob extraction functions. This would work, but then we’d be trading view schema for table schema so the schema savings we’re trying to achieve would be lost.
We can’t use an actual VIEW but CQL already has something very “view like” – the shared fragment structure. So what CQL does instead of views is to automatically create a shared fragment just like the view we could have made. The shared fragment looks like this:
[[shared_fragment]]
CREATE PROC _backed ()
BEGIN
SELECT
rowid,
cql_blob_get(T.k, backed.id) AS id,
cql_blob_get(T.v, backed.name) AS name,
cql_blob_get(T.v, backed.bias) AS bias
FROM backing AS T
WHERE cql_blob_get_type(backed, T.k) = 2105552408096159860L;
END;
Two things to note:
First, this fragment has the right shape, but the shared fragment
doesn’t directly call blob extraction UDFs. Rather it uses indirect
functions like cql_blob_get
. The point of these helpers is to make
the actual blob functions configurable. The default runtime includes
an implementation of extration functions with the default names, but
you can create whatever blob format you want by defining suitable functions.
You can even have different encodings in different backed tables.
Second, there is a type code embedded in the procedure. The type code is a hash of the type name and the names and types of all the not-null fields in the backed table. The hash is arbitrary but repeatable, any system can compute the same hash and find the records they want without having to share headers. The actual hash function is included in the open source but it’s just a SHA256 reduced to 64 bits with some name canonicalization. The JSON output also includes the relevant hashes so you can easily consume them without even having to know the hash function.
As a second example, the expanded code for backed2
is shown below:
[[shared_fragment]]
CREATE PROC _backed2 ()
BEGIN
SELECT
rowid,
cql_blob_get(T.k, backed2.id) AS id,
cql_blob_get(T.v, backed2.name) AS name
FROM backing AS T
WHERE cql_blob_get_type(backed, T.k) = -1844763880292276559L;
END;
As you can see it’s very similar – the type hash is different and of course it has different columns but the pattern should be clear.
Computation of the Type Hash
The type hash, which is also sometimes called the record type, is designed to stay fixed over time even if you add new optional fields. Contrariwise, if you change the name of the type or if you add new not null fields the type identity is considered to be changed and any data you have in the backing table will basically be ignored because the type hash will not match. This is lossy but safe. More complicated migrations from one type shape to another are possible by introducing a new backed type and moving data.
cql_blob_get
and cql_blob_get_type
By default cql_blob_get
turns into either bgetkey
or bgetval
depending on
if you are reading from the key blob or the value blob. This is controlled by
attributes on the backing table. All users of that table will apply the same
functions.
[[get_key = bgetkey]]
[[get_val = bgetval]]
[[get_type = bgetkey_type]]
The key can no optional fields, it is formed from the primary key of the backing table. As a result it’s possible to index the fields by number and infer their types. Values, on the other hand, are expected to have nullable fields so some might be missing. They are indexed using a hash of the field name and the type, similar to the type code that identifies the backed table in the storage.
These attributes control offsets vs. codes:
[[use_key_codes]] -- omit and keys use offsets by default
[[use_val_offsets]] -- omit and values use codes by default
Here the offset means the zero based ordinal of the column in the key or the value. The order is the order that they appear in the table definition, there might be gaps because keys and values could interleave, each gets its own ordinals.
[[backed_by=something]]
create table foo(
x int, -- ordinal 0 for keys
a int, -- ordinal 0 for values
b int -- ordinal 1 for values
y int, -- ordinal 1 for keys
c int -- ordinal 2 for values
primary key (x,y)
);
NOTE: The blob format for keys must be canonical in the sense that the same values always produce the same blob, even after replacements, so if you use a field id based blob it will be important to always store the fields in the same order. Contrariwise use of offsets for the value blob indicates that your particular value blob will have fixed storage that is addressable by offset. This may be suitable for your needs, for instance maybe you only ever need to store 0-3 integers in the value.
JSON options
The attributes [[json]]
and [[jsonb]]
cause the compiler to ignore the other
backing attributes and use a JSON array for the keys (with the leading column
being the type) and a JSON. This requires that the SQLite you are using
supports the JSON functions, in particular json_array
, json_object
,
json_set
, and the ->>
extraction operator. Some stock versions of Linux do
not have them but any recent SQLite amalgamation will have these features.
Selecting from a Backed Table
Armed with these basic transforms we can already do a simple transform to make select statement work. Suppose CQL sees:
cursor C for select * from backed;
The compiler can make this select statement work with a simple transform:
cursor C FOR WITH
backed (*) AS (CALL _backed())
SELECT *
FROM backed;
Now remember _backed
was the automatically created shared fragment.
Basically, when the compiler encounters a select statement that mentions
any backed table it adds a call to the corresponding shared fragment in
the WITH
clause, creating a `WITH`` clause if needed. This effectively
creates necessary “view”. And, because we’re using the shared fragment
form, all users of this fragment will share the text of the view.
So there is no schema and the “view” text of the backed table appears
only once in the binary. More precisely we get this after full expansion:
WITH
backed (rowid, id, name, bias) AS (
SELECT
rowid,
bgetkey(T.k, 0), -- 0 is offset of backed.id in key blob
bgetval(T.v, -6639502068221071091L), -- note hash of backed.name
bgetval(T.v, -3826945563932272602L) -- note hash of backed.bias
FROM backing AS T
WHERE bgetkey_type(T.k) = 2105552408096159860L)
SELECT rowid, id, name, bias
FROM backed;
Now with this in mind we can see that it would be very beneficial to also add this:
[[deterministic]]
declare select function bgetkey_type(b blob) long;
CREATE INDEX backing_index ON backing(bgetkey_type(k));
or more cleanly:
CREATE INDEX backing_index ON backing(cql_blob_get_type(backing, k));
Either of these results in a computed index on the row type stored in the blob. Other physical indices might be helpful too and these can potentially be shared by many backed tables, or used in partial indicies.
NOTE: Your type function can be named something other than the default
bgetkey_type
.
Now consider a slightly more complex example:
select T1.* from backed T1 join backed2 T2 where T1.id = T2.id;
This becomes:
WITH
backed (rowid, id, name, bias) AS (CALL _backed()),
backed2 (rowid, id, name) AS (CALL _backed2())
SELECT T1.*
FROM backed AS T1
INNER JOIN backed2 AS T2
WHERE T1.id = T2.id;
Now even though two different backed tables will be using the backing
store, the original select “just works” once the CTE’s have been added.
All the compiler had to do was add both backed table fragments. Even if
backed
was joined against itself, that would also just work.
Inserting Into a Backed Table
It will be useful to consider a simple example such as:
insert into backed values (1, "n001", 1.2), (2, "n002", 3.7);
This statement has to insert into the backing storage, converting the various values into key and value blobs. The compiler uses a simple transform to do this job as well. The above becomes:
WITH
_vals (id, name, bias) AS (
VALUES(1, "n001", 1.2), (2, "n002", 3.7)
)
INSERT INTO backing(k, v) SELECT
cql_blob_create(backed, V.id, backed.id),
cql_blob_create(backed,
V.name, backed.name,
V.bias, backed.bias)
FROM _vals AS V;
Again the compiler is opting for a transform that is universal and here the issue is that the data to be inserted can be arbitrarily complicated. It might include nested select expressions, value computations, or any similar thing. In this particular case the data is literal values but in general the values could be anything.
To accompodate this possiblity the compiler’s transform takes the original
values and puts them in its own CTE _vals
. It then generates a new
insert statement targetting the backing store by converting _vals into
two blobs – one for the key and one for the value. There is only the one
place it needs to do this for any given insert
statement no matter now
many items are inserted or how complex the insertion computation might be.
The compiler uses cql_blob_create
as a wrapper to that can expand
to a user configured function with optional hash codes and mandatory
field types. The default configuration that corresponds to this:
[[create_key = bcreatekey]]
[[create_val = bcreateval]]
The final SQL for an insert operation looks like this:
WITH
_vals (id, name, bias) AS (
VALUES(1, "n001", 1.2), (2, "n002", 3.7)
)
INSERT INTO backing(k, v) SELECT
bcreatekey(2105552408096159860, V.id, 1), -- type 1 is integer, offset implied
bcreateval(2105552408096159860,
-6639502068221071091, V.name, 4, -- hash as before, type 4 is text,
-3826945563932272602, V.bias, 3) -- hash as before, type 3 is real,
FROM _vals AS V
As can be seen, both blobs have the same overall type code
(2105552408096159860) as in the select case. The key blob is configured
for to use offsets and the argument positions give the implied offset.
In contrast the value blob is using hash codes (offset
was not
specified). This configuration is typical.
A more complex insert works just as well:
insert into backed
select id+10, name||'x', bias+3 from backed where id < 3;
The above insert statement is a bit of a mess. It’s taking some of
the backed data and using that data to create new backed data. But the
simple transform above works just as before. We add the needed backed
CTE for the select and create _vals
like before.
WITH
backed (*) AS (CALL _backed()),
_vals (id, name, bias) AS (
SELECT id + 10, name || 'x', bias + 3
FROM backed
WHERE id < 3
)
INSERT INTO backing(k, v)
SELECT
cql_blob_create(backed, V.id, backed.id),
cql_blob_create(backed, V.name, backed.name, V.bias, backed.bias)
FROM _vals AS V;
Looking closely at the above we see a few things:
cql_blob_create
will expand as before (not shown)- we added
backed(*)
as usual _vals
once again just has the exact unchanged insert clause- the
insert into backing(k, v)
part is identical, the same recipe always works
Deleting From a Backed Table
Again, we begin with a simple example:
delete from backed where id = 7;
The compiler requires a transformation that is quite general and while this case is simple the where condition could be very complicated. Fortunately there is such a simple transform:
WITH
backed (*) AS (CALL _backed())
DELETE FROM backing
WHERE rowid IN (
SELECT rowid
FROM backed
WHERE id = 7
);
All the compiler has to do here is:
- add the usual
_backed
CTE - move the original
WHERE
clause into a subordinateSELECT
that gives us the rowids to delete.
With the backed table in scope, any WHERE
clause works. If other backed
tables are mentioned, the compiler simply adds those as usual.
Below is a more complicated delete, it’s a bit crazy but illustrative:
delete from backed where
id in (select id from backed2 where name like '%x%');
So this is using rows in backed2
to decide which rows to deleted in
backed
. The same simple transform works directly.
WITH
backed2 (*) AS (CALL _backed2()),
backed (*) AS (CALL _backed())
DELETE FROM backing
WHERE rowid IN (
SELECT rowid
FROM backed
WHERE id IN (
SELECT id FROM backed2 WHERE name LIKE '%x%'
)
);
What happened:
- the
WHERE
clause went directly into the body of the rowid select backed
was used as before but now we also needbacked2
The delete pattern does not need any additional cql helpers beyond what we’ve already seen.
Updating Backed Tables
The update
statement is the most complicated of all the DML forms, it requires
all the transforms from the previous statements plus one additional transform.
First, the compiler requires two more blob helpers that are configurable. By default they look like this:
[[update_key = bupdatekey]]
[[update_val = bupdateval]];
These are used to replace particular columns in a stored blob. Now let’s start with a very simple update to see now it all works:
update backed set name = 'foo' where id = 5;
Fundamentally we need to do these things:
- the target of the update has to end up being the backing table
- we need the backed table CTE so we can do the filtering
- we want to use the rowid trick to figure out which rows to update which
handles our
where
clause - we need to modify the existing key and/or value blobs rather than create them from scratch
Applying all of the above, we get a transform like the following:
WITH
backed (*) AS (CALL _backed())
UPDATE backing
SET v = cql_blob_update(v, 'foo', backed.name)
WHERE rowid IN (SELECT rowid
FROM backed
WHERE id = 5);
Looking into the details:
- we needed the normal CTE so that we can use
backed
rows - the
WHERE
clause moved into aWHERE rowid
sub-select just like in theDELETE
case - they compiler changed the SET targets to be
k
andv
very much like theINSERT
case, except we used an update helper that takes the current blob and creates a new blob to store- the helper is varargs so as we’ll see it can mutate many columns in one call
The above gives a update statement that is almost working. The remaining
problem is that it is possible to use the existing column values in the update
expressions and there is no way to use our backed
CTE to get those values in
that context since the final update has to be all relative to the backing table.
Let’s look at another example to illustrate this last wrinkle:
update backed set name = name || 'y' where bias < 5;
This update basically adds the letter ‘y’ to the name of some rows. This is a
silly example but this kind of thing happens in many contexts that are
definitely not silly. To make these cases work the reference to name
inside
of the set expression has to change. We end up with something like this:
WITH
backed (*) AS (CALL _backed())
UPDATE backing
SET v = cql_blob_update(v,
cql_blob_get(v, backed.name) || 'y',
backed.name)
WHERE rowid IN (SELECT rowid
FROM backed
WHERE bias < 5);
Importantly the reference to name
in the set expression was changed to
cql_blob_get(v, backed.name)
– extracting the name from the value blob. After
which it is appended with ‘y’ as usual.
The rest of the pattern is just as it was after the first attempt above, in fact
literally everything else is unchanged. It’s easy to see that the WHERE
clause could be arbitrarily complex with no difficulty.
NOTE: Since the
UPDATE
statement has noFROM
clause only the fields in the target table might need to be rewritten, so in this casename
,id
, andbias
were possible but onlyname
was mentioned.
After the cql_blob_get
and cql_blob_update
are expanded the result looks
like this:
WITH
backed (rowid, id, name, bias) AS (
SELECT
rowid,
bgetkey(T.k, 0),
bgetval(T.v, -6639502068221071091L),
bgetval(T.v, -3826945563932272602L)
FROM backing AS T
WHERE bgetkey_type(T.k) = 2105552408096159860L
)
UPDATE backing
SET v =
bupdateval(
v,
-6639502068221071091L, bgetval(v, -6639502068221071091L) || 'y', 4
)
WHERE rowid IN (SELECT rowid
FROM backed
WHERE bias < 5);
NOTE: The blob update function for the value blob requires the original blob, the hash or offset to update, the new value, and the type of the new value. The blob update function for the key blob is the nearly same (blob, hash/offset, value) but the type is not required since the key blob necessarily has all the fields present because they are necessarily not null. Therefore the type codes are already all present and so the type of every column is known. The value blob might be missing nullable values hence their type might not be stored/known.
Normally backed tables are used without having to know the details of the transforms and the particulars of how each of the helper UDFs is invoked but for illustration purposes we can make another small example that shows a few more variations that might be created. In this examples keys and values need to be mutated.
[[backed_by=backing]]
create table sample(
name text,
state long,
prev_state long,
primary key(name, state)
);
This update mixes all kinds of values around…
update sample
set state = state + 1, prev_state = state
where name = 'foo';
And the final output will be:
WITH
sample (rowid, name, state, prev_state) AS (
SELECT
rowid,
bgetkey(T.k, 0),
bgetkey(T.k, 1),
bgetval(T.v, -4464241499905806900)
FROM backing AS T
WHERE bgetkey_type(T.k) = 3397981749045545394
)
SET
k = bupdatekey(k, bgetkey(k, 1) + 1, 1),
v = bupdateval(v, -4464241499905806900, bgetkey(k, 1), 2)
WHERE rowid IN (SELECT rowid
FROM sample
WHERE name = 'foo');
As expected the bupdatekey
call gets the column offset (1) but not
the type code (2). bupdateval
gets a hash code and a type.
Normal Helper Declarations
If you want to refer to your blob functions in your own code, such as for indices you’ll also need to do something like this:
[[deterministic]] declare select function bgetkey_type(b blob) long;
[[deterministic]] declare select function bgetval_type(b blob) long;
[[deterministic]] declare select function bgetkey(b blob, iarg integer) long; -- polymorphic
[[deterministic]] declare select function bgetval(b blob, iarg integer) long; -- polymorphic
[[deterministic]] declare select function bcreateval no check blob;
[[deterministic]] declare select function bcreatekey no check blob;
[[deterministic]] declare select function bupdateval no check blob;
[[deterministic]] declare select function bupdatekey no check blob;
bgetval
and bgetkey
are not readily declarable generally because their
result is polymorphic so it’s preferable to use cql_blob_get
like the compiler
does (see examples above) which then does the rewrite for you. But it is helpful
to have a UDF declaration for each of the above, especially if you want the
--rt query_plan
output to work seamlessly. Typically bgetval
or bgetval
would only be needed in the context of a create index
statement and
cql_blob_get
can be used instead in that case.
NOTE: it’s possible to rewrite
CREATE INDEX
on a backed table into and index on the backing table but this has not yet been implemented.
Backed Table Attributes Summary
This example has the whole set.
[[backing_table]] -- this makes it a backing table
[[get_type = bgetkey_type]] -- the function to get the type out of the key blob
[[get_key = bgetkey]] -- the function to get a column out of the key blob
[[get_val = bgetval]] -- the function to get a column out of the value blob
[[create_key = bcreatekey]] -- the function to create a key blob
[[create_val = bcreateval]] -- the function to create a value blob
[[update_key = bupdatekey]] -- the function to update a key blob
[[update_val = bupdateval]] -- the function to update a value blob
[[use_key_codes]] -- if present key functions use a column code, by default key funcs use column offsets
[[use_val_offsets]] -- if present value functions use a column offset, by default value funcs use column codes
[[json]] -- ignores the above and targets json_* instead (don't combine it)
[[jsonb]] -- ignores the above and targets jsonb_* instead (don't combine it)
create table foo(
k blob primary key,
v blob
);