Chapter 14: Shared Fragments

Chapter 14: Shared Fragments

Shared fragments allows you to reuse and compose SQL queires in a safe (type checked) and efficient manner. They are based on Common Table Expressions (CTEs) , so some basic knowledge of that is recommended before using Shared Fragments.

You can think of shared fragments as being somewhat like a parameterized view, but the parameters are both value parameters and type parameters. In Java or C#, a shared fragments might have had an invocation that looked something like this: my_fragment(1,2)<table1, table2>.

It’s helpful to consider a real example:

split_text(tok) AS (
  WITH RECURSIVE
    splitter(tok,rest) AS (
      SELECT
        '' tok,
        IFNULL( some_variable_ || ',', '') rest
      UNION ALL
      SELECT
        substr(rest, 1, instr(rest, ',') - 1) tok,
        substr(rest, instr(rest, ',') + 1) rest
        FROM splitter
        WHERE rest != ''
  )
  SELECT tok from splitter where tok != '';
)

This text might appear in dozens of places where a comma separated list needs to be split into pieces and there is no good way to share the code between these locations. CQL is frequently used in conjunction with the C-pre-processor so you could come up with something using the #define construct but this is problematic for several reasons:

  • the compiler does not then know that the origin of the text really is the same
    • thus it has no clue that sharing the text of the string might be a good idea
  • any error messages happen in the context of the use of the macro not the definition
  • bonus: a multi-line macro like the above gets folded into one line so any error messages are impenetrable
  • if you try to compose such macros it only gets worse; it’s more code duplication and harder error cases
  • any IDE support for syntax coloring and so forth will be confused by the macro as it’s not part of the language

None of this is any good but the desire to create helpers like this is real both for correctness and for performance.

To make these things possible, we introduce the notion of shared fragments. We need to give them parameters and the natural way to create a select statement that is bindable in CQL is the procedure. So the shape we choose looks like this:

[[shared_fragment]]
PROC split_text(value TEXT)
BEGIN
  WITH RECURSIVE
    splitter(tok,rest) AS (
      SELECT
        '' tok,
        IFNULL( value || ',', '') rest
      UNION ALL
      SELECT
        substr(rest, 1, instr(rest, ',') - 1) tok,
        substr(rest, instr(rest, ',') + 1) rest
        FROM tokens
        WHERE rest != ''
  )
  SELECT tok from splitter where tok != '';
END;

The introductory attribute [[shared_fragment]] indicates that the procedure is to produce no code, but rather will be inlined as a CTE in other locations. To use it, we introduce the ability to call a procedure as part of a CTE declaration. Like so:

WITH
  result(v) as (call split_text('x,y,z'))
  select * from result;

Once the fragment has been defined, the statement above could appear anywhere, and of course the text 'x,y,z' need not be constant. For instance:

PROC print_parts(value TEXT)
BEGIN
  DECLARE C CURSOR FOR
    WITH
      result(v) as (CALL split_text('x,y,z'))
      SELECT * from result;

  LOOP FETCH C
  BEGIN
     CALL printf("%s\n", C.v);
  END;
END;

Fragments are also composable, so for instance, we might also want some shared code that extracts comma separated numbers. We could do this:

[[shared_fragment]]
PROC ids_from_string(value TEXT)
BEGIN
  WITH
    result(v) as (CALL split_text(value))
  SELECT CAST(v as LONG) as id from result;
END;

Now we could write:

PROC print_ids(value TEXT)
BEGIN
  DECLARE C CURSOR FOR
    WITH
      result(id) as (CALL ids_from_string('1,2,3'))
      SELECT * from result;

  LOOP FETCH C
  BEGIN
     CALL printf("%ld\n", C.id);
  END;
END;

Of course these are very simple examples but in principle you can use the generated tables in whatever way is necessary. For instance, here’s a silly but illustrative example:

/* This is a bit silly */
PROC print_common_ids(value TEXT)
BEGIN
  DECLARE C CURSOR FOR
    WITH
      v1(id) as (CALL ids_from_string('1,2,3')),
      v2(id) as (CALL ids_from_string('2,4,6'))
      SELECT * from v1
      INTERSECT
      SELECT * from v2;

  LOOP FETCH C
  BEGIN
     CALL printf("%ld\n", C.id);
  END;
END;

With a small amount of dynamism in the generation of the SQL for the above, it’s possible to share the body of v1 and v2. SQL will of course see the full expansion but your program only needs one copy no matter how many times you use the fragment anywhere in the code.

So far we have illustrated the “parameter” part of the flexibility. Now let’s look at the “generics” part; even though it’s overkill for this example, it should still be illustrative. You could imagine that the procedure we wrote above ids_from_string might do something more complicated, maybe filtering out negative ids, ids that are too big, or that don’t match some pattern, whatever the case might be. You might want these features in a variety of contexts, maybe not just starting from a string to split.

We can rewrite the fragment in a “generic” way like so:

[[shared_fragment]]
PROC ids_from_string_table()
BEGIN
  WITH
    source(v) LIKE (select "x" v)
  SELECT CAST(v as LONG) as id from source;
END;

Note the new construct for a CTE definition: inside a fragment we can use “LIKE” to define a plug-able CTE. In this case we used a select statement to describe the shape the fragment requires. We could also have used a name source(*) LIKE shape_name just like we use shape names when describing cursors. The name can be any existing view, table, a procedure with a result, etc. Any name that describes a shape.

Now when the fragment is invoked, you provide the actual data source (some table, view, or CTE) and that parameter takes the role of “values”. Here’s a full example:

PROC print_ids(value TEXT)
BEGIN
  DECLARE C CURSOR FOR
    WITH
      my_data(*) as (CALL split_text(value)),
      my_numbers(id) as (CALL ids_from_string_table() USING my_data AS source)
      SELECT id from my_numbers;

  LOOP FETCH C
  BEGIN
     CALL printf("%ld\n", C.id);
  END;
END;

We could actually rewrite the previous simple id fragment as follows:

[[shared_fragment]]
PROC ids_from_string(value TEXT)
BEGIN
  WITH
    tokens(v) as (CALL split_text(value))
    ids(id) as (CALL ids_from_string_table() USING tokens as source)
  SELECT * from ids;
END;

And actually we have a convenient name we could use for the shape we need so we could have used the shape syntax to define ids_from_string_table.

[[shared_fragment]]
PROC ids_from_string_table()
BEGIN
  WITH
    source(*) LIKE split_text
  SELECT CAST(tok as LONG) as id from source;
END;

These examples have made very little use of the database but of course normal data is readily available, so shared fragments can make a great way to provide access to complex data with shareable, correct code. For instance, you could write a fragment that provides the ids of all open businesses matching a name from a combination of tables. This is similar to what you could do with a VIEW plus a WHERE clause but:

  • such a system can give you well controlled combinations known to work well
  • there is no schema required, so your database load time can still be fast
  • parameterization is not limited to filtering VIEWs after the fact
  • “generic” patterns are available, allowing arbitrary data sources to be filtered, validated, augmented
  • each fragment can be tested separately with its own suite rather than only in the context of some larger thing
  • code generation can be more economical because the compiler is aware of what is being shared

In short, shared fragments can help with the composition of any complicated kinds of queries. If you’re producing an SDK to access a data set, they are indispensible.

Creating and Using Valid Shared Fragments

When creating a fragment the following rules are enforced:

  • the fragment many not have any out arguments
  • it must consist of exactly one valid select statement (but see future forms below)
  • it may use the LIKE construct in CTE definitions to create placeholder shapes
    • this form is illegal outside of shared fragments (otherwise how would you bind it)
  • the LIKE form may only appear in top level CTE expressions in the fragment
  • the fragment is free to use other fragments, but it may not call itself
    • calling itself would result in infinite inlining

Usage of a fragment is always introduced by a “call” to the fragment name in a CTE body. When using a fragment the following rules are enforced.

  • the provided parameters must create a valid procedure call just like normal procedure calls
    • i.e. the correct number and type of arguments
  • the provided parameters may not use nested (SELECT ...) expressions
    • this could easily create fragment building within fragment building which seems not worth the complexity
    • if database access is required in the parameters simply wrap it in a helper procedure
  • the optional USING clause must specify each required table parameter exactly once and no other tables
    • a fragment that requires table parameters be invoked without a USING clause
  • every actual table provided must match the column names of the corresponding table parameter
    • i.e. in USING my_data AS values the actual columns in my_data must be the same as in the values parameter
    • the columns need not be in the same order
  • each column in any actual table must be “assignment compatible” with its corresponding column in the parameters
    • i.e. the actual type could be converted to the formal type using the same rules as the := operator
    • these are the same rules used for procedure calls, for instance, where the call is kind of like assigning the actual parameter values to the formal parameter variables
  • the provided table values must not conflict with top level CTEs in the shared fragment
    • exception: the top level CTEs that were parameters do not create conflicts
    • e.g. it’s common to do values(*) as (CALL something() using source as source) - here the caller’s “source” takes the value of the fragment’s “source”, which is not a true conflict
    • however, the caller’s source might itself have been a parameter in which case the value provided could create an inner conflict
      • all these problems are easily avoided with a simple naming convention for parameters so that real arguments never look like parameter names and parameter forwarding is apparent
      • e.g. USING _source AS _source makes it clear that a parameter is being forwarded and _source is not likely to conflict with real table or view names

Note that when shared fragments are used, the generated SQL has the text split into parts, with each fragment and its surroundings separated, therefore the text of shared fragments is shared(!) between usages if normal linker optimizations for text folding are enabled (common in production code.)

Shared Fragments with Conditionals

Shared fragments use dynamic assembly of the text to do the sharing but it is also possible to create alternative texts. There are many instances where it is desirable to not just replace parameters but use, for instance, an entirely different join sequence. Without shared fragments, the only way to accomplish this is to fork the desired query at the topmost level (because SQLite has no internal possibility of “IF” conditions.) This is expensive in terms of code size and also cognitive load because the entire alternative sequences have to be kept carefully in sync. Macros can help with this but then you get the usual macro maintenance problems, including poor diagnostics. And of course there is no possibility to share the common parts of the text of the code if it is forked.

However, conditional shared fragments allow forms like this:

[[shared_fragment]]
PROC ids_from_string(val TEXT)
BEGIN
  IF val IS NULL OR val IS '' THEN
    SELECT 0 id WHERE 0; -- empty result
  ELSE
    WITH
      tokens(v) as (CALL split_text(val))
      ids(id) as (CALL ids_from_string_table() USING tokens as source)
    SELECT * from ids;
  END IF;
END;

NOTE: This appendix uses the more modern form [[foo]] instead of the equivalent @attribute(cql:foo)

Now we can do something like:

  ids(*) AS (CALL ids_from_string(str))

In this case, if the string val is empty then SQLite will not see the complex comma splitting code, and instead will see the trivial case select 0 id where 0. The code in a conditional fragment might be entirely different between the branches removing unnecessary code, or swapping in a new experimental cache in your test environment, or anything like that.

Conditionals without ELSE clauses

When a condiitional is specified without an else clause, the fragment would return a result with no rows if none of the specified conditionals are truthy.

For example:

[[shared_fragment]]
PROC maybe_empty(cond BOOL NOT NULL)
BEGIN
  IF cond THEN
    SELECT 1 a, 2 b, 3 c;
  END IF;
END;

Internally, this is actually equivalent to the following:

[[shared_fragment]]
PROC maybe_empty(cond BOOL NOT NULL)
BEGIN
  IF cond THEN
    SELECT 1 a, 2 b, 3 c;
  ELSE
    SELECT NOTHING;
  END IF;
END;

The SELECT NOTHING expands to the a query that returns no rows, like this:

SELECT 0,0,0 WHERE 0; -- number of columns match the query returned by the main conditional.

Summary

The generalization is simply this:

  • instead of just one select statement there is one top level “IF” statement
  • each statement list of the IF must be exactly one select statement
  • the select statements must be type compatible, just like in a normal procedure
  • any table parameters with the same name in different branches must have the same type
    • otherwise it would be impossible to provide a single actual table for those table parameters

With this additional flexibility a wide variety of SQL statements can be constructed economically and maintainability. Importantly, consumers of the fragments need not deal with all these various alternate possibilities but they can readily create their own useful combinations out of building blocks.

Ultimately, from SQLite’s perspective, all of these shared fragment forms result in nothing more complicated than a chain of CTE expressions.

See Appendix 8 for an extensive section on best practices around fragments and common table expressions in general.

TIP: If you use CQL’s query planner on shared fragments with conditionals, the query planner will only analyze the first branch by default. You need to use [[query_plan_branch={an_integer}]] to modify the behavior. Read Query Plan Generation for details.

Shared Fragments as Expressions

The shared fragment system also has the ability to create re-usable expression-style fragments giving you something like SQL inline functions. These do come with some performance cost so they should be used for larger fragments. In many systems a simple shared fragment would not compete well with an equivalent #define. Expression fragments shine when:

  • the fragment is quite large
  • its used frequently (hence providing significant space savings)
  • the arguments are complex, potentially used many times in the expression

From a raw performance perspective, the best you can hope for with any of the fragment approaches is a “tie” on speed compared do directly inlining equivalent SQL or using a macro to do the same. However, from a correctness and space perspective it is very possible to come out ahead. It’s fair to say that expression fragments have the greatest overhead compared to the other types and so they are best used in cases where the size benefits are going to be important.

Syntax

An expression fragment is basically a normal shared fragment with no top-level FROM clause that generates a single column. A typical one might look like this:

-- this isn't very exciting because regular max would do the job
[[shared_fragment]]
proc max_func(x integer, y integer)
begin
  select case when x >= y then x else y end;
end;

The above can be used in the context of a SQL statement like so:

select max_func(T1.column1, T1.column2) the_max from foo T1;

Discussion

The consequence of the above is that the body of max_func is inlined into the generated SQL. However, like the other shared fragments, this is done in such a way that the text can be shared between instances so you only pay for the cost of the text of the SQL in your program one time, no matter how many time you use it.

In particular, for the above, the compiler will generate the following SQL:

select (
  select case when x >= y then x else y end
    from (select T1.column1 x, column2 y))

But each line will be its own string literal, so, more accurately, it will concatenate the following three strings:

"select (",                                      // string1
" select case when x >= y then x else y end",    // string2
" from (select T1.column1 x, column2 y))"        // string3

Importantly, string2 is fixed for any given fragment. The only thing that changes is string3, i.e., the arguments. In any modern C compilation system, the linker will unify the string2 literal across all translation units so you only pay for the cost of that text one time. It also means that the text of the arguments appears exactly one time, no matter how complex they are. For these benefits, we pay the cost of the select wrapper on the arguments. If the arguments are complex that “cost” is negative. Consider the following:

select max_func((select max(T.m) from T), (select max(U.m) from U))

A direct expansion of the above would result in something like this:

case when (select max(T.m) from T) >= (select max(U.m) from U)
   then (select max(T.m) from T)
   else (select max(U.m) from U)
end;

The above could be accomplished with a simple #define style macro. However, the expression fragment generates the following code:

select (
  select case when x >= y then x else y end
    from select (select max(T.m) from T) x, (select max(U.m) from U) y))

Expression fragments can nest, so you could write:

[[shared_fragment]]
proc max3_func(x integer, y integer, z integer)
begin
  select max_func(x, max_func(y, z));
end;

Again, this particular example is a waste because regular max would already do the job better.

To give another example, common mappings from one kind of code to another using case/when can be written and shared this way:

-- this sort of thing happens all the time
[[shared_fragment]]
proc remap(x integer not null)
begin
   select case x
     when 1 then 1001
     when 2 then 1057
     when 3 then 2010
     when 4 then 2011
     else 9999
   end;
end;

In the following:

select remap(T1.c), remap(T2.d), remap(T3.e) from T1, T2, T3... etc.

The text for remap will appear three times in the generated SQL query but only one time in your binary.

Restrictions

  • the fragment must consist of exactly one simple select statement
    • no FROM, WHERE, HAVING, etc. – the result is an expression
  • the select list must have exactly one value
    • Note the expression can be a nested SELECT which could then have all the usual SELECT elements
  • the usual shared fragment rules apply, e.g. no out-parameters, exactly one statement, etc.

Additional Notes

A simpler syntax might have been possible but expression fragments are only interesting in SQL contexts where (among other things) normal procedure and function calls are not available. So the select keyword makes it clear to the coder and the compiler that the expression will be evaluated by SQLite and the rules for what is allowed to go in the expression are the SQLite rules.

The fragment has no FROM clause because we’re trying to produce an expression, not a table-value with one column. If you want a table-value with one column, the original shared fragments solution already do exactly that. Expression fragments give you a solution for sharing code in, say, the WHERE clause of a larger select statement.

Commpared to something like

#define max_func(x,y) case when (x) >= (y) then x else y end;

The macro does give you a ton of flexibility, but it has many problems:

  • if the macro has an error, you see the error in the call site with really bad diagnostic info
  • the compiler doesn’t know that the sharing is going on so it won’t be able to share text between call sites
  • the arguments can be evaluated many times each which could be expensive, bloaty, or wrong
  • there is no type-checking of arguments to the macro so you may or may not get compilation errors after expansion
  • you have to deal with all the usual pre-processor hazards

In general, macros can be used (as in C and C++) as an alternative to expression fragments, especially for small fragments. But, this gets to be a worse idea as such macros grow. For larger cases, C and C++ provide inline functions – CQL provides expression fragments.