Chapter 12: Testability Features

CQL includes a number of features to make it easier to create what you might call “Test” procedures. These are primarily concerned with loading the database with dummy data and/or validating the results of normal procedures that query the database. There are several interesting language features in these areas.

Dummy Data

Test code can be needlessly brittle, especially when creating dummy data; any column changes typically cause all sorts of data insertion code to need to be repaired. In many cases the actual data values are completely uninteresting to the test – any values would do. There are several strategies you can use to get good dummy data into your database in a more maintainable way.

Simple Inserts With Dummy Data

The simplest form uses a variant of the insert statement that fills in any missing columns with a seed value. An example might be something like this:

create proc dummy_user()
begin
  insert into users () values () @dummy_seed(123)
     @dummy_nullables @dummy_defaults;
end;

This statement causes all columns, including those that are nullable or have a default value, to get the value 123 for any numeric type and 'column_name_123' (the column name with the seed appended) for any text.

If you omit the @dummy_nullables then any nullable fields will be null as usual. And likewise if you omit @dummy_defaults then any fields with a default value will use that value as usual. You might want any combination of these for your tests (null values are handy in your tests and default behavior is also handy.)

The @dummy_seed expression provided can be anything that resolves to a non-null integer value, so it can be pretty flexible. You might use a while loop to insert a bunch of rows with the seed value being computed from the while loop variable.
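
For instance, here is a minimal sketch of that loop pattern, assuming the same users table as above:

create proc make_dummy_users(n integer not null)
begin
  declare i integer not null;
  set i := 0;
  while i < n
  begin
    -- each iteration uses a different seed, so every row gets distinct values
    insert into users () values () @dummy_seed(i)
       @dummy_nullables @dummy_defaults;
    set i := i + 1;
  end;
end;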

The form above is sort of like insert * into table in that it provides dummy values for all columns, but you can also specify some of the columns explicitly while using the seed value for the others. Importantly, you can specify values you particularly want to control, either to create a more tailored test or because you need them to match existing or created rows in a table referenced by a foreign key.

As an example:

insert into users (id) values (1234) @dummy_seed(123)
   @dummy_nullables @dummy_defaults;

will provide dummy values for everything but the id column.

Using WITH RECURSIVE

Sometimes what you want to do is create a dummy result set without necessarily populating the database at all. If you have code that consumes a result set of a particular shape, it’s easy enough to create a fake result set with a pattern something like this:

create procedure dummy_stuff(lim integer not null)
begin
  WITH RECURSIVE
  dummy(x) AS (
     SELECT 1
     UNION ALL
     SELECT x+1 FROM dummy WHERE x < lim)
  SELECT
    x id,
    printf("name_%d", x) name,
    cast(x % 2 as bool) is_cool,
    x * 1.3 as rate,
    x etc1,
    x etc2
  FROM dummy;
end;

The first part of the above generates the numbers 1 through lim; the second part uses those values to create the dummy columns. Any result shape can be generated in this fashion.

You get data like this from the above:

1|name_1|1|1.3|1|1
2|name_2|0|2.6|2|2
3|name_3|1|3.9|3|3
4|name_4|0|5.2|4|4
5|name_5|1|6.5|5|5
6|name_6|0|7.8|6|6
7|name_7|1|9.1|7|7
8|name_8|0|10.4|8|8
9|name_9|1|11.7|9|9
10|name_10|0|13.0|10|10

The result of the select statement is itself quite flexible, and if more dummy data is what you want, this form can be combined with INSERT ... FROM SELECT ... to create dummy data in real tables. And of course once you have a core query you could use it in a variety of ways, combined with cursors or any other strategy to select out pieces and insert them into various tables.
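
As a sketch of that combination, assuming a real table users(id, name) with compatible columns:

create proc load_dummy_users(lim integer not null)
begin
  with recursive
  dummy(x) as (
     select 1
     union all
     select x + 1 from dummy where x < lim)
  insert into users (id, name)
    select x, printf('name_%d', x) from dummy;
end;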

Using Temporary Tables

If you need an API to create very flexible dummy data with values of your choice you can use temporary tables and a series of helper procedures.

First, create a table to hold the results. You can of course make this table however you need to, but the like construct in the table creation is especially helpful; it creates columns in the table that match the name and type of the named object. For instance, like my_proc is shorthand for the column names and types of the shape that my_proc returns. This is perfect for emulating the results of my_proc.

create proc begin_dummy()
begin
   drop table if exists my_dummy_data;

   -- the shape of my_dummy_data matches the columns
   -- returned by proc_I_want_to_emulate
   create temp table my_dummy_data(
     like proc_I_want_to_emulate;
   );
end;

Next, you will need a procedure that accepts and writes a single row to your temp table. You could write this out explicitly, but the testing support features make it easier: in this example, the arguments of the procedure exactly match the output of the procedure we are emulating, one argument for each column the proc returns. The insert statement gets its values from the arguments.

create proc add_dummy(like proc_I_want_to_emulate)
begin
   insert into my_dummy_data from arguments;
end;

This allows you to create the necessary helper methods automatically even if the procedure changes over time.

Next we need a procedure to get our result set.

create proc get_dummy()
begin
  select * from my_dummy_data;
end;

And finally, some cleanup.

create proc cleanup_dummy()
begin
   drop table if exists my_dummy_data;
end;
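
Putting the pieces together, a test might drive these helpers as in the sketch below, where proc_I_want_to_emulate is assumed, purely for illustration, to return (id integer not null, name text):

-- assumed shape, for illustration only
declare proc proc_I_want_to_emulate() (id integer not null, name text);

create proc setup_dummy_rows()
begin
   call begin_dummy();
   call add_dummy(1, 'fred');
   call add_dummy(2, 'barney');
   -- the code under test then reads its rows from get_dummy()
end;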

Again, the temp table could be combined with INSERT INTO ... FROM SELECT ... to create dummy data in real tables.

Other Considerations

Wrapping your insert statements in try/catch can be very useful if there may be dummy data conflicts. In test code, searching for a new suitable seed is pretty easy. Alternatively, a seed computed from the existing data, such as

set seed := 1 + (select max(id) from foo);

could be very useful. Many other alternatives are possible.
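
Here is a sketch of the try/catch approach, assuming a table foo with a numeric id column as in the seed example above:

create proc insert_dummy_foo(seed_ integer not null)
begin
  begin try
    insert into foo () values () @dummy_seed(seed_);
  end try;
  begin catch
    -- on a conflict, move the seed past any existing ids and retry once
    set seed_ := 1 + (select ifnull(max(id), 0) from foo);
    insert into foo () values () @dummy_seed(seed_);
  end catch;
end;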

The dummy data features are not suitable for use in production code, only tests. But the LIKE features are generally useful for creating contract-like behavior in procs and there are reasonable uses for them in production code.

Complex Result Set Example

Here’s a more complicated example that can be easily rewritten using the sugar features. This method is designed to return a single-row result set that can be used to mock a method. I’ve replaced the real fields with 'f1', 'f2', etc.

CREATE PROCEDURE test_my_subject(
  f1_ LONG INTEGER NOT NULL,
  f2_ TEXT NOT NULL,
  f3_ INTEGER NOT NULL,
  f4_ LONG INTEGER NOT NULL,
  f5_ TEXT,
  f6_ TEXT,
  f7_ TEXT,
  f8_ BOOL NOT NULL,
  f9_ TEXT,
  f10_ TEXT
)
BEGIN
  DECLARE data_cursor CURSOR LIKE my_subject;
  FETCH data_cursor
        FROM VALUES (f1_, f2_, f3_, f4_, f5_, f6_, f7_, f8_, f9_, f10_);
  OUT data_cursor;
END;

This can be written much more maintainably as:

CREATE PROCEDURE test_my_subject(like my_subject)
BEGIN
  DECLARE C CURSOR LIKE my_subject;
  FETCH C FROM ARGUMENTS;
  OUT C;
END;

Naturally, real columns have much longer names and there are often many more than 10.

Autotest Attributes

Some of the patterns described above are so common that CQL offers a mechanism to automatically generate those test procedures.

Temporary Table Pattern

The attributes dummy_table, dummy_insert, and dummy_select can be used together to create and populate temp tables.

Example:

To create a dummy row set for sample_proc, add the [[autotest]] attribute with dummy_table, dummy_insert, and dummy_select values.

create table foo(
  id integer not null,
  name text not null
);

[[autotest=(dummy_table, dummy_insert, dummy_select)]]
create proc sample_proc(foo int)
begin
  select * from foo;
end;

dummy_table generates procedures for creating and dropping a temp table with the same shape as sample_proc.

CREATE PROC open_sample_proc()
BEGIN
  CREATE TEMP TABLE test_sample_proc(LIKE sample_proc);
END;

CREATE PROC close_sample_proc()
BEGIN
  DROP TABLE test_sample_proc;
END;

The dummy_insert attribute generates a procedure for inserting into the temp table.

CREATE PROC insert_sample_proc(LIKE sample_proc)
BEGIN
  INSERT INTO test_sample_proc FROM ARGUMENTS;
END;

The dummy_select attribute generates procedures for selecting from the temp table.

CREATE PROC select_sample_proc()
BEGIN
  SELECT * FROM test_sample_proc;
END;

It’s interesting to note that the generated test code never needs to mention the exact columns it is emulating, because it can always use like, *, and from arguments in a generic way.

When compiled, the above will create C methods that can create, drop, insert, and select from the temp table. They will have the following signatures:

CQL_WARN_UNUSED cql_code open_sample_proc(
  sqlite3 *_Nonnull _db_);

CQL_WARN_UNUSED cql_code close_sample_proc(
  sqlite3 *_Nonnull _db_);

CQL_WARN_UNUSED cql_code insert_sample_proc(
  sqlite3 *_Nonnull _db_,
  cql_int32 id_,
  cql_string_ref _Nonnull name_);

CQL_WARN_UNUSED cql_code select_sample_proc_fetch_results(
  sqlite3 *_Nonnull _db_,
  select_sample_proc_result_set_ref _Nullable *_Nonnull result_set);

Single Row Result Set

In some cases, using four APIs to generate fake data can be verbose. In the case that only a single row of data needs to be faked, the dummy_result_set attribute can be more convenient.

Example:

[[autotest=(dummy_result_set)]]
create proc sample_proc()
begin
  select id from foo;
end;

This will generate the following procedure:

CREATE PROC generate_sample_proc_row(LIKE sample_proc)
BEGIN
  DECLARE curs CURSOR LIKE sample_proc;
  FETCH curs FROM ARGUMENTS;
  OUT curs;
END;

This in turn generates the following C API:

void generate_sample_proc_row_fetch_results(
    generate_sample_proc_row_result_set_ref _Nullable *_Nonnull result_set,
    cql_int32 id_);

These few test helpers are useful in a variety of scenarios and can save you a lot of typing and maintenance. They evolve automatically as the code changes, always matching the signature of the attributed procedure.
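
For example, a hypothetical CQL mock built on this helper might look like the following sketch (recall that sample_proc returns just an id here):

CREATE PROC mock_sample_proc()
BEGIN
  -- loads a single fake row shaped exactly like sample_proc
  DECLARE C CURSOR FETCH FROM CALL generate_sample_proc_row(42);
  OUT C;
END;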

Generalized Dummy Test Pattern

The most flexible test helper is the dummy_test form. This is far more advanced than the simple helpers above. While the choices above were designed to help you create fake result sets pretty easily, dummy_test goes much further, letting you set up arbitrary schema and data so that you can run your procedure on actual data. The dummy_test code generator uses the features above to do its job and, like the other autotest options, it works by automatically generating CQL code from your procedure definition. However, you get a lot more code in this mode.

It’s easiest to study an example so let’s begin there.

To understand dummy_test we’ll need a more complete example, so we start with this simple two-table schema with a trigger and some indices. To this we add a very small procedure that we might want to test.

create table foo(
 id integer not null primary key,
 name text
);

create table bar(
 id integer not null primary key references foo(id),
 data text
);

create index foo_index on foo(name);

create index bar_index on bar(data);

create temp trigger if not exists trigger1
  before delete on foo
begin
  delete from foo where name = 'this is so bogus';
end;

[[autotest=(
  dummy_table,
  dummy_insert,
  dummy_select,
  dummy_result_set,
  (dummy_test, (bar, (data), ('plugh')))
)]]
create proc the_subject()
begin
  select * from bar;
end;

As you can see, we have two tables, foo and bar; the foo table has a trigger; both foo and bar have indices. This schema is very simple, but of course it could be a lot more complicated, and real cases typically are.

The procedure we want to test is creatively called the_subject. It has lots of test attributes on it. We’ve already discussed dummy_table, dummy_insert, dummy_select, and dummy_result_set above, but as you can see they can be mixed in with dummy_test. Now let’s talk about dummy_test. First, you’ll notice that the annotation has additional sub-attributes; the attribute grammar is flexible enough that, in principle, you could represent an arbitrary LISP program with it, so the instructions can be very detailed. In this case, the attribute provides table and column names, as well as sample data. We’ll discuss that when we get to the population code.

First let’s dispense with the attributes we already discussed – since we had all the attributes, the output will include those helpers, too. Here they are again:

-- note that the code does not actually call the test subject
-- this declaration is used so that CQL will know the shape of the result
DECLARE PROC the_subject () (id INTEGER NOT NULL, data TEXT);

CREATE PROC open_the_subject()
BEGIN
  CREATE TEMP TABLE test_the_subject(LIKE the_subject);
END;

CREATE PROC close_the_subject()
BEGIN
  DROP TABLE test_the_subject;
END;

CREATE PROC insert_the_subject(LIKE the_subject)
BEGIN
  INSERT INTO test_the_subject FROM ARGUMENTS;
END;

CREATE PROC select_the_subject()
BEGIN
  SELECT * FROM test_the_subject;
END;

CREATE PROC generate_the_subject_row(LIKE the_subject)
BEGIN
  DECLARE curs CURSOR LIKE the_subject;
  FETCH curs FROM ARGUMENTS;
  OUT curs;
END;

That covers what we had before. So what’s new? Actually, quite a bit. We’ll begin with the easiest:

CREATE PROC test_the_subject_create_tables()
BEGIN
  CREATE TABLE IF NOT EXISTS foo(
    id INTEGER NOT NULL PRIMARY KEY,
    name TEXT
  );
  CREATE TABLE IF NOT EXISTS bar(
    id INTEGER NOT NULL PRIMARY KEY REFERENCES foo (id),
    data TEXT
  );
END;

Probably the most important of all the helpers, test_the_subject_create_tables will create all the tables you need to run the procedure. Note that in this case, even though the subject code only references bar, CQL determined that foo is also needed because of the foreign key.

The symmetric drop procedure is also generated:

CREATE PROC test_the_subject_drop_tables()
BEGIN
  DROP TABLE IF EXISTS bar;
  DROP TABLE IF EXISTS foo;
END;

Additionally, in this case there were triggers and indices, so helpers to create and drop those were generated as well:

CREATE PROC test_the_subject_create_indexes()
BEGIN
  CREATE INDEX bar_index ON bar (data);
  CREATE INDEX foo_index ON foo (name);
END;

CREATE PROC test_the_subject_create_triggers()
BEGIN
  CREATE TEMP TRIGGER IF NOT EXISTS trigger1
    BEFORE DELETE ON foo
  BEGIN
  DELETE FROM foo WHERE name = 'this is so bogus';
  END;
END;

CREATE PROC test_the_subject_drop_indexes()
BEGIN
  DROP INDEX IF EXISTS bar_index;
  DROP INDEX IF EXISTS foo_index;
END;

CREATE PROC test_the_subject_drop_triggers()
BEGIN
  DROP TRIGGER IF EXISTS trigger1;
END;

If there are no triggers or indices, the corresponding create/drop methods will not be generated.

With these helpers available, when writing test code you can choose whether you want to create just the tables, the tables and indices, or the tables, indices, and triggers, by invoking the appropriate combination of helper methods, as in the sketch below. Since all the implicated triggers and indices are automatically included, even if they change over time, maintenance is greatly simplified.
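
For instance, a full-schema setup helper for a test suite might be as simple as this sketch:

CREATE PROC the_subject_test_schema()
BEGIN
  CALL test_the_subject_create_tables();
  CALL test_the_subject_create_indexes();
  CALL test_the_subject_create_triggers();
END;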

Note that in this case the code simply reads from one of the tables, but in general the procedure under test might make modifications as well. Test code frequently has to read back the contents of the tables to verify that they were modified correctly. So these additional helper methods are also included:

CREATE PROC test_the_subject_read_foo()
BEGIN
 SELECT * FROM foo;
END;

CREATE PROC test_the_subject_read_bar()
BEGIN
 SELECT * FROM bar;
END;

These procedures will allow you to easily create result sets with data from the relevant tables which can then be verified for correctness. Of course if more tables were implicated, those would have been included as well.
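
A verification pass using one of the read helpers might look like this sketch:

CREATE PROC verify_bar_rows()
BEGIN
  DECLARE C CURSOR FOR CALL test_the_subject_read_bar();
  LOOP FETCH C
  BEGIN
    -- a real test would assert on the values; here we just echo the ids
    CALL printf("bar row: id %d\n", C.id);
  END;
END;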

As you can see, the naming always follows the convention test_[YOUR_PROCEDURE]_[helper_type].

Finally, the most complicated helper is the one driven by that large annotation. Recall that we provided the fragment (dummy_test, (bar, (data), ('plugh'))) to the compiler. This fragment produced this last helper function:

CREATE PROC test_the_subject_populate_tables()
BEGIN
  INSERT OR IGNORE INTO foo(id) VALUES(1) @dummy_seed(123);

  INSERT OR IGNORE INTO foo(id) VALUES(2) @dummy_seed(124)
      @dummy_nullables @dummy_defaults;

  INSERT OR IGNORE INTO bar(data, id) VALUES('plugh', 1) @dummy_seed(125);

  INSERT OR IGNORE INTO bar(id) VALUES(2) @dummy_seed(126)
     @dummy_nullables @dummy_defaults;
END;

In general, the populate_tables helper will fill all implicated tables with at least two rows of data. It uses the dummy data features discussed earlier to generate the items using a seed. Recall that if @dummy_seed is present in an insert statement, then any missing columns are generated using that value, either as a string or as an integer (or true/false for a boolean). Note that the second of the two generated rows also specifies @dummy_nullables and @dummy_defaults, which means that even nullable columns, and columns with a default value, will get the non-null seed value instead. So you get a mix of null, default, and explicit values loaded into your tables.

Of course blindly inserting data doesn’t quite work. As you can see, the insert code used the foreign key references in the schema to figure out the necessary insert order, and the primary key values for foo were automatically specified so that they could then be used again in bar.

Lastly, the autotest attribute included explicit test values for the table bar, and in particular the data column has the value 'plugh'. So the first row of data for table bar did not use dummy data for the data column but rather used 'plugh'.

In general, the dummy_test annotation can include any number of tables, and for each table you can specify any of the columns and you can have any number of tuples of values for those columns.

NOTE: if you include primary key and/or foreign key columns among the explicit values, it’s up to you to ensure that they are valid combinations. SQLite will complain as usual if they are not, but the CQL compiler will simply emit the data you asked for.

Generalizing the example a little bit, we could use the following:

(dummy_test, (foo, (name), ('fred'), ('barney'), ('wilma'), ('betty')),
             (bar, (id, data), (1, 'dino'), (2, 'hopparoo')))

to generate this population:

CREATE PROC test_the_subject_populate_tables()
BEGIN
  INSERT OR IGNORE INTO foo(name, id) VALUES('fred', 1) @dummy_seed(123);

  INSERT OR IGNORE INTO foo(name, id) VALUES('barney', 2) @dummy_seed(124)
    @dummy_nullables @dummy_defaults;

  INSERT OR IGNORE INTO foo(name, id) VALUES('wilma', 3) @dummy_seed(125);

  INSERT OR IGNORE INTO foo(name, id) VALUES('betty', 4) @dummy_seed(126)
    @dummy_nullables @dummy_defaults;

  INSERT OR IGNORE INTO bar(id, data) VALUES(1, 'dino') @dummy_seed(127);

  INSERT OR IGNORE INTO bar(id, data) VALUES(2, 'hopparoo') @dummy_seed(128)
    @dummy_nullables @dummy_defaults;
END;

And of course if the annotation is not flexible enough, you can write your own data population.

The CQL above results in the usual C signatures. For instance:

CQL_WARN_UNUSED cql_code test_the_subject_populate_tables(sqlite3 *_Nonnull _db_);

So, it’s fairly easy to call from C/C++ test code or from CQL test code.
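
For instance, a complete CQL-side test flow stitched together from the generated helpers might look like this sketch:

CREATE PROC run_the_subject_test()
BEGIN
  CALL test_the_subject_create_tables();
  CALL test_the_subject_populate_tables();

  DECLARE C CURSOR FOR CALL the_subject();
  LOOP FETCH C
  BEGIN
    -- verify each row of the result here
    CALL printf("id %d\n", C.id);
  END;

  CALL test_the_subject_drop_tables();
END;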

Cross Procedure Limitations

Generally it’s not possible to compute the table usages that come from called procedures. To do so you would need the body of the called procedure, and typically that body is in a different translation unit, so it is not available to the compiler. A common workaround for this problem is to create a dummy procedure that explicitly uses all of the desired tables. This is significantly easier than creating all the schema manually, and it still gets you the triggers and indices automatically. Something like this:

[[autotest=(dummy_test)]]
create proc use_my_stuff()
begin
  let x := (select 1 from t1, t2, t3, t4, t5, t6, etc..);
end;

The above can be done as a macro if desired. But in any case use_my_stuff simply and directly lists the desired tables. Using this approach you can have one set of test helpers for an entire unit rather than one per procedure. This is often desirable and the maintenance is not too bad: you just use the use_my_stuff test helpers everywhere.