Chapter 18: Pre-processing

Introduction

Pre-processing features are a recent addition to the CQL language. Previously, any pre-processing was done by running the C Pre-Processor over the input file before CQL processed it. The practice of using cc -E or the equivalent was deprecated because:

  • It creates an unnatural dependence in the compile chain
  • The lexical rules for CQL and C are not fully compatible, so cc -E often gives spurious warnings
  • It is not possible to create automatic code formatting tools with text-based macro replacement
  • Macros are easily abused, creating statement fragments in weird places that are hard to understand
  • The usual problems with text replacement and order of operations mean that macro arguments frequently have to be wrapped to avoid non-obvious errors
  • Debugging problems in the macros is very difficult with line information being unhelpful and pre-processed output being nearly unreadable

To address these problems CQL introduced pre-processing features including structured macros, that is, macros that describe the sort of thing they intend to produce and the kinds of things they consume. This allows for reasonable syntax and type checking and much better error reporting. CQL also adds @include to import code and @ifdef/@ifndef for conditionals.

Conditional Compilation

Users can “define” conditional compilation switches using --defines x y z on the command line. Additionally, if --rt foo is specified then __rt__foo will be defined.

NOTE: if --cg is specified but --rt is not, the default is --rt c, so __rt__c will be defined.

The syntax for conditional compilation is the familiar:

@ifdef name
 ... this code will be processed if "name" is defined
@else
 ... this code will be processed if "name" is not defined
@endif

The @else is optional and the construct nests as one would expect. @ifndef is also available and simply reverses the sense of the condition.
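As a minimal sketch of how these pieces combine, the fragment below nests an @ifndef inside an @ifdef and also tests a runtime switch. The switch names verbose_logs and quiet_startup, the file names, and the invocation shown in the comment are assumptions made for illustration only.

```sql
-- Sketch only: verbose_logs and quiet_startup are hypothetical switches.
-- Assumed invocation: cql --in app.sql --cg app.h app.c --defines verbose_logs
declare proc printf no check;

@ifdef verbose_logs
  call printf("verbose logging enabled\n");
  -- nested conditional: taken only if quiet_startup is NOT defined
  @ifndef quiet_startup
    call printf("starting up\n");
  @endif
@else
  call printf("quiet build\n");
@endif

-- __rt__c is defined whenever the C runtime is selected (the default with --cg)
@ifdef __rt__c
  call printf("generating C\n");
@endif
```

With only verbose_logs defined, the first two printf calls and the __rt__c branch would be kept and the @else branch dropped.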

In order to avoid “rude” macro patterns with @ifdef in the middle of SQL or expressions, this construct is itself a statement in the grammar. In order to create conditional expression pieces or other internal pieces you should use one of the macro types and define it two or more ways. These design choices were made to avoid the weird token pasting that inevitably results if pre-processing is allowed everywhere. Conditional cte_tables, select_core and even just expr are highly flexible and give clear composition in the code with no weird syntax.

Note that in CQL even the code that is conditionally compiled out must at least parse correctly. Semantic analysis does not run, and indeed often there would be conflicts if it did, but the code must at least be correct enough to parse.
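For instance, the following sketch (with a hypothetical switch legacy_schema and a hypothetical table items) declares the same table two different ways. Both branches must parse, but only the selected branch is analyzed, so the duplicate name does not cause a conflict.

```sql
-- Sketch only: legacy_schema and items are hypothetical names.
@ifdef legacy_schema
  create table items(id integer primary key, label text);
@else
  -- analyzing both branches would report a conflicting declaration of items,
  -- but only one branch ever reaches semantic analysis
  create table items(id integer primary key, name text);
@endif
```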

Conditionally choosing one of several macro implementations for use later in the code is a very powerful way to get conditionality throughout your code cleanly. Within macros, @ifdef can only appear inside statement list (stmt_list) macros because @ifdef is a statement, so it can’t appear in expressions or query fragments. Hence the most powerful pattern is:

@ifdef something
  @macro(...) foo! begin choice1 end;
@else
  @macro(...) foo! begin choice2 end;
@endif

-- foo! is now conditionally defined
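A concrete instance of this pattern might look like the following sketch. The switch use_large_batches and the macro batch_size! are hypothetical names chosen for illustration.

```sql
-- Sketch only: use_large_batches and batch_size! are hypothetical names.
@ifdef use_large_batches
  @macro(expr) batch_size!() begin 1000 end;
@else
  @macro(expr) batch_size!() begin 10 end;
@endif

-- the expression below never mentions the switch; it just uses the macro
let n := batch_size!();
```

The conditional is decided once, at the statement level, and every later use of batch_size! picks up whichever definition was selected.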

Text Includes

Pulling in common headers is also supported. The syntax is

@include "any/path/foo.sql"

In addition to the current directory any paths specified with --include_paths x y z will be checked.

Like the @ifdef forms, @include can only appear at the statement level, so it cannot be used for the exotic token pasting that often happens with #include. Furthermore, it must appear at the top of a file, so it is a lot more like the import features of other languages than it is like the C Pre-Processor token stream. Once normal statements begin, further includes are not possible: each file gets an include section followed by a statements section. Note that @ifdef and @include do not compose. Again, @include is more like an import. If you need conditionals, the included file should conditionally produce declarations and possibly macros. This means that file dependencies are consistent regardless of conditionals.
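The resulting file layout can be sketched as follows. The included paths and the widgets table are hypothetical, so this fragment only compiles if such files exist on the include path.

```sql
-- main.sql (sketch): the include section comes first
@include "common/tables.sql"   -- hypothetical file that declares the widgets table
@include "common/macros.sql"   -- hypothetical file; it may use @ifdef internally

-- the statements section starts with the first ordinary statement,
-- so an @include after this point would be rejected
create proc get_widgets()
begin
  select * from widgets;
end;
```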

Macros

A typical macro declaration might look like this:

@MACRO(STMT_LIST) assert!(exp! expr)
begin
  if not exp! then
    call printf("assertion failed\n");
  end if;
end;

-- Usage

assert!(foo < bar);

This example is a macro that produces a statement list (stmt_list), so it can be used in the places where a statement list can appear. Every macro definition specifies the nature of the thing it produces, which limits the places that it can appear.

The nature of macros means that while you may get an error for using the wrong macro type in the wrong location, you cannot get syntax errors due to the replacement. Of course semantic errors are still possible, so for instance maybe the macro references a table that doesn’t exist or maybe the table doesn’t have certain necessary columns. Such errors are possible, but the macro is sure to expand correctly.

Any errors are reported on the lines of the macro, not where the macro is used.

Types of Macros and Macro Arguments

The example above is a macro that produces a statement list. It can be used anywhere a statement list would be valid. The full list of macro types is as follows:

Type          Notes
cte_tables    part of the contents of a WITH expression
expr          any expression
query_parts   something that goes in a FROM clause
select_core   one or more select statements that can be unioned
select_expr   one or more named select expressions
stmt_list     one or more statements

Here are examples that illustrate the various macro types, in alphabetical order. The names of the macro types match the names of the corresponding structures in the grammar, so the definitions are easy to spot.

cte_tables

@macro(cte_tables) foo!()
begin
  x(a,b) as (select 1, 2),
  y(d,e) as (select 3, 4)
end;

-- all or part of the cte tables in the with clause
with foo!() select * from x join y;

expr

@macro(expr) pi!()
begin
   3.14159
end;

Macro arguments can have the same types as macros themselves and expressions are a common choice as we saw in the assert macro.

@macro(expr) max!(a! expr, b! expr)
begin
  case when a! > b! then a! else b! end
end;

max!(not 3, 1=2);

-- this generates

CASE WHEN (NOT 3) > (1 = 2) THEN NOT 3
ELSE 1 = 2
END;

Order of operations isn’t a problem with CQL macros. No extra parentheses are needed for arguments and so forth, since the macro drops directly into the syntax tree. This means that in the above, even though NOT has lower precedence than >, the correct expression is generated with no extra effort for the coder. If the expanded syntax tree is rendered as text with --echo and --exp, as in the above, any necessary parentheses are added, but the tree is always the right shape for the arguments.
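The classic failure case for text-based macros makes a useful comparison. In this small sketch, double! is a hypothetical macro; the argument is substituted as a subtree, so no defensive parentheses are needed in the macro body.

```sql
-- Sketch only: double! is a hypothetical macro.
@macro(expr) double!(x! expr)
begin
  x! * 2
end;

-- the argument stays one subtree, so this is (1 + 2) * 2, which is 6;
-- text replacement would have produced 1 + 2 * 2, which is 5
let r := double!(1 + 2);
```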

query_parts

A query part macro generates “something you could put in a from clause”. It could be the whole from clause or it could be one or more of the joined tables.

@macro(query_parts) foo!(x! expr)
begin
  foo inner join bar on foo.a == bar.a and foo.x == x!
end;

select * from foo!(y);

-- this generates

SELECT *
  FROM foo
  INNER JOIN bar ON foo.a = bar.a AND foo.x = y;

select_core

A select core macro generates “something you could union”. It’s the part of a select statement that comes before order by. If you’re trying to make a macro that assembles parts of a set of results which are then unioned and ordered, this is what you need.

@macro(select_core) foo!()
begin
  select x, y from T
  union all
  select x, y from U
end;

foo!()
union all
select x, y from V
order by x;

-- this generates

SELECT x, y FROM T
UNION ALL
SELECT x, y FROM U
UNION ALL
SELECT x, y FROM V
ORDER BY x;

A select_core macro can be a useful way to specify a set of tables and values while leaving filtering and sorting open for customization.

select_expr

A select expression macro lets you codify certain common columns and aliases that you might want to select, such as:

@macro(select_expr) foo!()
begin
  T1.x + T1.y as A, T2.u / T2.v * 100 as pct
end;

select foo!() from X as T1 join Y as T2 on T1.id = T2.id;

--- this generates

SELECT T1.x + T1.y AS A, T2.u / T2.v * 100 AS pct
  FROM X AS T1
  INNER JOIN Y AS T2 ON T1.id = T2.id;

If certain column extractions are common you can easily make a macro that lets you pull out the columns you want. This can be readily generalized. This becomes very useful when it’s normal to extract (e.g.) the same 20 columns from various queries.

@MACRO(SELECT_EXPR) foo!(t1! EXPR, t2! EXPR)
BEGIN
  t1!.x + t1!.y AS A, t2!.u / t2!.v * 100 AS pct
END;

select foo!(X, Y) from X join Y on X.id = Y.id;

-- this generates

SELECT X.x + X.y AS A, Y.u / Y.v * 100 AS pct
  FROM X
  INNER JOIN Y ON X.id = Y.id;

In this second case we have provided the table names as arguments rather than hard coding them.

stmt_list

We began with a statement list macro before many of the concepts had been introduced. Let’s revisit where we started.

@MACRO(STMT_LIST) assert!(exp! expr)
begin
  if not exp! then
    call printf("assertion failed\n");
  end if;
end;

assert!(1 == 1);

-- this generates

IF NOT 1 = 1 THEN
  CALL printf("assertion failed\n");
END IF;

Recall that in SQL order of operations NOT is very weak. This is in contrast to many other languages where ! binds quite strongly. But as it happens we don’t have to care. The expression would have been evaluated correctly regardless of the binding strength of what surrounds the macro because the replacement is in the AST not in the text.

This rounds out all of the macro types.

Passing Macro Arguments

In order to avoid language ambiguity and to allow macro fragments like a cte_table in unusual locations, the code must specify the type of the macro argument. Expressions are the default type; the others use a function-like syntax to do the job.

Type          Syntax
cte_tables    with( x(*) as (select 1 x, 2 y))
expr          no keyword required, just foo!(x)
query_parts   from(U join V)
select_core   rows(select * from U union all select * from V)
select_expr   select(1 x, 2 y)
stmt_list     begin statement1; statement2; end

With these forms the type of macro argument is unambiguous and can be immediately checked against the macro requirements.
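As a small sketch of the most common non-expression case, the hypothetical macro repeat_twice! below takes a statement list, which the caller wraps in begin ... end.

```sql
-- Sketch only: repeat_twice! is a hypothetical macro.
declare proc printf no check;

@macro(stmt_list) repeat_twice!(body! stmt_list)
begin
  body!;
  body!;
end;

-- the begin ... end wrapper marks the argument as a stmt_list
repeat_twice!(begin call printf("hello\n"); end);
```

The expansion would contain the printf call twice, once for each use of body!.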

Note that when using a select_core macro or macro argument in source it is necessary to do ROWS(name!). This is an unfortunate but unavoidable concession to the grammar tools.

Note that macro arguments need no qualification when they are passed along as arguments to another macro, because they can always be type checked later; therefore foo!(a!, b!, c!) always works. Arguments of the other types that are written out directly require their wrappers to keep the grammar clean.

An example with all of the types:

@macro(stmt_list) mondo1!(
  a! expr,
  b! query_parts,
  c! select_core,
  d! select_expr,
  e! cte_tables,
  f! stmt_list)
begin
  -- macros can be used without qualification in @ID and @TEXT
  set zz := @text(a!, b!, c!, d!, e!, f!);
end;

@macro(stmt_list) mondo2!(
  a! expr,
  b! query_parts,
  c! select_core,
  d! select_expr,
  e! cte_tables,
  f! stmt_list)
begin
  -- arguments can be forwarded unambiguously
  mondo1!(a!, b!, c!, d!, e!, f!);
  if a! then   -- an expression (the most common)
    f!;        -- a statement list (next most common)
  else
    -- these are the parts of a query that you might
    -- want to macroize

    with e!    -- cte tables
    select d!  -- select expressions
    from b!    -- query parts
    union all
    rows(c!);  -- select core
  end if;
end;

-- and this is how you encode each type
mondo2!(
  1+2,
  from(x join y),
  rows(select 1 from foo union select 2 from bar),
  select(20 xx),
  with(f(*) as (select 99 from yy)),
  begin let qq := 201; end
  );

Meta Variables

Variable      Notes
@LINE         The current line number
@MACRO_LINE   The line where macro expansion began
@MACRO_FILE   The file where macro expansion began

@MACRO_LINE and @MACRO_FILE are useful for providing error information that refers to the source file that used the macro rather than the macro itself, as in an assert macro. @LINE is useful for reporting problems in the macro itself, such as a violated invariant.

We can make the assert macro better still:

@MACRO(STMT_LIST) assert!(exp! expr)
begin
  if not exp! then
    call printf("%s:%d assertion failed: %s\n",
      @MACRO_FILE, @MACRO_LINE, @TEXT(exp!));
  end if;
end;

assert!(1 == 1);

-- this generates

IF NOT 1 = 1 THEN
  CALL printf("%s:%d assertion failed: %s\n", 'myfile.sql', 9, "1 = 1");
END IF;

Note that here the macro was invoked on line 9 of myfile.sql. @LINE would have been much less useful, reporting the line of the printf.
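To see the two kinds of line information side by side, a macro can report both. This is only a sketch: require_positive! and n_items are hypothetical names, and the exact numbers depend on where the macro is defined and used.

```sql
-- Sketch only: require_positive! and n_items are hypothetical names.
declare proc printf no check;

@macro(stmt_list) require_positive!(x! expr)
begin
  if x! <= 0 then
    -- @MACRO_FILE and @MACRO_LINE identify the caller;
    -- @LINE identifies a line inside this macro body
    call printf("%s:%d: %s must be positive (checked at macro line %d)\n",
      @MACRO_FILE, @MACRO_LINE, @TEXT(x!), @LINE);
  end if;
end;

let n_items := 3;
require_positive!(n_items - 5);
```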

Token Pasting

You can create new tokens in one of three ways:

  • @TEXT will create a string from one or more parts.
  • @ID will make a new identifier from one or more parts.
    • If the parts do not concatenate into a valid identifier an error is generated.
  • @TMP will make an identifier just like @ID but it will include a tmp_nnn prefix where nnn is unique to the macro expansion it appears in. This lets you safely create temporary variables for use in a macro body without fear of conflict with other expansions of the same macro. @TMP(_foo) creates tmp_nnn_foo so you can make as many such temporaries as you need. @TMP is otherwise identical to @ID.
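A short sketch of @TMP and @ID working together: the hypothetical swap! macro below needs a scratch variable, and @TMP gives each expansion its own name, so the macro can be used repeatedly in the same scope.

```sql
-- Sketch only: swap! is a hypothetical macro.
@macro(stmt_list) swap!(a! expr, b! expr)
begin
  let @tmp(_saved) := a!;      -- becomes tmp_nnn_saved, unique per expansion
  set @id(a!) := b!;           -- @id turns the expression argument into a name
  set @id(b!) := @tmp(_saved);
end;

let p := 1;
let q := 2;
swap!(p, q);
swap!(p, q);  -- the second expansion gets a different tmp_nnn_saved
```

Without @TMP, the second expansion would try to declare the same temporary again and fail.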

This can be used to create very powerful code-helpers. Consider this code, which is very similar to the actual test helpers in the compiler’s test cases.

@MACRO(stmt_list) EXPECT!(pred! expr)
begin
  if not pred! then
    throw;
  end if;
end;

@MACRO(stmt_list) TEST!(name! expr, body! stmt_list)
begin
  create procedure @ID("test_", name!)()
  begin
    try
      body!;
    catch
      call printf("Test failed %s\n", @TEXT(name!));
      throw;
    end;
  end;
end;

TEST!(try_something,
BEGIN
  EXPECT!(1 == 1);
END);

--- This generates:

CREATE PROC test_try_something ()
BEGIN
  TRY
    IF NOT 1 = 1 THEN
      THROW;
    END IF;
  CATCH
    CALL printf("Test failed %s\n", "try_something");
    THROW;
  END;
END;

And of course additional diagnostics can be readily added (and they are present in the real code). For instance all of the tricks used in the assert! macro would be helpful in the expect! macro.

Macros for Types

There is no macro form that can stand in for a type name. However, identifiers are legal types and so @ID(...) is an excellent construct for creating type names from expressions like strings or identifiers. In general, @ID(...) can allow you to use an expression macro where an expression is not legal but a name is.

For instance:

@macro(stmt_list) make_var!(x! expr, t! expr)
begin
   declare @id(t!,"_var") @id(t!);
   set @id(t!,"_var") := x!;
end;

-- declares real_var as a real and stores 1+5 in it
make_var!(1+5, "real");

--- this generates
DECLARE real_var real;
SET real_var := 1 + 5;

Since @id(...) can go anywhere an identifier can go, it is suitable not only for type names but also for procedure names, table names, or any other identifier. As mentioned above, @id will generate an error if the expression does not make a legal normal identifier.
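For example, the same idea can build a procedure name and a table reference from one argument. This is a sketch under assumptions: make_getter! and the widgets table are hypothetical.

```sql
-- Sketch only: make_getter! and widgets are hypothetical names.
create table widgets(id integer primary key, name text);

@macro(stmt_list) make_getter!(t! expr)
begin
  create proc @id("get_", t!)()
  begin
    select * from @id(t!);
  end;
end;

-- intended to expand to a procedure named get_widgets that selects from widgets
make_getter!(widgets);
```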

Sometimes the type can be inferred, and indeed it might make sense to infer it. For instance, part of a test macro might look like this:

-- this uses @ID (via @TMP) and @TEXT for errors
@macro(stmt_list) expect_equal!(x! expr, y! expr)
begin
  let @tmp(x) := x!;  -- evaluate only once
  let @tmp(y) := y!;  -- evaluate only once
  if @tmp(x) IS NOT @tmp(y) then
    printf("%s IS NOT %s\nleft: %s\nright:%s", @TEXT(x!), @TEXT(y!), @tmp(x), @tmp(y));
  end if;
end;

Pipeline Syntax for Expression Macros

The pipeline syntax expr:macro_name!(args...) can be used instead of macro_name!(expr, args...). This allows macros to appear in expressions written in the fluent style. Note that args may be empty, in which case the form can be written expr:macro_name!() or the even shorter expr:macro_name!, both of which become macro_name!(expr).

printf("%s", x:fmt!);
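A slightly fuller sketch follows, using two hypothetical macros, clamp! and half!. The last line assumes that pipeline steps chain left to right, as the fluent style suggests; treat that as an assumption rather than a guarantee.

```sql
-- Sketch only: clamp! and half! are hypothetical macros.
@macro(expr) clamp!(x! expr, lo! expr, hi! expr)
begin
  case when x! < lo! then lo! when x! > hi! then hi! else x! end
end;

@macro(expr) half!(x! expr)
begin
  x! / 2
end;

let v := 150;
let a := v:clamp!(0, 100);          -- same as clamp!(v, 0, 100)
let b := v:half!;                   -- same as half!(v)
let c := v:half!():clamp!(0, 50);   -- assumed to mean clamp!(half!(v), 0, 50)
```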