Introduction

SQL is the most popular way to interact with relational databases. It is used both as an application programming interface for programmers and as a human interface for non-technical users. While SQL the interface is quite successful, I find that it doesn’t serve both groups of users equally well.

Good Human Interface

A few points:

  1. From a non-technical user’s PoV, the best thing about SQL is probably the way SQL works; SQL works by telling the DB what to do, not how to do it. (Do-what-I-mean).
  2. And there are few restrictions on what users can do with SQL. (General purpose).
  3. The grammar is simple and mostly English-like.

Poor Programming Interface

Two points.

1. It’s a string

String abuse is common in software systems.

  • SQL is cumbersome to generate programmatically.
  • Logic is mixed with data in a string.
  • Sloppy programmers simply concatenate strings to generate SQLs.
  • Parsing SQL wastes CPU.

Prepared statements are better, but still not good enough. Structured data formats (say S-expr or even JSON) are much easier to work with programmatically than a piece of string and can eliminate a large number of injection problems.

2. Lack of control

SQL is designed to work as a black box, but that’s not always desirable. Applications often need to respond to queries in a reasonable amount of time, which requires programmers to design and index tables properly; from there, it’s no longer a black box — programmers need to know what they are doing and what the database is actually doing.

And black boxes are sometimes undesirable in software projects. There are two types of use cases for SQL: analytical (OLAP) and transactional (OLTP). The two types can be very distinct. The former (OLAP) is often performed by humans and is quite good. The latter (OLTP) is often interfaced by application code rather than humans.

Unlike OLAP, OLTP queries on a sufficiently large scale are more restrictive, time-constrained, and have to be resource efficient. This requires proper upfront design and indexing. And queries are expected to be executed in a very specific way, using the designed indexes, and touching a limited amount of data.

Long-running queries are almost forbidden in large OLTP DBs, at best they cause the user request to timeout, and at worst, they bring down the application by exhausting hardware resources. This brings up the problem of control: how to make sure that a SQL runs as planned?

The do-what-I-mean property of SQL can quickly turn into a burden. SQL DBs choose indexes automatically, and they are right most of the time. But when they are wrong, programmers have to “tweak” the SQL to make it right. Or use the escape hatch FORCE INDEX. The problem is that this is not easily predictable, and can only be fixed after things go wrong.

That’s why black boxes are undesirable, being able to predict and control how SQL executed is as important as getting the result.

Alternative API for Relational DBs

Two key design choices:

  1. Abandon the string form of SQL and use structured formats.

    This makes both DB servers and application clients simpler. And reduces insecure applications.

  2. Explicitly define how the query is to be executed.

    This is considered “low-level” work. But when designing indexes, programmers already have a good idea of how queries will be executed, why let the DB guess?

The API might look something like this:

// select user_id, name from tbl_user
//  where 100 <= user_id and user_id <= 199
//      and gender = 'M' and birth_day >= '1999-01-23'
//  limit 100
{
    "op": "range_scan",
    "table": "tbl_user",
    "columns": ["user_id", "name"],
    // 2. explicitly defined index.
    "by_index": ["user_id"],
    "index_range": [100, 199],
    // 1. proper structure. no more string escaping and concatenation.
    "filter": [
        "and",
        ["eq", ["col", "gender"], "M"],
        ["ge", ["col", "birth_day"], "1999-01-23"],
    ],
    "limit": [0, 100],
}