Introduction
This document specifies the syntax of the intermix.io query annotation format.
Annotation Structure
The intermix.io format is a base64 encoded JSON string which is prepended to SQL queries with a special token (“INTERMIX_ID:”) inside of a /* */ comment block. Thus the process to tag a query is as follows:
- Create a JSON string (see below)
- Encode the string using Base64
- Prepend your query with /* INTERMIX_ID: base64_encoded_json */ <original sql query>
Note that newlines and spaces before or after the /* */ tag block are ignored, and the INTERMIX_ID: token and base64 encoded json string should be separated by a single space.
Schema
The JSON string that is encoded in base64 contains required and optional fields which allow you to enhance your query with meta-data. The allowed fields are shown in the table below. Note all key values (other than meta) should be strings (i.e. double quoted).
Field |
Description |
Example |
Required? |
app |
A keyword used to identify your application. This name will appear in the product in aggregations. |
“etl-loader” |
Yes |
app_ver |
A version string for your app. |
“1.2” |
Yes |
plugin |
The name of the software library that is being used to generate this annotation. |
“intermix-airflow” |
No |
plugin_ver |
A version for the above. |
“1.0a” |
No |
dag |
The DAG executing the SQL. This can be e.g. an Airflow DAG, a Pinball Workflow, or a string that identifies the logical collection of tasks that this SQL is part of. |
“load-sales-data” |
No |
task |
The task that is executing the SQL. ie an Airflow "task" or a Pinball "job". This can be a series of queries executing in a single transaction. |
“load-ny-sales” |
No |
user |
The user executing the task. |
“joesmith” |
No |
at |
The time the query was annotated, in iso-8601 UTC format. |
“2018-04-18T12:52:48Z” |
No |
chart_name |
The name of the chart that is submitting this query. ie a chart.io dashboard. |
"revenue" |
No |
meta |
A sub-object of additional user-defined key-value fields. |
{“department”: “east”, “report_id”: “6”} |
No |
classname |
If applicable, the name of the class whose instances was used to run the query. |
|
No |
file | The name of the file which this query was executed in. | No | |
function | The name of the method which executed this query. | No | |
linenumber | The line number in the file which submitted the query. | No | |
module | The module which the class ran in. | No |
Sample Tag Message Body
A full example of the format is as follows:
{
“app”: “etl-loader”,
“app-ver”: “1.2”,
“plugin”: “my-company-name”,
“plugin_ver”: “1.0a”,
“dag”: “training-data-for-model-1”,
“task”: “load-raw-data”,
“user”: “joesmith”,
“file”: “thiscooltask.py”,
“function”: “aggregate_all_things”,
“linenumber”: 33,
“module”: “psycopg2”,
“at”: “2018-04-18T12: 52: 48Z”,
“meta”: {
“department”: “east”,
“report_id”: “6”
}
}
When encoded in base64, this gives the following string:
e+KAnGFwcOKAnTog4oCcZXRsLWxvYWRlcuKAnSwg4oCcYXBwLXZlcuKAnTog4oCcMS4y4oCdLCDigJxwbHVnaW7igJ06IOKAnGludGVybWl4LWFpcmZsb3figJ0sIOKAnHBsdWdpbl92ZXLigJ06IOKAnDEuMGHigJ0sIOKAnGRhZ+KAnTog4oCcbG9hZC1zYWxlcy1kYXRh4oCdLCDigJx0YXNr4oCdOiDigJxsb2FkLW55LXNhbGVz4oCdLCDigJx1c2Vy4oCdOiDigJxqb2VzbWl0aOKAnSwg4oCcYXTigJ06IOKAnDIwMTgtMDQtMThUMTI6NTI6NDha4oCdLCDigJxtZXRh4oCdOiB74oCcZGVwYXJ0bWVudOKAnTog4oCcZWFzdOKAnSwg4oCccmVwb3J0X2lk4oCdOiDigJw24oCdfX0=