JSON-stat endpoint

/v10/query-json-stat/<dataset-name>?<dataset-query>

Executes a cross-tabulation query on the dataset and returns a self-describing response. The response lists the observations in row-major order, together with the name, label, category codes and category labels of each dimension that describes them.

Any rule variable categories blocked by disclosure rules do not appear in the returned observations but are instead documented within a JSON-stat extension field.

This endpoint has some key differences from the GraphQL endpoint:

  • it is versioned using a URL prefix, which for this version is v10.

  • it does not incorporate reference metadata from cantabular-metadata except for labels in the default language.

  • as such, it has no ability to provide localized reference metadata.

About JSON-stat

JSON-stat is a lightweight JSON-based format designed for dissemination of statistical datasets and has been used by many different international statistical agencies.

The JSON-stat endpoint provides an easy-to-use request/response format that allows a user to execute a query for tabular data and receive a pre-defined, self-describing response.

Request

A Cantabular dataset-query is specified in the URL query string. This follows the same format as a dataset-query in the core Cantabular API.

It includes an ordered list of variables identified by name, along with optional filters. It contains the following parameters:

v

The name of a variable to include in the query. At least one variable must be specified. The order of variables in a query determines the order in which the observations in the output table appear.

Variables in a query cannot be repeated. If any of the variables is in the mapped chain of variables identified by RuleBaseVariable then it must be the first one in the query.

f

A string specifying a filter to apply to the output table. It takes the form f=var1,c1,c2 where var1 is a variable name and c1 and c2 are codes of categories in the variable. Each variable used to filter must either be in the query variables list or be a direct or indirect mapping of one of the query variables.
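
As an illustration of how the v and f parameters combine, the following Python sketch assembles a request path from a list of variables and a set of filters. The build_query_url helper and its arguments are hypothetical and not part of Cantabular. It assumes that commas inside an f value are sent unescaped, as in the form f=var1,c1,c2 described above.

```python
from urllib.parse import quote, urlencode


def build_query_url(dataset, variables, filters=None, prefix="/v10/query-json-stat/"):
    """Build the path and query string for a JSON-stat query (hypothetical helper).

    variables: ordered list of variable names, one v parameter each
    filters:   optional dict mapping a variable name to a list of category codes
    """
    if not variables:
        raise ValueError("at least one variable must be specified")
    if len(set(variables)) != len(variables):
        raise ValueError("variables in a query cannot be repeated")

    # Order matters: the v parameters are emitted in the order given.
    params = [("v", name) for name in variables]
    for name, codes in (filters or {}).items():
        params.append(("f", ",".join([name, *codes])))

    # safe="," keeps the commas inside each f value unescaped (an assumption).
    return prefix + quote(dataset) + "?" + urlencode(params, safe=",")


print(build_query_url("Example", ["city", "siblings"], {"country": ["E"]}))
# prints /v10/query-json-stat/Example?v=city&v=siblings&f=country,E
```

The helper only checks the two rules that can be verified without the dataset (at least one variable, no repeats). Rules that depend on the dataset, such as the position of a rule variable, are left to the server.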

Response

The response is a minimal set of JSON-stat fields in JSON form containing:

version

A string identifying the version of the JSON-stat standard used in the response.

class

The type of JSON-stat object returned. At this time, the value will always be dataset.

source

The description of the Cantabular dataset that this JSON-stat query result is based on.

updated

The release date of this JSON-stat dataset in an ISO 8601 format. The date is the same as the release date of the Cantabular dataset it is based on.

id

An ordered list of dimension IDs, which correspond to the names of the variables in the Cantabular dataset on which the query was executed. The order is the same as the order in which the query variables were specified and as the order used to generate the list of observations in value.

size

An ordered list of integers representing the number of categories in each dimension of the query output.

dimension

An object mapping the names of all query dimensions to a value containing:

label

A human-readable string identifying the dimension’s underlying variable.

category

A JSON-stat category object used to describe the possible values of a dimension. This contains:

index

An array of category codes in the dimension, representing the order used in the list of observations in value.

label

An object mapping each category code in the dimension to its human-readable label.

value

This is an array of non-negative integers representing the N-dimensional matrix of result observations.

value is guaranteed to be the last name in the response object to facilitate processing of the result before all observations are read. This is useful when generating a streamed response for a user, such as a CSV download.

The observations are arranged in row-major order such that the observations for the first category of the first variable all appear together first followed by the observations for the second category of the first variable. Within the group of observations corresponding to each category in the first variable, the observations for the categories of the second variable are arranged in a similar manner and so on recursively for all variables.

The following list shows an example order of observations for a 3-variable query (v1,v2,v3) where each variable has 2 categories: v1 has v1c1 and v1c2, v2 has v2c1 and v2c2 and v3 has v3c1 and v3c2:

v1c1,v2c1,v3c1
v1c1,v2c1,v3c2
v1c1,v2c2,v3c1
v1c1,v2c2,v3c2
v1c2,v2c1,v3c1
v1c2,v2c1,v3c2
v1c2,v2c2,v3c1
v1c2,v2c2,v3c2
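
The ordering above can be expressed as simple index arithmetic. The Python sketch below converts between an offset into value and the zero-based category positions in each dimension, given the size array. The function names flat_index and positions_for are hypothetical and chosen for illustration only.

```python
def flat_index(size, positions):
    """Return the offset into value for one cell.

    size:      the size array from the response
    positions: zero-based category position in each dimension, in id order
    """
    if len(size) != len(positions):
        raise ValueError("one position is required per dimension")
    offset = 0
    for n, p in zip(size, positions):
        if not 0 <= p < n:
            raise IndexError("category position out of range")
        # Row-major: the last dimension varies fastest.
        offset = offset * n + p
    return offset


def positions_for(size, offset):
    """Inverse of flat_index: recover the category positions for an offset."""
    positions = []
    for n in reversed(size):
        offset, p = divmod(offset, n)
        positions.append(p)
    return positions[::-1]


size = [2, 2, 2]
for offset in range(8):
    pos = positions_for(size, offset)
    print(",".join(f"v{d + 1}c{p + 1}" for d, p in enumerate(pos)))
# first line printed:  v1c1,v2c1,v3c1
# last line printed:   v1c2,v2c2,v3c2

print(flat_index(size, [1, 0, 1]))  # prints 5, the cell v1c2,v2c1,v3c2
```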

extension

The extension field in JSON-stat contains information specific to a particular use case.

cantabular

The Cantabular-specific extension object. It contains the following fields:

dataset

This field within the cantabular extension includes two sub-fields: name, which contains the machine-readable name for the Cantabular dataset that this JSON-stat query result is based on, and digest, which is a hexadecimal string, derived from the dataset’s inputs, that uniquely identifies the dataset. For example:

"extension": {
  "cantabular": {
    "dataset": {
      "name": "Example",
      "digest": <hexadecimal string>
    }
  }
}

blocked

When categories of the rule variable are blocked by disclosure rules, this field returns a key-value pair mapping the name of the rule variable in the query to a JSON-stat dimension object, as documented above, which describes the categories of the rule variable that were blocked by disclosure rules. For example:

"extension": {
  "cantabular": {
    "blocked": {
      "city": {
        "label": "City",
        "category": {
          "index": [
            "0",
            "1"
          ],
          "label": {
            "0": "London",
            "1": "Liverpool"
          }
        }
      }
    }
  }
}

When no categories are blocked, this field is null.

Example

Request

This example query requests a breakdown of data in an example microdata dataset for the variables city and siblings filtered to only those cities in England.

/v10/query-json-stat/Example?v=city&v=siblings&f=country,E

Response

{
  "version": "2.0",
  "class": "dataset",
  "source": "Example microdata dataset for validation",
  "updated": "2020-12-15T11:16:02Z",
  "id": [
    "city",
    "siblings"
  ],
  "size": [
    2,
    7
  ],
  "dimension": {
    "city": {
      "label": "City",
      "category": {
        "index": [
          "0",
          "1"
        ],
        "label": {
          "0": "London",
          "1": "Liverpool"
        }
      }
    },
    "siblings": {
      "label": "Number of siblings",
      "category": {
        "index": [
          "0",
          "1",
          "2",
          "3",
          "4",
          "5",
          "6"
        ],
        "label": {
          "0": "No siblings",
          "1": "1 sibling",
          "2": "2 siblings",
          "3": "3 siblings",
          "4": "4 siblings",
          "5": "5 siblings",
          "6": "6 or more siblings"
        }
      }
    }
  },
  "extension": {
    "cantabular": {
      "dataset": {
        "name": "Example",
        "digest": <hexadecimal string>
      },
      "blocked": null
    }
  },
  "value": [
    1,
    0,
    0,
    1,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    1,
    0,
    0
  ]
}
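
To show how the fields of such a response fit together, the following Python sketch pairs each observation in value with the labels of its categories. It is a minimal illustration rather than a full JSON-stat client: labeled_rows is a hypothetical helper, the response dictionary is a trimmed copy of the example above, and index is assumed to be an array as documented for this endpoint.

```python
from itertools import product
from math import prod


def labeled_rows(response):
    """Return one tuple per observation: category labels followed by the count."""
    if len(response["value"]) != prod(response["size"]):
        raise ValueError("value length does not match the product of size")

    # Look the dimensions up in id order, which is the order used for value.
    dims = [response["dimension"][name]["category"] for name in response["id"]]

    # product() advances its last argument fastest, which is row-major order.
    combos = product(*(d["index"] for d in dims))
    rows = []
    for codes, count in zip(combos, response["value"]):
        labels = tuple(d["label"][code] for d, code in zip(dims, codes))
        rows.append(labels + (count,))
    return rows


sibling_labels = ["No siblings", "1 sibling", "2 siblings", "3 siblings",
                  "4 siblings", "5 siblings", "6 or more siblings"]
response = {
    "id": ["city", "siblings"],
    "size": [2, 7],
    "dimension": {
        "city": {"category": {"index": ["0", "1"],
                              "label": {"0": "London", "1": "Liverpool"}}},
        "siblings": {"category": {
            "index": [str(i) for i in range(7)],
            "label": {str(i): text for i, text in enumerate(sibling_labels)}}},
    },
    "value": [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
}

for row in labeled_rows(response):
    if row[-1]:
        print(row)
# ('London', 'No siblings', 1)
# ('London', '3 siblings', 1)
# ('Liverpool', '4 siblings', 1)
```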

Configuration

Access to the JSON-stat endpoint is disabled by default. It is enabled by setting the CANTABULAR_API_EXT_JSONSTAT_ENABLED environment variable to TRUE.

Clients

There are existing libraries written in a variety of languages to interpret JSON-stat output, including Python, R, JavaScript and Java.

Working with blocked categories

Because of limitations with how existing Python clients work with null values, rule variable categories blocked by disclosure rules are currently excluded from the main content of a JSON-stat response and are instead included in a Cantabular-specific JSON-stat extension, as documented above.

To detect these, client scripts need to check whether the blocked property of the cantabular extension is not null. If it is not, its contents can be used to determine exactly which categories were blocked.
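
A check of this kind might look like the following Python sketch. The blocked_categories helper is hypothetical, and the two sample payloads are trimmed to the extension field only, using the structure documented above.

```python
import json


def blocked_categories(response):
    """Return {variable name: {category code: label}} for blocked categories.

    An empty dict is returned when blocked is null or absent.
    """
    cantabular = response.get("extension", {}).get("cantabular", {})
    blocked = cantabular.get("blocked")
    if blocked is None:  # JSON null is decoded as None
        return {}
    result = {}
    for name, dimension in blocked.items():
        category = dimension["category"]
        # Follow index so the codes keep the order given in the response.
        result[name] = {code: category["label"][code] for code in category["index"]}
    return result


with_blocked = json.loads("""
{"extension": {"cantabular": {"blocked": {"city": {
    "label": "City",
    "category": {"index": ["0", "1"],
                 "label": {"0": "London", "1": "Liverpool"}}}}}}}
""")
without_blocked = json.loads('{"extension": {"cantabular": {"blocked": null}}}')

print(blocked_categories(with_blocked))
# {'city': {'0': 'London', '1': 'Liverpool'}}
print(blocked_categories(without_blocked))
# {}
```

Returning an empty dict for the null case lets calling code treat both outcomes uniformly, for example with a plain truthiness test.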