JSON-stat endpoint
/v10/query-json-stat/<dataset-name>?<dataset-query>
Execute a cross-tabulation query on the dataset, returning a self-describing response that includes a list of observations in row-major order, together with the name, label, category codes and category labels of the dimensions that describe the observations.
Categories of the rule variable that are blocked by disclosure rules do not appear in the returned observations; instead, they are documented in a JSON-stat extension field.
This endpoint has some key differences from the GraphQL endpoint:
- it is versioned using a URL prefix, with the version set to v10
- it does not incorporate reference metadata from cantabular-metadata, except for labels in the default language. As such, it cannot provide localized reference metadata.
About JSON-stat
JSON-stat is a lightweight JSON-based format designed for the dissemination of statistical datasets, and it has been used by many international statistical agencies.
The JSON-stat endpoint provides an easy-to-use request/response format that allows a user to execute a query for tabular data and receive a pre-defined, self-describing response.
Request
A Cantabular dataset-query is specified in the URL query string. This follows the same format as a dataset-query in the core Cantabular API.
It includes an ordered list of variables identified by name, along with optional filters. It contains the following parameters:
v
The name of a variable to include in the query. At least one variable must be specified. The order of variables in a query determines the order in which the observations in the output table appear.
Variables in a query cannot be repeated. If any of the variables is in the mapped chain of variables identified by RuleBaseVariable, then it must be the first one in the query.
f
A string specifying a filter to apply on the output table. It takes the form f=var1,c1,c2, where var1 is a variable name and c1 and c2 are codes of categories in the variable. Each variable used to filter must either be in the query variables list or be a direct or indirect mapping of one of the query variables.
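As an illustration of the parameters above, a client could assemble the request path from an ordered list of variables and a set of filters. This is a minimal Python sketch: build_query_path is a hypothetical helper, not part of any Cantabular client, and it only checks the rules it can verify locally (at least one variable, no repeats). The RuleBaseVariable ordering rule depends on dataset metadata and is left to the server.

```python
from urllib.parse import quote, urlencode


def build_query_path(dataset, variables, filters=None, version="v10"):
    """Build a JSON-stat endpoint path with its dataset-query string.

    Hypothetical helper: `filters` maps a filter variable name to a list of
    category codes, each rendered as one f=var,c1,c2 parameter.
    """
    if not variables:
        raise ValueError("at least one variable must be specified")
    if len(set(variables)) != len(variables):
        raise ValueError("variables in a query cannot be repeated")
    # One v parameter per variable; their order sets the observation order.
    params = [("v", name) for name in variables]
    for var, codes in (filters or {}).items():
        params.append(("f", ",".join([var, *codes])))
    # Keep commas literal so each filter reads f=var1,c1,c2 as documented.
    query = urlencode(params, safe=",")
    return f"/{version}/query-json-stat/{quote(dataset, safe='')}?{query}"


print(build_query_path("Example", ["city", "siblings"], {"country": ["E"]}))
# → /v10/query-json-stat/Example?v=city&v=siblings&f=country,E
```

Passing the parameters as a list of pairs, rather than a dict, is what allows v to repeat while preserving the order of the variables.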
Response
The response is a minimal set of JSON-stat fields in JSON form containing:
version
A string identifying the version of the JSON-stat standard used in the response.
class
The type of JSON-stat object returned. At this time, the value will always be dataset.
source
The description of the Cantabular dataset that this JSON-stat query result is based on.
updated
The release date of this JSON-stat dataset in ISO 8601 format. The date is the same as the release date of the Cantabular dataset it is based on.
id
An ordered list of dimension IDs, which correspond to the names of the variables in the Cantabular dataset on which the query was executed. The order is the same as the order in which the query variables were specified, and as the order used to generate the list of observations in value.
size
An ordered list of integers representing the number of categories in each dimension of the query output.
dimension
An object mapping the names of all query dimensions to a value containing:
label
A human-readable string identifying the dimension’s underlying variable.
category
A JSON-stat category object used to describe the possible values of a dimension. This contains:
index
An array of category codes in the dimension, representing the order used in the list of observations in value.
label
An object mapping each category code in the dimension to its human-readable label.
value
An array of non-negative integers representing the N-dimensional matrix of result observations.
value is guaranteed to be the last name in the response object, so that processing of the result can begin before all observations are read. This is useful when generating a streamed response for a user, such as a CSV download.
The observations are arranged in row-major order: the observations for the first category of the first variable all appear together first, followed by the observations for the second category of the first variable, and so on. Within the group of observations for each category of the first variable, the observations for the categories of the second variable are arranged in the same way, and so on recursively for all variables.
The following list shows an example order of observations for a 3-variable query (v1,v2,v3) where each variable has 2 categories: v1 has v1c1 and v1c2, v2 has v2c1 and v2c2, and v3 has v3c1 and v3c2:
v1c1,v2c1,v3c1
v1c1,v2c1,v3c2
v1c1,v2c2,v3c1
v1c1,v2c2,v3c2
v1c2,v2c1,v3c1
v1c2,v2c1,v3c2
v1c2,v2c2,v3c1
v1c2,v2c2,v3c2
extension
The extension field in JSON-stat contains information specific to a particular use case.
cantabular
dataset
This field within the cantabular extension includes two sub-fields: name, which contains the machine-readable name of the Cantabular dataset that this JSON-stat query result is based on, and digest, a hexadecimal string derived from the dataset’s inputs that uniquely identifies the dataset. For example:
extension { cantabular { dataset { name: "Example", digest: <hexadecimal string> } } }
blocked
When categories of the rule variable are blocked by disclosure rules, this field returns a key-value pair mapping the name of the rule variable in the query to a JSON-stat dimension object, as documented above, which describes the categories of the rule variable that were blocked by disclosure rules. For example:
extension { cantabular { blocked { city: { label: "City", category: { index: [ "0", "1" ], label: { 0: "London", 1: "Liverpool" } } } } } }
When no categories are blocked, this field is null.
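The row-major layout of value means the position of any observation can be computed from the sizes of the dimensions. The following is a minimal Python sketch, assuming a category's position is its offset within that dimension's category.index array; flat_index and unflatten are hypothetical helper names, not part of any Cantabular client.

```python
from math import prod


def flat_index(sizes, indices):
    """Position in `value` of the observation at the given category positions.

    `sizes` is the response's size list; `indices` holds, per dimension, the
    position of a category code within that dimension's category.index.
    """
    if len(sizes) != len(indices):
        raise ValueError("one category position per dimension is required")
    pos = 0
    for size, i in zip(sizes, indices):
        if not 0 <= i < size:
            raise IndexError("category position out of range")
        # Accumulating pos * size + i gives each dimension a stride equal to
        # the product of the sizes of the dimensions after it (row-major).
        pos = pos * size + i
    return pos


def unflatten(sizes, pos):
    """Inverse of flat_index: category positions for a position in `value`."""
    if not 0 <= pos < prod(sizes):
        raise IndexError("observation position out of range")
    indices = []
    # Peel off the last (fastest-varying) dimension first.
    for size in reversed(sizes):
        pos, i = divmod(pos, size)
        indices.append(i)
    return indices[::-1]


# Reproduce the 3-variable ordering listed above.
codes = [["v1c1", "v1c2"], ["v2c1", "v2c2"], ["v3c1", "v3c2"]]
sizes = [len(c) for c in codes]
for pos in range(prod(sizes)):
    print(",".join(c[i] for c, i in zip(codes, unflatten(sizes, pos))))
```

For the example response below, with size [2, 7], flat_index([2, 7], [1, 4]) is 11: the observation for the second city and the fifth siblings category.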
Example
Request
This example query requests a breakdown of data in an example microdata dataset for the variables city and siblings, filtered to only those cities in England.
/v10/query-json-stat/Example?v=city&v=siblings&f=country,E
Response
{
"version": "2.0",
"class": "dataset",
"source": "Example microdata dataset for validation",
"updated": "2020-12-15T11:16:02Z",
"id": [
"city",
"siblings"
],
"size": [
2,
7
],
"dimension": {
"city": {
"label": "City",
"category": {
"index": [
"0",
"1"
],
"label": {
"0": "London",
"1": "Liverpool"
}
}
},
"siblings": {
"label": "Number of siblings",
"category": {
"index": [
"0",
"1",
"2",
"3",
"4",
"5",
"6"
],
"label": {
"0": "No siblings",
"1": "1 sibling",
"2": "2 siblings",
"3": "3 siblings",
"4": "4 siblings",
"5": "5 siblings",
"6": "6 or more siblings"
}
}
}
},
"extension": {
"cantabular": {
"dataset": {
"name": "Example",
"digest": <hexadecimal string>
},
"blocked": null
}
},
"value": [
1,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0
]
}
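To turn a response like the one above into labelled rows (for example, for a CSV download), a client can pair each observation with the labels of its categories. This is a minimal Python sketch that assumes the response has already been parsed into a dict (for instance with json.loads); to_rows is a hypothetical helper name. The usage below uses a trimmed copy of the example response that omits the extension field, whose digest placeholder is not valid JSON.

```python
from itertools import product
from math import prod


def to_rows(response):
    """Return one (label, ..., observation) tuple per entry in `value`.

    Hypothetical helper; `response` is an already-parsed JSON-stat dict.
    """
    dims = [response["dimension"][name] for name in response["id"]]
    sizes = response["size"]
    values = response["value"]
    # Consistency checks implied by the id, size and value descriptions.
    if [len(d["category"]["index"]) for d in dims] != sizes:
        raise ValueError("size does not match the category indexes")
    if prod(sizes) != len(values):
        raise ValueError("value length does not match size")
    # Category labels for each dimension, in category.index order.
    labels = [
        [d["category"]["label"][code] for code in d["category"]["index"]]
        for d in dims
    ]
    # product() varies its last argument fastest, matching row-major order.
    return [(*combo, obs) for combo, obs in zip(product(*labels), values)]


example = {
    "id": ["city", "siblings"],
    "size": [2, 7],
    "dimension": {
        "city": {"label": "City", "category": {
            "index": ["0", "1"],
            "label": {"0": "London", "1": "Liverpool"}}},
        "siblings": {"label": "Number of siblings", "category": {
            "index": ["0", "1", "2", "3", "4", "5", "6"],
            "label": {"0": "No siblings", "1": "1 sibling",
                      "2": "2 siblings", "3": "3 siblings",
                      "4": "4 siblings", "5": "5 siblings",
                      "6": "6 or more siblings"}}},
    },
    "value": [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
}

for row in to_rows(example):
    if row[-1]:
        print(row)
# → ('London', 'No siblings', 1)
# → ('London', '3 siblings', 1)
# → ('Liverpool', '4 siblings', 1)
```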
Configuration
Access to the JSON-stat endpoint is disabled by default. It is enabled by setting the CANTABULAR_API_EXT_JSONSTAT_ENABLED environment variable to TRUE.
Clients
There are existing libraries written in a variety of languages to interpret JSON-stat output, including Python, R, JavaScript and Java.
Working with blocked categories
Because of limitations in how existing Python clients handle null values, rule variable categories blocked by disclosure rules are currently excluded from the main content of a JSON-stat response and are instead included in a Cantabular-specific JSON-stat extension, as documented above.
To detect these, client scripts need to check whether the blocked property of the cantabular extension is something other than null. Its contents can then be used to determine exactly which categories were blocked.
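A client-side check along these lines might look like the following minimal Python sketch. It assumes the response has already been parsed into a dict; blocked_categories is a hypothetical helper name. Because JSON object keys are strings, the category codes from the blocked example above appear here as "0" and "1".

```python
def blocked_categories(response):
    """Map each blocked rule variable to {code: label} for its blocked categories.

    Returns an empty dict when the cantabular extension reports blocked as null.
    """
    blocked = response.get("extension", {}).get("cantabular", {}).get("blocked")
    if blocked is None:
        return {}
    result = {}
    for variable, dimension in blocked.items():
        category = dimension["category"]
        # Keep the category.index order when listing blocked codes.
        result[variable] = {code: category["label"][code] for code in category["index"]}
    return result


response = {"extension": {"cantabular": {"blocked": {"city": {
    "label": "City",
    "category": {"index": ["0", "1"],
                 "label": {"0": "London", "1": "Liverpool"}}}}}}}
print(blocked_categories(response))
# → {'city': {'0': 'London', '1': 'Liverpool'}}
```

Returning an empty dict for the null case lets callers test the result with a plain truthiness check.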