Loading metadata
Metadata is loaded into the service by setting the following environment variables, each of which names one or more files:
CANTABULAR_METADATA_GRAPHQL_TYPES_FILE=<types-file.gql>
CANTABULAR_METADATA_SERVICE_FILE=<json-service-file>
CANTABULAR_METADATA_DATASET_FILES=<json-dataset-file>
CANTABULAR_METADATA_TABLE_FILES=<json-table-file>
These files are:
types-file.gql
A GraphQL types file defining the schema of the metadata to be loaded.
json-service-file
A service JSON file containing custom metadata which applies to the entire service.
json-dataset-file
One or more dataset JSON files separated by spaces.
json-table-file
Optional: One or more service table JSON files separated by spaces.
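As an illustration of the settings above, the following Python sketch assembles the four environment variables into a dictionary. The function name build_metadata_env is hypothetical, and rejecting paths that contain spaces is a choice made for this sketch (such paths would be ambiguous in a space-separated list), not documented behaviour of the service.

```python
def build_metadata_env(types_file, service_file, dataset_files, table_files=()):
    """Return the metadata environment variables as a dict (illustrative helper)."""
    if not dataset_files:
        raise ValueError("at least one dataset JSON file is required")
    for path in [*dataset_files, *table_files]:
        # Assumption of this sketch: a path with a space cannot be expressed
        # unambiguously in a space-separated list, so it is rejected here.
        if " " in path:
            raise ValueError(f"path contains a space: {path!r}")
    env = {
        "CANTABULAR_METADATA_GRAPHQL_TYPES_FILE": types_file,
        "CANTABULAR_METADATA_SERVICE_FILE": service_file,
        # Multiple dataset files are separated by spaces.
        "CANTABULAR_METADATA_DATASET_FILES": " ".join(dataset_files),
    }
    # Table files are optional, so the variable is only set when some are given.
    if table_files:
        env["CANTABULAR_METADATA_TABLE_FILES"] = " ".join(table_files)
    return env


example_env = build_metadata_env(
    "example/metadata/metadata.graphql",
    "example/metadata/serviceMetadata.json",
    ["example/metadata/datasetMetadata.json"],
    ["example/metadata/tableMetadata.json"],
)
```

The resulting dictionary could be merged into os.environ or passed as the env argument of subprocess.run when starting the service.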
Structure of service JSON file
Each service JSON file must have the following structure:
[
{
"lang": "<language tag>",
"meta": {
...
}
},
...
]
The lang key specifies the BCP 47 language tag of the language for which the corresponding object supplies metadata.
The meta key specifies a single object whose structure must match that of the GraphQL type ServiceMetadata.
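The structure described above can be checked before loading with a few lines of Python. This is a minimal sketch under stated assumptions: check_service_metadata is a hypothetical helper, it only confirms the outer shape (an array of objects with lang and meta), and it does not validate the language tag against BCP 47 or the meta object against ServiceMetadata.

```python
import json


def check_service_metadata(text):
    """Return a list of structural problems found in a service JSON document."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        return ["top level must be a JSON array"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: must be an object")
            continue
        lang = entry.get("lang")
        if not isinstance(lang, str) or not lang:
            problems.append(f"entry {i}: missing or empty 'lang'")
        if not isinstance(entry.get("meta"), dict):
            problems.append(f"entry {i}: 'meta' must be an object")
    return problems
```

An empty list means the outer structure is as expected; the service itself still performs the full validation when the files are loaded.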
Structure of dataset JSON file
Each dataset JSON file must have the following structure:
[
{
"name": "<dataset name>",
"label": "<dataset label>",
"description": "<dataset description>",
"lang": "<language tag>",
"meta": {
...
},
"vars": [
{
"name": "<variable name>",
"label": "<variable label>",
"description": "<variable description>",
"catLabels": {
"<category code>": "<category label>",
...
},
"meta": {
...
}
},
...
],
"incl": [
{
"name": "<dataset name>",
"lang": "<language tag>"
},
...
]
},
...
]
The combined values of name and lang for a supplied dataset must be unique within the loaded JSON.
The lang field on a dataset is a required field and must also be a valid BCP 47 language tag.
The label and description fields for both a dataset and a variable are optional.
The user can decide whether to place metadata for each dataset in a separate JSON file or whether to group metadata for several datasets into a single JSON file. This allows for more flexibility in specification and hence for different approaches to generating the metadata.
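Because datasets may be spread over several files, the uniqueness rule for name and lang applies across all of them. The sketch below shows one way to look for clashes before loading. The function name is hypothetical, and exact string comparison of name and lang is an assumption, since the text does not say whether the service compares them case-sensitively.

```python
from collections import Counter


def duplicate_dataset_keys(documents):
    """Return the (name, lang) pairs that occur more than once.

    `documents` is an iterable of parsed dataset JSON files, each a list of
    dataset objects.
    """
    counts = Counter(
        (dataset["name"], dataset["lang"])
        for document in documents
        for dataset in document
    )
    return sorted(key for key, count in counts.items() if count > 1)
```

Each file would be parsed with json.load and the resulting lists passed in together; an empty result means no pair is repeated.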
The structure of the meta objects within a dataset JSON file must match that of the appropriate GraphQL type of DatasetMetadata or VariableMetadata.
incl is an optional element that allows metadata from one or more datasets to be reused in another in order to avoid duplication. Metadata from included datasets will be merged into the including dataset at three levels:
Dataset and variable fields: if a dataset or variable has no value for its label or description field, it will inherit its value, when defined, from any included dataset.
Dataset and variable metadata fields: if a field of the dataset metadata or variable metadata is missing when compared with an included dataset then that field will be copied from the included dataset. However, if only a sub-field of the dataset or variable metadata is missing then that will not be copied from the included dataset.
Variable category labels: if a key in the LabelsMap for a variable is missing when compared to an included dataset’s identically named variable, then that key and its corresponding value will be copied from the included dataset.
If incl is specified then meta and/or vars may be omitted.
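The merge rules above can be sketched in Python as follows. This is an illustration of the described behaviour, not the service's implementation. The names merge_included and _fill are hypothetical, and "no value" is taken to mean that the key is absent. The sketch merges a single included dataset and only touches variables already present in the including dataset; the order of precedence between several included datasets is not covered by the text and is not modelled here.

```python
import copy


def _fill(target, source):
    """Fill label, description and top-level meta fields missing from target."""
    for field in ("label", "description"):
        if field not in target and field in source:
            target[field] = source[field]
    # Only whole top-level meta fields are copied; a field that exists in the
    # target is left alone even if it lacks sub-fields present in the source.
    for key, value in source.get("meta", {}).items():
        target.setdefault("meta", {}).setdefault(key, copy.deepcopy(value))


def merge_included(dataset, included):
    """Return a copy of `dataset` with gaps filled from one included dataset."""
    merged = copy.deepcopy(dataset)
    _fill(merged, included)
    included_vars = {var["name"]: var for var in included.get("vars", [])}
    for var in merged.get("vars", []):
        source = included_vars.get(var["name"])
        if source is None:
            continue
        _fill(var, source)
        # Category labels: copy only the codes the including variable lacks.
        for code, label in source.get("catLabels", {}).items():
            var.setdefault("catLabels", {}).setdefault(code, label)
    return merged
```

Under these assumptions, a dataset that defines only meta.methodology.statement keeps its methodology object unchanged while still gaining a missing units field from the included dataset.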
Structure of service table JSON file
Each service table JSON file must have the following structure:
[
{
"name": "<table name>",
"datasetName": "<dataset name>",
"vars": ["<var name>", ...],
"ref": [
{
"lang": "<language tag>",
"label": "<table label>",
"description": "<table description>",
"meta": {
...
},
"incl": [
{
"name": "<table name>",
"lang": "<language tag>"
},
...
]
},
...
]
},
...
]
The combined values of name and lang for a supplied table must be unique within the loaded JSON.
The lang field on a table ref is a required field and must also be a valid BCP 47 language tag.
The user can decide whether to place metadata for each table in a separate JSON file or whether to group metadata for several tables into a single JSON file. This allows for more flexibility in specification and hence for different approaches to generating the metadata.
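For tables, lang is held on each ref entry rather than on the table itself, so the name and lang pair has to be assembled from both levels. The following sketch, with hypothetical function names and exact string comparison as an assumption, looks for repeated pairs across several parsed table files.

```python
def table_keys(tables):
    """Yield (table name, language tag) for every ref entry of every table."""
    for table in tables:
        for ref in table.get("ref", []):
            yield table["name"], ref["lang"]


def duplicate_table_keys(documents):
    """Return the (name, lang) pairs seen more than once across all files."""
    seen, duplicates = set(), set()
    for tables in documents:
        for key in table_keys(tables):
            if key in seen:
                duplicates.add(key)
            else:
                seen.add(key)
    return sorted(duplicates)
```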
The structure of the meta objects within a table JSON file must match that of the appropriate GraphQL type of TableMetadata.
incl is an optional element that allows metadata from one or more tables to be reused in another in order to avoid duplication. Metadata from included tables will be merged into the including table at two levels:
Table fields: if a table has no value for its label or description field, it will inherit its value, when defined, from any included table.
Table metadata fields: if a field of the table metadata is missing when compared with an included table then that field will be copied from the included table metadata. However, if only a sub-field of the metadata is missing then that will not be copied from the included table.
Validating metadata
Loaded metadata files are automatically validated to ensure the files are valid GraphQL and JSON and that the metadata in the JSON files is consistent with the GraphQL schema definition.
The files can optionally also be validated against datasets loaded into a running instance of cantabular-server by executing the command shown below:
./cantabular-metadata -check-complete <cantabular-server-url> \
<types-file.gql> <json-service-file> <json-dataset-file>...
This command performs a case-insensitive check that every variable in each dataset loaded into cantabular-server has a metadata entry associated with it. cantabular-metadata will exit when the check is complete.
The command also checks that all tables loaded by cantabular-metadata are for datasets available in cantabular-server and that the variables within each table exist within the specified dataset.
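The two checks described above can be pictured with the Python sketch below. It only illustrates the logic and does not reproduce how cantabular-metadata is implemented. The function names are hypothetical, the inputs are plain lists and dictionaries rather than data fetched from cantabular-server, and exact (case-sensitive) matching in the table check is an assumption, since only the variable check is described as case-insensitive.

```python
def variables_missing_metadata(server_vars, metadata_vars):
    """Return server variable names with no metadata entry, ignoring case."""
    known = {name.casefold() for name in metadata_vars}
    return [name for name in server_vars if name.casefold() not in known]


def check_table(table, server_datasets):
    """Return problems for one table entry.

    `server_datasets` maps each dataset name to its variable names.
    """
    dataset = table["datasetName"]
    if dataset not in server_datasets:
        return [f"table {table['name']}: unknown dataset {dataset}"]
    available = set(server_datasets[dataset])
    return [
        f"table {table['name']}: unknown variable {var}"
        for var in table["vars"]
        if var not in available
    ]
```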
If errors are found in the metadata provided, details of each error are logged to stdout by cantabular-metadata up to a maximum number which can be configured using the environment variable CANTABULAR_METADATA_CHECK_MAX_ERRORS.
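When the check is run from a script, the command line and environment can be assembled as in the sketch below. The helper name check_complete_command and the server URL in the usage note are placeholders. Passing the error limit as a decimal integer string is an assumption about the expected format of CANTABULAR_METADATA_CHECK_MAX_ERRORS. The argument list mirrors the command shown above and adds nothing to it.

```python
import os


def check_complete_command(server_url, types_file, service_file,
                           dataset_files, max_errors=None):
    """Return (argv, env) for running the -check-complete validation."""
    argv = [
        "./cantabular-metadata",
        "-check-complete",
        server_url,
        types_file,
        service_file,
        *dataset_files,
    ]
    # Start from the current environment so PATH and similar settings survive.
    env = dict(os.environ)
    if max_errors is not None:
        # Assumption: the limit is given as a decimal integer string.
        env["CANTABULAR_METADATA_CHECK_MAX_ERRORS"] = str(max_errors)
    return argv, env
```

The returned pair could then be passed to subprocess.run(argv, env=env), for example with a placeholder URL such as http://localhost:8080 for a locally running cantabular-server.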
Example input
The Cantabular software package includes a very simple example dataset which also comes with some example files, shown below, for loading into cantabular-metadata for testing and validation purposes.
example/metadata/metadata.graphql
type ServiceMetadata {
contact: Contact!
license: String!
copyright: String!
}
type Contact {
name: String!
email: String!
phone: String
website: String
}
type DatasetMetadata {
units: String
methodology: Methodology
sdc: Methodology
}
type Methodology {
statement: String!
link: String
}
type VariableMetadata {
url: String
}
type TableMetadata {
keywords: [String!]
}
The example above shows a number of interesting features of using GraphQL for this purpose:
User-specified types such as Contact which allow modelling of arbitrary concepts relevant to a specific use case.
Use of a list or array to allow for the definition of multiple keywords.
Use of the ! operator to denote particular fields as required or non-nullable.
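To make the effect of ! concrete, the sketch below extracts the required fields of each type from a schema excerpt and reports which of them a JSON object lacks. It is a deliberately small regular-expression reader, not a GraphQL parser: it assumes simple "type Name { field: Type }" blocks with no arguments, comments or directives, and the helper names are hypothetical.

```python
import re

SCHEMA = """
type Contact {
  name: String!
  email: String!
  phone: String
  website: String
}
type TableMetadata {
  keywords: [String!]
}
"""

TYPE_RE = re.compile(r"type\s+(\w+)\s*\{([^}]*)\}")
FIELD_RE = re.compile(r"(\w+)\s*:\s*(\S+)")


def required_fields(schema):
    """Map each type name to the set of fields whose type ends with '!'."""
    return {
        type_name: {
            field
            for field, gql_type in FIELD_RE.findall(body)
            if gql_type.endswith("!")
        }
        for type_name, body in TYPE_RE.findall(schema)
    }


def missing_required(obj, type_name, required):
    """Return the required fields of `type_name` that `obj` does not supply."""
    return sorted(required[type_name] - set(obj))


REQUIRED = required_fields(SCHEMA)
```

In this excerpt, name and email are required on Contact, whereas keywords on TableMetadata is not: the ! inside the brackets applies to the list items, so the list itself may be omitted.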
The corresponding service JSON file for the example dataset is shown below.
example/metadata/serviceMetadata.json
[
{
"lang": "en",
"meta": {
"contact": {
"name": "The Sensible Code Company",
"email": "hello@sensiblecode.io",
"website": "https://sensiblecode.io"
},
"copyright": "All data is provided under an open licence.",
"license": "https://creativecommons.org/share-your-work/public-domain/cc0"
}
},
{
"lang": "cy",
"meta": {
"contact": {
"name": "The Sensible Code Company",
"email": "hello@sensiblecode.io",
"website": "https://sensiblecode.io"
},
"copyright": "Darperir yr holl ddata o dan drwydded agored.",
"license": "https://creativecommons.org/share-your-work/public-domain/cc0"
}
}
]
This uses the contact field defined in the example schema. The phone field within the contact field is not specified in this example service metadata as it is not a required field.
If more than one contact needed to be specified for a service, this could be achieved by modifying the schema to define a contacts field as a list or array of contacts and then providing a list of contact details in the service JSON file.
The dataset JSON file for the example dataset is shown below. It includes metadata for the dataset in both English and Welsh. The Welsh translations were created using an online translation service and may be inaccurate.
example/metadata/datasetMetadata.json
[
{
"name": "Example",
"lang": "en",
"description": "A small example dataset used only for validation of a Cantabular installation.",
"obsTypes": [
{
"name": "COUNT",
"label": "Count",
"description": "This describes the count"
}
],
"meta": {
"units": "people",
"methodology": {
"statement": "This dataset has been compiled manually and is intended for use only to validate a Cantabular installation."
}
},
"vars": [
{
"name": "city",
"description": "The city in which a person lives.",
"meta": {
"url": "https://www.example.com"
}
},
{
"name": "country",
"description": "The country - either England or Northern Ireland - in which a person lives, derived from the city they live in.",
"meta": {
"url": "https://www.example.com"
}
},
{
"name": "sex",
"description": "The classification of a person as either male or female.",
"meta": {
"url": "https://www.example.com"
}
},
{
"name": "siblings",
"description": "The number of brothers or sisters, including half-brothers and half-sisters, that a person has.",
"meta": {
"url": "https://www.example.com"
}
},
{
"name": "siblings_3",
"description": "The number of brothers or sisters, including half-brothers and half-sisters, that a person has, grouped into three categories.",
"meta": {
"url": "https://www.example.com"
}
}
]
},
{
"name": "Example",
"lang": "cy",
"label": "Set ddata enghreifftiol i'w dilysu",
"incl": [{"name": "Example", "lang":"en"}],
"description": "Set ddata enghreifftiol fach a ddefnyddir ar gyfer dilysu gosodiad Cantabular yn unig.",
"obsTypes": [
{
"name": "COUNT",
"label": "Cyfri",
"description": "Mae hyn yn disgrifio'r cyfrif"
}
],
"meta": {
"units": "pobl",
"methodology": {
"statement": "Mae'r set ddata hon wedi'i llunio â llaw ac fe'i bwriedir i'w defnyddio i ddilysu gosodiad Cantabular yn unig."
}
},
"vars": [
{
"name": "city",
"label": "Dinas",
"catLabels": {
"0": "Llundain",
"1": "Lerpwl",
"2": "Felffast"
},
"description": "Y ddinas y mae person yn byw ynddi.",
"meta": {
"url": "https://www.example.com/cy"
}
},
{
"name": "country",
"label": "Gwlad",
"catLabels": {
"E": "Lloegr",
"N": "Gogledd Iwerddon"
},
"description": "Mae'r wlad - naill ai Lloegr neu Ogledd Iwerddon - y mae person yn byw ynddi, yn deillio o'r ddinas y mae'n byw ynddi.",
"meta": {
"url": "https://www.example.com/cy"
}
},
{
"name": "sex",
"label": "Rhyw",
"catLabels": {
"0": "Gwryw",
"1": "Benyw"
},
"description": "Dosbarthiad person naill ai fel gwryw neu fenyw.",
"meta": {
"url": "https://www.example.com/cy"
}
},
{
"name": "siblings",
"label": "Nifer o frodyr a chwiorydd",
"catLabels": {
"0": "Dim brodyr a chwiorydd",
"1": "1 brawd neu chwaer",
"2": "2 brodyr a chwiorydd",
"3": "3 brodyr a chwiorydd",
"4": "4 brodyr a chwiorydd",
"5": "5 brodyr a chwiorydd",
"6": "6 neu fwy o frodyr a chwiorydd"
},
"description": "Nifer y brodyr neu chwiorydd, gan gynnwys hanner brodyr a hanner chwiorydd, sydd gan berson.",
"meta": {
"url": "https://www.example.com/cy"
}
},
{
"name": "siblings_3",
"label": "Nifer y brodyr a chwiorydd (3 chategori)",
"catLabels": {
"0": "Dim brodyr a chwiorydd",
"1-2": "1 neu 2 o frodyr a chwiorydd",
"3+": "3 neu fwy o frodyr a chwiorydd"
},
"description": "Nifer y brodyr neu chwiorydd, gan gynnwys hanner brodyr a hanner chwiorydd, sydd gan berson, wedi'u grwpio i dri chategori.",
"meta": {
"url": "https://www.example.com/cy"
}
}
]
},
{
"name": "Example-Tabular",
"lang": "en",
"description": "A small example tabular dataset containing two pre-computed tables used only for validation of a Cantabular installation.",
"incl": [{"name": "Example", "lang": "en"}],
"vars": [
{
"name": "health",
"description": "A self-assessment of a person's general state of health. This assessment is not based on a person's health over any specified period of time.",
"meta": {
"url": "https://www.example.com"
}
}
]
},
{
"name": "Example-Tabular",
"lang": "cy",
"label": "Set ddata tablau enghreifftiol i'w dilysu",
"description": "Set ddata tablau enghreifftiol fach yn cynnwys dau dabl a adeiladwyd ymlaen llaw a ddefnyddir i ddilysu gosodiad Cantabular yn unig.",
"incl": [{"name": "Example", "lang": "cy"}],
"vars": [
{
"name": "health",
"description": "Hunanasesiad o gyflwr iechyd cyffredinol person. Nid yw'r asesiad hwn yn seiliedig ar iechyd person dros unrhyw gyfnod penodol o amser.",
"meta": {
"url": "https://www.example.com/cy"
}
}
]
}
]
The JSON file for the service tables defined for the example dataset is shown below. It includes metadata in both English and Welsh. The Welsh translations were created using an online translation service and may be inaccurate.
example/metadata/tableMetadata.json
[
{
"name": "sex-siblings-city",
"datasetName": "Example",
"vars": [
"city",
"sex",
"siblings_3"
],
"ref": [
{
"lang": "en",
"label": "Sex by siblings for all cities",
"description": "Table of cities with sex and siblings.",
"meta": {
"keywords": [ "Siblings", "Cities" ]
}
},
{
"lang": "cy",
"label": "Rhyw gan frodyr a chwiorydd ar gyfer pob dinas",
"description": "Tabl o ddinasoedd gyda rhyw a brodyr a chwiorydd.",
"meta": {
"keywords": [ "Brodyr a Chwiorydd", "Dinasoedd" ]
}
}
]
},
{
"name": "city-sex-health",
"datasetName": "Example-Tabular",
"vars": [
"city",
"sex",
"health"
],
"ref": [
{
"lang": "en",
"label": "Sex by health for all cities",
"description": "Table of cities with sex and health variables.",
"meta": {
"keywords": [ "Health", "Cities" ]
}
},
{
"lang": "cy",
"label": "Rhyw yn ôl iechyd i bob dinas",
"description": "Tabl o ddinasoedd gyda newidynnau rhyw ac iechyd.",
"meta": {
"keywords": [ "Iechyd", "Dinasoedd" ]
}
}
]
}
]