Loading metadata

Metadata is loaded into the service by naming the following files in environment variables:

CANTABULAR_METADATA_GRAPHQL_TYPES_FILE=<types-file.gql>
CANTABULAR_METADATA_SERVICE_FILE=<json-service-file>
CANTABULAR_METADATA_DATASET_FILES=<json-dataset-file>
CANTABULAR_METADATA_TABLE_FILES=<json-table-file>

These files are:

types-file.gql

A GraphQL types file defining the schema of the metadata to be loaded.

json-service-file

A service JSON file containing custom metadata which applies to the entire service.

json-dataset-file

One or more dataset JSON files separated by spaces.

json-table-file

Optional: One or more service table JSON files separated by spaces.
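As an illustration of how these variables fit together, the following Python sketch assembles them into a dictionary. The helper build_metadata_env and the file names census.json and survey.json are hypothetical and are not part of Cantabular; only the variable names and the space-separated format come from the description above.

```python
def build_metadata_env(types_file, service_file, dataset_files, table_files=()):
    """Return the metadata environment variables as a dict.

    Multiple dataset or table files are joined with single spaces.
    """
    if not dataset_files:
        raise ValueError("at least one dataset JSON file is required")
    env = {
        "CANTABULAR_METADATA_GRAPHQL_TYPES_FILE": types_file,
        "CANTABULAR_METADATA_SERVICE_FILE": service_file,
        "CANTABULAR_METADATA_DATASET_FILES": " ".join(dataset_files),
    }
    if table_files:  # table files are optional
        env["CANTABULAR_METADATA_TABLE_FILES"] = " ".join(table_files)
    return env


env = build_metadata_env(
    "metadata.graphql",
    "serviceMetadata.json",
    ["census.json", "survey.json"],  # hypothetical file names
    ["tableMetadata.json"],
)
```

A dictionary like this could be merged into os.environ before starting the service from a script.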

Structure of service JSON file

Each service JSON file must have the following structure:

[
  {
    "lang": "<language tag>",
    "meta": {
      ...
    }
  },
  ...
]

The lang key specifies the BCP 47 language tag of the language for which the corresponding object supplies metadata.

The meta key specifies a single object whose structure must match that of the GraphQL type ServiceMetadata.
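A rough pre-flight check of this outer structure can be sketched in Python. The function check_service_entries is a hypothetical helper, not part of Cantabular. It checks only the shape of each entry and does not validate the language tag against BCP 47 or the meta object against ServiceMetadata.

```python
import json


def check_service_entries(text):
    """Return a list of structural problems found in a service JSON document."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        return ["top-level value must be an array"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: must be an object")
            continue
        lang = entry.get("lang")
        if not isinstance(lang, str) or not lang:
            problems.append(f"entry {i}: 'lang' must be a non-empty string")
        if not isinstance(entry.get("meta"), dict):
            problems.append(f"entry {i}: 'meta' must be an object")
    return problems


problems = check_service_entries('[{"lang": "en", "meta": {}}, {"meta": []}]')
```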

Structure of dataset JSON file

Each dataset JSON file must have the following structure:

[
  {
    "name": "<dataset name>",
    "label": "<dataset label>",
    "description": "<dataset description>",
    "lang": "<language tag>",
    "meta": {
      ...
    },
    "vars": [
      {
        "name": "<variable name>",
        "label": "<variable label>",
        "description": "<variable description>",
        "catLabels": {
          "<category code>": "<category label>",
          ...
        },
        "meta": {
          ...
        }
      },
      ...
    ],
    "incl": [
      {
        "name": "<dataset name>",
        "lang": "<language tag>"
      },
      ...
    ]
  },
  ...
]

The combined values of name and lang for a supplied dataset must be unique within the loaded JSON.

The lang field on a dataset is required and must be a valid BCP 47 language tag.
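Because the uniqueness rule applies across everything that is loaded, duplicates can arise when several files are combined. The Python sketch below shows one way to spot them before loading. It assumes the dataset objects from all files have already been parsed into a single list; duplicate_dataset_keys is a hypothetical helper name.

```python
from collections import Counter


def duplicate_dataset_keys(datasets):
    """Return the (name, lang) pairs that occur more than once."""
    counts = Counter((d["name"], d["lang"]) for d in datasets)
    return sorted(key for key, n in counts.items() if n > 1)


datasets = [
    {"name": "Example", "lang": "en"},
    {"name": "Example", "lang": "cy"},
    {"name": "Example", "lang": "en"},  # clashes with the first entry
]
duplicates = duplicate_dataset_keys(datasets)
```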

The label and description fields for both a dataset and a variable are optional.

Metadata for each dataset may be placed in a separate JSON file, or metadata for several datasets may be grouped in a single JSON file. This flexibility accommodates different approaches to generating the metadata.

The structure of each meta object within a dataset JSON file must match the corresponding GraphQL type, DatasetMetadata or VariableMetadata.

incl is an optional element that allows metadata from one or more datasets to be reused in another in order to avoid duplication. Metadata from included datasets will be merged into the including dataset at a few different levels:

  • Dataset and variable fields: if a dataset or variable has no value for its label or description field, it will inherit its value, when defined, from any included dataset.

  • Dataset and variable metadata fields: if a field of the dataset metadata or variable metadata is missing when compared with an included dataset then that field will be copied from the included dataset. However, if only a sub-field of the dataset or variable metadata is missing then that will not be copied from the included dataset.

  • Variable category labels: if a key in the LabelsMap for a variable is missing when compared to an included dataset’s identically named variable, then that key and its corresponding value will be copied from the included dataset.

If incl is specified then meta and/or vars may be omitted.
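The three merge rules above can be illustrated with a short Python sketch. This is a reading of the rules as described, not the actual cantabular-metadata implementation, and the function names are hypothetical. It merges a single included dataset and leaves aside questions the text does not settle, such as precedence between several included datasets.

```python
import copy


def _inherit(target, source):
    """Apply the field-level rules to one dataset or variable object."""
    # Rule 1: label and description are inherited only when they have no value.
    for field in ("label", "description"):
        if target.get(field) is None and source.get(field) is not None:
            target[field] = source[field]
    # Rule 2: a top-level meta field is copied when it is missing. A field that
    # already exists is left untouched, so missing sub-fields are not filled in.
    if "meta" in source:
        meta = target.setdefault("meta", {})
        for key, value in source["meta"].items():
            if key not in meta:
                meta[key] = copy.deepcopy(value)


def merge_included(dataset, included):
    """Return a copy of `dataset` with metadata merged in from `included`."""
    merged = copy.deepcopy(dataset)
    _inherit(merged, included)
    included_vars = {v["name"]: v for v in included.get("vars", [])}
    for var in merged.get("vars", []):
        source = included_vars.get(var["name"])
        if source is None:
            continue
        _inherit(var, source)
        # Rule 3: category labels are merged key by key.
        if "catLabels" in source:
            labels = var.setdefault("catLabels", {})
            for code, label in source["catLabels"].items():
                labels.setdefault(code, label)
    return merged


en = {
    "name": "Example", "lang": "en",
    "meta": {"units": "people", "methodology": {"statement": "S", "link": "L"}},
    "vars": [{"name": "city", "label": "City",
              "catLabels": {"0": "London", "1": "Liverpool"}}],
}
cy = {
    "name": "Example", "lang": "cy",
    "meta": {"methodology": {"statement": "C"}},
    "vars": [{"name": "city", "catLabels": {"0": "Llundain"}}],
}
merged = merge_included(cy, en)
```

In this example the missing units field is copied, but the link sub-field of methodology is not, because methodology itself is already present in the including dataset.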

Structure of service table JSON file

Each service table JSON file must have the following structure:

[
  {
    "name": "<table name>",
    "datasetName": "<dataset name>",
    "vars": ["<var name>", ...],
    "ref": [
      {
        "lang": "<language tag>",
        "label": "<table label>",
        "description": "<table description>",
        "meta": {
          ...
        },
        "incl": [
          {
            "name": "<table name>",
            "lang": "<language tag>"
          },
          ...
        ]
      },
      ...
    ]
  },
  ...
]

The combined values of name and lang for a supplied table must be unique within the loaded JSON.

The lang field on a table ref is required and must be a valid BCP 47 language tag.
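For tables the language tag sits inside each ref entry rather than on the table itself, so the name and lang pair has to be assembled from two levels. A minimal Python sketch of that check follows; it assumes the table objects have already been parsed into a list, and the helper names are hypothetical.

```python
def table_keys(tables):
    """Yield a (name, lang) pair for every ref of every table."""
    for table in tables:
        for ref in table.get("ref", []):
            yield table["name"], ref["lang"]


def duplicate_table_keys(tables):
    """Return the (name, lang) pairs that occur more than once."""
    seen, dupes = set(), set()
    for key in table_keys(tables):
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return sorted(dupes)


tables = [
    {"name": "sex-siblings-city", "ref": [{"lang": "en"}, {"lang": "cy"}]},
    {"name": "sex-siblings-city", "ref": [{"lang": "cy"}]},  # repeats the cy ref
]
duplicates = duplicate_table_keys(tables)
```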

Metadata for each table may be placed in a separate JSON file, or metadata for several tables may be grouped in a single JSON file. This flexibility accommodates different approaches to generating the metadata.

The structure of each meta object within a table JSON file must match the GraphQL type TableMetadata.

incl is an optional element that allows metadata from one or more tables to be reused in another in order to avoid duplication. Metadata from included tables will be merged into the including table at a few different levels:

  • Table fields: if a table has no value for its label or description field, it will inherit its value, when defined, from any included table.

  • Table metadata fields: if a field of the table metadata is missing when compared with an included table then that field will be copied from the included table metadata. However, if only a sub-field of the metadata is missing then that will not be copied from the included table.

Validating metadata

Loaded metadata files are automatically validated to ensure the files are valid GraphQL and JSON and that the metadata in the JSON files is consistent with the GraphQL schema definition.

The files can optionally also be validated against datasets loaded into a running instance of cantabular-server by executing the command shown below:

./cantabular-metadata -check-complete <cantabular-server-url> \
  <types-file.gql> <json-service-file> <json-dataset-file>...

This command performs a case-insensitive check that every variable in each dataset loaded into cantabular-server has a metadata entry associated with it. cantabular-metadata will exit when the check is complete.

The command also checks that all tables loaded by cantabular-metadata are for datasets available in cantabular-server and that the variables within each table exist within the specified dataset.
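The two checks can be pictured with the following Python sketch. It only illustrates the logic described above and is not how cantabular-metadata is implemented. The function names and input shapes are hypothetical, and comparing table variable names exactly (rather than case-insensitively) is an assumption, since the text states case-insensitivity only for the first check.

```python
def missing_variable_metadata(server_vars, metadata_vars):
    """Return server variables with no metadata entry, ignoring case."""
    known = {name.casefold() for name in metadata_vars}
    return [name for name in server_vars if name.casefold() not in known]


def table_problems(table, server_datasets):
    """Check one table against a mapping of dataset name to variable names."""
    dataset_vars = server_datasets.get(table["datasetName"])
    if dataset_vars is None:
        return [f"table {table['name']}: unknown dataset {table['datasetName']}"]
    # Assumption: variable names are compared exactly here.
    return [
        f"table {table['name']}: unknown variable {var}"
        for var in table["vars"]
        if var not in dataset_vars
    ]


missing = missing_variable_metadata(["City", "sex", "siblings"], ["city", "SEX"])
problems = table_problems(
    {"name": "city-sex-health", "datasetName": "Example", "vars": ["city", "health"]},
    {"Example": ["city", "sex"]},
)
```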

If errors are found in the metadata provided, details of each error are logged to stdout by cantabular-metadata up to a maximum number which can be configured using the environment variable CANTABULAR_METADATA_CHECK_MAX_ERRORS.
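When the check is run from a script, the command line and the error limit can be assembled together. The Python sketch below builds the argument list and environment without running anything. The helper name and the server URL are hypothetical, and passing the limit as a plain integer string is an assumption about the variable's format.

```python
def check_complete_command(server_url, types_file, service_file,
                           dataset_files, max_errors=None):
    """Return (argv, extra_env) for a -check-complete run."""
    argv = [
        "./cantabular-metadata",
        "-check-complete",
        server_url,
        types_file,
        service_file,
        *dataset_files,
    ]
    extra_env = {}
    if max_errors is not None:
        # Assumption: the limit is given as a decimal integer.
        extra_env["CANTABULAR_METADATA_CHECK_MAX_ERRORS"] = str(max_errors)
    return argv, extra_env


argv, extra_env = check_complete_command(
    "http://localhost:8491",  # hypothetical cantabular-server URL
    "metadata.graphql",
    "serviceMetadata.json",
    ["datasetMetadata.json"],
    max_errors=20,
)
```

The result could then be passed to subprocess.run(argv, env={**os.environ, **extra_env}).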

Example input

The Cantabular software package includes a very simple example dataset which also comes with some example files, shown below, for loading into cantabular-metadata for testing and validation purposes.

example/metadata/metadata.graphql

type ServiceMetadata {
  contact: Contact!
  license: String!
  copyright: String!
}

type Contact {
  name: String!
  email: String!
  phone: String
  website: String
}

type DatasetMetadata {
  units: String
  methodology: Methodology
  sdc: Methodology
}

type Methodology {
  statement: String!
  link: String
}

type VariableMetadata {
  url: String
}

type TableMetadata {
  keywords: [String!]
}

The example above shows a number of interesting features of using GraphQL for this purpose:

  • User-specified types such as Contact which allow modelling of arbitrary concepts relevant to a specific use case.

  • Use of a list or array to allow for the definition of multiple keywords.

  • Use of the ! operator to denote particular fields as required or non-nullable.

The corresponding service JSON file for the example dataset is shown below.

example/metadata/serviceMetadata.json

[
  {
    "lang": "en",
    "meta": {
      "contact": {
        "name": "The Sensible Code Company",
        "email": "hello@sensiblecode.io",
        "website": "https://sensiblecode.io"
      },
      "copyright": "All data is provided under an open licence.",
      "license": "https://creativecommons.org/share-your-work/public-domain/cc0"
    }
  },
  {
    "lang": "cy",
    "meta": {
      "contact": {
        "name": "The Sensible Code Company",
        "email": "hello@sensiblecode.io",
        "website": "https://sensiblecode.io"
      },
      "copyright": "Darperir yr holl ddata o dan drwydded agored.",
      "license": "https://creativecommons.org/share-your-work/public-domain/cc0"
    }
  }
]

This uses the contact field defined in the example schema. The phone field within the contact field is not specified in this example service metadata as it is not a required field.
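The effect of the ! operator on this example can be shown with a small Python sketch. The REQUIRED table is transcribed by hand from the example schema rather than parsed from it, and missing_required is a hypothetical helper; the real validation is performed by cantabular-metadata against the GraphQL types file.

```python
# Non-nullable fields, copied by hand from the example schema.
REQUIRED = {
    "ServiceMetadata": {"contact", "license", "copyright"},
    "Contact": {"name", "email"},
}


def missing_required(type_name, value):
    """Return the required fields of `type_name` that `value` does not supply."""
    return sorted(f for f in REQUIRED[type_name] if value.get(f) is None)


meta = {
    "contact": {
        "name": "The Sensible Code Company",
        "email": "hello@sensiblecode.io",
        "website": "https://sensiblecode.io",
    },
    "copyright": "All data is provided under an open licence.",
    "license": "https://creativecommons.org/share-your-work/public-domain/cc0",
}
service_missing = missing_required("ServiceMetadata", meta)
contact_missing = missing_required("Contact", meta["contact"])
```

Both lists are empty for the example file: phone is absent, but it is not declared with !, so nothing is reported for it.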

If it were necessary to specify more than one contact for a service, the schema could be modified to define a contacts field as a list of Contact objects, and the service JSON file would then supply a list of contact details.

The dataset JSON file for the example dataset is shown below. It includes metadata for the dataset in both English and Welsh. The Welsh translations were created using an online translation service and may be inaccurate.

example/metadata/datasetMetadata.json

[
    {
        "name": "Example",
        "lang": "en",
        "description": "A small example dataset used only for validation of a Cantabular installation.",
        "obsTypes": [
            {
                "name": "COUNT",
                "label": "Count",
                "description": "This describes the count"
            }
        ],
        "meta": {
            "units": "people",
            "methodology": {
                "statement": "This dataset has been compiled manually and is intended for use only to validate a Cantabular installation."
            }
        },
        "vars": [
            {
                "name": "city",
                "description": "The city in which a person lives.",
                "meta": {
                    "url": "https://www.example.com"
                }
            },
            {
                "name": "country",
                "description": "The country - either England or Northern Ireland - in which a person lives, derived from the city they live in.",
                "meta": {
                    "url": "https://www.example.com"
                }
            },
            {
                "name": "sex",
                "description": "The classification of a person as either male or female.",
                "meta": {
                    "url": "https://www.example.com"
                }
            },
            {
                "name": "siblings",
                "description": "The number of brothers or sisters, including half-brothers and half-sisters, that a person has.",
                "meta": {
                    "url": "https://www.example.com"
                }
            },
            {
                "name": "siblings_3",
                "description": "The number of brothers or sisters, including half-brothers and half-sisters, that a person has, grouped into three categories.",
                "meta": {
                    "url": "https://www.example.com"
                }
            }
        ]
    },
    {
        "name": "Example",
        "lang": "cy",
        "label": "Set ddata enghreifftiol i'w dilysu",
        "incl": [{"name": "Example", "lang":"en"}],
        "description": "Set ddata enghreifftiol fach a ddefnyddir ar gyfer dilysu gosodiad Cantabular yn unig.",
        "obsTypes": [
            {
                "name": "COUNT",
                "label": "Cyfri",
                "description": "Mae hyn yn disgrifio'r cyfrif"
            }
        ],
        "meta": {
            "units": "pobl",
            "methodology": {
                "statement": "Mae'r set ddata hon wedi'i llunio â llaw ac fe'i bwriedir i'w defnyddio i ddilysu gosodiad Cantabular yn unig."
            }
        },
        "vars": [
            {
                "name": "city",
                "label": "Dinas",
                "catLabels": {
                    "0": "Llundain",
                    "1": "Lerpwl",
                    "2": "Felffast"
                },
                "description": "Y ddinas y mae person yn byw ynddi.",
                "meta": {
                    "url": "https://www.example.com/cy"
                }
            },
            {
                "name": "country",
                "label": "Gwlad",
                "catLabels": {
                    "E": "Lloegr",
                    "N": "Gogledd Iwerddon"
                },
                "description": "Mae'r wlad - naill ai Lloegr neu Ogledd Iwerddon - y mae person yn byw ynddi, yn deillio o'r ddinas y mae'n byw ynddi.",
                "meta": {
                    "url": "https://www.example.com/cy"
                }
            },
            {
                "name": "sex",
                "label": "Rhyw",
                "catLabels": {
                    "0": "Gwryw",
                    "1": "Benyw"
                },
                "description": "Dosbarthiad person naill ai fel gwryw neu fenyw.",
                "meta": {
                    "url": "https://www.example.com/cy"
                }
            },
            {
                "name": "siblings",
                "label": "Nifer o frodyr a chwiorydd",
                "catLabels": {
                    "0": "Dim brodyr a chwiorydd",
                    "1": "1 brawd neu chwaer",
                    "2": "2 brodyr a chwiorydd",
                    "3": "3 brodyr a chwiorydd",
                    "4": "4 brodyr a chwiorydd",
                    "5": "5 brodyr a chwiorydd",
                    "6": "6 neu fwy o frodyr a chwiorydd"
                },
                "description": "Nifer y brodyr neu chwiorydd, gan gynnwys hanner brodyr a hanner chwiorydd, sydd gan berson.",
                "meta": {
                    "url": "https://www.example.com/cy"
                }
            },
            {
                "name": "siblings_3",
                "label": "Nifer y brodyr a chwiorydd (3 chategori)",
                "catLabels": {
                    "0": "Dim brodyr a chwiorydd",
                    "1-2": "1 neu 2 o frodyr a chwiorydd",
                    "3+": "3 neu fwy o frodyr a chwiorydd"
                },
                "description": "Nifer y brodyr neu chwiorydd, gan gynnwys hanner brodyr a hanner chwiorydd, sydd gan berson, wedi'u grwpio i dri chategori.",
                "meta": {
                    "url": "https://www.example.com/cy"
                }
            }
        ]
    },
    {
        "name": "Example-Tabular",
        "lang": "en",
        "description": "A small example tabular dataset containing two pre-computed tables used only for validation of a Cantabular installation.",
        "incl": [{"name": "Example", "lang": "en"}],
        "vars": [
            {
                "name": "health",
                "description": "A self-assessment of a person's general state of health. This assessment is not based on a person's health over any specified period of time.",
                "meta": {
                    "url": "https://www.example.com"
                }
            }
        ]
    },
    {
        "name": "Example-Tabular",
        "lang": "cy",
        "label": "Set ddata tablau enghreifftiol i'w dilysu",
        "description": "Set ddata tablau enghreifftiol fach yn cynnwys dau dabl a adeiladwyd ymlaen llaw a ddefnyddir i ddilysu gosodiad Cantabular yn unig.",
        "incl": [{"name": "Example", "lang": "cy"}],
        "vars": [
            {
                "name": "health",
                "description": "Hunanasesiad o gyflwr iechyd cyffredinol person. Nid yw'r asesiad hwn yn seiliedig ar iechyd person dros unrhyw gyfnod penodol o amser.",
                "meta": {
                    "url": "https://www.example.com/cy"
                }
            }
        ]
    }
]

The JSON file for the service tables defined for the example dataset is shown below. It includes metadata in both English and Welsh. The Welsh translations were created using an online translation service and may be inaccurate.

example/metadata/tableMetadata.json

[
  {
    "name": "sex-siblings-city",
    "datasetName": "Example",
    "vars": [
      "city",
      "sex",
      "siblings_3"
    ],
    "ref": [
      {
        "lang": "en",
        "label": "Sex by siblings for all cities",
        "description": "Table of cities with sex and siblings.",
        "meta": {
          "keywords": [ "Siblings", "Cities" ]
        }
      },
      {
        "lang": "cy",
        "label": "Rhyw gan frodyr a chwiorydd ar gyfer pob dinas",
        "description": "Tabl o ddinasoedd gyda rhyw a brodyr a chwiorydd.",
        "meta": {
          "keywords": [ "Brodyr a Chwiorydd", "Dinasoedd" ]
        }
      }
    ]
  },
  {
    "name": "city-sex-health",
    "datasetName": "Example-Tabular",
    "vars": [
      "city",
      "sex",
      "health"
    ],
    "ref": [
      {
        "lang": "en",
        "label": "Sex by health for all cities",
        "description": "Table of cities with sex and health variables.",
        "meta": {
          "keywords": [ "Health", "Cities" ]
        }
      },
      {
        "lang": "cy",
        "label": "Rhyw yn ôl iechyd i bob dinas",
        "description": "Tabl o ddinasoedd gyda newidynnau rhyw ac iechyd.",
        "meta": {
          "keywords": [ "Iechyd", "Dinasoedd" ]
        }
      }
    ]
  }
]