Preface

Cantabular provides users with disclosure-safe cross-tabulations of row-level data (“microdata”) from microdata-based datasets, or serves pre-computed tables from tabular datasets.

In a microdata file, a row represents a set of observations about a statistical unit such as a person. In a tabular file, a row represents a single observation such as a count of people with specific characteristics.
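
The difference between the two row types can be illustrated with a small Python sketch. The column names and values are invented for illustration and are not taken from any Cantabular dataset; the aggregation only shows how tabular counts relate to microdata rows, not how Cantabular itself computes them.

```python
from collections import Counter

# Hypothetical microdata: one row per person, one column per observed variable.
microdata = [
    {"sex": "F", "region": "North"},
    {"sex": "M", "region": "North"},
    {"sex": "F", "region": "North"},
    {"sex": "F", "region": "South"},
]


def tabulate(rows, variables):
    """Count rows for each combination of values of the given variables."""
    return Counter(tuple(row[v] for v in variables) for row in rows)


# Tabular form: each entry is a single observation, namely a count of
# people sharing the same characteristics.
table = tabulate(microdata, ["sex", "region"])
for (sex, region), count in sorted(table.items()):
    print(sex, region, count)
```

Each microdata row contributes to exactly one cell, so the counts in the tabular form sum to the number of microdata rows.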

Both microdata and tabular data are fed as input to Cantabular in a flat representation using a comma-separated values (CSV) format.

The format of both microdata and tabular input files is described in more detail in another document: “Cantabular Data Loading Guide”.

This document describes the Cantabular “codebook” format, which contains:

allowed codes for each variable (column) in the dataset
human-readable descriptions corresponding to codes
“mappings” describing re-categorizations of the existing variables

For example, the data may contain “age in single years”, but the user may want the output tabulated by “age in 10-year bands”. A mapping can be used to achieve this.
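
As a rough sketch of what such a re-categorization amounts to, the Python below maps single-year age codes to 10-year bands. It is not the Cantabular codebook syntax: the function name `age_band`, the band labels, and the assumed code range 0 to 99 are all hypothetical and chosen only for illustration.

```python
def age_band(age, width=10):
    """Map a single-year age to a band label such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical mapping from each single-year code ("0".."99") to its band.
# Every source code maps to exactly one target category.
mapping = {str(age): age_band(age) for age in range(100)}

# Re-categorize a few example codes from the source variable.
ages = ["7", "23", "29", "30", "64"]
bands = [mapping[a] for a in ages]
print(bands)
```

The mapping is many-to-one: the 100 single-year codes collapse into 10 band categories, so a tabulation by band can be produced without changing the underlying data.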