Logging and monitoring¶
Overview¶
Cantabular executables produce logs and expose endpoints that are helpful for understanding, monitoring and debugging their operation.
Logging¶
All Cantabular executables generate logs during normal operation to record messages and timestamps associated with important events. Logging can be configured to provide output in either JSON or key-value formats.
All logs are written to the stdout
of the process. This decision was
influenced by a customer requirement to adhere to the
twelve-factor app methodology.
The default format of Cantabular logs is a list of key-value pairs, as shown below:
t=2021-09-10T16:25:12+01:00 ns=server lvl=info ev=message msg="cantabular-server version 1.2.3 commit fa57c0de dated 2021-03-21 01:23:45 +0100"
t=2021-09-10T16:25:12+01:00 ns=server lvl=info ev=message msg="Licensed to: The Sensible Code Company"
Different key-value pairs are separated by spaces, and values that contain a space or other special characters are surrounded by double-quotes characters.
To switch logging to use a standard JSON format, use the environment variable CANTABULAR_LOG_FORMAT
:
export CANTABULAR_LOG_FORMAT=JSON
The keys available in log output for all event types are as follows:
t
The timestamp of the event, in ISO8601 format
ns
The namespace of the event, indicating the name of the Cantabular executable
lvl
The log level of the event, either
info
,warn
orerror
ev
The event type, either
query
(for a query tocantabular-server
)http
(for details of an HTTP request received),graphql
(for a query to the GraphQL endpoint in eithercantabular-api-ext
orcantabular-metadata
) ormessage
(for everything else)
Further specific fields are provided for different event types. A notable event type is http
which logs details of all HTTP requests. This event type provides:
port
The listen port that the request came in on
url
The Request-URI from the HTTP request line
method
The Method from the HTTP request line
httpStatus
The HTTP status code from the response
contentType
The HTTP Content-Type of the response
remoteAddr
The IP address of the peer
authUser
The authenticated user if any
nBytes
The number of bytes of content in the response if known
timeTaken
The time taken to serve the request (including writing the response) in milliseconds
While timeTaken
includes the time taken to write the response,
it will often not reflect the time taken by the operating system to send the response to the client
owing to significant buffers in the network stack.
Maximum request concurrency¶
The servers have built in overload protection to prevent them from being overwhelmed by too many HTTP requests. The servers will report unhealthy when serving their maximum number of concurrent requests. This can be useful to direct a load balancer.
The default level of maximum request concurrency is a multiple of the number of CPUs in use by the Go runtime.
It can be configured by the environment variable CANTABULAR_MAX_CONCURRENT_REQUESTS
.
When this limit is reached incoming requests will be queued until the server is able to process them.
The maximum queuing time is configurable by the environment variable CANTABULAR_MAX_CONCURRENT_REQUESTS_WAIT_TIME
.
When a request exceeds this time it will be rejected with a 503 Service Unavailable
response.
When the maximum request concurrency is reached then the health check endpoint will report 503 Service Unavailable
after a time configurable with CANTABULAR_MAX_CONCURRENT_REQUESTS_HEALTH_WAIT_TIME
.
Values for all these three settings are reported in the logs at startup.
cantabular-server
has no built in overload protection for requests because it has a separate
queuing mechanism for queries and the other requests are either infrequent or inexpensive.
Health check endpoint¶
An endpoint at /health
is available for every Cantabular service executable.
An HTTP GET request can be made to these endpoints to perform health checks.
They are intended for use with a load balancer to inform whether a service can accept traffic,
and whether a service should be killed or restarted.
These URLs will always work unauthenticated and will respond with one of two statuses:
200 OK
to indicate that the service is healthy503 Service Unavailable
to indicate that the service is unhealthy
IP address and port of health check endpoint¶
By default, the health endpoint is served on the main service listener.
For the Public UI, Admin UI and Extended API only the IP address and port
for the health check endpoint can be altered by setting
the value of the environment variable CANTABULAR_HEALTH_LISTEN_ADDRESS
.
Dependency health checks¶
As well as reporting on the health of the service itself,
these endpoints will also by default only report healthy status
if all runtime dependencies they have are also healthy.
These dependency health checks can be turned off
by setting the value of the environment variable CANTABULAR_HEALTH_CHECK_SANS_DEPENDENCIES
to TRUE
(or another truthy value),
and the interval at which they are performed can be configured by setting CANTABULAR_HEALTH_CHECK_INTERVAL
.
Services that depend on Cantabular Server:
cantabular-api-ext
andcantabular-ui
both have a runtime dependency oncantabular-server
. By default, they will both perform health check queries on theircantabular-server
dependency. Ifcantabular-server
reports itself to be unhealthy or is not available thencantabular-ui
andcantabular-api-ext
will also report themselves to be unhealthy.Services that depend on Cantabular Metadata:
cantabular-api-ext
,cantabular-ui
andcantabular-admin
all have an optional runtime dependency oncantabular-metadata
. If they have been configured to expect a connection tocantabular-metadata
they will perform health check queries oncantabular-metadata
. Ifcantabular-metadata
reports itself to be unhealthy, its consuming services will also report themselves to be unhealthy.
Changes in health check responses¶
A service can report itself as unhealthy but then become healthy again.
For example, when cantabular-api-ext
is starting up and has yet to connect to its dependencies
it may be unhealthy but become healthy once these connections are established.
Metrics endpoint¶
The /metrics
endpoint in the service executables can be configured
to run in order to provide Prometheus-compatible metrics on the health of the
Go runtime. For more information on Prometheus, see https://prometheus.io.
For security reasons the metrics are not served on the same port
as the public UI, and may be served on a different IP address.
The endpoint is disabled by default but can be switched on by using a
CANTABULAR_<NAME>_METRICS_LISTEN_ADDRESS
environment variable e.g.:
export CANTABULAR_UI_METRICS_LISTEN_ADDRESS=localhost:9999
This is an example of the output:
go_memstats_alloc_bytes 2177928
go_memstats_alloc_bytes_total 3293984
go_memstats_buck_hash_sys_bytes 1446099
go_memstats_frees_total 15080
go_memstats_gc_cpu_fraction 0.015215186876243745
go_memstats_gc_sys_bytes 4299072
go_memstats_heap_alloc_bytes 2177928
go_memstats_heap_idle_bytes 3964928
go_memstats_heap_inuse_bytes 3997696
go_memstats_heap_objects 28350
go_memstats_heap_released_bytes 3342336
go_memstats_heap_sys_bytes 7962624
go_memstats_last_gc_time_seconds 1.6312686931793177e+09
go_memstats_lookups_total 0
go_memstats_mallocs_total 43430
go_memstats_mcache_inuse_bytes 4800
go_memstats_mcache_sys_bytes 16384
go_memstats_mspan_inuse_bytes 80648
go_memstats_mspan_sys_bytes 81920
go_memstats_next_gc_bytes 4194304
go_memstats_other_sys_bytes 989685
go_memstats_stack_inuse_bytes 425984
go_memstats_stack_sys_bytes 425984
go_memstats_sys_bytes 15221768
go_gc_duration_seconds_sum 9.3034e-05
go_gc_duration_seconds_count 1
go_goroutines 6
go_threads 6
go_info{version="go1.17"} 1