Improving performance with HTTP caching
Overview
The cantabular-server service is optimised to generate and serve responses to queries as
efficiently as possible. To further improve overall performance and availability, an HTTP caching
server may be deployed in front of the server. This approach is often particularly beneficial for
the traffic patterns typically generated by users who incrementally build and modify complex queries.
For more information on HTTP caching in general, see https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching.
This document demonstrates how HTTP caching may be configured for cantabular-server using Nginx,
a popular open source reverse proxy and caching server. For more information on Nginx, see
https://www.nginx.com/.
Alternative reverse proxies with varying levels of caching support are available, including Varnish, HAProxy and Traefik.
Configuring the service to support caching
By default, cantabular-server will send HTTP headers allowing eligible responses to be cached by
downstream servers such as Nginx.
For example, the following query executed against the “Example” dataset:
GET /v8/query/Example?v=city
should result in a response that includes the following HTTP headers:
Cache-Control: must-revalidate, stale-while-revalidate=3600, s-maxage=5
ETag: "LCa0a2j/xo/5m0U8HTBBNBNCLXBkg7+g+YpeiGJm5644LjM3"
The Cache-Control header value may be read by downstream caching servers as a signal to enable
caching.
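To make the role of the individual directives concrete, the following minimal Python sketch splits a Cache-Control value into its directives and checks whether a cache is allowed to store the response. The function names parse_cache_control and may_store are hypothetical helpers introduced here for illustration and are not part of cantabular-server. The example value uses the standard s-maxage spelling of the shared-cache lifetime directive. The parser is deliberately simplified and does not handle quoted arguments that contain commas.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into a dict of directives.

    Simplified sketch: directives are split on commas, so quoted
    arguments that themselves contain commas are not supported.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if not sep:
            # Directive without an argument, e.g. "must-revalidate".
            directives[name] = True
            continue
        arg = arg.strip().strip('"')
        # Lifetimes such as s-maxage are given in whole seconds.
        directives[name] = int(arg) if arg.isdigit() else arg
    return directives


def may_store(directives):
    """Return True unless the response forbids storage with no-store."""
    return "no-store" not in directives


if __name__ == "__main__":
    parsed = parse_cache_control(
        "must-revalidate, stale-while-revalidate=3600, s-maxage=5"
    )
    print(parsed)
    print("may be stored:", may_store(parsed))
```

A shared cache such as Nginx reads the s-maxage value as the number of seconds for which it may serve the stored response without contacting the back-end server.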
The ETag header value is also used by downstream caching servers. If the dataset is updated or
the software version changes then the ETag value will change, and downstream caches will be
invalidated.
The actual values of these headers in real-world deployments are likely to differ from the example shown above.
The default configuration allows responses to be served from the cache for up to five seconds before being re-validated against the back-end server. Generally, re-validation is a relatively cheap action compared to performing the full query again. This configuration is recommended to reduce the risk of stale data being served to clients, and to ensure maximum transparency and maintainability of the system.
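The re-validation step can be illustrated with a small, self-contained Python sketch. It does not talk to cantabular-server: a stand-in HTTP server on a local port, with a hypothetical ETag value and response body, plays the role of the back end so that the exchange can be run anywhere. The first request fetches the full response; the second sends the stored ETag in an If-None-Match header and receives 304 Not Modified with no body, which is what makes re-validation cheap for the client side of the exchange.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ETAG = '"example-etag"'   # hypothetical value, for illustration only
BODY = b'{"rows": 3}'     # hypothetical response body


class StandInBackend(BaseHTTPRequestHandler):
    """Stand-in for a back end that supports ETag re-validation."""

    def do_GET(self):
        if self.headers.get("If-None-Match") == ETAG:
            # The cached copy is still valid: reply without a body.
            self.send_response(304)
            self.send_header("ETag", ETAG)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("ETag", ETAG)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(url, etag=None):
    """Return (status, etag, body), sending If-None-Match when etag is given."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    # Bypass any proxy configured in the environment for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request) as response:
            return response.status, response.headers.get("ETag"), response.read()
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        return err.code, err.headers.get("ETag"), b""


def demo():
    server = HTTPServer(("127.0.0.1", 0), StandInBackend)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/v8/query/Example?v=city"
    try:
        first = fetch(url)                   # full response
        second = fetch(url, etag=first[1])   # re-validation only
    finally:
        server.shutdown()
        server.server_close()
    return first, second


if __name__ == "__main__":
    for status, etag, body in demo():
        print(status, etag, len(body), "bytes")
```

In a real deployment the caching server performs this conditional request on behalf of its clients once a cached entry has expired.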
The environment variable CANTABULAR_API_HTTP_CACHING_MAX_ROWS can be used to set an upper limit
on the size of a query output that can be cached. If a query output has more rows than the provided
limit, then an ETag header will not be set.
Note that there is currently a hard-coded upper limit of 50,000 rows that takes precedence over
any limit defined in the CANTABULAR_API_HTTP_CACHING_MAX_ROWS environment variable.
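The interaction between the configured limit and the hard-coded limit can be sketched as follows. This is an illustration of the rule as described above, not the server's actual implementation. The function names are hypothetical, and two points are assumptions: that the variable holds a plain integer, and that the hard-coded limit applies on its own when the variable is unset.

```python
import os

# Hard-coded upper limit described in this document.
HARD_ROW_LIMIT = 50_000

ENV_NAME = "CANTABULAR_API_HTTP_CACHING_MAX_ROWS"


def effective_max_rows(environ=os.environ):
    """Return the row limit that would apply to cacheable query outputs.

    Assumption: an unset variable means only the hard-coded limit applies.
    Assumption: the variable, when set, contains a plain integer.
    """
    raw = environ.get(ENV_NAME)
    if raw is None:
        return HARD_ROW_LIMIT
    # The hard-coded limit takes precedence over any larger configured value.
    return min(int(raw), HARD_ROW_LIMIT)


def etag_expected(row_count, environ=os.environ):
    """Return True if an output of row_count rows is within the limit."""
    return row_count <= effective_max_rows(environ)


if __name__ == "__main__":
    print(effective_max_rows({ENV_NAME: "1000"}))
    print(effective_max_rows({ENV_NAME: "100000"}))
    print(etag_expected(1001, {ENV_NAME: "1000"}))
```

Under these assumptions, setting the variable to a value above 50,000 has no additional effect.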
Disabling caching
It is possible to disable cache-related response headers by setting the following environment
variable when launching cantabular-server:
CANTABULAR_API_HTTP_CACHING_OFF=1
The same request will then result in the following HTTP response headers being sent:
Cache-Control: no-store
This should result in caching being disabled in any downstream servers.
Configuring Nginx
Below is a simple example of an Nginx configuration that has been tested successfully with
cantabular-server.
Note that it is not intended as a complete, production-ready configuration. Please refer to the Nginx documentation for all of its configuration options.
user nobody;
error_log /dev/stdout info;

events {}

http {
    access_log /dev/stdout;

    # Define proxy cache and set storage location.
    # Cached data will be stored in `conf/proxy_cache`.
    proxy_cache_path conf/proxy_cache levels=1:2 keys_zone=core:10M;

    server {
        listen 8493;

        # Increase Nginx's maximum client header buffer.
        # This ensures that GET request URIs of up to 16kB will be allowed, allowing users
        # to construct complex queries up to this length. Requests containing longer URIs
        # will return an error with 414 status code.
        #
        # This setting should be updated according to your users' requirements:
        large_client_header_buffers 8 16k;

        location / {
            # Enable proxy cache for this location:
            proxy_cache core;

            # Set connection timeout:
            proxy_connect_timeout 3s;

            # Populate standard HTTP headers:
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

            # Enable revalidation of expired cache entries using conditional requests
            # (If-None-Match / If-Modified-Since). Required for Nginx to honour the
            # must-revalidate and stale-while-revalidate Cache-Control directives:
            proxy_cache_revalidate on;

            # Proxy requests to this address, where `cantabular-server` should be listening:
            proxy_pass http://localhost:8491;
        }
    }
}
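Because requests with an over-long URI are rejected with a 414 status code, a client may wish to check the length of a query before sending it. The following Python sketch estimates whether a request line fits within a single 16kB buffer, matching the large_client_header_buffers setting in the configuration above. The function name request_line_fits is hypothetical, and the check is an approximation: Nginx's exact byte accounting may differ slightly, so values close to the limit should be treated with caution.

```python
def request_line_fits(uri, buffer_bytes=16 * 1024, method="GET"):
    """Estimate whether the HTTP request line fits in one header buffer.

    The request line is "<method> <uri> HTTP/1.1" plus the line terminator.
    This is an approximate, client-side check only.
    """
    request_line = f"{method} {uri} HTTP/1.1\r\n".encode("utf-8")
    return len(request_line) <= buffer_bytes


if __name__ == "__main__":
    short_uri = "/v8/query/Example?v=city"
    long_uri = "/v8/query/Example?v=" + "a" * (16 * 1024)
    print(request_line_fits(short_uri))
    print(request_line_fits(long_uri))
```

If the buffer size in the Nginx configuration is changed, the buffer_bytes argument should be changed to match.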