Caching
Overview
Caching is a major factor in CM’s performance and optimization of Reddit API usage. Leveraging caching effectively in your operator configuration and in individual subreddit configurations can make or break your CM instance.
What Is Cache?
A cache is a storage medium with high read and write speeds, generally used to store temporary but frequently accessed data.
How CM Uses Caching
CM primarily caches two types of data:
Reddit API Calls
How Reddit’s API Works
To communicate with Reddit and retrieve posts, comments, user information, etc., CM uses API calls. Each API call is composed of a:
- request – CM asks Reddit for certain information
- response – Reddit responds with the requested information
Reddit imposes an API quota on every individual account using the API through an application. This quota is 600 requests per 10 minutes. At the end of the 10-minute period the quota is reset.
Additionally, some API calls have limits on how much data they can return. The most relevant of these is user history, which returns at most 100 activities (submissions/comments) per API call. For example, if you want to get 500 activities from a user’s history you will need to make 5 API calls.
Caching API Responses
To make the most effective use of the limited API quota, CM automatically caches API responses based on the “fingerprint” of the request sent.
On an individual “item” basis that means these resources are always cached:
- General user information (name, karma, age, profile description, etc..)
- General subreddit information (name, nsfw, quarantined, etc…)
- Individually processed activities (comment body, is comment author op, submission title, reports, link, etc…)
Additionally (and most importantly), responses for user history are cached based on what was requested. Example “fingerprint”:
- username
- type of activities to retrieve (overview, only submissions, only comments)
- range of activities to retrieve (last 100, last 6 months, etc…)
If the above “fingerprint” is used in three different Rules then
- First fingerprint appearance -> CM makes an API call and caches the response
- Second fingerprint appearance -> CM uses the cached response
- Third fingerprint appearance -> CM uses the cached response
So only one API call is made even though the history is used three times.
It is therefore important to re-use window criteria wherever possible to take advantage of this caching.
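For example, the hypothetical configuration below (rule and property names are illustrative only; refer to the subreddit configuration schema for exact syntax) uses the identical window in two Rules, so the user’s history is fetched once and the second Rule is served from cache:
checks:
  - name: freeKarmaCheck
    kind: submission
    rules:
      # both Rules request the same history "fingerprint":
      # the user's last 100 activities -> only one API call is made
      - kind: recentActivity
        window: 100
        thresholds:
          - subreddits:
              - freekarma4u
      - kind: recentActivity
        window: 100   # identical window -> response comes from cache
        thresholds:
          - subreddits:
              - FreeKarma4You
    actions:
      - kind: report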
Rules and Filters
Once CM has processed a Rule or Filter (itemIs or authorIs) the result of that component is stored in cache. For Rules the result is stored for the lifecycle of the Activity being processed and then discarded. For Filters the result is stored in cache for a short time and can be re-used by other Activities.
Re-using Rules and Filters by either using the exact same configuration or by using names provides:
- A major performance benefit since these do not need to be re-calculated
- A low-to-medium optimization on API caching, depending on what criteria are being tested.
In general re-use should always be a goal.
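As a sketch of name-based re-use (the syntax is illustrative; check the schema for how named Rules are referenced), a Rule defined once with a name can be referenced from another Check instead of being redefined, so its cached result is re-used:
checks:
  - name: reportCheck
    kind: submission
    rules:
      # define the Rule once and give it a name
      - name: freeKarmaHistory
        kind: recentActivity
        window: 100
        thresholds:
          - subreddits:
              - freekarma4u
    actions:
      - kind: report
  - name: removeCheck
    kind: submission
    rules:
      # re-use the named Rule -> its result is pulled from cache
      - freeKarmaHistory
    actions:
      - kind: remove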
Configuration
Cache Provider
CM supports two cache providers. By default all providers use the memory store:
- memory – in-memory (non-persistent) backend. Cache will be lost when CM is restarted/exits
- redis – Redis backend
Each provider object in configuration can be specified as:
- one of the above strings, to use the default settings, or
- an object with keys to override default settings
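For example, using just the string form (a minimal sketch, keeping all default settings):
caching:
  provider: memory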
Refer to the full documentation on cache providers in the schema.
Some examples:
JSON
{
  "provider": {
    "store": "memory", // one of "memory" or "redis"
    "ttl": 60, // the default max age of a key in seconds
    "max": 500, // the maximum number of keys in the cache (for "memory" only)
    // the below properties only apply to the 'redis' provider
    "host": "localhost",
    "port": 6379,
    "auth_pass": null,
    "db": 0
  }
}
YAML
provider:
  store: redis
  ttl: 60
  max: 500
  host: localhost
  port: 6379
  auth_pass: null
  db: 0
Providers can be specified in multiple locations, with each more specific location overriding the parent-level config:
- top-level config
- in individual bot configurations
- in the web config
operator:
  name: example
# top level
caching:
  provider:
    ...
bots:
  - name: u/MyBot
    # overrides top level
    caching:
      provider:
        ...
web:
  # overrides top level
  caching:
    provider:
      ...
Cache TTL
The Time To Live (TTL) – how long data may live in the cache before “expiring” – can be controlled independently for different data types. Sane defaults are provided for all types but tweaking these can improve API caching optimization depending on the subreddit’s configuration (use case).
Each of these can be specified in the caching property. TTL is measured in seconds.
- authorTTL (default 60) – user activity (overview, submissions, comments)
- commentTTL (default 60) – individually fetched comments
- submissionTTL (default 60) – individually fetched submissions
- filterCriteriaTTL (default 60) – filter results (itemIs and authorIs)
- selfTTL (default 60) – actions performed by the bot (creating a comment reply, report). If an action is found in cache during polling the bot ignores it.
  - This helps prevent the bot from reacting to things it did itself, IE you don’t want it to remove a comment because it reported that comment itself
- subredditTTL (default 60) – general information on fetched subreddits
- userNotesTTL (default 300) – amount of time Toolbox User Notes are cached
- wikiTTL (default 300) – Wiki pages used for content (in messages, reports, bans, etc…) are cached for this amount of time
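As a rough sketch (the values are illustrative, not recommendations), TTLs are set alongside the provider in the caching object:
caching:
  provider:
    store: memory
    ttl: 60
    max: 500
  # TTL values are in seconds
  authorTTL: 120        # keep user history a little longer
  filterCriteriaTTL: 60
  selfTTL: 60
  wikiTTL: 600          # wiki content rarely changes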