ModerateHateSpeech.com Rule

Overview

moderatehatespeech.com (MHS) is a non-profit initiative to identify and fight toxic and hateful content online using programmatic technology such as machine learning models.

They offer a toxic content prediction model trained specifically on, and for, reddit content, and they partner directly with subreddits.

ContextMod leverages their API for toxic content predictions in the MHS Rule.

The MHS Rule sends an Activity’s content (title or body) to MHS, which returns a prediction on whether the content is toxic and actionable by a moderator.
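
Roughly, the exchange looks like this (a hypothetical sketch for illustration only; the request/response field names are assumptions, not the documented MHS API schema):

request:
  text: 'why are we having all these people from shithole countries coming here'
response:
  flag: toxic       # see Flagged below
  confidence: 97.12 # see Confidence below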

MHS Predictions

MHS’s toxic content predictions return two indicators about the analyzed content. Both are available as test conditions in ContextMod.

Flagged

MHS returns a straight “Toxic or Normal” flag based on how it classifies the content.

Example

  • Normal - “I love those pineapples”
  • Toxic - “why are we having all these people from shithole countries coming here”
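
In a rule’s criteria this flag is tested with the flagged property (see Full Config below). A minimal sketch:

criteria:
  flagged: false # match only content MHS classifies as Normal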

Confidence

MHS returns how confident it is in the flag classification, on a scale of 0 to 100.

Example

“why are we having all these people from shithole countries coming here”

  • Flag = Toxic
  • Confidence = 97.12 -> The model is 97% confident the content is Toxic
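
The confidence value is tested with a comparison string in the rule’s criteria (see Full Config below). A minimal sketch:

criteria:
  confidence: '>= 97' # match only when MHS is at least 97% confident in its flag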

Usage

An MHS API Key is required to use this Rule. A Key can be acquired for free by creating an account at moderatehatespeech.com.

The Key can be provided by the bot’s Operator in the bot config’s credentials, or in the subreddit’s config under the top-level credentials property, like this:

credentials:
  mhs:
    apiKey: 'myMHSApiKey'

# the rest of your config below
polling:
  # ...
runs:
  # ...

Minimal/Default Config

ContextMod provides a reasonable default configuration for the MHS Rule if you do not wish to configure it yourself. The default configuration will trigger the rule if the MHS prediction:

  • flags as toxic
  • with 90% or greater confidence

Example

rules:
  - kind: mhs
    
  # rest of your rules here...
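
This minimal config is equivalent to writing the defaults out explicitly:

rules:
  - kind: mhs
    criteria:
      flagged: true       # flags as toxic
      confidence: '>= 90' # with 90% or greater confidence
      testOn:
        - body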

Full Config

  • flagged (boolean, default: true): Test whether content is flagged as toxic (true) or normal (false)
  • confidence (string, default: >= 90): Comparison against a number 0 to 100 representing how confident MHS is in the prediction
  • testOn (array, default: body): Which parts of the Activity to send to MHS. Options: title and/or body

Example

rules:
  - kind: mhs
    criteria:
      flagged: true # triggers if MHS flags the content as toxic AND
      confidence: '> 66' # MHS is more than 66% confident in its prediction
      testOn:  # send the body of the activity to the MHS prediction service
        - body

Historical Matching

Like the Sentiment and Regex rules, CM can also use MHS predictions to check content from the Author’s history.

Example

rules:
  - kind: mhs
    # ...same config as above but can include below...
    historical:
      mustMatchCurrent: true # if true then CM will not check author's history unless current Activity matches MHS prediction criteria
      totalMatching: '> 1' # comparison for how many activities in history must match to trigger the rule
      window: 10 # specify the range of activities to check in author's history
      criteria: #... if specified, overrides parent-level criteria
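
The window can also be a duration rather than a count (the window template variable below is described as a “number or duration”). A sketch, assuming ContextMod accepts a duration string here:

historical:
  window: '1 month' # check the author's activities from the last month instead of a fixed count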

Template Variables

  • result: Summary of rule results (also found in Actioned Events). Example: “Current Activity MHS Test: ✓ Confidence test (>= 90) PASSED MHS confidence of 99.85% Flagged pass condition of true (toxic) MATCHED MHS flag ‘toxic’”
  • window: Number or duration of Activities considered from the window. Example: “1 activities”
  • criteriaTest: MHS value to test against. Example: “MHS confidence is > 95%”
  • totalMatching: Total number of activities (current + historical) that matched criteriaTest. Example: 1
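
These variables can be interpolated into an Action’s content. A sketch, assuming the rule is named mhs and ContextMod’s double curly brace template syntax (the exact variable path is an assumption; check the Actions documentation):

actions:
  - kind: report
    content: 'MHS flagged => {{rules.mhs.result}}' # hypothetical template path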

Examples

Report if MHS flags as toxic

rules:
  - kind: mhs
actions:
  - kind: report
    content: 'MHS flagged => '

Report if MHS flags as toxic with at least 95% confidence

rules:
  - kind: mhs
    criteria:
      confidence: '>= 95'
actions:
  - kind: report
    content: 'MHS flagged => '

Report if MHS flags as toxic and at least 3 of the last 10 activities from the author’s history are also toxic

rules:
  - kind: mhs
    historical:
      window: 10
      mustMatchCurrent: true
      totalMatching: '>= 3'
actions:
  - kind: report
    content: 'MHS flagged => '

Approve if MHS flags as NOT toxic with at least 95% confidence

rules:
  - kind: mhs
    criteria:
      confidence: '>= 95'
      flagged: false
actions:
  - kind: approve