egressif.

Rspamd architecture

Rspamd is an event-driven filtering framework that sits between the MTA and the internet, runs dozens of modules in parallel, sums named symbols into a score, and maps that score to an action. Here is the pipeline, the scoring and action model, fuzzy storage, and why it is fast.

Last checked: June 22, 2026

Rspamd describes itself as “a high-performance email processing framework designed as an independent layer between your Mail Transfer Agent (MTA) and the internet.” That positioning is the whole design philosophy: it does not live inside the MTA’s delivery path. “Operating outside MTA internal flows, Rspamd provides security isolation while delivering comprehensive message analysis, spam filtering, and policy enforcement.” It looks at a message, recommends an action, and lets the MTA carry it out.

This page is written for senders. Rspamd is, alongside SpamAssassin, one of the two engines you are most likely to meet on the receiving side, and it is increasingly the default at scale. Its model - parallel modules, summed symbols, and a score-driven choice among several actions - tells you exactly how the signals you generate translate into outcomes such as pass, greylist, header-tag, or reject.

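[Figure: modules emit symbols (DKIM_ALLOW -1.0, MIME_GOOD -0.2, BAYES_SPAM +3.0, FUZZY +5.0, SPF_FAIL +1.0) whose weights sum to a score; as the score increases, the action ladder climbs from no action through greylist, add header, rewrite subject, and soft reject to reject.]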
Symbol weights are illustrative. Modules emit weighted symbols that sum to a score; a rising score climbs the action ladder - but the thresholds separating these actions are deployment-specific, not universal numbers.

The 60-second version

  • Rspamd runs a four-stage pipeline: pre-filters → main filters (in parallel) → post-filters → action decision.
  • Each module contributes named symbols with weights (positive or negative); the weights sum to a score - the same additive model as SpamAssassin, different vocabulary.
  • The score maps to an action - the documented set is no action, greylist, add header, rewrite subject, soft reject, and reject - not a single spam/not-spam flag.
  • Its statistics module is a Bayesian classifier using OSB tokens combined with the inverse chi-square distribution - the Robinson/Fisher lineage in production.
  • It is event-driven and asynchronous (C core + Lua), so one worker handles 100+ concurrent messages; typical scan time is 50–200 ms.
  • It ships native fuzzy hashing, a neural-network module, greylisting, rate limiting, reputation, and multimap - things SpamAssassin needs external plugins for.
  • Redis is a core dependency for statistics, learning, rate limits, and caching.
  • The exact action score thresholds are configurable defaults, not universal numbers.

The four-stage pipeline

The documentation lays out an explicit four-stage flow, and the parallelism in the middle stage is the key to both its speed and its layered logic:

Stage | What runs | Purpose
1. Pre-filters | Whitelisting, basic policy checks | Execute first; can short-circuit and skip the rest
2. Main filters | Authentication (SPF/DKIM/DMARC), content analysis, RBL lookups, statistical classifiers - in parallel | The bulk of the analysis
3. Post-filters | Composites, neural networks, final scoring adjustments | Combine and refine the symbols
4. Action decision | Map cumulative score to an action | pass, add headers, greylist, or reject

Pre-filters can stop processing early - a whitelisted sender never pays for the full main-filter stage. The main filters run concurrently rather than in sequence, which is why authentication, content, blocklists, and the Bayesian classifier all resolve quickly even though they involve network round-trips.
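The flow described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Rspamd's code: the check functions, the symbol names RBL_HIT and COMPOSITE_BAD_SENDER, the message fields, and every weight are hypothetical, and asyncio.sleep stands in for a network round-trip.

```python
import asyncio

# HYPOTHETICAL stand-ins for real modules; names and weights are illustrative.
async def spf_check(msg):
    await asyncio.sleep(0.01)  # simulates a DNS round-trip
    return {"SPF_FAIL": 1.0} if msg.get("spf") == "fail" else {}

async def dkim_check(msg):
    await asyncio.sleep(0.01)
    return {"DKIM_ALLOW": -1.0} if msg.get("dkim") == "pass" else {}

async def rbl_check(msg):
    await asyncio.sleep(0.01)
    return {"RBL_HIT": 2.0} if msg.get("listed") else {}

def pre_filter(msg, allowlist):
    """Stage 1: return True to short-circuit the remaining stages."""
    return msg.get("sender") in allowlist

def post_filter(symbols):
    """Stage 3: a toy composite that fires when two symbols co-occur."""
    if "SPF_FAIL" in symbols and "RBL_HIT" in symbols:
        symbols["COMPOSITE_BAD_SENDER"] = 1.5
    return symbols

async def scan(msg, allowlist=frozenset()):
    if pre_filter(msg, allowlist):
        return {"symbols": {}, "score": 0.0, "skipped": True}
    # Stage 2: main filters run concurrently, so the wait is roughly
    # that of the slowest check rather than the sum of all of them.
    results = await asyncio.gather(spf_check(msg), dkim_check(msg), rbl_check(msg))
    symbols = {}
    for emitted in results:
        symbols.update(emitted)
    symbols = post_filter(symbols)
    # Stage 4 consumes this cumulative score to pick an action.
    return {"symbols": symbols, "score": sum(symbols.values()), "skipped": False}

message = {"sender": "a@example.com", "spf": "fail", "dkim": "pass", "listed": True}
result = asyncio.run(scan(message))
print(result["score"])  # 3.5
```

Passing the sender in allowlist returns early with skipped set to True, which is the short-circuit behaviour of the pre-filter stage in miniature.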

Symbols, weights, and the score

Rspamd’s scoring is structurally identical to SpamAssassin’s additive model, with renamed parts. “Each analysis module fires and contributes named symbols (analogous to SA rules); cumulative symbol score determines the final action.” A symbol carries a weight; weights can be negative (a valid DKIM signature, a fuzzy whitelist hit) or positive (a blocklist match, a high Bayes probability). The running total is what gets mapped to an action.

Symbols on a message (illustrative names; weights are configurable)
  +5.0  FUZZY_DENIED        confirmed fuzzy-hash match
  +0.1  DKIM_TRACE          (informational)
  -1.0  DKIM_ALLOW          valid aligned DKIM signature
  -0.2  MIME_GOOD           well-formed MIME
  ----
  +3.9  -> compared against this deployment's action thresholds
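
The running total in the listing above is plain addition. A minimal sketch, using the same illustrative names and weights (none of them universal defaults):

```python
# Illustrative symbols and weights copied from the listing above.
symbols = {
    "FUZZY_DENIED": 5.0,
    "DKIM_TRACE": 0.1,
    "DKIM_ALLOW": -1.0,
    "MIME_GOOD": -0.2,
}

# Round to two decimals to hide binary floating-point noise.
score = round(sum(symbols.values()), 2)
print(score)  # 3.9
```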

The action model: a ladder, not a flag

This is what most distinguishes Rspamd from a yes/no filter. The cumulative score is compared against a series of thresholds, and Rspamd recommends the matching action rather than a binary verdict. Its protocol documentation enumerates the full set:

Action | Meaning
no action | message is likely ham - deliver normally
greylist | defer with a temporary failure so the sender must retry
add header | suspicious - deliver but add a spam header the MTA or client can sort on
rewrite subject | suspicious - deliver but rewrite the subject (e.g. prefix [SPAM])
soft reject | temporary rejection, “for example, due to rate limit exhausting”
reject | refuse the message outright

Rspamd returns this as structured data, so the MTA can act on the recommendation and on the evidence behind it. A scan result looks like:

{ "action": "add header",
  "score": 5.2,
  "required_score": 7,
  "symbols": { "FORGED_SENDER": { "score": 5 },
               "DATE_IN_PAST":  { "score": 0.1 },
               "DKIM_ALLOW":    { "score": -1 } } }

Note that score (5.2) is below required_score (7) here, yet the action is add header. Each action has its own threshold, and the message crossed the lower “add header” boundary while staying under “reject.” That ladder is why greylist and soft reject exist as score-driven actions at all. A borderline message can be deferred rather than accepted or refused, which buys time for reputation systems and forces the cheap retry test (see Greylisting, tarpitting, and rate controls). A rate-limited or temporarily suspect sender can get a soft reject instead of a permanent one. All of this comes from the same scoring machinery.

A caution this library holds to: the exact score numbers that separate these actions are configurable deployment settings. The documentation describes the action levels and exposes the live values at the controller’s /actions endpoint, but it does not fix universal defaults. So, as with SpamAssassin’s 5.0, there is no universal “reject above N” line - each operator sets its own.
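
To make the ladder concrete, here is a minimal sketch of mapping a score to an action. The threshold values are hypothetical numbers for one imaginary deployment, chosen only so that the 5.2 example above lands on add header; they are not Rspamd defaults. Rewrite subject and soft reject are left out to keep the sketch short.

```python
# HYPOTHETICAL thresholds for one imaginary deployment.
# Real values are set per operator; these are not defaults.
THRESHOLDS = [
    (7.0, "reject"),
    (5.0, "add header"),
    (4.0, "greylist"),
]

def choose_action(score, thresholds=THRESHOLDS):
    """Return the action for the highest threshold the score reaches."""
    for minimum, action in sorted(thresholds, reverse=True):
        if score >= minimum:
            return action
    return "no action"

print(choose_action(5.2))  # add header
print(choose_action(3.9))  # no action
```

The same message would land on a different rung under a different operator's numbers, which is the point of the caution above.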

What Rspamd checks: the module set

Rspamd advertises “60+ analysis modules.” The documented capabilities span the same families as the rest of this library, plus several that are bundled rather than bolted on:

Module / capability | Notes from the docs
Email authentication | spf, dkim/dkim_signing, dmarc, arc - “SPF, DKIM (signing+validation), DMARC, ARC with caching”
Bayesian statistics | Built-in OSB classifier; not compatible with SpamAssassin’s Bayes database - must be retrained from scratch (below)
Neural networks | neural - “post-process messages using neural network classification” - runs in the post-filter stage (requires Redis)
Fuzzy hashing | Native fuzzy_check: shingles for text, blake2b for attachments, HTML structure since 3.14.0 (below)
Real-time blocklists | rbl - “50+ preconfigured RBLs, SURBL, URIBL with parallel DNS queries”
greylisting | “allows to delay suspicious messages” (requires Redis)
ratelimit | “implements leaked bucket algorithm for ratelimiting” (requires Redis)
reputation | “manages reputation evaluation based on various rules” (replaced the old ip_score module)
multimap | “a complex module that operates with different types of maps” - match senders, IPs, URLs, etc. against lists
whitelist | flexible allow/block “based on SPF/DKIM/DMARC combinations”
URL filtering, antivirus, AI/ML services | phishing, antivirus, external integrations

The notable contrast with SpamAssassin: greylisting, rate limiting, reputation, neural networks, ARC, multimap, and native fuzzy hashing are all in the box, where SpamAssassin reaches DCC/Razor through external plugins and does not ship greylisting, rate limiting, or a neural module in its default set.

Under the hood there are two kinds of module. A small set of C modules is statically linked for speed - the default filters line is just chartable, dkim, regexp, fuzzy_check, where regexp is the core engine that evaluates regular-expression rules and embedded Lua. Everything else is a Lua module, loaded dynamically at startup and reloaded on reconfiguration; the docs note Lua modules “are very close to C modules in terms of performance,” which is why most new functionality (including multimap, ratelimit, reputation, greylisting, and neural) is written in Lua. Several modules - anything stateful - “require Redis,” which is the recurring reason Redis is treated as a core dependency rather than an optional add-on.

The neural module

Rspamd bundles a neural-network classifier that runs in the post-filter stage and “adapt[s] to your mail patterns.” It is a refinement layer on top of the symbol model rather than a replacement for it - the symbols produced by the main filters become features the network can weigh. (The overview documentation confirms only that the module exists and where it runs; architectural details live in the dedicated module documentation, and this library asserts nothing beyond what is quoted.)

Statistics: the OSB Bayesian classifier

Rspamd’s statistical classifier is Bayesian - “based on the Bayesian theorem, which combines probabilities to assess the likelihood of a message belonging to a particular class” - but two design choices distinguish it from a textbook word-counting filter, and both trace straight to the history covered in Bayesian and statistical spam filtering:

  • OSB tokens, not single words. The default tokenizer is osb (Orthogonal Sparse Bigram), which “goes beyond considering single words as tokens and instead takes into account combinations of words, taking into consideration their positions.” Rspamd uses a window of 5 tokens, so “the number of tokens being approximately 5 times larger than the number of words.” This is the same insight CRM114 demonstrated - short word tuples outperform isolated words.
  • Inverse chi-square combination. Rspamd combines token probabilities with “the inverse chi-square distribution” - i.e. Robinson’s Fisher-based method, not Graham’s naive product. The Robinson/Fisher math on the Bayesian page is exactly what is running here.
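
Both ideas in the list above can be illustrated compactly. The sketch below is a simplification under stated assumptions, not Rspamd's tokenizer or classifier: osb_pairs, chi2q, and fisher_combine are hypothetical names, tokens are kept as readable tuples rather than hashed, and the per-token probabilities are assumed to be given. The pair count here is (window - 1) per interior word; Rspamd's exact token count and hashing differ.

```python
import math

def osb_pairs(words, window=5):
    """Pair each word with each of the next window-1 words, keeping the gap."""
    pairs = []
    for i, first in enumerate(words):
        for gap in range(1, window):
            if i + gap < len(words):
                pairs.append((first, gap, words[i + gap]))
    return pairs

def chi2q(x2, dof):
    """Chi-square survival function for an even number of degrees of freedom."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_combine(probs):
    """Robinson/Fisher combination of token spam probabilities, each in (0, 1)."""
    n = len(probs)
    spam_evidence = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    ham_evidence = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (1.0 + spam_evidence - ham_evidence) / 2.0

print(osb_pairs("buy cheap pills".split(), window=3))
# [('buy', 1, 'cheap'), ('buy', 2, 'pills'), ('cheap', 1, 'pills')]
print(fisher_combine([0.5, 0.5]))  # 0.5
```

Evenly balanced evidence combines to 0.5, while a run of strongly spammy tokens pushes the result toward 1 and a run of strongly hammy tokens pushes it toward 0. That is the practical difference from a naive product, which tends to saturate at the extremes on very little evidence.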

The operational defaults matter for a sender:

Setting | Default | Meaning
backend | redis | statistics live in Redis (recommended/default since 2.0)
min_learns | 200 | needs 200 learned spam and 200 ham before it classifies
min_tokens | 11 | a message needs enough tokens to be worth classifying
tokenizer | osb | the only supported tokenizer

That min_learns = 200 is a close echo of SpamAssassin’s 200/200 rule - an untrained classifier stays silent rather than guessing. Rspamd also tokenizes the Subject and a configurable set of headers (classify_headers) plus meta-tokens like message size and attachment count, and because it only learns the headers it is told to, “there is no need to remove any additional headers (e.g., X-Spam) before the learning process.” It supports per_user statistics (when invoked at final delivery) and, since 3.13, multi-class classifiers for categories like newsletter, transactional, and phishing alongside the binary spam/ham model. As the docs warn, its database is not compatible with SpamAssassin’s - a migration means retraining from scratch.
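
The gating implied by min_learns and min_tokens can be sketched as a single predicate. The function name and the exact comparison operators are assumptions for illustration; only the two default values come from the table above.

```python
def bayes_ready(learned_spam, learned_ham, message_tokens,
                min_learns=200, min_tokens=11):
    """Return True when the classifier would be allowed to score a message.

    ASSUMPTION: both classes must reach min_learns, and a message needs at
    least min_tokens tokens. The real boundary conditions may differ.
    """
    if learned_spam < min_learns or learned_ham < min_learns:
        return False  # undertrained: stay silent rather than guess
    return message_tokens >= min_tokens

print(bayes_ready(500, 150, 40))  # False: too little ham learned
print(bayes_ready(500, 500, 40))  # True
```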

Fuzzy storage

Rspamd’s native fuzzy subsystem is one of its strongest differentiators (the near-duplicate theory is on its own page). The architecture worth knowing here:

  • Text uses the shingles algorithm - overlapping word trigrams, 32 hashes per shingle - producing a similarity score rather than an exact match, so templated campaigns with minor per-recipient variations are caught.
  • Attachments and images use exact blake2b digests - identical files are matched precisely.
  • HTML structure fuzzy hashing (since 3.14.0) matches “DOM structure, layout, and link patterns - independent of text content,” weighted as structure shingles 50%, CTA domains 30%, all domains 15%, structural features 5%. A clever anti-phishing detail: if the DOM is identical but the call-to-action domains differ, similarity is heavily penalized (x0.3), exposing a cloned-brand phishing page even though its layout is a perfect copy.
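
Two of the mechanisms in this list lend themselves to a short sketch. The first function is a generic min-hash over word trigrams with 32 keyed hash functions, which is one common way to build a shingle signature. It is not Rspamd's implementation, and using blake2b as the per-slot hash is an assumption made for convenience. The second applies the documented HTML weights; how exactly the x0.3 penalty is applied is also an assumption, labeled in the code.

```python
import hashlib

def shingle_signature(text, num_hashes=32, n=3):
    """Min-hash signature over overlapping word n-grams (n=3: trigrams)."""
    words = text.lower().split()
    grams = [" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))]
    signature = []
    for k in range(num_hashes):
        key = k.to_bytes(4, "big")  # a different keyed hash per slot
        signature.append(min(
            hashlib.blake2b(g.encode("utf-8"), digest_size=8, key=key).digest()
            for g in grams
        ))
    return signature

def signature_similarity(a, b):
    """Fraction of signature slots that agree (0.0 to 1.0)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def html_similarity(structure, cta_domains, all_domains, features):
    """Weighted HTML similarity; each input is a similarity in [0, 1]."""
    score = 0.50 * structure + 0.30 * cta_domains + 0.15 * all_domains + 0.05 * features
    # ASSUMPTION: the penalty multiplies the combined score when the DOM is
    # identical but the call-to-action domains do not fully match.
    if structure == 1.0 and cta_domains < 1.0:
        score *= 0.3
    return score
```

Two messages built from the same template share most of their trigrams, so most signature slots agree and the similarity stays high even when a name or tracking token changes per recipient.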

How fuzzy weight becomes a score

Each stored hash has a weight (“hits”) that “accumulates as users report the same content,” and Rspamd converts weight to score with a hyperbolic-tangent curve:

symbol_score = tanh((weight - max_score) / max_score) x metric_weight

The effect is deliberate smoothing: the score is 0 below the threshold, partial at the threshold, and full at twice the threshold. As the docs put it, this “prevents a single report from triggering the maximum score while ensuring well-confirmed spam gets full weight.” (The max_score parameter is being renamed to hits_limit; both names are currently accepted.) Standard flags distinguish FUZZY_DENIED (flag 1, confirmed spam), FUZZY_PROB (flag 2, probable spam), and FUZZY_WHITE (flag 3, legitimate content - a negative-weight whitelist). By default Rspamd uses fuzzy feeds from rspamd.com over UDP port 11335; if usage is blocked, a zero-weight FUZZY_BLOCKED symbol appears and does not affect processing. The default hash algorithm is mumhash.
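
For readers who want to plug numbers in, here is a minimal sketch that evaluates the expression exactly as quoted above. The function name is hypothetical, and clamping negative results to zero is an assumption drawn from the statement that the score is 0 below the threshold. How the running system saturates near the top of the curve should be confirmed against the module documentation.

```python
import math

def fuzzy_symbol_score(weight, max_score, metric_weight):
    """Evaluate tanh((weight - max_score) / max_score) x metric_weight.

    ASSUMPTION: results below zero are clamped to 0.0, so hashes under
    the threshold contribute nothing.
    """
    raw = math.tanh((weight - max_score) / max_score)
    return max(raw, 0.0) * metric_weight

# Hypothetical inputs: threshold of 20 hits, symbol weight of 5.0.
for hits in (5, 20, 40, 80):
    print(hits, fuzzy_symbol_score(hits, max_score=20, metric_weight=5.0))
```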

Performance design

Rspamd’s reason for existing is throughput, and the numbers from its documentation are concrete:

Metric | Value
Concurrent messages per worker | 100+
Typical scan time | 50–200 ms per message (incl. network)
Throughput | 5–10 messages/sec per worker core (~500K–1M/day)
Memory | 50–100 MB per worker process

The design choices behind those numbers: an event-driven core with non-blocking DNS, Redis, and HTTP; Hyperscan for fast regular-expression execution on x86_64; and a worker model split into proxy (protocol translation, load balancing), normal (message scanning), and controller (web UI and management API) roles. Rspamd’s own migration documentation claims “10–100x faster processing” versus SpamAssassin; treat that as the project’s own comparison.
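
The daily figures in the table follow from the per-second rate by simple multiplication. A quick check, assuming a constant rate sustained over a full day:

```python
SECONDS_PER_DAY = 24 * 60 * 60

def messages_per_day(rate_per_second):
    """Messages per worker core per day at a constant sustained rate."""
    return rate_per_second * SECONDS_PER_DAY

low, high = messages_per_day(5), messages_per_day(10)
print(low, high)  # 432000 864000
```

That is 432K to 864K per core per day, which the documentation's table rounds to roughly 500K–1M.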

How it talks to the MTA

Rspamd “communicates results to your MTA via HTTP/JSON API or Milter protocol, recommending an action without directly handling mail delivery.” The default ports:

Port | Worker / use
11332 | Milter listener (e.g. Postfix smtpd_milters = inet:localhost:11332)
11333 | Normal worker - HTTP scan API (/checkv2)
11334 | Controller worker - web UI and management
11335 | Fuzzy storage (UDP)

Because Rspamd only recommends an action, the MTA stays in control of delivery - the “independent layer” philosophy in practice. Redis underpins the stateful parts: “Statistics storage, learning data, rate limiting, and caching - all backed by Redis,” with the production quick-start installing redis-server alongside Rspamd and Redis HA supported.
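
As a rough illustration of the HTTP path, the sketch below builds a POST to /checkv2 on the normal worker's default port and summarizes a JSON reply of the shape shown earlier. It uses only the Python standard library. The function names are hypothetical, the host is assumed to be a local instance, and optional request headers (client IP, envelope sender, and so on) are omitted for brevity.

```python
import json
import urllib.request

def build_scan_request(raw_message: bytes, host="localhost", port=11333):
    """Build a POST request carrying the raw message to the scan endpoint."""
    url = f"http://{host}:{port}/checkv2"
    return urllib.request.Request(url, data=raw_message, method="POST")

def summarize(result_json: str):
    """Extract the recommended action, the score, and per-symbol scores."""
    result = json.loads(result_json)
    symbols = {name: info.get("score", 0.0)
               for name, info in result.get("symbols", {}).items()}
    return result["action"], result["score"], symbols

def scan(raw_message: bytes, host="localhost", port=11333, timeout=5.0):
    """Send a message for scanning; requires a reachable Rspamd instance."""
    request = build_scan_request(raw_message, host, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return summarize(response.read().decode("utf-8"))
```

Only scan touches the network. The caller, typically the MTA or a milter shim, decides what to do with the returned action, which keeps delivery control where the paragraph above says it belongs.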

Rspamd vs. SpamAssassin at a glance

Dimension | SpamAssassin | Rspamd
Core language | Perl | C core + Lua rules/plugins
Architecture | Process-per-message (spamd/spamc) | Event-driven, 100+ concurrent/worker
Throughput | ~0.5–1 msg/sec/core | ~5–10 msg/sec/core
Memory | 30–50 MB | 50–100 MB/worker
Scoring | Named rules → score; default tag at 5.0 | Named symbols → score → 4 actions
Bayes | Plugin; 200+200 minimum | Built-in; incompatible with SA’s DB
Fuzzy hashing | External DCC/Razor plugins (DCC off by default) | Native fuzzy_check (shingles + blake2b + HTML)
Greylist / rate limit / neural / ARC | Not in default set | All bundled
Redis | Not native (SQL/LDAP) | Core dependency

Both figures for SpamAssassin’s throughput and memory come from Rspamd’s own comparison table; treat the head-to-head numbers as the project’s framing.

What this means for you, and what Egressif does

Rspamd’s action model is the practical thing for a sender to internalize: your score does not just decide spam-or-not, it decides deliver vs. greylist vs. tag vs. rewrite-subject vs. soft-reject vs. reject. That makes the deterministic, sender-controlled signals - aligned SPF/DKIM/DMARC (which Rspamd validates and caches), a stable identity that earns whitelist and reputation symbols, and disciplined lists that avoid fuzzy and complaint signals - directly responsible for which rung of that ladder you land on. And because the ladder includes deferral rather than only deletion, a borderline score rarely means a lost message - which is the architecture quietly honoring the asymmetric cost of a false positive.

Egressif keeps those inputs clean and consistent so that, across an Rspamd deployment, the negative-weight symbols (valid DKIM, good reputation, fuzzy whitelist) fire for you and the positive-weight ones stay quiet, pulling your score toward pass. We cannot set another operator’s action thresholds or train their Bayesian or neural models, and we do not claim a universal number. We make sure the symbols you generate add up in your favor.
