Every test follows the same protocol, on every service, on every pass. It's the only way to produce honest comparisons between platforms that all promise the same thing. This page documents that protocol in plain terms so you can (a) understand our scores, (b) reproduce them yourself if you want, and (c) push back when you spot an inconsistency.
The 10 standard prompts
Every tested service gets the exact same 10 prompts, in the same order. These prompts cover eight axes we consider critical for characterizing a tool: raw photorealism, narrative mood, motion and animation, non-English prompt handling, stylistic creativity, body-type variety, adult maturity in the output, and multi-character generation.
Two extra prompts test the technical limits: the service's real maximum duration, and character consistency across successive clips.
Non-negotiable red line: no prompt targets a juvenile age. All prompts explicitly name an adult 25+. Any service that generates juvenile-looking content from a soft prompt is systematically flagged as a critical defect, regardless of its other scores.
The 9 scoring axes
Every service is scored on 9 axes, each out of 100, with a weighting that reflects our editorial priorities: real NSFW content is measured on two complementary dimensions (kinks = variety, NSFW intensity = actual explicit delivery), photorealism matters as much as variety, and character consistency is treated as a plus rather than a dominant criterion.
Overall score
The overall /100 score shown on every service page is the weighted average of the 9 axes, rounded to the nearest integer. There's no manual adjustment — the formula is fixed, public, and applied identically to every service.
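As a rough sketch of that arithmetic, the overall score can be computed as below. The `AxisScore` shape, the function name, and the example axes and weights are illustrative assumptions, not the published weighting; only the rule itself (weighted average of per-axis scores out of 100, rounded to the nearest integer) comes from this page.

```typescript
// Hypothetical shape for one scored axis (names are illustrative).
interface AxisScore {
  axis: string;   // axis label
  score: number;  // per-axis score, expected in 0..100
  weight: number; // relative weight, expected to be positive
}

// Weighted average of the axis scores, rounded to the nearest integer.
// Math.round rounds exact halves upward (71.5 becomes 72).
function overallScore(axes: AxisScore[]): number {
  if (axes.length === 0) {
    throw new Error("at least one axis is required");
  }
  let weightedSum = 0;
  let totalWeight = 0;
  for (const { axis, score, weight } of axes) {
    if (score < 0 || score > 100) {
      throw new Error(`score out of range for axis "${axis}"`);
    }
    if (weight <= 0) {
      throw new Error(`weight must be positive for axis "${axis}"`);
    }
    weightedSum += score * weight;
    totalWeight += weight;
  }
  return Math.round(weightedSum / totalWeight);
}

// Placeholder data: three axes with made-up weights, not the real nine.
const exampleAxes: AxisScore[] = [
  { axis: "photorealism", score: 80, weight: 2 },
  { axis: "variety", score: 70, weight: 2 },
  { axis: "character consistency", score: 50, weight: 1 },
];

console.log(overallScore(exampleAxes)); // (160 + 140 + 50) / 5 = 70
```

Dividing by the sum of the weights means the weights do not need to add up to 1 or 100, so the same function works whichever scale the weighting table uses.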
Quarterly retest
AI services move fast. A model version changes, a new NSFW fine-tune ships, a restriction gets added, pricing shifts — any of it can happen in a few weeks. We retest every service at least once per quarter, and on an emergency basis if a major change is announced.
The last-tested date is shown on every service page. A score older than three months is considered unreliable until the next retest.
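The three-month rule can be expressed as a small date check. This is a minimal sketch under stated assumptions: the function names are hypothetical, "three months" is read as three calendar months in UTC, and a score is treated as stale only once that deadline has strictly passed.

```typescript
// Adds whole calendar months in UTC and returns midnight UTC of the result.
// If the target month is shorter, the day is clamped to its last day
// (30 November + 3 months gives 28 February in a non-leap year).
function addMonthsUtc(date: Date, months: number): Date {
  const year = date.getUTCFullYear();
  const month = date.getUTCMonth() + months; // may exceed 11; Date.UTC normalizes it
  // Day 0 of the following month is the last day of the target month.
  const lastDay = new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
  const day = Math.min(date.getUTCDate(), lastDay);
  return new Date(Date.UTC(year, month, day));
}

// True when the last test is more than maxAgeMonths calendar months old.
function isScoreStale(lastTested: Date, now: Date, maxAgeMonths = 3): boolean {
  return now.getTime() > addMonthsUtc(lastTested, maxAgeMonths).getTime();
}

const lastTested = new Date("2024-01-15T00:00:00Z");
console.log(isScoreStale(lastTested, new Date("2024-04-15T00:00:00Z"))); // false
console.log(isScoreStale(lastTested, new Date("2024-04-16T00:00:00Z"))); // true
```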
Reviewing traditional adult sites
We also publish profiles of traditional adult sites (tubes, cam sites, hentai archives, OnlyFans aggregators, premium VOD studios, etc.) — see /sites for the full directory and /best for the category guides. The methodology there is different in scope:
Data baseline: SEMrush organic-positions mining of the largest adult-comparator (theporndude.com), classified into our 13 expansion categories. The dataset is auto-generated from the SEMrush export and published as a TypeScript file in the codebase. SEO metrics (keyword count, top-keyword volume, theporndude traffic) are visible on each site profile.
Editorial layer: for the top sites in each category, we add hand-written descriptions, pros/cons, best-for positioning, business model classification (free-ad, tips, paywall, freemium, leaked, amateur-ugc) and feature tags. These come from direct observation of the site and domain knowledge — they are not algorithmic ratings, just documented opinions.
What we don't do: we do not run the 10-prompt protocol or 9-axis scoring on traditional sites. That protocol is designed to evaluate AI generators producing content from prompts — it doesn't apply to pre-existing content libraries. Comparing a tube site and an AI generator on the same scale would be intellectually dishonest.
Leaked content sites: Coomer, ThotHub, Fapello and similar OnlyFans-leak aggregators appear in the directory because they exist and have significant search traffic. Their listings explicitly flag the copyright and consent issues in pros/cons. We do not link to them through affiliate channels and they are not promoted in our editorial — they are documented because pretending the segment doesn't exist would be intellectually dishonest.
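To make the two layers concrete, here is a hypothetical TypeScript sketch of what a site profile record could look like. Every field and function name is an assumption for illustration and is not taken from the actual dataset file; only the six business-model values and the no-affiliate rule for leak aggregators come from the text above.

```typescript
// The six business-model labels listed in the editorial layer.
const BUSINESS_MODELS = [
  "free-ad", "tips", "paywall", "freemium", "leaked", "amateur-ugc",
] as const;
type BusinessModel = (typeof BUSINESS_MODELS)[number];

// Hypothetical record shape: auto-generated SEO data plus optional editorial data.
interface SiteProfile {
  domain: string;
  category: string; // one of the 13 categories (names not listed here)
  seo: {
    keywordCount: number;
    topKeywordVolume: number;
    comparatorTraffic: number;
  };
  // Present only for the top sites that receive hand-written coverage.
  editorial?: {
    description: string;
    pros: string[];
    cons: string[];
    bestFor: string;
    businessModel: BusinessModel;
    featureTags: string[];
  };
}

// Type guard: checks that an arbitrary string is one of the known labels.
function isBusinessModel(value: string): value is BusinessModel {
  return (BUSINESS_MODELS as readonly string[]).includes(value);
}

// Leak aggregators are documented but never linked through affiliate channels.
function allowsAffiliateLink(model: BusinessModel): boolean {
  return model !== "leaked";
}

console.log(isBusinessModel("freemium"));   // true
console.log(allowsAffiliateLink("leaked")); // false
```

Keeping the labels in a single `as const` array lets the type and the runtime check share one source of truth.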
Editorial independence
This site carries affiliate links. When you click from one of our service pages to a third-party service and subscribe, that service pays us — this is what funds our testing work. It never changes our scoring: the methodology predates the affiliation, tests are run without contact with the service, and no service sees its score before publication.
If a service reaches out asking us to "correct" a score or requesting a special retest, we say no. The protocol is the protocol, retests happen on the quarterly cadence, full stop.
Reproducibility
The 10 standard prompts are documented in our internal notes and can be requested by email at [email protected]. If you want to rerun the test yourself on a service, we'll send you the exact list, the execution protocol and the evaluation criteria. Our scores should reproduce within ±5 points from one tester to the next.
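If you rerun the protocol, a quick way to compare your numbers with ours is a tolerance check like the sketch below. The function name and the score keys are hypothetical, and applying the ±5 tolerance to each score individually is an assumption about how to read that figure.

```typescript
// Scores keyed by label (for example "overall" or an axis name).
type Scores = Record<string, number>;

// Returns the labels whose rerun score is missing or differs from the
// published score by more than the tolerance (inclusive bound).
function outOfTolerance(published: Scores, rerun: Scores, tolerance = 5): string[] {
  return Object.keys(published).filter((label) => {
    const other = rerun[label];
    return other === undefined || Math.abs(published[label] - other) > tolerance;
  });
}

// Made-up numbers for illustration only.
const publishedScores: Scores = { overall: 70, photorealism: 80 };
const rerunScores: Scores = { overall: 74, photorealism: 88 };

console.log(outOfTolerance(publishedScores, rerunScores)); // ["photorealism"]
```

An empty array means every compared score landed within tolerance. Anything it returns is a concrete inconsistency worth reporting.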