Methodology

How we test.

Every test follows the same protocol, on every service, on every pass. It's the only way to produce honest comparisons between platforms that all promise the same thing. This page documents that protocol in plain terms so you can (a) understand our scores, (b) reproduce them yourself if you want, and (c) push back when you spot an inconsistency.

The 10 standard prompts

Every tested service gets the exact same 10 prompts, in the same order. These prompts cover eight content dimensions we consider critical for characterizing a tool: raw photorealism, narrative mood, motion and animation, non-English prompt handling, stylistic creativity, body-type variety, adult maturity in the output, and multi-character generation.

Two extra prompts test the technical limits: the service's real maximum duration, and character consistency across successive clips.

Non-negotiable red line: no prompt targets a juvenile age. All prompts explicitly name an adult aged 25 or older. Any service that generates juvenile-looking content from a soft prompt is systematically flagged with a critical defect, regardless of its other scores.

The 9 scoring axes

Every service is scored on 9 axes, each out of 100, with a weighting that reflects our editorial priorities. Real NSFW content is measured on two complementary dimensions (kinks = variety, NSFW intensity = actual explicit delivery), photorealism weighs almost as much as variety, and character consistency is treated as a plus rather than a dominant criterion.

Axis and weight

Max video length: 5%

Maximum clip duration on the tested tier, measured in seconds. Minor axis in our weighting: video is still a differentiator, but no service on the market in 2026 goes past 10s without chaining, so the effective gap is small.

Photorealism: 22%

Visual quality on a standardized panel of 10 prompts. Human scoring backed up by automated CLIP score. Foundational axis for a porn-first, realism-first positioning.

Kinks variety: 25%

Range of scenarios, fetishes and archetypes supported without arbitrary refusals from the model. Tested with a set of 20 varied prompts. Heaviest axis: this is the line between a companion that tolerates NSFW and an actual porn generator.

NSFW intensity: 20%

How explicit the service actually gets when asked, without self-censoring or politely rewording the prompt. Distinct from variety: a "companion tolerating NSFW" scores low here, a "hardcore-first" service scores high.

Useful free tier: 10%

How much you can try without pulling out a card. Counts available generations, limits, watermarks, quality degradation.

Value for money: 7%

Effective price per usable generation on the lowest tier, compared against category peers.

UX / platform: 5%

Interface, generation time, mobile support, onboarding quality, absence of dark patterns. Deliberately weighted low: a premium UX doesn't save a service that fails to deliver the content.

Character consistency: 4%

Identity stability across successive clips, measured via InsightFace cosine similarity on 3 chained clips. Important for narrative use, secondary for one-shot exploration.

Language support: 2%

Quality of non-English prompt handling. Weighted low after the SEMrush analysis: 54.8% of French-language volume is actually on English keywords, so native FR support is secondary to the EN catalog.
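The character-consistency axis is described as a cosine similarity over 3 chained clips. As a rough sketch of that comparison step, the snippet below assumes one face embedding per clip has already been extracted (the extraction itself is not shown) and averages the similarity over every pair of clips. The pairwise-mean aggregation, the function names and the toy vectors are assumptions made for illustration, and the mapping from similarity to a score out of 100 is not specified on this page.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over every pair of clip embeddings.

    Assumption: pairwise averaging is one plausible way to combine
    three clips; the page does not say how the clips are aggregated.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Toy 3-dimensional vectors standing in for real face embeddings
# (hypothetical data, one vector per clip).
clips = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
similarity = mean_pairwise_similarity(clips)
```

In this toy case two clips match exactly and the third is orthogonal to both, so only one of the three pairs contributes to the mean.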

Overall score

The overall /100 score shown on every service page is the weighted average of the 9 axes, rounded to the nearest integer. There's no manual adjustment — the formula is fixed, public, and applied identically to every service.
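For readers who want to check a published score by hand, here is a minimal sketch of that formula. The weights come from the axis table above; the dictionary keys, the function name and the example scores are illustrative, and rounding ties upward is an assumption, since the text only says "nearest integer".

```python
import math

# Weights in percent, copied from the axis table. Key names are illustrative.
WEIGHTS = {
    "max_video_length": 5,
    "photorealism": 22,
    "kinks_variety": 25,
    "nsfw_intensity": 20,
    "free_tier": 10,
    "value_for_money": 7,
    "ux_platform": 5,
    "character_consistency": 4,
    "language_support": 2,
}


def overall_score(axis_scores):
    """Weighted average of the nine axis scores, rounded to an integer."""
    if set(axis_scores) != set(WEIGHTS):
        raise ValueError("expected exactly one score per axis")
    for axis, score in axis_scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{axis} score must be between 0 and 100")
    weighted_sum = sum(WEIGHTS[axis] * axis_scores[axis] for axis in WEIGHTS)
    # Assumption: ties round up (x.5 becomes x+1).
    return math.floor(weighted_sum / sum(WEIGHTS.values()) + 0.5)


# Hypothetical axis scores for an imaginary service.
example = {
    "max_video_length": 60,
    "photorealism": 80,
    "kinks_variety": 70,
    "nsfw_intensity": 75,
    "free_tier": 50,
    "value_for_money": 65,
    "ux_platform": 90,
    "character_consistency": 40,
    "language_support": 30,
}
score = overall_score(example)  # weighted sum 6935 / 100 = 69.35, rounds to 69
```

Because the three content axes (photorealism, kinks variety, NSFW intensity) carry 67% of the weight together, a service that is weak there cannot be rescued by the remaining six axes.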

Quarterly retest

AI services move fast. A model version changes, a new NSFW fine-tune ships, a restriction gets added, pricing shifts — any of it can happen in a few weeks. We retest every service at least once per quarter, and on an emergency basis if a major change is announced.

The last-tested date is shown on every service page. A score older than three months is considered unreliable until the next retest.
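The three-month rule can be expressed as a small date check. This is only a sketch: it assumes "three months" means calendar months counted from the last-tested date, and the helper names are hypothetical.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date forward by whole calendar months, clamping the day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp to the last day of the target month (Nov 30 + 3 months -> Feb 28).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def is_stale(last_tested, today, max_age_months=3):
    """True when the score is strictly older than the allowed age."""
    return today > add_months(last_tested, max_age_months)


# A score tested on 15 January 2026 stays valid through 15 April 2026.
fresh = is_stale(date(2026, 1, 15), date(2026, 4, 15))
stale = is_stale(date(2026, 1, 15), date(2026, 4, 16))
```

Passing `today` explicitly rather than reading the system clock keeps the check deterministic and easy to test.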

Reviewing traditional adult sites

We also publish profiles of traditional adult sites (tubes, cam sites, hentai archives, OnlyFans aggregators, premium VOD studios, etc.) — see /sites for the full directory and /best for the category guides. The methodology there is different in scope:

Data baseline: we mine SEMrush organic-position data for the largest adult comparison site (theporndude.com) and classify the results into our 13 expansion categories. The dataset is auto-generated from the SEMrush export and published as a TypeScript file in the codebase. SEO metrics (keyword count, top-keyword volume, theporndude traffic) are visible on each site profile.

Editorial layer: for the top sites in each category, we add hand-written descriptions, pros/cons, best-for positioning, business model classification (free-ad, tips, paywall, freemium, leaked, amateur-ugc) and feature tags. These come from direct observation of the site and domain knowledge — they are not algorithmic ratings, just documented opinions.

What we don't do: we do not run the 10-prompt protocol or 9-axis scoring on traditional sites. That protocol is designed to evaluate AI generators producing content from prompts — it doesn't apply to pre-existing content libraries. Comparing a tube site and an AI generator on the same scale would be intellectually dishonest.

Leaked content sites: Coomer, ThotHub, Fapello and similar OnlyFans-leak aggregators appear in the directory because they exist and have significant search traffic. Their listings explicitly flag the copyright and consent issues in pros/cons. We do not link to them through affiliate channels and they are not promoted in our editorial — they are documented because pretending the segment doesn't exist would be intellectually dishonest.

Editorial independence

This site carries affiliate links. When you click from one of our service pages to a third-party service and subscribe, that service pays us — this is what funds our testing work. It never changes our scoring: the methodology predates the affiliation, tests are run without contact with the service, and no service sees its score before publication.

If a service reaches out asking us to "correct" a score or requesting a special retest, we say no. The protocol is the protocol, retests happen on the quarterly cadence, full stop.

Reproducibility

The 10 standard prompts are documented in our internal notes and can be requested by email at [email protected]. If you want to rerun the test yourself on a service, we'll send you the exact list, the execution protocol and the evaluation criteria. Our scores should reproduce within ±5 points per tester.
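If you rerun the protocol, a quick way to compare your numbers against ours is to list the axes that fall outside the ±5 band. The sketch below assumes the tolerance is applied per axis, which the statement above does not spell out; the function name and sample scores are hypothetical.

```python
def out_of_tolerance(published, reproduced, tolerance=5):
    """Return {axis: difference} for axes that deviate by more than tolerance.

    Assumption: the tolerance is checked axis by axis on scores out of 100.
    """
    if set(published) != set(reproduced):
        raise ValueError("both score sets must cover the same axes")
    return {
        axis: reproduced[axis] - published[axis]
        for axis in published
        if abs(reproduced[axis] - published[axis]) > tolerance
    }


# Hypothetical scores: one axis reproduces within tolerance, one does not.
published = {"photorealism": 80, "kinks_variety": 70}
reproduced = {"photorealism": 84, "kinks_variety": 62}
deviations = out_of_tolerance(published, reproduced)
```

An empty result means every axis reproduced within the stated band; anything else is a concrete inconsistency worth reporting.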