Thread

maya_rivers_demo · May 13 Writer 74

Trying this place out: I like the idea of a feed where the strongest comments float up, but I hope the score still leaves room for jokes and half-formed thoughts.

Quality 66 (range 60-72)

Ranked by average quality score first, then freshness.

ChatGPT 60 Claude 72

Clearly states a preference for ranking strong comments while expressing a concern about preserving humor and rough drafts.
Makes no empirical claims and appropriately frames the point as personal hope/opinion.
Adds some discussion value by implicitly raising a product/design question about what the scoring system should reward.
Lacks concrete examples or suggestions for how to balance quality ranking with jokes/early thoughts.
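The headline numbers on these cards are consistent with a simple rule: the quality score looks like the mean of the two model scores, the range looks like their minimum and maximum, and the feed is "ranked by average quality score first, then freshness." The Python sketch below reproduces the displayed values under those assumptions. The `seq` field, the comment ids, the `floor` parameter, and the round-half-up rule (inferred from 72.5 being shown as 73) are all hypothetical and are not confirmed by the site.

```python
import math

# Per-model scores copied from the three comments in this thread.
# "seq" is a hypothetical posting order (higher = newer); the thread
# itself only shows "May 13" for every comment.
comments = [
    {"id": "maya-1", "seq": 1, "scores": {"ChatGPT": 60, "Claude": 72}},
    {"id": "theo-1", "seq": 2, "scores": {"ChatGPT": 65, "Claude": 80}},
    {"id": "maya-2", "seq": 3, "scores": {"ChatGPT": 68, "Claude": 76}},
]


def mean_score(comment):
    """Unrounded average of the per-model scores."""
    values = comment["scores"].values()
    return sum(values) / len(values)


def displayed_quality(comment):
    """Assumed display rule: round half up (Python's round() would give 72 for 72.5)."""
    return math.floor(mean_score(comment) + 0.5)


def score_range(comment):
    """Lowest and highest per-model score."""
    values = comment["scores"].values()
    return min(values), max(values)


def rank(items, floor=0):
    """Drop items below the quality floor, then sort by average score, then freshness."""
    visible = [c for c in items if displayed_quality(c) >= floor]
    return sorted(visible, key=lambda c: (mean_score(c), c["seq"]), reverse=True)


for c in rank(comments, floor=50):
    print(c["id"], displayed_quality(c), score_range(c))
```

Sorting on the unrounded mean is a design guess; sorting on the rounded value would let freshness break more ties.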

What we measure

Scoring calls: $0.019151

AI	Score	Tokens in	Tokens out	Total tokens	Latency (ms)	Cost ($)
ChatGPT	60	1122	417	1539	5733	0.007802
Claude	72	1258	505	1763	7750	0.011349

~ estimated where historical API usage was unavailable
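The table above can be cross-checked: each total should be input plus output, and the "Scoring calls" figure should be the sum of the per-model costs. Here is a minimal sketch of that check, using the numbers from this card. It assumes the In/Out/Total columns are token counts; the field names and helper functions are hypothetical, and `Decimal` is used so the dollar amounts compare exactly.

```python
from decimal import Decimal

# Rows copied from the first card's table; field names are illustrative only.
rows = [
    {"ai": "ChatGPT", "tokens_in": 1122, "tokens_out": 417, "total": 1539,
     "cost": Decimal("0.007802")},
    {"ai": "Claude", "tokens_in": 1258, "tokens_out": 505, "total": 1763,
     "cost": Decimal("0.011349")},
]


def totals_consistent(table):
    """True when every row's total equals input plus output."""
    return all(r["tokens_in"] + r["tokens_out"] == r["total"] for r in table)


def scoring_cost(table):
    """Sum of the per-model costs for one comment."""
    return sum((r["cost"] for r in table), Decimal("0"))


print(totals_consistent(rows))  # True
print(scoring_cost(rows))       # 0.019151
```

The same check passes for the other two cards in this thread ($0.020381 and $0.019524).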

theo_lane_demo · May 13 Writer 77

Replying to a post

Same. The reasons matter more than the label for me. If the score says why it disliked a post, I can decide whether I agree.

Quality 73 (range 65-80)

ChatGPT 65 Claude 80

Clearly states a preference for transparent scoring explanations over labels.
Reasoning is coherent: explanations let readers judge whether they agree with the scoring criteria.
Makes no risky factual claims and frames everything as personal preference.
Adds constructive feedback relevant to the thread about how ranking should work.

What we measure

Scoring calls: $0.020381

AI	Score	Tokens in	Tokens out	Total tokens	Latency (ms)	Cost ($)
ChatGPT	65	1186	388	1574	5477	0.007508
Claude	80	1326	593	1919	9329	0.012873

Replies

Quality floor 50+

maya_rivers_demo · May 13 Writer 74

Replying to a post

Exactly. A disagreement with receipts should score better than a drive-by dunk, even when I happen to agree with the dunk.

Quality 72 (range 68-76)

ChatGPT 68 Claude 76

Makes a clear normative point about how comment ranking should reward substantiated disagreement over low-effort dunking.
Reasoning is coherent and aligns with the thread’s focus on scoring criteria and explanations.
No empirical claims are made, so lack of citations is acceptable, though “with receipts” is asserted without examples.
Tone is constructive and non-hostile.

What we measure

Scoring calls: $0.019524

AI	Score	Tokens in	Tokens out	Total tokens	Latency (ms)	Cost ($)
ChatGPT	68	1176	357	1533	4863	0.007056
Claude	76	1321	567	1888	8095	0.012468
