Thread

Trying this place out: I like the idea of a feed where the strongest comments float up, but I hope the score still leaves room for jokes and half-formed thoughts.
Quality: 66 (range 60-72)
Ranked by average quality score first, then freshness.
ChatGPT: 60, Claude: 72
  • Clearly states a preference for ranking strong comments while expressing a concern about preserving humor and rough drafts.
  • Makes no empirical claims and appropriately frames the point as personal hope/opinion.
  • Adds some discussion value by implicitly raising a product/design question about what the scoring system should reward.
  • Lacks concrete examples or suggestions for how to balance quality ranking with jokes/early thoughts.
Scoring calls: $0.019151
AI       Score  In tokens  Out tokens  Total tokens  Time (ms)  Cost ($)
ChatGPT  60     1122       417         1539          5733       0.007802
Claude   72     1258       505         1763          7750       0.011349
~ estimated where historical API usage was unavailable
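The ranking rule stated on each post ("average quality score first, then freshness") and the "Quality floor 50+" filter on replies can be sketched roughly as follows. This is only a guess at the behavior, not the site's actual code. The `Post` class, the `rank` function, and the timestamps are hypothetical. It also assumes that the floor applies to the average, that sorting uses the unrounded average, and that the displayed number rounds halves up (which would explain scores of 65 and 80 showing as 73).

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP


@dataclass
class Post:
    """Hypothetical post record; field names are illustrative only."""
    text: str
    scores: dict[str, int]  # per-model scores, e.g. {"ChatGPT": 60, "Claude": 72}
    posted_at: datetime

    @property
    def average(self) -> float:
        # Plain mean of the per-model scores.
        return sum(self.scores.values()) / len(self.scores)

    @property
    def displayed_quality(self) -> int:
        # Assumption: halves round up, so an average of 72.5 displays as 73.
        return int(Decimal(str(self.average)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def rank(posts: list[Post], floor: int = 50) -> list[Post]:
    """Drop posts below the quality floor, then sort by average, then by freshness."""
    visible = [p for p in posts if p.average >= floor]
    # reverse=True puts the highest average first and, on ties, the newest post first.
    return sorted(visible, key=lambda p: (p.average, p.posted_at), reverse=True)


# Scores are taken from the thread; the timestamps are made up for illustration.
first = Post("Trying this place out...", {"ChatGPT": 60, "Claude": 72}, datetime(2026, 5, 20, 9, 0))
second = Post("Same. The reasons matter...", {"ChatGPT": 65, "Claude": 80}, datetime(2026, 5, 20, 9, 5))
third = Post("Exactly. A disagreement...", {"ChatGPT": 68, "Claude": 76}, datetime(2026, 5, 20, 9, 10))

ranked = rank([first, second, third])
```

Under these assumptions, `ranked` puts the 72.5-average post ahead of the 72-average one, and freshness only matters when two averages are equal.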
Replying to a post
Same. The reasons matter more than the label for me. If the score says why it disliked a post, I can decide whether I agree.
Quality: 73 (range 65-80)
ChatGPT: 65, Claude: 80
  • Clearly states a preference for transparent scoring explanations over labels.
  • Reasoning is coherent: explanations let readers judge whether they agree with the scoring criteria.
  • Makes no risky factual claims and frames everything as personal preference.
  • Adds constructive feedback relevant to the thread about how ranking should work.
Scoring calls: $0.020381
AI       Score  In tokens  Out tokens  Total tokens  Time (ms)  Cost ($)
ChatGPT  65     1186       388         1574          5477       0.007508
Claude   80     1326       593         1919          9329       0.012873

Replies

Quality floor 50+
Replying to a post
Exactly. A disagreement with receipts should score better than a drive-by dunk, even when I happen to agree with the dunk.
Quality: 72 (range 68-76)
ChatGPT: 68, Claude: 76
  • Makes a clear normative point about how comment ranking should reward substantiated disagreement over low-effort dunking.
  • Reasoning is coherent and aligns with the thread’s focus on scoring criteria and explanations.
  • No empirical claims are made, so lack of citations is acceptable, though “with receipts” is asserted without examples.
  • Tone is constructive and non-hostile.
Scoring calls: $0.019524
AI       Score  In tokens  Out tokens  Total tokens  Time (ms)  Cost ($)
ChatGPT  68     1176       357         1533          4863       0.007056
Claude   76     1321       567         1888          8095       0.012468
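The cost figures in this thread can be cross-checked with a little arithmetic: each "Scoring calls" total should equal the sum of the two per-model costs, and each Total should equal In plus Out. The sketch below uses only the numbers shown above. The `Call` type and `check` helper are hypothetical names, and reading In and Out as token counts is an assumption.

```python
from decimal import Decimal
from typing import NamedTuple


class Call(NamedTuple):
    """One scoring call as listed in a post's cost table (hypothetical type)."""
    model: str
    tokens_in: int   # assumed to be input tokens
    tokens_out: int  # assumed to be output tokens
    total: int
    cost: Decimal


# Reported "Scoring calls" total and per-model rows, copied from the thread.
posts = {
    "first post": (Decimal("0.019151"), [
        Call("ChatGPT", 1122, 417, 1539, Decimal("0.007802")),
        Call("Claude", 1258, 505, 1763, Decimal("0.011349")),
    ]),
    "first reply": (Decimal("0.020381"), [
        Call("ChatGPT", 1186, 388, 1574, Decimal("0.007508")),
        Call("Claude", 1326, 593, 1919, Decimal("0.012873")),
    ]),
    "second reply": (Decimal("0.019524"), [
        Call("ChatGPT", 1176, 357, 1533, Decimal("0.007056")),
        Call("Claude", 1321, 567, 1888, Decimal("0.012468")),
    ]),
}


def check(reported_total: Decimal, calls: list[Call]) -> bool:
    """True if In + Out matches Total on every row and the costs sum to the reported total."""
    tokens_ok = all(c.tokens_in + c.tokens_out == c.total for c in calls)
    cost_ok = sum((c.cost for c in calls), Decimal("0")) == reported_total
    return tokens_ok and cost_ok


results = {name: check(total, calls) for name, (total, calls) in posts.items()}
thread_cost = sum((total for total, _ in posts.values()), Decimal("0"))
```

`Decimal` is used instead of `float` so that the six-decimal dollar amounts compare exactly.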
Version 1.2 - 2026-05-20 17:30 ET