RAG accuracy jumped from 10% to 60% when I added outcome scoring

(roampal.ai)

11 points | by roampal 2 hours ago

3 comments

realaleris149 44 minutes ago
> When I say "thanks, that worked," that memory gets promoted. When I say "no, that's wrong," it gets demoted. …
> No manual tagging.
I think this is also a kind of tagging.
- roampal 40 minutes ago
  You're right, it is a form of tagging technically. The difference is you're already saying "thanks that worked" or "nah that's wrong" anyway. No extra step, it just listens.
ramenlover 1 hour ago
How did you measure the 60% improvement rate?
- roampal 1 hour ago
  Ran a 4-way comparison test across 200 query-memory pairs:
  - Baseline RAG (embedding similarity only): 10%
  - RAG + reranker: 20%
  - Outcomes only (no reranker): 60%
  - RAG + outcome scoring (mature memories with 20+ uses): 60%
  "Accuracy" = correct memory ranked #1 for the query. The outcome scoring uses the Wilson score lower bound: memories that consistently get positive feedback from the "user" get boosted, and ones that fail get demoted.
  Test methodology: https://github.com/roampal-ai/roampal/blob/main/dev/benchmar...
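For readers unfamiliar with the Wilson score lower bound mentioned above, here is a minimal sketch of the standard formula. It is not taken from the roampal code: the function name `wilson_lower_bound`, the idea of counting positive feedback against total uses, and the default `z = 1.96` (roughly a 95% confidence level) are all assumptions for illustration.

```python
import math


def wilson_lower_bound(positives: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a success proportion.

    positives: number of positive outcomes (hypothetical "that worked" signals)
    total:     number of times the memory was used and received feedback
    z:         normal quantile; 1.96 is an assumed default (~95% confidence)
    """
    if total == 0:
        # No feedback yet, so there is no evidence either way.
        return 0.0
    p_hat = positives / total
    z2 = z * z
    centre = p_hat + z2 / (2 * total)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)


# A memory with one positive use scores well below one with twenty.
few_uses = wilson_lower_bound(1, 1)
many_uses = wilson_lower_bound(20, 20)
mostly_good = wilson_lower_bound(18, 20)
```

The point of using the lower bound rather than the raw success rate is that a memory with a single positive outcome does not outrank one with a long, mostly positive track record. That would be consistent with the "mature memories with 20+ uses" condition in the results, though how the real scoring combines this with embedding similarity is not shown here.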
mistrial9 1 hour ago
What kind of blog post is this? It reads like pure advertising, with urgent "install this code now" talk at the end. Impolite at best. Not great front-page material, IMHO.
- roampal 1 hour ago
  Fair point, the install instructions at the end were meant as a "here's how to try it if interested" but I can see how it reads as pushy. The core of the post is about the outcome scoring approach itself. Should've led with more depth on the methodology. Thanks for the feedback.
- udfalkso 1 hour ago
  It’s not pushy at all imo