Prompt quality is often judged subjectively. This project provides a repeatable simulation pipeline with transparent scoring, configurable weighting, and benchmark workflows that can be run from both ...