
1. What metrics and algorithms are used?
2. How frequent are surveys with human raters?
3. How are benchmarking results interpreted?



@ljmanso.bsky.social