Will the MMBench test set score drop after DPO? Does this repo support DPO without loading a separate reward model?