Commit 871db5d

Update README.md
1 parent 7466603 commit 871db5d

File tree

1 file changed: +4 -7 lines changed


README.md

Lines changed: 4 additions & 7 deletions
```diff
@@ -62,8 +62,6 @@ The task statistics are shown as follows:
 
 For the three LLMs (Llama2-70b-Chat, Mixtral-8x7b-Instruct, and GPT-3.5-Turbo), we evaluate a total of 106,758 segments drawn from 54 MT systems. For GPT-4, we restrict the evaluation to Chinese–English, using 30 randomly selected segments per MT system, for a total of 600 samples ("WMT22-Subset").
 
-The querys and responses of the LLMs can be found in "[results](./results/)".
-
 <h2 align="center">EAPrompt Implementation</h2>
 
 The main implementation is provided in [./EAPrompt](./EAPrompt/).
@@ -92,11 +90,9 @@ All prompt types used in the study are provided for replication. We adopt a stru
 - `"SRC"` for **reference-free** evaluation (source only);
 - `"REF"` for **reference-based** evaluation.
 
-> Note: For the counting step, we use a simple identifier "COUNT". No additional keywords are required.
-
-According to our ablation experiments, we recommend using the prompt type **ERROR\_\{LANG\}\_ITEMIZED\_\{IS_REF\}** as the default configuration, For example: ERROR_ENDE_ITEMIZED_SRC
+> Note: For the counting step, we use a simple identifier `"COUNT"`. No additional keywords are required.
 
-According to our ablation experiments, we recommend **ERROR\_\{LANG\}\_ITEMIZED\_\{IS_REF\}** as prompt type, e.g. ERROR_ENDE_ITEMIZED_SRC.
+According to our ablation experiments, we recommend using the prompt type **ERROR\_\{LANG\}\_ITEMIZED\_\{IS_REF\}** as the default configuration, for example: `ERROR_ENDE_ITEMIZED_SRC`.
 
 **🚀 Generating Queries & Responses**
 
@@ -109,9 +105,10 @@ For large-scale evaluation across multiple MT systems, we provide two example sc
 
 These scripts demonstrate the complete workflow for evaluating entire datasets efficiently.
 
-
 <h2 align="center">Results and Findings</h2>
 
+The queries and responses of the LLMs can be found in "[results](./results/)".
+
 1. **EAPrompt significantly enhances the performance of LLMs at the system level**. Notably, prompting *GPT-3.5-Turbo* with EAPrompt outperforms all other metrics and prompting strategies, establishing a new state-of-the-art.
 
 2. **EAPrompt surpasses GEMBA in 8 out of 9 test scenarios** across various language models and language pairs at the segment level.
```
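
The prompt-type naming scheme touched by this commit (**ERROR\_\{LANG\}\_ITEMIZED\_\{IS_REF\}**, with `"SRC"` for reference-free and `"REF"` for reference-based evaluation) can be sketched as a small helper. This is a minimal illustration of the convention only; `build_prompt_type` and its parameters are assumptions for the sketch, not part of the repository's API.

```python
# Illustrative sketch of the README's prompt-type naming convention:
# ERROR_{LANG}_ITEMIZED_{IS_REF}, where IS_REF is "SRC" (reference-free,
# source only) or "REF" (reference-based). Hypothetical helper, not the
# repository's actual code.
def build_prompt_type(lang: str, use_reference: bool) -> str:
    mode = "REF" if use_reference else "SRC"
    return f"ERROR_{lang.upper()}_ITEMIZED_{mode}"

# The recommended default configuration from the README:
print(build_prompt_type("ende", use_reference=False))  # ERROR_ENDE_ITEMIZED_SRC
```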

0 commit comments
