README.md (+4 −7: 4 additions & 7 deletions)
@@ -62,8 +62,6 @@ The task statistics are shown as follows:

For the three LLMs (Llama2-70b-Chat, Mixtral-8x7b-Instruct, and GPT-3.5-Turbo), we evaluate a total of 106,758 segments drawn from 54 MT systems. For GPT-4, we restrict the evaluation to Chinese–English, using 30 randomly selected segments per MT system, for a total of 600 samples ("WMT22-Subset").

-
-The queries and responses of the LLMs can be found in "[results](./results/)".

<h2 align="center">EAPrompt Implementation</h2>

The main implementation is provided in [./EAPrompt](./EAPrompt/).
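To illustrate the subset construction described above (30 randomly selected segments per MT system, 600 samples in total), here is a minimal sketch; the data layout and all names are assumptions for illustration, not the repository's code.

```python
import random

# Hypothetical layout: one list of candidate segment IDs per MT system.
# 600 total samples at 30 per system implies 20 Chinese-English systems.
segments_by_system = {f"system_{i:02d}": list(range(2000)) for i in range(20)}

random.seed(42)  # fix the seed so the subset is reproducible
wmt22_subset = {
    system: random.sample(ids, k=30)  # 30 random segments per system
    for system, ids in segments_by_system.items()
}
assert sum(len(v) for v in wmt22_subset.values()) == 600
```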
@@ -92,11 +90,9 @@ All prompt types used in the study are provided for replication. We adopt a stru

- `"SRC"` for **reference-free** evaluation (source only);
- `"REF"` for **reference-based** evaluation.

-
-> Note: For the counting step, we use a simple identifier "COUNT". No additional keywords are required.
-
+> Note: For the counting step, we use a simple identifier `"COUNT"`. No additional keywords are required.

-According to our ablation experiments, we recommend **ERROR\_\{LANG\}\_ITEMIZED\_\{IS_REF\}** as prompt type, e.g. ERROR_ENDE_ITEMIZED_SRC.
+According to our ablation experiments, we recommend using the prompt type **ERROR\_\{LANG\}\_ITEMIZED\_\{IS_REF\}** as the default configuration. For example: `ERROR_ENDE_ITEMIZED_SRC`.
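To make the naming scheme concrete, a minimal sketch of how these identifiers could be composed; the helper below is hypothetical and not part of the repository's API.

```python
def build_prompt_type(lang_pair: str, is_ref: bool, style: str = "ITEMIZED") -> str:
    """Compose an identifier such as ERROR_ENDE_ITEMIZED_SRC."""
    suffix = "REF" if is_ref else "SRC"  # reference-based vs. reference-free
    return f"ERROR_{lang_pair.upper()}_{style}_{suffix}"

assert build_prompt_type("ende", is_ref=False) == "ERROR_ENDE_ITEMIZED_SRC"

# The follow-up counting turn reuses a single identifier, per the note above.
COUNT_PROMPT_TYPE = "COUNT"
```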
**🚀 Generating Queries & Responses**
@@ -109,9 +105,10 @@ For large-scale evaluation across multiple MT systems, we provide two example sc

These scripts demonstrate the complete workflow for evaluating entire datasets efficiently.
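The repository's two scripts are the authoritative workflow; the sketch below only illustrates the general shape of such a batch loop, with placeholder loaders and an assumed LLM call.

```python
import json
from pathlib import Path

def load_outputs(system: str) -> list[str]:
    # Placeholder: the real scripts read each system's translations from disk.
    return [f"{system} translation {i}" for i in range(3)]

def query_llm(prompt_type: str, segment: str) -> str:
    # Placeholder for the actual LLM request issued by the scripts.
    return f"[{prompt_type}] response for: {segment}"

def evaluate_all(systems: list[str], prompt_type: str, out_dir: str = "results") -> None:
    """Query every segment of every MT system and store responses per system."""
    Path(out_dir).mkdir(exist_ok=True)
    for system in systems:
        responses = [query_llm(prompt_type, seg) for seg in load_outputs(system)]
        with open(Path(out_dir) / f"{system}.json", "w", encoding="utf-8") as f:
            json.dump(responses, f, ensure_ascii=False, indent=2)

evaluate_all(["system_A", "system_B"], "ERROR_ENDE_ITEMIZED_SRC")
```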
-
<h2 align="center">Results and Findings</h2>

+The queries and responses of the LLMs can be found in "[results](./results/)".
+
1. **EAPrompt significantly enhances the performance of LLMs at the system level**. Notably, prompting *GPT-3.5-Turbo* with EAPrompt outperforms all other metrics and prompting strategies, establishing a new state-of-the-art.
2. **EAPrompt surpasses GEMBA in 8 out of 9 test scenarios** across various language models and language pairs at the segment level.
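Segment-level comparisons like the one above require mapping each counting-step response to a numeric score. A hedged sketch, assuming MQM-style weights (5 per major error, 1 per minor) and a simple response format; neither is confirmed by this README.

```python
import re

def score_from_counts(response: str) -> int:
    """Convert a counting-step response into a (negative) segment score."""
    major = int(re.search(r"major errors?:\s*(\d+)", response, re.I).group(1))
    minor = int(re.search(r"minor errors?:\s*(\d+)", response, re.I).group(1))
    return -(5 * major + 1 * minor)  # fewer and lighter errors => higher score

print(score_from_counts("Major errors: 1\nMinor errors: 2"))  # -7
```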