This repository contains the code for our ECIR 2023 accepted work: Towards Effective Paraphrasing for Information Disguise.
If you face any issues, you can contact the author(s) at [email protected].
code/beam_search_code/Disguise Text.ipynb: Shows the disguise of a true sentence (query) via our model.
code/beam_search_code/beam_helper: Contains all the helper modules for our model.
  beam_utils.py: Contains the code dealing with single-level phrase substitution, beam search, constituency parse tree creation, etc.
  synonyms_store.py: Contains the code to get synonyms of a term in the counter-fitted synonym vector space.
  faiss_fetch.py: Contains the code for initializing DPR and fetching the top K relevant documents.
  perplexity_calculation.py: Contains the code for the perplexity calculation.
  fetch_use_scores.py: Contains the code to create the Universal Sentence Encoder (USE) encoding for a given piece of text.
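For readers unfamiliar with the perplexity score that perplexity_calculation.py is concerned with, the following is a minimal sketch of the standard definition: the exponential of the average negative log-likelihood per token. It is not the repository's code. The function name `perplexity` and the hard-coded probabilities are hypothetical; in practice the per-token log-probabilities would come from a language model.

```python
import math


def perplexity(token_log_probs):
    """Perplexity of a sequence from per-token natural-log probabilities.

    Hypothetical helper for illustration; not taken from perplexity_calculation.py.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_neg_log_likelihood = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_neg_log_likelihood)


# Made-up probabilities standing in for a language model's output.
fluent = [math.log(0.5), math.log(0.25), math.log(0.5)]
awkward = [math.log(0.05), math.log(0.01), math.log(0.1)]

print(perplexity(fluent))
print(perplexity(awkward))
```

A lower value means the language model finds the sentence more predictable, which is commonly used as a proxy for fluency.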
code/beam_search_code/counter-fitted-vectors.txt: Counter-fitted vectors used for fetching synonyms.
data/all_syns.json: Contains the 10 nearest neighbours for all terms in the dictionary (the nearest neighbours were calculated using Facebook AI Similarity Search (FAISS) on the vectors in counter-fitted-vectors.txt).
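The nearest-neighbour lookup described above can be illustrated without FAISS. The sketch below is a stand-in under several assumptions: it uses brute-force cosine similarity in pure Python on a few made-up vectors, whereas the repository uses FAISS on counter-fitted-vectors.txt and keeps 10 neighbours per term. The names `parse_vectors` and `nearest_neighbours`, the toy vectors, the assumed "word v1 v2 ..." line layout, and the choice of cosine similarity are all hypothetical.

```python
import math


def parse_vectors(lines):
    """Parse 'word v1 v2 ...' lines into a dict (assumed file layout)."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(vectors, k):
    """Map each term to its k most similar other terms (brute force)."""
    result = {}
    for term, vec in vectors.items():
        scored = [(cosine(vec, other_vec), other)
                  for other, other_vec in vectors.items() if other != term]
        scored.sort(reverse=True)  # highest similarity first
        result[term] = [other for _, other in scored[:k]]
    return result


# Toy data, not real counter-fitted vectors.
toy_lines = [
    "happy 0.9 0.1 0.0",
    "glad 0.8 0.2 0.1",
    "joyful 0.85 0.15 0.05",
    "car 0.0 0.1 0.9",
]
all_syns = nearest_neighbours(parse_vectors(toy_lines), k=2)
print(all_syns["happy"])
```

Brute force compares every pair of terms, which is why an approximate or indexed search library such as FAISS is the practical choice for a full vocabulary.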
sql_lite_dbs/<name>.db: Expects the database containing the metadata and contents of the document store (to be used by DPR).
code/faiss_indexes/<name>.faiss: Expects the vectors for the documents in the document store.
code/faiss_indexes/exp_with_two_thou_short.json: Expects the configuration file containing the parameters describing how to read the ".faiss" file.
Details of the conda environment for the above codebase are present in adversarial_search.yaml.
We use Haystack's DPR implementation.
Parameter Name | Description |
MAX_DEPTH | Number of levels in the beam search tree, i.e. the maximum number of phrase substitutions allowed in the query |
ALPHA_VAL | |
NUM_PERPLEXITY_NODES_TO_EXPAND | |
BeamWidth | Maximum number of nodes at each level of the beam tree |
NUM_FAISS_DOCS_TO_RETRIEVE | Maximum number of relevant documents fetched for the query, among which the source document's presence is checked |
SIMILARITY_CUT_OFF_THRESHOLD | |
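To make the roles of MAX_DEPTH and BeamWidth concrete, here is a minimal, generic beam search over single-word substitutions. It is a sketch under stated assumptions, not the repository's implementation: the `SYNONYMS` table, the placeholder `score` function (which simply counts substituted words), and all function names are hypothetical, and the model's actual scoring is not reproduced here.

```python
# Hypothetical toy synonym table, for illustration only.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "joyful"],
    "dog": ["hound"],
}


def expand(words):
    """Yield every sentence reachable by substituting exactly one word."""
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word, []):
            yield words[:i] + (synonym,) + words[i + 1:]


def score(candidate, original):
    """Placeholder objective: count positions that differ from the original."""
    return sum(1 for a, b in zip(candidate, original) if a != b)


def beam_search(query, max_depth, beam_width):
    original = tuple(query.split())
    beam = [original]
    for _ in range(max_depth):  # one level per allowed substitution
        candidates = set()
        for node in beam:
            candidates.update(expand(node))
        if not candidates:
            break  # nothing left to substitute
        # Keep at most beam_width nodes per level; ties are broken alphabetically.
        ranked = sorted(candidates, key=lambda c: (-score(c, original), c))
        beam = ranked[:beam_width]
    return [" ".join(words) for words in beam]


results = beam_search("the quick happy dog", max_depth=2, beam_width=3)
print(results)
```

With `max_depth=2`, every returned sentence differs from the query in at most two words, and no level of the search ever holds more than three candidates.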
The work can be cited as:
@inproceedings{10.1007/978-3-031-28238-6_22,
author = {Agarwal, Anmol and Gupta, Shrey and Bonagiri, Vamshi and Gaur, Manas and Reagle, Joseph and Kumaraguru, Ponnurangam},
title = {Towards Effective Paraphrasing For Information Disguise},
year = {2023},
isbn = {978-3-031-28237-9},
publisher = {Springer Nature Switzerland},
address = {Cham},
url = {https://doi.org/10.1007/978-3-031-28238-6_22},
doi = {10.1007/978-3-031-28238-6_22},
booktitle = {Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part II},
pages = {331–340},
keywords = {Neural information retrieval, Adversarial retrieval, Information disguise, Paraphrasing, Computational ethics},
location = {Dublin, Ireland}
}