PersonalityPrediction

This repository contains the code used for the experiments reported in the paper published in the Journal of Personality (Journal Version, Preprint version).

Installation

Download the data.zip file from https://drive.google.com/drive/folders/1WSJl_EW8hJ50IyygE2y0t5OiEmzlZtMD?usp=sharing and unzip it in the project directory.
Download yelp_academic_dataset_review.json from https://www.yelp.com/dataset/download and put it in the data/yelp_dataset folder.
Install the dependencies: pip install -r requirements.txt
Initialize NLTK: python -m embedding_tuning.nltk_init
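
As a quick sanity check after the steps above, the following minimal sketch reports which of the paths named in the installation steps are still missing. The function name missing_data_files and its project_dir parameter are illustrative assumptions and are not part of the repository; the contents of data.zip beyond the data directory itself are not checked, because they are not listed here.

```python
from pathlib import Path

# Paths taken from the installation steps, relative to the project directory.
EXPECTED_PATHS = (
    Path("data"),
    Path("data") / "yelp_dataset" / "yelp_academic_dataset_review.json",
)


def missing_data_files(project_dir="."):
    """Return the expected paths (as POSIX strings) that do not exist yet.

    Hypothetical helper: it only checks for existence, not file contents.
    """
    root = Path(project_dir)
    return [p.as_posix() for p in EXPECTED_PATHS if not (root / p).exists()]
```

Calling missing_data_files() from the project directory returns an empty list once both downloads are in place.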

Project structure

The project consists of the following directories.

data

Contains embeddings (GloVe and tuned embeddings), the known terms scoring dataset and the Yelp reviews corpus.

embedding_tuning

Module that tunes the GloVe embeddings.

tune_embedding.py: tunes the GloVe embeddings by training the CNN models on the Yelp reviews corpus and saves the tuned embeddings in the data directory. Each trait i (i=0: trait O, i=1: trait C, i=2: trait E, i=3: trait A, i=4: trait N) has its own i_trait subdirectory containing the pickle file with the hidden-layer weight matrix of the CNN model for trait i. Other pickle files are stored there as well: the embedding vocabulary, the word frequencies, the train/test inputs and outputs, and the initial weights.
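
The trait indexing and the per-trait subdirectory naming described above can be sketched as follows. This is a minimal illustration, not the repository's code: TRAITS and trait_subdir are hypothetical names, and base_dir stands for whichever directory under data holds the tuned embeddings, which is not spelled out here.

```python
from pathlib import Path

# Trait index -> trait letter, as listed in the description of tune_embedding.py.
TRAITS = {0: "O", 1: "C", 2: "E", 3: "A", 4: "N"}


def trait_subdir(base_dir, trait_index):
    """Return the '<i>_trait' subdirectory for a trait index in 0..4.

    base_dir is an assumption: the exact parent directory is not documented.
    """
    if trait_index not in TRAITS:
        raise ValueError(f"trait index must be one of {sorted(TRAITS)}")
    return Path(base_dir) / f"{trait_index}_trait"
```

Iterating over TRAITS.items() and calling trait_subdir for each index gives the five directories, one per trait, in the order O, C, E, A, N.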

fnn

Module that trains the FNN models and runs the performance tests (coherence, KFCV and embedding tests).

coherence_test.py: performs the coherence test on the specified embedding. Results are stored in outputs/(embedding)/coherence_test/(k)_folds/(d)_dist/final_performances_coherence_(d)dist_(k)folds_(embedding).xlsx, where embedding is the specified embedding, k is the specified number of folds and d is the specified type of neighbors' set.
kfcv_test.py: performs the K-fold Cross-Validation test on the specified embedding. Results are stored in outputs/(embedding)/KFCV/(k)_folds/final_performances_(k)fcv_(embedding).xlsx, where embedding is the specified embedding and k is the specified number of folds.
predict.ipynb: notebook that trains the five FNN models on all the known terms and stores the unknown terms' marker indices for the five traits in outputs/(embedding)/predictions.xlsx, where embedding is the specified embedding.
test_embedding.py: performs the KNN test on the specified embedding. Performances are stored in outputs/(embedding)/KNN/performances_(k)nn_(embedding).xlsx, where embedding is the specified embedding and k is the specified number of nearest neighbors.
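
To locate a result file without retyping the templates above, the paths can be built with a few small helpers. This is a sketch under stated assumptions: the function names are hypothetical, and it assumes that embedding, k and d act as placeholders in every directory and file name of the coherence test path, in the same way as in the other three paths.

```python
from pathlib import Path

# Root of the result tree named in the descriptions above.
OUTPUTS = Path("outputs")


def coherence_results(embedding, k, d):
    """Path of the coherence test results for k folds and neighbors' set type d."""
    name = f"final_performances_coherence_{d}dist_{k}folds_{embedding}.xlsx"
    return OUTPUTS / embedding / "coherence_test" / f"{k}_folds" / f"{d}_dist" / name


def kfcv_results(embedding, k):
    """Path of the K-fold cross-validation results for k folds."""
    name = f"final_performances_{k}fcv_{embedding}.xlsx"
    return OUTPUTS / embedding / "KFCV" / f"{k}_folds" / name


def knn_results(embedding, k):
    """Path of the KNN test performances for k nearest neighbors."""
    return OUTPUTS / embedding / "KNN" / f"performances_{k}nn_{embedding}.xlsx"


def predictions(embedding):
    """Path of the predictions written by predict.ipynb."""
    return OUTPUTS / embedding / "predictions.xlsx"
```

The embedding names and the values of k and d accepted by the scripts are set in their configuration files; any value passed to these helpers is only substituted into the template.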

Running configuration

Each module has its own config subfolder. For each py file listed in the Project structure section, this subfolder contains a file whose name ends with _config.py. To set the running configuration of a py file for the first time, run the associated _config.py file: a yaml file with the same name will be created. Edit the yaml file with the desired running configuration, then run the py file. The meaning of each parameter is explained in the comments of the _config.py file.
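
The workflow above can be sketched as a small helper that tells which step comes next for a given script. Several details are assumptions rather than facts about the repository: the helper names are hypothetical, the config file is assumed to be named after the script (for example kfcv_test_config.py for kfcv_test.py), and the generated file is assumed to use the .yaml extension.

```python
from pathlib import Path


def config_files(module, script, yaml_ext=".yaml"):
    """Return the assumed (config_py, config_yaml) paths for a script.

    Assumption: the config file is '<script stem>_config.py' inside
    '<module>/config', and the yaml file shares its name.
    """
    stem = Path(script).stem
    config_py = Path(module) / "config" / f"{stem}_config.py"
    return config_py, config_py.with_suffix(yaml_ext)


def next_step(module, script, project_dir="."):
    """Describe the next action in the configuration workflow."""
    config_py, config_yaml = config_files(module, script)
    if not (Path(project_dir) / config_yaml).exists():
        # First run: the yaml file has not been generated yet.
        return f"run {config_py.as_posix()} to create {config_yaml.as_posix()}"
    script_path = (Path(module) / script).as_posix()
    return f"edit {config_yaml.as_posix()}, then run {script_path}"
```

For example, next_step("fnn", "kfcv_test.py") points to the config script until the yaml file exists, and to the yaml file and the test script afterwards.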
