Translation EDA/Visualization

Dataset

The dataset used in this project is a CSV file containing translation data between English and Arabic. The file includes the following columns:

english: English side of each translation pair
arabic: Arabic side of each translation pair
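Here is a minimal sketch of loading the dataset. It assumes the CSV has the `english` and `arabic` columns described above. The function name `load_translation_pairs` is hypothetical, and so are its cleaning rules (drop missing or blank pairs, trim whitespace). They are not taken from the notebook.

```python
import io

import pandas as pd

# Column names from the dataset description; the CSV itself is not shown here.
REQUIRED_COLUMNS = ["english", "arabic"]


def load_translation_pairs(source) -> pd.DataFrame:
    """Read the translation CSV and keep only complete, non-blank pairs.

    `source` is a path or file-like object accepted by pandas.read_csv.
    Hypothetical helper: the cleaning rules are assumptions, not the notebook's.
    """
    df = pd.read_csv(source)  # pandas decodes files as UTF-8 by default
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing expected columns: {missing}")
    df = df[REQUIRED_COLUMNS].dropna()
    # Trim surrounding whitespace, then drop rows where either side is blank.
    df = df.apply(lambda col: col.astype(str).str.strip())
    df = df[(df["english"] != "") & (df["arabic"] != "")]
    return df.reset_index(drop=True)


if __name__ == "__main__":
    # Tiny made-up in-memory sample standing in for the real CSV file.
    sample = io.StringIO("english,arabic\nHello,مرحبا\nThank you,شكرا\n ,فارغ\n")
    print(load_translation_pairs(sample))
```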

Notebook Overview

This notebook performs exploratory data analysis (EDA) and data visualization on the translation dataset. The sections below list its dependencies and outline its steps.

Libraries and Dependencies

The following libraries and tools are used:

NLTK: Natural Language Toolkit for text processing.
TextBlob: Simplified text processing library for sentiment analysis.
Arabic-Reshaper and Python-Bidi: For proper rendering of Arabic text.
Pandas: For data manipulation and analysis.
Matplotlib and Seaborn: For static data visualization.
Plotly: For interactive visualizations and word clouds.
Scikit-learn: For word-frequency counts with CountVectorizer.

Steps in the Notebook

Data Loading: The dataset is loaded into a Pandas DataFrame from the CSV file. The data is then prepared for analysis.
Exploratory Data Analysis (EDA):
- Frequency Analysis: Calculation and display of the frequency of each unique value in both the English and Arabic columns.
- Distribution of Character Lengths: Visualization of the distribution of character lengths for both languages.
- Length Comparison: Scatter plot comparing the lengths of English and Arabic text.
- Word Frequency: Identification of the 30 most frequently occurring words in both English and Arabic using CountVectorizer.
- Word Clouds: Visualization of the most common words in both languages.
Sentiment Analysis:
- Distribution of Sentiment Polarity: Analysis of the sentiment polarity distribution for both English and Arabic text.

Usage

Install the required libraries:

```
pip install nltk textblob arabic-reshaper python-bidi pandas scikit-learn matplotlib seaborn plotly
```

Download necessary resources:
```
 python -m textblob.download_corpora
```
Run the notebook: Open Translation_EDA&VIS (1).ipynb in Jupyter or Google Colab and run the cells in order to perform the EDA and generate the visualizations.

For accurate rendering of Arabic text in word clouds, an Arabic font file (NotoNaskhArabic-Regular.ttf) is required.

Repository Files

Translation_EDA&VIS (1).ipynb: The EDA and visualization notebook
NotoNaskhArabic-unhinted.zip: Noto Naskh Arabic font archive
README.md: This file

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Translation EDA/Visualization

Dataset

Notebook Overview

Libraries and Dependencies

Steps in the Notebook

Usage

About

Releases

Packages

Languages


Source repository: Yumna10/Translation-EDA-Visualization