Borges Corpus

(WIP)

by Karen Palacio

Introduction

There are few NLP resources for Spanish-language texts.

This is an attempt to contribute to that ecosystem.

This repo contains two kinds of datasets: one holds the full texts of short stories by Latin American authors, and the other contains a sentence-by-sentence sentiment analysis of those texts. The texts were scraped from the Ciudad Seva website with Selenium. The sentiment analysis was done with sentiment-analysis-spanish.

This repository also includes the scripts that generate these datasets.
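
As a rough illustration of the sentence-by-sentence analysis described above, the sketch below splits a text into sentences and scores each one. The splitter is a naive regular expression, and the names split_sentences, score_sentences, and make_spanish_scorer are hypothetical; the repo's scraper.py may well do this differently. The scorer is passed in as a function, so the same code should work with sentiment-analysis-spanish or any other library.

```python
import re
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break on whitespace that follows ., !, ? or an ellipsis."""
    parts = re.split(r"(?<=[.!?…])\s+", text.strip())
    return [part for part in parts if part]


def score_sentences(text: str, scorer: Callable[[str], float]) -> List[Dict[str, object]]:
    """Return one {"sentence", "score"} row per sentence, in reading order."""
    return [{"sentence": s, "score": scorer(s)} for s in split_sentences(text)]


def make_spanish_scorer() -> Callable[[str], float]:
    """Build a scorer backed by sentiment-analysis-spanish.

    Assumes the package is installed (pip install sentiment-analysis-spanish).
    The import is lazy so the rest of the sketch runs without it.
    """
    from sentiment_analysis_spanish import sentiment_analysis

    analyzer = sentiment_analysis.SentimentAnalysisSpanish()
    return analyzer.sentiment
```

With the package installed, score_sentences(text, make_spanish_scorer()) gives one row per sentence. A regex splitter will mishandle abbreviations and dialogue, so a proper sentence tokenizer is probably a better choice for real use.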

datasets/
├── arlt_full_texts.pkl
├── arredondo_full_texts.pkl
├── benedetti_full_texts.pkl
├── bombal_full_texts.pkl
├── borges_full_texts.pkl
├── carrington_full_texts.pkl
├── davila_full_texts.pkl
├── de_la_parra_full_texts.pkl
├── garro_full_texts.pkl
├── links/
├── lispector_full_texts.pkl
├── lyra_full_texts.pkl
├── ocampo_full_texts.pkl
└── sents_borges.pkl

This repo contains:

datasets/<author>_full_texts.pkl: the complete scraped texts in raw form, plus their metadata.
datasets/links/links_<author>.txt: the source URLs of the texts, used during the scraping step of the dataset build.
sents_<author>.pkl: a list of dataframes, one per text, in the same order as the texts. These are produced with the sentiment-analysis-spanish library (more libraries coming soon).
scraper.py: the script that builds the sentiment analysis dataset, sents.pkl.
full_text_scraper.py: the script that builds <author>_full_texts.pkl given an author's name and the file with the links to scrape.
links_scraper.py: the script that builds a ./datasets/links/links_<author>.txt file, used in the scraping process.
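
The files listed above can be loaded along these lines. This is a minimal sketch: the helper names load_pickle and load_author are hypothetical, the internal layout of each pickle is not assumed, and the sentiment file is looked up next to the full texts, as in the tree shown earlier. If the pickles hold dataframes, pandas needs to be installed for unpickling to succeed.

```python
import pickle
from pathlib import Path
from typing import Any, Tuple


def load_pickle(path) -> Any:
    """Unpickle one file. Only do this with files you trust."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def load_author(author: str, root="datasets") -> Tuple[Any, Any]:
    """Return (full_texts, sentiments) for one author.

    sentiments is None when no sents_<author>.pkl file exists yet.
    """
    root = Path(root)
    texts = load_pickle(root / f"{author}_full_texts.pkl")
    sents_path = root / f"sents_{author}.pkl"
    sents = load_pickle(sents_path) if sents_path.exists() else None
    return texts, sents
```

For example, load_author("borges") would read datasets/borges_full_texts.pkl and datasets/sents_borges.pkl when run from the repository root.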

TO DO

Expand vertically

Add more datasets from other authors and other digital libraries.

When doing so, follow this methodology:

For each male author, look for a female author.

The goal is to keep the corpus balanced. This is hard because, even when there are a few texts per female author, that does not compare to the number of texts by men. Since I can't find that volume per female author, I will have to make the effort in breadth instead, that is, cover a larger number of names.
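
One way to keep an eye on this balance is to count both authors and texts per group. The sketch below assumes a hand-maintained mapping from author to a gender label and a mapping from author to number of texts; neither exists in the repo, and the function name balance_report is hypothetical.

```python
from collections import Counter
from typing import Dict, Mapping


def balance_report(
    gender_by_author: Mapping[str, str],
    texts_per_author: Mapping[str, int],
) -> Dict[str, Dict[str, int]]:
    """Count authors and texts per gender label.

    Authors missing from texts_per_author contribute zero texts.
    """
    authors = Counter(gender_by_author.values())
    texts: Counter = Counter()
    for author, gender in gender_by_author.items():
        texts[gender] += texts_per_author.get(author, 0)
    return {"authors": dict(authors), "texts": dict(texts)}
```

Comparing the "authors" and "texts" counts side by side shows whether adding more names actually narrows the gap in the number of texts.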

Expand horizontally

It would also be useful to collect:

The year of each story, its genre(s), and the author's nationality.

Find a way to automate this. Even just a CLI would do.
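
A CLI for this could start as small as the argparse sketch below. Everything in it is hypothetical: the flag names, the record layout, and the choice to print JSON are assumptions for illustration, not something the repo already provides.

```python
import argparse
import json
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser for one story's metadata (hypothetical flags)."""
    parser = argparse.ArgumentParser(description="Record metadata for one story.")
    parser.add_argument("--author", required=True)
    parser.add_argument("--title", required=True)
    parser.add_argument("--year", type=int, help="year of the story")
    # --genre may be repeated to attach several genres to one story.
    parser.add_argument("--genre", action="append", default=[], dest="genres")
    parser.add_argument("--nationality", help="author's nationality")
    return parser


def parse_record(argv: Optional[List[str]] = None) -> dict:
    """Turn command-line arguments into a plain metadata dict."""
    return vars(build_parser().parse_args(argv))


def main(argv: Optional[List[str]] = None) -> None:
    """Print the record as one JSON line, keeping accented characters readable."""
    print(json.dumps(parse_record(argv), ensure_ascii=False))
```

Calling main() from the usual `if __name__ == "__main__":` guard turns this into a script. Printing one JSON object per story keeps the output easy to append to a file and merge into the datasets later.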

Separate and expand genres

So far I am only scraping short stories. It would be good to also have poems, "others", and micro-stories.

Incorporate authors:

Blogspot, also divided by nationality

Women

    Isabel Allende (1942)
    Marcela Serrano (1951) ...
    Gioconda Belli. ...
    Sor Juana Inés de la Cruz (1651 - 1695) ...
    Alfonsina Storni (1892 - 1938) ...
    Gabriela Mistral (1889 - 1957) ...
    Juana de Ibarbourou (1892 - 1979) ...

María Elena Walsh - Argentina:

Gabriela Mistral - Chile:

Claudia Cortalezzi - Argentina:

Alejandra D'Atri - Argentina:

Salzano - Argentina:

Rubén Darío - Nicaragua

Horacio Quiroga

Cortázar

Macedonio Fernández
