Questions about ambiguous POS and cases and about extracting lines of poetry. #1218
Comments
--No, but it might be a good new feature. It is problematic, though, since a doc is organized around sentences, not verses.
--I would create a doc object for each sentence in the verse; otherwise POS tags may not resolve as desired.
--No, there's no built-in method to accommodate POS tag ambiguity in a corpus. With poetry one has to expect the rules of language to be broken. You probably want to have a look at the HexameterScanner class and related ones: knowing which vowels must be accented can help narrow down the cases. POS taggers are mostly trained on prose and unaccented texts.
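A minimal sketch of the per-verse idea, assuming the corpus separates verses with line breaks. The names `iter_verses`, `analyze_by_verse`, and `whitespace_tokens` are hypothetical helpers, not part of CLTK. The analyzer is passed in as a callback; in CLTK 1.x it would presumably wrap something like `NLP(language="lat").analyze(text=verse)`, but that call is not exercised here and a whitespace splitter stands in for it.

```python
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def iter_verses(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (physical line number, verse) for each non-empty line."""
    for number, raw in enumerate(text.splitlines(), start=1):
        verse = raw.strip()
        if verse:
            yield number, verse


def analyze_by_verse(text: str, analyze: Callable[[str], T]) -> List[Tuple[int, T]]:
    """Run the analyzer on each verse separately and keep the line number."""
    return [(number, analyze(verse)) for number, verse in iter_verses(text)]


def whitespace_tokens(verse: str) -> List[str]:
    """Stand-in analyzer (assumption): split on spaces, strip punctuation."""
    return [word.strip(".,;:!?") for word in verse.split()]


poem = (
    "Imminet exitio vir coniugis, illa mariti;\n"
    "Lurida terribiles miscent aconita novercae"
)
results = analyze_by_verse(poem, whitespace_tokens)
for number, tokens in results:
    print(number, tokens)
```

Keeping the line number next to each result means any candidate found later can be traced back to its verse without putting the line break itself into the token stream.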
Thank you so much! That explains a lot.
Designing a specific |
Well, in the corpus, lines of poetry seem to be separated by line breaks, so could the Doc include the line break as a POS tag or other token? Then the workflow could go as follows:
I've been tasked to learn more about Bard AI, so here's what Bard just suggested:

    import cltk

    # Create a new grammar
    grammar = cltk.parse.LabeledTreebankGrammar()

    # Add a new rule to the grammar
    grammar.add_rule('line_break', '\n')

    # Parse the sentence
    sentence = "Imminet exitio vir coniugis, illa mariti; Lurida terribiles miscent aconita novercae"

    # Print the POS of the line break
    print(tokens[tokens.index('\n')])
I'm trying to figure out if CLTK would be the best library for a project I'm working on.
I'd like to quickly scan many corpora of Latin poetry for golden lines, at least to assemble a greedy list of possible lines that I can easily whittle down by eyeballing them.
So after about an hour of combing through these docs, I have a few questions. These are at a basic level because I haven't worked with CLTK and I am a very basic programmer. Feel free to tell me to just dive in and work with this until I understand it. And I'm not entirely sure if this is the forum for questions like this.
Ideally, a golden line is a Latin hexameter or pentameter that consists of two nouns in different cases, each accompanied by an adjective, determinative, participle, proper name, or number agreeing with it, with a verb separating the modifiers from their nouns. Here are the two common types:
Lucan 1.95 Fraterno primi maduerunt sanguine muri. (chiastic abVAB form)
Lucan 1.105 Assyrias Latio maculavit sanguine Carrhas (concentric abVBA form)
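The definition above could be checked mechanically along these lines. This is only a sketch under stated assumptions: the `Token` class, the tag sets, and `golden_pattern` are hypothetical, the POS and case values are hand-annotated for illustration rather than produced by a tagger, and agreement is reduced to matching case (gender and number are ignored, and participles tagged as verbs are not handled).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    form: str
    pos: str        # coarse tag such as "NOUN", "ADJ", "VERB" (assumed tag set)
    case: str = ""  # e.g. "Nom", "Acc", "Abl"; empty when not applicable


IGNORED = {"CCONJ", "SCONJ", "ADP", "PART", "PUNCT"}  # skipped function words
NOUNS = {"NOUN", "PROPN"}
MODIFIERS = {"ADJ", "DET", "NUM", "PROPN"}


def golden_pattern(tokens: List[Token]) -> Optional[str]:
    """Return "abVAB" or "abVBA" if the line fits, otherwise None."""
    core = [t for t in tokens if t.pos not in IGNORED]
    if len(core) != 5:
        return None
    a, b, verb, n1, n2 = core
    if verb.pos != "VERB":
        return None
    if a.pos not in MODIFIERS or b.pos not in MODIFIERS:
        return None
    if n1.pos not in NOUNS or n2.pos not in NOUNS:
        return None
    if n1.case == n2.case:  # the two nouns must be in different cases
        return None
    if a.case == n1.case and b.case == n2.case:
        return "abVAB"
    if a.case == n2.case and b.case == n1.case:
        return "abVBA"
    return None


# Hand-annotated tags (assumption), not tagger output.
lucan_1_95 = [
    Token("Fraterno", "ADJ", "Abl"), Token("primi", "ADJ", "Nom"),
    Token("maduerunt", "VERB"),
    Token("sanguine", "NOUN", "Abl"), Token("muri", "NOUN", "Nom"),
]
lucan_1_105 = [
    Token("Assyrias", "ADJ", "Acc"), Token("Latio", "ADJ", "Abl"),
    Token("maculavit", "VERB"),
    Token("sanguine", "NOUN", "Abl"), Token("Carrhas", "PROPN", "Acc"),
]
print(golden_pattern(lucan_1_95))
print(golden_pattern(lucan_1_105))
```

The function returns the pattern string rather than a yes/no answer so that a list of candidates can be grouped by type before being checked by eye.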
Ambiguity in whether a form is a noun, proper noun, or adjective comes up in the phrases "ductorem Varum", "Deum Christum", "Augusti Caesaris" and "Ammon Jupiter" in the following golden lines. However we classify these parts of speech, they fulfill the form:
Manilius 1.899 cum fera ductorem rapuit Germania Varum
Prudentius Psychom. 74 atque innupta Deum concepit femina Christum
Corippus In laudem 4.138 Augusti priscum renovasti Caesaris aevum;
Rafael Landívar Rusticatio Mexicana 1.123 Et Lybicas Ammon contemnat Jupiter undas.
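For lines like these, one way to get a greedy list is to stop distinguishing noun, proper noun, and adjective at all and look only at position and case. The sketch below does that. It is an illustration under assumptions: `greedy_golden_pattern` and the tag sets are hypothetical, and the tags are hand-annotated rather than tagger output.

```python
from typing import List, Optional, Tuple

Tagged = Tuple[str, str, str]  # (form, coarse POS, case)

# All noun-like and modifier-like tags collapse into one class (assumption).
NOMINAL = {"NOUN", "PROPN", "ADJ", "DET", "NUM"}
SKIPPED = {"CCONJ", "SCONJ", "ADP", "PUNCT"}


def greedy_golden_pattern(tokens: List[Tagged]) -> Optional[str]:
    """Match nominal-nominal-VERB-nominal-nominal using case agreement only."""
    core = [t for t in tokens if t[1] not in SKIPPED]
    if len(core) != 5 or core[2][1] != "VERB":
        return None
    nominals = core[:2] + core[3:]
    if any(pos not in NOMINAL for _, pos, _ in nominals):
        return None
    c1, c2, c3, c4 = (case for _, _, case in nominals)
    if not all((c1, c2, c3, c4)) or c1 == c2:
        return None  # need two distinct, known cases before the verb
    if c1 == c3 and c2 == c4:
        return "abVAB"
    if c1 == c4 and c2 == c3:
        return "abVBA"
    return None


# Hand-annotated tags (assumption), not tagger output.
manilius_1_899 = [
    ("cum", "SCONJ", ""), ("fera", "ADJ", "Nom"), ("ductorem", "NOUN", "Acc"),
    ("rapuit", "VERB", ""), ("Germania", "PROPN", "Nom"), ("Varum", "PROPN", "Acc"),
]
landivar_1_123 = [
    ("Et", "CCONJ", ""), ("Lybicas", "ADJ", "Acc"), ("Ammon", "PROPN", "Nom"),
    ("contemnat", "VERB", ""), ("Jupiter", "PROPN", "Nom"), ("undas", "NOUN", "Acc"),
]
print(greedy_golden_pattern(manilius_1_899))
print(greedy_golden_pattern(landivar_1_123))
```

Because the check no longer cares which word is the noun and which the modifier, it will over-generate, which suits a first pass that is whittled down by eye afterwards.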