As it stands now, atoms for sentences look like ctestset_scf790_2 (context testset, sentence cf790_2), while word atoms look like ctestset_i25 (context testset, word 25). For instance, we have:
?- nlp_sentence(S), nlp_dependency(S,Y,Z,W).
S = ctestset_scf790_2,
Y = ctestset_i25,
Z = ctestset_i9,
W = punct
However, for analysing multiple sentences this isn't so great: the first word of every sentence in the same context gets the same atom, cCONTEXT_i1, so word atoms from different sentences clash.
Perhaps we should add a sentence identifier to each word atom as well.
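A minimal sketch of the clash, assuming a second sentence ctestset_scf790_3 in the same context (that sentence id, and the combined atom suggested in the comment below, are hypothetical):

% the first word of both sentences is denoted by the same atom, ctestset_i1
?- nlp_dependency(ctestset_scf790_2, ctestset_i1, Head1, Rel1),
   nlp_dependency(ctestset_scf790_3, ctestset_i1, Head2, Rel2).
% ctestset_i1 alone does not say which sentence the word belongs to;
% a per-sentence atom such as ctestset_scf790_2_i1 would disambiguate it.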
Can you give a concrete example where this fails? We never use a token ID outside the context of a single sentence, so I'm not sure why this would be a problem.
Continuing: the reason it is this way is that it is much easier to debug rules when we have simple ids instead of long, self-contained ones. In fact, I never use the context at all, since we can make all sentence ids unique across multiple files. This way we have simple ids for both sentences and tokens. I'm not opposed to changing it, but I need a valid scenario where the current scheme would not work.
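A minimal sketch of that usage pattern, reusing the predicates from the example above (the rule name punct_dependency is hypothetical):

% a token id is only ever looked up together with its sentence id,
% so it only needs to be unique within that sentence
punct_dependency(Sentence, Dep, Head) :-
    nlp_sentence(Sentence),
    nlp_dependency(Sentence, Dep, Head, punct).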