Comparing Czech and Russian Valency On The Material of Vallex
Natalia Klyueva
Institute of Formal and Applied Linguistics
Charles University in Prague
Proceedings of KONVENS 2012 (LexSem 2012 workshop), Vienna, September 21, 2012
morphological tagger for Russian language upon Czech data and tools. In (Benešová and Bojar, 2006), the authors evaluate the similarity between the automatically extracted valency frames and the manually annotated frames.

2 Vallex and Verb Classes

Vallex, a manually created lexicon of Czech verbs, is based on the valency theory of the Functional Generative Description (Panevová, 1994; Sgall et al., 1986). It provides information on the valency frames of the most frequent verbs (Vallex 2.5 contains over 2,700 lexemes, multiplied by the different senses of the verbs). A frame consists of slots whose number reflects the number of complements the verb may govern. A slot includes a functor (a deep semantic role, written after a period attached to the word) and its surface realization (mostly a morphosyntactic case, written in brackets). The main deep semantic roles used in our work are:

ACT: Actor, e.g. I.ACT love peaches.
PAT: Patient, e.g. Cats love rats.PAT
ADDR: Addressee (a person or an object to whom/to which the action is performed; discussed in more detail later in the paper), e.g. He gave him.ADDR a book.
DIFF: Difference measure, e.g. Prices have fallen twice.DIFF

Verbs are classified into verb classes according to their meaning, and we use this classification in our research as well. Vallex distinguishes 22 verb classes, among them communication, exchange, motion, perception, transport and psych verb, to mention just a few. Naturally, words that belong to the same semantic field or share some component of meaning will have similar valency frames. A Vallex entry also provides other valuable information (aspect, reciprocity, reflexivity, etc.) that we have not used in our work, so it does not appear in our examples. Here is an example of a Czech verb frame that belongs to the Mental Action verb class:

apelovat Act(Nom) Pat(na+Acc)-(on+Acc)

This means that the verb to appeal governs two arguments: an Actor in the Nominative case and a Patient in the prepositional phrase on+Accusative. The case systems of Czech and Russian are very similar and the prepositions have almost identical surface forms, which simplifies the comparison.

3 Czech Vallex for Russian verb frames

We compared Czech and Russian frames on the basis of the Czech lexicon in the following way: we took a Czech verb and judged whether its frame fits the frame of the equivalent Russian verb as well. At this stage it was impossible to evaluate a large number of verb frames (in total, 2,903 lexical units have a verb class assigned), so we selected the most frequent verbs from the following verb classes, which we consider the most representative: motion, communication, change, exchange and mental action.

This verb set contains frequent verbs of various semantic types. Our assumption is that differences in valency frames might be related to the verb class; in other words, verbs from certain classes might tend to have different valency in Czech and Russian. In our study we focus on the morphemic forms of noun complements, leaving aside verb complements and sentential complements of verbs. Within a semantic class, we state for each Czech verb whether or not its Russian equivalent has the same valency structure. For example, (1) shows a verb with the same valency frame in both languages, while the verb in example (2) has two discrepancies. [1]

[1] There are 6 cases in Russian and 7 cases in Czech (the 7th, the Vocative, is not relevant for our study), and case endings are very similar in both languages. Czech and Russian prepositions are almost identical as well. All this makes it rather easy to detect differences in valency frames.

(1cz) obhajovat ACT(Nom) PAT(Acc), to defend
(1ru) zaščiščat’ ACT(Nom) PAT(Acc), to defend
The frame is the same in both languages.

(2cz) blahopřál mu.ADDR(Dat) k narozeninám
’congratulated him.ADDR(Dat) to birthday’
(2ru) pozdravljal ego(Acc) s dnem roždenija
’congratulated him.ADDR(Acc) with birthday(with+Ins)’

Example (2) illustrates that Czech and Russian use different prepositions and different cases to express the same semantic roles, Patient and Addressee. Especially diverse in this case is the surface realization of the Patient as a prepositional phrase across the languages: Czech
- congratulate to, Russian - congratulate with, English - congratulate on/upon.

We consider the Russian frame to be similar to the Czech one if it has the same number of complements, the same semantic roles, and the same surface realization of these roles. All the verbs we observed met the first two conditions because we took care to find the closest translation equivalent in Russian; it was always the surface form that differed between the two languages. If a surface form is a preposition with some case, we treat the default translation of the preposition as a similar realization.

Further on, to simplify the examples, we keep only the slots of the frame that differ between the languages and leave out the slots that are irrelevant to our comparison. So example (2cz) will be shortened to blahopřát PAT(k+Dat) ADDR(Dat) and (2ru) to pozdravljat’ PAT(s+Ins) ADDR(Acc), leaving aside the functor ACT(Nom), which is almost always the same in Czech and Russian. The examples in this paper are taken from a corpus, invented, or taken directly from the Vallex examples.

(4cz) Administrace zkrátila dovolenou o 2 dny
’administration cut off the holiday about 2 days’
(4ru) Administracija sokratila otpusk na 2 dnja
’administration cut off the holiday on 2 days’

For the functor DIFF, we should mention that the form (about+Acc) is typical of Czech, while Russian uses the preposition ’o’ (about) mainly with mental predicates like English forget (about+Loc) or with communication verbs (tell about+Loc), and this preposition does not occur with the Accusative case at all.

4.2 Class of Motion

We have not found many dissimilarities between Czech and Russian valency frames within the class of Motion verbs. The most evident one concerns motion verbs with the semantic component of ’going away from somewhere’: when their PAT is realized as (před+Ins)-(before+Ins) in Czech, they are translated into Russian with the respective verb plus the prepositional phrase (ot+Gen)-(from+Gen), not the expected ru:(pered+Ins): prchat, ujíždět, unikat.
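As a rough illustration of the similarity criterion stated in Section 3 (same number of complements, same functors, same surface realization, with default preposition translations counted as matching), the following Python sketch compares the frames from examples (1) and (2) and the Motion pattern (před+Ins) vs. (ot+Gen). The `Slot` class, the function names and the one-entry `DEFAULT_PREP` table are assumptions made for this illustration only; they are not part of Vallex, and the comparison described in this paper was done manually.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    functor: str  # deep semantic role, e.g. "ACT", "PAT", "ADDR"
    form: str     # surface realization, e.g. "Nom" or "k+Dat"


# Assumed default Czech -> Russian preposition translations. Only the
# pair před/pered is mentioned in the text; prepositions not listed
# here are assumed to be spelled identically in both languages.
DEFAULT_PREP = {"před": "pered"}


def expected_ru_form(cz_form: str) -> str:
    """Return the Russian surface form expected by default for a Czech one."""
    if "+" not in cz_form:
        return cz_form  # bare case: the same case is expected
    prep, case = cz_form.split("+", 1)
    return f"{DEFAULT_PREP.get(prep, prep)}+{case}"


def differing_slots(cz: list[Slot], ru: list[Slot]) -> list[tuple[Slot, Slot]]:
    """Pair up slots by functor and return those whose surface forms differ."""
    ru_by_functor = {slot.functor: slot for slot in ru}
    if len(cz) != len(ru) or {s.functor for s in cz} != set(ru_by_functor):
        raise ValueError("frames differ in the number of slots or in functors")
    return [
        (slot, ru_by_functor[slot.functor])
        for slot in cz
        if expected_ru_form(slot.form) != ru_by_functor[slot.functor].form
    ]


# Frames taken from examples (1) and (2) and from the Motion pattern.
obhajovat = [Slot("ACT", "Nom"), Slot("PAT", "Acc")]
zasciscat = [Slot("ACT", "Nom"), Slot("PAT", "Acc")]
blahoprat = [Slot("ACT", "Nom"), Slot("PAT", "k+Dat"), Slot("ADDR", "Dat")]
pozdravljat = [Slot("ACT", "Nom"), Slot("PAT", "s+Ins"), Slot("ADDR", "Acc")]
motion_cz = [Slot("PAT", "před+Ins")]
motion_ru = [Slot("PAT", "ot+Gen")]

for cz, ru in [(obhajovat, zasciscat), (blahoprat, pozdravljat), (motion_cz, motion_ru)]:
    print(differing_slots(cz, ru))
```

Pairing slots by functor keeps the check independent of slot order; a frame that repeats a functor would need a different pairing strategy.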
whom/which something is taken.

(7cz) brát ADDR(Dat)-’take +Dat’
bere dítěti hračku -’takes baby.Dat toy’
(7ru) brat’ ADDR(u+Gen)-’take (u+Gen)’
beret u rebenka igrushku ’He takes of baby.Gen toy’
(7en) ’He takes a toy from a baby’

us believe that difference in valency frames can depend on the semantic class. Only two words of this class with a different valency frame do not belong to the group described, and we consider them to be occasional discrepancies. The number of occasional discrepancies in the verb classes is small in comparison with the number of those that show some regular difference.
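The distinction between regular and occasional discrepancies can be made concrete with a small counting sketch. It groups differing slots by the pattern (functor, Czech form, Russian form) and calls a pattern regular when it occurs at least `min_count` times. The function name, the data layout and the threshold of three are assumptions for illustration (the paper only characterizes coincidental differences as occurring once or twice); the sample rows are slot differences from examples (2), (14), (15), (17) and (18) of this paper, so the labels on such a tiny sample are illustrative only.

```python
from collections import Counter

# (functor, Czech form, Russian form) of slots that differ between the languages.
DIFFS = [
    ("ADDR", "Dat", "Acc"),     # (2)  blahopřát / pozdravljat'
    ("ADDR", "Dat", "Acc"),     # (14) poblahopřát / pozdravit'
    ("ADDR", "Dat", "Acc"),     # (15) děkovat / blagodarit'
    ("PAT", "k+Dat", "v+Loc"),  # (17) doznávat se / priznavat'sja
    ("PAT", "Acc", "po+Loc"),   # (18) konzultovat / konsultirovat'
]


def split_patterns(diffs, min_count=3):
    """Split difference patterns into regular and occasional ones by frequency."""
    counts = Counter(diffs)
    regular = {p: n for p, n in counts.items() if n >= min_count}
    occasional = {p: n for p, n in counts.items() if n < min_count}
    return regular, occasional


regular, occasional = split_patterns(DIFFS)
print("regular:", regular)
print("occasional:", occasional)
```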
(14cz) poblahopřát ADDR(Dat) ’congratulate +Dat’
(14ru) pozdravit’ ADDR(Acc) ’congratulate +Acc’

(15cz) děkovat ADDR(Dat) ’thank +Dat’
(15ru) blagodarit’ ADDR(Acc) ’to thank +Acc’

4. Similar to the verbs of the Exchange class, some Czech communication verbs with the surface form (o+Acc)-(about+Acc) are translated differently into Russian because, unlike in Czech, the Russian preposition ’o’-’about’ does not combine with the Accusative:

(16cz) hlásit se PAT(o+Acc):
hlásí se o slovo ’ask about word’
(16ru) prosit’ PAT(Gen)
Ona prosit slova
’She ask word.gen’
(16en) ’ask for a word’

Coincidental differences, occurring only once or twice, do not fit any scheme:

(17cz) doznávat se PAT(k+Dat) ’confess to smth’
(17ru) priznavat’sja PAT(v+Loc) ’to confess in smth’
(18cz) konzultovat PAT(Acc) ’to consult +Acc’
(18ru) konsultirovat’ PAT(po+Loc)-(about+Loc)

4.5 Class of Mental Action

Verbs of this class often differ in their valency frames, but the differences are rather coincidental, and we have found only one regular difference: Czech PAT(na+Acc)-(on+Acc) corresponds to Russian PAT(o+Loc)-(about+Loc), (pro+Acc)-(about+Acc) or (k+Dat)-(to+Dat). The surface form (na+Acc) in Czech also differs for verbs belonging to the class Communication, but for that class it was regularly translated as (o+Loc)-(about+Loc), whereas for the class of Mental Action no common translation equivalent exists.

(19cz) pamatovat PAT(na+Acc) ’to remember on’
(19ru) pomnit’ PAT(pro+Acc) ’to remember about’
(20cz) myslet PAT(na+Acc) ’to think on’
(20ru) dumat’ PAT(o+Loc) ’to think about’

(21cz) zvykat si PAT(na+Acc) ’get used on’
(21ru) privykat’ PAT(k+Dat) ’to get used to’

The structure of the following verb coincides with that from ex. (14) and (15), though the functor is PAT, not ADDR:
(22cz) rozumět PAT(Dat) ’understand’
(22ru) ponimat’ PAT(Acc) ’understand’

Other coincidental differences:
(23cz) pohrdat PAT(Ins)
(23ru) prezirat’ PAT(Acc) ’to despise’
(24cz) mrzet ACT(Acc)
(24ru) sožalet’ ACT(Nom) ’to be sorry for’
Example (24) is one of very few verbs with a different surface realization of the Actor.

4.6 Overall results

We have compared Czech and Russian valency frames of verbs from 5 semantic classes, 1473 lexical entries in total. 111 (7.5%) of them differed between Czech and Russian. The comparison was rather straightforward because the languages are closely related. If more distant languages were compared, a more complicated method of evaluation would have to be chosen. From the examples above we can make the following observations:

• Most dissimilarities occur in prepositional phrases.

• The regular discrepancies are more frequent than the coincidental ones.

• Within a verb class we can find some typical valency patterns of Czech verbs which correspond regularly to a different Russian pattern.

Table 1 presents the distribution of verbs with different frames according to the verb classes.

From this table we can see that verbs of physical activity (change, motion, exchange) have in
some sense less complicated valency structures than verbs of mental activity (communication, mental action) and that, in most cases, their valency structure corresponds to that of Russian verbs.

Table 1: Distribution of verbs with the same and with different frames by verb class.

Verb class       same frame    different frame    # of verbs
Change           309 (95%)     14 (5%)            323
Exchange         166 (92%)     13 (8%)            179
Motion           305 (99%)     3 (1%)             308
Communication    312 (88%)     42 (12%)           354
Mental Action    270 (87%)     39 (13%)           309
Total            1362 (92%)    111 (8%)           1473

5 Conclusion

In this paper we have described the dissimilarities between Czech and Russian valency based on the material of the Czech lexicon. Our main hypothesis was that the differences in valency structure might be explained by the semantics of verbs, so we have exploited the classification into semantic classes provided by Vallex. In almost every verb class we have found some regular dissimilarity that is typical of this class. Still, there are cases when verbs from other classes are subject to this regularity as well, so other aspects (such as surface realization) should also be taken into consideration. A practical result of our paper is a draft version of a small bilingual Czech-Russian lexicon with different frames in the Vallex format.

Acknowledgments

The research is supported by the grants P406/2010/0875 GAČR and GAUK 639012.

References

Ch. Fellbaum. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
C. Baker, C. Fillmore and J. Lowe. 1998. The Berkeley FrameNet Project. Proceedings of the 17th international conference on Computational linguistics - Volume 1 (COLING ’98), Vol. 1. Association for Computational Linguistics, Stroudsburg, PA, USA, 86-90.
V. Benešová and O. Bojar. 2006. Czech Verbs of Communication and the Extraction of their Frames. Proceedings of the 9th International Conference, TSD 2006, pages 29-36.
J. Hana and A. Feldman. 2004. Portable Language Technology: Russian via Czech. Proceedings of the Midwest Computational Linguistics Colloquium, June 25-26, 2004, Bloomington, Indiana.
D. Hlaváčková and A. Horák. 2006. VerbaLex - New Comprehensive Lexicon of Verb Valencies for Czech. Computer Treatment of Slavic and East European Languages. Bratislava, Slovakia: Slovenský národný korpus, pp. 107-115.
N. Klyueva and V. Kuboň. 2010. Verbal Valency in the MT Between Related Languages. Proceedings of Verb 2010, Interdisciplinary Workshop on Verbs, The Identification and Representation of Verb Features. Universita di Pisa - Dipartimento di Linguistica, Pisa, Italy, pp. 160-164.
M. Lopatková and J. Panevová. 2006. Recent developments in the theory of valency in the light of the Prague Dependency Treebank. In Mária Šimková, editor, Insight into Slovak and Czech Corpus Linguistic, pages 83-92. Veda Bratislava, Slovakia.
M. Lopatková, Z. Žabokrtský and V. Benešová. 2006. Valency Lexicon of Czech Verbs VALLEX 2.0. Technical Report 34, UFAL MFF UK.
I. Mel’čuk and A. Zholkovsky. 1984. Explanatory Combinatorial Dictionary of Modern Russian. Semantico-syntactic Studies of Russian Vocabulary. Vienna: Wiener Slawistischer Almanach.
K. Oliva. 1989. A parser for Czech implemented in systems Q. Praha: MFF UK.
J. Panevová. 1994. Valency Frames and the Meaning of the Sentence. The Prague School of Structural and Functional Linguistics (ed. Ph. L. Luelsdorff), Amsterdam-Philadelphia, John Benjamins, pp. 223-243.
P. Sgall, E. Hajičová and J. Panevová. 1986. The Meaning of the Sentence and Its Semantic and Pragmatic Aspects. Academia/Reidel Publishing Company, Prague, Czech Republic/Dordrecht, Netherlands.