Comparing Czech and Russian Valency on the Material of Vallex

Natalia Klyueva
Institute of Formal and Applied Linguistics
Charles University in Prague

Proceedings of KONVENS 2012 (LexSem 2012 workshop), Vienna, September 21, 2012

Abstract

In this study we compare Czech and Russian valency frames based on monolingual and bilingual data. We assume that Czech and Russian are close enough to have similar valency structures for the majority of their verbs. We used Vallex as a source of valency frames and a Czech-Russian dictionary to translate Czech verbs into Russian automatically. We then checked manually whether the Czech frame fits the Russian verb; when it did not, we added the verb to the set described in this paper. We suggest that there is a connection between the semantic class of some verbs and the type of difference between their Czech and Russian valency frames.

1 Introduction

Verbal valency is an important topic in Natural Language Processing and has been studied broadly in both theoretical and practical branches of linguistics. Bilingual research on valency is crucial for practical fields such as Machine Translation and second language acquisition. There are many sources of information on both valency and word classes, among them WordNet (Fellbaum, 1998), FrameNet (Baker et al., 1998), VerbaLex (Hlaváčková and Horák, 2006) and Vallex (Lopatková et al., 2006). The central resource for our research has been the Czech valency lexicon Vallex. The resource for Russian valency, the Explanatory Combinatorial Dictionary of Modern Russian (Mel’čuk and Zholkovsky, 1984), which is comparably big and rich in terms of language information, is not available online, so we cannot make a straightforward comparison. Instead, we have looked at Russian verbal valency through the prism of the Czech one.

Czech and Russian are related Slavic languages and therefore share many morphological and syntactic features. The valency frame of a Czech verb is, in the majority of cases, similar to that of a Russian one. Our study focuses on the verbs whose valency structure differs between the two languages. We were interested not just in collecting the set of those verbs, but in finding out whether the dissimilarities between Czech and Russian valency follow some regularity or rule, or whether these discrepancies are merely coincidental. Our hypothesis is that these differences have something to do with the semantic class of a verb.

There is a resource of bilingual valency data for Czech and Russian: the dictionary Ruslan (Oliva, 1989). However, it is not big and it can only give us 'absolute' numbers, that is, the percentage of verbs with a different valency structure (Klyueva and Kuboň, 2010), without any insight into the nature of these dissimilarities. Vallex enables us to browse various verb classes and see the underlying connections between the semantics of these verbs and the difference between frames in the two languages.

The idea of using Czech data in order to create new data for Russian was exploited by Hana and Feldman (2004), who constructed a
morphological tagger for Russian on the basis of Czech data and tools. Benešová and Bojar (2006) compare automatically extracted valency frames with manually annotated frames.

2 Vallex and Verb Classes

Vallex, a manually created lexicon of Czech verbs, is based on the valency theory of the Functional Generative Description (Panevová, 1994; Sgall et al., 1986). It provides information on the valency frames of the most frequent verbs (version 2.5 contains over 2,700 lexemes, multiplied by the different senses of the verbs). A frame consists of slots, one for each complement the verb may govern. A slot includes a functor (a deep semantic role, written after a period attached to the word) and its surface realization (mostly a morphosyntactic case, written in brackets). The main deep semantic roles that have frequently been used in our work are:

ACT: Actor, ex. I.ACT love peaches.
PAT: Patient, ex. Cats love rats.PAT
ADDR: Addressee (a person or an object to whom/to which the action is performed; more on this later in the paper), ex. He gave him.ADDR a book
DIFF: Difference measure, ex. Prices have fallen twice.DIFF

Verbs are classified into verb classes according to their meaning, and we have used this classification in our research as well. Vallex distinguishes 22 verb classes, among them communication, exchange, motion, perception, transport and psych verb, just to mention some. Naturally, words that belong to the same semantic field or share some component of meaning will have a similar valency frame. A Vallex entry also provides other valuable information on aspect, reciprocity, reflexivity etc., which we have not used in our work, so it does not appear in our examples. Here is an example of a Czech verb frame that belongs to the Mental Action verb class:

apelovat ACT(Nom) PAT(na+Acc)-(on+Acc)

This means that the verb to appeal governs two arguments: an Actor in the Nominative case and a Patient in the prepositional phrase on+Accusative. The case systems of Czech and Russian are very similar and the prepositions have almost identical surface forms, which simplifies the process of comparison.

3 Czech Vallex for Russian verb frames

We compared Czech and Russian frames on the basis of the Czech lexicon in the following way. We took a Czech verb and judged whether its frame also fits the frame of an equivalent Russian verb. At this stage it was impossible to evaluate a large number of verb frames (2,903 lexical units in total have a verb class assigned), so we selected the most frequent verbs from the following verb classes, which we take to be the most representative ones: motion, communication, change, exchange and mental action.

This verb set contains frequent verbs of various semantic types. Our assumption is that the difference in valency frames might be related to the verb class; in other words, verbs from certain classes might tend to have different valency in Czech and Russian. In our study we focus on the morphemic forms of noun complements, leaving aside verb complements and sentence complements of verbs. Within a semantic class, we state for each Czech verb whether or not its Russian equivalent has the same valency structure. For example, (1) shows a verb with the same valency frame, while the verb in example (2) has two discrepancies (see footnote 1).

Footnote 1: There are 6 cases in Russian and 7 cases in Czech (the 7th, Vocative, is not relevant for our study), and case endings are very similar in both languages. Czech and Russian prepositions are almost identical as well. All this makes it rather easy to detect differences in valency frames.

(1cz) obhajovat ACT(Nom) PAT(Acc), to defend
(1ru) zaščiščat’ ACT(Nom) PAT(Acc), to defend
The frame is the same in both languages.

(2cz) blahopřál mu.ADDR(Dat) k narozeninám
'congratulated him.ADDR(Dat) to birthday'
(2ru) pozdravljal ego(Acc) s dnem roždenija
'congratulated him.ADDR(Acc) with birthday(with+Ins)'

Example (2) illustrates that Czech and Russian use different prepositions and different cases to express the same semantic roles, Patient and Addressee. Especially diverse in this case is the surface realization of the Patient as a prepositional phrase across the languages: Czech
- congratulate to, Russian - congratulate with, English - congratulate on/upon.

We consider the Russian frame to be similar to the Czech one if it has the same number of complements, the same semantic roles, and the same surface realization of these semantic roles. All the verbs we observed met the first two conditions, because we tried hard to find the closest translation equivalent in Russian. It was always the surface form that differed between the two languages. If a surface form is represented by a preposition with some case, we treat the default translation of the preposition as the same realization.

Further on, to simplify the examples, we keep only the slots of the frame that differ between the languages and leave out the slots that are irrelevant to our comparison. So the example verb (2cz) is shortened to blahopřát PAT(k+Dat) ADDR(Dat) and (2ru) to pozdravljat’ PAT(s+Ins) ADDR(Acc), leaving aside the functor ACT(Nom), which is almost always the same in Czech and Russian. The examples in this paper are taken from a corpus, invented, or taken directly from Vallex examples.

4 Differences According to the Verb Classes

While analyzing Czech and Russian frames, it became evident that the differences between valency frames can be either regular or occasional. In this paper we describe the differences according to the semantic classes of the verbs. Some groups of verbs that show a regular discrepancy in the valency frame may belong to different classes, as will be illustrated below.

4.1 Class of Change

Verbs of the class Change often have the complement DIFF, and we observed that it often has a different realization in Czech and Russian: the slot cz:DIFF(o+Acc)-(about+Acc) generally corresponds to ru:(na+Acc)-(on+Acc) (other variations are possible), see examples (3) and (4).

(3cz) ceny klesly o 20% 'prices fall about 20%'
(3ru) ceny upali na 20% 'prices fall on 20%'

(4cz) Administrace zkrátila dovolenou o 2 dny
'administration cut off the holiday about 2 days'
(4ru) Administracija sokratila otpusk na 2 dnja
'administration cut off the holiday on 2 days'

For the functor DIFF, we should mention that the form (about+Acc) is typical of Czech, while Russian uses the preposition 'o' (about) mainly with mental predicates, as in English (forget about+Loc), or with communication verbs (tell about+Loc), and this preposition does not occur with the Accusative case at all.

4.2 Class of Motion

We have not found many dissimilarities between Czech and Russian valency frames within the class of Motion verbs. The most evident one concerns motion verbs with the semantic component 'going away from somewhere' whose PAT has the surface realization (před+Ins)-(before+Ins) in Czech: prchat, ujíždět, unikat. They are translated into Russian with the respective verb plus the prepositional phrase (ot+Gen)-(from+Gen), not the expected ru:(pered+Ins).

(5cz) prchat před policii 'run before police'
(5ru) ubegat’ ot policii 'run from police'

In other words, Russian prefers the preposition 'from' whereas Czech uses 'before' in this context. Verbs of other semantic classes with a similar component of meaning, e.g. the class location, share this rule as well (cz: schovat před+Ins 'to hide before' vs. ru: sprjatat’ ot+Gen 'to hide from').

The following example illustrates a coincidental difference in a verb frame:

(6cz) trefit PAT(Acc) 'to hit smth'
(6ru) popast’ PAT(v+Acc) 'to hit into+Acc'

4.3 Verbs of Exchange

One of the regular and rather evident differences between Czech and Russian frames was described in (Lopatková and Panevová, 2006). It concerns some exchange verbs with the meaning of removing something from someone, e.g. sebrat (take away), krást (steal), brát (take) etc. The Addressee here is a person or an object from
whom/which something is taken.

(7cz) brát ADDR(Dat) 'take +Dat'
bere dítěti hračku 'takes baby.Dat toy'
(7ru) brat’ ADDR(u+Gen) 'take (u+Gen)'
beret u rebenka igrushku 'He takes of baby.Gen toy'
(7en) 'He takes a toy from a baby'

(8cz) zabírat ADDR(Dat) 'take (time) +Dat'
studium mi zabírá hodně času
'study me.Dat takes many time'
(8ru) otnimat’ ADDR(u+Gen)
ucheba otnimaet u menja mnogo vremeni
'study takes from me many time'
(8en) 'Study takes me a lot of time.'

If a sentence such as (8cz) were translated into Russian according to the Czech valency pattern, it would have the reverse meaning in Russian, because the Dative case of the noun in this context is understood as Benefactor (taken TO someone), not Addressee (taken FROM someone). This difference causes big problems especially for learners of foreign languages: they project the known pattern from their native language onto the phrase in the foreign language and, given that the surface form of the preposition is the same, they make a mistake.

This scheme does not hold for all words with this meaning in this class. For example, the Russian equivalent of the semantically related verb 'odpírat' (to deny) has the same surface form of the Addressee as in Czech (ADDR(Dat)), but a different, non-direct realization of the Patient:

(9cz) odpírat ADDR(Dat) PAT(Acc)
odpíral mu pomoc
'denied him help'
(9ru) otkazyvat’ ADDR(Dat) PAT(v+Loc)-(in+Loc)
on otkazal emu v pomošči
'denied him in help'
(9en) 'he denied to help him'

The example of this verb class shows that a semantically related group of words has different surface realizations of a functor (ADDR in this case) in Czech and Russian. This makes us believe that differences in valency frames can depend on the semantic class. Only two words of this class with a different valency frame do not belong to the group described, and we consider them to be occasional discrepancies. The number of occasional discrepancies in the verb classes is not so big in comparison with the number of those that show some regular difference.

4.4 Class of Communication

Czech and Russian verbs belonging to this class have many differences with respect to valency. Here we could not observe a single leading difference of the kind present in the previous classes. Differences may concern several functors and several surface forms. They may be considered coincidental, but we can identify several groups of verbs with some dissimilarity in valency frames.

1. The functor Addressee with the surface form ADDR(na+Acc)-(on+Acc) in Czech is realized in another way in Russian:

(10cz) mluvit (na+Acc) 'speak on smb(Acc)'
(10ru) obraščat’sja ADDR(k+Dat) 'speak to smb(Dat)'

(11cz) zavolat (na+Acc) 'call on smb(Acc)'
(11ru) pozvat’ (Acc) 'to call smb(Acc)'

2. Patient with the surface form (na+Loc)-(on+Loc) in Czech corresponds to another realization in Russian, generally the morphemic form (o+Loc)-(about+Loc), for verbs used for 'asking a question' such as (ze)ptát se, tázat se etc.:

(12cz) ptát se PAT(na+Acc) zdraví 'ask on health'
(12ru) sprosit’ PAT(o+Loc) zdorov’je 'ask about health'

Other verbs with a frame slot PAT(na+Loc)-(on+Loc) are also very similar to the above sample:

(13cz) domlouvat se PAT(na+Loc) 'to agree on'
(13ru) dogovorit’sja PAT(o+Loc) 'to agree about'

3. Addressee in the Dative case for the following verbs corresponds to the Accusative in Russian:
(14cz) poblahopřát ADDR(Dat) 'congratulate +Dat'
(14ru) pozdravit’ ADDR(Acc) 'congratulate +Acc'

(15cz) děkovat ADDR(Dat) 'thank +Dat'
(15ru) blagodarit’ ADDR(Acc) 'to thank +Acc'

4. Similar to the verbs of Exchange class, some Czech communication verbs with the surface form (o+Acc)-(about+Acc) are translated in another manner in Russian, because, unlike in Czech, the preposition 'o' (about) does not combine with the Accusative:

(16cz) hlásit se PAT(o+Acc)
hlásí se o slovo 'ask about word'
(16ru) prosit’ PAT(Gen)
Ona prosit slova
'She ask word.gen'
(16en) 'ask for a word'

Coincidental differences that occur only once or twice do not fit any scheme:

(17cz) doznávat se PAT(k+Dat) 'confess to smth'
(17ru) priznavat’sja PAT(v+Loc) 'to confess in smth'
(18cz) konzultovat PAT(Acc) 'to consult +Acc'
(18ru) konsultirovat’ PAT(po+Loc)-(about+Loc)

4.5 Class of Mental Action

Verbs of this class often differ in their valency frames, but the differences are rather coincidental. We have found only one regular difference: Czech PAT(na+Acc)-(on+Acc) corresponds to Russian PAT(o+Loc)-(about+Loc), (pro+Acc)-(about+Acc) or (k+Dat)-(to+Dat). The surface form (na+Acc) in Czech is also realized differently for verbs belonging to the class Communication, but for that class it was regularly translated as (o+Loc)-(about+Loc), whereas for the class of Mental Action no common translation equivalent exists.

(19cz) pamatovat PAT(na+Acc) 'to remember on'
(19ru) pomnit’ PAT(pro+Acc) 'to remember about'
(20cz) myslet PAT(na+Acc) 'to think on'
(20ru) dumat’ PAT(o+Loc) 'to think about'

(21cz) zvykat si PAT(na+Acc) 'get used on'
(21ru) privykat’ PAT(k+Dat) 'to get used to'

The structure of the following verb largely coincides with that of examples (14) and (15), though the functor is PAT, not ADDR:

(22cz) rozumět PAT(Dat) 'understand'
(22ru) ponimat’ PAT(Acc) 'understand'

Other coincidental differences:

(23cz) pohrdat PAT(Ins)
(23ru) prezirat’ PAT(Acc) 'to despise'
(24cz) mrzet ACT(Acc)
(24ru) sožalet ACT(Nom) 'to be sorry for'

Example (24) is one of very few verbs with a different surface realization of the Actor.

4.6 Overall results

We have compared Czech and Russian valency frames of verbs from 5 semantic classes, 1473 lexical entries in total. Of these, 111 (7.5%) were different in Czech and Russian. The comparison was rather straightforward because the languages are related. If more distant languages were compared, a more complicated method of evaluation would have to be chosen. From the examples above we can make the following observations:

• Most dissimilarities occur in prepositional phrases.

• The regular discrepancies are more frequent than the coincidental ones.

• Within a verb class we can find some typical valency patterns of Czech verbs which correspond regularly to a different Russian pattern.

Table 1 presents the distribution of verbs with different frames according to the verb classes.

From this table we can see that verbs of physical activity (change, motion, exchange) have in
Verb class       same frame    different frame    # of verbs
Change           309 (95%)     14 (5%)            323
Exchange         166 (92%)     13 (8%)            179
Motion           305 (99%)     3 (1%)             308
Communication    312 (88%)     42 (12%)           354
Mental Action    270 (87%)     39 (13%)           309
Total            1362 (92%)    111 (8%)           1473

Table 1: Differences according to the verb classes
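
The shares in Table 1 can be rechecked from the raw counts. The following is a minimal sketch and not part of the original study: the names TABLE_1, share and summarize are illustrative names introduced here, and only the counts are copied from the table. Because the sketch rounds to one decimal place, a few recomputed shares differ slightly from the whole-number percentages printed in the table (for example, 14 of 323 is about 4.3%).

```python
# Counts copied from Table 1: (verb class, same frame, different frame).
TABLE_1 = [
    ("Change", 309, 14),
    ("Exchange", 166, 13),
    ("Motion", 305, 3),
    ("Communication", 312, 42),
    ("Mental Action", 270, 39),
]


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def summarize(rows):
    """Yield (class, number of verbs, same %, different %) per row, then a Total row."""
    total_same = 0
    total_diff = 0
    for name, same, diff in rows:
        total_same += same
        total_diff += diff
        n = same + diff
        yield name, n, share(same, n), share(diff, n)
    n = total_same + total_diff
    yield "Total", n, share(total_same, n), share(total_diff, n)


if __name__ == "__main__":
    # Print one line per verb class, followed by the overall total.
    for name, n, same_pct, diff_pct in summarize(TABLE_1):
        print(f"{name:<14} {n:>5} {same_pct:>6.1f}% {diff_pct:>6.1f}%")
```

The Total row produced this way agrees with the 1473 lexical entries and the 7.5% share of differing frames reported in Section 4.6.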

some sense less complicated valency structures than verbs of mental activity (communication, mental action) and that, in most cases, their valency structure corresponds to that of Russian verbs.

5 Conclusion

In this paper we have described the dissimilarities between Czech and Russian valency on the basis of the Czech lexicon. Our main hypothesis was that the differences in valency structure might be explained by the semantics of verbs, so we exploited the classification into semantic classes provided by Vallex. In almost every verb class we found some regular dissimilarity that is typical of this class. Still, there are some cases when verbs from other classes are subject to the same regularity, so other aspects (such as surface realization) should also be taken into consideration. A practical result of our work is a draft version of a small bilingual Czech-Russian lexicon of verbs with different frames in the Vallex format.

Acknowledgments

The research is supported by the grants P406/2010/0875 GAČR and GAUK 639012.

References

Ch. Fellbaum. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

C. Baker, C. Fillmore and J. Lowe. 1998. The Berkeley FrameNet Project. Proceedings of the 17th international conference on Computational linguistics - Volume 1 (COLING ’98), Vol. 1. Association for Computational Linguistics, Stroudsburg, PA, USA, 86-90.

V. Benešová and O. Bojar. 2006. Czech Verbs of Communication and the Extraction of their Frames. Proceedings of the 9th International Conference, TSD 2006, pages 29-36.

J. Hana and A. Feldman. 2004. Portable Language Technology: Russian via Czech. Proceedings of the Midwest Computational Linguistics Colloquium, June 25-26, 2004, Bloomington, Indiana.

D. Hlaváčková and A. Horák. 2006. VerbaLex - New Comprehensive Lexicon of Verb Valencies for Czech. Computer Treatment of Slavic and East European Languages. Bratislava, Slovakia: Slovenský národný korpus, pp. 107-115.

N. Klyueva and V. Kuboň. 2010. Verbal Valency in the MT Between Related Languages. Proceedings of Verb 2010, Interdisciplinary Workshop on Verbs, The Identification and Representation of Verb Features. Universita di Pisa - Dipartimento di Linguistica, Pisa, Italy, pp. 160-164.

M. Lopatková and J. Panevová. 2006. Recent developments in the theory of valency in the light of the Prague Dependency Treebank. In Mária Šimková, editor, Insight into Slovak and Czech Corpus Linguistic, pages 83-92. Veda Bratislava, Slovakia.

M. Lopatková, Z. Žabokrtský and V. Benešová. 2006. Valency Lexicon of Czech Verbs VALLEX 2.0. Technical Report 34, UFAL MFF UK.

I. Mel’čuk and A. Zholkovsky. 1984. Explanatory Combinatorial Dictionary of Modern Russian. Semantico-syntactic Studies of Russian Vocabulary. Vienna: Wiener Slawistischer Almanach.

K. Oliva. 1989. A parser for Czech implemented in systems Q. Praha: MFF UK.

J. Panevová. 1994. Valency Frames and the Meaning of the Sentence. The Prague School of Structural and Functional Linguistics (ed. Ph. L. Luelsdorff), Amsterdam-Philadelphia, John Benjamins, pp. 223-243.

P. Sgall, E. Hajičová and J. Panevová. 1986. The Meaning of the Sentence and Its Semantic and Pragmatic Aspects. Academia/Reidel Publishing Company, Prague, Czech Republic/Dordrecht, Netherlands.
