A CORPUS-BASED APPROACH TO THE UKRAINIAN POLITICAL DISCOURSE STUDY

INTRODUCTION
Among the various studies of political discourse, we should mention those that use a corpus-based approach. Most of them focus on speeches by top American politicians (J. McCain, B. Obama, D. Trump, H. Clinton). A number of papers by Jacques Savoy (2010, 2016) describe US political corpora and study the style and rhetoric of John McCain and Barack Obama 1, and of Hillary Clinton and Donald Trump 2, during presidential elections. To detect and analyze differences between Trump and Clinton, the author examined both the oral and the written forms of their communication. Asomwan Adagbonyin, Isaiah Aluya and Samuel Edem (2016) described a corpus-based approach to identifying the linguistic devices used in Nigerian and American presidential speeches. The authors compared the two presidents' use of linguistic devices in terms of frequency at the levels of keywords, parts of speech and semantic domains, as well as the communicative purposes these devices serve 3. Valentin Kassarnig (2016) presented an approach that trains a system on speech transcripts in order to generate new speeches for a desired political party, using a simple statistical language model based on 6-grams 4. Interesting results of an N-gram-based analysis of a presidential debate transcript were presented by Daniel Walter (2016) 5.

Studies of Ukrainian political discourse usually rely on the methods of traditional linguistics and on manual approaches to establish particular properties of political contexts, structure and lexicon, syntax and rhetoric, speech acts and interaction. We can mention only a few studies that apply a corpus-based approach: Margaryta Dorofeyeva (2005) studied the category of subject in the political speeches of German Federal Chancellors 6, and Dariia Kharytonova (2019) investigated cognitive and pragmatic dimensions of Ukrainian political discourse 7. However, governmental and parliamentary speeches and presidential programs have not yet been studied, and there are no corpus-based studies of them, which underlines the relevance of this paper.
There are many classifications of discourse types. For example, according to ideological stance, political discourse can be subdivided into LGBT, religious, green, nationalist and feminist discourse. Each of them represents the discourse of a particular segment or group of society 8. This is one way of interpreting Teun van Dijk's idea: "Since people and their practices may be categorized in many ways, most groups and their members will occasionally (also) 'act politically', and we may propose that 'acting politically', and hence also political discourse" 9.
Political discourse is subdivided according to different criteria: spoken or written; prepared or spontaneous; spoken monologue or dialogue / polylogue. Each type is represented by different genres: a politician's biography, slogan, political program, campaign text, ritual speech, etc. 10 Parliamentary speech and ritual speech are both monologues, but they are delivered to different audiences. In the first case, the recipients are a largely homogeneous group of government officials, and also the media; in the second, the speech is aimed at a large heterogeneous audience. These different types are reflected in corpora: governmental corpora (Labbé & Monière) and the electoral corpus (Jacques Savoy). None of the open corpora developed for the Ukrainian language contains governmental or parliamentary speeches.
For this study, we chose presidential programs as a type of written discourse (section 1) and transcripts of parliamentary speeches as an oral genre (section 2). The main goal of this paper is to identify the linguistic features of each type of discourse that a corpus-based approach can reveal.

Presidential programs
The 2019 presidential election in Ukraine was dominated by two figures: Volodymyr Zelenskyi and Petro Poroshenko, who had won the presidential election in 2014 but lost the 2019 election to his opponent. That is why their presidential programs were chosen for analysis in this paper. The programs were downloaded from the official website of the Central Election Commission of Ukraine 11. Further work was done with NLP tools for quantitative text analysis. To prepare the texts, we removed quotation marks and dashes from both programs and ignored case.
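The preprocessing just described can be sketched in Python as follows. This is a minimal illustration, not the tool chain actually used: the exact sets of quote and dash characters and the tokenization rule are our assumptions, since the text only says that quotes and dashes were removed and case was ignored.

```python
import re

# Assumed character sets: the source only says that "quotes and dashes" were removed.
QUOTES = '"\u00ab\u00bb\u201c\u201d\u201e'   # straight double quote and « » “ ” „
DASHES = "\u2012\u2013\u2014\u2015"          # figure dash, en dash, em dash, horizontal bar
_STRIP = str.maketrans({ch: " " for ch in QUOTES + DASHES})

# A token is a run of word characters, optionally joined by a hyphen or an apostrophe,
# so that forms like "будь-який" and "п'ять" stay whole. In Python 3, \w matches Cyrillic.
TOKEN = re.compile(r"\w+(?:[-'\u02bc\u2019]\w+)*")


def preprocess(text: str) -> list[str]:
    """Replace quotes and dashes with spaces, ignore case, and split into tokens."""
    return TOKEN.findall(text.translate(_STRIP).lower())
```

For example, `preprocess("«Україна» – це ми")` yields the tokens україна, це, ми. Keeping hyphens and apostrophes inside words prevents forms such as будь-який or п'ять from being split into fragments.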

Keywords
At the first stage of the study we focus on word occurrence frequencies. The TextusPro 1.0 software 12 produces a list of keywords without stop words, which are usually the most frequent tokens (conjunctions, prepositions, particles, etc.). As a result, we obtain tokens sorted by absolute frequency (AF) and by relative frequency in the stop-word-free text (RFstop). Since Ukrainian is an inflectional language, we had to reduce word forms to lemmas. We therefore grouped the forms into lemmas manually, taking into account phonetic alternations in suffixes. In this way we obtained the semantic core of each presidential program. Comparing the key lemmas of the two programs reveals their common concepts (Fig. 1), which represent meaningful topics. Figure 1 shows the relative frequencies in the stop-word-free texts. As we can see, the most significant concept in both programs is Україн- (Україна / українці / український) (Ukraine / Ukrainians / Ukrainian): RFstop = 3,23 in Zelenskyi's program and 3,14 in Poroshenko's program.
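The frequency measures above can be approximated with the following Python sketch. It assumes that RFstop is the absolute frequency expressed as a percentage of all tokens remaining after stop-word removal (the text does not give the formula), uses a toy stop-word list rather than TextusPro's, and replaces the manual lemmatization with simple stem matching in the hypothetical helper `stem_group`, whose output would still need manual checking for alternations in suffixes.

```python
from collections import Counter

# Illustrative stop-word list only; the list used by TextusPro 1.0 is not given in the source.
STOP_WORDS = {"і", "й", "та", "а", "в", "у", "на", "з", "до", "для", "що", "не", "це"}


def keyword_frequencies(tokens, stop_words=STOP_WORDS):
    """Map each non-stop-word token to (AF, RF_stop).

    Assumption: RF_stop is AF as a percentage of all tokens left after
    stop words are removed.
    """
    content = [t for t in tokens if t not in stop_words]
    if not content:
        return {}
    total = len(content)
    return {tok: (af, 100 * af / total) for tok, af in Counter(content).most_common()}


def stem_group(freqs, stem):
    """Pool AF and RF_stop over all forms starting with `stem`
    (e.g. "україн" for Україна / українці / український).
    A crude stand-in for the manual grouping of forms into lemmas."""
    af = sum(a for tok, (a, _) in freqs.items() if tok.startswith(stem))
    rf = sum(r for tok, (_, r) in freqs.items() if tok.startswith(stem))
    return af, rf
```

Prefix matching is deliberately naive: it cannot handle stem alternations and may over-collect unrelated words with the same beginning, which is why the study relied on manual grouping.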
We chose 10 lemmas with AF ≥ 4 for each presidential program and compared the two lists. Tables 1 and 2 present the differences in the frequency of key lemmas in both programs. The contents of the tables illustrate several interesting points. First, the most frequent lemmas evidently reveal the most important concepts of the presidential programs. Semantic analysis of the main concepts of Zelenskyi's program shows that they relate to internal policy, both as priority areas and as problems. The text of Poroshenko's program is focused on both external policy (joining NATO and the EU) and internal policy. Second, the frequencies of key lemmas differ sharply: the most important concepts in Poroshenko's program have low or zero frequency in Zelenskyi's program, and vice versa. The key words of Poroshenko's 2019 political slogan, армія, церква, мова (army, church, language), are absent from the frequent key words of his presidential program: each of them has AF = 1. Poroshenko's program lacks such concepts as майбутнє / майбутній, вибори, президент, справедливість / справедливий (future, elections, president, justice), while Zelenskyi does not speak about вільний, змога / змогти, підвищення / підвищений (free, able, raised).
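The comparison of the two key-lemma lists can be expressed as a small Python sketch. The function names are hypothetical; the thresholds (10 lemmas, AF ≥ 4) are those stated above, and the input is assumed to be a mapping from lemma to AF produced after lemmatization.

```python
def top_lemmas(lemma_af, k=10, min_af=4):
    """The k most frequent lemmas with AF >= min_af (ties broken alphabetically)."""
    ranked = sorted(lemma_af.items(), key=lambda item: (-item[1], item[0]))
    return [lemma for lemma, af in ranked if af >= min_af][:k]


def contrast(a_af, b_af, k=10, min_af=4):
    """Pair each key lemma of program A with its AF in program B (0 if absent)."""
    return [(lemma, a_af[lemma], b_af.get(lemma, 0)) for lemma in top_lemmas(a_af, k, min_af)]
```

With toy counts invented for illustration, such as {"україна": 20, "майбутнє": 8, "вибори": 5, "мир": 3} for program A and {"україна": 18, "вільний": 6} for program B, `contrast` returns (україна, 20, 18), (майбутнє, 8, 0) and (вибори, 5, 0); the zero values are the kind of asymmetry discussed for Tables 1 and 2.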

Pronouns
Quantitative parameters of pronouns in political speech can reveal hidden intentions. For example, according to one corpus-based study, "Obama's use of the personal pronouns "she", "he" and "they" suggests references beyond himself and indicate the level of difference in the distance from the electorates observes" 13.
Quantitative measurements of pronouns in presidential programs, which represent a written genre, reveal the tactics of the candidates' self-presentation strategy. We used the classification proposed by Margaryta Dorofeyeva: tactics of singularity, tactics of plurality, tactics of indefiniteness and tactics of elimination 14. Fig. 2 and Fig. 3 show the percentage of each pronoun among all pronoun tokens used in the text. At this stage of our study we used the AntConc 3.4.4w software 15.

Fig. 2, 3. Percentage of personal pronoun tokens in presidential programs
Overall, pronoun tokens make up a slightly larger share of Poroshenko's text than of Zelenskyi's (5% and 4% respectively), but Poroshenko uses fewer distinct pronoun forms: 9 against 13 in Zelenskyi's text. Figure 4 shows the relative frequency (RF) of pronoun lemmas (e.g., the lemma ми (we) includes the forms нас, нам (us), etc.). At the lemma level, Poroshenko used only 5 pronouns and Zelenskyi 7. In both presidential programs the most frequent pronoun is ми (we), which indicates the tactic of plurality. This tactic is realized through the metonymies "we = I + my team", "we = I + my supporters / my electorate" and "we = I + you / the addressee / the people". In the first case, the metonymy places the responsibility for the words on the speaker's team. In the second and third cases, the speaker's responsibility for his own statements is significantly lower, as the responsibility for decision-making is shifted to the listener.
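The three pronoun measures used in this section (share of pronoun tokens in the text, percentage of each form among pronoun tokens, and relative frequency of lemmas) can be sketched as follows. The form-to-lemma map is a hypothetical, incomplete illustration, and measuring lemma RF against all tokens is our assumption about the base used in Fig. 4.

```python
from collections import Counter

# Hypothetical and deliberately incomplete map from pronoun forms to lemmas.
PRONOUN_LEMMA = {
    "я": "я", "мене": "я", "мені": "я", "мною": "я",
    "ми": "ми", "нас": "ми", "нам": "ми", "нами": "ми",
    "ви": "ви", "вас": "ви", "вам": "ви", "вами": "ви",
    "вони": "вони", "їх": "вони", "їм": "вони", "ними": "вони",
}


def pronoun_profile(tokens):
    """Pronoun statistics of a tokenized text (percentages, plus one count)."""
    forms = Counter(t for t in tokens if t in PRONOUN_LEMMA)
    n_pron, n_all = sum(forms.values()), len(tokens)
    lemmas = Counter()
    for form, count in forms.items():
        lemmas[PRONOUN_LEMMA[form]] += count
    return {
        # share of pronoun tokens among all tokens (cf. 5% vs 4%)
        "share_of_text": 100 * n_pron / n_all if n_all else 0.0,
        # number of distinct pronoun forms (cf. 9 vs 13)
        "distinct_forms": len(forms),
        # each form as a percentage of all pronoun tokens (cf. Fig. 2, 3)
        "form_pct": {f: 100 * c / n_pron for f, c in forms.items()},
        # lemma frequency relative to all tokens (the base for Fig. 4 is assumed)
        "lemma_rf": {lem: 100 * c / n_all for lem, c in lemmas.items()},
    }
```

Grouping forms before counting is what makes ми the dominant lemma even when its individual forms (ми, нас, нам) are each less frequent.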
An interesting fact about the frequency of the pronoun we in American political discourse was pointed out by Jacques Savoy: "The written form tends to use the we more frequently than the I. The pronoun we owns the useful advantage of being ambiguous (who is really behind the we? Myself and the future government? Me and the people? Me and the workers? Me and the (future) Congress? etc.)" 16. A similar observation was made by Concepción Hernández-Guerra in her study of Barack Obama's speech: "Moving on to something else, the wide use of the pronoun "we" referring to different addressees may be done purposely to involve everybody indirectly in the solution of the problems or to reflect that everybody is responsible of the problems that threaten the world, not just America" 17.
It is interesting to trace the collocations in which Poroshenko and Zelenskyi used the pronoun ми (we).

N-grams
The N-grams option of AntConc 3.4.4w (with min/max cluster size = 3) finds the 3-grams that contain the pronoun ми (we).
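A rough Python equivalent of this AntConc query is sketched below. It counts every 3-gram that contains ми in any position; AntConc's own search-term position settings may restrict the results differently, so this is an approximation, not a reimplementation of AntConc.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngrams_with(tokens, word, n=3):
    """Frequency of each n-gram that contains `word` in any position (min = max = n)."""
    return Counter(g for g in ngrams(tokens, n) if word in g)
```

For the invented sentence ми це народ і ми будемо працювати, four of its five 3-grams contain ми, each with AF = 1.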
Zelenskyi's program contains 16 such 3-grams, presented in Table 3; all of them have AF = 1. As we can see, 69% of them are constructions with a verb in the Future tense, only 6% contain a verb in the Past tense, and 13% contain modal verbs. Since Zelenskyi was running for president for the first time and had no political experience, he appealed to the future. One of the 3-grams explains what ми (we) means: ми це народ (we are the people). Poroshenko's program contains 26 such 3-grams, all with AF = 1. In Poroshenko's case we found a greater variety of tense forms in the 3-grams: 46% contain verbs in the Future tense, 12% in the Present tense, 31% in the Past tense, and 4% contain modal verbs. Evidently, the candidate appeals to his positive experience and achievements, but he also builds a chain of events 'past-present-future'. Poroshenko's program reveals the metonymy "we = I + …" as ми – країна (we are the country), which is broader and more indefinite because it includes not only people but also state structures, territory, resources, etc.
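The reported percentages can be checked against the totals of 16 and 26 3-grams. The counts below are our back-calculation (the only whole numbers consistent with the stated percentages), not figures given in the source:

```latex
\begin{aligned}
\text{Zelenskyi } (N = 16):\quad & \tfrac{11}{16} \approx 69\%\ (\text{Future}), \quad \tfrac{1}{16} \approx 6\%\ (\text{Past}), \quad \tfrac{2}{16} \approx 13\%\ (\text{modal}) \\
\text{Poroshenko } (N = 26):\quad & \tfrac{12}{26} \approx 46\%\ (\text{Future}), \quad \tfrac{3}{26} \approx 12\%\ (\text{Present}), \quad \tfrac{8}{26} \approx 31\%\ (\text{Past}), \quad \tfrac{1}{26} \approx 4\%\ (\text{modal})
\end{aligned}
```

The stated shares thus sum to 88% and 93%, i.e. two 3-grams in each program fall outside the listed verb categories.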
At the last stage of the presidential program study, we wanted to know whether any phrases recur frequently in each candidate's program. For this purpose, we analyzed N-grams for n = 2…10 to find repeated phrases. This analysis revealed an interesting point about syntax.
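The search for the maximum repeated cluster size can be sketched in Python as below: for each n from 2 to 10 it keeps the n-grams that occur at least twice and reports the largest n with any repeat. Treating "repeated" as "frequency ≥ 2" is our assumption, and the example tokens are invented.

```python
from collections import Counter


def repeated_ngrams(tokens, n):
    """n-grams (as tuples) that occur at least twice, with their frequencies."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {g: c for g, c in counts.items() if c >= 2}


def max_repeated_cluster_size(tokens, n_min=2, n_max=10):
    """Largest n in [n_min, n_max] for which some n-gram repeats; 0 if none does."""
    best = 0
    for n in range(n_min, n_max + 1):
        if repeated_ngrams(tokens, n):
            best = n
    return best
```

For the invented sequence не обіцянки а дії не обіцянки а дії, the function returns 4: the 4-gram не обіцянки а дії occurs twice, but no 5-gram does.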
The maximum repeated cluster size in Zelenskyi's program is 4. As the list of 4-grams shows (Table 5), Zelenskyi used contrastive constructions (1,2% of the 1590 4-gram types in total). Such constructions were evidently used to emphasize the difference between reality and the proposed prospects. Like any repetition, the repetition of contrastive constructions has a suggestive effect.
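The share of contrastive 4-gram types can be approximated automatically by checking whether a type begins with an adversative conjunction. This is only a heuristic sketch: the conjunction list is a hypothetical choice, and the constructions in Tables 5 and 6 were identified by the authors, not necessarily by such a rule.

```python
# Hypothetical list of adversative conjunctions used as a crude marker of contrast.
ADVERSATIVE = {"а", "але", "проте", "однак", "зате"}


def starts_adversative(ngram_type):
    """True if the space-separated n-gram begins with an adversative conjunction."""
    words = ngram_type.split()
    return bool(words) and words[0] in ADVERSATIVE


def contrastive_share(ngram_types):
    """Percentage of n-gram types that start with an adversative conjunction."""
    if not ngram_types:
        return 0.0
    hits = sum(1 for g in ngram_types if starts_adversative(g))
    return 100 * hits / len(ngram_types)
```

For scale, three types out of 1541 give 100 · 3 / 1541 ≈ 0,19%, reported as 0,2%, while 1,2% of 1590 types corresponds to about 19 types (our back-calculation).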
The maximum repeated cluster size in Poroshenko's program is also 4. In Poroshenko's program we found only three 4-grams indicating contrastive constructions (0,2% of the 1541 4-gram types in total), listed in Table 6.

Table 6. List of 4-grams in Poroshenko's program
а й всю європейську (but also the whole European)
але справжнє лідерство можливе (but real leadership is possible)
але щоб стати справжнім (but in order to become a real)

Thus, using NLP tools for quantitative text analysis reveals some interesting linguistic features of the tactics and strategies of presidential programs that are not evident from manual research.

Ukrainian parliamentary speeches
Speech in the Ukrainian Parliament is a kind of formal communication. It often combines prepared and spontaneous spoken monologue. The primary addressees are deputies and government officials, followed by the media, which interpret parliamentary events for a wide audience. All these factors determine politicians' strategies and tactics and, of course, their linguistic features.

Corpus
For this stage of our study we created a corpus of transcripts of parliamentary speeches from 2004-2021, i.e. of the parliament from the IV to the IX convocation. The transcripts were downloaded from the official website of the Ukrainian parliament, the Verkhovna Rada 18. We used Sketch Engine 19, a commercial corpus management and corpus query tool. Figure 5 reports the size of our corpus.

Fig. 5. Corpus statistics
The Keywords function finds both tokens that are frequent in our corpus (the focus corpus) relative to a reference corpus and rare or unusual words in it. Ukrainian Web 2014 (ukTenTen14), amounting to 1 388 494 043 tokens, was chosen as the reference corpus. To perform this task, we used the advanced settings: non-words (tokens that do not start with a letter of the alphabet) were excluded.
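To our understanding, Sketch Engine ranks keywords with the "simple maths" score: the ratio of normalized frequencies (per million tokens) in the focus and reference corpora, each smoothed by adding a constant N (1 by default); a larger N favours more common words and a smaller N rarer ones. The sketch below illustrates that formula and the non-word filter described above. It is not Sketch Engine's code, the value of N used in this study is not stated, and the focus-corpus size would have to be taken from Fig. 5.

```python
def simple_maths_score(freq_focus, size_focus, freq_ref, size_ref, n=1.0):
    """Keyness as (fpm_focus + n) / (fpm_ref + n), where fpm = frequency per million tokens."""
    fpm_focus = freq_focus * 1_000_000 / size_focus
    fpm_ref = freq_ref * 1_000_000 / size_ref
    return (fpm_focus + n) / (fpm_ref + n)


def is_word(token):
    """Non-word filter as described: keep only tokens that start with a letter."""
    return token[:1].isalpha()


# Reference corpus size as reported for ukTenTen14.
UKTENTEN14_TOKENS = 1_388_494_043
```

For instance, a word occurring 10 times per million tokens in the focus corpus and never in the reference corpus scores (10 + 1) / (0 + 1) = 11 with N = 1.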

Slang
N. Kondratenko argues that fashion in language violates normative requirements and that politicians are the subjects who create language fashion in political discourse 20. During Trump's term, many media outlets (e.g. The New York Times, Salon, Fortune, The New Yorker, The Washington Post) discussed his "rhetorical strategy to gain popularity, in accordance to the trend of anti-intellectualism" 21.
We distinguish between political slang and common slang, the latter being understandable to everybody regardless of social stratification. Political slang is a professional slang (or jargon) used in the informal communication of politicians. However, some units of political slang can spread via the media to society at large (e.g. ширка (broad coalition); піаніст, кнопкодав (a deputy who votes in parliament with someone else's card); противсіх (a voter who chooses the 'against all' line on the ballot)). Ukrainian politicians use both kinds of slang.
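One way to retrieve such items from a corpus is a key-word-in-context (KWIC) concordance, as offered by corpus tools such as Sketch Engine. The sketch below is a minimal stand-in written for illustration: the function name, the window width and the example sentence are hypothetical, and it matches exact token forms only, so inflected forms of a slang item would have to be listed explicitly.

```python
def kwic(tokens, targets, width=5):
    """Key-word-in-context lines for every token that belongs to `targets`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok in targets:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines
```

For the invented sentence знову голосує піаніст замість колеги with width 2, the single line returned is "знову голосує [піаніст] замість колеги".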
Earlier we studied slang in the informal communication of Ukrainian deputies, in particular on talk shows 22: the unprepared oral speech of politicians, with its proneness to conflict, expressiveness, evaluativity and frequent substandard language elements. In that case the addressees were political opponents, the direct audience in the talk-show studio and the mass addressee.
As L. Morawski notes, political discourse is not aimed at dialogue 23. It is therefore always a monologue with two main strategies: self-presentation and discrediting the opponent. From this point of view we analyzed slang in deputies' speeches in parliament. First of all, the self-presentation strategy is realized through tactics of demonstrating power and authority, and the use of argot supports them. The tradition of using criminal argot was inherited from the Soviet Union: from 1917 on, revolutionaries used the vocabulary of the lowest strata of society to assert the prestige of proletarian language against the language of the intelligentsia. In the 1970s, there was an amalgamation of the top management of the USSR and the criminal world. This cooperation gained new prospects after the collapse of the USSR, in independent Ukraine, where members of political and criminal circles shared common interests. M. Nadel-Chervinska and A. Chervinska use R. Barthes' term "sadic language" for the communicative form of the "Soviet zone" that arose from the interaction of prison argot with administrative and political jargon 24. While in Soviet times the "sadic language" existed mainly in the colloquial, informal sphere (in communicative situations such as 'boss-subordinate'), after the collapse of the USSR it came into widespread use against the background of the general criminalization of society and the colloquialization of the Ukrainian language. Politicians' socio-psychological habit of using the argot vocabulary of the power vertical persists to this day. The use of argot in formal communication may be unconscious.
As we can see, in all examples slang (argot) is used to describe a political opponent, his or her ideology and activity, explicitly or implicitly. In this way the archetypal binary opposition 'own-stranger' is realized.

CONCLUSIONS
The corpus-based approach enabled a quantitative analysis of occurrence frequencies, pronoun parameters and N-grams in Zelenskyi's and Poroshenko's presidential programs. The results show that the most frequent lemmas reveal the most important concepts of the presidential programs; 10 lemmas are common to both programs, and the most significant concept is Україн- (Україна / українці / український) (Ukraine / Ukrainians / Ukrainian). Semantic analysis shows that the main concepts of Zelenskyi's program concern internal policy, whereas Poroshenko's program is focused on both external policy (joining NATO and the EU) and internal policy. Quantitative measurements of pronouns show that ми (we) is the most frequent pronoun in both programs. It indicates the tactic of plurality within the self-presentation strategy, which points to collective responsibility. Poroshenko's program provides the metonymy ми – країна (we are the country), which is broader and more indefinite because it includes not only people but also state structures, territory, resources, etc. N-gram analysis reveals a syntactic feature: both candidates repeat contrastive constructions, Zelenskyi considerably more often than Poroshenko (1,2% vs 0,2% of 4-gram types).
A corpus of parliamentary speech transcripts was created with Sketch Engine. It makes it possible to identify such linguistic features as surzhykisms, Russian words written in Russian script, occasionalisms, barbarisms, gender forms, colloquialisms and slang, which indicate the cultural and educational level of politicians, as well as their communicative tactics.
Overall, NLP tools for quantitative text analysis bring out linguistic features of the tactics and strategies of presidential programs that are not evident from manual research.

SUMMARY
The research aims to identify the linguistic features of presidential programs and parliamentary speech transcripts that a corpus-based approach can reveal. The data were taken from official websites. TextusPro and AntConc were used for the quantitative analysis of the presidential programs. The discussion is based on word occurrence frequencies, quantitative parameters of pronouns and N-grams. The results show tactics and strategies of the presidential programs that are not evident from manual research. A corpus of parliamentary speech transcripts was created with Sketch Engine. It makes it possible to identify several linguistic features, including slang, which indicate the cultural and educational level of politicians, as well as their communicative tactics.