Monitoring of Russian disinformation topics
Data collection and preprocessing
We collect news items from a sample of websites in four groups: (1) Ukrainian clickbait sites, (2) Russian sites aimed at Ukraine, (3) mainstream Russian news sites, and (4) mainstream Ukrainian news sites. From the first two groups we monitor only manipulative news; from mainstream Russian and Ukrainian sites we monitor all items. Topics and news from mainstream Ukrainian sites are shown for comparison; most of them are not manipulative.
Data is loaded from the sites' RSS feeds. Each news item is loaded with its publication time, title, full text, and link. We selected only items written in Russian, since more disinformation is written in this language. The text of each item was prepared for analysis: tokenized (split into words and punctuation marks) and lemmatized (each word converted to its normal, dictionary form, e.g. verbs to the infinitive).
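The loading and preprocessing steps above can be sketched in Python with the standard library only. This is a minimal illustration under stated assumptions, not the monitoring's actual code: the full text is assumed to sit in each item's `<description>` element, Russian is detected with a crude letter heuristic (Russian-only letters such as ы, э, ъ, ё present; Ukrainian-only letters і, ї, є, ґ absent), and `TOY_LEMMAS` is a hypothetical hand-written lemma table standing in for a real Russian morphological analyzer such as pymorphy2.

```python
import re
import xml.etree.ElementTree as ET

# Crude language heuristic (assumption): letters used only in Ukrainian vs. only in Russian.
UK_ONLY = set("іїєґІЇЄҐ")
RU_ONLY = set("ыэъёЫЭЪЁ")

# Hypothetical toy lemma table; a real pipeline would use a morphological analyzer.
TOY_LEMMAS = {"заявил": "заявить", "заявила": "заявить", "цены": "цена", "выросли": "вырасти"}

# A token is either a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def parse_rss(xml_text):
    """Return title, link, publication time and text for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "text": item.findtext("description", default=""),  # assumed full-text field
        }
        for item in root.iter("item")
    ]

def looks_russian(text):
    """True if Russian-only letters occur and no Ukrainian-only letters do."""
    chars = set(text)
    return bool(chars & RU_ONLY) and not (chars & UK_ONLY)

def preprocess(text):
    """Lowercase, split into words and punctuation, and map words to lemmas."""
    tokens = TOKEN_RE.findall(text.lower())
    return [TOY_LEMMAS.get(tok, tok) for tok in tokens]

SAMPLE = """<rss version="2.0"><channel>
<item><title>Цены выросли</title><link>https://example.com/1</link>
<pubDate>Mon, 01 Mar 2021 10:00:00 +0200</pubDate>
<description>Министр заявил, что цены выросли.</description></item>
<item><title>Ціни зросли</title><link>https://example.com/2</link>
<pubDate>Mon, 01 Mar 2021 11:00:00 +0200</pubDate>
<description>Міністр заявив, що ціни зросли.</description></item>
</channel></rss>"""

# Keep only Russian-language items, then attach their lemmatized tokens.
items = [it for it in parse_rss(SAMPLE) if looks_russian(it["title"] + " " + it["text"])]
for it in items:
    it["tokens"] = preprocess(it["text"])
print([(it["link"], it["tokens"]) for it in items])
```

The letter heuristic fails on short texts that contain none of the distinguishing letters; a trained language identifier would be more robust.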
We analyze only news about the politics, economy, society, and foreign affairs of Ukraine. News about weather, celebrities, sports, car accidents, and similar subjects is treated as irrelevant and excluded from monitoring. Relevant and irrelevant news items are distinguished by a separate classification algorithm.
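The source does not say which algorithm separates relevant from irrelevant items. As a hedged illustration, the sketch below swaps in a multinomial Naive Bayes classifier with add-one smoothing, written from scratch; the training examples are hypothetical, and the English tokens are placeholders for Russian lemmas.

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over token lists."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # class frequencies for the priors
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, tokens, c):
        """Log prior plus smoothed log likelihood of the known tokens under class c."""
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.totals[c] + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + 1) / denom)
        return score

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self.log_score(tokens, c))

# Hypothetical labelled training items.
train = [
    (["parliament", "law", "vote"], "relevant"),
    (["government", "budget", "tax"], "relevant"),
    (["weather", "rain", "snow"], "irrelevant"),
    (["football", "match", "goal"], "irrelevant"),
]
model = NaiveBayes().fit([d for d, _ in train], [label for _, label in train])
print(model.predict(["parliament", "budget", "vote"]))
print(model.predict(["rain", "football"]))
```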
Detection of manipulative news
Each news item was evaluated by an improved version of the manipulative news classifier (previous version of the classifier), which was further trained on new data to improve accuracy. It estimates the likelihood that a news item contains emotional manipulation and/or false argumentation. On news from clickbait sites, sites in the occupied territories, and publications with an anti-Ukrainian position, the classifier detects 62% of the materials that contain at least one type of manipulation, while incorrectly marking 6% of the materials as manipulative. In other words, the algorithm is more likely to miss manipulation than to flag it falsely.
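One way to read the 62% and 6% figures is as the classifier's recall and its false-positive rate. The sketch below shows how such figures could be computed once per-type probabilities are turned into a yes/no decision. The manipulation-type names, the 0.5 threshold, and the toy labels are assumptions for illustration only.

```python
def is_manipulative(probs, threshold=0.5):
    """Flag an item if any manipulation type reaches the (assumed) threshold."""
    return any(p >= threshold for p in probs.values())

def recall_and_fpr(y_true, y_pred):
    """Recall = TP / (TP + FN); false-positive rate = FP / (FP + TN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if p and not t)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp / (tp + fn), fp / (fp + tn)

# Toy per-type probabilities; the type names are placeholders.
scores = [
    {"emotional": 0.9, "argumentation": 0.2},  # manipulative, caught
    {"emotional": 0.3, "argumentation": 0.7},  # manipulative, caught
    {"emotional": 0.4, "argumentation": 0.1},  # manipulative, missed
    {"emotional": 0.1, "argumentation": 0.2},  # not manipulative, correctly passed
    {"emotional": 0.6, "argumentation": 0.1},  # not manipulative, false alarm
    {"emotional": 0.2, "argumentation": 0.3},  # not manipulative, correctly passed
]
y_true = [True, True, True, False, False, False]
y_pred = [is_manipulative(s) for s in scores]
recall, fpr = recall_and_fpr(y_true, y_pred)
print(f"recall={recall:.2f}, false-positive rate={fpr:.2f}")
```

Raising the threshold would typically lower the false-positive rate at the cost of recall, which is the trade-off the figures above describe.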
The group of Ukrainian manipulative sites includes only sites where the algorithm classified more than 10% of all relevant items as manipulative. From these Ukrainian clickbait sites and from Russian sites targeting Ukraine, we analyze only manipulative news items. Topics on mainstream sites are shown for all news items; these items are not checked by the manipulative news classifier.
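The 10% site-selection rule and the per-group filtering can be sketched as follows. The field names (`site`, `relevant`, `manipulative`) and group labels such as `ua_clickbait` are hypothetical. The comparison is strict, so a site with exactly 10% manipulative items is excluded, following the wording "more than 10%".

```python
from collections import defaultdict

# Hypothetical labels for the two groups where only manipulative items are monitored.
MANIPULATIVE_ONLY = {"ua_clickbait", "ru_targeting_ua"}

def manipulative_share(items):
    """Per site: share of relevant items that the classifier flagged as manipulative."""
    relevant, flagged = defaultdict(int), defaultdict(int)
    for it in items:
        if it["relevant"]:
            relevant[it["site"]] += 1
            flagged[it["site"]] += int(it["manipulative"])
    return {site: flagged[site] / relevant[site] for site in relevant}

def select_sites(items, min_share=0.10):
    """Sites where MORE than min_share of relevant items are manipulative."""
    return {site for site, share in manipulative_share(items).items() if share > min_share}

def items_to_monitor(items, group_of_site):
    """Drop irrelevant items everywhere; drop non-manipulative items in the two manipulative groups."""
    kept = []
    for it in items:
        if not it["relevant"]:
            continue
        if group_of_site[it["site"]] in MANIPULATIVE_ONLY and not it["manipulative"]:
            continue
        kept.append(it)
    return kept

items = [
    {"site": "a.example", "relevant": True, "manipulative": True},
    {"site": "a.example", "relevant": True, "manipulative": False},
    {"site": "a.example", "relevant": False, "manipulative": True},
]
# Site b: exactly 1 of 10 relevant items flagged, i.e. 10%, which is not MORE than 10%.
items += [{"site": "b.example", "relevant": True, "manipulative": i == 0} for i in range(10)]

print(select_sites(items))
group_of_site = {"a.example": "ua_clickbait", "b.example": "ua_mainstream"}
print(len(items_to_monitor(items, group_of_site)))
```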
Sites in monitoring
- 68 Russian sites targeting Ukraine (manipulative items only): 3652.ru, 3654.ru, 8692.ru, anna-news.info, antifashist.com, antimaydan.info, c-inform.info, comitet.su, crisis.in.ua, delovoydonbass.ru, dnr-lnr.info, dnr-pravda.ru, dnr24.com, dnr24.su, donbasstoday.ru, doneck-news.com, dontimes.ru, dosie.su, e-gorlovka.com.ua, e-news.su, evening-crimea.com, free-news.su, fresh.org.ua, fromdonetsk.net, front-novorossii.ru, gorlovka.today, jankoy.org.ua, kafanews.com, komtv.org, kv-journal.su, lgt.su, luga1news.ru, lugansk1.info, meridian.in.ua, metayogg.com, miaistok.su, mir-lug.info, mnyug.com, mozaika.dn.ua, nahnews.org, naspravdi.info, newc.info, news-front.info, newsland.com, nk.org.ua, novorosinform.org, novorossiy.info, novosti.icu, on-line.lg.ua, patriot-donetsk.ru, pohnews.org, politnavigator.net, pravdanews.info, ruinformer.com, rusdnepr.ru, rusnext.ru, russian-vesna.ru, rusvesna.su, sevastopol.su, sevnews.info, sobytiya.info, svodki24.ru, time-news.net, ukraina.ru, voenkor.info, voskhodinfo.su, vsednr.ru, xvesti.ru
- 70 Ukrainian online publications where more than 10% of all news about Ukraine has been flagged as manipulative (manipulative items only): 112.ua, agrimpasa.com, aif.ua, akcenty.com.ua, antikor.com.ua, baza-pravda.in.ua, bbcccnn.com.ua, begemot.media, bessarabiainform.com, censoru.net, dialog.ua, expres.life, finoboz.net, fraza.ua, from-ua.com, glavcom.life, glavk.info, glavred.info, glavred.life, golos.ua, hpib.life, hyser.com.ua, inforesist.org, inform-ua.info, informator.news, ivasi.news, jizn.info, khersonline.net, kompromat1.info, kompromat1.news, kordon.org.ua, korr.com.ua, kyiv.press, lifedon.com.ua, mignews.com.ua, newnews.in.ua, news247.com.ua, newsmir.info, onpress.info, podrobnosti.ua, politeka.net, politica.com.ua, pravda.rv.ua, prioritet.org, proua.com.ua, replyua.net, rupor.info, sharij.net, skelet-info.org, spektrnews.in.ua, spichka.news, spzh.news, strana.ua, t.ks.ua, timer-odessa.net, ua24ua.net, ukr.life, ukrainianwall.com, ukranews.com, ukranews.life, ukrrudprom.ua, vesti-ukr.com, vesti.ua, voi.com.ua, vremya.com.ua, vybor.ua, vz.ua, xn--j1aidcn.org, zik.ua, znaj.ua
- 17 Ukrainian online publications (all materials): 24tv.ua, bykvu.com, censor.net.ua, fakty.com.ua, fakty.ua, gordonua.com, interfax.com.ua, lb.ua, liga.net, nv.ua, pravda.com.ua, rbc.ua, segodnya.ua, tsn.ua, ukrinform.ru, unian.net, zn.ua
- 16 major Russian publications (all materials): aif.ru, dni.ru, kommersant.ru, kp.ru, lenta.ru, lentainform.com, life.ru, newsru.com, pravda.ru, regnum.ru, riafan.ru, russian.rt.com, slovodel.com, svpressa.ru, tass.ru, vz.ru
The selected manipulative news (3,000 items per week on average) was broken down into weekly topics using automatic topic modeling (non-negative matrix factorization, NMF). We then edited the resulting clusters manually: similar topics were merged, irrelevant or overly general clusters were discarded, and topics unrelated to Russian disinformation were deleted. Because topics are identified automatically, a small share of the items in a topic may be unrelated to it.
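The NMF step can be illustrated with a from-scratch sketch of non-negative matrix factorization using Lee and Seung's multiplicative updates. It assumes raw term counts, two topics and 500 iterations, all chosen for illustration; the source does not state the vectorization, the number of topics per week, or the implementation used (a library version such as scikit-learn's `NMF` would be the usual choice in practice). Each item is assigned to the topic with its largest weight, and each topic is described by its highest-weighted terms, the kind of output that would then be edited manually.

```python
import random

def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of v - w @ h."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def doc_term_matrix(docs):
    """Raw term counts (docs x vocabulary); TF-IDF weighting is a common alternative."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok in doc:
            row[index[tok]] += 1.0
        rows.append(row)
    return rows, vocab

def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Approximate v (docs x terms) by w (docs x k) @ h (k x terms), all entries
    non-negative, using Lee-Seung multiplicative updates for the Frobenius norm."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def top_terms(h, vocab, n=3):
    """The n highest-weighted terms of each topic."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]] for row in h]

# Toy corpus of already-lemmatized items (English placeholders for Russian lemmas).
docs = [
    ["gas", "tariff", "price"],
    ["gas", "tariff", "price", "gas", "tariff", "price"],
    ["price", "gas", "tariff"],
    ["army", "front", "shelling"],
    ["army", "front", "shelling", "army", "front", "shelling"],
    ["shelling", "army", "front"],
]
v, vocab = doc_term_matrix(docs)
w, h = nmf(v, k=2)
# Each item goes to the topic with the largest weight in its row of w.
topic_of_doc = [max(range(2), key=lambda t: row[t]) for row in w]
print(topic_of_doc, top_terms(h, vocab))
```

Because NMF keeps all weights non-negative, each topic reads as an additive bundle of words, which makes clusters easier to inspect and merge by hand than signed components such as those from SVD.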
Next, we mapped each weekly topic to a general topic and trained a classifier that predicts the general topic of a news item, if it has one. Because classification is automatic, each topic contains a small number of misclassified items. For the dashboard home page, general topics were grouped into meta-topics that represent the key narratives of disinformation.
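A minimal sketch of the topic hierarchy and of a classifier that may abstain ("if any"): weekly topics map to general topics, general topics map to meta-topics, and a new item is assigned to the general topic whose centroid is most cosine-similar to it, or to none if the similarity falls below a cut-off. The nearest-centroid method, all topic names, and the `min_sim=0.3` cut-off are assumptions for illustration; the source does not name the classifier it uses.

```python
import math
from collections import Counter

# Hypothetical mappings: weekly topic -> general topic -> meta-topic.
WEEKLY_TO_GENERAL = {
    "week10_gas_prices": "tariffs",
    "week11_heating_bills": "tariffs",
    "week10_frontline": "military",
}
GENERAL_TO_META = {"tariffs": "economy", "military": "war"}

def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(labelled_items):
    """Sum token counts per general topic (cosine ignores scale, so sum works like a mean)."""
    centroids = {}
    for tokens, weekly_topic in labelled_items:
        general = WEEKLY_TO_GENERAL[weekly_topic]
        centroids.setdefault(general, Counter()).update(tokens)
    return centroids

def predict_topic(tokens, centroids, min_sim=0.3):
    """Return (general topic, meta-topic), or None if no centroid is similar enough."""
    vec = Counter(tokens)
    best = max(centroids, key=lambda g: cosine(vec, centroids[g]))
    if cosine(vec, centroids[best]) < min_sim:
        return None
    return best, GENERAL_TO_META[best]

labelled = [
    (["gas", "price", "tariff"], "week10_gas_prices"),
    (["heating", "bill", "tariff"], "week11_heating_bills"),
    (["army", "front", "shelling"], "week10_frontline"),
]
centroids = train_centroids(labelled)
print(predict_topic(["tariff", "heating", "price"], centroids))
print(predict_topic(["football", "match"], centroids))
```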
Each subtopic is illustrated by a sample of headlines. For mainstream sites, this is a random sample of all news on the topic. For Ukrainian clickbait sites and Russian sites targeting Ukraine, the sample consists of items classified as manipulative with high confidence.
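Headline sampling could look like the sketch below. The `prob` field, the group labels, and the 0.9 "high confidence" cut-off are hypothetical. The source does not say whether high-confidence headlines are then sampled at random or ranked; this sketch samples them at random.

```python
import random

# Hypothetical labels for the groups whose headlines come from manipulative items.
MANIPULATIVE_ONLY = {"ua_clickbait", "ru_targeting_ua"}

def sample_headlines(items, group, n=5, min_prob=0.9, seed=None):
    """Up to n headlines for one subtopic within one site group.

    Mainstream groups: a uniform random sample of all items on the topic.
    Manipulative groups: a random sample restricted to items whose
    manipulation probability is at least min_prob (hypothetical cut-off).
    """
    if group in MANIPULATIVE_ONLY:
        items = [it for it in items if it["prob"] >= min_prob]
    rng = random.Random(seed)
    chosen = rng.sample(items, min(n, len(items)))
    return [it["title"] for it in chosen]

items = [{"title": f"headline {i}", "prob": p} for i, p in enumerate([0.95, 0.5, 0.92, 0.99, 0.1])]
print(sample_headlines(items, "ru_targeting_ua", seed=1))
print(sample_headlines(items, "ua_mainstream", n=2, seed=1))
```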