2022 CIDA - Presentation

Author

Naim Çınar, Züleyha Özbaş Anbarlı

Exploring Political Polarization in Femicide Discourse through Social Network Analysis: Relationships in Polarizations in the Pınar Gültekin Case

Web page where you can follow the presentation content:

1 Introduction

After Pınar Gültekin went missing on July 16, 2020, her family, friends, and women’s movement organizations began searching for her, largely through social media. Twitter was especially active during this period.

Key dates in the process:

On July 21, 2020, it was revealed that she had been killed by a man named Cemal Metin Avcı.
In July 2020, debates around the Istanbul Convention intensified (there was a strong reaction on social media to Abdurrahman Dilipak’s July 27 column titled “AKP’nin papatyaları”).
The first hearing took place on November 9, 2020.
On March 20, 2021, Türkiye withdrew from the Istanbul Convention as a result of campaigns opposing it.
At the 13th hearing on June 20, 2022, the court first imposed aggravated life imprisonment, then reduced the sentence to 23 years by applying the “unjust provocation” mitigation.

1.1 Research Questions

Between December 27, 2020 and January 3, 2021, what kind of network do users who tweet about the Pınar Gültekin murder form?

- Who are the main actors in the network?
- Who dominates the network? Which actors form clusters?
- Who are the actors around the main actors?
- Which political pole do actors belong to?

Between December 27, 2020 and January 3, 2021, what is most frequently discussed in tweets about the Pınar Gültekin murder?

1.2 Data Collection

The data were collected in R with the academictwitteR package, using the Academic Research access level of Twitter API v2.

Academic Research access: a monthly cap of 10 million tweets, with full-archive search and full-archive tweet counts. At the other access levels (Essential, Elevated), only tweets from the past seven days can be searched.

1.2.1 Distribution of tweet counts (time series plot)

Figure: number of tweets per hour (plot title: “Saatte atılan tweet sayısı”)
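An hourly series like this one can be obtained by truncating each `created_at` timestamp to its hour and counting. The sketch below assumes `created_at` holds ISO 8601 strings as returned by the API (e.g. `2020-07-21T10:05:12.000Z`); the timestamps are made up, and the authors' plotting code is not shown in the source.

```r
# Made-up timestamps in the API's ISO 8601 format (not taken from the dataset)
created_at <- c("2020-07-21T10:05:12.000Z", "2020-07-21T10:47:30.000Z",
                "2020-07-21T11:02:09.000Z")

# Parse as UTC; %OS also accepts the fractional seconds
times <- as.POSIXct(created_at, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")

# Truncate each timestamp to the start of its hour, then count tweets per hour
hour_label <- format(times, "%Y-%m-%d %H:00", tz = "UTC")
tweets_per_hour <- table(hour_label)
print(tweets_per_hour)
```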

Distribution by tweet type:

1.2.2 Accessing all tweets

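library(academictwitteR)

# Full-archive search for the name and the two hashtags, Turkish tweets only.
# The bearer token is read from the TWITTER_BEARER environment variable.
get_all_tweets(
  query = c("Pınar Gültekin", "#PınarGültekin", "#PınarGültekinİçinAdalet"),
  start_tweets = "2020-07-21T10:00:00Z",
  end_tweets = "2022-05-05T10:00:00Z",
  lang = "tr",                   # passed on to the query builder
  file = "pinargultekin",        # result also saved as an .rds file
  data_path = "pg-tweet-data",   # raw JSON pages are written here
  n = 632200                     # upper limit on tweets to fetch
)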
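In Twitter API v2 search syntax, space-separated words must all appear, a quoted phrase must appear verbatim, `OR` combines alternatives, and `lang:` restricts the tweet language. The hypothetical helper `build_v2_query()` below is not part of academictwitteR (whose own query builder may format things differently); it only sketches how the three search terms above could be combined into one query string.

```r
# Hypothetical helper, not part of academictwitteR: combine search terms into a
# single Twitter API v2 query. Terms containing spaces are wrapped in double
# quotes (exact-phrase match), all terms are OR-ed, and lang: is appended.
build_v2_query <- function(terms, lang = NULL) {
  quoted <- ifelse(grepl(" ", terms, fixed = TRUE),
                   paste0('"', terms, '"'), terms)
  query <- paste0("(", paste(quoted, collapse = " OR "), ")")
  if (!is.null(lang)) {
    query <- paste0(query, " lang:", lang)
  }
  query
}

q <- build_v2_query(c("Pınar Gültekin", "#PınarGültekin", "#PınarGültekinİçinAdalet"),
                    lang = "tr")
cat(q, "\n")
```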

Total number of tweets in the dataset:

nrow(joined)

[1] 478278

Dataset columns:

colnames(joined.clean)

 [1] "id"                     "created_at"             "retweet_count"         
 [4] "like_count"             "quote_count"            "url"                   
 [7] "hashtag"                "mention_username"       "mention_id"            
[10] "sourcetweet_type"       "sourcetweet_id"         "sourcetweet_text"      
[13] "sourcetweet_author_id"  "text"                   "possibly_sensitive"    
[16] "author_id"              "user_username"          "user_name"             
[19] "user_description"       "user_profile_image_url" "user_url"              
[22] "user_verified"          "user_location"          "user_followers_count"  
[25] "user_following_count"   "user_tweet_count"       "user_list_count"       
[28] "source"                 "in_reply_to_user_id"

Counts of quoted tweets, retweets, and original tweets (including replies) by `sourcetweet_type`; `NA` marks tweets that are neither quotes nor retweets:

joined %>%
  count(sourcetweet_type, name = "frequency")

  sourcetweet_type frequency
1           quoted      7620
2        retweeted    412642
3             <NA>     58016
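The three counts add up to the 478,278 rows reported by `nrow(joined)`, so every tweet falls into exactly one category. A short base-R check of the total and of each category's share, using only the frequencies printed above:

```r
# Frequencies from count(sourcetweet_type) above; NA = original tweets and replies
freq <- c(quoted = 7620, retweeted = 412642, original_or_reply = 58016)

total <- sum(freq)                   # 478278, the same as nrow(joined)
share <- round(100 * freq / total, 1)
share                                # quoted 1.6, retweeted 86.3, original_or_reply 12.1
```

Retweets therefore make up roughly 86% of the collected tweets, which matches the retweet graph supplying most of the whole graph's edges below.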

Interactive data table created for the large dataset:

Twitter Data - Reactable Data Table

1.3 Social Network Analysis

1.3.1 Definition

An approach based on examining interactions among social actors.
It is grounded in graph theory, a branch of mathematics.
Graph theory studies graphs, mathematical representations of objects and the relationships among them. Graphs consist of nodes (also called units or vertices) and edges (also called lines, ties, or links).
In its simplest form, a graph can be stored as an edge list: a two-column table in which each row connects one node to another.
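A minimal illustration with made-up account names: the same retweet ties can be held as a two-column edge list or, equivalently, as an adjacency matrix.

```r
# Made-up retweet edge list: each row is one tie, from the retweeting account
# to the account whose tweet was retweeted
edges <- data.frame(
  from = c("user_a", "user_b", "user_c", "user_a"),
  to   = c("news_x", "news_x", "user_a", "news_y"),
  stringsAsFactors = FALSE
)

# The node set is every name that appears in either column
nodes <- sort(unique(c(edges$from, edges$to)))

# Equivalent adjacency matrix: cell [i, j] counts ties from node i to node j
adj <- table(factor(edges$from, levels = nodes), factor(edges$to, levels = nodes))
adj
```

With igraph, `graph_from_data_frame(edges, directed = TRUE)` reads this two-column layout directly; any further columns become edge attributes.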

1.3.2 Centrality measures

Actors (nodes) in a social network can take different structural positions and can affect the flow of information in different ways and at different levels.
Centrality measures make these positions visible.

Degree-based centrality measures	Shortest-path-based centrality measures
Degree centrality	Betweenness centrality
Eigenvector centrality	Closeness centrality

Which degree measure applies depends on whether the network is directed: an undirected network has a single degree per node, while a directed one distinguishes in-degree (incoming ties) from out-degree (outgoing ties). Twitter networks can be analyzed as either directed or undirected; Facebook friendship networks are typically undirected.
Degree centrality counts all neighbors equally; only the number of neighbors matters.
In eigenvector centrality, a node becomes more important if it is connected to important (highly connected) nodes.
Betweenness centrality uses shortest paths to identify nodes that are important for the flow through the network: for every pair of other nodes, it counts the share of their shortest paths that pass through the node.
Closeness centrality also uses shortest paths: it is the inverse of a node's total (or average) distance to all other nodes, so the shorter the distances, the higher the score and the more central the node.
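The sketch below computes degree, eigenvector and closeness centrality in base R on a made-up five-node graph, so the definitions can be checked by hand. It treats ties as undirected for eigenvector and closeness centrality and assumes the graph is connected; igraph's `degree()`, `eigen_centrality()` and `closeness()` handle directed and disconnected graphs and are the usual choice for real data.

```r
# Made-up directed graph: A[i, j] = 1 means a tie from node i to node j
nodes <- c("a", "b", "c", "d", "e")
A <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
A["a", "b"] <- 1; A["a", "c"] <- 1; A["b", "c"] <- 1
A["d", "c"] <- 1; A["c", "e"] <- 1

# Degree centrality: outgoing and incoming ties per node
out_degree <- rowSums(A)
in_degree  <- colSums(A)

# For the other two measures, treat every tie as undirected
U <- (A + t(A) > 0) * 1

# Eigenvector centrality by power iteration: each round, a node's score becomes
# the sum of its neighbours' scores; rescaling keeps the maximum at 1
eigen_centrality_sketch <- function(U, iters = 200) {
  x <- rep(1, nrow(U))
  for (i in seq_len(iters)) {
    x <- as.vector(U %*% x)
    x <- x / max(x)
  }
  setNames(x, rownames(U))
}
eig <- eigen_centrality_sketch(U)

# Shortest-path distances from one node by breadth-first search
bfs_distances <- function(U, source) {
  d <- setNames(rep(Inf, nrow(U)), rownames(U))
  d[source] <- 0
  queue <- source
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (w in rownames(U)[U[v, ] == 1]) {
      if (is.infinite(d[w])) {
        d[w] <- d[v] + 1
        queue <- c(queue, w)
      }
    }
  }
  d
}

# Closeness centrality: inverse of the summed distances to all other nodes
# (assumes a connected graph, otherwise some distances are infinite)
closeness <- sapply(rownames(U), function(v) 1 / sum(bfs_distances(U, v)))

out_degree
in_degree
round(eig, 3)
round(closeness, 3)
```

Node c is tied to every other node, so it has the highest eigenvector score and the highest closeness (1/4). igraph's `closeness()` also uses this unnormalised 1/sum form by default, and `eigen_centrality()` rescales scores so the maximum is 1.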

1.3.3 Preparing the data for analysis

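After a cleaning step, the dataset contains 443,811 tweets.

Graphs, in order: retweet, quoted, reply-to, mentions, and whole (all interaction types combined). In each igraph summary, `DN--` marks a directed graph with named nodes, followed by the number of nodes and the number of edges.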

IGRAPH da1878f DN-- 182668 412615 -- 
+ attr: name (v/c), device (e/c), relationship_type (e/c),
| created_at_round (e/n)

IGRAPH 762b833 DN-- 7301 7615 -- 
+ attr: name (v/c), device (e/c), relationship_type (e/c),
| created_at_round (e/n)

IGRAPH 476230f DN-- 6416 6962 -- 
+ attr: name (v/c), device (e/c), relationship_type (e/c),
| created_at_round (e/n)

IGRAPH bd55e4e DN-- 14226 16619 -- 
+ attr: name (v/c), device (e/c), relationship_type (e/c),
| created_at_round (e/n)

IGRAPH e607555 DN-- 194261 443811 -- 
+ attr: name (v/c), device (e/c), relationship_type (e/c),
| created_at_round (e/n)
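The whole graph's edge count (443,811) equals both the sum of the four interaction graphs' edge counts and the number of tweets after cleaning. This is consistent with the whole graph being the four edge lists stacked together, one edge per tweet. Node counts do not add up the same way, because an account can appear in several graphs. The check below uses only the numbers printed above; the stacking step uses made-up accounts and is a sketch, not the authors' code.

```r
# Edge and node counts from the igraph summaries above
edge_counts <- c(retweet = 412615, quoted = 7615, replyto = 6962, mentions = 16619)
node_counts <- c(retweet = 182668, quoted = 7301, replyto = 6416, mentions = 14226)

sum(edge_counts)   # 443811: identical to the whole graph's edge count
sum(node_counts)   # 210611: larger than 194261, since accounts overlap across graphs

# Toy illustration (made-up accounts): stacking typed edge lists keeps the
# interaction type as an edge attribute, like relationship_type above
rt <- data.frame(from = c("u1", "u2"), to = "news", relationship_type = "retweet",
                 stringsAsFactors = FALSE)
qt <- data.frame(from = "u3", to = "u1", relationship_type = "quoted",
                 stringsAsFactors = FALSE)
whole <- rbind(rt, qt)
table(whole$relationship_type)
```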

1.3.4 Number of nodes and edges in the network created for the selected time window

Retweet graph:

IGRAPH 0b5c565 DN-- 15719 35851 -- 
+ attr: name (v/c), device (e/c), relationship_type (e/c),
| created_at_round (e/n)

Quoted graph:

IGRAPH eafd0da DN-- 363 296 -- 
+ attr: name (v/c), device (e/c), relationship_type (e/c),
| created_at_round (e/n)

Reply-to graph:

IGRAPH 8c39ba7 DN-- 705 758 -- 
+ attr: name (v/c), device (e/c), relationship_type (e/c),
| created_at_round (e/n)

Mentions graph:

IGRAPH 93b6552 DN-- 1044 1316 -- 
+ attr: name (v/c), device (e/c), relationship_type (e/c),
| created_at_round (e/n)

Whole graph:

IGRAPH 97730e8 DN-- 16601 38221 -- 
+ attr: name (v/c), device (e/c), relationship_type (e/c),
| created_at_round (e/n)
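The same additivity holds in the selected window: 35,851 + 296 + 758 + 1,316 = 38,221 edges. Restricting a graph to a time window amounts to keeping the edges whose timestamp falls inside it and recounting the nodes that remain. The sketch assumes `created_at_round` holds POSIX time in seconds (the summaries only show that it is numeric), treats the window end as exclusive, and uses made-up edges; with igraph, an edge-induced subgraph such as `subgraph.edges()` would play the same role.

```r
# Made-up edges; created_at_round is assumed to hold POSIX time in seconds
edges <- data.frame(
  from = c("u1", "u2", "u3", "u1"),
  to   = c("news", "news", "u1", "u4"),
  created_at_round = as.numeric(as.POSIXct(
    c("2020-12-20 10:00", "2020-12-28 09:00", "2021-01-01 18:00", "2021-01-05 12:00"),
    tz = "UTC")),
  stringsAsFactors = FALSE
)

# Window from the research questions (end treated as exclusive, an assumption)
window_start <- as.numeric(as.POSIXct("2020-12-27 00:00", tz = "UTC"))
window_end   <- as.numeric(as.POSIXct("2021-01-03 00:00", tz = "UTC"))

in_window <- edges[edges$created_at_round >= window_start &
                   edges$created_at_round <  window_end, ]

# Nodes are recounted from the edges that survive the filter
c(nodes = length(unique(c(in_window$from, in_window$to))),
  edges = nrow(in_window))

# Additivity check with the edge counts printed above
sum(c(retweet = 35851, quoted = 296, replyto = 758, mentions = 1316))  # 38221
```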

1.3.5 Centrality measures for the whole network (`whole_graph`)

Out-degree (whole):

  denizschmosby     aktepeyekta         gmrrty3   fahri10698828       umitcan25 
            107              95              85              75              62 
     battalgaz3        ykilicer arslansabanrt11  hudutsuzmenzil        kaangucl 
             59              59              57              57              51

In-degree (whole):

      yenisafak    themarginale       debuffer2       zekibahce      medyaadami 
           3007            1612            1567            1127             936 
ajanshaberresmi    asliaydincer        hurriyet   anadoluajansi     malikejder_ 
            903             879             855             853             801

Eigenvector centrality (whole):

account         in-degree  eigenvector  out-degree
yenisafak            3007       1.0000           3
debuffer2            1567       0.5294           4
themarginale         1612       0.4845           0
malikejder_           801       0.4545          23
ajanshaberresmi       903       0.4240          11
zekibahce            1127       0.3418           4
medyaadami            936       0.2996           5
asliaydincer          879       0.2643           0
anadoluajansi         853       0.2592           0
hurriyet              855       0.2587           0
enveryan              763       0.2493           2
fatmanuraltun         622       0.2085           0
neslihan3029          619       0.2028           6
slmhktn               600       0.2014           4
fatihtezcan           655       0.1971           2
yazparov              648       0.1967           2
avicenna_razi         603       0.1852           0
umutmurare            564       0.1759           6
eha_medya             383       0.1621           7
ferayicinadale1       429       0.1440           3
cnnturk               417       0.1302           0
akantalyali           396       0.1270           5
nuhalbayrak           421       0.1267           2
cakiciefe1453         418       0.1261           1
blrcano0o_            391       0.1209           0
manidar_hayat         361       0.1127           0
yilmazgul35351        353       0.1070           1
tasdemir_cemile       351       0.1058           0
thelaikyobaz          348       0.1046           1
herkesicinchp         188       0.1033           0
drhuriyet             327       0.0998          22
mediamuhtari          322       0.0969           3
bayanteror            297       0.0892           1

Closeness (whole):

aahmetterdogann ibrahim61966688      dilek__rte       mondstern         ikiyaka 
    0.001512859     0.001517451     0.001517451     0.001519757     0.001522070 
      dilikedi1     diiek__rte_   enimenelegzet      ismhndncsy         dondu_e 
    0.001522070     0.001547988     0.001577287     0.001602564     0.001626016

Betweenness (whole):

ferayicinadale1   denizschmosby     malikejder_       zekibahce   sanli_turk___ 
      24653.522       23747.595       12569.292        9507.165        6389.786 
      drhuriyet belkibirgun2335 hayriyeberberl1  kampuscadilari      umutmurare 
       5784.300        5271.323        4713.151        3443.000        3264.622
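Betweenness sums, over every pair of other nodes, the share of their shortest paths that passes through a given node; when two equally short paths exist, each carries half the weight. The base-R sketch below implements Brandes' algorithm for an unweighted, directed graph stored as a named adjacency list, on a made-up five-node example; igraph's `betweenness()` computes the same quantity on real networks.

```r
# Brandes' algorithm for betweenness on an unweighted directed graph given as
# a named list: node -> vector of out-neighbours
betweenness_sketch <- function(adj) {
  nodes <- names(adj)
  zeros <- setNames(numeric(length(nodes)), nodes)
  cb <- zeros
  for (s in nodes) {
    # Breadth-first search from s: dist = hops from s, sigma = number of
    # shortest paths from s, preds = predecessors on those paths
    dist  <- setNames(rep(-1, length(nodes)), nodes)
    sigma <- zeros
    preds <- setNames(vector("list", length(nodes)), nodes)
    dist[s] <- 0
    sigma[s] <- 1
    visited <- character(0)
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      visited <- c(visited, v)
      for (w in adj[[v]]) {
        if (dist[w] < 0) {
          dist[w] <- dist[v] + 1
          queue <- c(queue, w)
        }
        if (dist[w] == dist[v] + 1) {
          sigma[w] <- sigma[w] + sigma[v]
          preds[[w]] <- c(preds[[w]], v)
        }
      }
    }
    # Back-propagate dependencies from the farthest nodes towards s
    delta <- zeros
    for (w in rev(visited)) {
      for (v in preds[[w]]) {
        delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      }
      if (w != s) cb[w] <- cb[w] + delta[w]
    }
  }
  cb
}

# Made-up directed graph: a -> b, a -> e, b -> c, b -> d, e -> c
adj <- list(a = c("b", "e"), b = c("c", "d"), c = character(0),
            d = character(0), e = "c")
bc <- betweenness_sketch(adj)
bc   # b = 1.5, e = 0.5, all others 0
```

Node b lies on the only shortest path from a to d (1) and on one of the two shortest paths from a to c (0.5); node e carries the other half of the a-to-c paths.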

1.3.5.1 Most informative centrality measure

To determine which centrality measure is the most informative about the network, we use Principal Component Analysis (PCA).

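PCA is a linear dimensionality-reduction technique: it transforms a set of correlated variables (here, the centrality measures) into a smaller number of uncorrelated principal components that capture most of the variance.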

The analysis was conducted using the CINNA package in R.
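CINNA wraps this analysis; the base-R sketch below only illustrates the underlying idea with `prcomp()`, using the in-degree, eigenvector and out-degree values of the first ten accounts in the eigenvector table above. Because the measures are on very different scales, they are standardised first, and each measure's contribution to the first principal component is taken as its squared loading. This is one common way of reading PCA output, not necessarily the exact procedure CINNA follows.

```r
# First ten rows of the eigenvector table above (eigenvector rounded to 4 decimals)
centr <- data.frame(
  in_degree   = c(3007, 1567, 1612, 801, 903, 1127, 936, 879, 853, 855),
  eigenvector = c(1.0000, 0.5294, 0.4845, 0.4545, 0.4240,
                  0.3418, 0.2996, 0.2643, 0.2592, 0.2587),
  out_degree  = c(3, 4, 0, 23, 11, 4, 5, 0, 0, 0),
  row.names   = c("yenisafak", "debuffer2", "themarginale", "malikejder_",
                  "ajanshaberresmi", "zekibahce", "medyaadami", "asliaydincer",
                  "anadoluajansi", "hurriyet")
)

# Standardise the measures (center and scale), then run PCA
pca <- prcomp(centr, center = TRUE, scale. = TRUE)

# Share of the total variance captured by each principal component
summary(pca)$importance["Proportion of Variance", ]

# Contribution (%) of each measure to PC1: squared loadings of a unit-length axis
contrib_pc1 <- 100 * pca$rotation[, 1]^2
round(sort(contrib_pc1, decreasing = TRUE), 1)
```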

1.3.6 Centrality measures in the retweet network

Out-degree (retweet):

    aktepeyekta         gmrrty3   fahri10698828       umitcan25      battalgaz3 
             95              85              75              62              59 
arslansabanrt11  hudutsuzmenzil        ykilicer        kaangucl     fedaimalkoc 
             57              57              55              51              49

In-degree (retweet):

      yenisafak    themarginale       debuffer2       zekibahce      medyaadami 
           2988            1608            1549            1116             931 
ajanshaberresmi    asliaydincer   anadoluajansi        hurriyet     malikejder_ 
            890             873             829             816             786

Closeness (retweet):

   borankaplan6     bahgunaydin       ahmt_gzel     nur26139681      tevfikaliz 
    0.006024096     0.006493506     0.006535948     0.006622517     0.006896552 
urfaliogluihsan         meb6307        ilkatman      yusra__571       sskartal3 
    0.006896552     0.006944444     0.006944444     0.006993007     0.006993007

Betweenness (retweet):

1.3.7 Centrality measures in the quoted network

Out-degree (quoted):

politicalinnov2       aahmttprk      emtevbrane yildizdilek2009   unsalkartal58 
             21               5               5               4               4 
 muradcobanoglu     albay_birol       dcankocak    ahmettozlu29 yunus_arslan_ya 
              3               3               2               2               2

In-degree (quoted):

  sgirgin48tbmm        hurriyet     devapartisi       yenisafak       debuffer2 
             31              19              15              14               9 
  anadoluajansi ajanshaberresmi      enginozkoc   unsalkartal58          takvim 
              9               8               5               4               4

Closeness (quoted):

yildizdilek2009       aahmttprk      emtevbrane politicalinnov2    ahmettozlu29 
      0.1666667       0.2000000       0.2000000       0.2000000       0.2500000 
      insanvard   bozdagbulentt         29ercan        aygun_tk        yahreynn 
      0.2500000       0.3333333       0.3333333       0.3333333       0.3333333

Betweenness (quoted):

      debuffer2 yankibuyuksezer      medyaadami        enveryan      ersinceliq 
              9               1               1               1               1 
 biriktisatci11    gurler_rustu       dcankocak yildizdilek2009    connectumkut 
              1               0               0               0               0

1.3.8 Centrality measures in the reply-to network

Out-degree (reply-to):

  denizschmosby         orca34o    vatan66sever    okan54359803     malikejder_ 
             40              20              19              16              15 
       tumham11     murat202202 belkibirgun2335       sedatkck3     1071fatihan 
             14              13              12              12              11

In-degree (reply-to):

 kilicdarogluk  sgirgin48tbmm  herkesicinchp   eczozgurozel     enginozkoc 
            53             39             29             28             18 
   malikejder_    gazetesozcu canan_kaftanci  cumhuriyetgzt       alimahir 
            15             13             11              8              8

Closeness (reply-to):

belkibirgun2335   denizschmosby       yunlu1905         orca34o    vatan66sever 
     0.01694915      0.02777778      0.04000000      0.04166667      0.05555556 
    murat202202    okan54359803   ziyahafizoglu       sedatkck3    cerkes_giray 
     0.08333333      0.08333333      0.10000000      0.11111111      0.11111111

Betweenness (reply-to):

  denizschmosby    okan54359803       zekibahce belkibirgun2335        yazparov 
           26.0            12.0             4.5             3.0             3.0 
    akantalyali      medyaadami     murat202202      miraataba1         slmhktn 
            2.0             1.5             0.0             0.0             0.0

1.3.9 Centrality measures in the mentions network

Out-degree (mentions):

  denizschmosby    bizimtvcomtr    vatan66sever        kpopabla    okan54359803 
             67              32              26              23              22 
        orca34o     murat202202 belkibirgun2335     1071fatihan     executive61 
             20              18              16              15              15

In-degree (mentions):

 herkesicinchp  kilicdarogluk  sgirgin48tbmm   eczozgurozel   bizimtvcomtr 
           157             97             84             34             32 
   gazetesozcu     enginozkoc  cumhuriyetgzt       hurriyet canan_kaftanci 
            18             18             17             15             15

Closeness (mentions):

belkibirgun2335       yunlu1905   denizschmosby         orca34o    vatan66sever 
     0.01639344      0.03125000      0.03333333      0.04166667      0.05263158 
   okan54359803 smailen89581578     murat202202        kpopabla    cerkes_giray 
     0.06666667      0.07142857      0.08333333      0.08333333      0.10000000

Betweenness (mentions):

 denizschmosby   okan54359803    haberturktv      zekibahce  aydemirbulent 
          26.0           15.0            9.0            4.5            4.0 
      yazparov        slmhktn    akantalyali gultekindavasi     medyaadami 
           4.0            3.0            2.0            2.0            1.5

1.4 Social network visualizations

Visualization link

1.4.1 Interpreting “unnatural-looking” relationships

Difference between troll and bot accounts:

Bot: automated social media accounts programmed to imitate human behavior.

Botometer, a widely used bot-detection tool, was developed at Indiana University by a team that included Onur Varol (Sabancı University).

Troll: accounts controlled by humans, acting individually or as coordinated groups.

Signals that can help identify such accounts include:

Time-based correlation (retweets happening at similar times right after the original tweet)
Account activity (e.g., user_tweet_count)
Content similarity (e.g., accounts that continuously retweet)

Similarly, a recent Harvard University study on identifying troll accounts (Detecting Troll, Saving Democracy) uses the following features:

Content (detecting frequently repeated word groups using techniques similar to our word analysis)
Followers
Following
Retweet count
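Two of the signals listed earlier, timing and content similarity, are straightforward to compute once retweets are in a table. The sketch below uses made-up accounts and delays; the 60-second threshold is an arbitrary illustration rather than a value from either study, and a real screening would combine several signals and still require manual review.

```r
# Made-up retweets of one source tweet: retweeting account and seconds elapsed
# between the original tweet and the retweet
retweets <- data.frame(
  account = c("acc1", "acc2", "acc3", "acc4"),
  delay_s = c(4, 6, 5, 5400),
  stringsAsFactors = FALSE
)

# Timing signal: accounts that retweeted within a threshold (60 s is arbitrary)
fast_threshold <- 60
fast <- retweets$account[retweets$delay_s <= fast_threshold]
fast   # "acc1" "acc2" "acc3"

# Content signal: Jaccard similarity of the sets of tweets two accounts
# retweeted (1 = identical sets, 0 = no overlap)
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

retweeted_ids <- list(
  acc1 = c("t1", "t2", "t3", "t4"),
  acc2 = c("t1", "t2", "t3", "t5"),
  acc4 = c("t6", "t7")
)
jaccard(retweeted_ids$acc1, retweeted_ids$acc2)   # 3 shared / 5 distinct = 0.6
jaccard(retweeted_ids$acc1, retweeted_ids$acc4)   # 0
```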

1.4.2 Example: identifying suspicious interactions

The client each tweet was sent from is stored as an edge attribute (`device`):

Accounts that retweeted TheMarginale’s tweet from a client other than Android, iPhone, or iPad (the code below keeps retweets sent from the Twitter Web App):

Filtered data:

library(dplyr)

# Tweets inside the selected window (assumes created_at is an ISO 8601 string,
# so text comparison orders timestamps correctly)
st1.joined.clean <- joined.clean %>%
  filter(created_at > "2020-12-27T00:00:18", created_at < "2021-01-02T00:23:18")

# Retweets of TheMarginale's tweet sent from the Twitter Web App
st1.joined.clean.filtered <- st1.joined.clean %>%
  filter(source == "Twitter Web App",
         sourcetweet_id == "1343167361466191874")

Time series analysis for the filtered data:

Screenshots of accounts that retweeted the relevant tweet:

Figure: tweet counts of the accounts that retweeted TheMarginale’s tweet

1.5 Automated text analysis

1.5.1 Text cleaning steps

In order:
1. Converting some special characters and Turkish uppercase letters to Latin letters (e.g., ‘α’ = ‘a’, ‘á’ = ‘a’, ‘é’ = ‘e’, ‘Ü’ = ‘u’)
2. Converting uppercase to lowercase
3. Removing Turkish stopwords (e.g., “ve”, “şuna”, “tamam”, “yine”; 473 words in total)
4. Removing some special characters (e.g., removepunctuation, removenumbers, removehashtags, removeurl…)
5. Converting Turkish characters to Latin equivalents
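A compact base-R sketch of these steps, assuming a made-up three-word stopword list in place of the 473-word list used in the study. It reorders the steps slightly (URLs, hashtags, punctuation and digits are stripped before stopword matching) and transliterates Turkish letters before lowercasing, so the result does not depend on how the session's locale lowercases the Turkish I and İ. Turkish letters are written as `\u` escapes so the code runs in any locale.

```r
# Made-up stopword list (the study used 473 Turkish stopwords); entries are in
# transliterated form because matching happens after transliteration
stopwords_tr <- c("ve", "icin", "yine")

clean_text <- function(x, stopwords) {
  # Steps 1 and 5: map Turkish letters (both cases) to Latin counterparts
  # (ç Ç ğ Ğ ı İ ö Ö ş Ş ü Ü -> c C g G i I o O s S u U)
  x <- chartr("\u00e7\u00c7\u011f\u011e\u0131\u0130\u00f6\u00d6\u015f\u015e\u00fc\u00dc",
              "cCgGiIoOsSuU", x)
  # Step 2: lowercase (the Turkish letters are plain ASCII by now)
  x <- tolower(x)
  # Step 4: remove URLs and hashtags, then punctuation and digits
  x <- gsub("https?://\\S+", " ", x, perl = TRUE)
  x <- gsub("#\\w+", " ", x, perl = TRUE)
  x <- gsub("[[:punct:][:digit:]]+", " ", x)
  # Step 3: split on whitespace and drop stopwords and empty tokens
  tokens <- strsplit(trimws(x), "\\s+")
  vapply(tokens,
         function(tok) paste(tok[tok != "" & !tok %in% stopwords], collapse = " "),
         character(1))
}

# "Pınar GÜLTEKİN için adalet! https://t.co/abc #PinarGultekin 2020"
tweet <- "P\u0131nar G\u00dcLTEK\u0130N i\u00e7in adalet! https://t.co/abc #PinarGultekin 2020"
clean_text(tweet, stopwords_tr)   # "pinar gultekin adalet"
```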

1.5.2 Most frequent words

              word    n
1            pınar 4760
2       gültekinin 2827
3         gültekin 2514
4          davadan 2247
5              chp 2128
6            chpli 1896
7            muğla 1457
8           babası 1121
9     milletvekili 1097
10        süleyman  938
11          vazgeç  870
12        babasını  756
13          girgin  722
14        vazgeçin  611
15        ailesine  562
16           vekil  518
17           diyen  495
18 milletvekilinin  489
19        arayarak  487
20      katledilen  446

1.5.3 Word cloud

1.5.4 Most frequent emojis

# A tibble: 10 × 2
   emoji     n
   <chr> <int>
 1 😡       44
 2 🔴       34
 3 ❗       29
 4 🔹       25
 5 🔥       23
 6 👇       22
 7 ▪️        18
 8 📌       17
 9 💣       16
10 🤬       16
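A simple way to extract emoji in base R is to match characters in the Unicode category “Symbol, other” with a PCRE pattern. This is an approximation: it also matches other pictographic symbols, and multi-code-point emoji (for example ▪️, which is ▪ followed by an invisible variation selector, or skin-tone and ZWJ sequences) are split into parts; dedicated emoji packages handle these cases better. The tweets below are made up and written with `\U` escapes.

```r
# Made-up tweets containing emoji, written with \U escapes
# (U+1F621 pouting face, U+1F534 large red circle)
tweets <- c("adalet \U0001F621\U0001F621", "\U0001F534 son dakika", "\U0001F621 yeter")

# \p{So} matches one code point in the Unicode category "Symbol, other"
emoji <- unlist(regmatches(tweets, gregexpr("\\p{So}", tweets, perl = TRUE)))

# Frequency table, most frequent first
emoji_freq <- sort(table(emoji), decreasing = TRUE)
emoji_freq
```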

1.6 Skip-gram model

Splitting text into smaller parts with n-grams and skip-grams enables examining correlations and the context around words.

An n-gram is a sequence of n adjacent items (here, words) in a text.
n is the number of items in each n-gram: n = 1 gives a “unigram”, n = 2 a “bigram” (two consecutive words), and n = 3 a “trigram”.
n-gram models are frequently used in natural language processing (NLP) to predict the next word in a text.
In a k-skip-n-gram, n is the number of items (words) and k is the maximum total number of words that may be skipped between them.
Therefore, an n-gram (with no skips) is the same as a 0-skip-n-gram.
The skip-gram model is an unsupervised learning technique that identifies the words that tend to appear in the context of a given word in a text.
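The k-skip-bigram definition above can be made concrete with a short base-R function. `skip_bigrams()` is a hypothetical helper written for illustration; packages such as tokenizers provide a ready-made `tokenize_skip_ngrams()`, whose output format differs.

```r
# Hypothetical helper: all k-skip-bigrams of a token vector, i.e. ordered word
# pairs with at most k words between them (k = 0 gives ordinary bigrams)
skip_bigrams <- function(tokens, k = 0) {
  n <- length(tokens)
  if (n < 2) return(character(0))
  pairs <- character(0)
  for (i in seq_len(n - 1)) {
    last <- min(n, i + 1 + k)          # farthest partner allowed by k
    for (j in (i + 1):last) {
      pairs <- c(pairs, paste(tokens[i], tokens[j]))
    }
  }
  pairs
}

tokens <- c("pinar", "gultekin", "davadan", "vazgec")
skip_bigrams(tokens, k = 0)   # "pinar gultekin" "gultekin davadan" "davadan vazgec"
skip_bigrams(tokens, k = 1)   # adds "pinar davadan" and "gultekin vazgec"
```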

1.6.1 Network visualization

1.6.2 Clusters from the “all-time” text analysis

 [1] "gültekin, pınar, katledilen, öğrencisi, yeni, cinayetinde, vahşice, üniversite, öldürülen, gültekini, cinayeti, davasında, son, katleden, flaş"                                  
 [2] "cemal, metin, katil, mertcan, zanlısı, kardeşi, avcı, tahliye, avcının, sanık, cma, cüce, muğladaki, isimli, mekanın"                                                            
 [3] "kadınların, pınargültekin, önceki, kişiyi, etiketler, sesiyim, yazıp, etiketleyebildiğiniz, çiçek, yeter, istanbulsözleşmesiyaşatır, çek, üzerinden, kadınasiddetedurde, misiniz"
 [4] "gültekinin, davadan, katili, babasını, arayarak, vazgeç, sıddık, babası, ailesinin, ailesine, arayıp, rezan, cansız, diyen, avukatı"                                             
 [5] "emine, ozgecan, sule, münevver, aleyna, ceren, helin, cet, güleda, bulut, aslan, cakır, karabulut, ozdemir, oldürüldü"                                                           
 [6] "süleyman, chp, muğla, chpli, milletvekili, ağır, suç, yönetim, katilin, ilçe, iddianame, hakkında, ceza, gündür, ailesi"                                                         
 [7] "ortaya, ifadesi, görüntüleri, çıktı"                                                                                                                                             
 [8] "pinargultekin, adalet, pınargültekiniçinadalet, gerçek, erkek, istiyoruz, eski, tweet, imza, kampanyaya, arkadaşı, pınargueltekinicinadalet, yerini, sevgilisi, atın"            
 [9] "kadın, reddi, hakim, öldü, ülkede, cinayetleri, kız, yakılarak, talebi, insanlık, koklamaya, diyor, cesedi, külünü, öpüp"                                                        
[10] "adli, otopsi, tıp, raporu"                                                                                                                                                       
[11] "diri, yakılmış, yakıldığı"                                                                                                                                                       
[12] "üzerine, üstüne, beton, dökülen, varile, koyup, döken, dökülmüş, konup, dökülerek"                                                                                               
[13] "kan, donduran"                                                                                                                                                                   
[14] "allahtan, allah, rahmet, belanızı, versin, eylesin"                                                                                                                              
[15] "bağ, keşif, evinde, yapılacak, evine"                                                                                                                                            
[16] "cinayete, kurban, giden"                                                                                                                                                         
[17] "ört, bas, etmeye, etmek, isteyen, pis, çalıştı, ellerini"                                                                                                                        
[18] ""                                                                                                                                                                                
[19] ""

1.6.3 Text analysis – network visualization

667,875 bigram pairs (created via skip-gram analysis)

nrow(skip.gram.count)

[1] 667875

Network visualization link
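Before plotting, skip-gram pairs are typically counted so that each distinct word pair becomes one weighted edge, and rare pairs are dropped to keep the network readable. The sketch below uses made-up pairs; `pair_counts` is a hypothetical stand-in for a table like `skip.gram.count` above (whose exact columns are not shown), and the threshold of 2 is arbitrary.

```r
# Made-up skip-bigram pairs collected from several tweets
pairs <- data.frame(
  word1 = c("pinar", "pinar", "davadan", "pinar"),
  word2 = c("gultekin", "gultekin", "vazgec", "adalet"),
  stringsAsFactors = FALSE
)

# Count each distinct pair; the count n becomes the edge weight
pair_counts <- aggregate(list(n = rep(1, nrow(pairs))), by = pairs, FUN = sum)
pair_counts <- pair_counts[order(-pair_counts$n), ]

# Keep only pairs seen at least twice (the threshold is arbitrary)
edges <- subset(pair_counts, n >= 2)
edges
```

The resulting `edges` data frame has the two-column layout igraph's `graph_from_data_frame()` expects, with `n` carried along as an edge attribute.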
