The Coronavirus in Italy from Twitter's Point of View

An Overview of Twitter Activity

These days we are facing the COVID-19 epidemic, also known as the coronavirus. Throughout the day we are bombarded with very different kinds of information: epidemic trends, suffering markets and economies, and severe government measures. But how have people reacted during the crisis? In such a context people produce tons of data across the different social networks, sharing their feelings and their thoughts. So we can try to understand how the population feels from what people themselves share.

In the next sections I will explore a sample of tweets (updated on a daily basis) to get some, obviously non-exhaustive, insights into the feelings of the Italian population.

First of all, I searched for tweets posted between 2020-02-22 and 2020-03-14 and got 493690 tweets. The sample is restricted to tweets written in Italian that match the following search keys.

#> [1] "coronavirus"         "covid"               "#COVID19italia"     
#> [4] "#covid2019"          "emergenza sanitaria" "#covid19italia"     
#> [7] "#iorestoacasa"       "#irresponsabili"     "#iostoacasa"
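The selection criteria above (date range, language, search keys) can be sketched as a local filter. This is only an illustration in base R on made-up rows: the actual retrieval was done by the search tool, and the column names `created_at`, `lang` and `text` are assumptions.

```r
# Hypothetical raw data; the column names are assumptions, not the real schema.
raw <- data.frame(
  created_at = as.Date(c("2020-02-20", "2020-02-25", "2020-03-01", "2020-03-10")),
  lang = c("it", "it", "en", "it"),
  text = c("Primo caso di coronavirus?",
           "Emergenza sanitaria in Lombardia",
           "Covid news from Italy",
           "Oggi #iorestoacasa"),
  stringsAsFactors = FALSE
)

search_keys <- c("coronavirus", "covid", "#COVID19italia", "#covid2019",
                 "emergenza sanitaria", "#covid19italia", "#iorestoacasa",
                 "#irresponsabili", "#iostoacasa")

# Keep only tweets inside the observation window and written in Italian.
in_period <- raw$created_at >= as.Date("2020-02-22") &
  raw$created_at <= as.Date("2020-03-14")
is_italian <- raw$lang == "it"

# Case-insensitive literal match against any of the search keys.
matches_key <- Reduce(`|`, lapply(search_keys, function(key) {
  grepl(tolower(key), tolower(raw$text), fixed = TRUE)
}))

sample_tweets <- raw[in_period & is_italian & matches_key, ]
print(sample_tweets$text)
```

Matching on lower-cased text means that `#COVID19italia` and `#covid19italia` select the same tweets in this sketch.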

All charts are interactive: in particular, on the trend charts you can zoom in or out, or hide some of the traces.

Tweeting Activity

The simplest analysis to start the data exploration with is the temporal distribution of the tweets in the sample.
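A minimal sketch of how such a daily distribution could be computed, using base R on a handful of made-up timestamps. The `created_at` column name is an assumption, and this is not necessarily how the chart was produced.

```r
# Hypothetical sample: one row per tweet with its creation timestamp.
tweets <- data.frame(
  created_at = as.POSIXct(
    c("2020-02-23 10:15:00", "2020-02-23 18:40:00",
      "2020-03-07 22:05:00", "2020-03-09 21:30:00",
      "2020-03-09 21:45:00", "2020-03-09 23:10:00"),
    tz = "UTC"
  )
)

# Truncate each timestamp to its calendar day and count tweets per day.
tweets$day <- as.Date(tweets$created_at, tz = "UTC")
daily_counts <- as.data.frame(table(day = tweets$day),
                              responseName = "n_tweets",
                              stringsAsFactors = FALSE)
daily_counts$day <- as.Date(daily_counts$day)

# The busiest day of the sample.
peak_day <- daily_counts$day[which.max(daily_counts$n_tweets)]
print(daily_counts)
print(peak_day)
```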

The activity peaks coincide with important news and announcements:

  • 2020-02-23 some cities in Northern Italy were ‘closed’;

  • 2020-02-27 start of the (short-lived) campaign “#milanononsiferma”, meant to counter the negative feelings related to the spread of the virus in the region;

  • 2020-03-07 lockdown of Northern Italy;

  • 2020-03-09 the whole of Italy declared a red zone.

Exploration of the Hashtag Key Set

Let’s explore the set of hashtags found across all the tweets in the sample.
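As a rough sketch, the hashtags can be pulled out of the tweet text with a regular expression and then counted. The texts below are invented, and the pattern (a `#` followed by letters, digits or underscores) is an assumption about what counts as a hashtag.

```r
# Hypothetical tweet texts; the real sample is not reproduced here.
texts <- c(
  "Tutti a casa #restiamoacasa #coronavirus",
  "Situazione a #Milano e in #lombardia #coronavirus",
  "Rinviata la partita #seriea #Milano"
)

# Extract every hashtag: a '#' followed by letters, digits or underscores.
matches <- regmatches(texts, gregexpr("#[[:alnum:]_]+", texts))

# Flatten, lower-case and drop the leading '#', so that #Milano and #milano
# are counted together.
hashtags <- tolower(sub("^#", "", unlist(matches)))

# Frequency table, most used first.
hashtag_counts <- sort(table(hashtags), decreasing = TRUE)
print(hashtag_counts)
```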

We can see that the hashtags refer to:

  • the regions where the contagion is most active (milano, lombardia);

  • politicians involved in some way in the management of the crisis (conte, salvini);

  • a very hot topic was the Italian Serie A championship (seriea): this may be accentuated by the controversy around some matches that were not played on the scheduled dates, e.g. Juventus-Inter, or caused by the many discussions and the uncertainty around the decision to stop or continue the season. Italians never stop thinking about football, whatever the occasion!

  • a hashtag whose use has grown very quickly is restiamoacasa (“we all stay home”), something very close to a heartfelt appeal for everyone to follow the rules and guidance of the national health system and of the government.

Most Frequent Hashtags

The previous section gave an overview of the most used hashtags over the whole period covered by the sample. A further step is to separate the tweets by day and analyze the trends on an hourly basis. In this section I will look more closely at the most frequent hashtags. The next chart shows the temporal distribution of the hashtags I used to retrieve the tweets with the twitter-scraper tool.

The #coronavirus hashtag was the most used before the so-called Italy lockdown, decided during the night of March 9th. From that point on, the institutional #iorestoacasa and the #covid2019 hashtags settled at higher frequencies than in the period before the lockdown announcement.

Excluding the previous hashtags, and all the misspelled variants of the search keys, the next chart shows the trends for the top 10 hashtags used since the start of the crisis:

The peaks mark several important moments of the crisis: the lockdown of Northern Italy (lombardia, zonerosse), and institutional messages and moments (conte, restateacasa, restiamoacasa). It is interesting to note, by deselecting all traces except restateacasa and restiamoacasa, how these two hashtags are related. The first one (“stay at home”) refers to the moment, before the lockdown of Northern Italy, when many people rushed to the available buses and trains to travel home to their families, out of Lombardia. That plea came to involve all Italians when the Government decided on the lockdown: the institutions promoted the hashtag “#iorestoacasa” (I’m staying home), which also became “#restiamoacasa” (we are staying home).

Another way to gain some insight into the changes in Italian public opinion is to look at the most used hashtags day by day. Since a hashtag tries to summarise the concept or the topic of a tweet, this gives us a view of the common feelings, or of the particular focus, of each day. The next chart shows, on a daily basis, the top 6 hashtags in the initial sample. All the hashtags used as search keys are excluded (they would obviously be the most frequent), along with their misspelled versions, e.g. coronvirus, coronaviriusitalia.
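One possible way to build such a daily ranking is sketched below in base R. The rule used to spot misspelled search keys (an edit distance of at most 2, computed with `adist`) is purely an assumption for illustration; it would catch coronvirus but not longer variants such as coronaviriusitalia, which would need an explicit exclusion list or a different rule. The input rows and the helper `top_n_per_day` are hypothetical.

```r
# Hypothetical input: one row per (day, hashtag) occurrence, already lower-cased.
tags <- data.frame(
  day = as.Date(c("2020-02-23", "2020-02-23", "2020-02-23", "2020-02-23",
                  "2020-02-24", "2020-02-24", "2020-02-24")),
  hashtag = c("amuchina", "amuchina", "coronavirus", "coronvirus",
              "lombardia", "amuchina", "lombardia"),
  stringsAsFactors = FALSE
)

# Search keys without the leading '#'.
search_keys <- c("coronavirus", "covid", "covid19italia", "covid2019",
                 "iorestoacasa", "irresponsabili", "iostoacasa")

# Assumption: a hashtag counts as a (possibly misspelled) search key when its
# edit distance from any key is at most 2; distance 0 is the key itself.
distances <- adist(tags$hashtag, search_keys)
is_search_key <- apply(distances, 1, min) <= 2
kept <- tags[!is_search_key, ]

# Count occurrences per day and hashtag, most frequent first within each day.
counts <- aggregate(n ~ day + hashtag, data = transform(kept, n = 1L), FUN = sum)
counts <- counts[order(counts$day, -counts$n), ]

# Hypothetical helper: keep the first n rows of each day.
top_n_per_day <- function(df, n) {
  do.call(rbind, lapply(split(df, df$day), head, n))
}

top_daily <- top_n_per_day(counts, 6)
print(top_daily)
```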

The history of the crisis by hashtags, following the previous charts:

  • 2020-02-22, the crisis started as a regional case (#lombardia, #veneto, #codogno)

  • 2020-02-23 to 2020-02-24, hand sanitizer became the new gold for Italians, throughout the entire country (#amuchina)

  • 2020-02-27, ‘Milano won’t stop’ campaign, which stopped soon (#milanononsiferma; the hashtag stopped trending within a couple of days)

  • 2020-02-28 to 2020-02-29, the Governor of the Veneto Region states: “We have all seen Chinese people eating live mice” (#zaia, #topivivi). SERIOUSLY

  • 2020-02-29, meanwhile, the really important problems: will the Juventus-Inter match be played as scheduled? Plus other typical Italian controversies about Serie A (#seriea, #juveinter)

  • #seriea will remain a heavily used hashtag for the following 4 days (until 2020-03-04) while:
    • 2020-02-29 The Governor of the Veneto Region publicly apologizes for his statement
    • 2020-03-03 The epidemic spreads rapidly in France
    • 2020-03-04 Schools close in Italy
  • 2020-03-05, the President of the Italian Republic gives a speech to reassure the population, as the crisis is becoming serious (#mattarella)

  • 2020-03-07 to 2020-03-08, after the lockdown of Northern Italy people keep going out and living as if nothing were happening, or travel back home from the red zones, and are insulted by many (#restateacasa, #irresponsabili)

  • from 2020-03-09 on, Italy lockdown (#restiamoacasa, #italiazonaprotetta, #italiazonarossa)

Words from the Tweet Text

In this section I will try to extract some information from the text of the tweets. Before running any analysis, however, some preprocessing is needed to extract the words from the tweets. I used the stopwords package to remove the stop words from the text. Furthermore, I removed all the hashtags from the text of the tweets, and I removed or adjusted words to get the cleanest list of words I could.
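The steps just described can be sketched roughly as follows, in base R and on invented tweets. The short stop list is a stand-in (an assumption) for the much longer Italian list that a package such as stopwords provides, and the tokenization rule is likewise only illustrative.

```r
# Hypothetical tweets; the real sample is not reproduced here.
texts <- c(
  "Spero che la situazione migliori #coronavirus https://bit.ly/abc",
  "Vorrei restare a casa ma non posso #iorestoacasa"
)

# Tiny illustrative stop list (assumption); a full Italian list is available,
# for example, from stopwords::stopwords("it").
stop_words <- c("che", "la", "a", "ma", "non", "il", "di", "e")

# 1. Remove URLs and hashtags from the text.
clean <- gsub("https?://\\S+", " ", texts, perl = TRUE)
clean <- gsub("#\\S+", " ", clean, perl = TRUE)

# 2. Lower-case and split on anything that is not a letter or an apostrophe.
tokens <- unlist(strsplit(tolower(clean), "[^[:alpha:]']+"))

# 3. Drop empty tokens and stop words.
words <- tokens[nzchar(tokens) & !(tokens %in% stop_words)]
print(words)
```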

Some words that were excluded by the processing:

#>  [1] "bit.ly"   "youtu.be" "status"   "è"        "c'è"      "c'é"     
#>  [7] "<U+30FC>" "c’è"      "ift.tt"   "é"        "html"     "de"      
#> [13] "x"        "ce"       "igshid"   "eh"       "ve"       "p_cp"    
#> [19] "m_pd"     "shtml"    "ow.ly"    "the"      "xl"

Other preprocessing steps:

  library(dplyr)

  # `words` is a placeholder name (an assumption): a data frame with one token
  # per row in a `word` column, coming from the earlier steps of the pipeline.
  words %>%
    # drop tokens that parse as integers (warnings from non-numeric tokens silenced)
    filter(is.na(suppressWarnings(as.integer(word)))) %>%
    # drop URL fragments and tracking parameters; fixed() makes the dot literal
    filter(!(stringr::str_starts(word, "www"))) %>%
    filter(!(stringr::str_starts(word, "utm_"))) %>%
    filter(!(stringr::str_ends(word, stringr::fixed(".com")))) %>%
    filter(!(stringr::str_ends(word, stringr::fixed(".it")))) %>%
    # strip a leading elided article or preposition, with a straight or curly
    # apostrophe; anchoring at the start keeps "dell'" from losing only its "l'"
    mutate(word = stringr::str_replace(word, "^(l|un|dell|qual|c|d)['’]", "")) %>%
    # repair tokens already mangled into "delemergenza"
    mutate(word = stringr::str_replace(word, "delemergenza", "emergenza")) %>%
    # drop tokens that start with a digit
    filter(!(stringr::str_starts(word, "\\d")))
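The `delemergenza` repair in the steps above hints at what an unanchored replacement does to a token like `dell'emergenza`. The small base R check below illustrates the difference between removing the first `l'` found anywhere and stripping only a leading elided form; the anchored pattern is a suggestion, not necessarily the rule used for the charts.

```r
word <- "dell'emergenza"

# Unanchored: the first "l'" found anywhere in the token is removed,
# which eats the end of "dell'" and leaves a mangled token.
unanchored <- sub("l'", "", word, fixed = TRUE)

# Anchored: only a leading elided article or preposition is stripped.
anchored <- sub("^(l|un|dell|qual|c|d)'", "", word)

print(unanchored)  # "delemergenza"
print(anchored)    # "emergenza"
```

With the anchored pattern, tokens such as `all'inizio` are left untouched instead of being shortened in the middle.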

The previous preprocessing phase gives the input for the next two charts. The first shows the 30 most used words in the tweets since the start of the crisis.

Feelings of bewilderment mixed with a bit of hope seem to prevail, as shown by words like stare, spero, dicono, vorrei, chiedo (“to stay”, “I hope”, “they say”, “I would like”, “I ask”). Other feelings, related to the wish to face the critical situation, are emerging but less strongly: words like voglio, restare, affrontare sit in the middle-to-lower positions of the top-30 chart. This is obviously just one possible interpretation, and a very volatile one, since the ranking varies day by day as new daily tweets are added to the sample.

The next chart shows the top 10 most used words on a daily basis: