A sentiment analysis of The Little Prince using the NRC lexicon, and a nice word cloud of the 100 most used words in the text.
This is a text analysis of The Little Prince, written and illustrated by Antoine de Saint-Exupéry. We use Katherine Woods's translation from the French.
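# --- LOAD PACKAGES ---
library(pdftools)    # pdf_text()
library(tidyverse)   # dplyr, tidyr, stringr, forcats, ggplot2
library(tidytext)    # unnest_tokens(), stop_words, get_sentiments()
library(ggwordcloud) # geom_text_wordcloud_area()
# --- READ IN TEXT ---
prince_text <- pdf_text("TheLittlePrince.pdf") # one string per PDF page
# --- TEXT CLEANING ---
prince_tokens <- data.frame(prince_text) %>%                    # one row per page
  mutate(text_full = str_split(prince_text, pattern = "\\n")) %>% # split each page into lines
  unnest(text_full) %>%                                         # one row per line
  mutate(text_full = str_trim(text_full)) %>%                   # remove leading/trailing spaces
  unnest_tokens(word, text_full) %>%                            # one lowercase word per row
  select(word) %>%                                              # keep only the words
  anti_join(stop_words, by = "word")                            # remove stop words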
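Since `pdf_text()` returns one string per PDF page, the cleaning steps above can be tried without the PDF by running them on a short made-up "page". This is a minimal sketch assuming the dplyr, tidyr, stringr and tidytext packages are installed; `toy_page` and `toy_tokens` are hypothetical names, and the text is invented for illustration, not taken from the book.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tidytext)

# Hypothetical one-page input standing in for pdf_text() output
# (invented text with extra spaces and a line break, not from the PDF)
toy_page <- "  The fox waited near the wheat field.  \nThe boy watered his flower every morning.  "

toy_tokens <- data.frame(prince_text = toy_page) %>%
  mutate(text_full = str_split(prince_text, pattern = "\\n")) %>% # list of lines per page
  unnest(text_full) %>%                                         # one row per line
  mutate(text_full = str_trim(text_full)) %>%                   # strip surrounding spaces
  unnest_tokens(word, text_full) %>%                            # lowercase words, punctuation removed
  select(word) %>%
  anti_join(stop_words, by = "word")                            # drop stop words

toy_tokens
```

Each remaining row is a single lowercase, punctuation-free word, and stop words such as "the" and "his" no longer appear.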
The word “prince”, the most frequently used word in the text, is excluded from the word cloud. To make the word cloud we used the ggwordcloud package.
# --- TOP 100 WORDS WORD CLOUD ---
# select top 100 words
top_100_words <- prince_tokens %>%
  count(word) %>%
  arrange(desc(n)) %>%
  slice(2:101) %>% # drop "prince" (the #1 word, 179 counts) and keep the next 100
  mutate(angle = 45 * sample(-2:2, n(), # random angle per word: -90, -45, 0, 45 or 90 degrees
                             replace = TRUE,
                             prob = c(1, 1, 4, 1, 1))) # 0 degrees is 4x as likely as each other angle
top_100_words$angle[1] <- 45 # fix the most frequent remaining word at 45 degrees
# word cloud graph
ggplot(top_100_words, aes(label = word,
                          size = n,
                          color = factor(sample.int(10, nrow(top_100_words), replace = TRUE)), # random colour group per word
                          angle = angle)) +
  geom_text_wordcloud_area() +
  scale_size_area(max_size = 10) +
  labs(title = "Most used words in The Little Prince") +
  theme_void()
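The chunk above removes "prince" by its rank (`slice(2:101)`), which only works while "prince" stays the single most frequent word. A sketch that drops it by name instead, keeps the most frequent remaining words with `slice_max()`, and fixes the random seed so the angles come out the same on every run. It assumes dplyr 1.0.0 or later; `toy_words` and `top_words` are hypothetical, the counts are invented, and the seed value is arbitrary.

```r
library(dplyr)

# Hypothetical token table standing in for prince_tokens (invented counts)
toy_words <- data.frame(word = c(rep("prince", 5), rep("rose", 4), rep("fox", 3),
                                 rep("planet", 2), "king"))

set.seed(42) # arbitrary seed so the random angles are reproducible

top_words <- toy_words %>%
  count(word, sort = TRUE) %>%               # frequency table, most frequent first
  filter(word != "prince") %>%               # drop "prince" by name, not by position
  slice_max(n, n = 3, with_ties = FALSE) %>% # keep the 3 most frequent remaining words
  mutate(angle = 45 * sample(-2:2, n(),      # angles in {-90, -45, 0, 45, 90}
                             replace = TRUE,
                             prob = c(1, 1, 4, 1, 1))) # 0 has weight 4/8, the others 1/8 each

top_words
```

With the real tokens, the same pipeline with `n = 100` would replace `slice(2:101)` and keep working even if another word overtook "prince".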
To do the sentiment analysis we used the NRC lexicon, which associates words with eight basic emotions (anger, anticipation, disgust, fear, joy, sadness, surprise and trust) and two sentiments (negative and positive); a single word can carry several of these tags. As we might have suspected, The Little Prince is quite a positive book, and the emotions most often associated with its words according to the NRC lexicon are trust, anticipation and joy.
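nrc_count <- prince_tokens %>%
  # get_sentiments("nrc") needs the textdata package and asks to download the lexicon on first use
  inner_join(get_sentiments("nrc"), by = "word",
             relationship = "many-to-many") %>% # a word can carry several NRC tags (dplyr >= 1.1.0)
  count(sentiment)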
ggplot(data = nrc_count, aes(x = fct_reorder(sentiment, n), # order bars by count (largest at the top after coord_flip)
                             y = n)) +
  geom_col(color = 'darkslateblue', fill = 'darkslateblue') +
  coord_flip() +
  labs(x = "sentiment",
       y = "words with given sentiment",
       title = "Sentiment analysis in 'The Little Prince' by Antoine de Saint-Exupéry") +
  theme_minimal()
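Because one NRC word can carry several tags, the inner join returns one row per token-category pair, and a single occurrence of a word can add to several bars. A toy sketch of that behaviour, assuming dplyr 1.1.0 or later (for the `relationship` argument); `toy_tokens`, `toy_lexicon` and `toy_counts` are hypothetical, and the word-category pairings are invented for illustration, not real NRC entries.

```r
library(dplyr)

# Hypothetical tokens and a made-up lexicon with the same (word, sentiment)
# columns as get_sentiments("nrc"); the pairings are NOT real NRC entries
toy_tokens <- data.frame(word = c("friend", "friend", "star", "cry"))
toy_lexicon <- data.frame(
  word      = c("friend", "friend", "star", "cry", "cry"),
  sentiment = c("joy", "trust", "anticipation", "sadness", "negative")
)

toy_counts <- toy_tokens %>%
  inner_join(toy_lexicon, by = "word",
             relationship = "many-to-many") %>% # one row per token-category pair
  count(sentiment, sort = TRUE)                 # rows per category, largest first

toy_counts
```

Here four tokens produce seven rows, so the bar heights count tag assignments rather than distinct words.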
NRC lexicon: Crowdsourcing a Word-Emotion Association Lexicon, Saif Mohammad and Peter Turney, Computational Intelligence, 29 (3), 436-465, 2013.
The Little Prince, written and illustrated by Antoine de Saint-Exupéry. Translated from the French by Katherine Woods. Available here.