Text and sentiment analysis of ‘The Little Prince’

A sentiment analysis of The Little Prince using the NRC lexicon and a nice word cloud with the 100 mos used words in the text.

Carmen Galaz-García true
02-10-2021

Introduction

This is a text analysis on the The Little Prince, written and illustrated by Antoine de Saint Exupéry. We use the translation from the French by Katherine Woods.

Text wrangling

Show code
# --- READ IN TEXT ---
prince_text <- pdf_text("TheLittlePrince.pdf")

# --- TEXT CLEANING ---
prince_tokens <- data.frame(prince_text) %>%   # convert to data frame
  mutate(text_full = str_split(prince_text,pattern="\\n")) %>%  #split by line
  unnest(text_full) %>% # make each line into a string
  mutate(text_full = str_trim(text_full)) %>%  # remove extra spaces
  unnest_tokens(word, text_full) %>%           # make a column with each word as a row
  select(word) %>%      # keep the words
  anti_join(stop_words) # remove stop words

Word cloud of most used words

The word “prince”, which is the most frequently used word in the text, is excluded from this analysis. To make the wordcloud I used the ggworldcloud library.

Show code
# --- TOP 100 WORDS WORD CLOUD ---

# select top 100 words
top_100_words <- prince_tokens %>% 
  count(word) %>% 
  arrange(-n) %>% 
  slice(2:101) %>%   # remove prince (#1 word with 179 counts)
  mutate(angle = 45 * sample(-2:2, n(), # add random angles to words
                             replace = TRUE, 
                             prob = c(1, 1, 4, 1, 1))) 

top_100_words$angle[1]=45

# wordcloud graph
ggplot(top_100_words, aes(label = word, 
                          size = n,
                          color = factor(sample.int(10, nrow(top_100_words), replace = TRUE)),
                          angle = angle)) +
  geom_text_wordcloud_area() +
  scale_size_area(max_size = 10) +
  labs(title="Most used words in The Little Prince")+
  theme_void()

Sentiment analysis

To do the sentiment analysis we used the NRC lexicon, which assigns to each word one of 10 sentiments and/or a negative or positive tag. As we might have suspected, The Little Prince is quite a positive book and feelings most often associated to it according to the NRC lexicon are trust, anticipation and joy.

Show code
nrc_count <- prince_tokens %>% 
  inner_join(get_sentiments("nrc")) %>% 
  count(sentiment)

ggplot(data=nrc_count, aes(x=fct_reorder(sentiment,n), # order from high to low count 
                           y=n))+
  geom_col(color='darkslateblue', fill='darkslateblue')+
  coord_flip() +
  labs( x = "sentiment",
    y = "words with given sentiment",
          title = "Sentiment analysis in 'The Little Prince' by Antoine de Saint-Exupéry")+
  theme_minimal()

References

NRC lexicon: Crowdsourcing a Word-Emotion Association Lexicon, Saif Mohammad and Peter Turney, Computational Intelligence, 29 (3), 436-465, 2013.

The The Little Prince, written and illustrated by Antoine de Saint Exupéry. Translation by Katherine Woods. Available here.