We use a simple word-list method: we keep a list of positive words and a list of negative words, and calculate a sentiment score for each sentence based on whether it contains more positive or more negative words. The code is in R.
List of positive and negative words
positive_words <- c('abounded', 'contentment','exceed')
negative_words <- c('abolish', 'baseless','caustic')
sentence <- c("manufacturing is abounded in St. Louis and exceed the expectation though it abolished traditional industries", "Acme has to deal with baseless and caustic arguments")
Normally, we would read these sentences from a file, but here we create sample text to show the process of categorizing text based on its sentiment score.
Convert each sentence into a list of words using the str_split function from the stringr package.
library(stringr)
word_list = str_split(sentence, '\\s+')
words = unlist(word_list)
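As a minimal sketch of the splitting step, the snippet below uses base R's strsplit() in place of str_split(). That swap is an assumption made here only so the example runs without extra packages. It shows that the split produces a list with one character vector per sentence, and that unlist() flattens it into a single vector.

```r
sentence <- c("manufacturing is abounded in St. Louis and exceed the expectation though it abolished traditional industries",
              "Acme has to deal with baseless and caustic arguments")

# strsplit() is the base R stand-in for str_split() (assumption for this sketch)
word_list <- strsplit(sentence, "\\s+")

# One list element per sentence; lengths() gives the word count of each
tokens_per_sentence <- lengths(word_list)

# unlist() flattens the list, so the words of both sentences end up together
words <- unlist(word_list)

print(tokens_per_sentence)
print(length(words))
```

To work with one sentence at a time, index the list instead of flattening it, for example word_list[[1]].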
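The object 'words' is a character vector. Because unlist() flattens the list, it holds the words of both sentences (24 in total); to work with the first sentence alone, use word_list[[1]]. Now we count how many positive words are found.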
positive_match = match(words, positive_words)
The output of the match() function looks like this, where '1' and '3' are the positions of the matched words in positive_words ('abounded' and 'exceed') and NA means no match:
> [1] NA NA 1 NA NA NA NA 3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Convert positive_match into a logical vector (TRUE where a word matched) and sum it to count the matches.
pos_matches = !is.na(positive_match)
pos_count = sum(pos_matches)
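The same count can probably be written more compactly with the %in% operator, which base R defines in terms of match(). The sketch below compares the two forms on a shortened word vector; the variable names pos_count_match and pos_count_in are made up for this illustration.

```r
positive_words <- c("abounded", "contentment", "exceed")

# First eight words of the first sample sentence
words <- c("manufacturing", "is", "abounded", "in", "St.", "Louis", "and", "exceed")

# Two-step form used in the text: match(), then test for non-NA
pos_count_match <- sum(!is.na(match(words, positive_words)))

# One-step form: %in% returns TRUE/FALSE directly
pos_count_in <- sum(words %in% positive_words)

print(pos_count_match)
print(pos_count_in)
```

Both forms count exact matches only, so they should always agree.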
The value pos_count is the number of positive words found. Repeat the same process with negative_words to get neg_count, then subtract it from pos_count to calculate the score.
score = pos_count - neg_count
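Putting the steps together, a minimal end-to-end sketch might look like the following. It uses base R's strsplit() instead of str_split() so that it runs without extra packages (an assumption of this sketch), and it scores the flattened word vector, which covers both sample sentences at once.

```r
positive_words <- c("abounded", "contentment", "exceed")
negative_words <- c("abolish", "baseless", "caustic")
sentence <- c("manufacturing is abounded in St. Louis and exceed the expectation though it abolished traditional industries",
              "Acme has to deal with baseless and caustic arguments")

# Split on whitespace and flatten into one vector of words
words <- unlist(strsplit(sentence, "\\s+"))

# Count exact matches against each word list
pos_count <- sum(!is.na(match(words, positive_words)))
neg_count <- sum(!is.na(match(words, negative_words)))

# Positive minus negative
score <- pos_count - neg_count
print(c(pos_count, neg_count, score))
```

Note that match() compares whole strings, so 'abolished' in the first sentence does not count as a match for 'abolish'. Catching such variants would need an extra step such as stemming, which is not covered here.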
Calculate the score for each sentence in the same way. Once we know the score of each sentence, we can sort the sentences into groups based on the score, for example: highly positive, somewhat positive, neutral, somewhat negative, and highly negative.
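One possible way to score each sentence separately and then bucket the scores is sketched below. The helper name score_words and the cut-off values (2 or more is highly positive, -2 or less is highly negative) are assumptions chosen for illustration, not values from the original process. Base R's strsplit() again stands in for str_split().

```r
positive_words <- c("abounded", "contentment", "exceed")
negative_words <- c("abolish", "baseless", "caustic")
sentence <- c("manufacturing is abounded in St. Louis and exceed the expectation though it abolished traditional industries",
              "Acme has to deal with baseless and caustic arguments")

# Hypothetical helper: score one vector of words
score_words <- function(words) {
  sum(words %in% positive_words) - sum(words %in% negative_words)
}

# Keep the list structure so each sentence is scored on its own
word_list <- strsplit(sentence, "\\s+")
scores <- sapply(word_list, score_words)

# Assumed thresholds: (-Inf,-2], (-2,-1], (-1,0], (0,1], (1,Inf)
category <- cut(scores,
                breaks = c(-Inf, -2, -1, 0, 1, Inf),
                labels = c("highly negative", "somewhat negative", "neutral",
                           "somewhat positive", "highly positive"))

print(data.frame(score = scores, category = category))
```

With the sample data, the first sentence scores 2 and the second scores -2, so under these assumed thresholds they fall into the two extreme groups.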