
Unsupervised Sentiment Analysis

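We use a simple method based on a list of positive words and a list of negative words. For each sentence, we count how many words from each list it contains and calculate a sentiment score from the difference, so a sentence with more positive than negative words gets a positive score. Because the word lists replace labeled training data, the method is unsupervised. The code is in R.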


Lists of positive and negative words

# Small sample word lists; a real application would use much longer lists
positive_words <- c('abounded', 'contentment', 'exceed')
negative_words <- c('abolish', 'baseless', 'caustic')


# Two sample sentences to score
sentence <- c("manufacturing is abounded in St. Louis and exceed the expectation though it abolished traditional industries", "Acme has to deal with baseless and caustic arguments")

Normally, we would read these sentences from a file, but here we create sample text to show the process of categorizing text based on its sentiment score.

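Split each sentence into words using the str_split() function from the stringr package. The pattern '\\s+' splits on runs of one or more whitespace characters.

library(stringr)
word_list = str_split(sentence, '\\s+')
words = unlist(word_list)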
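To see why unlist() is needed, it helps to look at what the split returns. The sketch below uses base R's strsplit() instead of str_split() (an assumption made only so it runs without stringr installed); both return a list with one character vector of words per sentence.

```r
# Sample sentences from the post
sentence <- c("manufacturing is abounded in St. Louis and exceed the expectation though it abolished traditional industries",
              "Acme has to deal with baseless and caustic arguments")

# strsplit() returns a list: element i holds the words of sentence i
word_list <- strsplit(sentence, "[[:space:]]+")

print(length(word_list))   # 2 (one element per sentence)
print(lengths(word_list))  # 15 9 (words per sentence)

# unlist() flattens the list into a single character vector
words <- unlist(word_list)
print(length(words))       # 24
```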

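The object 'words' is a character vector. Because unlist() flattens the list returned by str_split(), it holds all 24 words of both sample sentences. Now we use match() to find which of these words appear in positive_words.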

positive_match = match(words, positive_words)

For each word, match() returns that word's position in positive_words, or NA if the word is not in the list. In the output below, 1 and 3 are the positions of 'abounded' and 'exceed' in positive_words:

[1] NA NA  1 NA NA NA NA  3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
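Note that match() compares whole strings exactly and is case-sensitive: 'abolished' in the first sentence does not match 'abolish', and tokens such as 'EXCEED' or 'abounded,' (with a trailing comma) would not match either. One common precaution, not part of the original method, is to lower-case the text and strip punctuation before splitting. A minimal sketch with a made-up sentence:

```r
positive_words <- c('abounded', 'contentment', 'exceed')

# Hypothetical input containing capitals and punctuation
raw <- "Contentment abounded, and results EXCEED expectations."

# Without cleaning: "Contentment", "abounded," and "EXCEED" do not match
raw_words <- unlist(strsplit(raw, "[[:space:]]+"))
raw_hits <- sum(!is.na(match(raw_words, positive_words)))

# Lower-case, replace anything that is not a letter or whitespace with a space,
# trim leading/trailing spaces, then split
clean <- gsub("[^[:alpha:][:space:]]", " ", tolower(raw))
clean_words <- unlist(strsplit(trimws(clean), "[[:space:]]+"))
clean_hits <- sum(!is.na(match(clean_words, positive_words)))

print(c(raw = raw_hits, clean = clean_hits))
```

Cleaning does not help with different word forms such as 'abolished' versus 'abolish'; those would need a larger word list or stemming.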

Convert positive_match into a logical vector (TRUE where a word matched) and sum it; sum() counts TRUE as 1 and FALSE as 0.

pos_matches = !is.na(positive_match)
pos_count = sum(pos_matches)
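R's %in% operator, which is defined in terms of match(), returns the logical vector directly, so the match() and is.na() steps can be collapsed into one. A minimal sketch on the first sample sentence:

```r
positive_words <- c('abounded', 'contentment', 'exceed')
first_sentence <- "manufacturing is abounded in St. Louis and exceed the expectation though it abolished traditional industries"
words <- unlist(strsplit(first_sentence, "[[:space:]]+"))

# %in% gives TRUE/FALSE for every word (never NA), so no is.na() step is needed
is_positive <- words %in% positive_words
pos_count <- sum(is_positive)
print(pos_count)  # 2 ("abounded" and "exceed")
```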

pos_count now holds the number of positive words found. Repeating the same steps with negative_words gives neg_count, the number of negative words, and the score is the difference between the two.
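The negative count mirrors the positive one. A minimal sketch, again pooling the words of both sample sentences and using base R's strsplit() in place of str_split():

```r
positive_words <- c('abounded', 'contentment', 'exceed')
negative_words <- c('abolish', 'baseless', 'caustic')
sentence <- c("manufacturing is abounded in St. Louis and exceed the expectation though it abolished traditional industries",
              "Acme has to deal with baseless and caustic arguments")

# Pool the words of both sentences, as in the steps above
words <- unlist(strsplit(sentence, "[[:space:]]+"))

# Positive count, as before
pos_count <- sum(!is.na(match(words, positive_words)))

# Negative count: the same lookup against negative_words
neg_match <- match(words, negative_words)
neg_count <- sum(!is.na(neg_match))

print(c(pos_count = pos_count, neg_count = neg_count))
```

Here neg_count is 2 ('baseless' and 'caustic'); 'abolished' is not counted because it is not an exact match for 'abolish'.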

score = pos_count - neg_count


Calculate the score for each sentence. Once we know each sentence's score, we can group the sentences by score value, for example into highly positive, somewhat positive, neutral, somewhat negative, and highly negative.
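A minimal sketch of scoring each sentence separately and then grouping the scores with cut(). The helper score_sentence() and the cut points used for the five categories are hypothetical choices for illustration, not values from the original post:

```r
positive_words <- c('abounded', 'contentment', 'exceed')
negative_words <- c('abolish', 'baseless', 'caustic')
sentence <- c("manufacturing is abounded in St. Louis and exceed the expectation though it abolished traditional industries",
              "Acme has to deal with baseless and caustic arguments")

# Hypothetical helper: score = (# positive words) - (# negative words)
score_sentence <- function(s) {
  words <- unlist(strsplit(s, "[[:space:]]+"))
  sum(words %in% positive_words) - sum(words %in% negative_words)
}

# Apply the helper to every sentence
scores <- sapply(sentence, score_sentence, USE.NAMES = FALSE)

# Hypothetical cut points for integer scores:
# <= -2 highly negative, -1 somewhat negative, 0 neutral,
# 1 somewhat positive, >= 2 highly positive
category <- cut(scores,
                breaks = c(-Inf, -2, -1, 0, 1, Inf),
                labels = c("highly negative", "somewhat negative", "neutral",
                           "somewhat positive", "highly positive"))

print(data.frame(scores, category))
```

With these cut points, the first sample sentence scores 2 (highly positive) and the second scores -2 (highly negative). The thresholds are a judgment call and may need tuning for longer texts, where raw counts tend to be larger.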
