SENTIMENT ANALYSIS: A WAY TO IMPROVE YOUR BUSINESS

In this blog post, we introduce an important application of artificial intelligence known as sentiment analysis: the task of discovering an individual's beliefs, emotions, and feelings about a product or a service. As we proceed through the tutorial, you will see how this approach is implemented, step by step, alongside a flow diagram. To make things concrete and practical, live code is included in one of the sections.

In the conclusion, I will also outline an improvement over the standard method of implementing sentiment analysis, and provide an API where you can experiment with this customized approach.

Now, let's define sentiment analysis through customer reviews. Given a piece of customer feedback, sentiment analysis measures the attitude the user expresses in the text towards the various aspects of a product or service.

The contents of this blog post are as follows:

  • Why sentiment analysis is important
  • How it is achieved
  • Live code
  • What improvements we've made

What is Sentiment Analysis?

Sentiment analysis is the process of using natural language processing, text analysis, and statistics to analyze customer opinions and sentiments. The most reputable businesses pay close attention to the sentiment of their customers: what people are saying, how they're saying it, and what they mean.

In theory, it is the computational study of opinions, attitudes, views, emotions, and sentiments expressed in a given text. That text can come in a variety of formats: reviews, news, comments, or blogs.

Why is sentiment analysis of Amazon or Flipkart reviews important?

In today's world, marketing and branding have become the strength of colossal businesses, and to build a connection with their customers, such businesses leverage social media. The major aim of establishing this connection is to encourage two-way communication, where everyone benefits from online engagement. At the same time, two huge platforms have emerged in this space. In what follows, we'll see why these two platforms are particularly well suited to analysing customer sentiment.

Flipkart and Amazon India are emerging as the two colossal players in the swiftly expanding online retail industry in India. Although Amazon started its operations in India much later than Flipkart, it is giving tough competition to Flipkart.

How does sentiment analysis work?

Sentiment analysis uses various Natural Language Processing (NLP) methods and algorithms. Two processes show how a machine learning classifier is built and used. Take a look.

  1. The training process: In this process, the model learns to associate a particular text input with the corresponding output, recognised as a tag, based on the samples used for training. The feature extractor transforms the text input into a feature vector. Pairs of feature vectors and tags (e.g. positive, neutral, or negative) are then fed into the machine learning algorithm to generate a model.

  2. The prediction process: The same feature extractor is used to transform unseen text inputs into feature vectors. Each feature vector is then fed into the model, which generates a predicted tag (positive, negative, or neutral).

This kind of representation makes it possible for words with similar meaning to have a similar representation, which can improve the performance of classifiers. In the later sections, you'll learn about sentiment analysis with the bag-of-words model, data collection, and so on.
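The two processes above can be sketched in a few lines. The snippet below is a minimal illustration, assuming scikit-learn as the library; any feature extractor and classifier pair follows the same pattern, and the tiny training set is invented for the example.

```python
# A minimal sketch of the training and prediction processes, using
# scikit-learn (an assumed choice; any feature extractor + classifier
# pair follows the same pattern). The training set is invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Training process: texts paired with tags
train_texts = ["great product, loved it", "terrible, waste of money",
               "amazing quality", "awful experience"]
train_tags = ["positive", "negative", "positive", "negative"]

vectorizer = CountVectorizer()                    # the feature extractor
X_train = vectorizer.fit_transform(train_texts)   # texts -> feature vectors
model = MultinomialNB().fit(X_train, train_tags)  # learn tag associations

# Prediction process: unseen text through the same extractor, then the model
X_new = vectorizer.transform(["great quality, loved it"])
print(model.predict(X_new))  # expected: ['positive']
```

The same extractor must be reused at prediction time so that unseen text is mapped into the same feature space the model was trained on.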

Below, I explain the general approach for implementing sentiment analysis using a predefined library.

I will implement the three phases of this approach (data gathering, data cleaning, and prediction) with live code. Each line of the code is explained in its respective section. The general approach is also shown in the flowchart below.

Steps for sentiment analysis using a predefined library

1. Data collection via web scraping

To collect data, web scrapers automatically gather information that is usually only accessible by visiting a website in a browser.

In other words, web scraping is the process of extracting data and content using bots from the internet. Unlike screen scraping, which only copies pixels displayed on screen, web scraping extracts underlying HTML code and, with it, data stored in a database.

The scraper can then replicate or store the complete website data or content elsewhere and use it for further processing. Web scraping is also used for illegal purposes, including the undercutting of prices and the theft of copyrighted content. An online entity targeted by a scraper can suffer severe financial losses, especially if it’s a business strongly relying on competitive pricing models or deals in content distribution.
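Mechanically, the extraction step can be sketched as follows, using only Python's standard library. The tag and class names here (a div with class "review") are hypothetical; a real page needs its own selectors, and a library such as BeautifulSoup makes this far more convenient.

```python
# A sketch of extracting review text from scraped HTML with the standard
# library. The div.review structure is hypothetical, for illustration.
from html.parser import HTMLParser

class ReviewExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_review = False
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        # enter "review mode" when a <div class="review"> opens
        if tag == "div" and ("class", "review") in attrs:
            self.in_review = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_review = False

    def handle_data(self, data):
        # collect text only while inside a review div
        if self.in_review and data.strip():
            self.reviews.append(data.strip())

page = '<div class="review">Great phone!</div><div class="ad">Buy now</div>'
parser = ReviewExtractor()
parser.feed(page)
print(parser.reviews)  # expected: ['Great phone!']
```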

2. Data Preprocessing (Cleaning)

Data preprocessing is a data-mining technique used to transform raw data into a useful and efficient format. If your data hasn't been cleaned and preprocessed, your model will not work well.

  • Using Regex

Lines 1-2: Two regex substitutions clean the text: one removes links, and one replaces punctuation and digits with spaces (\W matches any non-word character and \d matches any digit). Links should be removed first, because stripping punctuation destroys the :// that the URL pattern relies on.

Line 3: The cleaned text is returned to the part of the program where the function was called.

  • Using nltk

Line 4: split converts a string into a list, splitting on the separator given as its argument; if no separator is given, whitespace is used by default. join does the reverse, converting a list back into a string.

Line 5: The string is first converted into a list of tokens; each token is then checked, and any stop words are removed.

Line 6: Finally, the filtered tokens are joined back into a single string.

Line 7: The result is returned to the part of the program where the function was called.


3. Predicting the scores using predefined library SentimentIntensityAnalyzer

Line 8: sid is an object of the class SentimentIntensityAnalyzer, taken from nltk.sentiment.vader.

Line 9: polarity_scores returns four scores (negative, neutral, positive, and compound), each with a confidence value.

Line 10: The function returns the scores to the calling function.

Code Snippet for General Approach

from django.shortcuts import render
import re
import nltk
from nltk.corpus import stopwords
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# nltk.download('stopwords'); nltk.download('vader_lexicon')  # run once
stop_words = set(stopwords.words('english'))

def pre_processing(data):
    cleaned_data = re.sub(r'https?://[A-Za-z0-9./]+', ' ', data)  # removes links (before punctuation is stripped)
    cleaned_data = re.sub(r'[\W\d]', ' ', cleaned_data)  # \W matches non-word characters, \d matches digits
    return cleaned_data

# tokenization and stop words
def token_and_stopwords(obj):
    tokens = obj.split()
    filtered_sentence = [w for w in tokens if w not in stop_words]
    final_text = " ".join(filtered_sentence).strip()
    return final_text

# calculate compound, negative, positive and neutral scores
def calculate_sentiment(res):
    sid = SentimentIntensityAnalyzer()
    scores = sid.polarity_scores(res)
    return scores

def hello_world(request):
    if request.method == "POST":
        text = request.POST.get('text')
        obj = pre_processing(text)
        res = token_and_stopwords(obj)
        senti = calculate_sentiment(res)
        return render(request, 'result.html', {'sentimental_result': senti})
    return render(request, 'index.html', {})

Two template files are used: index.html collects the text from the end user, and result.html renders the response. This small application is built with Django, and the snippet above shows how the predefined approach works. In the next section, we discuss how to create a custom approach to build a better sentiment analysis API.

Discussion on a custom sentiment analysis approach

In this approach, data gathering remains the same, since it is the basic step needed for any approach. After the data is gathered, different regex patterns can be applied to clean it, and the data can be subjected to nltk operations such as stemming, stop-word removal, and lemmatization to clean it more thoroughly. Custom functions can also be developed here, based on the requirements and the structure of the dataset.

After this step, you have refined text, which is passed to a mechanism that converts it into tensors or integers. This could be word embeddings, a bag of words, or TF-IDF. The benefit of word embeddings over the latter two methods is that they maintain the semantic relationships between words and help the model understand context better.

The output is then passed to a deep learning or machine learning model. I would suggest plotting the data first: if the plot shows a non-linear relationship, opt for deep learning; otherwise, classical machine learning is a good choice. Once the model is chosen, feed the tensors to it for training; training time depends on the amount of data you have. After training, I recommend saving the model so that in the prediction phase you only need to load it rather than train it again. If you are using Keras, follow the steps below to save the model.
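The text-to-vector step discussed above can be sketched with TF-IDF via scikit-learn (an assumed choice; the custom approach could equally use a bag of words or trained word embeddings, which preserve more semantic context). The reviews below are invented for the example.

```python
# A sketch of converting cleaned review text into numeric vectors using
# TF-IDF with scikit-learn. The reviews are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = ["battery life is great",
           "battery drains too fast",
           "great camera, great screen"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(reviews)  # sparse matrix: one row per review

print(X.shape)  # expected: (3, 9) -- 3 reviews, 9 distinct terms
```

Each row of X is the feature vector for one review and can be fed directly to a classifier.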

from keras.models import model_from_json

# serialize the model architecture to JSON
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)

# serialize the weights to HDF5
model.save_weights("model.h5")

If you are building the model with PyTorch, execute the code below to save it:

import os
import torch

def save_model(model_dictionary):
    checkpoint_directory = 'torch_directory'
    os.makedirs(checkpoint_directory, exist_ok=True)  # ensure the directory exists
    file_path = os.path.join(checkpoint_directory, 'model.pt')
    torch.save(model_dictionary, file_path)

APIs as an alternative to web scraping

Some website providers offer Application Programming Interfaces (APIs) that allow you to access their data in a predefined manner. With an API you can avoid parsing HTML and instead access the data directly in formats like JSON and XML. HTML, after all, is primarily a way to visually present content to users.

Scrape HTML Content From a Page

Open up a new file in your favourite text editor. All you need to retrieve the HTML are a few lines of code:
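Here is a minimal sketch using only the standard library; the URL is a placeholder, and many tutorials use the requests library for the same job.

```python
# A minimal sketch of retrieving a page's HTML with the standard library.
# The URL in the usage example is hypothetical; substitute the page you
# actually want to scrape.
from urllib.request import urlopen

def fetch_html(url):
    """Download a page and return its HTML as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

# Example (hypothetical URL):
# html = fetch_html("https://example.com/product-reviews")
```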

"Bags of Words"

The bag-of-words model usually has a large list, better thought of as a sort of "dictionary", of words considered to carry sentiment. Each of these words has its own "value" when found in the text. The values are typically added up, and the result is a sentiment valuation.

The exact equation used to combine the values can vary, but this model focuses mainly on the words themselves and makes no attempt to understand language fundamentals.
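A toy illustration of this additive scoring, with word values made up for the example:

```python
# A toy bag-of-words sentiment scorer: each known word contributes its
# value, and the sum is the sentiment valuation. Values are invented.
sentiment_values = {"great": 1, "love": 1, "bad": -1, "terrible": -2}

def bow_score(text):
    # sum the value of every known word; unknown words count as 0
    return sum(sentiment_values.get(w, 0) for w in text.lower().split())

print(bow_score("great phone but terrible battery"))  # expected: -1
```

Note how "phone", "but", and "battery" contribute nothing: the model sees only the words in its dictionary, with no notion of context.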
