
Sentiment Analysis

Sentiment Analysis allows you to interpret and classify emotions within text. It's particularly useful for understanding customer feedback, monitoring social media, analyzing product reviews, and many other applications where sentiment can provide valuable insights. This page shows how to perform inference for sentiment analysis on a financial dataset. The model predicts one of three sentiment labels (positive, negative, or neutral), along with the associated probability for each prediction.

Quick Start

Before you start, make sure the NuPIC Inference Server and Training Module are up and running, and the Python environment is set up.
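Before running the example, it can be useful to confirm that the Inference Server is reachable. The sketch below is a minimal readiness probe; it assumes a Triton-style `/v2/health/ready` HTTP endpoint, so adjust the path if your NuPIC deployment exposes a different health check.

```python
import urllib.request
import urllib.error

def check_server_ready(url: str, timeout: float = 3.0) -> bool:
    """Return True if the inference server answers its readiness probe.

    Assumes a Triton-style /v2/health/ready endpoint (an assumption --
    adjust to match your deployment).
    """
    try:
        with urllib.request.urlopen(
            f"http://{url}/v2/health/ready", timeout=timeout
        ) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    print("Server ready:", check_server_ready("localhost:8000"))
```

If the function returns `False`, double-check that the server container is running and that the port matches your configuration.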

📘

Fine-tuning required

A fine-tuned model is required for sentiment analysis. By default, NuPIC BERT models simply return an embedding vector, so the Training Module helps add a model head (consisting of simple linear layers) for classification tasks like sentiment analysis.
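Conceptually, the attached head maps the encoder's embedding vector to label probabilities. The NumPy sketch below illustrates that idea with a single linear layer and softmax; the 768-dimensional embedding size and the random weights are illustrative assumptions, not the Training Module's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions): a BERT-style encoder commonly
# emits a 768-dim embedding, mapped here to 3 sentiment labels.
EMBED_DIM, NUM_LABELS = 768, 3

# A stand-in for the classification head: one linear layer.
W = rng.normal(scale=0.02, size=(EMBED_DIM, NUM_LABELS))
b = np.zeros(NUM_LABELS)

def classify(embedding: np.ndarray) -> np.ndarray:
    """Map an embedding vector to label probabilities via softmax."""
    logits = embedding @ W + b
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

probs = classify(rng.normal(size=EMBED_DIM))
print(probs)        # three probabilities, one per label
print(probs.sum())  # probabilities sum to 1
```

The real head is trained on labeled examples so that the highest probability lands on the correct sentiment; this sketch only shows the shape of the computation.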

Now navigate to the directory containing the sentiment analysis example:

cd nupic.examples/examples/fine_tuned_sentiment_analysis

Open inference.py in a text editor to check the configurations. Specifically, we want to make sure that the Python client is pointing to the correct inference server URL. The URL below would work if the Inference Server is hosted on port 8000 on the same machine as the Python client. Otherwise, please adjust accordingly.

MODEL = args.model_name
URL = "localhost:8000"
BATCH_SIZE = 4
PROTOCOL = "http"
DATA = f"{script_dir}/datasets/financial_sentiment_test_dataset.csv"
CERTIFICATES = {}
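With `BATCH_SIZE = 4`, the client sends the dataset to the server in fixed-size chunks rather than one row at a time. The self-contained sketch below shows that batching step on a stand-in CSV; the `text` column name is an assumption, so check the actual schema in `datasets/financial_sentiment_test_dataset.csv`.

```python
import csv
import io

BATCH_SIZE = 4

def batches(rows, size):
    """Yield successive fixed-size chunks from a list of rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# Stand-in for the real dataset file so the sketch is self-contained.
sample_csv = (
    "text,label\n"
    "good quarter,positive\n"
    "flat revenue,neutral\n"
    "big loss,negative\n"
    "record profit,positive\n"
    "guidance cut,negative\n"
)
rows = [r["text"] for r in csv.DictReader(io.StringIO(sample_csv))]
chunks = list(batches(rows, BATCH_SIZE))
print(chunks)
# [['good quarter', 'flat revenue', 'big loss', 'record profit'], ['guidance cut']]
```

Larger batch sizes generally improve throughput at the cost of per-request latency; the trade-off depends on your hardware and traffic pattern.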

We'll now perform inference on the test dataset specified above. The argument below specifies that we want to use the sentiment analysis model that we fine-tuned earlier.

python inference.py nupic-sbert.base-v3-tuned-CBADA2D3709C40A88259E55988BA84BA

Since we are running on a test dataset with known ground truths, the script can help evaluate the model by calculating the confusion matrix and accuracy score:

Calculating logits for test set...

Confusion matrix:

[[ 68  64  40]
 [ 43 537  46]
 [ 32 101 238]]

Accuracy: 0.7211291702309667
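The reported accuracy follows directly from the confusion matrix: it is the number of correct predictions (the diagonal) divided by the total number of predictions. A quick check, using the matrix printed above:

```python
import numpy as np

# Confusion matrix from the run above.
cm = np.array([[ 68,  64,  40],
               [ 43, 537,  46],
               [ 32, 101, 238]])

# Accuracy = correct predictions (diagonal) / all predictions.
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.7211291702309667
```

Here 68 + 537 + 238 = 843 correct predictions out of 1169 total, reproducing the accuracy the script reports.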

The confusion matrix shows the number of predictions against the actual ground truth labels in the following format:

|                        | Actual Negative | Actual Neutral | Actual Positive |
|------------------------|-----------------|----------------|-----------------|
| **Predicted Negative** | 68              | 64             | 40              |
| **Predicted Neutral**  | 43              | 537            | 46              |
| **Predicted Positive** | 32              | 101            | 238             |
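Beyond overall accuracy, the same matrix yields per-class precision and recall. The sketch below follows the layout described here (rows are predictions, columns are ground truth), so precision is a row-wise ratio and recall a column-wise one; if your tooling uses the transposed convention, swap the two axes.

```python
import numpy as np

labels = ["negative", "neutral", "positive"]
cm = np.array([[ 68,  64,  40],
               [ 43, 537,  46],
               [ 32, 101, 238]])

precision = np.diag(cm) / cm.sum(axis=1)  # correct / all predicted as that class
recall    = np.diag(cm) / cm.sum(axis=0)  # correct / all truly in that class

for name, p, r in zip(labels, precision, recall):
    print(f"{name:<8} precision={p:.3f} recall={r:.3f}")
```

For example, the model is strongest on the neutral class (537 correct out of 626 neutral predictions) and weakest on negative sentiment, where many negative examples are misclassified as neutral.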