
Benchmarking NuPIC

The NuPIC Inference Server and optimized models are specifically designed to perform well on AMX- and AVX-compatible CPUs. Don't just take our word for it—check it out for yourself!

How it Works

This benchmark example runs a series of accuracy and performance tests, comparing HuggingFace's standard BERT-base-cased model to NuPIC BERT models on the Semantic Textual Similarity Benchmark (STSB). Each row of the STSB dataset contains a sentence pair, as well as a floating-point score denoting their semantic similarity, which serves as the ground truth.
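
For reference, here is a minimal sketch (assuming the Hugging Face datasets package is installed) that loads the STSB validation split and prints one row; the field names are the GLUE/STSB column names, and this loading step is an illustration rather than the exact code used by benchmark.py.

from datasets import load_dataset

# Load the STSB validation split from the GLUE benchmark.
stsb = load_dataset("glue", "stsb", split="validation")

# Each row has two sentences plus a float similarity score (the ground truth).
row = stsb[0]
print(row["sentence1"])
print(row["sentence2"])
print(row["label"])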

In our benchmark, the task is to compare the similarity of each sentence with its paired counterpart in the same row. We do so through the following steps (see the sketch after this list):

  1. Compute an embedding for each sentence in the pair by passing it through a BERT model.
  2. Calculate the cosine similarity between the two embedding vectors.
  3. After running steps 1 and 2 for every example in the dataset, take the vector of all cosine similarity scores and compute its Spearman correlation against the vector of all ground-truth scores. A strong correlation indicates that the modeling approach accurately represents semantic similarities in the dataset.
  4. Record the time taken for a single pass over the entire dataset using the steps above. This gives an indication of the efficiency of the modeling approach.
  5. Repeat the steps above for both the standard and the Numenta-optimized BERT models.
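
As a rough illustration, the sketch below implements steps 1-3 for the standard bert-base-cased model using plain HuggingFace transformers. The mean-pooling step and the unbatched loop are simplifying assumptions made here for clarity; the actual benchmark.py may be structured differently.

import torch
from datasets import load_dataset
from scipy.stats import spearmanr
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")
model.eval()

def embed(sentence):
    # Step 1: tokenize the sentence, run it through BERT, and mean-pool
    # the token embeddings into a single sentence vector.
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden)
    return hidden.mean(dim=1).squeeze(0)

stsb = load_dataset("glue", "stsb", split="validation")

predicted, ground_truth = [], []
for row in stsb:
    # Step 2: cosine similarity between the two sentence embeddings.
    a, b = embed(row["sentence1"]), embed(row["sentence2"])
    cos = torch.nn.functional.cosine_similarity(a, b, dim=0).item()
    predicted.append(cos)
    ground_truth.append(row["label"])

# Step 3: Spearman correlation between predicted similarities and the labels.
correlation, _ = spearmanr(predicted, ground_truth)
print(f"Spearman correlation: {correlation:.5f}")

Wrapping this loop in a timer (for example, time.perf_counter before and after) gives the per-model timings compared in step 4.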

Quick Start

Before you start, make sure the NuPIC Inference Server is running and the Python client environment is set up.

To run the code, make sure you are in the nupic/ directory, then navigate to the benchmark/ folder and run the Python script:

cd nupic.examples/examples/benchmark
python benchmark.py

Digging a little into the Python code in benchmark.py, you'll see that it uses our Python client to interact with the Inference Server:

from nupic.client.inference_client import ClientFactory
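
As a purely hypothetical sketch of what that interaction could look like, the snippet below is written under assumptions: the method names and arguments are not the documented client API, so refer to benchmark.py for the real calls.

from nupic.client.inference_client import ClientFactory

# Assumption: the factory is pointed at the running Inference Server and
# returns a client bound to a named NuPIC model.
client = ClientFactory.get_client("nupic-sbert-2-v1-wtokenizer", url="localhost:8000")

# Assumption: the client exposes an inference call that returns the
# sentence embedding used in the cosine-similarity step.
embedding = client.infer("A sentence to embed.")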

Expected results

You should expect to see results following the trends below. However, the exact numbers will vary with your hardware setup.

Spearman correlation and wall-clock time per model:
+-----------------------------+---------------------+----------------------+-----------+
| Model                       | Use Case            | Spearman Correlation | Time (s)  |
+-----------------------------+---------------------+----------------------+-----------+
| nupic-sbert-1-v3-wtokenizer | Sentence Similarity | 0.83697              | 32.70247  |
+-----------------------------+---------------------+----------------------+-----------+
| nupic-sbert-2-v1-wtokenizer | Sentence Similarity | 0.84838              | 36.09769  |
+-----------------------------+---------------------+----------------------+-----------+
| bert-base-cased             | Sentence Similarity | 0.63346              | 227.84145 |
+-----------------------------+---------------------+----------------------+-----------+

Compared to the standard BERT model (bert-base-cased), the NuPIC models are more than 6x faster on CPU, and noticeably more accurate on this task!
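
The speedup figure follows directly from the timings in the table above; for example:

# Speedup of each NuPIC model relative to bert-base-cased, using the
# wall-clock times from the table.
bert_time = 227.84145                  # bert-base-cased, seconds
nupic_times = [32.70247, 36.09769]     # NuPIC models, seconds

for t in nupic_times:
    print(f"speedup: {bert_time / t:.1f}x")   # ~7.0x and ~6.3x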