Benchmarking BERT

The NuPIC Inference Server and optimized BERT models are specifically designed to perform well on AMX- and AVX-compatible CPUs. Don't just take our word for it—check it out for yourself!

How it Works

This benchmark example runs a series of accuracy and performance tests, comparing HuggingFace's standard BERT-base-cased model to NuPIC BERT models on the Semantic Textual Similarity Benchmark (STSB). Each row of the STSB dataset contains a sentence pair, along with a floating-point score denoting their semantic similarity, which serves as the ground truth.
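
For example, you can peek at the STSB data with the Hugging Face datasets package (benchmark.py may load the data differently; this snippet is only meant to show the fields):

from datasets import load_dataset

# STSB ships with the GLUE benchmark; each row holds two sentences and a
# human-annotated similarity score (0 to 5) used as the ground truth.
stsb = load_dataset("glue", "stsb", split="validation")
print(stsb.column_names)    # ['sentence1', 'sentence2', 'label', 'idx']
print(stsb[0]["label"])     # ground-truth similarity for the first pair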

In our benchmark, the task is to score the semantic similarity of each sentence against its paired counterpart in the same row. We do so through the following steps (sketched in code after this list):

  1. Compute an embedding for each sentence in the pair by passing it through a BERT model.
  2. Calculate the cosine similarity between the two embedding vectors.
  3. After doing steps 1 and 2 for every example in the dataset, take the vector of all cosine similarity scores and compute the Spearman correlation against the vector of all ground truth scores. A strong correlation indicates that the modeling approach accurately represents semantic similarities in the dataset.
  4. Record the time taken for a single pass over the entire dataset using the steps above. This gives an indication of the efficiency of the modeling approach.
  5. Repeat the steps above for both the standard and NuPIC BERT models.
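
The sketch below walks through steps 1 to 4 for the baseline side of the comparison, using plain open-source components (HuggingFace's bert-base-cased with mean pooling, the datasets package, and scipy). It only illustrates the procedure; the pooling choice and data split here are assumptions, and the actual benchmark.py obtains embeddings from models served by the NuPIC Inference Server instead of running the model locally.

import time

import numpy as np
import torch
from datasets import load_dataset
from scipy.stats import spearmanr
from transformers import AutoModel, AutoTokenizer

# Baseline: HuggingFace's standard bert-base-cased, pooled into sentence
# embeddings by averaging the token states (one common choice).
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")
model.eval()

def embed(sentences):
    # Step 1: one embedding vector per sentence (mean over non-padding tokens).
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state       # (batch, tokens, dim)
    mask = batch["attention_mask"].unsqueeze(-1)        # zero out padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

def cosine(a, b):
    # Step 2: cosine similarity between two embedding vectors.
    a, b = a / a.norm(), b / b.norm()
    return float((a * b).sum())

stsb = load_dataset("glue", "stsb", split="validation")

start = time.perf_counter()                             # Step 4: time one full pass
similarities = []
for row in stsb:
    emb = embed([row["sentence1"], row["sentence2"]])
    similarities.append(cosine(emb[0], emb[1]))
elapsed = time.perf_counter() - start

# Step 3: Spearman correlation between predicted and ground-truth scores.
# Step 5: benchmark.py repeats this loop for each NuPIC BERT model as well.
correlation, _ = spearmanr(np.array(similarities), np.array(stsb["label"]))
print(f"Spearman correlation: {correlation:.5f}  time: {elapsed:.2f} s")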

Quick Start

Before you start, make sure the NuPIC Inference Server is running and that the Python client and its environment are set up.

To run the code, ensure that you are in the nupic/ directory. From there, navigate to the examples/benchmark/ folder inside nupic.examples and run the Python script:

cd nupic.examples/examples/benchmark
python benchmark.py

Digging a little into the Python code in benchmark.py, you'll see that it uses our Python client to interact with the Inference Server:

from nupic.client.inference_client import ClientFactory

Expected Results

You should expect to see results following the trends below. However, the exact numbers will vary with your hardware setup.

Results and times per model:
+----------------------+---------------------+----------------------+-----------+
| Model                | Use Case            | Spearman Correlation | Time (s)  |
+----------------------+---------------------+----------------------+-----------+
| bert.base            | Sentence Similarity | 0.63366              | 249.29219 |
+----------------------+---------------------+----------------------+-----------+
| nupic-sbert.base-v3  | Sentence Similarity | 0.83668              | 16.61641  |
+----------------------+---------------------+----------------------+-----------+
| nupic-sbert.large-v1 | Sentence Similarity | 0.84838              | 29.87208  |
+----------------------+---------------------+----------------------+-----------+

Compared to the standard BERT model (bert.base), NuPIC's optimized models are roughly 8x faster on CPU for the large model and about 15x faster for the base model, while also being more accurate!

We're just getting started! For a more in-depth analysis of BERT model throughputs, refer to this tutorial.