Benchmarking NuPIC
The NuPIC Inference Server and optimized models are specifically designed to perform well on AMX- and AVX-compatible CPUs. Don't just take our word for it—check it out for yourself!
How it Works
This benchmark example runs a series of accuracy and performance tests, comparing HuggingFace's standard BERT-base-cased model to NuPIC BERT models on the Semantic Textual Similarity Benchmark (STSB). Each row of the STSB dataset contains a sentence pair, as well as a floating-point score denoting their semantic similarity, which serves as the ground truth.
In our benchmark, the task is to measure the semantic similarity of each sentence to its paired counterpart in the same row. We do so through the following steps:
- Compute an embedding for each sentence in the pair by passing it through a BERT model
- Calculate the cosine similarity between the two embedding vectors
- After repeating steps 1 and 2 for every example in the dataset, take the vector of all cosine similarity scores and compute its Spearman correlation against the vector of all ground-truth scores. A strong correlation indicates that the modeling approach accurately represents semantic similarities in the dataset.
- Record the time taken for a single pass over the entire dataset, which gives an indication of the efficiency of each modeling approach
- Repeat the steps above with both the standard and the Numenta-optimized BERT models (a rough sketch of this procedure follows below)
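Put together, a minimal sketch of this procedure might look like the code below. It uses the standard HuggingFace bert-base-cased model with mean-pooled token embeddings and the GLUE STSB validation split; those library calls and the pooling choice are assumptions made for illustration, and benchmark.py implements the same idea through the NuPIC Python client and its optimized models.
import time

import torch
from datasets import load_dataset
from scipy.stats import spearmanr
from transformers import AutoModel, AutoTokenizer

# Assumed baseline: standard HuggingFace bert-base-cased with mean pooling.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")
model.eval()

def embed(sentences):
    # One embedding per sentence: mean of the token states, ignoring padding.
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, tokens, dim)
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)

stsb = load_dataset("glue", "stsb", split="validation")

start = time.time()
cosine_scores = []
for row in stsb:
    e1, e2 = embed([row["sentence1"], row["sentence2"]])
    cosine_scores.append(torch.nn.functional.cosine_similarity(e1, e2, dim=0).item())
elapsed = time.time() - start

# Spearman correlation between predicted similarities and the ground-truth scores.
correlation, _ = spearmanr(cosine_scores, stsb["label"])
print(f"Spearman: {correlation:.5f}  Time: {elapsed:.2f}s")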
Quick Start
Before you start, make sure the NuPIC Inference Server is running and that the Python client and environment are set up.
To run the code, ensure that you are in the nupic/ directory. From there, navigate to the benchmark/ folder and run the Python script:
cd nupic.examples/examples/benchmark
python benchmark.py
Digging a little into the Python code in benchmark.py, you'll see that it uses our Python client to interact with the Inference Server:
from nupic.client.inference_client import ClientFactory
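The exact client API depends on your NuPIC release, so treat the snippet below purely as a hypothetical sketch of the flow: the get_client and infer names and their arguments are assumptions for illustration, not the documented interface, and benchmark.py shows the real calls.
from nupic.client.inference_client import ClientFactory

# Hypothetical flow (method names and arguments are assumptions):
# obtain a client for the running Inference Server, then request
# embeddings from one of the optimized models.
client = ClientFactory.get_client("localhost:8000")
embeddings = client.infer(
    model="nupic-sbert-2-v1-wtokenizer",
    inputs=["The first sentence.", "The second sentence."],
)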
Expected results
You should expect to see results following the trends below. However, the exact numbers will vary with your hardware setup.
Results and times per model:
+-----------------------------+---------------------+----------------------+-----------+
| Model                       | Use Case            | Spearman Correlation | Time (s)  |
+-----------------------------+---------------------+----------------------+-----------+
| nupic-sbert-1-v3-wtokenizer | Sentence Similarity | 0.83697              | 32.70247  |
+-----------------------------+---------------------+----------------------+-----------+
| nupic-sbert-2-v1-wtokenizer | Sentence Similarity | 0.84838              | 36.09769  |
+-----------------------------+---------------------+----------------------+-----------+
| bert-base-cased             | Sentence Similarity | 0.63346              | 227.84145 |
+-----------------------------+---------------------+----------------------+-----------+
Compared to the standard BERT model (bert-base-cased), NuPIC models are more than 6x faster on CPU, and they also achieve a noticeably higher Spearman correlation on this task!
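As a quick sanity check on that claim, you can divide the baseline time by each NuPIC model's time from the table above; the numbers below are copied straight from the example run.
# Speedup factors derived from the timings in the table above.
baseline = 227.84145  # bert-base-cased, seconds
for name, seconds in [
    ("nupic-sbert-1-v3-wtokenizer", 32.70247),
    ("nupic-sbert-2-v1-wtokenizer", 36.09769),
]:
    print(f"{name}: {baseline / seconds:.1f}x faster")
# Prints roughly 7.0x and 6.3x, consistent with "more than 6x faster".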