Benchmarking BERT
The NuPIC Inference Server and optimized BERT models are specifically designed to perform well on AMX- and AVX-compatible CPUs. Don't just take our word for it—check it out for yourself!
How it Works
This benchmark example runs a series of accuracy and performance tests, comparing HuggingFace's standard BERT-base-cased model to NuPIC BERT models on the Semantic Textual Similarity Benchmark (STSB). Each row of the STSB dataset contains a sentence pair and a floating-point score denoting the pair's semantic similarity, which serves as the ground truth.
In our benchmark, the task is to score the similarity of the two sentences in each row. We do so through the following steps:
1. Compute an embedding for each sentence in the pair by passing it through a BERT model.
2. Calculate the cosine similarity between the two embedding vectors.
3. After completing steps 1 and 2 for every example in the dataset, take the vector of all cosine similarity scores and compute its Spearman correlation with the vector of all ground truth scores. A strong correlation indicates that the model accurately represents the semantic similarities in the dataset.
4. Measure the time taken for a single pass over the entire dataset using the steps above. This indicates how efficient the model is.
5. Repeat the steps above for the standard and NuPIC BERT models.
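The scoring steps above can be sketched in a few lines of NumPy. This is a minimal illustration and not the code in benchmark.py: the function names, the letter-count `toy_embed` stand-in for a BERT model, and the three sentence pairs are all assumptions made up for this example. The timer here wraps embedding plus cosine similarity, which is one possible reading of "a single pass".

```python
import time
import numpy as np


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (n, d) embedding arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    dots = np.sum(a * b, axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return dots / norms


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def evaluate(embed, sentences_a, sentences_b, gold_scores):
    """Return (Spearman correlation, seconds) for one pass over the data."""
    start = time.perf_counter()
    sims = cosine_similarity(embed(sentences_a), embed(sentences_b))
    elapsed = time.perf_counter() - start
    return spearman(sims, gold_scores), elapsed


def toy_embed(sentences):
    """Hypothetical stand-in for a BERT model: letter-count vectors."""
    vectors = np.zeros((len(sentences), 26))
    for i, sentence in enumerate(sentences):
        for ch in sentence.lower():
            if "a" <= ch <= "z":
                vectors[i, ord(ch) - ord("a")] += 1
    return vectors


# Made-up sentence pairs and scores, for illustration only (not STSB data).
first = ["a man plays guitar", "a dog runs", "the sky is blue"]
second = ["a man plays a guitar", "a cat sleeps", "stock prices fell"]
gold = [4.8, 1.5, 0.2]

correlation, seconds = evaluate(toy_embed, first, second, gold)
print(f"Spearman: {correlation:.3f}, time: {seconds:.6f}s")
```

Passing a different `embed` callable to `evaluate` is how a second model would be compared on the same data, mirroring the final step of the list.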
Quick Start
Before you start, make sure the NuPIC Inference Server is running, and the Python client/environment is set up.
To run the code, ensure that you are in the nupic/ directory. From there, navigate to the benchmark/ folder and run the Python script.
cd nupic.examples/examples/benchmark
python benchmark.py
Digging a little into the Python code in benchmark.py, you'll see that it uses our Python client to interact with the Inference Server:
from nupic.client.inference_client import ClientFactory
Expected results
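You should see results that follow the trends below, although the exact numbers will vary with your hardware setup. The Result column is the Spearman correlation against the ground truth scores (higher is better), and Time (s) is the time taken for a single pass over the dataset.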
Results and times per model:
+----------------------+---------------------+---------+-----------+
| Model                | Use Case            | Result  | Time (s)  |
+----------------------+---------------------+---------+-----------+
| bert.base            | Sentence Similarity | 0.63366 | 249.29219 |
+----------------------+---------------------+---------+-----------+
| nupic-sbert.base-v3  | Sentence Similarity | 0.83668 | 16.61641  |
+----------------------+---------------------+---------+-----------+
| nupic-sbert.large-v1 | Sentence Similarity | 0.84838 | 29.87208  |
+----------------------+---------------------+---------+-----------+
Compared to the standard BERT model (bert.base), NuPIC's optimized models are more than 8x faster on CPU and also more accurate: their Spearman correlations are about 0.84 and 0.85, versus 0.63 for bert.base.
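To see where the speedup figure comes from, divide the bert.base time by each model's time. The short sketch below does this using the numbers copied from the table above; the variable names are illustrative. For this run, it works out to roughly 15x for nupic-sbert.base-v3 and 8.3x for nupic-sbert.large-v1, and your own ratios will differ with your hardware.

```python
# (Spearman correlation, time in seconds) copied from the results table above.
results = {
    "bert.base": (0.63366, 249.29219),
    "nupic-sbert.base-v3": (0.83668, 16.61641),
    "nupic-sbert.large-v1": (0.84838, 29.87208),
}

baseline_score, baseline_time = results["bert.base"]
speedups = {}
for model, (score, seconds) in results.items():
    # Speedup is the baseline time divided by this model's time.
    speedups[model] = baseline_time / seconds
    gain = score - baseline_score
    print(f"{model}: {speedups[model]:.1f}x speedup, {gain:+.3f} correlation")
```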
We're just getting started! For a more in-depth analysis of BERT model throughputs, refer to this tutorial.