Inference Server REST APIs
Since the NuPIC Inference Server is built on top of the Triton Inference Server, we can make use of Triton's REST APIs.
Here are a few that you are likely to use often. For simplicity, the examples below assume that the client and the Inference Server are running on the same machine. If your client is remote, replace localhost with your Inference Server's IP address.
Check whether the Inference Server is running
curl -v http://localhost:8000/v2/health/live
Expected output:
* Host localhost:8000 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
* Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000
> GET /v2/health/live HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/8.5.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
<
* Connection #0 to host localhost left intact
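The verbose curl output above is useful for debugging, but a script usually only needs to know whether the endpoint answered. Here is a minimal Python sketch using only the standard library. It assumes, as in the output above, that a live server replies with HTTP 200. The names `live_url` and `is_live` and the 2-second default timeout are illustrative choices, not part of NuPIC or Triton.

```python
import urllib.request


def live_url(base_url: str = "http://localhost:8000") -> str:
    """Build the liveness URL used in the curl example above."""
    return base_url.rstrip("/") + "/v2/health/live"


def is_live(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if the liveness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(live_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError (raised for non-2xx replies) subclass OSError,
        # as do refused connections and socket timeouts.
        return False
```

Calling `is_live()` checks the local server; for a remote server, pass its address as `base_url`, for example `is_live("http://<server-ip>:8000")`.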
List all models
curl -X POST -v http://localhost:8000/v2/repository/index
Expected output:
* Host localhost:8000 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying [::1]:8000...
* connect to ::1 port 8000 from ::1 port 46418 failed: Connection refused
* Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000
> POST /v2/repository/index HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/8.5.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 1908
<
{ [1908 bytes data]
100 1908 100 1908 0 0 1321k 0 --:--:-- --:--:-- --:--:-- 1863k
* Connection #0 to host localhost left intact
[
{
"name": "bert-base-cased-tokenizer",
"version": "1",
...
{
"name": "zephyr-7b-v0-wtokenizer"
}
]
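To work with the model list in a script rather than reading it from the terminal, the same request can be sent from Python. The sketch below relies only on what the output above shows: the response is a JSON list whose entries each carry a "name" field. The function names and the 10-second timeout are hypothetical choices made for this example.

```python
import json
import urllib.request


def fetch_repository_index(base_url: str = "http://localhost:8000") -> list:
    """POST to the repository index endpoint and return the decoded JSON list."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v2/repository/index",
        data=b"",  # empty body; the endpoint is called with POST, as in the curl example
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def model_names(index: list) -> list:
    """Extract the "name" field from each entry of the index."""
    return [entry["name"] for entry in index]
```

With the server running, `model_names(fetch_repository_index())` returns the model names as a plain Python list.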
Load a model
curl -X POST http://localhost:8000/v2/repository/models/nupic-sbert.base-v3/load
Get model metadata
curl http://localhost:8000/v2/models/nupic-sbert.base-v3
Expected output showing input payload structure:
{
"name": "nupic-sbert.base-v3",
"versions": [
"1"
],
"platform": "ensemble",
# Expected input payload structure
"inputs": [
{
"name": "TEXT",
"datatype": "BYTES",
"shape": [
-1
]
}
],
"outputs": [
{
"name": "encodings",
"datatype": "FP32",
"shape": [
-1,
-1
]
}
]
}
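When reading this metadata, note that a shape entry of -1 denotes a dimension whose size is not fixed by the model. The following Python sketch turns the "inputs" or "outputs" section of a metadata response into one readable line per tensor. It assumes the metadata has already been decoded into a dictionary; `describe_tensors` is a hypothetical helper name.

```python
def describe_tensors(metadata: dict, key: str) -> list:
    """Return one readable line per tensor listed under metadata[key]."""
    lines = []
    for tensor in metadata.get(key, []):
        # -1 marks a dimension whose size is not fixed by the model.
        dims = ["variable" if d == -1 else str(d) for d in tensor["shape"]]
        lines.append(f'{tensor["name"]}: {tensor["datatype"]} [{", ".join(dims)}]')
    return lines
```

For the metadata above, `describe_tensors(metadata, "inputs")` yields a single line for the TEXT tensor, and `describe_tensors(metadata, "outputs")` yields one for the encodings tensor.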
Inference
Create payload.json based on model metadata:
{
"inputs": [
{
"name": "TEXT",
"shape": [1],
"datatype": "BYTES",
"data": ["This is a test sentence."]
}
]
}
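The payload mirrors the "inputs" section of the model metadata: the name and datatype are copied, and the variable dimension is replaced by the number of items in "data". A minimal Python sketch of that mapping follows. It handles only the case shown here, a single one-dimensional BYTES input, and `build_payload` and `write_payload` are hypothetical helper names. Whether the model accepts more than one sentence per request is an assumption the source does not confirm.

```python
import json


def build_payload(metadata: dict, texts: list) -> dict:
    """Build an inference payload for a model with a single 1-D BYTES input."""
    (spec,) = metadata["inputs"]  # this sketch handles exactly one input
    if spec["datatype"] != "BYTES" or len(spec["shape"]) != 1:
        raise ValueError("sketch only supports a single 1-D BYTES input")
    return {
        "inputs": [
            {
                "name": spec["name"],
                "shape": [len(texts)],  # fill in the variable (-1) dimension
                "datatype": spec["datatype"],
                "data": list(texts),
            }
        ]
    }


def write_payload(payload: dict, path: str = "payload.json") -> None:
    """Write the payload to disk so it can be sent with curl -d @payload.json."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2)
```

Calling `build_payload(metadata, ["This is a test sentence."])` with the metadata shown earlier reproduces the payload above.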
Send payload:
curl -X POST -H "Content-Type: application/json" \
-d @payload.json \
http://localhost:8000/v2/models/nupic-sbert.base-v3/infer
Expected output:
{
"model_name": "nupic-sbert.base-v3",
"model_version": "1",
"parameters": {
"sequence_id": 0,
"sequence_start": false,
"sequence_end": false
},
"outputs": [
{
"name": "encodings",
"datatype": "FP32",
"shape": [
1,
768
],
"data": [
0.49580085277557375,
0.1195397824048996,
0.06556889414787293,
-0.42177122831344607,
...
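The response returns the embedding values as a flat "data" list next to a "shape" of [rows, columns]. Below is a hedged Python sketch that sends the request and splits the flat list back into one vector per input sentence. It assumes the data is flattened in row-major order, which matches the single-row output above but is not verified here for larger batches. `infer`, `rows`, and the 30-second timeout are hypothetical choices for this example.

```python
import json
import urllib.parse
import urllib.request


def infer(model: str, payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST a JSON payload to the model's infer endpoint and decode the reply."""
    name = urllib.parse.quote(model, safe="")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v2/models/" + name + "/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def rows(output: dict) -> list:
    """Split a flat "data" list into rows using a 2-D "shape" (row-major assumed)."""
    n_rows, n_cols = output["shape"]
    data = output["data"]
    if len(data) != n_rows * n_cols:
        raise ValueError("data length does not match shape")
    return [data[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]
```

For example, `rows(infer("nupic-sbert.base-v3", payload)["outputs"][0])` would give a list with one embedding per input sentence.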
Unload a model
curl -X POST http://localhost:8000/v2/repository/models/nupic-sbert.base-v3/unload
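The load and unload calls differ only in the last path segment, so a script can share one helper for both. The Python sketch below builds the URL and sends an empty POST, mirroring the curl commands in this document. `repository_url`, `repository_request`, and the 60-second timeout are hypothetical choices, and the sketch assumes the server signals success with a 2xx status (urllib raises an exception otherwise).

```python
import urllib.parse
import urllib.request


def repository_url(model: str, action: str, base_url: str = "http://localhost:8000") -> str:
    """Build the load or unload URL for a model."""
    if action not in ("load", "unload"):
        raise ValueError("action must be 'load' or 'unload'")
    name = urllib.parse.quote(model, safe="")
    return base_url.rstrip("/") + "/v2/repository/models/" + name + "/" + action


def repository_request(model: str, action: str,
                       base_url: str = "http://localhost:8000",
                       timeout: float = 60.0) -> int:
    """Send the load or unload request and return the HTTP status code."""
    req = urllib.request.Request(
        repository_url(model, action, base_url), data=b"", method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For instance, `repository_request("nupic-sbert.base-v3", "unload")` corresponds to the curl command above.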