Inference Server REST APIs

Since the NuPIC Inference Server is built on top of the Triton Inference Server, we can make use of Triton's REST APIs.

Here are a few endpoints that you might use frequently. For simplicity, the examples below assume that the client and the Inference Server are running on the same machine. If you are using a remote client, replace localhost with your Inference Server's IP address.

Check whether the Inference Server is running

curl -v http://localhost:8000/v2/health/live

Expected output:

* Host localhost:8000 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
* Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000
> GET /v2/health/live HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/8.5.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
< 
* Connection #0 to host localhost left intact
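
The same liveness check can be made from Python using Triton's HTTP client. This is a minimal sketch; it assumes the tritonclient package is installed (pip install tritonclient[http]) and that the generic Triton client works against your NuPIC deployment:

import tritonclient.http as httpclient

# Connect to the Inference Server's HTTP endpoint
client = httpclient.InferenceServerClient(url="localhost:8000")

# Returns True when the server is live
print(client.is_server_live())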

List all models

curl -X POST -v http://localhost:8000/v2/repository/index

Expected output:

* Host localhost:8000 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8000...
* connect to ::1 port 8000 from ::1 port 46418 failed: Connection refused
*   Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000
> POST /v2/repository/index HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/8.5.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 1908
< 
* Connection #0 to host localhost left intact
[
  {
    "name": "bert-base-cased-tokenizer",
    "version": "1",
...
  {
    "name": "zephyr-7b-v0-wtokenizer"
  }
]
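
A rough Python equivalent (same tritonclient assumption as above); get_model_repository_index() returns the same index as a list of dicts:

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Each entry has at least a "name"; loaded models also report a "version"
for model in client.get_model_repository_index():
    print(model["name"])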

Load a model

curl -X POST http://localhost:8000/v2/repository/models/nupic-sbert.base-v3/load
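
Or from Python (sketch, same assumptions as above):

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Ask the server to load the model into memory
client.load_model("nupic-sbert.base-v3")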

Get model metadata

curl http://localhost:8000/v2/models/nupic-sbert.base-v3

Expected output; the inputs section describes the payload structure that inference requests must follow:

{
  "name": "nupic-sbert.base-v3",
  "versions": [
    "1"
  ],
  "platform": "ensemble",
  "inputs": [
    {
      "name": "TEXT",
      "datatype": "BYTES",
      "shape": [
        -1
      ]
    }
  ],
  "outputs": [
    {
      "name": "encodings",
      "datatype": "FP32",
      "shape": [
        -1,
        -1
      ]
    }
  ]
}
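
From Python, the same metadata comes back as a plain dict (sketch, same tritonclient assumption as above):

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

metadata = client.get_model_metadata("nupic-sbert.base-v3")
print(metadata["inputs"])   # payload structure expected by the model
print(metadata["outputs"])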

Run inference

Create a payload.json file based on the model metadata above:

{
    "inputs": [
        {
            "name": "TEXT",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["This is a test sentence."]
        }
    ]
}

Send payload:

curl -X POST -H "Content-Type: application/json" \
    -d @payload.json \
    http://localhost:8000/v2/models/nupic-sbert.base-v3/infer

Expected output:

{
  "model_name": "nupic-sbert.base-v3",
  "model_version": "1",
  "parameters": {
    "sequence_id": 0,
    "sequence_start": false,
    "sequence_end": false
  },
  "outputs": [
    {
      "name": "encodings",
      "datatype": "FP32",
      "shape": [
        1,
        768
      ],
      "data": [
        0.49580085277557375,
        0.1195397824048996,
        0.06556889414787293,
        -0.42177122831344607,
...
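
The whole round trip can also be done from Python, letting tritonclient build the payload for you (a sketch; assumes tritonclient[http] and numpy are installed):

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the BYTES input named "TEXT" with shape [1], as in payload.json
text = np.array(["This is a test sentence."], dtype=object)
text_input = httpclient.InferInput("TEXT", [1], "BYTES")
text_input.set_data_from_numpy(text)

result = client.infer("nupic-sbert.base-v3", inputs=[text_input])

# One 768-dimensional embedding per input sentence
print(result.as_numpy("encodings").shape)  # (1, 768)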

Unload a model

curl -X POST http://localhost:8000/v2/repository/models/nupic-sbert.base-v3/unload
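
Or from Python (sketch, same assumptions as above):

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Ask the server to release the model from memory
client.unload_model("nupic-sbert.base-v3")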