Prerequisites
- Service Token: Sign up on the Nscale platform to create your service token.
- Model Selection: Choose a chat model from Nscale’s library.
  - Example: Llama 3.1 8B Instruct (`meta-llama/Llama-3.1-8B-Instruct`)
Step 1: Set up your environment
Before making requests, ensure you have the necessary tools installed for your language of choice. For Python, install the `openai` library (`pip install openai`).

Step 2: Sending an inference request
Let's walk through an example where we summarise a blog post into 100 words.

Request structure
Each request to the Nscale Chat Completions API endpoint should include the following:
- Headers:
"Authorization": "Bearer <SERVICE-TOKEN>"
"Content-Type": "application/json"
- Payload:
"model"
:"<model id e.g., meta-llama/Llama-3.1-8B-Instruct>"
"messages"
:"<array of messages to send to the model>"
Step 3: Understanding the response
The API will return a JSON object containing the model's output and token usage:
- `choices`: an array of message objects containing the model's output.
- `usage`: an object containing the input (`prompt_tokens`), output (`completion_tokens`), and total number of tokens used.
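To make the two fields concrete, here is a sketch of pulling them out of a parsed response. The field names follow the schema described above; the values in `sample` are made-up placeholders, not real API output:

```python
# Illustrative response shape only; values are placeholders.
sample = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "<100-word summary>"}}
    ],
    "usage": {"prompt_tokens": 512, "completion_tokens": 96, "total_tokens": 608},
}

# The generated text lives in the first choice's message...
summary = sample["choices"][0]["message"]["content"]
# ...and token accounting lives in the usage object.
total_tokens = sample["usage"]["total_tokens"]
print(summary, total_tokens)
```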
Step 4: Using the CLI for Chat Inferencing
You can also use the Nscale CLI to interact with chat models. This is a convenient way to test models or build command-line applications.

Prerequisites
- Ensure you have the Nscale CLI installed.
Examples
Here are some examples of using the CLI for chat inferencing:

Step 5: Monitoring and scaling
Nscale handles scaling automatically based on traffic patterns, so no manual intervention is needed. Use the Nscale Console to monitor:
- API usage by model
- Spend breakdowns
Troubleshooting
Common status codes and their meanings:

| Status | Description | Response Format |
|---|---|---|
| 200 | Success (synchronous) | application/json response with completion |
| 201 | Success (streaming) | text/event-stream with delta updates |
| 401 | Invalid service token or unauthorized | Error object |
| 404 | Model not found or unavailable | Error object |
| 429 | Insufficient credit | Error object |
| 500 | Internal server error | Error object |
| 503 | Service temporarily unavailable | Error object |
```
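The table above can be turned into a simple retry policy. A sketch under the assumption that 401/404/429 need a configuration fix (token, model id, or credit) while 500/503 are transient; the mapping is illustrative, not part of the Nscale API itself:

```python
# Status meanings copied from the troubleshooting table above.
STATUS_MEANINGS = {
    200: "Success (synchronous)",
    201: "Success (streaming)",
    401: "Invalid service token or unauthorized",
    404: "Model not found or unavailable",
    429: "Insufficient credit",
    500: "Internal server error",
    503: "Service temporarily unavailable",
}

def classify(status: int) -> str:
    """Map an HTTP status to a coarse action for the caller."""
    if status in (200, 201):
        return "ok"
    if status in (401, 404, 429):
        return "client-error"  # fix the token, model id, or credit before retrying
    if status in (500, 503):
        return "retry"         # transient server-side problem; back off and retry
    return "unknown"
```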
Success Response Format (200)
Error Response Format
Contact Support
Need assistance? Get help from our support team.