Since the server is one of the goals/highlights of this project, I'm planning to move it into a subpackage, e.g. `llama-cpp-python[server]` or something like that.
Work that needs to be done first:

- Ensure compatibility with OpenAI
  - Response objects match
  - Request objects match
  - Loaded model appears under the `/v1/models` endpoint
- Test OpenAI client libraries
  - Unsupported parameters should be silently ignored
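A minimal sketch of the "silently ignored" behavior: drop unknown request fields before handling instead of rejecting the request. The parameter names below follow the OpenAI completions API, but the `SUPPORTED_PARAMS` set and `filter_request` helper are hypothetical, not the project's actual code.

```python
# Hypothetical helper: drop request parameters the server does not support,
# instead of rejecting the whole request (OpenAI clients may send extras).
SUPPORTED_PARAMS = {
    "model", "prompt", "max_tokens", "temperature",
    "top_p", "stream", "stop",
}

def filter_request(params: dict) -> dict:
    """Return only the parameters this server understands."""
    return {k: v for k, v in params.items() if k in SUPPORTED_PARAMS}

request = {"model": "llama", "prompt": "Hi", "logit_bias": {"50256": -100}}
filtered = filter_request(request)
# "logit_bias" is silently dropped; supported keys pass through unchanged.
```

Filtering rather than erroring keeps existing OpenAI client libraries working even when they send parameters the llama.cpp backend has no equivalent for.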
Ease-of-use:

- Integrate the server as a subpackage
- CLI tool to run the server
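The CLI tool could be a thin `argparse` wrapper around the server; the flag names below are illustrative assumptions, not the project's actual interface.

```python
# Hypothetical CLI entry point for the server subpackage; flag names are
# illustrative only.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="llama_cpp.server")
    parser.add_argument("--model", required=True, help="path to the model file")
    parser.add_argument("--host", default="localhost", help="bind address")
    parser.add_argument("--port", type=int, default=8000, help="bind port")
    return parser

# Example invocation: python -m llama_cpp.server --model ggml-model.bin --port 8080
args = build_parser().parse_args(["--model", "ggml-model.bin", "--port", "8080"])
```

Exposing the server as `python -m llama_cpp.server` style module entry point would keep it installable via the `llama-cpp-python[server]` extra without a separate console script.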
Future work:

- Prompt caching to improve latency
- Support multiple models in the same server
- Add tokenization endpoints to make it easier for small clients to calculate context window sizes
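The prompt-caching idea can be sketched as longest-common-prefix reuse: keep the evaluated state for the previous prompt and only evaluate the tokens where the new prompt diverges. The helper below is a standalone sketch of that matching step, not the project's implementation.

```python
# Sketch of prompt caching: reuse the already-evaluated state for the longest
# shared prompt prefix, so only the new suffix must be evaluated.
def common_prefix_len(a: list, b: list) -> int:
    """Length of the longest common prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

cached_tokens = [1, 15043, 29892, 920]   # tokens evaluated for the last request
new_tokens    = [1, 15043, 29892, 825]   # incoming prompt
reuse = common_prefix_len(cached_tokens, new_tokens)
to_eval = new_tokens[reuse:]  # only the divergent suffix needs evaluation
```

For chat-style workloads, where each request repeats the prior conversation verbatim, this prefix reuse avoids re-evaluating almost the entire prompt.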
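For the tokenization endpoints, the response could report the token count and remaining context budget so small clients never need a local tokenizer. The endpoint shape and field names below are hypothetical; a real handler would call the loaded model's tokenizer.

```python
# Hypothetical response builder for a tokenization endpoint; field names are
# illustrative. A real handler would tokenize the request text with the model.
def tokenize_response(tokens: list, context_window: int) -> dict:
    """Build a response telling a client how much context budget remains."""
    return {
        "tokens": tokens,
        "count": len(tokens),
        "remaining": context_window - len(tokens),
    }

resp = tokenize_response([1, 15043, 29892], context_window=2048)
# resp["remaining"] tells a small client how many tokens still fit.
```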