Since the server is one of the goals/highlights of this project, I'm planning to move it into a subpackage, e.g. `llama-cpp-python[server]` or something like that.
Work that needs to be done first:

- Ensure compatibility with OpenAI
  - Response objects match
  - Request objects match
  - Loaded model appears under the `/v1/models` endpoint
- Test OpenAI client libraries
  - Unsupported parameters should be silently ignored
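A minimal sketch of the "silently ignored" behavior: drop unknown request fields before handling instead of rejecting the request. The parameter names below follow the OpenAI completions API, but the `SUPPORTED_PARAMS` set and `filter_request` helper are hypothetical, not the project's actual code.

```python
# Hypothetical helper: drop request parameters the server does not support,
# instead of rejecting the whole request (OpenAI clients may send extras).
SUPPORTED_PARAMS = {
    "model", "prompt", "max_tokens", "temperature",
    "top_p", "stream", "stop",
}

def filter_request(params: dict) -> dict:
    """Return only the parameters this server understands."""
    return {k: v for k, v in params.items() if k in SUPPORTED_PARAMS}

request = {"model": "llama", "prompt": "Hi", "logit_bias": {"50256": -100}}
filtered = filter_request(request)
# "logit_bias" is silently dropped; supported keys pass through unchanged.
```

Filtering rather than erroring keeps existing OpenAI client libraries working even when they send parameters the llama.cpp backend has no equivalent for.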
Ease-of-use:

- Integrate the server as a subpackage
- CLI tool to run the server
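The CLI tool could be a thin `argparse` wrapper around the server; the flag names below are illustrative assumptions, not the project's actual interface.

```python
# Hypothetical CLI entry point for the server subpackage; flag names are
# illustrative only.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="llama_cpp.server")
    parser.add_argument("--model", required=True, help="path to the model file")
    parser.add_argument("--host", default="localhost", help="bind address")
    parser.add_argument("--port", type=int, default=8000, help="bind port")
    return parser

# Example invocation: python -m llama_cpp.server --model ggml-model.bin --port 8080
args = build_parser().parse_args(["--model", "ggml-model.bin", "--port", "8080"])
```

Exposing the server as `python -m llama_cpp.server` style module entry point would keep it installable via the `llama-cpp-python[server]` extra without a separate console script.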
Future work:

- Prompt caching to improve latency
- Support multiple models in the same server
- Add tokenization endpoints to make it easier for small clients to calculate context window sizes
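The prompt-caching idea can be sketched as longest-common-prefix reuse: keep the evaluated state for the previous prompt and only evaluate the tokens where the new prompt diverges. The helper below is a standalone sketch of that matching step, not the project's implementation.

```python
# Sketch of prompt caching: reuse the already-evaluated state for the longest
# shared prompt prefix, so only the new suffix must be evaluated.
def common_prefix_len(a: list, b: list) -> int:
    """Length of the longest common prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

cached_tokens = [1, 15043, 29892, 920]   # tokens evaluated for the last request
new_tokens    = [1, 15043, 29892, 825]   # incoming prompt
reuse = common_prefix_len(cached_tokens, new_tokens)
to_eval = new_tokens[reuse:]  # only the divergent suffix needs evaluation
```

For chat-style workloads, where each request repeats the prior conversation verbatim, this prefix reuse avoids re-evaluating almost the entire prompt.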
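For the tokenization endpoints, the response could report the token count and remaining context budget so small clients never need a local tokenizer. The endpoint shape and field names below are hypothetical; a real handler would call the loaded model's tokenizer.

```python
# Hypothetical response builder for a tokenization endpoint; field names are
# illustrative. A real handler would tokenize the request text with the model.
def tokenize_response(tokens: list, context_window: int) -> dict:
    """Build a response telling a client how much context budget remains."""
    return {
        "tokens": tokens,
        "count": len(tokens),
        "remaining": context_window - len(tokens),
    }

resp = tokenize_response([1, 15043, 29892], context_window=2048)
# resp["remaining"] tells a small client how many tokens still fit.
```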