Add RTX 4060 / i7-13700H optimized inference example for Lenovo Legion Slim 5 by Copilot · Pull Request #1 · adelmorad273-cmyk/llama-cpp-python

Copilot · 2026-04-05T01:32:32Z

Adds a ready-to-use inference script tuned for the Lenovo Legion Slim 5 (RTX 4060 8 GB VRAM, i7-13700H, 16 GB DDR5 RAM).

Changes

examples/high_level_api/legion_slim5_rtx4060.py — new example with hardware-aware defaults:
- n_gpu_layers=-1 + offload_kqv=True — full GPU offload with KV-cache on VRAM
- n_threads=6 — P-cores only for best throughput on the hybrid i7-13700H topology
- use_mmap=True / use_mlock=False — fast NVMe model loading without pinning the constrained 16 GB RAM
- n_ctx=4096, n_batch=512 — safe defaults within 8 GB VRAM budget
- Inline quantisation guide (Q5_K_M / Q6_K recommended; Q8_0 flagged as marginal)
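
The defaults listed above could be wired into the high-level API roughly as follows. This is a minimal sketch, not the contents of the added script: the helper names `legion_slim5_kwargs` and `load_model` are hypothetical, and only the parameter values come from this description. `llama_cpp` is imported lazily so the settings can be inspected without a CUDA build installed.

```python
from typing import Any, Dict


def legion_slim5_kwargs(model_path: str, n_gpu_layers: int = -1) -> Dict[str, Any]:
    """Return Llama() keyword arguments matching the defaults in this PR.

    Hypothetical helper for illustration; the values mirror the list above.
    """
    return {
        "model_path": model_path,
        "n_gpu_layers": n_gpu_layers,  # -1 offloads every layer to the GPU
        "offload_kqv": True,           # keep the KV cache in VRAM
        "n_threads": 6,                # one thread per P-core on the i7-13700H
        "use_mmap": True,              # memory-map the model file from disk
        "use_mlock": False,            # do not pin pages in the 16 GB of RAM
        "n_ctx": 4096,                 # context window
        "n_batch": 512,                # prompt-processing batch size
    }


def load_model(model_path: str, n_gpu_layers: int = -1):
    """Construct a llama_cpp.Llama instance with the settings above."""
    from llama_cpp import Llama  # lazy import: requires llama-cpp-python

    return Llama(**legion_slim5_kwargs(model_path, n_gpu_layers))


# Example use (needs a GGUF model file and a CUDA-enabled install):
#   llm = load_model("./mistral-7b-Q5_K_M.gguf")
#   out = llm("Your prompt here", max_tokens=256)
#   print(out["choices"][0]["text"])
```

Keeping the settings in a plain dictionary makes it straightforward to override a single value, such as `n_gpu_layers`, without touching the rest.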

Usage

# Install with CUDA support
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir

# Run inference
python examples/high_level_api/legion_slim5_rtx4060.py \
    -m ./mistral-7b-Q5_K_M.gguf \
    -p "Your prompt here" \
    --max-tokens 256

# Reduce GPU layers if VRAM OOM
python examples/high_level_api/legion_slim5_rtx4060.py \
    -m ./model.gguf --n-gpu-layers 28
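
To judge whether a given quantisation is likely to need fewer GPU layers, a back-of-the-envelope VRAM estimate can help. The sketch below is an illustration under stated assumptions, not part of the added script: it assumes a Mistral-7B-like architecture (about 7.24 billion parameters, 32 layers, 8 KV heads, head dimension 128), an fp16 KV cache, and approximate bits-per-weight figures for each quantisation type. It ignores compute buffers, the CUDA context, and any VRAM used by the display, so real usage will be higher.

```python
GIB = 1024 ** 3

# Approximate bits per weight for each quantisation type (assumed values).
BITS_PER_WEIGHT = {"Q5_K_M": 5.7, "Q6_K": 6.56, "Q8_0": 8.5}

# Assumed Mistral-7B-like architecture; adjust for other models.
N_PARAMS = 7.24e9
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128


def weight_bytes(quant: str) -> float:
    """Approximate size of the quantised weights in bytes."""
    return N_PARAMS * BITS_PER_WEIGHT[quant] / 8


def kv_cache_bytes(n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Size of the K and V caches for n_ctx tokens (fp16 by default)."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * n_ctx * bytes_per_elem


def estimate_gib(quant: str, n_ctx: int = 4096) -> float:
    """Weights plus KV cache, in GiB, excluding compute buffers."""
    return (weight_bytes(quant) + kv_cache_bytes(n_ctx)) / GIB


for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimate_gib(quant):.1f} GiB of 8 GiB")
```

Under these assumptions Q8_0 leaves well under 1 GiB of headroom at n_ctx=4096, which is consistent with it being flagged as marginal and is the case where lowering --n-gpu-layers is most likely to be needed.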

Agent-Logs-Url: https://github.com/adelmorad273-cmyk/llama-cpp-python/sessions/8d4c18b1-fddb-4d4d-8d5c-f8e9099eeb94
Co-authored-by: adelmorad273-cmyk <269225024+adelmorad273-cmyk@users.noreply.github.com>

Add Legion Slim 5 RTX 4060 optimized inference example

51a065e


Copilot AI assigned Copilot and adelmorad273-cmyk Apr 5, 2026

Copilot created this pull request from a session on behalf of adelmorad273-cmyk April 5, 2026 01:32 View session

adelmorad273-cmyk marked this pull request as ready for review April 5, 2026 01:35

adelmorad273-cmyk merged commit 8d80ca6 into main Apr 5, 2026
1 check failed

adelmorad273-cmyk merged 1 commit into main from copilot/add-pyright-support
