Add RTX 4060 / i7-13700H optimized inference example for Lenovo Legion Slim 5 #1

Merged
adelmorad273-cmyk merged 1 commit into main from copilot/add-pyright-support
Apr 5, 2026

Copilot AI commented Apr 5, 2026

Adds a ready-to-use inference script tuned for the Lenovo Legion Slim 5 (RTX 4060 8 GB VRAM, i7-13700H, 16 GB DDR5 RAM).

Changes

  • examples/high_level_api/legion_slim5_rtx4060.py — new example with hardware-aware defaults:
    • n_gpu_layers=-1 + offload_kqv=True — full GPU offload with KV-cache on VRAM
    • n_threads=6 — P-cores only for best throughput on the hybrid i7-13700H topology
    • use_mmap=True / use_mlock=False — fast NVMe model loading without pinning the constrained 16 GB RAM
    • n_ctx=4096, n_batch=512 — safe defaults within 8 GB VRAM budget
    • Inline quantisation guide (Q5_K_M / Q6_K recommended; Q8_0 flagged as marginal)
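
The defaults listed above map onto keyword arguments of the `Llama` constructor in llama-cpp-python. The following is a minimal sketch of how they might be wired together, not the contents of the merged script: the helper names `legion_slim5_kwargs` and `load_model` are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict


def legion_slim5_kwargs(model_path: str, n_gpu_layers: int = -1) -> Dict[str, Any]:
    """Collect the defaults from this PR as Llama() keyword arguments.

    Hypothetical helper; the merged example may organise this differently.
    """
    return {
        "model_path": model_path,
        "n_gpu_layers": n_gpu_layers,  # -1 offloads every layer to the GPU
        "offload_kqv": True,           # keep the KV cache in VRAM
        "n_threads": 6,                # one thread per P-core
        "use_mmap": True,              # memory-map the model file
        "use_mlock": False,            # do not pin pages in the 16 GB of RAM
        "n_ctx": 4096,
        "n_batch": 512,
    }


def load_model(model_path: str, n_gpu_layers: int = -1):
    # Imported lazily so the kwargs helper can be inspected
    # on a machine without llama-cpp-python installed.
    from llama_cpp import Llama

    return Llama(**legion_slim5_kwargs(model_path, n_gpu_layers))
```

Keeping the settings in a plain dictionary makes it easy to override a single value, such as a lower `n_gpu_layers`, without touching the rest.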

Usage

# Install with CUDA support
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir

# Run inference
python examples/high_level_api/legion_slim5_rtx4060.py \
    -m ./mistral-7b-Q5_K_M.gguf \
    -p "Your prompt here" \
    --max-tokens 256

# Reduce the number of GPU layers if you hit a VRAM out-of-memory error
python examples/high_level_api/legion_slim5_rtx4060.py \
    -m ./model.gguf --n-gpu-layers 28
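
The flags used in the commands above (`-m`, `-p`, `--max-tokens`, `--n-gpu-layers`) could be parsed with `argparse` along the following lines. This is a sketch under assumptions: the script's actual parser is not shown in this PR description, and the `dest` names, help strings, and default values below are illustrative guesses.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the flags shown in the usage commands."""
    parser = argparse.ArgumentParser(
        description="Inference example for an 8 GB VRAM laptop GPU"
    )
    parser.add_argument("-m", dest="model", required=True,
                        help="path to a GGUF model file")
    # Default prompt is an assumption, not taken from the PR.
    parser.add_argument("-p", dest="prompt", default="Hello",
                        help="prompt text")
    # Default of 256 is an assumption based on the usage example.
    parser.add_argument("--max-tokens", type=int, default=256,
                        help="maximum number of tokens to generate")
    # -1 means "offload all layers"; pass a smaller value on VRAM OOM.
    parser.add_argument("--n-gpu-layers", type=int, default=-1,
                        help="number of layers to offload to the GPU")
    return parser


# Parse an explicit argument list equivalent to the second command above.
args = build_parser().parse_args(["-m", "./model.gguf", "--n-gpu-layers", "28"])
print(args.model, args.n_gpu_layers, args.max_tokens)
```

Note that `argparse` converts `--n-gpu-layers` to the attribute name `n_gpu_layers`, so the parsed value can be passed straight to the `n_gpu_layers` argument of `Llama`.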

Agent-Logs-Url: https://github.com/adelmorad273-cmyk/llama-cpp-python/sessions/8d4c18b1-fddb-4d4d-8d5c-f8e9099eeb94

Co-authored-by: adelmorad273-cmyk <269225024+adelmorad273-cmyk@users.noreply.github.com>
@adelmorad273-cmyk adelmorad273-cmyk marked this pull request as ready for review April 5, 2026 01:35
@adelmorad273-cmyk adelmorad273-cmyk merged commit 8d80ca6 into main Apr 5, 2026
1 check failed