Local Inference
These are wrappers over llama.cpp (a quick usage sketch follows the list):
- Llamafile
- Ollama
- LlamaEdge: running models via WebAssembly
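For example, Ollama wraps model download and the chat loop behind a single CLI (a minimal sketch; the llama2 model tag is an assumption about what is available in the Ollama library):
ollama pull llama2
ollama run llama2 "Hi, how are you?"
Llamafile goes further and bundles llama.cpp together with the weights into a single executable that you download and run directly.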
Compiling llama.cpp
With Vulkan
Install the Vulkan SDK.
set VULKAN_SDK=C:\VulkanSDK\1.3.275.0
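Before building, it's worth checking that the SDK and GPU driver are visible (a quick sanity check, assuming the SDK's Bin directory is on PATH; vulkaninfo ships with the Vulkan SDK):
vulkaninfo --summary
If the iGPU doesn't show up as a Vulkan device here, the llama.cpp Vulkan backend won't be able to use it either.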
mkdir build
cd build
cmake .. --fresh -DLLAMA_AVX512=on -DLLAMA_AVX512_VBMI=on -DLLAMA_AVX512_VNNI=on -DLLAMA_VULKAN=on
cmake --build . --config Release
Test performance by:
main.exe -m "d:/downloads/llama-2-7b-chat.Q4_K_M.gguf" -p "Hi you how are you" -ngl 33
./llama-bench -m "d:/downloads/llama-2-7b-chat.Q4_K_M.gguf" -p 3968
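The same build also produces a small HTTP server, which is convenient for testing from other tools (a sketch; the server.exe binary, --port flag, and /completion JSON fields follow the llama.cpp server example of this era and may differ in newer builds):
server.exe -m "d:/downloads/llama-2-7b-chat.Q4_K_M.gguf" -ngl 33 --port 8080
curl http://localhost:8080/completion -d "{\"prompt\": \"Hi, how are you?\", \"n_predict\": 64}"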
Experiment: trying to get AOCL support working but it doesn’t compile:
-DLLAMA_BLAS_VENDOR=AOCL-mt -DLLAMA_BLAS=ON -DBLAS_LIBRARIES="C:\Program Files\AMD\AOCL-Windows\amd-blis\lib\LP64\AOCL-LibBlis-Win-dll.dll" -DBLAS_INCLUDE_DIRS="C:\Program Files\AMD\AOCL-Windows\amd-blis\include\LP64"
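For comparison, the generic OpenBLAS route uses the same two switches without the vendor-specific library paths (a sketch based on the llama.cpp BLAS build instructions of this era; untested here, and BLAS mainly speeds up prompt processing rather than generation):
cmake .. --fresh -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS
cmake --build . --config Release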
ROCm
Didn’t manage to get this to work, since AMD doesn’t yet support the 7840U with ROCm.
Install the Visual Studio C++ workload, and select the Clang tools.
Install ROCm (unchecking the Visual Studio extension, as that causes an error), and Perl as well.
cmake.exe .. --fresh -G "Ninja" -DCMAKE_BUILD_TYPE=Release -DLLAMA_HIPBLAS=on -DLLAMA_AVX512_VBMI=on -DLLAMA_AVX512_VNNI=on -DCMAKE_C_COMPILER="clang.exe" -DCMAKE_CXX_COMPILER="clang++.exe" -DAMDGPU_TARGETS="gfx1100" -DCMAKE_PREFIX_PATH="C:\Program Files\AMD\ROCm\5.7"
cmake --build . --config Release
Trying to run it results in an error saying that the Tensile library for gfx1103 isn’t found. The commonly suggested workaround of setting HSA_OVERRIDE_GFX_VERSION=11.0.0 only works on Linux.
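For reference, on Linux that workaround is just an environment variable set before launching (a sketch only; as noted above it doesn't help on Windows, and presenting a gfx1103 part as gfx1100 isn't guaranteed to be stable):
export HSA_OVERRIDE_GFX_VERSION=11.0.0
./main -m llama-2-7b-chat.Q4_K_M.gguf -ngl 33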
Other relevant links: