Local Inference
These are wrappers over llama.cpp (a quick usage sketch follows the list):
- Llamafile
- Ollama
- LlamaEdge: running models via WebAssembly
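For example, Ollama wraps model download and the chat loop behind a single CLI (a minimal sketch; the llama2 model tag is an assumption about what is available in the Ollama library):
ollama pull llama2
ollama run llama2 "Hi, how are you?"
Llamafile goes further and bundles llama.cpp together with the weights into a single executable that you download and run directly.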
Compiling llama.cpp
With Vulkan
Install the Vulkan SDK.
set VULKAN_SDK=C:\VulkanSDK\1.3.275.0
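Before building, it's worth checking that the SDK and GPU driver are visible (a quick sanity check, assuming the SDK's Bin directory is on PATH; vulkaninfo ships with the Vulkan SDK):
vulkaninfo --summary
If the iGPU doesn't show up as a Vulkan device here, the llama.cpp Vulkan backend won't be able to use it either.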
mkdir build
cd build
cmake .. --fresh -DLLAMA_AVX512=on -DLLAMA_AVX512_VBMI=on -DLLAMA_AVX512_VNNI=on -DLLAMA_VULKAN=on
cmake --build . --config Release
Test performance by:
main.exe -m "d:/downloads/llama-2-7b-chat.Q4_K_M.gguf" -p "Hi you how are you" -ngl 33
./llama-bench -m "d:/downloads/llama-2-7b-chat.Q4_K_M.gguf" -p 3968
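The same build also produces a small HTTP server, which is convenient for testing from other tools (a sketch; the server.exe binary, --port flag, and /completion JSON fields follow the llama.cpp server example of this era and may differ in newer builds):
server.exe -m "d:/downloads/llama-2-7b-chat.Q4_K_M.gguf" -ngl 33 --port 8080
curl http://localhost:8080/completion -d "{\"prompt\": \"Hi, how are you?\", \"n_predict\": 64}"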
Experiment: trying to get AOCL support working but it doesn’t compile:
-DLLAMA_BLAS_VENDOR=AOCL-mt -DLLAMA_BLAS=ON -DBLAS_LIBRARIES="C:\Program Files\AMD\AOCL-Windows\amd-blis\lib\LP64\AOCL-LibBlis-Win-dll.dll" -DBLAS_INCLUDE_DIRS="C:\Program Files\AMD\AOCL-Windows\amd-blis\include\LP64"
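For comparison, the generic OpenBLAS route uses the same two switches without the vendor-specific library paths (a sketch based on the llama.cpp BLAS build instructions of this era; untested here, and BLAS mainly speeds up prompt processing rather than generation):
cmake .. --fresh -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS
cmake --build . --config Release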
ROCm
Didn’t manage to get this to work, since AMD doesn’t yet support the 7840U with ROCm.
Install the Visual Studio C++ workload, and select the Clang tools.
Install ROCm (unchecking the Visual Studio extension, as that causes an error), and Perl as well.
cmake.exe .. --fresh -G "Ninja" -DCMAKE_BUILD_TYPE=Release -DLLAMA_HIPBLAS=on -DLLAMA_AVX512_VBMI=on -DLLAMA_AVX512_VNNI=on -DCMAKE_C_COMPILER="clang.exe" -DCMAKE_CXX_COMPILER="clang++.exe" -DAMDGPU_TARGETS="gfx1100" -DCMAKE_PREFIX_PATH="C:\Program Files\AMD\ROCm\5.7"
cmake --build . --config Release
Trying to run it results in an error saying that the Tensile library for gfx1103 isn’t found. The commonly suggested workaround of setting HSA_OVERRIDE_GFX_VERSION=11.0.0 only works on Linux.
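For reference, on Linux that workaround is just an environment variable set before launching (a sketch only; as noted above it doesn't help on Windows, and presenting a gfx1103 part as gfx1100 isn't guaranteed to be stable):
export HSA_OVERRIDE_GFX_VERSION=11.0.0
./main -m llama-2-7b-chat.Q4_K_M.gguf -ngl 33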
Other relevant links: