Blog · May 6, 2024

How to Run EXL2 LLMs Locally at High Speed

Fahd Mirza

This video shows how to install exllamav2 locally and run any model in EXL2 format on your own machine.




Code:

# Install the Hugging Face Hub client, then log in with your access token
pip install huggingface_hub

huggingface-cli login
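
If you prefer to stay in Python (for example in a notebook), huggingface_hub exposes the same login flow; a minimal sketch:

from huggingface_hub import login

# Prompts for your Hugging Face access token, same as `huggingface-cli login`
login()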


# Download the EXL2-quantized model. After cd'ing into llama38b, download into "."
# so the files land directly in ~/llama38b rather than a nested llama38b/llama38b
mkdir llama38b

cd llama38b

huggingface-cli download hjhj3168/Llama-3-8b-Orthogonalized-exl2 --local-dir . --local-dir-use-symlinks False


cd ..
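
The CLI download above can also be scripted with huggingface_hub's snapshot_download; this is a minimal sketch using the same repo and target directory as above:

from huggingface_hub import snapshot_download

# Pull the EXL2-quantized repo into ./llama38b (same result as the CLI download)
snapshot_download(
    repo_id="hjhj3168/Llama-3-8b-Orthogonalized-exl2",
    local_dir="llama38b",
)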


# Get the exllamav2 source
git clone https://github.com/turboderp/exllamav2

cd exllamav2


# Create and activate an isolated Python environment
conda create -n exl2 python=3.11

conda activate exl2


# Install the dependencies, then build and install exllamav2 itself
pip install -r requirements.txt

pip install .
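
Before running inference, a quick sanity check that the build imports cleanly and PyTorch can see your GPU saves debugging later (exllamav2 needs a CUDA-capable device); a minimal sketch:

# Confirm the install and GPU visibility
import torch
import exllamav2  # this import fails if the build did not succeed

print(torch.cuda.is_available())      # expect True
print(torch.cuda.get_device_name(0))  # your GPU model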


# Quick smoke test: load the downloaded model and generate from a prompt
python test_inference.py -m /home/ubuntu/llama38b/ -p "To travel without ticket in train,"
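
test_inference.py is fine for a smoke test, but in your own scripts you can drive the model through exllamav2's Python API. The sketch below follows the pattern in the repo's examples; the model path matches this walkthrough, and the sampling settings are illustrative values you can tune:

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point the config at the downloaded EXL2 model directory
config = ExLlamaV2Config()
config.model_dir = "/home/ubuntu/llama38b"
config.prepare()

# Load the model, letting exllamav2 split weights across available GPU memory
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

# Illustrative sampling settings
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

# Generate up to 150 new tokens from the same prompt used above
output = generator.generate_simple("To travel without ticket in train,", settings, 150)
print(output)

Here lazy=True defers cache allocation until load_autosplit distributes the model, which is the pattern the repo's examples use so a large model can span multiple GPUs.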
