Blog · May 6, 2024

How to Run EXL2 LLMs Locally at High Speed

Fahd Mirza

This video shows how to install exllamav2 locally and run any model in EXL2 format on your own machine.




Code:

# Install the Hugging Face Hub client, then log in with your access token
pip install huggingface_hub

huggingface-cli login
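
If you prefer to stay in Python (for example in a notebook), huggingface_hub exposes the same login flow; a minimal sketch:

from huggingface_hub import login

# Prompts for your Hugging Face access token, same as `huggingface-cli login`
login()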


# Download the EXL2-quantized model. After cd'ing into llama38b, download into "."
# so the files land directly in ~/llama38b rather than a nested llama38b/llama38b
mkdir llama38b

cd llama38b

huggingface-cli download hjhj3168/Llama-3-8b-Orthogonalized-exl2 --local-dir . --local-dir-use-symlinks False


cd ..
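
The CLI download above can also be scripted with huggingface_hub's snapshot_download; this is a minimal sketch using the same repo and target directory as above:

from huggingface_hub import snapshot_download

# Pull the EXL2-quantized repo into ./llama38b (same result as the CLI download)
snapshot_download(
    repo_id="hjhj3168/Llama-3-8b-Orthogonalized-exl2",
    local_dir="llama38b",
)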


# Get the exllamav2 source
git clone https://github.com/turboderp/exllamav2

cd exllamav2


# Create and activate an isolated Python environment
conda create -n exl2 python=3.11

conda activate exl2


# Install the dependencies, then build and install exllamav2 itself
pip install -r requirements.txt

pip install .
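
Before running inference, a quick sanity check that the build imports cleanly and PyTorch can see your GPU saves debugging later (exllamav2 needs a CUDA-capable device); a minimal sketch:

# Confirm the install and GPU visibility
import torch
import exllamav2  # this import fails if the build did not succeed

print(torch.cuda.is_available())      # expect True
print(torch.cuda.get_device_name(0))  # your GPU model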


# Quick smoke test: load the downloaded model and generate from a prompt
python test_inference.py -m /home/ubuntu/llama38b/ -p "To travel without ticket in train,"
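
test_inference.py is fine for a smoke test, but in your own scripts you can drive the model through exllamav2's Python API. The sketch below follows the pattern in the repo's examples; the model path matches this walkthrough, and the sampling settings are illustrative values you can tune:

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point the config at the downloaded EXL2 model directory
config = ExLlamaV2Config()
config.model_dir = "/home/ubuntu/llama38b"
config.prepare()

# Load the model, letting exllamav2 split weights across available GPU memory
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

# Illustrative sampling settings
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

# Generate up to 150 new tokens from the same prompt used above
output = generator.generate_simple("To travel without ticket in train,", settings, 150)
print(output)

Here lazy=True defers cache allocation until load_autosplit distributes the model, which is the pattern the repo's examples use so a large model can span multiple GPUs.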
