October 19, 2023

Step-by-Step Mistral 7B Installation: Locally on Linux, Windows, or in the Cloud

Fahd Mirza

This is a detailed tutorial on how to install the Mistral 7B model locally on AWS, Linux, Windows, or anywhere you like.





Commands Used:


# Install optimum and the specific transformers commit used in this tutorial
pip3 install optimum

pip3 install git+https://github.com/huggingface/transformers.git@72958fcd3c98a7afdc61f953aa58c544ebda2f79


# Build and install AutoGPTQ v0.4.2 from source (provides the GPTQ kernels)
git clone https://github.com/PanQiWei/AutoGPTQ

cd AutoGPTQ

git checkout v0.4.2

pip3 install .
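
Before moving on, it's worth a quick sanity check. The short Python snippet below is my addition (assuming torch was pulled in by the installs above); it confirms that a CUDA GPU is visible, which the GPTQ model needs:

import torch

print(torch.cuda.is_available())      # should print True on a GPU machine
print(torch.cuda.get_device_name(0))  # name of the GPU that will host the model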



from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline


model_name_or_path = "TheBloke/SlimOpenOrca-Mistral-7B-GPTQ"

# To use a different branch, change revision

# For example: revision="gptq-4bit-32g-actorder_True"


model = AutoModelForCausalLM.from_pretrained(model_name_or_path,

                                             device_map="auto",

                                             trust_remote_code=False,

                                             revision="main")
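
# Optional check (my addition, not part of the original script): confirm where
# the weights landed and roughly how much memory they occupy.
print(model.device)                                    # e.g. cuda:0
print(f"{model.get_memory_footprint() / 1e9:.2f} GB")  # approximate weight footprint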


tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)


system_message = "You are an expert at bathroom renovations."

prompt = """

Renovate the following old bathroom:

I have a 25-year-old house with an old bathroom. I want to renovate it completely.

Think about it step by step, and give me the steps to renovate the bathroom. Also give me the cost of each step in Australian dollars.

"""


prompt_template=f'''<|im_start|>system

{system_message}<|im_end|>

<|im_start|>user

{prompt}<|im_end|>

<|im_start|>assistant

'''
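
# Alternative sketch (my addition, assuming a transformers version with chat
# template support and a tokenizer that ships a ChatML template): build the
# same prompt from a message list instead of hand-writing the tags.
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": prompt},
]
chat_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)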


print("\ \ *** Generate:")


input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()

output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)

print(tokenizer.decode(output[0]))
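
# The decode above echoes the prompt back. To print only the model's reply
# (my addition), slice off the prompt tokens before decoding:
reply = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(reply)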


# Inference can also be done using transformers' pipeline


print("*** Pipeline:")

pipe = pipeline(

    "text-generation",

    model=model,

    tokenizer=tokenizer,

    max_new_tokens=512,

    do_sample=True,

    temperature=0.7,

    top_p=0.95,

    top_k=40,

    repetition_penalty=1.1

)


print(pipe(prompt_template)[0]['generated_text'])
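
If you would rather watch tokens stream out as they are generated, transformers also ships a TextStreamer that plugs straight into generate(). A minimal sketch, my addition, reusing the sampling settings from above:

from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(inputs=input_ids, streamer=streamer, temperature=0.7, do_sample=True,
               top_p=0.95, top_k=40, max_new_tokens=512)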
