Blog · September 7, 2023

Falcon-180B Local Installation on Linux or Windows - Step by Step

Fahd Mirza

This tutorial walks through installing the Falcon-180B model locally on Linux or Windows, step by step.




Commands Used:


pip3 install "transformers>=4.33.0" "optimum>=1.12.0"
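
If you want to confirm what actually got installed before moving on, a quick version check works (nothing Falcon-specific here, just standard Python packaging tooling):

python3 -c "from importlib.metadata import version; print(version('transformers'), version('optimum'))"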


git clone https://github.com/PanQiWei/AutoGPTQ

cd AutoGPTQ

git checkout a7167b1

pip3 install .
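
Building AutoGPTQ from source compiles its CUDA kernels, so the install can take several minutes. Once it finishes, a quick import confirms the package is usable:

python3 -c "import auto_gptq; print('auto_gptq imported successfully')"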



from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline


model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"


# To use a different branch, change revision

# For example: revision="gptq-3bit--1g-actorder_True"

model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             revision="main")


tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)


prompt = "What is capital of Australia"

prompt_template=f'''User: {prompt}

Assistant: '''


print("\ \ *** Generate:")


input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()

output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)

print(tokenizer.decode(output[0]))
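
One thing to note: decoding output[0] returns the prompt followed by the completion. If you only want the model's reply, a small variation that slices off the prompt tokens first does the trick:

# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)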


# Inference can also be done using transformers' pipeline


print("*** Pipeline:")

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
    top_p=0.95,
    repetition_penalty=1.15
)


print(pipe(prompt_template)[0]['generated_text'])
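
Optionally, since a 512-token completion from a model this size can take a while, transformers' TextStreamer prints tokens to stdout as they are generated instead of waiting for the full output. A minimal sketch, reusing the model, tokenizer, and input_ids from above:

from transformers import TextStreamer

# Stream tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(inputs=input_ids, streamer=streamer, do_sample=True, temperature=0.7, max_new_tokens=512)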
