DukunLM - Indonesian Language Model 🧙‍♂️

🚀 Welcome to the DukunLM repository! DukunLM is an open-source, 7-billion-parameter language model fine-tuned to generate Indonesian text. The name means "WizardLM" in Indonesian. 🌟

Model Details

⚠️ Warning: DukunLM is an uncensored model without filters or alignment. Please use it responsibly as it may contain errors, cultural biases, and potentially offensive content. ⚠️

Installation

To use DukunLM, make sure PyTorch is installed and that you have an NVIDIA GPU (or use Google Colab). Then install the required dependencies:

pip install -U git+https://github.com/huggingface/transformers.git
pip install -U git+https://github.com/huggingface/peft.git
pip install -U bitsandbytes==0.39.0
pip install -U einops==0.6.1
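
Before loading the model, it can help to confirm that the packages above are importable. The snippet below is a minimal sketch and not part of the DukunLM repository: the helper name `missing_packages` is hypothetical, and it only checks that each package can be found, not that its version is compatible or that a GPU is present.

```python
import importlib.util

# Import names of the packages installed above, plus PyTorch.
REQUIRED = ["torch", "transformers", "peft", "bitsandbytes", "einops"]


def missing_packages(names):
    """Hypothetical helper: return the names the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If the script reports missing packages, rerun the matching `pip install` command from the list above.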

How to Use

Stream Output

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig, TextStreamer

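# Load the model with 4-bit NF4 quantization. 4-bit loading is requested
# through quantization_config only; newer transformers releases reject
# passing load_in_4bit as a separate keyword alongside it.
model = AutoPeftModelForCausalLM.from_pretrained(
    "azale-ai/DukunLM-Uncensored-7B",
    torch_dtype=torch.float32,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    ),
)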
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-Uncensored-7B")
streamer = TextStreamer(tokenizer)

instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
input_prompt = ""

if input_prompt == "":
    text = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction_prompt}

### Response:
"""
else:
    text = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction_prompt}

### Input:
{input_prompt}

### Response:
"""

inputs = tokenizer(text, return_tensors="pt").to("cuda")
_ = model.generate(
    inputs=inputs.input_ids,
    streamer=streamer,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048, temperature=0.7,
    do_sample=True, top_k=4, top_p=0.95
)
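
The prompt template in the example above can also be assembled by a small function instead of two inline f-strings. This is a minimal sketch: `build_prompt` and `PREAMBLE` are hypothetical names that do not appear in the DukunLM repository, and the template text is copied from the example as written.

```python
# Preamble copied verbatim from the example above.
PREAMBLE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request."
)


def build_prompt(instruction, input_text=""):
    """Hypothetical helper: assemble the instruction prompt used above."""
    parts = [PREAMBLE, "", "### Instruction:", instruction, ""]
    # The "### Input:" section is included only when extra context is given.
    if input_text:
        parts += ["### Input:", input_text, ""]
    # The trailing empty string leaves a newline after "### Response:".
    parts += ["### Response:", ""]
    return "\n".join(parts)
```

With this helper, `text = build_prompt(instruction_prompt, input_prompt)` would replace the `if`/`else` block.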

Non-Streaming Output

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig

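# Load the model with 4-bit NF4 quantization. 4-bit loading is requested
# through quantization_config only; newer transformers releases reject
# passing load_in_4bit as a separate keyword alongside it.
model = AutoPeftModelForCausalLM.from_pretrained(
    "azale-ai/DukunLM-Uncensored-7B",
    torch_dtype=torch.float32,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    ),
)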
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-Uncensored-7B")

instruction_prompt = "Bangun dialog chatbot untuk layanan pelanggan yang ingin membantu pelanggan memesan produk tertentu."
input_prompt = "Produk: Sepatu Nike Air Max"

if input_prompt == "":
    text = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction_prompt}

### Response:
"""
else:
    text = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction_prompt}

### Input:
{input_prompt}

### Response:
"""

inputs = tokenizer(text, return_tensors="pt").to("cuda")
outputs = model.generate(
    inputs=inputs.input_ids,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048, temperature=0.7,
    do_sample=True, top_k=4, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
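
For a causal language model, `generate` returns the prompt tokens followed by the newly generated tokens, so the decoded string printed above starts with the full prompt. The following is a minimal sketch of one way to keep only the answer: `extract_response` is a hypothetical helper, not part of the DukunLM repository, and it assumes the `### Response:` marker from the template appears in the decoded text.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded_text):
    """Hypothetical helper: return the text after the first response marker."""
    marker_pos = decoded_text.find(RESPONSE_MARKER)
    if marker_pos == -1:
        # Marker not found: fall back to the whole string.
        return decoded_text.strip()
    return decoded_text[marker_pos + len(RESPONSE_MARKER):].strip()
```

It would be applied to the decoded string, for example `extract_response(tokenizer.decode(outputs[0], skip_special_tokens=True))`.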

Limitations

  • The base model was trained primarily on English and then fine-tuned on Indonesian
  • Output may reflect cultural and contextual biases

License

DukunLM is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

Contributing

We welcome contributions to enhance and improve DukunLM. If you have any suggestions or find any issues, please feel free to open an issue or submit a pull request.

Contact Us

[email protected]
