Kaggle Model Upload Made Easy. How to upload your models on Kaggle… | by Imran Roshan

Learn how to upload your models to Kaggle without anything fancy

By enabling you to distribute your trained machine learning models to a larger audience, Kaggle Models promotes innovation, teamwork, and knowledge exchange. You can add to the collective pool of machine learning resources by uploading your model to Kaggle Models, which can help others with their projects and research.

But hey! You might be thinking that this will require toolsets beyond the budget of my pockets. Well, looking at my account statements, I thought the same. Turns out you don't need a penny to get started!

Uploading models as Datasets vs Kaggle Model upload

Why upload models as datasets?

No direct APIs: As of right now, Kaggle doesn't offer a direct API for uploading models to the Models section. You can share your model files with others under the Datasets section, which is the simplest way to automate the upload. The upload for the “Models” area still has to be done by hand (or through KaggleHub, covered later in this post).
Distributing Pretrained Models: Pretrained weights and configurations are essentially what gets shared when a model is uploaded as a dataset. The model files (config.json, vocab.json, and the safetensors weights) can be downloaded by users and loaded directly into their own applications.
Model Reusability in Kaggle Kernels: Once the model has been published as a dataset, you or other Kaggle users can use it in Kaggle notebooks (kernels), for example to run training or inference on Kaggle's infrastructure. This makes it simple for others to use the model in their own Kaggle projects.
Ease of Access: Users can download files or attach datasets to their Kaggle notebooks with ease using Kaggle's Datasets section. This makes it easier to work with models in a collaborative setting where notebooks and datasets are fully connected.
File Size: Since Kaggle's Datasets section makes it simple to store and distribute larger files (up to 20 GB), some users upload their models as datasets.

Why upload models under the Models section?

For structured machine learning models, versioning, tagging, and tracking metrics (such as accuracy, loss, etc.) are included in the Models section.
If you want people to be able to track updates, give feedback on different iterations, or actively engage with your model, the Models section is a better fit.

What's the final conclusion?

If you want to share the model with others so they can use it for further purposes (such as fine-tuning, loading for inference, or integrating into projects), publishing it to Kaggle as a dataset is still a viable and useful approach. Uploading manually to the Models section is a better option, though, if you're looking for structured model management with versions, model evaluation, and direct tracking of model usage.

You will need to manually upload the files through the Kaggle Models interface if you only want the model in the Models section for things like versioning and metrics.

Why should you contribute?

Collaboration: Connect with other machine learning enthusiasts and data scientists who may offer suggestions, reviews, and possible improvements.
Recognition: Build your reputation in the machine learning community by showcasing your knowledge. This is your chance to be Batman.
Community Contribution: By allowing others to use your models, you may help the field of machine learning grow.
Learning Opportunities: Benefit from the insightful conversations and feedback that your model generates.

We'll walk you through the entire process of publishing your model to Kaggle Models in this blog post, so that the wider community can see your valuable work. We will be using two methods to make our way through this:

Standalone: Directly uploading our models from our local environments/Kaggle notebooks.
Using Huggingface: Using a middleman to achieve our goal.

Prerequisites

Google Colab
Huggingface account
Kaggle account
Brains

Process

For this example we will be using Unsloth (https://github.com/unslothai/unsloth) and following the simple instructions provided to make our own model/chatbot.

Let us initialize the environment and get started with the installation. Installing the unsloth package, configured specifically for Google Colab setups, is the first step. This package optimizes the fine-tuning of large language models. The installed version of PyTorch, a well-known deep learning framework, is then checked: if it is older than 2.4.0, a known-compatible version of the xformers package is pinned. Finally, a number of additional packages for fine-tuning, quantization, and hardware acceleration are installed (trl, peft, accelerate, bitsandbytes, triton).

%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
# We have to check which Torch version for Xformers (2.3 -> 0.0.27)
from torch import __version__; from packaging.version import Version as V
xformers = "xformers==0.0.27" if V(__version__) < V("2.4.0") else "xformers"
!pip install --no-deps {xformers} trl peft accelerate bitsandbytes triton
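To make the version check in that cell easier to follow, here is a minimal stand-alone sketch of the same decision in plain Python. The helper name pick_xformers_requirement is hypothetical, and the version parsing is deliberately simplified (it only reads the leading major.minor.patch digits); the notebook cell itself relies on packaging.version instead.

```python
import re


def pick_xformers_requirement(torch_version: str) -> str:
    """Return the pip requirement string for xformers given a torch version.

    Hypothetical helper: mirrors the notebook's rule that torch older than
    2.4.0 gets the pinned xformers 0.0.27, anything else gets the latest.
    """
    # Keep only the leading "major.minor.patch" digits and ignore local
    # suffixes such as "+cu121" that GPU builds of torch usually carry.
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", torch_version)
    if match is None:
        raise ValueError(f"unrecognised torch version: {torch_version!r}")
    major, minor, patch = (int(part or 0) for part in match.groups())
    if (major, minor, patch) < (2, 4, 0):
        return "xformers==0.0.27"
    return "xformers"


print(pick_xformers_requirement("2.3.1+cu121"))
print(pick_xformers_requirement("2.4.0"))
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.10.0" would sort before "2.4.0".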

Using the FastLanguageModel class from the unsloth library, we now load the “Meta-Llama-3.1-8B” large language model (LLM) from the Hugging Face Hub. A tokenizer is also set up to translate text into a format that the model can understand. By choosing the right data type and using 4-bit quantization, the code reduces memory usage. In the end, it returns the model and the tokenizer for later use.

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit",      # New Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit",           # Llama-3 15 trillion tokens model 2x faster!
    "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "unsloth/llama-3-70b-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",        # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",             # Gemma 2.2x faster!
] # More models at https://huggingface.co/unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

Next, we apply the PEFT (Parameter-Efficient Fine-Tuning) technique to prepare the model for fine-tuning. Taking the existing language model as input, we reduce the number of trainable parameters while adapting the model to a new task by applying LoRA (Low-Rank Adaptation) to certain layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj). Hyperparameters including the bias setting, dropout rate, and rank of the LoRA matrices are specified in the code. In addition, we set the random state for reproducibility and enable gradient checkpointing for memory efficiency.

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
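To get a feel for how few parameters LoRA actually trains, the count can be worked out by hand: each adapted linear layer with d_in inputs and d_out outputs gains two small matrices holding r * (d_in + d_out) parameters in total. The sketch below does that arithmetic for r = 16 and the seven target modules. The layer dimensions are assumptions taken from the published Llama 3.1 8B configuration (hidden size 4096, MLP size 14336, 8 key/value heads of size 128, 32 layers); they are not stated in this post.

```python
# Assumed Llama 3.1 8B dimensions (not stated in the post itself).
HIDDEN = 4096          # model width
INTERMEDIATE = 14336   # MLP width
KV_DIM = 8 * 128       # 8 key/value heads of size 128 (grouped-query attention)
NUM_LAYERS = 32
RANK = 16              # same r as in the get_peft_model call

# (input features, output features) of every projection that gets an adapter.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} adapter parameters per layer, {total:,} in total")
```

Under these assumptions that comes to roughly 42 million trainable parameters against about 8 billion in the base model, which is around half a percent.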

It's time to upload some data. We won't dive deep into explaining a simple dataset import from Google Drive. Here's the code snippet:
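The import itself is not shown in this post. As a rough sketch, assuming the data is a CSV file with question and answer columns (the file layout, the Drive path, and both helper names below are hypothetical), it could be read like this and wrapped in a Hugging Face Dataset:

```python
import csv


def load_qa_columns(lines):
    """Turn CSV lines with 'question' and 'answer' headers into column lists.

    `lines` can be an open file handle or any iterable of strings.
    """
    columns = {"question": [], "answer": []}
    for row in csv.DictReader(lines):
        columns["question"].append(row["question"].strip())
        columns["answer"].append(row["answer"].strip())
    return columns


def to_hf_dataset(columns):
    """Wrap the column lists in a Hugging Face Dataset."""
    # Imported lazily so the CSV helper also works without `datasets` installed.
    from datasets import Dataset
    return Dataset.from_dict(columns)


# Tiny in-memory demo of the CSV helper.
demo_columns = load_qa_columns([
    "question,answer",
    "What is inertia?,Resistance to a change in motion.",
])
print(demo_columns)

# In Colab, after mounting Drive, usage might look like this (hypothetical path):
# with open("/content/drive/MyDrive/qa_pairs.csv", newline="", encoding="utf-8") as handle:
#     dataset = to_hf_dataset(load_qa_columns(handle))
```

The column names question and answer are chosen to match what the formatting function in the next step reads.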

Onto the next step. The following code iterates over question-answer pairs, formats them into a particular prompt template (alpaca_prompt), and appends an End-of-Sequence (EOS) token (EOS_TOKEN) to indicate that the desired response has ended. Finally, it maps the formatting function (formatting_prompts_func) over the whole dataset in batches (batched=True). This gets the data ready for training the LLM to produce appropriate answers in response to prompts.

import pandas as pd
from datasets import load_dataset
from datasets import Dataset

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Input:
{}
### Response:
{}"""
EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    inputs  = examples["question"]
    outputs = examples["answer"]
    texts = []
    for question, answer in zip(inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(question, answer) + EOS_TOKEN
        texts.append(text)
    #print(texts)
    return { "text" : texts, }

dataset = dataset.map(formatting_prompts_func, batched = True,)
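To see what a single formatted training example looks like, the formatting step can be tried on a toy batch without loading any model. This is only an illustration: the "<EOS>" string is a placeholder for the real tokenizer.eos_token, and the sample question and answer are made up.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Input:
{}
### Response:
{}"""
EOS_TOKEN = "<EOS>"  # placeholder; the real value comes from tokenizer.eos_token


def formatting_prompts_func(examples):
    """Format a batch (a dict of column lists) into prompt strings."""
    texts = [
        ALPACA_PROMPT.format(question, answer) + EOS_TOKEN
        for question, answer in zip(examples["question"], examples["answer"])
    ]
    return {"text": texts}


# With batched=True, datasets passes each column as a list, like this toy batch.
batch = {
    "question": ["What is inertia?"],
    "answer": ["Resistance to a change in motion."],
}
formatted = formatting_prompts_func(batch)
print(formatted["text"][0])
```

Each string in the text column is what the trainer later reads through dataset_text_field.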

Printing our dataset:

We proceed by initializing an instance of SFTTrainer to start fine-tuning the language model. The trainer comes from the trl library and the training arguments from transformers; training settings including batch size, gradient accumulation steps, learning rate, and optimizer are specified. Depending on hardware support, the code also decides whether to use mixed precision training with fp16 or bf16, so we don't have to worry about that any more. Certain training details, such as the number of training steps, logging frequency, and output directory, are defined through the trainer.

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 20,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
# Run the fine-tuning loop (without this call the model is never trained)
trainer_stats = trainer.train()
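The batch settings are easier to judge once they are multiplied out. The short calculation below shows how many examples one optimizer step covers and how many examples the whole run sees with the values from the listing. The single-GPU assumption is mine (a typical free Colab runtime) and is not stated in the post.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 20
num_devices = 1  # assumption: a single Colab GPU

# Gradients from several small batches are accumulated before each update,
# so one optimizer step effectively covers this many examples.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
examples_seen = effective_batch_size * max_steps

print(f"effective batch size: {effective_batch_size}")
print(f"examples seen in {max_steps} steps: {examples_seen}")
```

With only 160 examples processed, max_steps = 20 is a quick demonstration run; a longer run would raise max_steps or switch to epoch-based training.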

FastLanguageModel.for_inference(model) # Enable native 2x faster inference

Now, finally, the code shows how to load a saved model called “lora_model” into memory using the FastLanguageModel class from the unsloth library. That branch is disabled with if False, since the model is already in memory after training. Parameters for quantization, data type, and sequence length are set on the model, which is then switched to inference mode to speed up generation. Next, we use the tokenizer to prepare an input prompt, which is fed into the model for generation. The tokenizer then decodes the model's response back into text.

if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference
# alpaca_prompt = You MUST copy from above!
# The template above has two {} slots (input and response), so pass exactly two values.
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Explain the concept of inertia with an example", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

Now that we're done with our creative process, it's time to upload.

Uploading our model directly to Huggingface

Make sure you note down the very intricate code below:

model.push_to_hub_merged("os_lora_chat_model", tokenizer, save_method = "merged_16bit", token = "<your-token>")

Phew! That was a huge chunk (sarcastically).

Uploading to Kaggle Models

Uploading as a dataset

The following code shows how to upload the model as a dataset straight from Huggingface, without having to download the files to your own computer and weigh it (and yourself) down.

Start off by generating a Kaggle token and uploading it to your Colab files:

You can do this by clicking on your Profile icon > Settings > API > Create New Token. This will give you a new JSON token file (kaggle.json).
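The downloaded kaggle.json is a small JSON file holding your username and an API key, and the Kaggle command-line tool looks for it in ~/.kaggle with owner-only permissions. As an alternative to uploading the file by hand, a minimal sketch of writing it from Python follows. The helper name write_kaggle_credentials is hypothetical, and the demo writes dummy values into a temporary directory so that no real credentials are touched.

```python
import json
import tempfile
from pathlib import Path


def write_kaggle_credentials(username, key, config_dir):
    """Write kaggle.json into config_dir and restrict it to the owner."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "kaggle.json"
    path.write_text(json.dumps({"username": username, "key": key}))
    # Same effect as `chmod 600`: readable and writable by the owner only.
    path.chmod(0o600)
    return path


# Demo with dummy values in a throwaway directory.
demo_dir = tempfile.mkdtemp()
demo_path = write_kaggle_credentials("example-user", "not-a-real-key", demo_dir)
print(demo_path)

# Real usage would target the home directory instead:
# write_kaggle_credentials("<username>", "<key>", Path.home() / ".kaggle")
```

Keeping the key out of the notebook itself (for example by reading it from a secret store) avoids publishing it by accident when the notebook is shared.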

# Step 1: Install necessary libraries
!pip install -q kaggle transformers
# Step 2: Load and Save the Model Using Optimizations
import json
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Define model name and directory
model_name = "<your huggingface model ID>"
save_directory = "./huggingface_model"
# Use CPU and half-precision to save memory
device = torch.device('cpu')
# Load the model and tokenizer
# AutoModelForCausalLM keeps the language-modeling head; plain AutoModel would drop it on save
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Save the model and tokenizer
model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)
# Step 3: Authenticate with Kaggle: make sure to store your Kaggle token in the Colab files
from google.colab import files
# Upload your kaggle.json file here
files.upload()
# Move it to the correct directory
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
# Step 4: Create and Upload a Dataset on Kaggle
# Create a dataset metadata file
dataset_metadata = {
    "title": "Wonderful LoRA Chat Model",
    # Kaggle expects <username>/<slug>; the slug uses only lowercase letters, numbers, and hyphens
    "id": "<your-kaggle-username>/<dataset-slug>",
    "licenses": [{"name": "CC0-1.0"}]
}
# Create the directory for the metadata file
os.makedirs(save_directory, exist_ok=True)
# Write the metadata to a JSON file
with open(os.path.join(save_directory, "dataset-metadata.json"), "w") as f:
    json.dump(dataset_metadata, f, indent=4)
# Upload the dataset to Kaggle
!kaggle datasets create -p ./huggingface_model
# Step 5: Verify the Upload
# List your datasets to verify (no '-u' flag is needed here)
!kaggle datasets list --mine
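A malformed id in dataset-metadata.json is an easy way to make the create step fail, so it can help to build the metadata through a small function that checks the slug first. Kaggle dataset ids take the form username/slug. The sketch below is only an illustration: both helper names are hypothetical, the slug rule is the one quoted in the comment above (lowercase letters, numbers, and hyphens), and any further limits Kaggle enforces are not checked.

```python
import json
import re
import tempfile
from pathlib import Path

SLUG_PATTERN = re.compile(r"^[a-z0-9-]+$")


def build_dataset_metadata(owner, slug, title, license_name="CC0-1.0"):
    """Return the dict that goes into dataset-metadata.json."""
    if not SLUG_PATTERN.match(slug):
        raise ValueError(
            f"slug {slug!r} may only contain lowercase letters, numbers, and hyphens"
        )
    return {
        "title": title,
        "id": f"{owner}/{slug}",
        "licenses": [{"name": license_name}],
    }


def write_dataset_metadata(metadata, folder):
    """Write the metadata next to the model files and return the file path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "dataset-metadata.json"
    path.write_text(json.dumps(metadata, indent=4))
    return path


# Demo with made-up names, written to a temporary folder.
metadata = build_dataset_metadata(
    "example-user", "lora-chat-model", "Wonderful LoRA Chat Model"
)
metadata_path = write_dataset_metadata(metadata, tempfile.mkdtemp())
print(metadata_path.read_text())
```

In the real workflow the folder would be the same save_directory that holds the saved model, so that kaggle datasets create -p picks up both the files and the metadata.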

VOILA! You just published your dataset!

Uploading as a Model

We can use KaggleHub for this operation, starting off by initializing the Kaggle credentials.

Then follow the instructions provided in the KaggleHub getting-started guide. Feel free to change the framework to suit your needs and customize it according to your requirements.
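The exact cells used for this step are not reproduced in this post, so the following is only a sketch of what a KaggleHub upload can look like. It relies on kagglehub.login() and kagglehub.model_upload(), which take a handle of the form owner/model-slug/framework/variation-slug and a local folder. The owner, slugs, framework value, and license name below are placeholders and assumptions; check which values Kaggle accepts for your model.

```python
def build_model_handle(owner, model_slug, framework, variation_slug):
    """Join the four parts of a Kaggle model handle and reject empty ones."""
    parts = [owner, model_slug, framework, variation_slug]
    if any(not part or "/" in part for part in parts):
        raise ValueError(f"every handle part must be non-empty and contain no '/': {parts}")
    return "/".join(parts)


def upload_to_kaggle_models(handle, local_model_dir, license_name="Apache 2.0"):
    """Upload a local model folder as a new Kaggle model version."""
    # Imported lazily so the handle helper works without kagglehub installed.
    import kagglehub

    kagglehub.login()  # prompts for the Kaggle username and API key
    kagglehub.model_upload(
        handle,
        local_model_dir,
        license_name=license_name,
        version_notes="Initial upload",
    )


# Placeholder values: replace them with your own account and model names.
handle = build_model_handle("example-user", "os-lora-chat-model", "transformers", "default")
print(handle)

# The upload needs network access and real credentials, so it is left commented out:
# upload_to_kaggle_models(handle, "./huggingface_model")
```

Uploading again with the same handle creates a new version of that variation, which is what gives the Models section its versioning.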

Final output on successfully publishing the model

It might sound daunting at first but hey, that was simple, right? Go ahead, pick your poison and get going!

https://linktr.ee/imranfosec


Kaggle Model Upload Made Easy. How to upload your models on Kaggle… | by Imran Roshan | Sep, 2024
