1. PAC-tuning: Fine-tuning Pretrained Language Models with PAC-driven Perturbed Gradient Descent
Authors: Guangliang Liu, Zhiyu Xue, Xitong Zhang, Kristen Marie Johnson, Rongrong Wang
Abstract: Fine-tuning pretrained language models (PLMs) for downstream tasks is a large-scale optimization problem, in which the choice of the training algorithm critically determines how well the trained model can generalize to unseen test data, especially in the context of few-shot learning. To achieve good generalization performance and avoid overfitting, techniques such as data augmentation and pruning are often applied. However, adding these regularizations necessitates heavy tuning of the hyperparameters of optimization algorithms, such as the popular Adam optimizer. In this paper, we propose a two-stage fine-tuning method, PAC-tuning, to address this optimization challenge. First, based on PAC-Bayes training, PAC-tuning directly minimizes the PAC-Bayes generalization bound to learn a proper parameter distribution. Second, PAC-tuning modifies the gradient by injecting noise with the variance learned in the first stage into the model parameters during training, resulting in a variant of perturbed gradient descent (PGD). In the past, the few-shot scenario posed difficulties for PAC-Bayes training because the PAC-Bayes bound, when applied to large models with limited training data, might not be tight. Our experimental results across five GLUE benchmark tasks demonstrate that PAC-tuning successfully handles the challenges of fine-tuning and outperforms strong baseline methods by a visible margin, further confirming the potential to apply PAC training to other settings where the Adam optimizer is currently used for training.
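The second stage described above, injecting learned noise into the parameters before computing the gradient, can be illustrated with a minimal perturbed-gradient-descent sketch. This is not the authors' implementation: the per-parameter noise scale `sigma` (which PAC-tuning would learn in stage one by minimizing the PAC-Bayes bound) is assumed fixed here, and the demo objective is a toy quadratic.

```python
import numpy as np

def pgd_step(w, sigma, grad_fn, lr=0.1, rng=None):
    """One perturbed-gradient-descent step.

    Evaluates the gradient at a Gaussian-perturbed copy of the
    parameters, then applies a plain SGD update to the clean weights.
    `sigma` holds per-parameter noise standard deviations; in PAC-tuning
    these would come from stage one, but here they are fixed inputs.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    noise = rng.normal(0.0, sigma, size=w.shape)
    g = grad_fn(w + noise)   # gradient at the perturbed point
    return w - lr * g        # update the unperturbed parameters

# Toy demo: minimize f(w) = ||w||^2, whose gradient is 2w.
rng = np.random.default_rng(0)
w = np.array([2.0, -3.0])
sigma = np.full_like(w, 0.01)  # small fixed noise scale (assumption)
for _ in range(100):
    w = pgd_step(w, sigma, lambda v: 2.0 * v, rng=rng)
```

Because the noise scale is small relative to the learning-rate contraction, the iterates still converge close to the minimizer; larger `sigma` trades optimization precision for the flat-minima bias that motivates PGD.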