1. PAC-tuning: Fine-tuning Pretrained Language Models with PAC-driven Perturbed Gradient Descent
Authors: Guangliang Liu, Zhiyu Xue, Xitong Zhang, Kristen Marie Johnson, Rongrong Wang
Abstract: Fine-tuning pretrained language models (PLMs) for downstream tasks is a large-scale optimization problem, in which the choice of the training algorithm critically determines how well the trained model can generalize to unseen test data, especially in the context of few-shot learning. To achieve good generalization performance and avoid overfitting, techniques such as data augmentation and pruning are often applied. However, adding these regularizations necessitates heavy tuning of the hyperparameters of optimization algorithms, such as the popular Adam optimizer. In this paper, we propose a two-stage fine-tuning method, PAC-tuning, to address this optimization challenge. First, based on PAC-Bayes training, PAC-tuning directly minimizes the PAC-Bayes generalization bound to learn a proper parameter distribution. Second, PAC-tuning modifies the gradient by injecting noise with the variance learned in the first stage into the model parameters during training, resulting in a variant of perturbed gradient descent (PGD). In the past, the few-shot scenario posed difficulties for PAC-Bayes training because the PAC-Bayes bound, when applied to large models with limited training data, might not be tight. Our experimental results across five GLUE benchmark tasks demonstrate that PAC-tuning successfully handles the challenges of fine-tuning and outperforms strong baseline methods by a visible margin, further confirming the potential to apply PAC training to other settings where the Adam optimizer is currently used for training.
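The second stage described above, injecting learned noise into the parameters before computing the gradient, can be illustrated with a minimal perturbed-gradient-descent sketch. This is not the authors' implementation: the per-parameter noise scale `sigma` (which PAC-tuning would learn in stage one by minimizing the PAC-Bayes bound) is assumed fixed here, and the demo objective is a toy quadratic.

```python
import numpy as np

def pgd_step(w, sigma, grad_fn, lr=0.1, rng=None):
    """One perturbed-gradient-descent step.

    Evaluates the gradient at a Gaussian-perturbed copy of the
    parameters, then applies a plain SGD update to the clean weights.
    `sigma` holds per-parameter noise standard deviations; in PAC-tuning
    these would come from stage one, but here they are fixed inputs.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    noise = rng.normal(0.0, sigma, size=w.shape)
    g = grad_fn(w + noise)   # gradient at the perturbed point
    return w - lr * g        # update the unperturbed parameters

# Toy demo: minimize f(w) = ||w||^2, whose gradient is 2w.
rng = np.random.default_rng(0)
w = np.array([2.0, -3.0])
sigma = np.full_like(w, 0.01)  # small fixed noise scale (assumption)
for _ in range(100):
    w = pgd_step(w, sigma, lambda v: 2.0 * v, rng=rng)
```

Because the noise scale is small relative to the learning-rate contraction, the iterates still converge close to the minimizer; larger `sigma` trades optimization precision for the flat-minima bias that motivates PGD.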