The choice to use Fourier KAN layers was motivated by their ability to capture complex, non-linear relationships in data. These layers apply a learnable Fourier series to the input:
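Written out, with notation assumed here to match the layer code further down ($x_i$ is the $i$-th input feature, $G$ is `gridsize`, $a_{jik}$ and $c_{jik}$ are the learnable cosine and sine coefficients, and $b_j$ is the optional bias), each output $y_j$ can be expressed as:

```latex
y_j = b_j + \sum_{i=1}^{d_{\text{in}}} \sum_{k=1}^{G}
      \Big( a_{jik} \cos(k\, x_i) + c_{jik} \sin(k\, x_i) \Big),
\qquad j = 1, \dots, d_{\text{out}}
```

The symbol names are illustrative; in the code, both coefficient sets live in a single `fouriercoeffs` tensor.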
Think of it like this: imagine trying to approximate a complex wave pattern. Instead of using simple curves, we can combine various sine and cosine waves of different frequencies. That is essentially what a Fourier series does: it decomposes complex patterns into simpler, oscillating components.
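To make the idea concrete, here is a minimal sketch in plain Python (not part of the model code) that approximates a square wave using its classic Fourier series of odd sine harmonics. The function names are made up for this illustration. The average error shrinks as more frequencies are added.

```python
import math


def square_wave(t: float) -> float:
    """Target signal: +1 on the first half of each period, -1 on the second."""
    return 1.0 if math.sin(t) >= 0 else -1.0


def fourier_partial_sum(t: float, n_terms: int) -> float:
    """Sum the first n_terms odd sine harmonics of the square wave's Fourier series."""
    total = 0.0
    for m in range(n_terms):
        k = 2 * m + 1  # only odd frequencies contribute for a square wave
        total += math.sin(k * t) / k
    return 4.0 / math.pi * total


def mean_abs_error(n_terms: int, n_points: int = 1000) -> float:
    """Average absolute gap between the partial sum and the square wave over one period."""
    err = 0.0
    for j in range(n_points):
        t = 2.0 * math.pi * (j + 0.5) / n_points  # sample strictly inside the period
        err += abs(fourier_partial_sum(t, n_terms) - square_wave(t))
    return err / n_points


for n in (1, 5, 50):
    print(n, round(mean_abs_error(n), 4))
```

A Fourier KAN layer does something similar, except that the weights on each sine and cosine are learned from data instead of being fixed by a formula.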
Here's how the NaiveFourierKANLayer leverages this concept:
- Learnable Fourier Coefficients: The layer learns a set of weights (called "Fourier coefficients") for a range of frequencies, represented by sine and cosine functions. These coefficients determine how much each frequency contributes to representing the input data's underlying patterns.
- Creating a Frequency Grid: The code generates a grid of integer frequencies from 1 to gridsize (using torch.arange), which sets the resolution at which the input data is analyzed.
- Projecting to Fourier Space: The input data is effectively projected into a "Fourier space" using sine and cosine functions applied to the frequency grid. This transformation highlights patterns at different frequencies.
- Applying Fourier Coefficients: The learned Fourier coefficients are multiplied with the transformed data. This step weights the importance of each frequency component.
- Combining and Outputting: The contributions from different frequencies are summed, effectively reconstructing the input data with enhanced non-linear features. The output of the layer captures these complex relationships.
In simpler terms, this layer acts like a sophisticated filter bank that learns to pick out important patterns in your data at various frequencies, allowing the network to model highly non-linear relationships. This can be particularly useful for tasks where the data exhibits complex periodicity or intricate patterns.
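The steps above can be sketched for a single sample in plain Python. This is a loop-based illustration of the same arithmetic, not the vectorized PyTorch layer itself. The function name and the nested-list layout of the coefficients (`[out][in][freq]`) are assumptions made for readability.

```python
import math
from typing import List


def fourier_kan_forward(
    x: List[float],
    cos_coeffs: List[List[List[float]]],  # indexed as [out][in][freq]
    sin_coeffs: List[List[List[float]]],  # indexed as [out][in][freq]
    bias: List[float],
) -> List[float]:
    """Compute y_j = bias_j + sum_i sum_k (a_jik * cos(k x_i) + c_jik * sin(k x_i))."""
    out_dim = len(cos_coeffs)
    grid_size = len(cos_coeffs[0][0])
    y = []
    for j in range(out_dim):
        total = bias[j]
        for i, x_i in enumerate(x):
            for k in range(1, grid_size + 1):  # frequency grid 1..grid_size
                total += cos_coeffs[j][i][k - 1] * math.cos(k * x_i)
                total += sin_coeffs[j][i][k - 1] * math.sin(k * x_i)
        y.append(total)
    return y


# Tiny example: 2 inputs, 1 output, 2 frequencies.
x = [0.0, math.pi / 2]
cos_c = [[[1.0, 0.0], [0.0, 0.0]]]  # output 0 uses cos(1 * x_0)
sin_c = [[[0.0, 0.0], [1.0, 0.0]]]  # ... and sin(1 * x_1)
print(fourier_kan_forward(x, cos_c, sin_c, bias=[0.5]))  # prints [2.5]
```

Here the output is 0.5 + cos(0) + sin(π/2) = 2.5. In training, the entries of `cos_c` and `sin_c` are what gradient descent adjusts.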
This code defines a NaiveFourierKANLayer, a neural network layer that leverages the power of Fourier series to learn complex relationships within data. Big shoutout to the original author for this awesome implementation (which I tweaked for my purposes, of course 🤖):
import torch
import torch.nn as nn


class NaiveFourierKANLayer(nn.Module):
    """
    Naive Fourier Kolmogorov-Arnold Network layer.

    This layer expands the input into Fourier features (sines and cosines) and uses
    learnable coefficients to create a complex, non-linear transformation.
    """

    def __init__(self, inputdim: int, outdim: int, gridsize: int = 300, addbias: bool = True):
        """
        Initialize the NaiveFourierKANLayer.

        Args:
            inputdim (int): Dimension of the input features.
            outdim (int): Dimension of the output features.
            gridsize (int): Size of the Fourier grid (number of frequencies). Defaults to 300.
            addbias (bool): Whether to add a bias term. Defaults to True.
        """
        super(NaiveFourierKANLayer, self).__init__()
        self.gridsize = gridsize
        self.addbias = addbias
        self.inputdim = inputdim
        self.outdim = outdim
        # Initialize Fourier coefficients; index 0 holds cosine weights, index 1 sine weights
        self.fouriercoeffs = nn.Parameter(
            torch.randn(2, outdim, inputdim, gridsize) /
            (torch.sqrt(torch.tensor(inputdim).float()) * torch.sqrt(torch.tensor(self.gridsize).float()))
        )
        if self.addbias:
            self.bias = nn.Parameter(torch.zeros(1, outdim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass of the NaiveFourierKANLayer.

        Args:
            x (torch.Tensor): Input tensor of shape (batch_size, inputdim).

        Returns:
            torch.Tensor: Output tensor of shape (batch_size, outdim).
        """
        xshp = x.shape
        outshape = xshp[0:-1] + (self.outdim,)
        x = x.reshape(-1, self.inputdim)  # flatten leading dims; also works for non-contiguous input
        # Create frequency grid: integer frequencies 1..gridsize
        k = torch.arange(1, self.gridsize + 1, device=x.device).view(1, 1, 1, self.gridsize)
        xrshp = x.view(x.shape[0], 1, x.shape[1], 1)
        # Compute Fourier features
        c = torch.cos(k * xrshp)
        s = torch.sin(k * xrshp)
        c = c.view(1, x.shape[0], x.shape[1], self.gridsize)
        s = s.view(1, x.shape[0], x.shape[1], self.gridsize)
        # Apply Fourier coefficients: sum over cos/sin (d), input feature (i) and frequency (k)
        y = torch.einsum("dbik,djik->bj", torch.cat([c, s], dim=0), self.fouriercoeffs)
        if self.addbias:
            y += self.bias
        return y.view(outshape)
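One detail worth a second look is the initialization scale, 1 / (sqrt(inputdim) * sqrt(gridsize)). Because cos² + sin² = 1 for every frequency, summing inputdim × gridsize cosine/sine pairs with that scale gives each output unit variance at initialization, whatever the input values are. The snippet below is a quick sanity check of that claim in plain Python. It mimics the layer's arithmetic for one output rather than calling the PyTorch code, and the helper names are made up for this sketch.

```python
import math
import random
from typing import List


def sample_output(x: List[float], grid_size: int, rng: random.Random) -> float:
    """Draw fresh random coefficients (scaled like the layer's init) and compute one output."""
    input_dim = len(x)
    scale = 1.0 / (math.sqrt(input_dim) * math.sqrt(grid_size))
    y = 0.0
    for x_i in x:
        for k in range(1, grid_size + 1):
            a = rng.gauss(0.0, 1.0) * scale  # cosine coefficient
            b = rng.gauss(0.0, 1.0) * scale  # sine coefficient
            y += a * math.cos(k * x_i) + b * math.sin(k * x_i)
    return y


def output_variance(x: List[float], grid_size: int, n_trials: int = 2000, seed: int = 0) -> float:
    """Empirical variance of the output over many random initializations."""
    rng = random.Random(seed)
    samples = [sample_output(x, grid_size, rng) for _ in range(n_trials)]
    mean = sum(samples) / n_trials
    return sum((s - mean) ** 2 for s in samples) / n_trials


# Both should be close to 1.0 even though the inputs and sizes differ.
print(output_variance([0.3, -1.2, 2.0, 0.7], grid_size=8))
print(output_variance([5.0, 0.0], grid_size=16))
```

This keeps activations at a sensible scale at the start of training even with a large gridsize such as the default of 300.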
These layers allow the model to capture periodic patterns and complex interactions that might be present in molecular binding data.
The model is trained using PyTorch Lightning, which simplifies the training loop and allows for easy integration of advanced features like early stopping and learning rate scheduling.
In addition to the Lightning model above, we need to be sure it has the right metrics and optimizers:
from typing import Any, Dict

# Inside the LightningModule class:

    def training_step(self, batch: Batch, batch_idx: int) -> torch.Tensor:
        """
        Perform a single training step.

        Args:
            batch (Batch): The input batch of data.
            batch_idx (int): The index of the current batch.

        Returns:
            torch.Tensor: The computed loss for the batch.
        """
        y_hat = self(batch)
        y = batch.y.to(y_hat.device).view(-1, 1)  # Reshape y to match y_hat
        loss = self.criterion(y_hat, y)
        self.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True)
        self.train_accuracy(y_hat.sigmoid(), y)
        self.log('train_acc', self.train_accuracy, on_step=True, on_epoch=True, prog_bar=True)
        return loss

    # ... Repeat for val and test steps

    def configure_optimizers(self) -> Dict[str, Any]:
        """
        Configure optimizers and learning rate schedulers.

        Returns:
            Dict[str, Any]: A dictionary containing the optimizer and learning rate scheduler configuration.
        """
        optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10,
                                                               verbose=True)
        return {
            "optimizer": optimizer,
            "lr_scheduler": scheduler,
            "monitor": "val_loss"
        }
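For intuition about what the scheduler above does with factor=0.1 and patience=10, here is a simplified stand-in written in plain Python. It is only a sketch of the core reduce-on-plateau rule for mode='min'. The class name is hypothetical, and it deliberately ignores the threshold, cooldown and minimum learning rate options that the real PyTorch scheduler supports.

```python
class SimplePlateauScheduler:
    """Simplified reduce-on-plateau logic (mode='min'); no threshold, cooldown or min_lr."""

    def __init__(self, lr: float, factor: float = 0.1, patience: int = 10):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> float:
        """Record one validation loss and return the (possibly reduced) learning rate."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Reduce only once the number of non-improving epochs exceeds the patience.
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr


sched = SimplePlateauScheduler(lr=1e-3, factor=0.1, patience=10)
losses = [1.0, 0.9] + [0.9] * 11  # two improvements, then 11 flat epochs
for epoch, loss in enumerate(losses, start=1):
    print(epoch, sched.step(loss))
```

With these numbers the learning rate stays at 1e-3 through ten flat epochs and drops by a factor of ten on the eleventh.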
Finally, we define our Lightning data module to ensure everything is compatible:
from typing import Optional


class BELKADataModule(pl.LightningDataModule):
    def __init__(self, train_file: str, test_file: str, batch_size: int = 32, num_workers: int = 4):
        super().__init__()
        self.train_file = train_file
        self.test_file = test_file
        self.batch_size = batch_size
        self.num_workers = num_workers

    def setup(self, stage: Optional[str] = None):
        if stage == 'fit' or stage is None:
            self.train_dataset = StreamingMoleculeDataset(self.train_file)
            self.val_dataset = StreamingMoleculeDataset(self.train_file)  # Use a separate iterator for validation
        if stage == 'test' or stage is None:
            self.test_dataset = StreamingMoleculeDataset(self.test_file, target_col=None)

    def train_dataloader(self) -> DataLoader:
        return DataLoader(self.train_dataset, batch_size=self.batch_size, num_workers=self.num_workers,
                          persistent_workers=True)

    def val_dataloader(self) -> DataLoader:
        return DataLoader(self.val_dataset, batch_size=self.batch_size, num_workers=self.num_workers,
                          persistent_workers=True)

    def test_dataloader(self) -> DataLoader:
        return DataLoader(self.test_dataset, batch_size=self.batch_size, num_workers=self.num_workers,
                          persistent_workers=True)
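Note that in setup both the training and validation datasets read the same train_file. Whether StreamingMoleculeDataset holds out rows internally is not shown here. If it does not, one option (offered as an assumption, not as part of the original pipeline) is to assign each row to a split by hashing a stable row identifier, which works for streaming data because no shuffling or second file is needed. The function name and the use of a string row id are hypothetical.

```python
import hashlib


def is_validation_row(row_id: str, val_percent: int = 10) -> bool:
    """Deterministically send roughly val_percent% of rows to the validation split."""
    digest = hashlib.md5(row_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < val_percent


row_ids = [str(i) for i in range(10_000)]
val_ids = [r for r in row_ids if is_validation_row(r)]
train_ids = [r for r in row_ids if not is_validation_row(r)]
print(len(train_ids), len(val_ids))
```

A training iterator would then skip rows where the function returns True and a validation iterator would keep only those rows, so the two streams never overlap.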
…and set up our main training loop.
def main():
    pl.seed_everything(42)
    train_file = "kaggle/input/leash-BELKA/train.csv"
    test_file = "kaggle/input/leash-BELKA/test.parquet"
    data_module = BELKADataModule(train_file, test_file, batch_size=128, num_workers=4)
    data_module.setup()

    # Get a sample batch to determine input dimensions
    sample_batch = next(iter(data_module.train_dataloader()))
    in_feat = sample_batch.x.size(-1)
    num_proteins = len(data_module.train_dataset.protein_encoder.categories_[0])
    global_feat = sample_batch.global_features.size(-1)  # This should now be 1001
    model = BELKAModule(
        in_feat=in_feat,
        hidden_feat=256,
        out_feat=1,
        num_layers=2,
        num_proteins=num_proteins,
        protein_embedding_dim=32,
        global_feat=global_feat,
        learning_rate=1e-3,
        grid_feat=200
    )
    logger.info("🏗️ Model architecture:")
    logger.info(model)
Personally, I use MLflow; it's free and dead simple to use, so I set up its logger integration, along with early stopping, checkpointing and a learning rate monitor, all of which are advanced training techniques. I can't stress this enough: everyone should be using early stopping.
    callbacks = [
        EarlyStopping(monitor='val_ap', patience=10, mode='max'),
        ModelCheckpoint(monitor='val_ap', save_top_k=3, mode='max', filename='belka-{epoch:02d}-{val_ap:.4f}'),
        LearningRateMonitor(logging_interval='step')
    ]
    mlf_logger = MLFlowLogger(experiment_name="BELKA_GKAN", tracking_uri="mlruns")
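If early stopping is new to you, the rule behind the callback configured with patience=10 and mode='max' is simple: keep track of the best score so far and stop once it has failed to improve for ten validation checks in a row. The sketch below captures that rule in plain Python. It is a simplified stand-in, not Lightning's implementation, and the names are hypothetical.

```python
from typing import List, Optional


class SimpleEarlyStopping:
    """Simplified early stopping for a metric where higher is better (mode='max')."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("-inf")
        self.wait = 0

    def update(self, score: float) -> bool:
        """Record one validation score; return True when training should stop."""
        if score > self.best:
            self.best = score
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


def epochs_until_stop(scores: List[float], patience: int = 10) -> Optional[int]:
    """Return the 1-based validation check at which stopping triggers, or None."""
    stopper = SimpleEarlyStopping(patience=patience)
    for epoch, score in enumerate(scores, start=1):
        if stopper.update(score):
            return epoch
    return None


# Three improving checks followed by ten flat ones: stops at check 13.
print(epochs_until_stop([0.1, 0.2, 0.3] + [0.3] * 10))  # prints 13
```

Pairing this with checkpointing on the same metric means the best weights are already saved by the time training stops.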
If you're using multiple GPUs, you can set the strategy to ddp, or ddp_notebook if you're in a Colab environment.
    trainer = pl.Trainer(
        max_epochs=100,
        callbacks=callbacks,
        logger=mlf_logger,
        accelerator='gpu' if torch.cuda.is_available() else 'cpu',
        devices=1,  # number of GPUs
        precision="16-mixed",
        gradient_clip_val=0.5,
        accumulate_grad_batches=4,
        log_every_n_steps=50,
        num_sanity_val_steps=0,
        # strategy="ddp"
    )
    logger.info("🚂 Starting model training")
    trainer.fit(model, data_module)
    logger.info("🔮 Making predictions on test set")
    predictions = trainer.predict(model, data_module.test_dataloader())
    predictions = torch.cat([pred for batch in predictions for pred in batch]).cpu().numpy()
    sample_submission = pd.read_csv("kaggle/input/leash-BELKA/sample_submission.csv")
    sample_submission['binds'] = predictions
    sample_submission.to_csv('submission.csv', index=False)
    logger.info("💾 Submission saved to submission.csv")


if __name__ == "__main__":
    main()
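Two of the Trainer arguments above are easy to misread, so here is a small plain-Python illustration. With batch_size=128 and accumulate_grad_batches=4, the optimizer steps once per 4 batches, for an effective batch size of 512 per device. gradient_clip_val=0.5 rescales the gradients whenever their overall norm exceeds 0.5. The helper names are made up, and the clipping function is a simplified version of norm-based clipping, not the exact library routine.

```python
import math
from typing import List


def effective_batch_size(batch_size: int, accumulate_grad_batches: int, devices: int = 1) -> int:
    """Number of samples contributing to each optimizer step."""
    return batch_size * accumulate_grad_batches * devices


def clip_by_global_norm(grads: List[float], max_norm: float) -> List[float]:
    """Scale the gradient vector down so its L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)  # already small enough, leave unchanged
    scale = max_norm / total_norm
    return [g * scale for g in grads]


print(effective_batch_size(128, 4))  # prints 512
clipped = clip_by_global_norm([3.0, 4.0], max_norm=0.5)  # original norm is 5.0
print(clipped, math.sqrt(sum(g * g for g in clipped)))
```

Clipping preserves the direction of the gradient and only shortens it, which helps keep mixed-precision training stable when a batch produces an unusually large update.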