System Info
python==3.9
transformers==4.41.2
linux
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I'm performing some parameter-sharing operations. For example, I define a new attribute in the model's self_attn layer and assign it one of the existing weights of self_attn (an nn.Parameter), like this:

```python
self.test = self.o_proj.weight
```

However, I noticed that after the model is loaded, this effectively behaves like a deep copy: `self.test is self.o_proj.weight` returns False, whereas I expected a shallow copy (the same object). When I instead assign a module such as an nn.Linear rather than an nn.Parameter, the assignment does remain a shallow copy.
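For reference, here is a minimal stand-alone sketch of the pattern outside transformers (MyAttention and its attribute names are made up for illustration, not the real transformers class). With plain PyTorch, `nn.Module.load_state_dict` copies values in place, so the shared identity survives loading:

```python
import torch
import torch.nn as nn

class MyAttention(nn.Module):
    """Toy stand-in for a self_attn layer, only for illustration."""
    def __init__(self, hidden_size):
        super().__init__()
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        # Register the same nn.Parameter object under a second attribute name.
        self.test = self.o_proj.weight

    def forward(self, x):
        return self.o_proj(x)

m = MyAttention(4)
print(m.test is m.o_proj.weight)  # True right after __init__

# Vanilla load_state_dict copies tensor values in place, so the identity is kept.
state = {k: v.clone() for k, v in m.state_dict().items()}
m.load_state_dict(state)
print(m.test is m.o_proj.weight)  # still True with plain PyTorch loading
```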
Interestingly, if the same assignment is performed after the model has been loaded:

```python
model.model.layers[1].self_attn.test = model.model.layers[1].self_attn.k_proj.weight
```

then it is a shallow copy (the identity check returns True).
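A quick way to check this case (the checkpoint name below is only an example of a Llama-style model, not necessarily the one I used):

```python
from transformers import AutoModelForCausalLM

# Example checkpoint only; any model exposing model.model.layers[*].self_attn should do.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

attn = model.model.layers[1].self_attn
attn.test = attn.k_proj.weight
print(attn.test is attn.k_proj.weight)  # True when the assignment happens after loading
```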
Additionally, if a custom model architecture is defined:
```python
import torch
import torch.nn as nn

class MyModule(nn.Linear):
    def __init__(self, in_features, r):
        nn.Linear.__init__(self, in_features, in_features)
        meta_device = torch.device('meta')
        self.weight = nn.parameter.Parameter(torch.randn(in_features, in_features, device=meta_device))
        self.lora_A = nn.parameter.Parameter(self.weight.new_zeros((r, in_features), device=meta_device))
        # Alias the lora_A parameter under a second attribute name.
        self.test = self.lora_A
        print(self.test is self.lora_A)

    def forward(self, x):
        return x

model = MyModule(in_features=10, r=5)
```
At this point, the output is True.
Why is it not possible to assign an nn.Parameter to another attribute of the model (keeping the same object, i.e. the same memory address) when the assignment is made inside the model definition? Is this a bug or a feature?
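For what it's worth, here is a plain-PyTorch illustration (not necessarily what from_pretrained does internally) of how such an alias can break: an in-place copy keeps the same Parameter object, while replacing it does not:

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 4, bias=False)
alias = lin.weight
print(alias is lin.weight)  # True

# In-place copy (what nn.Module.load_state_dict does) keeps the same Parameter object.
with torch.no_grad():
    lin.weight.copy_(torch.randn(4, 4))
print(alias is lin.weight)  # still True

# Replacing the Parameter object breaks the aliasing.
lin.weight = nn.Parameter(torch.randn(4, 4))
print(alias is lin.weight)  # False
```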
Expected behavior
Please explain the reason for this behavior and clarify whether it is a bug or a feature.