System Info
python==3.9
transformers==4.41.2
linux
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I'm performing some parameter-sharing operations. For example, I define a new attribute in the model's self_attn layer and assign it one of the existing weights of self_attn (an nn.Parameter), like this:

```python
self.test = self.o_proj.weight
```

However, I noticed that after the model is loaded, this effectively behaves like a deep copy: `self.test is self.o_proj.weight` returns False, whereas I expected a shallow copy (the same object). When I instead assign a module such as an nn.Linear rather than an nn.Parameter, the assignment does remain a shallow copy.
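For reference, here is a minimal stand-alone sketch of the pattern outside transformers (MyAttention and its attribute names are made up for illustration, not the real transformers class). With plain PyTorch, `nn.Module.load_state_dict` copies values in place, so the shared identity survives loading:

```python
import torch
import torch.nn as nn

class MyAttention(nn.Module):
    """Toy stand-in for a self_attn layer, only for illustration."""
    def __init__(self, hidden_size):
        super().__init__()
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        # Register the same nn.Parameter object under a second attribute name.
        self.test = self.o_proj.weight

    def forward(self, x):
        return self.o_proj(x)

m = MyAttention(4)
print(m.test is m.o_proj.weight)  # True right after __init__

# Vanilla load_state_dict copies tensor values in place, so the identity is kept.
state = {k: v.clone() for k, v in m.state_dict().items()}
m.load_state_dict(state)
print(m.test is m.o_proj.weight)  # still True with plain PyTorch loading
```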
Interestingly, if the same assignment is performed after the model has been loaded:

```python
model.model.layers[1].self_attn.test = model.model.layers[1].self_attn.k_proj.weight
```

then it is a shallow copy (the identity check returns True).
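A quick way to check this case (the checkpoint name below is only an example of a Llama-style model, not necessarily the one I used):

```python
from transformers import AutoModelForCausalLM

# Example checkpoint only; any model exposing model.model.layers[*].self_attn should do.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

attn = model.model.layers[1].self_attn
attn.test = attn.k_proj.weight
print(attn.test is attn.k_proj.weight)  # True when the assignment happens after loading
```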
Additionally, if a custom model architecture is defined:
```python
import torch
import torch.nn as nn

class MyModule(nn.Linear):
    def __init__(self, in_features, r):
        nn.Linear.__init__(self, in_features, in_features)
        meta_device = torch.device('meta')
        self.weight = nn.parameter.Parameter(torch.randn(in_features, in_features, device=meta_device))
        self.lora_A = nn.parameter.Parameter(self.weight.new_zeros((r, in_features), device=meta_device))
        # Alias the lora_A parameter under a second attribute name.
        self.test = self.lora_A
        print(self.test is self.lora_A)

    def forward(self, x):
        return x

model = MyModule(in_features=10, r=5)
```
At this point, the output is True.
Why is it not possible to assign an nn.Parameter to another attribute of the model (keeping the same object, i.e. the same memory address) when the assignment is made inside the model definition? Is this a bug or a feature?
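For what it's worth, here is a plain-PyTorch illustration (not necessarily what from_pretrained does internally) of how such an alias can break: an in-place copy keeps the same Parameter object, while replacing it does not:

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 4, bias=False)
alias = lin.weight
print(alias is lin.weight)  # True

# In-place copy (what nn.Module.load_state_dict does) keeps the same Parameter object.
with torch.no_grad():
    lin.weight.copy_(torch.randn(4, 4))
print(alias is lin.weight)  # still True

# Replacing the Parameter object breaks the aliasing.
lin.weight = nn.Parameter(torch.randn(4, 4))
print(alias is lin.weight)  # False
```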
Expected behavior
Please explain the reason for this behavior and clarify whether it is a bug or a feature.