Passing nn.Parameter values within the model architecture as deep copies. #34643

@James6Chou

Description

System Info

python==3.9
transformers==4.41.2
linux

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I'm trying to share parameters inside a model. For example, I define a new attribute in the model's self_attn layer and assign it one of that layer's existing weights (an nn.Parameter), like this:

self.test = self.o_proj.weight
However, once the model has been loaded, the two attributes no longer refer to the same object: self.test is self.o_proj.weight returns False, as if the parameter had been deep-copied. I expected a shallow copy, with both names pointing to the same nn.Parameter. If I assign a module such as nn.Linear instead of an nn.Parameter, the attribute is shared as expected (a shallow copy).
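For reference, here is a minimal sketch in plain PyTorch, without transformers, of what the assignment does on its own and of one way the sharing could be lost afterwards. TinyAttention is a hypothetical stand-in for the real attention class. The replacement step is only an assumption about what a loader might do; nothing in this report confirms that this is what happens during loading.

```python
import torch
import torch.nn as nn


class TinyAttention(nn.Module):
    """Hypothetical stand-in for a self-attention block (not the transformers class)."""

    def __init__(self, hidden_size: int = 4):
        super().__init__()
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        # Plain attribute assignment: nn.Module.__setattr__ registers the same
        # Parameter object under a second name; no data is copied.
        self.test = self.o_proj.weight


attn = TinyAttention()
shared_before = attn.test is attn.o_proj.weight
same_storage = attn.test.data_ptr() == attn.o_proj.weight.data_ptr()
# Both names show up as separate keys in the state dict.
state_keys = sorted(attn.state_dict().keys())

# Assumption: a loader installs a brand-new Parameter object under one name
# instead of copying values into the existing Parameter in place.
attn.o_proj.weight = nn.Parameter(torch.zeros(4, 4))
shared_after = attn.test is attn.o_proj.weight

print(shared_before, same_storage, state_keys, shared_after)
```

In this sketch the alias is an ordinary reference until one of the two names is rebound to a new Parameter. After that, the identity check fails even though nothing was explicitly deep-copied.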

Interestingly, if I perform the same assignment after the model has been loaded:
model.model.layers[1].self_attn.test = model.model.layers[1].self_attn.k_proj.weight
then the two attributes do refer to the same object (a shallow copy).
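Since assigning after loading does produce a shared object, one possible workaround is to re-apply the assignment to every layer once loading has finished. The sketch below uses a toy model. The helper name retie_test_alias and the TinyAttention class are hypothetical, and the broken state is simulated by hand rather than produced by from_pretrained.

```python
import torch
import torch.nn as nn


class TinyAttention(nn.Module):
    """Hypothetical stand-in for a self-attention block."""

    def __init__(self, hidden_size: int = 4):
        super().__init__()
        self.k_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.test = self.k_proj.weight


def retie_test_alias(model: nn.Module) -> int:
    """Point every `test` attribute back at its sibling `k_proj.weight`.

    Returns the number of modules that were re-tied.
    """
    count = 0
    for module in model.modules():
        if hasattr(module, "k_proj") and hasattr(module, "test"):
            module.test = module.k_proj.weight
            count += 1
    return count


model = nn.Sequential(TinyAttention(), TinyAttention())

# Simulate the state described above: the weight is now a different object
# from the alias (done by hand here, not by a real checkpoint load).
for layer in model:
    layer.k_proj.weight = nn.Parameter(torch.zeros(4, 4))
broken = [layer.test is layer.k_proj.weight for layer in model]

retied = retie_test_alias(model)
fixed = [layer.test is layer.k_proj.weight for layer in model]
print(broken, retied, fixed)
```

With a real model, the helper would be called once, right after loading and before any training step that relies on the shared parameter.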

Additionally, if a custom model architecture is defined:

import torch
import torch.nn as nn


class MyModule(nn.Linear):
    def __init__(self, in_features, r):
        super().__init__(in_features, in_features)
        meta_device = torch.device('meta')
        # Replace the default weight with one allocated on the meta device.
        self.weight = nn.Parameter(torch.randn(in_features, in_features, device=meta_device))
        self.lora_A = nn.Parameter(self.weight.new_zeros((r, in_features), device=meta_device))
        # Second attribute name for the same Parameter object.
        self.test = self.lora_A
        print(self.test is self.lora_A)

    def forward(self, x):
        return x


model = MyModule(in_features=10, r=5)

At this point, the output is True.
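To check sharing across a whole model rather than with one is comparison at a time, the names that resolve to the same object can be grouped. This is a small diagnostic sketch; find_shared_parameters and ToyModule are hypothetical names. It relies on state_dict(keep_vars=True), which returns the registered Parameter objects themselves rather than detached tensors.

```python
from collections import defaultdict

import torch
import torch.nn as nn


def find_shared_parameters(model: nn.Module):
    """Return groups of state-dict names that refer to the same tensor object."""
    groups = defaultdict(list)
    # keep_vars=True keeps the original Parameter objects, so id() is meaningful.
    for name, tensor in model.state_dict(keep_vars=True).items():
        groups[id(tensor)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


class ToyModule(nn.Linear):
    """Hypothetical module mirroring the custom example, on CPU for simplicity."""

    def __init__(self, in_features: int, r: int):
        super().__init__(in_features, in_features)
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.test = self.lora_A  # alias of lora_A


shared = find_shared_parameters(ToyModule(in_features=10, r=5))
print(shared)
```

Running the same helper on a model right after loading would show whether the alias survived: if it did not, the group containing test is absent from the result.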

Why can't an attribute defined inside the model keep referring to the same nn.Parameter object (the same memory address) once the model has been loaded? Is this a bug or a feature?

Expected behavior

Please address my question by explaining the reason for this phenomenon and clarifying whether it is a bug or a feature.
