
When extending embeddings, multivariate distribution isn't correctly estimated even when the calculated sigma matrix is symmetric and positive definite  #35075

Closed
@MayStepanyan

Description


System Info

  • transformers version: 4.37.1
  • Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.26.2
  • Safetensors version: 0.4.5
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

When resizing token embeddings for models such as MobileBert, IBert, etc., resize_token_embeddings calls the underlying transformers.modeling_utils._init_added_embeddings_with_mean, which is supposed to initialize the new embedding weights from the old ones:

  1. calculate the mean vector of the old embedding vectors
  2. calculate a sigma matrix from that vector: vector * vector.T / vector_dim
  3. check whether this matrix is positive definite, i.e. whether it can serve as the covariance matrix of a new distribution
  • if so, sample the new embeddings from the estimated distribution
  • otherwise, just initialize the new embeddings from the mean vector of the previous ones
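
The steps above can be sketched roughly as follows. This is a simplified illustration only, not the actual transformers implementation: the function name is made up, and the sigma matrix here is the plain empirical covariance of the centered old rows, which may differ from the exact formula in transformers.

```python
import torch
from torch.distributions import MultivariateNormal


def init_added_embeddings_with_mean_sketch(old_embeddings: torch.Tensor, num_new: int) -> torch.Tensor:
    """Hypothetical sketch of the mean-based initialization described above."""
    # Step 1: mean vector of the old embedding rows
    mean = old_embeddings.mean(dim=0)
    # Step 2: estimate a sigma matrix from the old embeddings
    # (here: empirical covariance of the centered rows -- an assumption)
    centered = old_embeddings - mean
    sigma = centered.T @ centered / old_embeddings.shape[0]
    # Step 3: positive-definiteness check; eigvalsh is used because sigma is
    # symmetric, so the eigenvalues come back with a real dtype
    eigenvalues = torch.linalg.eigvalsh(sigma)
    if bool((eigenvalues > 0).all()):
        # Sample the new embeddings from the estimated distribution
        return MultivariateNormal(mean, covariance_matrix=sigma).sample((num_new,))
    # Fallback: initialize every new embedding with the mean vector
    return mean.repeat(num_new, 1)
```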

I noticed that the check in step 3 ALWAYS fails, i.e. no matrix is ever considered positive definite.

The problem seems to be in these lines:

    eigenvalues = torch.linalg.eigvals(covariance)
    is_covariance_psd = bool(
        (covariance == covariance.T).all() and not torch.is_complex(eigenvalues) and (eigenvalues > 0).all()
    )

since the eigenvalues calculated with torch.linalg.eigvals always come back with a complex dtype, even when they are all real, so torch.is_complex returns True for them and the check short-circuits to False. Hence the main logic, i.e. constructing a multivariate distribution from the previous embeddings and sampling from it, can never run (at least it never did in my experiments).
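
To make the dtype issue concrete, here is a minimal demonstration (the identity matrix is used only for simplicity; any real matrix behaves the same way):

```python
import torch

# torch.linalg.eigvals returns a complex-dtype tensor even for the identity
# matrix, whose eigenvalues are plainly real.
eigenvalues = torch.linalg.eigvals(torch.eye(2))
print(torch.is_complex(eigenvalues))  # True

# For a symmetric (Hermitian) matrix, torch.linalg.eigvalsh returns its
# eigenvalues with a real dtype instead.
real_eigenvalues = torch.linalg.eigvalsh(torch.eye(2))
print(torch.is_complex(real_eigenvalues))  # False
```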

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Here's an isolated example testing the lines I mentioned above (note that the matrix is symmetric and positive definite, so the check should pass):

import torch

covariance = torch.Tensor([[5, 4], [4, 5]])
eigenvalues = torch.linalg.eigvals(covariance)
is_covariance_psd = bool((covariance == covariance.T).all() and not torch.is_complex(eigenvalues) and (eigenvalues > 0).all())
print(is_covariance_psd)

This outputs False despite the matrix being symmetric and having two positive real eigenvalues, 9 and 1.

Expected behavior

The function should successfully generate a multivariate normal distribution whenever the calculated sigma is positive definite and symmetric.

I think the check could be replaced with something like:

from torch.distributions import constraints

is_psd = constraints.positive_definite.check(covariance).item()
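
As a quick sanity check of that suggestion, the snippet below runs the constraints-based check on a symmetric positive-definite matrix (eigenvalues 9 and 1) and on a symmetric matrix that is not positive definite (eigenvalues 3 and -1), alongside an eigvalsh-based equivalent; this is a sketch of the idea, not a drop-in patch for transformers:

```python
import torch
from torch.distributions import constraints

sym_pd = torch.tensor([[5.0, 4.0], [4.0, 5.0]])   # symmetric, eigenvalues 9 and 1
not_pd = torch.tensor([[1.0, 2.0], [2.0, 1.0]])   # symmetric, eigenvalues 3 and -1

# Check based on torch.distributions.constraints, as suggested above
print(constraints.positive_definite.check(sym_pd).item())  # True
print(constraints.positive_definite.check(not_pd).item())  # False

# Equivalent check via the real eigenvalues of a symmetric matrix
print(bool((torch.linalg.eigvalsh(sym_pd) > 0).all()))     # True
print(bool((torch.linalg.eigvalsh(not_pd) > 0).all()))     # False
```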
