Description
System Info
- transformers version: 4.37.1
- Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.26.2
- Safetensors version: 0.4.5
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.4.1+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help?
When resizing token embeddings for models like MobileBert, IBert, etc., resize_token_embeddings calls the underlying transformers.modeling_utils._init_added_embeddings_with_mean. This function should initialize the new embedding weights using the old ones:
- calculate the mean vector of old embedding vectors
- calculate a sigma matrix from this vector: vector * vector.T / vector_dim
- check if it is positive definite, i.e. can be used as the covariance matrix of a new distribution
- if so, sample the new embeddings from the estimated distribution
- otherwise, just initialize the new embeddings with the mean vector of the previous ones
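The steps above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual transformers implementation; the function and variable names are made up for the example:

```python
import torch

def init_new_rows_from_old(old_weights: torch.Tensor, num_new: int) -> torch.Tensor:
    """Illustrative sketch of the initialization steps described above."""
    mean = old_weights.mean(dim=0)             # step 1: mean of the old embedding vectors
    dim = old_weights.shape[1]
    sigma = torch.outer(mean, mean) / dim      # step 2: sigma built from the mean vector
    eigenvalues = torch.linalg.eigvals(sigma)  # step 3: positive-definiteness check
    is_pd = bool(
        (sigma == sigma.T).all() and not torch.is_complex(eigenvalues) and (eigenvalues > 0).all()
    )
    if is_pd:                                  # step 4: sample from the estimated distribution
        dist = torch.distributions.MultivariateNormal(mean, covariance_matrix=sigma)
        return dist.sample((num_new,))
    # step 5: fall back to copying the mean vector into every new row
    return mean.expand(num_new, dim).clone()
```

With the check written this way, the fallback in step 5 is what actually runs, which is the bug described below.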
I noticed that the check in step 3 ALWAYS fails, i.e. no matrix is ever considered positive definite.
The problem seems to be in these lines:

```python
eigenvalues = torch.linalg.eigvals(covariance)
is_covariance_psd = bool(
    (covariance == covariance.T).all() and not torch.is_complex(eigenvalues) and (eigenvalues > 0).all()
)
```
The eigenvalues computed by torch.linalg.eigvals are always returned in a complex dtype (even when they are all mathematically real), so torch.is_complex returns True for them. Hence the main logic, i.e. constructing a multivariate distribution from the previous embeddings and sampling from it, can never run (at least in my experiments).
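A quick way to see this (assuming any recent PyTorch build) is that torch.linalg.eigvals returns a complex-dtype tensor even for the identity matrix, whose eigenvalues are plainly real:

```python
import torch

# The identity matrix is trivially symmetric positive definite,
# yet its eigenvalues come back as a complex tensor.
eigenvalues = torch.linalg.eigvals(torch.eye(2))
print(eigenvalues.dtype)              # torch.complex64 for float32 input
print(torch.is_complex(eigenvalues))  # True, even though the eigenvalues are 1 and 1
```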
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Here's an isolated example testing the lines mentioned above:

```python
import torch

covariance = torch.Tensor([[5.0, 4.0], [4.0, 5.0]])  # symmetric, positive definite
eigenvalues = torch.linalg.eigvals(covariance)
is_covariance_psd = bool((covariance == covariance.T).all() and not torch.is_complex(eigenvalues) and (eigenvalues > 0).all())
print(is_covariance_psd)
```

This outputs False despite the matrix being symmetric and having two positive real eigenvalues, 9 and 1.
Expected behavior
The function should successfully generate a multivariate normal distribution whenever the calculated sigma is positive definite and symmetric.
I think the check could be replaced with something like:

```python
from torch.distributions import constraints

is_psd = constraints.positive_definite.check(covariance).item()