Insights: huggingface/transformers
Overview
1 Release published by 1 person
-
v4.47.0: PaliGemma-2, I-JEPA, OLMo-2, LayerSkip, Tensor Parallel
published
Dec 5, 2024
49 Pull requests merged by 36 people
-
docs: clarify initializer_range parameter description in Idefics3VisionConfig
#35215 merged
Dec 11, 2024 -
Fix seamless TTS generate
#34968 merged
Dec 11, 2024 -
Fix CI
#35208 merged
Dec 11, 2024 -
Cleanup: continue the init refactor
#35170 merged
Dec 11, 2024 -
Add TimmWrapper
#34564 merged
Dec 11, 2024 -
[PEFT] Better Trainer error when prompt learning with loading best model at the end
#35087 merged
Dec 11, 2024 -
🧹 Remove deprecated RotaryEmbedding parts in the Attention layers
#34858 merged
Dec 11, 2024 -
BLIP: enable device map
#34850 merged
Dec 11, 2024 -
[i18n-<languageCode>] Translating agents.md to Chinese
#35139 merged
Dec 10, 2024 -
Update data collator docstrings to accurately reference Nvidia tensor core compute capability version
#35188 merged
Dec 10, 2024 -
[docs] Fix FlashAttention link
#35171 merged
Dec 10, 2024 -
[i18n-<languageCode>] Translating Benchmarks.md to Chinese
#35137 merged
Dec 10, 2024 -
Only import torch.distributed if it is available
#35133 merged
Dec 10, 2024 -
Multiple typo fixes in NLP, Audio docs
#35181 merged
Dec 10, 2024 -
[i18n-ar] Translated file: `docs/source/ar/community.md` into Arabic
#33027 merged
Dec 10, 2024 -
Fixing GGUF support for StableLm
#35060 merged
Dec 10, 2024 -
Fix DBRX LayerNorm init method
#35177 merged
Dec 10, 2024 -
Remove unnecessary masked_fill in deberta models
#35182 merged
Dec 10, 2024 -
Support BatchNorm in Hubert pos_conv_emb as in fairseq
#34389 merged
Dec 10, 2024 -
Fix file path for shard_num 1 with mllama converter
#35053 merged
Dec 10, 2024 -
Assisted decoding multi-gpu
#35116 merged
Dec 10, 2024 -
Fix `num_items_in_batch` not being an integer
#35115 merged
Dec 10, 2024 -
[CI] Fix bnb quantization tests with accelerate>=1.2.0
#35172 merged
Dec 9, 2024 -
Fixed typo of 'avilable' in prompts.py
#35145 merged
Dec 9, 2024 -
Super tiny fix logging message
#35132 merged
Dec 9, 2024 -
Cleanup: continue the init refactor
#35167 merged
Dec 9, 2024 -
Fix typo in EETQ Tests
#35160 merged
Dec 9, 2024 -
Option to set 'non_blocking' for to(device) in BatchEncoding and BatchFeature
#34883 merged
Dec 9, 2024 -
Corrected typo in agent system prompts
#35143 merged
Dec 9, 2024 -
[I-JEPA] Update docs
#35148 merged
Dec 9, 2024 -
Fix GA loss bugs and add unit test
#35121 merged
Dec 9, 2024 -
Update I-JEPA checkpoints path
#35120 merged
Dec 6, 2024 -
Add feature dim attributes to BitLinear for easier PEFT integration
#34946 merged
Dec 6, 2024 -
Add Aria
#34157 merged
Dec 6, 2024 -
Fix private forked repo. CI
#35114 merged
Dec 6, 2024 -
[docs] top_p, top_k, temperature docstrings
#35065 merged
Dec 5, 2024 -
[docs] Update Python version in translations
#35096 merged
Dec 5, 2024 -
Fix signatures for processing kwargs
#35105 merged
Dec 5, 2024 -
Adaptive dynamic number of speculative tokens
#34156 merged
Dec 5, 2024 -
Fix flaky Hub CI (`test_trainer.py`)
#35062 merged
Dec 5, 2024 -
[`trainer`] fix the GA `model_accepts_loss_kwargs`
#34915 merged
Dec 5, 2024 -
BLIP: this is correct now
#35081 merged
Dec 5, 2024 -
Add I-JEPA
#33125 merged
Dec 5, 2024 -
Deprecate quanto and switch to optimum-quanto
#35001 merged
Dec 5, 2024 -
Fix `tie_word_embeddings` handling for GGUF models
#35085 merged
Dec 5, 2024 -
Update Mistral conversion script
#34829 merged
Dec 5, 2024 -
[`tokenizers`] bump to 0.21
#34972 merged
Dec 5, 2024 -
[Whisper] Fix whisper tokenizer
#34537 merged
Dec 5, 2024 -
Informative
#35059 merged
Dec 5, 2024
48 Pull requests opened by 30 people
-
[Idefics3] Move image features to same device as input embeds
#35100 opened
Dec 5, 2024 -
Let `EarlyStoppingCallback` not require `load_best_model_at_end`
#35101 opened
Dec 5, 2024 -
Add check for if num_items_in_batch is not None
#35102 opened
Dec 5, 2024 -
Support Python 3.10+ Union style in chat template type hints parsing
#35103 opened
Dec 5, 2024 -
Add dinov2 with registers attempt 2
#35104 opened
Dec 5, 2024 -
Fix the structure of images in PixtralProcessor
#35107 opened
Dec 5, 2024 -
Keep `image_sizes` in output of `PixtralProcessor`
#35110 opened
Dec 6, 2024 -
[WIP] Add flex attention for Qwen2VL
#35112 opened
Dec 6, 2024 -
Use `rsfE` with `pytest`
#35119 opened
Dec 6, 2024 -
[WIP] Pixtral: vectorize patch embeddings
#35122 opened
Dec 6, 2024 -
Add weight norm rename in _load_state_dict_into_model
#35123 opened
Dec 6, 2024 -
Add common test for `torch.export` and fix some vision models
#35124 opened
Dec 6, 2024 -
logic was inverted for passing loss_kwargs to forward pass
#35128 opened
Dec 6, 2024 -
Add compute_loss_func to Seq2SeqTrainer
#35136 opened
Dec 7, 2024 -
addressing the issue #34611 to make FlaxDinov2 compatible with any batch size
#35138 opened
Dec 7, 2024 -
Samhq model addition
#35147 opened
Dec 8, 2024 -
[`Mamba2`] Fix caching, slow path, and multi-gpu
#35154 opened
Dec 8, 2024 -
Adding FlexAttention Support for Qwen2 models
#35155 opened
Dec 8, 2024 -
Add reminder config to issue template and print DS version in env
#35156 opened
Dec 9, 2024 -
don't use no_sync when deepspeed doesn't support it for certain zero stages
#35157 opened
Dec 9, 2024 -
Add ModernBERT to Transformers
#35158 opened
Dec 9, 2024 -
Use modules_to_not_convert in AQLM
#35161 opened
Dec 9, 2024 -
Init cache on meta device
#35164 opened
Dec 9, 2024 -
Ascend NPU support SDPA
#35165 opened
Dec 9, 2024 -
Add dtype check in check_quantized_param for bnb4
#35183 opened
Dec 10, 2024 -
Add compile test for fast image processor
#35184 opened
Dec 10, 2024 -
make LlamaModel._update_causal_mask torch compilable
#35187 opened
Dec 10, 2024 -
Fix f-string to show `ACCELERATE_MIN_VERSION` on error
#35189 opened
Dec 10, 2024 -
Add Flax activation functions and corresponding tests
#35191 opened
Dec 10, 2024 -
[i18n-ar] Translated file : docs/source/ar/tasks/sequence_classification.md into Arabic
#35192 opened
Dec 10, 2024 -
[i18n-ar] Translated file : docs/source/ar/tasks/token_classification.md into Arabic
#35193 opened
Dec 10, 2024 -
[i18n-ar] Translated file: `docs/source/ar/tasks/translation.md` into Arabic
#35194 opened
Dec 11, 2024 -
[i18n-ar] Translated file: `docs/source/ar/tasks/summarization.md` into Arabic
#35195 opened
Dec 11, 2024 -
[i18n-ar] Translated file: `docs/source/ar/tasks/question_answering.md` into Arabic
#35196 opened
Dec 11, 2024 -
[i18n-ar] Translated file: `docs/source/ar/tasks/language_modeling.md` into Arabic
#35197 opened
Dec 11, 2024 -
[i18n-ar] Translated file: `docs/source/ar/tasks/masked_language_modeling.md` into Arabic
#35198 opened
Dec 11, 2024 -
[i18n-ar] Translated file: `docs/source/ar/tasks/multiple_choice.md` into Arabic
#35199 opened
Dec 11, 2024 -
PaliGemma: Make sure to add <eos> to suffix if <image> is present in `text`
#35201 opened
Dec 11, 2024 -
🔴 Video processors as a separate class
#35206 opened
Dec 11, 2024 -
Scale loss before backward
#35207 opened
Dec 11, 2024 -
Cleanup: continue the init refactor
#35209 opened
Dec 11, 2024 -
Trigger GitHub CI with a comment on PR
#35211 opened
Dec 11, 2024 -
Fix FSDP no longer working
#35212 opened
Dec 11, 2024 -
Add retry hf hub decorator
#35213 opened
Dec 11, 2024 -
Added RAdamScheduleFree support | updated codecarbon to run test_trainer.py properly
#35214 opened
Dec 11, 2024 -
Fix type hints for apply_chat_template
#35216 opened
Dec 11, 2024 -
Fix loading with only state dict and low_cpu_mem_usage = True
#35217 opened
Dec 11, 2024 -
added cached tokenizer
#35218 opened
Dec 11, 2024
39 Issues closed by 18 people
-
How to load local transformers?
#35118 closed
Dec 12, 2024 -
`SeamlessM4TForTextToSpeech.generate` not working if `generation_config` is passed
#34811 closed
Dec 11, 2024 -
top-p sampling gives different results even after fixing all random seeds
#34693 closed
Dec 11, 2024 -
[Idefics3] processing_idefics3 - IndexError: list index out of range for multiple image input
#34727 closed
Dec 11, 2024 -
Video-Llava model's generation error due to causal mask shape mismatch
#34696 closed
Dec 11, 2024 -
Compile of the Generate function raises ValueError
#34767 closed
Dec 11, 2024 -
[i18n-<languageCode>] Translating Benchmarks.md to Chinese
#35134 closed
Dec 11, 2024 -
No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package on darwin
#35129 closed
Dec 10, 2024 -
Did you mean masked_fill_ in modeling_deberta.py?
#35162 closed
Dec 10, 2024 -
Add support for HuBERT batch norm instead of weight norm in pos_conv_emb
#34229 closed
Dec 10, 2024 -
Incorrect hardcoded consolidated.pth path for Llama 3.2 11B Vision+Instruct Model
#35049 closed
Dec 10, 2024 -
Running AG and SD when assistant and target models are on different devices
#35099 closed
Dec 10, 2024 -
Exception raised with trainer + `accelerate launch` FSDP + large gradient accumulation steps + small dataset
#33413 closed
Dec 10, 2024 -
Documentation for HuBERT is Incomplete
#33536 closed
Dec 10, 2024 -
CausalLM loss function throws runtime error in multi-gpu setup
#35086 closed
Dec 10, 2024 -
Why can't `inputs_embeds` be used during the first generation in a multimodal model?
#35131 closed
Dec 10, 2024 -
XLA FSDP V2 + TPU + T5 Family Models doesn't work
#35142 closed
Dec 9, 2024 -
Typo in prompts.py: "avilable" should be "available"
#35144 closed
Dec 9, 2024 -
Typo in agent system prompts
#35109 closed
Dec 9, 2024 -
DDP error with load_best_model_at_end enabled
#30702 closed
Dec 8, 2024 -
Saving model in safetensors format through Trainer fails for Gemma 2 due to shared tensors
#33807 closed
Dec 8, 2024 -
T5Attention forward pass failing when not using KV cache
#34448 closed
Dec 8, 2024 -
Flash attention build running forever on colab
#34466 closed
Dec 8, 2024 -
Falcon model training on multiple GPUs
#34492 closed
Dec 8, 2024 -
num_quantizer in EncodecConfig should accept variable codebook size
#34521 closed
Dec 8, 2024 -
Deprecated `shard_checkpoint`'s replacement `save_torch_state_dict` does not save tied embeddings
#35080 closed
Dec 7, 2024 -
[Query] Preprocessing Tutorial
#35111 closed
Dec 6, 2024 -
who are you
#35117 closed
Dec 6, 2024 -
Some Whisper beam search output (sequences_scores, etc.) is lost in _stack_split_outputs
#32373 closed
Dec 6, 2024 -
Assert error in convert_llava_onevision_weights_to_hf.py
#34467 closed
Dec 6, 2024 -
The document of generation seems to wrongly describe the default value of top_p, top_k and temperature
#35045 closed
Dec 5, 2024 -
Mismatched keyword argument names of llama make GA fix invalid
#34577 closed
Dec 5, 2024 -
WhisperTokenizer decode is offsetting timestamps incorrectly
#34472 closed
Dec 5, 2024 -
Manually setting `device_map` causes a RuntimeError.
#34456 closed
Dec 5, 2024
32 Issues opened by 27 people
-
Numpy is not available
#35221 opened
Dec 11, 2024 -
Models numpy version and version installed after running `pip install transformers[sentencepiece]`
#35220 opened
Dec 11, 2024 -
Shape mismatch in RoPE embeddings gpt_neox model when rotary_ndims is odd
#35219 opened
Dec 11, 2024 -
run_mlm_flax on tpu v5-pods
#35205 opened
Dec 11, 2024 -
logged loss is not correct with gradient accumulation
#35204 opened
Dec 11, 2024 -
gradient calculation is not correct with gradient accumulation in Pretrain
#35203 opened
Dec 11, 2024 -
Improve tensor parallel memory usage
#35202 opened
Dec 11, 2024 -
PaliGemma2 Processor returns wrong labels array when <image> token is present in `text`
#35200 opened
Dec 11, 2024 -
How to convert my Mask2Former model (ResNet-50 backbone) to Hugging Face transformer
#35186 opened
Dec 10, 2024 -
QuantizedCache first token processing is counterintuitive / worse than in papers
#35185 opened
Dec 10, 2024 -
Adding Mamba2ForTokenClassification to Mamba2
#35180 opened
Dec 10, 2024 -
More rich documentation on pipelines
#35179 opened
Dec 10, 2024 -
Incorrect file structure in convert_mask2former_original_pytorch_checkpoint_to_pytorch.py?
#35178 opened
Dec 10, 2024 -
Detokenization discrepancy with Llama3.1
#35175 opened
Dec 9, 2024 -
LlavaForConditionalGeneration._merge_input_ids_with_image_features throws error
#35169 opened
Dec 9, 2024 -
DynamicCache does not support variable lengths, except for FA2
#35168 opened
Dec 9, 2024 -
Mimi model gives different outputs when using batch encode vs single encode
#35166 opened
Dec 9, 2024 -
Calling Trainer.create_model_card() with an empty dataset list causes an IndexError
#35163 opened
Dec 9, 2024 -
[tests] run one test but got 2 test results
#35159 opened
Dec 9, 2024 -
Impossible to change attention implementation
#35153 opened
Dec 8, 2024 -
how to load the weight of decoder.embed_tokens.weight seperately from the shared weight?
#35152 opened
Dec 8, 2024 -
Qwen2vl float16 inference bug in naive attention
#35151 opened
Dec 8, 2024 -
Cuda OOM
#35150 opened
Dec 8, 2024 -
RuntimeError: shape '[1, 3098, 6, 5, 128]' is invalid for input of size 12689408
#35146 opened
Dec 7, 2024 -
FileNotFoundError: Couldn't find a module script at /home/tooko/transformers-course/glue/glue.py.
#35140 opened
Dec 7, 2024 -
[i18n-<languageCode>] Translating docs to Chinese Translate agents.md into Chinese
#35135 opened
Dec 7, 2024 -
Special token ids are not longer typed properly in 4.47.0
#35126 opened
Dec 6, 2024 -
Error during training: "Expected dtype float for end but got dtype c10::BFloat16"
#35106 opened
Dec 5, 2024 -
TextIteratorStreamer unable to create generator_kwargs
#35098 opened
Dec 5, 2024
102 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add LightGlue model
#31718 commented on
Dec 9, 2024 • 54 new comments -
Efficient Inference Kernel for SpQR
#34976 commented on
Dec 11, 2024 • 23 new comments -
[Whisper] 🚨 Fix whisper decoding 🚨
#34135 commented on
Dec 11, 2024 • 18 new comments -
Enhanced Installation Section in README.md
#35094 commented on
Dec 5, 2024 • 12 new comments -
Add Zamba2
#34517 commented on
Dec 7, 2024 • 12 new comments -
[WIP] Refactoring of ImageProcessorFast
#35069 commented on
Dec 11, 2024 • 12 new comments -
Run model as compressed/uncompressed mode
#34719 commented on
Dec 11, 2024 • 10 new comments -
HIGGS Quantization Support
#34997 commented on
Dec 9, 2024 • 10 new comments -
[GGUF] Refactor and decouple gguf checkpoint loading logic
#34385 commented on
Dec 10, 2024 • 8 new comments -
Add dithering to the `Speech2TextFeatureExtractor` API.
#34638 commented on
Dec 10, 2024 • 6 new comments -
Universal Speculative Decoding `CandidateGenerator`
#35029 commented on
Dec 5, 2024 • 5 new comments -
Aggeregate test summary files in CircleCI workflow runs
#34989 commented on
Dec 10, 2024 • 4 new comments -
add bnb support for Ascend NPU
#31512 commented on
Dec 11, 2024 • 4 new comments -
Add diffllama
#34083 commented on
Dec 10, 2024 • 4 new comments -
Fix : Falcon processor doesn't account for a layout difference of qkv between transformers and GGUF
#35088 commented on
Dec 10, 2024 • 3 new comments -
Enable different torch dtype in sub models
#34873 commented on
Dec 10, 2024 • 3 new comments -
Fix case of nested tensors in BatchMixFeature
#35063 commented on
Dec 9, 2024 • 2 new comments -
[whisper] added dropping of attention weights after DTW calculations related to word timestamps if these weights are not requested in the output
#33732 commented on
Dec 9, 2024 • 2 new comments -
Add ColPali to 🤗 transformers
#33736 commented on
Dec 10, 2024 • 2 new comments -
Add support for Apple's Depth-Pro
#34583 commented on
Dec 6, 2024 • 2 new comments -
Output dicts support in text generation pipeline
#35092 commented on
Dec 7, 2024 • 2 new comments -
Add TextNet
#34979 commented on
Dec 8, 2024 • 1 new comment -
enable StaticCache for assisted generation
#34797 commented on
Dec 11, 2024 • 0 new comments -
FEAT : Adding VPTQ quantization method to HFQuantizer
#34770 commented on
Dec 6, 2024 • 0 new comments -
Update config validation
#34726 commented on
Dec 9, 2024 • 0 new comments -
Add GOT-OCR 2.0 to Transformers
#34721 commented on
Dec 5, 2024 • 0 new comments -
Add TimesFM Time Series Forecasting Model
#34082 commented on
Dec 10, 2024 • 0 new comments -
change bnb tests
#34713 commented on
Dec 9, 2024 • 0 new comments -
Past Keys Output now working with output router logits
#34707 commented on
Dec 6, 2024 • 0 new comments -
VLMs: major clean up 🧼
#34502 commented on
Dec 10, 2024 • 0 new comments -
LLaVA-NeXT: add new model checkpoints
#34195 commented on
Dec 11, 2024 • 0 new comments -
Modular phi
#34361 commented on
Dec 11, 2024 • 0 new comments -
[FEAT] Compatibility with dduf format from diffusers
#35093 commented on
Dec 11, 2024 • 0 new comments -
[Clean-up] Planned removal of the `max_size` argument
#35090 commented on
Dec 6, 2024 • 0 new comments -
Fix : model used to test ggml conversion of Falcon-7b is incorrect
#35083 commented on
Dec 10, 2024 • 0 new comments -
[setup] migrate setup script to `pyproject.toml` (reland #22539)
#35077 commented on
Dec 6, 2024 • 0 new comments -
Use AMD CI workflow defined in hf-workflows
#35058 commented on
Dec 6, 2024 • 0 new comments -
Add: num_additional_image_tokens to models
#35052 commented on
Dec 9, 2024 • 0 new comments -
Enable gptqmodel
#35012 commented on
Dec 10, 2024 • 0 new comments -
switch from `training_args.bin` `training_args.json`
#35010 commented on
Dec 12, 2024 • 0 new comments -
Refactoring `AssistedCandidateGenerator` for Improved Modularity and Reusability
#35009 commented on
Dec 10, 2024 • 0 new comments -
Make `test_generate_with_static_cache` even less flaky
#34995 commented on
Dec 10, 2024 • 0 new comments -
Deprecate _is_quantized_training_enabled
#34991 commented on
Dec 10, 2024 • 0 new comments -
[ `Core`] Refactor modeling code
#34987 commented on
Dec 11, 2024 • 0 new comments -
Add the Bamba Model
#34982 commented on
Dec 11, 2024 • 0 new comments -
[`ESM`] Add support for sdpa.
#34954 commented on
Dec 9, 2024 • 0 new comments -
Add sdpa for Beit
#34941 commented on
Dec 11, 2024 • 0 new comments -
Implement AsyncTextIteratorStreamer for asynchronous streaming
#34931 commented on
Dec 9, 2024 • 0 new comments -
[tests] fix "Tester object has no attribute '_testMethodName'"
#34910 commented on
Dec 10, 2024 • 0 new comments -
Add Relation DETR
#34900 commented on
Dec 9, 2024 • 0 new comments -
[WIP] Add flex attention for gpt2
#34861 commented on
Dec 6, 2024 • 0 new comments -
Add Flex Attention for Mistral along with refactoring
#34845 commented on
Dec 5, 2024 • 0 new comments -
[`GPTQ`, `CompressedTensors`] Fix unsafe imports and metada check
#34815 commented on
Dec 8, 2024 • 0 new comments -
Adding support for OpenLMForCausalLM from DataComp
#34081 commented on
Dec 5, 2024 • 0 new comments -
The dot in the model name when using auto_map will cause a path parsing error.
#35082 commented on
Dec 10, 2024 • 0 new comments -
Add Flax diverse group search
#25355 commented on
Dec 9, 2024 • 0 new comments -
Resuming from checkpoint runs into OOM
#30822 commented on
Dec 9, 2024 • 0 new comments -
NaN model parameter found in meta-llama/Llama-3.2-11B-Vision under 4.46.1 version
#34602 commented on
Dec 9, 2024 • 0 new comments -
Trying to train a model using automatic1111. Error - Exception training model: 'module 'transformers.integrations' has no attribute 'deepspeed''.
#34427 commented on
Dec 9, 2024 • 0 new comments -
`dataloader_persistent_workers=True` causes fork-bomb due to repeated creation of `eval_dataloader`
#28469 commented on
Dec 9, 2024 • 0 new comments -
bus error on version 4.43.0 with pretrained community CLIP model - MacOS
#33357 commented on
Dec 8, 2024 • 0 new comments -
Accelerate x Trainer issue tracker:
#33345 commented on
Dec 8, 2024 • 0 new comments -
Passing nn.Parameter values within the model architecture as deep copies.
#34643 commented on
Dec 8, 2024 • 0 new comments -
ValueError: Architecture deepseek2 not supported
#34335 commented on
Dec 7, 2024 • 0 new comments -
Does per_device_train_batch_size have a loss error similar to that of GA?
#34579 commented on
Dec 7, 2024 • 0 new comments -
Enhancing Hugging Face Models with Tensor Parallelism for Large-Scale Model Support 🚀
#32470 commented on
Dec 7, 2024 • 0 new comments -
xpu device is not used running pipeline(device_map="auto")
#31922 commented on
Dec 6, 2024 • 0 new comments -
Is there a way to find the earliest version of transformers that has a certain model?
#35097 commented on
Dec 6, 2024 • 0 new comments -
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
#34695 commented on
Dec 6, 2024 • 0 new comments -
How to specific customized force_token_ids in whisper
#34107 commented on
Dec 6, 2024 • 0 new comments -
Unexpected output of _flash_attention_forward() for cross attention
#35032 commented on
Dec 6, 2024 • 0 new comments -
Duplicate ZeRo 3 Global Step Checkpoint Saves
#34534 commented on
Dec 6, 2024 • 0 new comments -
ValueError: You are trying to save a non-contiguous tensor in MT5 finetunning
#34623 commented on
Dec 6, 2024 • 0 new comments -
Make it possible to save and evaluate checkpoint on CTRL+C / `KeyboardInterrupt` with Hugging Face Trainer
#35033 commented on
Dec 6, 2024 • 0 new comments -
The same situation as #31377 occurred when using Qwen/Qwen2-VL-7B-Instruct
#33399 commented on
Dec 5, 2024 • 0 new comments -
When extending embeddings, multivariate distribution isn't correctly estimated even when the calculated sigma matrix is symmetric and positive definite
#35075 commented on
Dec 5, 2024 • 0 new comments -
Documentation for SWAG contradicts itself when constructing the first sentence.
#35095 commented on
Dec 5, 2024 • 0 new comments -
AssertionError for Pytorch PiPPy example
#34600 commented on
Dec 5, 2024 • 0 new comments -
Add Molmo (7B-D, 7B-O, 70B)
#33962 commented on
Dec 11, 2024 • 0 new comments -
#33512 handle last element out of range error
#33625 commented on
Dec 9, 2024 • 0 new comments -
[WIP] - Enable speculative decoding with batch size >1
#32189 commented on
Dec 11, 2024 • 0 new comments -
[docs] Redesign
#31757 commented on
Dec 11, 2024 • 0 new comments -
Support Kosmos-2.5
#31711 commented on
Dec 6, 2024 • 0 new comments -
Implement SuperGlue model
#29886 commented on
Dec 6, 2024 • 0 new comments -
Fix model code to accurately convert fairseq wav2vec2 model
#28250 commented on
Dec 12, 2024 • 0 new comments -
Add Model Support for xLSTM
#27011 commented on
Dec 11, 2024 • 0 new comments -
Confusing error message
#34658 commented on
Dec 11, 2024 • 0 new comments -
AutoModelForDepthEstimation/DepthAnythingDepthEstimationHead unexpected behavior in JIT
#34679 commented on
Dec 11, 2024 • 0 new comments -
Discrepancy in Training Loss Behavior with Gradient Accumulation using DeepSpeed
#34694 commented on
Dec 11, 2024 • 0 new comments -
trainer resume from checkpoint,the learning rate is not the same as retraining,learning rate is discontinuous
#34053 commented on
Dec 11, 2024 • 0 new comments -
[i18n-ar] Translating docs to Arabic (العربية)
#32435 commented on
Dec 10, 2024 • 0 new comments -
Verify interpolation of image processors
#28180 commented on
Dec 10, 2024 • 0 new comments -
Bug in running facebook/wav2vec2-xlsr-53-espeak-cv-ft
#35064 commented on
Dec 10, 2024 • 0 new comments -
rework `test_multi_gpu_data_parallel_forward`
#31087 commented on
Dec 10, 2024 • 0 new comments -
Padding error when using Universal Assisted Generation with ASR pipeline
#34639 commented on
Dec 10, 2024 • 0 new comments -
Silent failure in generation parameters
#33690 commented on
Dec 10, 2024 • 0 new comments -
BarkProcessor voice_preset doesn't work
#34634 commented on
Dec 10, 2024 • 0 new comments -
about gradient accumulation
#34648 commented on
Dec 10, 2024 • 0 new comments -
Neftune computation is probably wrong with packed training
#34659 commented on
Dec 10, 2024 • 0 new comments -
FlaxWhisperForConditionalGeneration Out Of Memory Error
#34668 commented on
Dec 10, 2024 • 0 new comments -
Vision Encoder-Decoder fails with LLaMA decoder due to missing cross-attention implementation
#34674 commented on
Dec 10, 2024 • 0 new comments -
tokenizer.json modified after tokenizer.save_pretrained of OLMO models
#34744 commented on
Dec 10, 2024 • 0 new comments