Pulse · huggingface/transformers

February 28, 2025 – March 3, 2025

69 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

`GPT2Model` StaticCache support
#35761 commented on Mar 3, 2025 • 6 new comments
Fix sdpa in sam and refactor relative position embeddings
#36422 commented on Mar 4, 2025 • 5 new comments
Integrate SwanLab for offline/online experiment tracking and local visualization
#36433 commented on Mar 3, 2025 • 3 new comments
Fix Batch Size Mismatch When Using `crops_n_layers` in `mask-generation` Pipeline #35530
#35627 commented on Mar 3, 2025 • 3 new comments
Customize docstrings fast image processor
#36466 commented on Mar 3, 2025 • 2 new comments
Add FAST
#35476 commented on Mar 2, 2025 • 1 new comment
🚧 [WiP] Add Janus model
#36053 commented on Mar 4, 2025 • 1 new comment
Support QuestionAnswering Module for ModernBert based models.
#35566 commented on Mar 3, 2025 • 1 new comment
Fix model saving bug post training with tensor parallel in Accelerate
#36434 commented on Mar 3, 2025 • 1 new comment
[ModernBERT] Add CausalLM functionality to ModernBERT
#35946 commented on Mar 3, 2025 • 0 new comments
Add generation config validation using Pydantic
#35910 commented on Mar 3, 2025 • 0 new comments
Introduce modular files for speech models
#35902 commented on Mar 3, 2025 • 0 new comments
Add Doge model
#35891 commented on Mar 2, 2025 • 0 new comments
Adds GGUF support for Gemma models
#35887 commented on Mar 3, 2025 • 0 new comments
Add padding-free to bamba
#35861 commented on Mar 3, 2025 • 0 new comments
Github action for auto-assigning reviewers
#35846 commented on Mar 3, 2025 • 0 new comments
Add MultipleChoice & QuestionAnswering heads to ModernBERT
#35825 commented on Mar 3, 2025 • 0 new comments
Add StyleTTS 2
#35790 commented on Mar 3, 2025 • 0 new comments
Add AIMv2 to Transformers
#35550 commented on Mar 3, 2025 • 0 new comments
🔴 Video processors as a separate class
#35206 commented on Mar 3, 2025 • 0 new comments
Add Phi-3.5-vision
#36036 commented on Mar 3, 2025 • 0 new comments
Proper performant flex attention implementation
#36103 commented on Mar 3, 2025 • 0 new comments
add DeepSpeed tensor parallel initialization.
#36114 commented on Mar 3, 2025 • 0 new comments
Try working around the processor registration bugs
#36184 commented on Mar 3, 2025 • 0 new comments
Add evolla rebase main
#36232 commented on Mar 4, 2025 • 0 new comments
Add support for DeepseekAI's DeepseekVL
#36248 commented on Mar 2, 2025 • 0 new comments
[docs] Update README
#36265 commented on Mar 3, 2025 • 0 new comments
Remove remote code warning
#36285 commented on Mar 3, 2025 • 0 new comments
Fix ONNX export for sequence classification head
#36332 commented on Mar 3, 2025 • 0 new comments
Handle DAC conversion when using weight_norm with newer PyTorch versions
#36393 commented on Mar 2, 2025 • 0 new comments
HPU support
#36424 commented on Mar 3, 2025 • 0 new comments
Avoid error when using `torch.compile` and `DataCollatorWithFlattening`
#36450 commented on Mar 2, 2025 • 0 new comments
Fix fp16 ONNX export for RT-DETR and RT-DETRv2
#36460 commented on Mar 3, 2025 • 0 new comments
Fix incorrect attention mask truncate in WhisperFlashAttention2
#36477 commented on Mar 3, 2025 • 0 new comments
Sanitize Model Module Names to Follow Python Conventions
#36478 commented on Mar 3, 2025 • 0 new comments
Mask2FormerImageProcessor support overlapping features
#35536 commented on Mar 3, 2025 • 0 new comments
model.parameters() return [Parameter containing: tensor([], device='cuda:0', dtype=torch.bfloat16, requires_grad=True)] when using zero3
#35994 commented on Mar 3, 2025 • 0 new comments
Qwen2VLForConditionalGeneration doesn't work with MPS devices
#36413 commented on Mar 3, 2025 • 0 new comments
Support H100 training with FP8 in Trainer and Deepspeed
#25333 commented on Mar 2, 2025 • 0 new comments
Support SDPA & Flash Attention 2 for LayoutLMv3
#35467 commented on Mar 2, 2025 • 0 new comments
LayerDrop broken in various Flax models (Whisper/BART/more...)
#35468 commented on Mar 2, 2025 • 0 new comments
Memory Access out of bounds in mra/cuda_kernel.cu::index_max_cuda_kernel()
#35507 commented on Mar 2, 2025 • 0 new comments
Very slow to load deep seekv3 int4 model and device_map="auto" "sequential" bug
#35522 commented on Mar 2, 2025 • 0 new comments
adalomo and deepspeed zero3 offload error
#35977 commented on Mar 2, 2025 • 0 new comments
llama code break with torch compile
#36484 commented on Mar 2, 2025 • 0 new comments
Speed up image processors - cast to array before BatchFeature
#31205 commented on Mar 2, 2025 • 0 new comments
A warning message showing that `MultiScaleDeformableAttention.so` is not found in `/root/.cache/torch_extensions` if `ninja` is installed with `transformers`
#35349 commented on Mar 2, 2025 • 0 new comments
[Whisper] TypeError: '<=' not supported between instances of 'NoneType' and 'float'
#33552 commented on Mar 1, 2025 • 0 new comments
Saving model with shared tensors fails on cpu but succeeds on gpu
#33688 commented on Mar 1, 2025 • 0 new comments
Tokenizer does not split text according to newly added input tokens
#35447 commented on Mar 1, 2025 • 0 new comments
Can't use Trainer on mps device
#35954 commented on Mar 1, 2025 • 0 new comments
DeepSeek V3 Support
#35425 commented on Mar 1, 2025 • 0 new comments
Samhq model addition
#35147 commented on Mar 3, 2025 • 0 new comments
Add TimesFM Time Series Forecasting Model
#34082 commented on Mar 3, 2025 • 0 new comments
TypeError: CustomTrainer.compute_loss() got an unexpected keyword argument 'num_items_in_batch'
#36331 commented on Mar 4, 2025 • 0 new comments
Groq inference provider
#36353 commented on Mar 3, 2025 • 0 new comments
Dtensor support requires torch>=2.5.1
#36472 commented on Mar 3, 2025 • 0 new comments
Gemma2 (quantized) inference is broken - torch._dynamo.exc.UserError: Dynamic control flow is not supported at the moment.
#36485 commented on Mar 3, 2025 • 0 new comments
The output tensor's data type is not torch.long when the input text is empty.
#36277 commented on Mar 3, 2025 • 0 new comments
Possible bug when using cosine lr scheduler with gradient accumulation
#35484 commented on Mar 3, 2025 • 0 new comments
Accidentally allocating 2x memory in new caching_allocator_warmup
#36483 commented on Mar 3, 2025 • 0 new comments
Misleading documentation for `is_decoder` configuration parameter
#36482 commented on Mar 3, 2025 • 0 new comments
Add type checking to CI
#36481 commented on Mar 3, 2025 • 0 new comments
ViTPose tutorial fails
#36454 commented on Mar 3, 2025 • 0 new comments
SAM mask-generation - crops_n_layers
#35530 commented on Mar 3, 2025 • 0 new comments
Enhance the memory efficiency of loading large models (400B) to prevent out-of-memory errors when using tensor parallelism.
#36467 commented on Mar 3, 2025 • 0 new comments
Add EVEv2 : an Encoder-free VLM
#36379 commented on Mar 3, 2025 • 0 new comments
Inference with FSDP during training affects checkpoints
#34530 commented on Mar 3, 2025 • 0 new comments
Recomputed tensor size does not match when using activation checkpointing when using FSDP and accelerate
#34928 commented on Mar 3, 2025 • 0 new comments

Overview

14 Pull requests merged by 11 people

17 Pull requests opened by 15 people

18 Issues closed by 8 people

11 Issues opened by 11 people
