Refactoring `AssistedCandidateGenerator` for Improved Modularity and Reusability #35009

keyboardAnt · 2024-11-28T21:51:37Z

What does this PR do?

This PR refactors the AssistedCandidateGenerator and AssistedCandidateGeneratorDifferentTokenizers classes to improve code readability, maintainability, and reusability. By breaking down large methods into smaller helper functions and reusing common logic, we enhance the modularity of these classes. This refactoring lays a cleaner foundation for upcoming features like UniversalSpeculativeDecodingGenerator without introducing any new functionality.

Background

While working on Universal Speculative Decoding (#34760), which introduces the UniversalSpeculativeDecodingGenerator, we identified opportunities to refactor existing code. The goal is to reuse core logic across different candidate generators and simplify the integration of new features that enable speculative decoding across models with different tokenizers.

By submitting this refactoring as a separate PR, we aim to:

Streamline the review process: Focus on structural improvements without the complexity of new features.
Accelerate availability: Make these improvements accessible to the community sooner.
Facilitate future development: Provide a robust base for upcoming enhancements in the generation capabilities.

This refactor is a collaboration with @jmamou, who has already reviewed it (keyboardAnt#1).

Key Changes

1. Code Restructuring

Decomposed Large Methods: Broke down the get_candidates methods in both classes into smaller, focused helper functions.
- Example helper functions:
  - _calculate_new_tokens
  - _update_past_and_masks
  - _prepare_generation_args
  - _generate_candidates
Simplified Initialization: Streamlined the __init__ methods to remove redundancy and enhance clarity.

2. Improved Reusability

Modularized Operations: Encapsulated unique functionalities in dedicated methods within each class.
- For AssistedCandidateGeneratorDifferentTokenizers, methods like _prepare_assistant_input_ids and _process_assistant_outputs handle tokenizer-specific logic.
Reused Common Logic: Leveraged inheritance and method overriding to share code between classes and reduce duplication.

3. Enhanced Readability

Clear Naming Conventions: Renamed variables and methods for better clarity and understanding.
Consistent Formatting: Applied uniform code formatting and style guidelines across the classes.
Documentation: Added docstrings and comments to explain the purpose and functionality of methods.

Motivation

This refactoring is motivated by the need to:

Reduce Technical Debt: Clean up the codebase to make it more maintainable.
Facilitate Future Features: Prepare the existing classes for integration with upcoming speculative decoding functionalities.
Encourage Community Contributions: Make the code more approachable for contributors by improving readability and structure.

By isolating these changes, we enable reviewers to focus solely on the structural improvements without the added complexity of new features. This approach helps maintain a high code quality standard and simplifies the review and merging process.

Dependencies

No New Dependencies: This PR does not introduce any new dependencies. It solely refactors existing code.

Before Submitting

Part of Speculative Decoding Effort: This PR is a preparatory step for enhancing speculative decoding capabilities ([OLD] New PR: #35029. [[Universal Speculative Decoding CandidateGenerator]] #34760).
Contributor Guidelines: I have read and followed the contributor guidelines.
No New Functionality: This PR strictly involves refactoring without adding new features.
Testing: All existing tests pass successfully. No new tests are needed as functionality remains unchanged.

Who Can Review?

The following reviewers are well-suited to review this PR: @gante, @ArthurZucker

This PR aims to strengthen the foundation for speculative decoding and other future enhancements by improving the existing code's structure and maintainability. We appreciate your time and look forward to your feedback.

jmamou · 2024-11-29T08:46:20Z

@zucchini-nlp feel free to review it :-)

zucchini-nlp

Thanks a lot for making the code more composable! LGTM as nothing changed in terms of functionality. Just left one comment as we had several PRs in parallel modifying the assisted generation code :)

src/transformers/generation/candidate_generator.py

HuggingFaceDocBuilderDev · 2024-12-03T10:49:44Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp

Thanks @keyboardAnt ! LGTM as nothing was changed except for composing code into smaller functions.

The only question is the use of self.prev_target_ids which I've mentioned was removed in prev PR explaining it is not needed. This is what I got from reviewing the prev PR

hmm, so to make sure, that means the prev impl when we checked prev_target_ids was not really correct? And we should check the length of already accepted input_ids

yes

So do we need to save prev token ids from target model or we can re-use the current token ids, because the current token ids in any case will have the prev token ids as prefix with new accepted tokens appended at the end

jmamou · 2024-12-03T11:17:29Z

Thanks @keyboardAnt ! LGTM as nothing was changed except for composing code into smaller functions.

The only question is the use of self.prev_target_ids which I've mentioned was removed in prev PR explaining it is not needed. This is what I got from reviewing the prev PR

hmm, so to make sure, that means the prev impl when we checked prev_target_ids was not really correct? And we should check the length of already accepted input_ids

yes

So do we need to save prev token ids from target model or we can re-use the current token ids, because the current token ids in any case will have the prev token ids as prefix with new accepted tokens appended at the end

For UAG, we need just the number of tokens in self.prev_target_ids (prev_target_ids.shape[1]) in order to know which target tokens were added after the target validation of the last speculative iteration and to convert them to the relevant draft tokens (instead of converting from the beginning).
I propose to use similar approach as in USD when we store only the previous number of target tokens _prev_target_seq_len.

zucchini-nlp · 2024-12-03T12:18:06Z

we need just the number of tokens in self.prev_target_ids (prev_target_ids.shape[1]

Ah that makes sense if we don't care about the actual token ids used previously, because the tokens should be available without storing them. Then we can indeed store only the prev length

…esting file

keyboardAnt · 2024-12-05T02:23:36Z

we need just the number of tokens in self.prev_target_ids (prev_target_ids.shape[1]

Ah that makes sense if we don't care about the actual token ids used previously, because the tokens should be available without storing them. Then we can indeed store only the prev length

Thanks, @zucchini-nlp. I've replaced self.prev_target_ids with self.prev_target_ids_len and rebased main to keep the history clean. Can you please approve the awaiting workflows?

jmamou

@keyboardAnt
please apply the same fix as at keyboardAnt#4.
else SD will not work when target and assistant are not on the same device

jmamou · 2024-12-06T10:10:38Z

@keyboardAnt please apply the same fix as at keyboardAnt#4. else SD will not work when target and assistant are not on the same device

@keyboardAnt
PR #35116 fixes multi-gpu issue

zucchini-nlp · 2024-12-06T10:21:42Z

You can now request review from the core maintainer to merge this :)

jmamou · 2024-12-08T14:23:35Z

You can now request review from the core maintainer to merge this :)

@ArthurZucker

ArthurZucker

Looks great! Thanks for the detailed explanations 🤗
Merging!

Need to dismiss it to merge!

ArthurZucker · 2024-12-10T08:12:04Z

Ah can you resolve conflicts? 🤗

jmamou · 2024-12-10T14:35:13Z

Ah can you resolve conflicts? 🤗

@ArthurZucker done!

keyboardAnt mentioned this pull request Nov 28, 2024

[OLD] New PR: #35029. [[Universal Speculative Decoding CandidateGenerator]] #34760

Closed

6 tasks

keyboardAnt mentioned this pull request Nov 30, 2024

Universal Speculative Decoding CandidateGenerator #35029

Open

6 tasks

zucchini-nlp reviewed Dec 2, 2024

View reviewed changes

src/transformers/generation/candidate_generator.py Outdated Show resolved Hide resolved

keyboardAnt force-pushed the uag-refactor branch from a120dbf to d4ff091 Compare December 3, 2024 00:32

keyboardAnt requested a review from zucchini-nlp December 3, 2024 01:36

zucchini-nlp approved these changes Dec 3, 2024

View reviewed changes

keyboardAnt mentioned this pull request Dec 5, 2024

Replaces self.prev_target_ids by self.prev_target_ids_len keyboardAnt/transformers#5

Merged

keyboardAnt added 10 commits December 4, 2024 21:14

move TestAssistedCandidateGeneratorDifferentTokenizers into a new t…

b6d31e2

…esting file

refactor

7ffa8a5

NOTHING. add space to rerun github actions tests

9a232d7

remove it...

c50b70c

NOTHING. add space to rerun github actions tests

65dda9f

remove it...

cf639ef

replace: self.prev_tokens -> self.prev_assistant_ids

4e12653

NOTHING. rerun CI tests

12fe69e

remove it

1826a1c

introduce self.prev_target_ids_len

fcd129f

keyboardAnt force-pushed the uag-refactor branch from 4745dee to fcd129f Compare December 5, 2024 02:19

jmamou previously requested changes Dec 5, 2024

View reviewed changes

ArthurZucker self-requested a review December 9, 2024 12:34

ArthurZucker approved these changes Dec 10, 2024

View reviewed changes

jmamou added 4 commits December 10, 2024 14:24

Merge branch 'main' into uag-refactor

01b1eec

fix style

0cf8eb9

fix style

f4ccf0d

Merge branch 'main' into uag-refactor

4e7070b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactoring `AssistedCandidateGenerator` for Improved Modularity and Reusability #35009

Refactoring `AssistedCandidateGenerator` for Improved Modularity and Reusability #35009

keyboardAnt commented Nov 28, 2024 •

edited

Loading

jmamou commented Nov 29, 2024

zucchini-nlp left a comment

HuggingFaceDocBuilderDev commented Dec 3, 2024

zucchini-nlp left a comment

jmamou commented Dec 3, 2024

zucchini-nlp commented Dec 3, 2024

keyboardAnt commented Dec 5, 2024 •

edited

Loading

jmamou left a comment •

edited

Loading

jmamou commented Dec 6, 2024

zucchini-nlp commented Dec 6, 2024

jmamou commented Dec 8, 2024

ArthurZucker left a comment

ArthurZucker commented Dec 10, 2024

jmamou commented Dec 10, 2024

Refactoring AssistedCandidateGenerator for Improved Modularity and Reusability #35009

Are you sure you want to change the base?

Refactoring AssistedCandidateGenerator for Improved Modularity and Reusability #35009

Conversation

keyboardAnt commented Nov 28, 2024 • edited Loading

What does this PR do?

Background

Key Changes

1. Code Restructuring

2. Improved Reusability

3. Enhanced Readability

Motivation

Dependencies

Before Submitting

Who Can Review?

jmamou commented Nov 29, 2024

zucchini-nlp left a comment

Choose a reason for hiding this comment

HuggingFaceDocBuilderDev commented Dec 3, 2024

zucchini-nlp left a comment

Choose a reason for hiding this comment

jmamou commented Dec 3, 2024

zucchini-nlp commented Dec 3, 2024

keyboardAnt commented Dec 5, 2024 • edited Loading

jmamou left a comment • edited Loading

Choose a reason for hiding this comment

jmamou commented Dec 6, 2024

zucchini-nlp commented Dec 6, 2024

jmamou commented Dec 8, 2024

ArthurZucker left a comment

Choose a reason for hiding this comment

ArthurZucker commented Dec 10, 2024

jmamou commented Dec 10, 2024

Refactoring `AssistedCandidateGenerator` for Improved Modularity and Reusability #35009

Refactoring `AssistedCandidateGenerator` for Improved Modularity and Reusability #35009

keyboardAnt commented Nov 28, 2024 •

edited

Loading

keyboardAnt commented Dec 5, 2024 •

edited

Loading

jmamou left a comment •

edited

Loading