add base_layer suffix for expert weights#159
Conversation
There was a problem hiding this comment.
Code Review
This pull request modifies the _add_base_layer_suffix function in megatron.py to handle expert-related parameters. A critical issue was identified where the code could trigger an UnboundLocalError because the variable base_layer_name might be undefined if a parameter name contains 'experts' but does not end with the expected '.weight' or '.bias' suffixes.
| if 'experts' in name: | ||
| return base_layer_name |
There was a problem hiding this comment.
This logic introduces a potential UnboundLocalError. The variable base_layer_name is only defined within the if name.endswith('.weight') or elif name.endswith('.bias') blocks. If name contains 'experts' but does not end with either suffix (e.g., a buffer or a router parameter), the code will crash when attempting to return base_layer_name.
Furthermore, this change bypasses the model_keys validation for expert weights. If this bypass is intentional (to force the suffix for experts regardless of whether the key was found in model_keys), you should still ensure the variable is defined before returning it.
| if 'experts' in name: | |
| return base_layer_name | |
| if 'experts' in name and (name.endswith('.weight') or name.endswith('.bias')): | |
| return base_layer_name |
No description provided.