tensorflow/tensorflow
Byungchul Kim fec780d7fe Set FC's keep_num_dims to false when the output dims differ from the input dims after quantization.
On gemma3n with decode batch > 1, this happens when the embedding is coupled with PLE via einsum.
The export steps are:
1) Initial: BMM([b,2048]x[2048,7680] -> [b,7680])
2) FuseInputReshape_BatchMatMulWithFlattenedRhsDims: BMM([b,2048]x[2048,7680] -> [b,7680])
3) ConvertBatchMatMulOp2FullyConnectedOp_Rank2ConstantRhs: FC([b,2048]x[2048,7680] -> [b,7680])
4) StrictQuantizationPattern(by IsDrqTensor): FC([b,1,2048]x[2048,7680] -> [b,7680])
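The shape mismatch in step 4 can be sketched with NumPy (a minimal sketch; the batch size and the flatten-then-matmul modeling of TFLite FC semantics are assumptions, not taken from the actual kernel code):

```python
import numpy as np

# Hypothetical shapes from step 4 above: the quantization pattern expands the
# FC input to rank 3 while the FC output stays rank 2.
b = 2  # decode batch > 1, as on gemma3n
x = np.ones((b, 1, 2048), dtype=np.float32)   # FC input after quantization
w = np.ones((2048, 7680), dtype=np.float32)   # rank-2 constant RHS

# keep_num_dims == false: FC flattens all leading dims into one batch dim,
# so the output rank (2) differs from the input rank (3).
flat = x.reshape(-1, x.shape[-1])             # [b*1, 2048]
y = flat @ w                                  # [b, 7680]
assert y.shape == (b, 7680)

# keep_num_dims == true would instead preserve the input rank.
y_keep = x @ w                                # [b, 1, 7680]
assert y_keep.shape == (b, 1, 7680)
```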

When FC's keep_num_dims is false and the FC is followed by a reshape op (as in gemma3n), keep_num_dims will later be set to true with the correct shapes by EnableFullyConnectedKeepNumDimsBeforeReshape.
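The later cleanup can be sketched as follows (shapes and the fold itself are assumptions modeled in NumPy, not the actual MLIR pass logic):

```python
import numpy as np

b = 2
x = np.ones((b, 1, 2048), dtype=np.float32)
w = np.ones((2048, 7680), dtype=np.float32)

# Before the pass: FC with keep_num_dims == false flattens to rank 2, then an
# explicit reshape op restores the higher rank.
fc_out = x.reshape(-1, 2048) @ w              # [b, 7680]
reshaped = fc_out.reshape(b, 1, 7680)         # trailing reshape op

# After the pass: keep_num_dims == true makes FC produce the reshaped result
# directly, so the trailing reshape becomes redundant and can be removed.
fused = x @ w                                 # [b, 1, 7680]
assert np.array_equal(reshaped, fused)
```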

PiperOrigin-RevId: 847813526
2025-12-22 10:45:22 -08:00