Commit Graph

188611 Commits

Author SHA1 Message Date
Niklas Vangerow
dd10786acd Migrate conditional_test to PjRt.
PiperOrigin-RevId: 847911726
2025-12-22 16:03:25 -08:00
Subhankar Shah
69ea2a9308 Allow prefetching an HLO value if its use is colored in alternate memory, even if the loop optimizer has decided otherwise.
PiperOrigin-RevId: 847872344
2025-12-22 13:56:48 -08:00
Haibo Huang
21d80205a6 Add PJRT_Buffer_DonateWithControlDependency to the PJRT C API.
PiperOrigin-RevId: 847868215
2025-12-22 13:40:01 -08:00
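For context, a hedged sketch of the C++ PjRt method this new C API entry point mirrors; DonateAfter is a hypothetical helper, the include paths are approximate, and the exact C++ signature is recalled from memory rather than taken from this change:

    #include <memory>
    #include <utility>
    #include "absl/status/statusor.h"
    #include "xla/pjrt/pjrt_client.h"  // header path may differ by XLA version
    // Hypothetical helper, not plugin code: donates `buffer` so that the
    // returned buffer reuses its storage but only becomes usable once
    // `dependency` resolves; the original handle is invalidated.
    absl::StatusOr<std::unique_ptr<xla::PjRtBuffer>> DonateAfter(
        xla::PjRtBuffer& buffer, xla::PjRtFuture<> dependency) {
      return buffer.DonateWithControlDependency(std::move(dependency));
    }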
Niklas Vangerow
3480eee02b Add HloModuleFromXlaComputation to HloRunnerAgnosticTestBase.
Sometimes it is useful to turn an XlaComputation straight into an HloModule in a
test. We already essentially support this functionality, but until now the
computation had to be in the form of an XlaBuilder, which is not always
practical.

PiperOrigin-RevId: 847856677
2025-12-22 13:03:06 -08:00
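As a rough illustration of the conversion the new helper streamlines, an XlaComputation can already be turned into an HloModule via its proto. The sketch below is standalone and is not the new HloRunnerAgnosticTestBase method; include paths may differ by XLA version.

    #include <memory>
    #include "absl/status/statusor.h"
    #include "xla/hlo/builder/xla_computation.h"  // path may differ by version
    #include "xla/hlo/ir/hlo_module.h"
    #include "xla/service/hlo_module_config.h"
    // Illustrative conversion from XlaComputation to HloModule.
    absl::StatusOr<std::unique_ptr<xla::HloModule>> ToHloModule(
        const xla::XlaComputation& computation) {
      absl::StatusOr<xla::ProgramShape> shape = computation.GetProgramShape();
      if (!shape.ok()) return shape.status();
      xla::HloModuleConfig config(*shape);
      return xla::HloModule::CreateFromProto(computation.proto(), config);
    }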
Maxim Ermilov
b7650e843b Add proto serialization for CollectivePermuteStartThunk
PiperOrigin-RevId: 847846872
2025-12-22 12:27:26 -08:00
Dirk Hornung
678058948b [Autotuner] Limit CuDNN tests to CuDNN autotuner backend.
PiperOrigin-RevId: 847842272
2025-12-22 12:12:11 -08:00
Maxim Ermilov
1fa15367ad Add proto serialization for RaggedAllToAllStartThunk
PiperOrigin-RevId: 847830182
2025-12-22 11:37:45 -08:00
Byungchul Kim
fec780d7fe Set FC's keep_num_dims to false when the output dims differ from the input dims after quantization.
On gemma3n with decode batch > 1, this happens when the embedding is coupled with PLE via einsum.
The export steps are:
1) Initial: BMM([b,2048]x[2048,7680] -> [b,7680])
2) FuseInputReshape_BatchMatMulWithFlattenedRhsDims: BMM([b,2048]x[2048,7680] -> [b,7680])
3) ConvertBatchMatMulOp2FullyConnectedOp_Rank2ConstantRhs: FC([b,2048]x[2048,7680] -> [b,7680])
4) StrictQuantizationPattern(by IsDrqTensor): FC([b,1,2048]x[2048,7680] -> [b,7680])

When FC's keep_num_dims is false and the FC is followed by a reshape op (as in gemma3n), keep_num_dims will later be set back to true, with the correct shapes, by EnableFullyConnectedKeepNumDimsBeforeReshape.

PiperOrigin-RevId: 847813526
2025-12-22 10:45:22 -08:00
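For readers unfamiliar with the flag, here is a small, purely illustrative sketch of how TFLite's FullyConnected keep_num_dims affects the output shape; FcOutputShape is a made-up helper, not converter code:

    #include <cstddef>
    #include <cstdint>
    #include <vector>
    // keep_num_dims = true keeps the input rank; false flattens to 2-D.
    std::vector<int64_t> FcOutputShape(const std::vector<int64_t>& input_shape,
                                       int64_t units, bool keep_num_dims) {
      if (keep_num_dims) {
        std::vector<int64_t> out = input_shape;
        out.back() = units;                     // e.g. [b, 1, 2048] -> [b, 1, 7680]
        return out;
      }
      int64_t batch = 1;
      for (std::size_t i = 0; i + 1 < input_shape.size(); ++i) {
        batch *= input_shape[i];
      }
      return {batch, units};                    // e.g. [b, 1, 2048] -> [b, 7680]
    }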
Dirk Hornung
9ca49fcfa5 Limit CublasDot deterministic test to Cublas autotuning backend.
PiperOrigin-RevId: 847803638
2025-12-22 10:18:21 -08:00
A. Unique TensorFlower
573bbe2b41 Migrates builder.create<Op>() => Op::create() in tablegen files
PiperOrigin-RevId: 847796796
2025-12-22 09:54:11 -08:00
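An illustration of the migration pattern on a representative op; the actual change rewrites generated tablegen builder code, and the op chosen here is my own example:

    #include "mlir/Dialect/Arith/IR/Arith.h"
    #include "mlir/IR/Builders.h"
    // Before: auto add = builder.create<mlir::arith::AddIOp>(loc, lhs, rhs);
    // After (the style this change migrates to):
    mlir::Value MakeAdd(mlir::OpBuilder& builder, mlir::Location loc,
                        mlir::Value lhs, mlir::Value rhs) {
      return mlir::arith::AddIOp::create(builder, loc, lhs, rhs);
    }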
Oleg Shyshkov
3cec0d7b92 [XLA:GPU] Clean up RaggedAllToAllStartThunk rendezvous helpers.
PiperOrigin-RevId: 847783200
2025-12-22 09:07:49 -08:00
A. Unique TensorFlower
af38f913d0 Automated Code Change
PiperOrigin-RevId: 847756872
2025-12-22 07:39:52 -08:00
Dirk Hornung
3ea706cab3 Add --xla_gpu_experimental_autotune_backends to allow for selecting backends.
This change is for the new autotuner. The new autotuner's Triton backend competes with cuDNN fusions, leading to flaky tests. In addition, some tests disable certain autotuning paths via --xla_gpu_cudnn_gemm_fusion_level or --xla_gpu_cublas_fallback, which are not fully compatible with the new autotuner. Other tests rely on the order of the backends, which a backend selection mechanism resolves.

PiperOrigin-RevId: 847750954
2025-12-22 07:17:41 -08:00
Kanish Anand
4d0edd395f Refactor std::optional comparison in ReshapeSharding tests
PiperOrigin-RevId: 847749800
2025-12-22 07:07:40 -08:00
Henning Becker
12502acbf5 Remove unnecessary if_gpu_is_configured from Triton tests.
The tests in xla/backends/gpu/codegen/triton/BUILD are already configured to run only on specific GPU backends, making the if_gpu_is_configured check on the srcs redundant.

PiperOrigin-RevId: 847738574
2025-12-22 06:26:40 -08:00
Oleg Shyshkov
2f90852c17 [XLA:GPU] Remove TF_ prefix from RETURN_IF_ERROR and ASSIGN_OR_RETURN macros.
PiperOrigin-RevId: 847716343
2025-12-22 04:59:16 -08:00
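The rename in practice (the call sites below are placeholders):

    // Before:
    //   TF_RETURN_IF_ERROR(DoThing());
    //   TF_ASSIGN_OR_RETURN(auto thing, MakeThing());
    // After, in the affected XLA:GPU sources:
    //   RETURN_IF_ERROR(DoThing());
    //   ASSIGN_OR_RETURN(auto thing, MakeThing());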
deeptanshusekhri
d0b7f40548 [tosa] : fixing dynamic batch handling in FullyConnected legalization (#106638) 2025-12-22 04:10:39 -08:00
Dirk Hornung
f5b102299e [Autotuner] Log autotuner config in readable json format. When debugging the autotuner we often want to know the values of the AutotuneConfig.
PiperOrigin-RevId: 847683182
2025-12-22 03:01:32 -08:00
Henning Becker
23dd865ee5 Remove redundant TENSORFLOW_USE_ROCM define.
The `TENSORFLOW_USE_ROCM=1` local define is no longer required for the `rocm_solver_context` target.

PiperOrigin-RevId: 847677878
2025-12-22 02:50:30 -08:00
A. Unique TensorFlower
dfc5b243ca Automated Code Change
PiperOrigin-RevId: 847667783
2025-12-22 02:44:02 -08:00
Dirk Hornung
79af5068fd [Autotuner] Avoid compiling all configurations if we only return the first one. This happens when we want to select the first configuration that successfully compiles, e.g. for determinism.
PiperOrigin-RevId: 847656341
2025-12-22 02:37:40 -08:00
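A minimal sketch of the described early-exit behavior; Config, Compile, and PickFirstThatCompiles are placeholders, not the autotuner's actual interfaces:

    #include "absl/status/status.h"
    #include "absl/status/statusor.h"
    #include "absl/types/span.h"
    struct Config {};                      // placeholder backend configuration
    absl::Status Compile(const Config&);   // placeholder compile step
    // Return the first configuration that compiles, skipping the rest.
    absl::StatusOr<Config> PickFirstThatCompiles(absl::Span<const Config> configs) {
      for (const Config& config : configs) {
        if (Compile(config).ok()) return config;  // early exit
      }
      return absl::NotFoundError("no configuration compiled successfully");
    }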
A. Unique TensorFlower
d48869043b compat: Update forward compatibility horizon to 2025-12-22
PiperOrigin-RevId: 847654748
2025-12-22 02:31:13 -08:00
A. Unique TensorFlower
14b51dd700 Update GraphDef version to 2449.
PiperOrigin-RevId: 847654695
2025-12-22 02:14:02 -08:00
Dirk Hornung
85172d7831 [XLA:GPU] Shard the gpu_compiler_test. The _h100 test regularly causes timeouts.
PiperOrigin-RevId: 847654247
2025-12-22 02:01:39 -08:00
A. Unique TensorFlower
37da2f6658 Automated Code Change
PiperOrigin-RevId: 847648279
2025-12-22 01:41:30 -08:00
A. Unique TensorFlower
ec8a966f0d Automated Code Change
PiperOrigin-RevId: 847644299
2025-12-22 01:33:33 -08:00
A. Unique TensorFlower
2e5b1e44fc Automated Code Change
PiperOrigin-RevId: 847644164
2025-12-22 01:18:15 -08:00
A. Unique TensorFlower
53c2f78993 Automated Code Change
PiperOrigin-RevId: 847643376
2025-12-22 01:10:46 -08:00
A. Unique TensorFlower
3c7c52e730 Automated Code Change
PiperOrigin-RevId: 847641457
2025-12-22 01:02:55 -08:00
Dirk Hornung
6e5d62bf3e Increase shards for fusion_emitter_device_test to speed up the test.
PiperOrigin-RevId: 847632914
2025-12-22 00:51:12 -08:00
A. Unique TensorFlower
bb8c750b2f Automated Code Change
PiperOrigin-RevId: 847628658
2025-12-22 00:40:30 -08:00
Dirk Hornung
b1d2538541 [Autotuner] Initialize random input values for buffer checks. If values are initialized to 0, the buffer checker will fail to detect backends that produce wrong results.
PiperOrigin-RevId: 847627821
2025-12-22 00:31:57 -08:00
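A toy illustration of why zero-initialized inputs defeat the check: with all-zero data, a correct kernel and a buggy one can produce identical all-zero outputs, so only nonzero (e.g. random) inputs expose the difference. The functions below are placeholders, not autotuner code.

    #include <array>
    #include <numeric>
    double Dot(const std::array<double, 4>& a, const std::array<double, 4>& b) {
      return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
    }
    double BuggyDot(const std::array<double, 4>& a, const std::array<double, 4>& b) {
      // Bug: only the first two terms are accumulated.
      return std::inner_product(a.begin(), a.begin() + 2, b.begin(), 0.0);
    }
    // On zero inputs both return 0.0 and the bug is invisible; on random inputs
    // the results differ and a result comparison catches it.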
A. Unique TensorFlower
7b0d71c54e Automated Code Change
PiperOrigin-RevId: 847625245
2025-12-22 00:13:36 -08:00
A. Unique TensorFlower
64337a1e3c Automated Code Change
PiperOrigin-RevId: 847624939
2025-12-22 00:03:30 -08:00
A. Unique TensorFlower
6165d577f9 Automated Code Change
PiperOrigin-RevId: 847622964
2025-12-21 23:49:29 -08:00
A. Unique TensorFlower
2b621d61f9 Automated Code Change
PiperOrigin-RevId: 847622876
2025-12-21 23:38:53 -08:00
A. Unique TensorFlower
f4e53263b1 Automated Code Change
PiperOrigin-RevId: 847622680
2025-12-21 23:23:39 -08:00
A. Unique TensorFlower
e3e3bc1946 Reverts c549ee47f8
PiperOrigin-RevId: 847535506
2025-12-21 18:14:27 -08:00
Junwhan Ahn
7733c4c03d Use StartDetachedThread instead of SchedClosure to dispatch atom program compilation
PiperOrigin-RevId: 847528854
2025-12-21 17:33:37 -08:00
A. Unique TensorFlower
ce61030c67 Automated Code Change
PiperOrigin-RevId: 847480750
2025-12-21 13:26:51 -08:00
A. Unique TensorFlower
0cfba6c852 Automated Code Change
PiperOrigin-RevId: 847414785
2025-12-21 08:39:12 -08:00
A. Unique TensorFlower
f356a762f3 Automated Code Change
PiperOrigin-RevId: 847414761
2025-12-21 08:15:05 -08:00
A. Unique TensorFlower
630698a3af Automated Code Change
PiperOrigin-RevId: 847414309
2025-12-21 07:40:13 -08:00
A. Unique TensorFlower
f253afed70 Automated Code Change
PiperOrigin-RevId: 847412849
2025-12-21 07:23:29 -08:00
Kanish Anand
5042531aa8 Move definitions to the cpp file and match function definition order to declaration order
PiperOrigin-RevId: 847385799
2025-12-21 05:00:39 -08:00
A. Unique TensorFlower
a27a856d1c Automated Code Change
PiperOrigin-RevId: 847361371
2025-12-21 02:58:27 -08:00
A. Unique TensorFlower
ffb02301df Update GraphDef version to 2448.
PiperOrigin-RevId: 847339150
2025-12-21 01:32:04 -08:00
A. Unique TensorFlower
e60b3eb362 compat: Update forward compatibility horizon to 2025-12-21
PiperOrigin-RevId: 847339112
2025-12-21 01:18:45 -08:00
Bhupendra Dubey
ff7eb222c2 Refactor XLA Profiler State Check to Use Low-Overhead C API
This CL refactors the XLA profiler's state-checking mechanism to resolve GIL deadlocks and improve performance.

Previously, the C++ profiler context would import a Python module to update the profiler's state. This operation, performed while holding the GIL, could cause deadlocks if the import failed (e.g., in a JAX-only environment).

This change replaces the fragile cross-language import with a shared C++ std::atomic<bool>. Python code now queries this state via a new, low-overhead C function (is_traceme_enabled_raw) instead of ctypes.

This approach eliminates the deadlocks, decouples the C++ profiler from Python modules, and maintains high performance for the state check. The internal C++ API was also updated to use a safer reference instead of a raw pointer.

PiperOrigin-RevId: 847261952
2025-12-20 19:56:52 -08:00
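A minimal sketch of the mechanism described above, assuming names of my own except for is_traceme_enabled_raw, which is taken from the commit message:

    #include <atomic>
    namespace {
    std::atomic<bool> traceme_enabled{false};  // shared C++ profiler state
    }  // namespace
    // Low-overhead, GIL-free query that Python bindings can call directly.
    extern "C" bool is_traceme_enabled_raw() {
      return traceme_enabled.load(std::memory_order_relaxed);
    }
    // Hypothetical setter used by the C++ profiler when tracing starts or stops.
    void SetTracemeEnabled(bool enabled) {
      traceme_enabled.store(enabled, std::memory_order_relaxed);
    }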
A. Unique TensorFlower
580eeae4c3 Automated Code Change
PiperOrigin-RevId: 847190483
2025-12-20 16:29:24 -08:00