- Created a `DenormalState` class for easier comparisons.
- Modified denormal module to export `Get`/`SetDenormalState` for testing.
- Added ARM instructions for getting/setting denormal flushing bits.
- Added `denormal_test`.
PiperOrigin-RevId: 359327912
Change-Id: Id6de81872fadbd419a38056f5e5ea76d72b7d8a5
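As background for what denormal flushing changes (this is an illustrative Python sketch, not the CL's C++ test code), denormal (subnormal) floats are values below the smallest normal float that hardware flush-to-zero modes round to 0.0:

```python
import sys

# Smallest positive *normal* double (~2.2e-308).
min_normal = sys.float_info.min

# Halving it produces a *denormal* (subnormal) value. With flush-to-zero
# enabled (the kind of mode the ARM denormal bits control), hardware
# would instead produce 0.0 here.
denormal = min_normal / 2.0
assert 0.0 < denormal < min_normal
print(denormal > 0.0)  # True -- CPython does not flush denormals
```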
This CL is a non-functional change, except that it fixes a small issue where reducing IndexedSlices with "mean" aggregation would raise an error.
PiperOrigin-RevId: 359327512
Change-Id: Ic21d6c23ded549de70e0ee126e65e47de1d3a06c
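A minimal NumPy sketch of what "mean" aggregation over IndexedSlices-like data means (hypothetical stand-in structures, not TensorFlow's implementation): the mean of per-replica sparse updates can be computed as a scatter-add of the slices with values scaled by 1/num_replicas.

```python
import numpy as np

# Each replica contributes (indices, values) into a dense var of shape [4, 2].
replica_slices = [
    (np.array([0, 2]), np.array([[2.0, 2.0], [4.0, 4.0]])),
    (np.array([0, 3]), np.array([[6.0, 6.0], [8.0, 8.0]])),
]
num_replicas = len(replica_slices)

# "mean" = sum of slices, with values pre-scaled by 1/num_replicas.
indices = np.concatenate([idx for idx, _ in replica_slices])
values = np.concatenate([val / num_replicas for _, val in replica_slices])

dense = np.zeros((4, 2))
np.add.at(dense, indices, values)  # scatter-add handles duplicate indices
print(dense[0])  # row 0 was updated by both replicas: mean of 2.0 and 6.0
```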
- Add SPMD related passes in the GPU compilation pipeline. This includes the domain
isolator at the start of HLO optimizations, and domain remover, sharding propagation,
and SPMD partitioner after a set of HLO optimizations.
- Change GpuLayoutAssignment to initialize channel constraints with a default-
constructed constraint (since collective communication ops with a channel_id, i.e.,
cross-host/module collective ops, do not require any specific layout constraint).
PiperOrigin-RevId: 359316087
Change-Id: If0e55db83eea3f5bc01caffed2cc46c2636f4f96
There is an implicit assumption here that Env::Default() is loaded after InitializeCreateGcsFileSystemFnPtr is called, but this seems to hold in practice. We can also do it the other way around if that changes (e.g. just write out all the Env* variables and then have the initializer register the GCS file system with them later).
PiperOrigin-RevId: 359315295
Change-Id: If40bdf07d6d58d0dd354e8ec1405e79340aa1d2d
Define dependent dialects for this pass, as I am planning to use it in a separate pass pipeline. Most of the TensorFlow passes do not define this, and it doesn't create any issues because other passes in the pipeline declare the required dialects.
PiperOrigin-RevId: 359290671
Change-Id: Ie50f2ba527a619ad6554ff7765a0188a8ccd6d76
IsSupportedNonTFOp is used to check whether a cast is needed or whether the op's type can be refined. Previously it only considered TF dialect ops, but ops implementing InferTypeOpInterface also get refined. Expand the check to include such ops.
PiperOrigin-RevId: 359197188
Change-Id: I44c6cb0d080a6bcb7e6d173a5c0e11b03aecc691
If a complex value's squared norm was denormal but the value had a non-zero imaginary part, the Householder reflection computation could yield NaNs. By using a more accurate norm, we avoid the underflow in this case.
PiperOrigin-RevId: 359180409
Change-Id: I2b6963800da551ab50b4e3e52a06cf92d75c0ee9
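An illustrative sketch of the underflow (not XLA's actual code): squaring a small but representable magnitude underflows past the denormal range to zero, while a scaled norm such as `math.hypot` does not.

```python
import math

# A complex value re + im*j whose squared norm underflows even though the
# components themselves are representable doubles.
re, im = 1e-170, 1e-170

naive = math.sqrt(re * re + im * im)  # re*re is 1e-340, which underflows to 0.0
accurate = math.hypot(re, im)         # rescales internally; no underflow

print(naive)     # 0.0 -- downstream Householder math dividing by this yields NaN
print(accurate)  # ~1.414e-170
```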
For multi-output models, we prefix metric names with their associated output names to disambiguate / uniquify them. When these models are repeatedly saved and loaded, this prefixing repeats as well, leading to long metric names such as "head_0_head_0_head_0_accuracy", where "head_0" is an output name.
PiperOrigin-RevId: 359151924
Change-Id: I509ea27a7d91446c3893d13d73818f857705ef5f
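A hypothetical sketch of the bug and one possible fix (`prefix_metric` and `prefix_metric_fixed` are illustrative names, not Keras APIs): prefixing on every save/load round trip is not idempotent, so a guard against double-prefixing stops the names from growing.

```python
def prefix_metric(name, output_name):
    # Buggy behavior: always prepend the output name.
    return f"{output_name}_{name}"

def prefix_metric_fixed(name, output_name):
    # Skip the prefix if it is already present. (A simplification: a metric
    # genuinely named "head_0_..." would be mis-skipped; the real fix
    # presumably tracks state rather than string-matching.)
    prefix = f"{output_name}_"
    return name if name.startswith(prefix) else prefix + name

name = "accuracy"
for _ in range(3):  # simulate three save/load cycles
    name = prefix_metric(name, "head_0")
print(name)  # head_0_head_0_head_0_accuracy

name = "accuracy"
for _ in range(3):
    name = prefix_metric_fixed(name, "head_0")
print(name)  # head_0_accuracy
```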
xprof will display -1 as "18446744073709551615", which can be confusing.
PiperOrigin-RevId: 359150215
Change-Id: I656be876712da5f7543a65c4e968e0cac3a70981
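The confusing value comes from reinterpreting the two's-complement bits of a signed 64-bit -1 as unsigned, as this small sketch shows:

```python
import struct

# Pack -1 as a signed 64-bit integer, then reread the same bytes as unsigned.
(as_uint64,) = struct.unpack("<Q", struct.pack("<q", -1))
print(as_uint64)  # 18446744073709551615, i.e. 2**64 - 1
```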
Otherwise we can get inconsistent memory propagations. Consider this example:
fused_comp {
  p = s32[1]{0} parameter(0)
  ...
  ROOT b = s32[1]{0} bitcast(p)
}
fusion = s32[1]{0:S(0)} fusion(s32[1]{0:S(1)} foo), fused_computation=fused_comp
If bitcast doesn't define a value, then either the fusion operand and the parameter,
or the fusion root and the output, would disagree about the memory space.
PiperOrigin-RevId: 359147477
Change-Id: Ie785cdf5cc0baeabe4af0f0ec1882592c86d7254