Imported from GitHub PR https://github.com/openxla/xla/pull/6150
Fix the comments and logs to refer to CudaAsyncAllocator rather than BFCAllocator, which is a synchronous CUDA memory allocator.
Copybara import of the project:
--
5cfd652cc759c28d6f03d42e82d92bf19d0c558a by Jane Liu <janeliu@nvidia.com>:
Correction of CudaAsyncAllocator, not the BFCAllocator
Merging this change closes #6150
PiperOrigin-RevId: 571868157
When evaluating Tuple(), create a blank Literal for the destination instead of one
whose leaves are populated with kUndetermined.
We already copy in all known leaves, so this only avoids creating blank leaf Pieces,
which is very important when dealing with large programs before sharding. Note that
the blank pieces would be thrown away immediately anyway.
PiperOrigin-RevId: 571532410
With larger models becoming more prevalent, it is not inconceivable to run into `MetaGraphDef`s larger than 2 GiB. Protobufs larger than 2 GiB cannot be serialized, making such models impossible to process. This commit removes the `SerializeToString` call and uses pybind to pass the `MetaGraphDef` protobuf from Python to C++. By default, pybind also serializes the protobuf when passing it across the language boundary; using `//third_party/pybind11_protobuf:native_proto_caster` configures pybind to not do any serialization.
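As a sketch of the binding side (module and function names below are illustrative, not the actual TensorFlow binding), an extension built against `pybind11_protobuf` registers the native caster once at module initialization, after which protobuf messages cross the Python/C++ boundary by reference rather than as serialized bytes:

```cpp
// Sketch only: the module and function names are hypothetical. Requires
// a build dependency on //third_party/pybind11_protobuf:native_proto_caster
// (or the equivalent target in an external pybind11_protobuf checkout).
#include "pybind11/pybind11.h"
#include "pybind11_protobuf/native_proto_caster.h"
#include "tensorflow/core/protobuf/meta_graph.pb.h"

namespace py = pybind11;

// Accepts the MetaGraphDef directly. With the native caster imported,
// pybind passes the underlying C++ message without serializing it, so
// protobuf's 2 GiB serialization limit never applies.
void ProcessMetaGraph(const tensorflow::MetaGraphDef& meta_graph_def) {
  // ... operate on meta_graph_def in C++ ...
}

PYBIND11_MODULE(_meta_graph_binding, m) {
  pybind11_protobuf::ImportNativeProtoCasters();  // enable native passing
  m.def("process_meta_graph", &ProcessMetaGraph);
}
```

On the Python side the caller simply passes the message object, e.g. `_meta_graph_binding.process_meta_graph(meta_graph_def)`, with no `SerializeToString` round trip.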
PiperOrigin-RevId: 571446639
This dependence is correct for unpipelined Send and Recv sequences but not for
pipelined ones. We now expect a backend to add control dependences that represent
the intended ordering of Send and Recv, as the GPU HLO scheduler does via the
P2PSchedulePreparation pass.
PiperOrigin-RevId: 571430708
The string "aarch64" does not appear in the list of valid CPUs in
AArch64.td, so it is not a valid processor in
llvm/lib/MC/MCSubtargetInfo.cpp:getFeatures, which reports an
"unrecognized CPU" error and ultimately hits an llvm_unreachable
assertion during code lowering when running tests such as
cpu_eigen_dot_operation_test. A log of the erroneous behavior is below.
This is very similar to the x86_64 machine description, which also does
not list a true "x86_64" processor. Using the empty string as the CPU
type gets translated to "generic", which the rest of the infrastructure
handles properly. Match that behavior for aarch64, which allows the test
to pass.
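The fix can be sketched as follows (the helper below is illustrative, not the actual XLA code): when the requested CPU name is just an architecture name that LLVM does not list as a processor, substitute the empty string so LLVM falls back to its "generic" scheduling model:

```cpp
#include <string>

// Illustrative helper (not the actual XLA function): LLVM's AArch64.td
// and X86.td do not define processors named "aarch64" or "x86_64", but
// an empty CPU string is mapped to "generic", which every backend
// handles. Normalize the pseudo-CPU names to the empty string.
std::string NormalizeCpuName(const std::string& cpu) {
  if (cpu == "aarch64" || cpu == "x86_64") {
    return "";  // LLVM treats "" as the "generic" CPU.
  }
  return cpu;  // Real processor names (e.g. "cortex-a72") pass through.
}
```

Real processor names remain untouched, so targets that do specify a concrete CPU keep their tuned scheduling models.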
Failing log for cpu_eigen_dot_operation test when run on an aarch64 host:
"""
[==========] Running 6 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 6 tests from CpuEigenDotOperationTestInstantiation/CpuEigenDotOperationTest
[ RUN ] CpuEigenDotOperationTestInstantiation/CpuEigenDotOperationTest.SimpleDotOp/F16
'aarch64' is not a recognized processor for this target (ignoring processor)
'aarch64' is not a recognized processor for this target (ignoring processor)
'aarch64' is not a recognized processor for this target (ignoring processor)
'aarch64' is not a recognized processor for this target (ignoring processor)
'aarch64' is not a recognized processor for this target (ignoring processor)
'aarch64' is not a recognized processor for this target (ignoring processor)
Don't know how to custom expand this
UNREACHABLE executed at llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:24052!
"""
PiperOrigin-RevId: 571415910
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/56789
This adds xcframework support for two of the core libraries now that the
public Bazel rules support producing static-framework-based
xcframeworks. It forks the hide-symbols script so it can operate on
multiple frameworks inside a given xcframework. Currently this rules
feature requires Bazel 6.x rolling releases because it uses a new API
from the C++ Starlark migration. That release should be public in the
next few months; for now this doesn't affect 5.x because the new
targets are manual.
Copybara import of the project:
--
5e90cde3004bd8bc38ab8853f01dc4b2fbe91164 by Keith Smiley <keithbsmiley@gmail.com>:
[iOS] Add initial xcframework bazel support
Merging this change closes #56789
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/tensorflow/pull/56789 from keith:ks/ios-add-initial-xcframework-bazel-support 5e90cde3004bd8bc38ab8853f01dc4b2fbe91164
PiperOrigin-RevId: 571405194