Provide an (initially experimental) implementation of the WHATWG Encoding
Standard API (`TextDecoder` and `TextEncoder`). This is the same API
implemented on the browser side.
By default, with small-icu, only the UTF-8, UTF-16le and UTF-16be decoders
are supported. With full-icu enabled, every encoding other than iso-8859-16
is supported.
This provides a basic test, but does not include the full web platform
tests. Note: many of the web platform tests for this would fail by default
because we ship with small-icu by default.
A process warning will be emitted on first use to indicate that the
API is still experimental. No runtime flag is required to use the
feature.
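
A minimal usage sketch, assuming a Node.js release that exports both
classes from `util` (the variable names are illustrative only):

```javascript
// Assumption: TextEncoder and TextDecoder are exported from `util`.
const { TextEncoder, TextDecoder } = require('util');

// TextEncoder always produces UTF-8 bytes in a Uint8Array.
const encoder = new TextEncoder();
const bytes = encoder.encode('\u20ac'); // Euro sign, three bytes in UTF-8

// UTF-8 is one of the decoders listed above as available with small-icu.
const text = new TextDecoder('utf-8').decode(bytes);

// UTF-16LE is also listed as available with small-icu.
const utf16 = new TextDecoder('utf-16le')
  .decode(new Uint8Array([0x68, 0x00, 0x69, 0x00]));

console.log(bytes, text === '\u20ac', utf16);
```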
Refs: https://encoding.spec.whatwg.org/
PR-URL: https://github.com/nodejs/node/pull/13644
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
- Return `MaybeLocal`s from `StringBytes::Encode`
- Add an `error` out parameter to pass JS exceptions to the callers
(instead of directly throwing)
- Simplify some of the string generation methods in `string_bytes.cc`
by unifying the `EXTERN_APEX` logic
- Reduce usage of deprecated V8 APIs.
- Remove error handling logic from JS, the `buffer.*Slice()` methods
now throw errors themselves.
- Left TODO comments for future semver-major error message
improvements.
This paves the way for better error messages coming out of the
StringBytes methods.
Ref: https://github.com/nodejs/node/issues/3175
PR-URL: https://github.com/nodejs/node/pull/12765
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Allow all methods on `buffer` and `Buffer` to take `Uint8Array`
arguments where it makes sense. On the native side, there is
effectively no difference, and as a bonus the `isUint8Array`
check is faster than `instanceof Buffer`.
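
As a rough illustration (not an exhaustive list of the affected
methods), a plain `Uint8Array` can be passed where a `Buffer` would
otherwise be expected:

```javascript
const u8 = new Uint8Array([1, 2, 3]);
const buf = Buffer.from([1, 2, 3]);

// Comparison helpers accept the Uint8Array directly.
const sameBytes = buf.equals(u8);
const order = Buffer.compare(buf, u8);

// Concatenation mixes Buffer and Uint8Array entries.
const joined = Buffer.concat([buf, u8]);

console.log(sameBytes, order, joined.length);
```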
PR-URL: https://github.com/nodejs/node/pull/10236
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Add buffer.transcode(source, from, to) method. Primarily uses ICU
to transcode a buffer's content from one of Node.js' supported
encodings to another.
Originally part of a proposal to add a new unicode module. Decided
to refactor the approach towards individual PRs without a new module.
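
A short sketch of the call, assuming a build with ICU available and
the method exported from the `buffer` module:

```javascript
// Assumption: this build includes ICU, so transcode() is available.
const { transcode } = require('buffer');

const euro = Buffer.from('\u20ac', 'utf8');

// The Euro sign has no ASCII representation, so a substitution
// character ('?', 0x3f) is used in the output.
const asAscii = transcode(euro, 'utf8', 'ascii');

// Transcoding to 'ucs2' yields the two-byte little-endian form.
const asUcs2 = transcode(euro, 'utf8', 'ucs2');

console.log(asAscii, asUcs2);
```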
Refs: https://github.com/nodejs/node/pull/8075
PR-URL: https://github.com/nodejs/node/pull/9038
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Using the black magic of Symbol.toPrimitive, the numeric value of
start/end can be changed when Uint32Value() is called once
Buffer::Fill() is entered, allowing the CHECK() to be bypassed.
The bug report was only for "start", but the same can be done with
"end". Perform checks for both in node::Buffer::Fill() to make sure the
issue can't be triggered, even if process.binding is used directly.
Include tests for each case, along with a check to make sure the last
time the value is accessed returns -1. This should be enough to make
sure Buffer::Fill() is receiving the correct value. Also include two
tests against process.binding directly.
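
The underlying problem can be sketched in plain JavaScript. This is
not the code from node::Buffer::Fill(); `shifty`, `naiveStart` and
`checkedStart` are hypothetical helpers, and `>>> 0` stands in for the
native Uint32Value() coercion:

```javascript
// Hypothetical object whose numeric value changes on every coercion.
function shifty(values) {
  let i = 0;
  return {
    [Symbol.toPrimitive]() {
      return values[Math.min(i++, values.length - 1)];
    }
  };
}

// Flawed pattern: the value is coerced once for the bounds check and
// coerced again when it is used, so the two reads can disagree.
function naiveStart(bytes, start) {
  if ((start >>> 0) > bytes.length) throw new RangeError('out of range');
  return start >>> 0; // second coercion, may exceed bytes.length
}

// Safer pattern: coerce exactly once, then check and use that number.
function checkedStart(bytes, start) {
  const s = start >>> 0;
  if (s > bytes.length) throw new RangeError('out of range');
  return s;
}

const bytes = new Uint8Array(4);
const bypassed = naiveStart(bytes, shifty([1, 100]));
const safe = checkedStart(bytes, shifty([1, 100]));
console.log(bypassed, safe);
```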
Fixes: https://github.com/nodejs/node/issues/9149
PR-URL: https://github.com/nodejs/node/pull/9174
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Franziska Hinkelmann <ranziska.hinkelmann@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
malloc(0) and realloc(ptr, 0) have implementation-defined behavior in
that the standard allows them to either return a unique pointer or a
nullptr for zero-sized allocation requests. Normalize by always using
a nullptr.
- Introduce node::malloc, node::realloc and node::calloc that should
be used throughout our source.
- Update all existing node source files to use the new functions
instead of the native allocation functions.
Fixes: https://github.com/nodejs/node/issues/7549
PR-URL: https://github.com/nodejs/node/pull/7564
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Fix handle leaks in Buffer::New() and Buffer::Copy() by creating the
handle scope before looking up the env with Environment::GetCurrent().
Environment::GetCurrent() calls v8::Isolate::GetCurrentContext(), which
creates a handle in the current scope, i.e., the scope created by the
caller of Buffer::New() or Buffer::Copy().
PR-URL: https://github.com/nodejs/node/pull/7711
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
`offset` is a user-supplied variable and may be bigger than
`ts_obj_length`. There is no need to subtract them and pass along, so
just throw when the subtraction result would overflow.
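
A rough JavaScript analogy of the unsigned wrap-around, not the C++
from this change. `remainingUnchecked` and `remainingChecked` are
hypothetical names, and `>>> 0` emulates a 32-bit unsigned result
(the native type may be wider):

```javascript
// Unchecked: an offset past the end wraps around to a huge value.
function remainingUnchecked(length, offset) {
  return (length - offset) >>> 0;
}

// Checked: refuse the subtraction when it would overflow.
function remainingChecked(length, offset) {
  if (offset > length) throw new RangeError('offset out of bounds');
  return length - offset;
}

const wrapped = remainingUnchecked(4, 10);
const fine = remainingChecked(10, 4);
console.log(wrapped, fine);
```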
PR-URL: https://github.com/nodejs/node/pull/7494
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
* Speed up buffer.swap16 and swap32 by using builtins. Up to ~6x gain.
Drop transition point between JS and C++ implementations accordingly.
Amount of performance improvement not only depends on buffer size but
also memory alignment.
* Fix tests: C++ impl tests were testing 0-filled buffers so were
always passing.
* Add similar buffer.swap64 method.
* Make buffer-swap benchmark mirror JS impl.
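
For reference, a small sketch of what the three methods do to the same
eight bytes. Each call swaps in place and returns the buffer, so a
fresh copy is used for every case:

```javascript
const input = [1, 2, 3, 4, 5, 6, 7, 8];

// Byte order is reversed within each 2-, 4- or 8-byte group.
const s16 = Buffer.from(input).swap16().toString('hex');
const s32 = Buffer.from(input).swap32().toString('hex');
const s64 = Buffer.from(input).swap64().toString('hex');

console.log(s16); // 0201040306050807
console.log(s32); // 0403020108070605
console.log(s64); // 0807060504030201
```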
doc/api/buffer.markdown has an entry of "added: REPLACEME" that should
be changed to the correct release number before the release is tagged.
Because node is currently using a very old version of cpplint.py it
doesn't know that std::swap() has moved from <algorithm> to <utility> in
c++11. So until cpplint.py is updated simply NOLINT the line.
Technically it should be NOLINT(build/include_what_you_use), but that
puts the line over 80 characters causing another lint error.
PR-URL: https://github.com/nodejs/node/pull/7157
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
When node began using the OneByte API (f150d56) it also switched to
officially supporting ISO-8859-1, though at the time no new encoding
string was introduced.
Introduce the new encoding string 'latin1' to be more explicit. The
previous 'binary' is kept and documented as an alias to 'latin1'. While many
tests have switched to use 'latin1', there are still plenty that do both
'binary' and 'latin1' checks side-by-side to ensure there is no
regression.
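
A quick sketch of the alias behaviour described above:

```javascript
// U+00E9 fits in a single ISO-8859-1 byte (0xe9).
const viaLatin1 = Buffer.from('\u00e9', 'latin1');
const viaBinary = Buffer.from('\u00e9', 'binary');

// Both encoding strings produce the same bytes and decode the same way.
const sameBytes = viaLatin1.equals(viaBinary);
const roundTrip = viaLatin1.toString('latin1');

console.log(viaLatin1, sameBytes, Buffer.isEncoding('latin1'));
```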
PR-URL: https://github.com/nodejs/node/pull/7111
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Improves performance of allocating unsafe buffers, creating buffers from
an existing ArrayBuffer and creating .slice(...) from existing Buffer by
avoiding deoptimizing change of prototype after Uint8Array allocation
in favor of ES6 native subclassing.
This is done through an internal ES6 class that extends Uint8Array and
is used for allocations, but the regular Buffer function is exposed, so
calling Buffer(...) with or without `new` continues to work as usual
and prototype chains are also preserved.
Performance wins for .slice are +120% (2.2x), and, consequently, for
unsafe allocations up to +95% (1.9x) for small buffers, and for safe
allocations (zero-filled) up to +30% (1.3x).
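
The pattern can be sketched roughly as follows. `FastBytes` and
`Bytes` are hypothetical stand-ins, not the names used in lib/buffer.js,
and the real constructor does far more argument handling:

```javascript
// Internal class: instances get the right prototype at allocation
// time, so no prototype change is needed afterwards.
class FastBytes extends Uint8Array {}

// Public function: callable with or without `new`.
function Bytes(...args) {
  return new FastBytes(...args);
}

// Share one prototype so instanceof and .constructor keep working.
Bytes.prototype = FastBytes.prototype;
FastBytes.prototype.constructor = Bytes;

const a = Bytes(4);
const b = new Bytes(4);
console.log(a instanceof Bytes, b instanceof Uint8Array, a.length);
```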
PR-URL: https://github.com/nodejs/node/pull/6893
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Remove the direct dependency on node::Environment (which is per-context)
from node::ArrayBufferAllocator (which is per-isolate.)
Contexts that want to toggle the zero fill flag now do so through a
field that is owned by ArrayBufferAllocator. Better, still not great.
PR-URL: https://github.com/nodejs/node/pull/7082
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Fix `buffer.indexOf` for the case that the haystack has odd length
and the needle is not found in it. `StringSearch()` would return
the length of the buffer in multiples of `sizeof(uint16_t)`, but
checking that against `haystack_length` would not work if the latter
one was odd.
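
A sketch of the affected case, assuming the fixed behaviour: a
two-byte-per-character needle searched in a haystack with an odd
number of bytes.

```javascript
// Five bytes of 'a'; the UCS-2 needle 'b' (62 00) is not present.
const oddMiss = Buffer.from('aaaaa').indexOf('b', 0, 'ucs2');

// For comparison, a hit in a UCS-2 haystack reports a byte offset.
const hit = Buffer.from('ab', 'ucs2').indexOf('b', 0, 'ucs2');

console.log(oddMiss, hit);
```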
PR-URL: https://github.com/nodejs/node/pull/6511
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Use `StringBytes::Size` to determine the needle string length
instead of assuming latin-1 or UTF-8.
Previously, `Buffer.indexOf` could fail with an assertion failure
when the needle's byte length, but not its character count,
exceeded the haystack's byte length.
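
A sketch of the kind of input described, assuming the fixed behaviour:
the needle has fewer characters than the haystack has bytes, but more
bytes once encoded.

```javascript
const haystack = Buffer.from('abc');          // 3 bytes
const needleBytes = Buffer.byteLength('ab', 'ucs2'); // 2 chars, 4 bytes

// The needle cannot fit, so the search reports "not found".
const result = haystack.indexOf('ab', 0, 'ucs2');

console.log(needleBytes, result);
```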
PR-URL: https://github.com/nodejs/node/pull/6511
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
* Remove unnecessary templating from SearchString
SearchString used to have separate PatternChar and SubjectChar template type
arguments, apparently to support things like searching for an 8-bit string
inside a 16-bit string or vice versa. However, SearchString is only used from
node_buffer.cc, where PatternChar and SubjectChar are always the same. Since
this is extra complexity that's unused and untested (simplifying to a single
Char template argument still compiles and didn't break any unit tests), I
removed it.
* Use Boyer-Moore[-Horspool] for both indexOf and lastIndexOf
Add test cases for lastIndexOf. Test the fallback from BMH to
Boyer-Moore, which looks like it was totally untested before.
* Extra bounds checks in node_buffer.cc
* Extra asserts in string_search.h
* Buffer.lastIndexOf: clean up, enforce consistency w/ String.lastIndexOf
* Polyfill memrchr(3) for non-GNU systems
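
A small sketch of the consistency with String.lastIndexOf mentioned in
the list above (ASCII input, so byte and character offsets coincide):

```javascript
const text = 'this buffer is a buffer';
const buf = Buffer.from(text);

// Same answers as the String method for plain ASCII text.
const last = buf.lastIndexOf('buffer');
const lastStr = text.lastIndexOf('buffer');

// A byteOffset limits where the search may start, scanning backwards.
const bounded = buf.lastIndexOf('buffer', 5);
const missed = buf.lastIndexOf('buffer', 4);

console.log(last, lastStr, bounded, missed);
```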
PR-URL: https://github.com/nodejs/node/pull/4846
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Recent phantom weakness API changes to buffer, ebbbc5a, ended up
introducing an alignment restriction on the native buffer pointers.
It turns out that there are uses in the modules ecosystem that rely
on the ability to create buffers with unaligned pointers (e.g.
node-ffi).
It turns out there is a simpler solution possible here. As a side
effect this also removes the need to have to reserve the first
internal field on buffers.
PR-URL: https://github.com/nodejs/node/pull/5752
Reviewed-By: trevnorris - Trevor Norris <trev.norris@gmail.com>
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
Several changes:
* Soft-Deprecate Buffer() constructors
* Add `Buffer.from()`, `Buffer.alloc()`, and `Buffer.allocUnsafe()`
* Add `--zero-fill-buffers` command line option
* Add byteOffset and length to `new Buffer(arrayBuffer)` constructor
* buffer.fill('') previously had no effect, now zero-fills
* Update the docs
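
A brief sketch of the new factory methods listed above:

```javascript
// Zero-filled allocation.
const zeroed = Buffer.alloc(4);

// Uninitialized allocation; contents are arbitrary until overwritten.
const scratch = Buffer.allocUnsafe(4).fill(0);

// Copy from an array of bytes.
const copied = Buffer.from([1, 2, 3]);

// View over part of an ArrayBuffer (byteOffset 2, length 4).
// The memory is shared, not copied.
const ab = new ArrayBuffer(8);
const view = Buffer.from(ab, 2, 4);
view[0] = 0xff;
const shared = new Uint8Array(ab)[2];

console.log(zeroed, scratch, copied, view.length, shared);
```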
PR-URL: https://github.com/nodejs/node/pull/4682
Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Old style SetWeak is now deprecated, and weakness now works like
phantom references. This means we no longer have a reference to the
object in the weak callback. We use a kInternalFields style weak
callback which provides us with the contents of 2 internal fields
where we can squirrel away the native buffer pointer.
We can no longer neuter the buffer in the weak callback, but that
should be unnecessary as the object is going to be GC'd during the
current gc cycle.
PR-URL: https://github.com/nodejs/node/pull/5204
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: indutny - Fedor Indutny <fedor.indutny@gmail.com>
Dynamic checks that CallbackInfo holds an ArrayBuffer handle can be
converted into compiler enforced checks. Removed unused code, and
other minor cleanup.
PR-URL: https://github.com/nodejs/node/pull/5204
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: indutny - Fedor Indutny <fedor.indutny@gmail.com>
Can now call fill() using following parameters if value is a String:
fill(string[, start[, end]][, encoding])
And with the following if value is a Buffer:
fill(buffer[, start[, end]])
The encoding is ignored if value is not a String. All other non-Buffer
values are coerced to a uint32.
A multibyte string will simply be copied into the Buffer until the
number of bytes runs out, meaning partial strings can be left behind:
Buffer(3).fill('\u0222');
// returns: <Buffer c8 a2 c8>
In some encoding cases, such as 'hex', fill() will throw if the input
string is not valid.
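
A sketch of the call forms described above, written against
`Buffer.alloc()` rather than the older `Buffer(n)` constructor:

```javascript
// Multibyte string: the two-byte sequence c8 a2 is cut off mid-way.
const partial = Buffer.alloc(3).fill('\u0222').toString('hex');

// fill(string, start, end): only bytes 1 and 2 are written.
const ranged = Buffer.alloc(4).fill('ab', 1, 3).toString('hex');

// fill(string, encoding): the string is decoded as hex before filling.
const hex = Buffer.alloc(4).fill('abcd', 'hex').toString('hex');

// fill(buffer): the source bytes repeat until the target is full.
const fromBuf = Buffer.alloc(5).fill(Buffer.from([1, 2])).toString('hex');

console.log(partial, ranged, hex, fromBuf);
```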
PR-URL: https://github.com/nodejs/node/pull/4935
Reviewed-By: James M Snell <jasnell@gmail.com>
If the needle contains an extended latin-1 character then using
String::Utf8Length() will be too large and the search will return early.
Instead use String::Length() when encoding is BINARY.
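
A sketch of the case in question, assuming the fixed behaviour and
using the later 'latin1' name for the BINARY encoding:

```javascript
// U+00E9 is one byte in latin-1 but two bytes (c3 a9) in UTF-8.
const haystack = Buffer.from('abc\u00e9', 'latin1'); // 61 62 63 e9

// Searched as latin-1, the needle is the single byte e9.
const found = haystack.indexOf('\u00e9', 0, 'latin1');

// Searched as UTF-8, the needle is c3 a9, which is not present.
const notFound = haystack.indexOf('\u00e9', 0, 'utf8');

console.log(found, notFound);
```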
PR-URL: https://github.com/nodejs/node/pull/4803
Reviewed-By: James M Snell <jasnell@gmail.com>
Versions of Node.js after v0.12 have relocated byte-swapping away from
the StringBytes::Encode function, thereby causing a nan test (which
accesses this function directly) to fail on big-endian machines.
This change re-introduces byte swapping in StringBytes::Encode,
done via a call to a function in util-inl. Another change in
NodeBuffer::StringSlice was necessary to avoid double byte swapping
in big-endian function calls to StringSlice.
PR-URL: https://github.com/nodejs/node/pull/3410
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>