62981 Commits

Michael Paquier
8e0d32a4a1 Fix orphaned origin in shared memory after DROP SUBSCRIPTION
Since ce0fdbfe97, a replication slot and an origin are created by each
tablesync worker, whose information is stored both in a catalog and in
shared memory (once the origin is set up, in the latter case).  The
transaction where the origin is created is the same as the one that runs
the initial COPY, so the catalog state of the origin becomes visible to
other sessions only once the COPY transaction has committed.  The
catalog state is coupled with a state in shared memory, initialized at
the same time as the origin is created in the catalogs.  Note that the
transaction doing the initial data sync can take a long time, depending
on the amount of data to transfer from a publication node to its
subscriber node.

Now, when a DROP SUBSCRIPTION is executed, all its workers are stopped
and their origins removed.  The removal of each origin relies on a
catalog lookup.  A worker still running the initial COPY would fail its
transaction, rolling back the catalog state of the origin while the
shared memory state remains around.  The session running the DROP
SUBSCRIPTION should be in charge of cleaning up both the catalog and the
shared memory state, but as the catalogs hold no trace of the origin,
the shared memory state is not removed.  This issue would leave orphaned
origin data in shared memory, leading to a confusing state as it would
still show up in pg_replication_origin_status.  Note that this shared
memory data is sticky, being flushed to disk by replorigin_checkpoint at
each checkpoint.  This prevents other origins from reusing a slot
position in the shared memory data.

To address this problem, this commit moves the creation of the origin
to the end of the transaction that precedes the one executing the
initial COPY, making the origin immediately visible in the catalogs to
other sessions and giving DROP SUBSCRIPTION a way to know about it.  A
different solution would have been to clean up the shared memory state
with an abort callback within the tablesync worker.  The solution of
this commit is more consistent with the apply worker, which creates its
origin in a short transaction.
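
To illustrate the ordering, here is a minimal sketch using the
replication origin API (the actual tablesync code is more involved, and
copy_table() stands in here for the initial data sync):

    StartTransactionCommand();
    replorigin_create(originname);  /* creates the origin's catalog row */
    CommitTransactionCommand();     /* state now visible to other sessions */

    StartTransactionCommand();
    copy_table(rel);                /* potentially long initial COPY */
    CommitTransactionCommand();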

A test is added in the subscription test 004_sync.pl, which is able to
demonstrate the problem.  The test fails when this commit is reverted.

Reported-by: Tenglong Gu <brucegu@amazon.com>
Reported-by: Daisuke Higuchi <higudai@amazon.com>
Analyzed-by: Michael Paquier <michael@paquier.xyz>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aUTekQTg4OYnw-Co@paquier.xyz
Backpatch-through: 14
2025-12-23 14:32:14 +09:00
Bruce Momjian
ea97154fc2 doc: add "DO" to "ON CONFLICT" in CREATE VIEW text
This is done for consistency.

Reported-by: jian he

Author: Laurenz Albe

Discussion: https://postgr.es/m/CACJufxEW1RRDD9ZWGcW_Np_Z9VGPE-YC7u0C6RcsEY8EKiTdBg@mail.gmail.com
2025-12-22 19:42:11 -05:00
Michael Paquier
e5f3839af6 Switch buffile.c/h to use pgoff_t instead of off_t
off_t was previously used for offsets; it is 4 bytes on Windows, hence
imposing a hard 2GB limit on file offsets in the backend code.  This
change also leads to some simplification in these files, removing some
casts based on long, likewise 4 bytes on Windows.
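
As a standalone illustration of the width difference this works around
(my_pgoff_t below is an assumed stand-in, not PostgreSQL's actual
definition):

    #include <stdio.h>
    #include <sys/types.h>

    /* off_t is 4 bytes on Windows, capping offsets at 2GB; an explicitly
     * 64-bit type has no such limit. */
    #ifdef _WIN32
    typedef long long my_pgoff_t;
    #else
    typedef off_t my_pgoff_t;
    #endif

    int
    main(void)
    {
        printf("off_t: %zu bytes, my_pgoff_t: %zu bytes\n",
               sizeof(off_t), sizeof(my_pgoff_t));
        return 0;
    }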

This commit removes one comment introduced in db3c4c3a2d, not relevant
anymore as pgoff_t is a safe 8-byte alternative on Windows.

This change is surprisingly not invasive, as the callers of
BufFileTell(), BufFileSeek() and BufFileTruncateFileSet() (worker.c,
tuplestore.c, etc.) track offsets in local structures that, for the most
part, just need to switch from off_t to pgoff_t.

The file still relies on a maximum file size of MAX_PHYSICAL_FILESIZE
(1GB).  This change allows the code to make this maximum larger in the
future, potentially on demand.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aUStrqoOCDRFAq1M@paquier.xyz
2025-12-23 07:41:34 +09:00
Masahiko Sawada
c6a7d3bab4 psql: Improve tab completion for COPY option lists.
Previously, only the first option in a parenthesized option list was
suggested by tab completion. This commit enhances tab completion for
both COPY TO and COPY FROM commands to suggest options after each
comma.

Also add completion for HEADER and FREEZE option value candidates.

Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20250605100835.b396f9d656df1018f65a4556@sraoss.co.jp
2025-12-22 14:28:12 -08:00
Tom Lane
37a1688a1b Add missing .gitignore for src/test/modules/test_cloexec. 2025-12-22 14:06:54 -05:00
Fujii Masao
c5d162435a doc: Fix incorrect reference in pg_overexplain documentation.
Correct the referenced location of the RangeTblEntry definition
in the pg_overexplain documentation.

Backpatched to v18, where pg_overexplain was introduced.

Author: Julien Tachoires <julien@tachoires.me>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20251218092319.tht64ffmcvzqdz7u@poseidon.home.virt
Backpatch-through: 18
2025-12-22 17:56:28 +09:00
Michael Paquier
d31276f0e2 Fix another typo in gininsert.c
Reported-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAHewXNkRJ9DMFZMQKWQ32U+OTBR78KeGh2=9Wy5jEeWDxMVFcQ@mail.gmail.com
2025-12-22 12:38:40 +09:00
Peter Geoghegan
fab5cd3dd1 Remove obsolete name_ops index-only scan comments.
nbtree index-only scans of an index that uses btree/name_ops as one of
its index columns' input opclasses are no longer at any risk of reading
past the end of currTuples.  We're no longer reliant on such scans being
able to at least read from the start of markTuples storage (which uses
space from the same allocation as currTuples) to avoid a segfault:
StoreIndexTuple (from nodeIndexonlyscan.c) won't actually read past the
end of a cstring datum from a name_ops index.  In other words, we
already have the "special-case treatment for name_ops" that the removed
comment supposed we could avoid by relying on markTuples in this way.

Oversight in commit a63224be49, which added special case handling of
name_ops cstrings to StoreIndexTuple, but missed these comments.
2025-12-21 12:27:38 -05:00
Thomas Munro
bec2a0aa30 Clean up test_cloexec.c and Makefile.
An unused variable caused a compiler warning on BF animal fairywren, an
snprintf() call was redundant, and some buffer sizes were inconsistent.
Per code review from Tom Lane.

The Makefile's test ifeq ($(PORTNAME), win32) never succeeded due to a
circularity, so only Meson builds were actually compiling the new test
code, which partially explains why CI didn't report the warning sooner
(the other reason being that CompilerWarnings only builds world-bin, a
problem for another commit).  Simplify.

Backpatch-through: 16, like commit c507ba55
Author: Bryan Green <dbryan.green@gmail.com>
Co-authored-by: Thomas Munro <tmunro@gmail.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1086088.1765593851%40sss.pgh.pa.us
2025-12-21 17:34:20 +13:00
Andres Freund
548de59d93 heapam: Move logic to handle HEAP_MOVED into a helper function
Previously this was dealt with in six near-identical copies and one very
similar one.

The helper function errors out when encountering a
HEAP_MOVED_IN/HEAP_MOVED_OFF tuple with xvac considered current or
in-progress. It would be preferable to make that change separately, but
otherwise it would not be possible to deduplicate the handling in
HeapTupleSatisfiesVacuum().
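
A hedged sketch of the shape such a helper can take; the function name
and message below are illustrative, not the committed code (HEAP_MOVED
covers both flag bits):

    #include "postgres.h"
    #include "access/htup_details.h"
    #include "access/xact.h"
    #include "storage/procarray.h"

    static void
    heap_check_moved(HeapTupleHeader tuple)     /* hypothetical name */
    {
        if (tuple->t_infomask & HEAP_MOVED)
        {
            TransactionId xvac = HeapTupleHeaderGetXvac(tuple);

            if (TransactionIdIsCurrentTransactionId(xvac) ||
                TransactionIdIsInProgress(xvac))
                elog(ERROR, "HEAP_MOVED tuple with current or in-progress xvac %u",
                     xvac);
        }
    }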

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/lxzj26ga6ippdeunz6kuncectr5gfuugmm2ry22qu6hcx6oid6@lzx3sjsqhmt6
Discussion: https://postgr.es/m/6rgb2nvhyvnszz4ul3wfzlf5rheb2kkwrglthnna7qhe24onwr@vw27225tkyar
2025-12-19 13:28:34 -05:00
Andres Freund
09ae2c8bac bufmgr: Optimize & harmonize LockBufHdr(), LWLockWaitListLock()
The main optimization is for LockBufHdr() to delay initializing
SpinDelayStatus, similar to what LWLockWaitListLock already did. The
initialization is sufficiently expensive, and buffer header lock
acquisitions sufficiently frequent, that it is worthwhile to instead
have a fastpath (via a likely() branch) that does not initialize the
SpinDelayStatus.

While LWLockWaitListLock() already had the aforementioned optimization,
it did not use likely(), and inspection of the assembly shows that this
indeed leads to worse code generation (also observed in a
microbenchmark). Fix that by adding the likely().
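
A standalone sketch of the fastpath pattern (the names and the lock bit
are illustrative, not bufmgr.c's actual code):

    #include <stdatomic.h>

    #define likely(x)   __builtin_expect((x) != 0, 1)
    #define LOCK_BIT    (1u << 31)

    static unsigned
    lock_hdr(atomic_uint *state)
    {
        unsigned    old = atomic_load_explicit(state, memory_order_relaxed);

        /* Fastpath: uncontended acquisition, no delay bookkeeping at all. */
        if (likely((old & LOCK_BIT) == 0 &&
                   atomic_compare_exchange_strong(state, &old, old | LOCK_BIT)))
            return old | LOCK_BIT;

        /* Slow path: a real implementation would initialize its
         * SpinDelayStatus equivalent here and back off while spinning. */
        for (;;)
        {
            old = atomic_load_explicit(state, memory_order_relaxed);
            if ((old & LOCK_BIT) == 0 &&
                atomic_compare_exchange_weak(state, &old, old | LOCK_BIT))
                return old | LOCK_BIT;
        }
    }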

While the LockBufHdr() improvement is a small gain on its own, it mainly is
aimed at preventing a regression after a future commit, which requires
additional locking to set hint bits.

While touching both, also make the comments more similar to each other.

Reviewed-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-12-19 13:23:33 -05:00
Bruce Momjian
80f08a6e6a doc: clarify when physical/logical replication is used
The imprecision caused some text to be only partially accurate.

Reported-by: Paul A Jungwirth

Author: Robert Treat

Discussion: https://postgr.es/m/CA%2BrenyULt3VBS1cRFKUfT2%3D5dr61xBOZdAZ-CqX3XLGXqY-aTQ%40mail.gmail.com
2025-12-19 12:01:23 -05:00
Heikki Linnakangas
47a9f61fca Use proper type for RestoreTransactionSnapshot's PGPROC arg
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/08cbaeb5-aaaf-47b6-9ed8-4f7455b0bc4b@iki.fi
2025-12-19 13:40:02 +02:00
John Naylor
44f49511b7 Update pg_hba.conf example to reflect MD5 deprecation
In the wake of commit db6a4a985, remove most use of 'md5' from the
example configuration file. The only remainder is an example exception
for a client that doesn't support SCRAM.

Author: Mikael Gustavsson <mikael.gustavsson@smhi.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/176595607507.978865.11597773194269211255@wrigleys.postgresql.org
Discussion: https://postgr.es/m/4ed268473fdb4cf9b0eced6c8019d353@smhi.se
Backpatch-through: 18
2025-12-19 15:48:18 +07:00
Michael Paquier
5cdbec5aa9 Fix typos in gininsert.c
Introduced by 8492feb98f.

Author: Xingbin She <xingbin.she@qq.com>
Discussion: https://postgr.es/m/tencent_C254AE962588605F132DB4A6F87205D6A30A@qq.com
2025-12-19 14:33:38 +09:00
Fujii Masao
b3ccb0a2cb Add guard to prevent recursive memory context logging.
Previously, if memory context logging was triggered repeatedly and
rapidly while a previous request was still being processed, it could
result in recursive calls to ProcessLogMemoryContextInterrupt().
This could lead to infinite recursion and potentially crash the process.

This commit adds a guard to prevent such recursion.
If ProcessLogMemoryContextInterrupt() is already in progress and
logging memory contexts, subsequent calls will exit immediately,
avoiding unintended recursive calls.

While this scenario is unlikely in practice, it's not impossible.
This change adds a safety check to prevent such failures.
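
A hedged sketch of the guard; the in_progress flag below is
illustrative, with only the overall shape taken from the description
above:

    void
    ProcessLogMemoryContextInterrupt(void)
    {
        static bool in_progress = false;

        LogMemoryContextPending = false;

        if (in_progress)
            return;             /* a previous call is still logging */
        in_progress = true;

        PG_TRY();
        {
            /* ... existing code that walks and logs the contexts ... */
        }
        PG_FINALLY();
        {
            in_progress = false;
        }
        PG_END_TRY();
    }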

Back-patch to v14, where memory context logging was introduced.

Reported-by: Robert Haas <robertmhaas@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Artem Gavrilov <artem.gavrilov@percona.com>
Discussion: https://postgr.es/m/CA+TgmoZMrv32tbNRrFTvF9iWLnTGqbhYSLVcrHGuwZvCtph0NA@mail.gmail.com
Backpatch-through: 14
2025-12-19 12:05:37 +09:00
Michael Paquier
9d0f7996e5 Use table/index_close() more consistently
All the code paths updated here have been using relation_close() to
close a relation that has already been opened with table_open() or
index_open(), where a relkind check is enforced.

table_close() and index_close() do the same thing as relation_close(),
so there was no harm, but being inconsistent could lead to issues if the
internals of these close() functions begin to introduce some logic
specific to each relkind in the future.
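
A minimal sketch of the pairing this commit standardizes on:

    Relation    rel = table_open(relid, AccessShareLock);

    /* ... work with the relation ... */

    table_close(rel, AccessShareLock);  /* rather than relation_close() */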

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aUKamYGiDKO6byp5@ip-10-97-1-34.eu-west-3.compute.internal
2025-12-19 07:55:58 +09:00
Noah Misch
d49936f302 Sort DO_SUBSCRIPTION_REL dump objects independent of OIDs.
Commit 0decd5e89d missed
DO_SUBSCRIPTION_REL, leading to assertion failures.  In the unlikely use
case of diffing "pg_dump --binary-upgrade" output, spurious diffs were
possible.  As part of fixing that, align the DumpableObject naming and
sort order with DO_PUBLICATION_REL.  The overall effect of this commit
is to change sort order from (subname, srsubid) to (rel, subname).
Since DO_SUBSCRIPTION_REL is only for --binary-upgrade, accept that
larger-than-usual dump order change.  Back-patch to v17, where commit
9a17be1e24 introduced DO_SUBSCRIPTION_REL.

Reported-by: vignesh C <vignesh21@gmail.com>
Author: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CALDaNm2x3rd7C0_HjUpJFbxpAqXgm=QtoKfkEWDVA8h+JFpa_w@mail.gmail.com
Backpatch-through: 17
2025-12-18 10:23:47 -08:00
Heikki Linnakangas
951b60f7ab Do not emit WAL for unlogged BRIN indexes
Operations on unlogged relations should not be WAL-logged. The
brin_initialize_empty_new_buffer() function didn't get the memo.

The function is only called when a concurrent update to a brin page
uses up space that we're just about to insert to, which makes it
pretty hard to hit. If you do manage to hit it, a full-page WAL record
is erroneously emitted for the unlogged index. If you then crash,
crash recovery will fail on that record with an error like this:

    FATAL:  could not create file "base/5/32819": File exists
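
A minimal sketch of the class of guard used elsewhere in the tree for
such cases (the committed fix may differ in detail):

    /* Only WAL-log the new page when the relation is WAL-logged at all. */
    if (RelationNeedsWAL(idxrel))
        log_newpage_buffer(buffer, true);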

Author: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/CALdSSPhpZXVFnWjwEBNcySx_vXtXHwB2g99gE6rK0uRJm-3GgQ@mail.gmail.com
Backpatch-through: 14
2025-12-18 15:08:48 +02:00
Amit Kapila
b47c50e566 Fix intermittent BF failure in 040_standby_failover_slots_sync.
Commit 0d2d4a0ec3 introduced a test that verifies replication slot
synchronization to a standby server via SQL API. However, the test did not
configure synchronized_standby_slots. Without this setting, logical
failover slots can advance beyond the physical replication slot, causing
intermittent synchronization failures.

Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/TY4PR01MB16907DF70205308BE918E0D4494ABA@TY4PR01MB16907.jpnprd01.prod.outlook.com
2025-12-18 05:06:55 +00:00
Michael Paquier
5cf03552fb btree_gist: Fix memory allocation formula
This change was suggested by the two authors listed in this commit, each
of them providing an incomplete solution (David's formula relied on a
"bytea *", while Bertrand's did not use palloc_array()).  The solution
of this commit uses GBT_VARKEY instead of the inconsistent bytea for the
allocation size, with a palloc_array().

In passing, the change related to Vsrt is flipped to a more consistent
style.
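
A hypothetical fragment of the resulting style, with the allocation
sized by the stored element type rather than by bytea:

    GBT_VARKEY **entries = palloc_array(GBT_VARKEY *, nentries);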

Author: David Geier <geidav.pg@gmail.com>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
Discussion: https://postgr.es/m/aTrG3Fi4APtfiCvQ@ip-10-97-1-34.eu-west-3.compute.internal
2025-12-18 11:01:43 +09:00
Michael Paquier
167cb26718 Fix const correctness in pgstat data serialization callbacks
4ba012a8ed defined the "header" (pointer to the stats data) of
from_serialized_data() as a const, even though it is fine (and
expected!) for the callback to modify the shared memory entry when
loading the stats at startup.

While on it, this commit updates the callback to_serialized_data() in
the test module test_custom_stats to make the data extracted from the
"header" parameter a const since it should never be modified: the stats
are written to disk and no modifications are expected in the shared
memory entry.

This clarifies the API contract of these new callbacks.

Reported-By: Peter Eisentraut <peter@eisentraut.org>
Author: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/d87a93b0-19c7-4db6-b9c0-d6827e7b2da1@eisentraut.org
2025-12-18 07:33:40 +09:00
Jacob Champion
ab8af1db43 oauth_validator: Avoid races in log_check()
Commit e0f373ee4 fixed up races in Cluster::connect_fails when using
log_like. t/002_client.pl didn't get the memo, though, because it
doesn't use Test::Cluster to perform its custom hook tests. (This is
probably not an issue at the moment, since the log check is only done
after authentication success and not failure, but there's no reason to
wait for someone to hit it.)

Introduce the fix, based on debug2 logging, to its use of log_check() as
well, and move the logic into the test() helper so that any additions
don't need to continually duplicate it.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com
Backpatch-through: 18
2025-12-17 11:46:05 -08:00
Jacob Champion
781ca72139 libpq-oauth: use correct c_args in meson.build
Copy-paste bug from b0635bfda: libpq-oauth.so was being built with
libpq_so_c_args, rather than libpq_oauth_so_c_args. (At the moment, the
two lists are identical, but that won't be true forever.)

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com
Backpatch-through: 18
2025-12-17 11:46:01 -08:00
Jacob Champion
8b217c96ea libpq-fe.h: Don't claim SOCKTYPE in the global namespace
The definition of PGoauthBearerRequest uses a temporary SOCKTYPE macro
to hide the difference between Windows and Berkeley socket handles,
since we don't surface pgsocket in our public API. This macro doesn't
need to escape the header, because implementers will choose the correct
socket type based on their platform, so I #undef'd it immediately after
use.

I didn't namespace that helper, though, so if anyone else needs a
SOCKTYPE macro, libpq-fe.h will now unhelpfully get rid of it. This
doesn't seem too far-fetched, given its proximity to existing POSIX
macro names.

Add a PQ_ prefix to avoid collisions, update and improve the surrounding
documentation, and backpatch.
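
The resulting pattern looks roughly like this (the Windows-side type is
an assumption for illustration, not quoted from libpq-fe.h):

    #include <stdint.h>

    #ifdef _WIN32
    #define PQ_SOCKTYPE uintptr_t   /* wide enough for a Windows SOCKET */
    #else
    #define PQ_SOCKTYPE int         /* Berkeley socket descriptor */
    #endif

    typedef struct
    {
        PQ_SOCKTYPE altsock;        /* hypothetical field using the macro */
    } ExampleBearerRequest;

    #undef PQ_SOCKTYPE              /* namespaced, still scoped to the header */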

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com
Backpatch-through: 18
2025-12-17 11:45:52 -08:00
Tom Lane
5b4fb2b97d Rename regress.so's .mo file to postgresql-regress-VERSION.mo.
I originally used just "regress-VERSION.mo", but that seems too
generic considering that some packagers will put this file into
a system-wide directory.  Per suggestion from Christoph Berg.

Reported-by: Christoph Berg <myon@debian.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aULSW7Xqx5MqDW_1@msg.df7cb.de
2025-12-17 14:10:42 -05:00
Heikki Linnakangas
5cbaa00592 Make postmaster 003_start_stop.pl test less flaky
The test is very sensitive to how backends start and exit, because it
tests dead-end backends which occur when all the connection slots are
in use. The test failed occasionally in the CI when the backend launched
for the raw_connect_works() check lingered for a while and exited only
later during the test. When it exited, it released a connection slot
while the test expected all the slots to be in use at that time.

The 002_connection_limits.pl test had a similar issue: if the backend
launched for safe_psql() in the test initialization lingers around, it
uses up a connection slot during the test, messing up the test's
connection counting. I haven't seen that in the CI, but when I added a
"sleep(1);" to proc_exit(), the test failed.

To make the tests more robust, restart the server to ensure that
lingering backends don't interfere with the later test steps.

In passing, fix a bogus test name.

Report and analysis by Jelte Fennema-Nio, Andres Freund, Thomas Munro.

Discussion: https://www.postgresql.org/message-id/CAGECzQSU2iGuocuP+fmu89hmBmR3tb-TNyYKjCcL2M_zTCkAFw@mail.gmail.com
Backpatch-through: 18
2025-12-17 16:23:13 +02:00
Amit Kapila
85ddcc2f4c Support existing publications in pg_createsubscriber.
Allow pg_createsubscriber to reuse existing publications instead of
failing when they already exist on the publisher.

Previously, pg_createsubscriber would fail if any specified publication
already existed. Now, existing publications are reused as-is with their
current configuration, and non-existing publications are created
automatically with FOR ALL TABLES.

This change provides flexibility when working with mixed scenarios of
existing and new publications. Users should verify that existing
publications have the desired configuration before reusing them, and can
use --dry-run with verbose mode to see which publications will be reused
and which will be created.

Only publications created by pg_createsubscriber are cleaned up during
error cleanup operations. Pre-existing publications are preserved unless
'--clean=publications' is explicitly specified, which drops all
publications.

This feature would be helpful for pub-sub configurations where users want
to subscribe to a subset of tables from the publisher.

Author: Shubham Khanna <khannashubham1197@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: tianbing <tian_bing_0531@163.com>
Discussion: https://postgr.es/m/CAHv8Rj%2BsxWutv10WiDEAPZnygaCbuY2RqiLMj2aRMH-H3iZwyA%40mail.gmail.com
2025-12-17 09:43:53 +00:00
Michael Paquier
f4e797171e Change pgstat_report_vacuum() to use Relation
This change makes pgstat_report_vacuum() more consistent with
pgstat_report_analyze(), which also uses a Relation.  This enforces a
policy that callers of this routine should open and lock the relation
whose statistics are updated before calling it.  We are unlikely to have
many callers of this routine in the tree, but it seems like a good idea
to impose this requirement in the long run.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aUEA6UZZkDCQFgSA@ip-10-97-1-34.eu-west-3.compute.internal
2025-12-17 11:26:17 +09:00
Michael Paquier
1d325ad99c Remove useless code in InjectionPointAttach()
strlcpy() ensures that a target string is zero-terminated, so there is
no need to enforce it a second time in this code.  This simplification
could have been done in 0eb23285a2.
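
The removed code presumably looked like the second line here; strlcpy()
already guarantees the terminator whenever the destination size is
non-zero:

    strlcpy(path, name, MAXPGPATH);
    path[MAXPGPATH - 1] = '\0';     /* redundant: strlcpy() did this */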

Author: Feilong Meng <feelingmeng@foxmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/tencent_771178777C5BC17FCB7F7A1771CD1FFD5708@qq.com
2025-12-17 08:58:58 +09:00
Jeff Davis
0a90df58cf Avoid global LC_CTYPE dependency in pg_locale_icu.c.
ICU still depends on libc for compatibility with certain historical
behavior for single-byte encodings. Make the dependency explicit by
holding a locale_t object when required.

We should consider a better solution in the future, such as decoding
the text to UTF-32 and using u_tolower(). That would be a behavior
change and require additional infrastructure though; so for now, just
avoid the global LC_CTYPE dependency.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
2025-12-16 15:32:57 -08:00
Jeff Davis
87b2968df0 downcase_identifier(): use method table from locale provider.
Previously, libc's tolower() was always used for lowercasing
identifiers, regardless of the database locale (though only characters
beyond 127 in single-byte encodings were affected). Refactor to allow
each provider to supply its own implementation of identifier
downcasing.

For historical compatibility, when using a single-byte encoding, ICU
still relies on tolower().

One minor behavior change is that, before the database default locale
is initialized, identifiers are downcased using ASCII semantics.
Previously, the postmaster's LC_CTYPE setting from the environment was
used. While that could have some effect during GUC processing, for
example, it would have been fragile to rely on the environment setting
anyway. (Also, it only matters when the encoding is single-byte.)

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
2025-12-16 15:32:41 -08:00
Jeff Davis
7f007e4a04 ltree: fix case-insensitive matching.
Previously, ltree_prefix_eq_ci() used lowercasing with the default
collation, while ltree_crc32_sz() used tolower() directly. These were
equivalent only if the default collation provider was libc and the
encoding was single-byte.
encoding was single-byte.

Change both to use casefolding with the default collation.

Backpatch through 18, where the casefolding APIs were introduced. The
bug exists in earlier versions, but would require some adaptation.

A REINDEX is required for ltree indexes where the database default
collation is not libc.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Backpatch-through: 18
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Discussion: https://postgr.es/m/01fc00fd66f641b9693d4f9f1af0ccf44cbdfbdf.camel@j-davis.com
2025-12-16 12:57:00 -08:00
Jeff Davis
24bf379cb1 Clarify a #define introduced in 8d299052fe.
The value is the same, but use the right symbol for clarity.
2025-12-16 12:48:53 -08:00
Jeff Davis
84d5efa7e3 Fix multibyte issue in ltree_strncasecmp().
Previously, the API for ltree_strncasecmp() took two inputs but only
one length (that of the smaller input). It truncated the larger input
to that length, but that could break a multibyte sequence.

Change the API to be a check for prefix equality (possibly
case-insensitive) instead, which is all that's needed by the
callers. Also, provide the lengths of both inputs.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/5f65b85740197ba6249ea507cddf609f84a6188b.camel%40j-davis.com
Backpatch-through: 14
2025-12-16 10:35:40 -08:00
Robert Haas
f1a6e622bd Switch memory contexts in ReinitializeParallelDSM.
We already do this in CreateParallelContext, InitializeParallelDSM, and
LaunchParallelWorkers. I suspect the reason why the matching logic was
omitted from ReinitializeParallelDSM is that I failed to realize that
any memory allocation was happening here -- but shm_mq_attach does
allocate, which could result in a shm_mq_handle being allocated in a
shorter-lived context than the ParallelContext which points to it.
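
A hedged sketch of the pattern the sibling functions use, per the
description above (TopTransactionContext is an assumption here):

    MemoryContext oldcontext = MemoryContextSwitchTo(TopTransactionContext);

    /* shm_mq_attach() and friends now allocate in a context that lives
     * at least as long as the ParallelContext pointing at the results */

    MemoryContextSwitchTo(oldcontext);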

That could result in a crash if the shorter-lived context is freed
before the parallel context is destroyed. As far as I am currently
aware, there is no way to reach a crash using only code that is
present in core PostgreSQL, but extensions could potentially trip
over this. Fixing this in the back-branches appears low-risk, so
back-patch to all supported versions.

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Co-authored-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Backpatch-through: 14
Discussion: http://postgr.es/m/CAKZiRmwfVripa3FGo06=5D1EddpsLu9JY2iJOTgbsxUQ339ogQ@mail.gmail.com
2025-12-16 12:24:55 -05:00
Tom Lane
462e247652 Test PRI* macros even when we can't test NLS translation.
Further research shows that the reason commit 7db6809ce failed
is that recent glibc versions short-circuit translation attempts
when LC_MESSAGES is 'C.<encoding>', not only when it's 'C'.
There seems no way around that, so we'll have to live with only
testing NLS when a suitable real locale is installed.

However, something can still be salvaged: it still seems like a
good idea to verify that the PRI* macros work as expected even when
we can't check their translations (see f8715ec86 for motivation).
Hence, adjust the test to always run the ereport calls, and tweak
the parameter values in hopes of detecting any cases where there's
confusion about the actual widths of the parameters.
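
For reference, a standalone check in the same spirit; the values
overflow 32 bits, so a width mismatch in the macros would show up in the
output:

    #include <inttypes.h>
    #include <stdio.h>

    int
    main(void)
    {
        int64_t     v64 = INT64_C(6000000000);      /* > 2^32 */
        uint64_t    u64 = UINT64_C(12000000000);

        printf("signed: %" PRId64 ", unsigned: %" PRIu64 "\n", v64, u64);
        return 0;
    }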

Discussion: https://postgr.es/m/1991599.1765818338@sss.pgh.pa.us
2025-12-16 12:01:46 -05:00
Melanie Plageman
bfe5c4bec7 Add explanatory comment to prune_freeze_setup()
heap_page_prune_and_freeze() fills in PruneState->deadoffsets, the array
of OffsetNumbers of dead tuples. It is returned to the caller in the
PruneFreezeResult. To avoid having two copies of the array, the
PruneState saves only a pointer to the array. This was a bit unusual and
confusing, so add a clarifying comment.

Author: Melanie Plageman <melanieplageman@gmail.com>
Suggested-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2=jiD1nqch4JQN+odAxZSD7mRvdoHUGJYN2r6tQG_66yQ@mail.gmail.com
2025-12-16 11:04:07 -05:00
Melanie Plageman
4877391ce8 Fix const qualification in prune_freeze_setup()
The const qualification of the presult argument to prune_freeze_setup()
is later cast away, so it was not correct. Remove it and add a comment
explaining that presult should not be modified.

Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fb97d0ae-a0bc-411d-8a87-f84e7e146488%40eisentraut.org
2025-12-16 11:00:05 -05:00
Daniel Gustafsson
b39013b7b1 doc: Update header file mention for CompareType
Commit 119fc30 moved CompareType to cmptype.h, but the mention in
the docs still referred to primnodes.h.

Author: Daisuke Higuchi <higuchi.daisuke11@gmail.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAEVT6c8guXe5P=L_Un5NUUzCgEgbHnNcP+Y3TV2WbQh-xjiwqA@mail.gmail.com
Backpatch-through: 18
2025-12-16 09:46:53 +01:00
John Naylor
9303d62c6d Separate out bytea sort support from varlena.c
In the wake of commit b45242fd3, bytea_sortsupport() still called out
to varstr_sortsupport(). Treating bytea as a kind of text/varchar
required varstr_sortsupport() to allow for the possibility of
NUL bytes, but only for C collation. This was confusing. For
better separation of concerns, create an independent sortsupport
implementation in bytea.c.

The heuristics for bytea_abbrev_abort() remain the same as for
varstr_abbrev_abort(). It's possible that the bytea case warrants
different treatment, but that is left for future investigation.

In passing, adjust some strange looking comparisons in
varstr_abbrev_abort().

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TP1bAbEhUJa6+rgceN6QJWMSsxhg1=mqfSN=Nb-n6DAKg@mail.gmail.com
2025-12-16 15:19:16 +07:00
Michael Paquier
15f68cebdc Add TAP test to check recovery when redo LSN is missing
This commit provides test coverage for dc7c77f825, where the redo
record and the checkpoint record end on different WAL segments and the
start of recovery is able to detect that the redo record is missing.

This test uses a wait injection point in the critical section of a
checkpoint, a method that requires not one but two wait injection points
to avoid any memory allocation within the critical section of the
checkpoint:
- Checkpoint run with a background psql.
- One first wait point is run by the checkpointer before the critical
section, allocating the shared memory required by the DSM registry for
the wait machinery in the library injection_points.
- First point is woken up.
- Second wait point is loaded before the critical section, allocating
the memory to build the path to the library loaded, then run in the
critical section once the checkpoint redo record has been logged.
- WAL segment is switched while waiting on the second point.
- Checkpoint completes.
- Stop cluster with immediate mode.
- The segment that includes the redo record is removed.
- Start, recovery fails as the redo record cannot be found.

The error message introduced in dc7c77f825 is now reduced to a FATAL,
meaning that the information is still provided while making it possible
to write a test for it.  Nitin has provided a basic version of the test,
which I have enhanced to make portable with the two points.  Without
dc7c77f825, the cluster crashes in this test, not on a PANIC but due
to the pointer dereference at the beginning of recovery, the failure
mentioned in the other commit.

Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m=daGqiOuVdizYWYaA@mail.gmail.com
2025-12-16 14:28:05 +09:00
Michael Paquier
dc7c77f825 Fail recovery when missing redo checkpoint record without backup_label
This commit adds an extra check at the beginning of recovery to ensure
that the redo record of a checkpoint exists before attempting WAL
replay, logging a PANIC if the redo record referenced by the checkpoint
record could not be found.  This is the same level of failure as when a
checkpoint record is missing.  This check is added when a cluster is
started without a backup_label, after retrieving its checkpoint record.
The redo LSN used for the check is retrieved from the checkpoint record
successfully read.

In the case where a backup_label exists, the startup process already
fails if the redo record cannot be found after reading a checkpoint
record at the beginning of recovery.
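
Conceptually, the added check amounts to something like this (a sketch
only, not the committed code; ReadRedoRecord() is a hypothetical
helper):

    if (checkPoint.redo < CheckPointLoc)
    {
        /* Re-position and read the record at checkPoint.redo. */
        if (!ReadRedoRecord(checkPoint.redo))
            ereport(PANIC,
                    (errmsg("could not find redo location %X/%X referenced by checkpoint record at %X/%X",
                            LSN_FORMAT_ARGS(checkPoint.redo),
                            LSN_FORMAT_ARGS(CheckPointLoc))));
    }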

Previously, the presence of the redo record was not checked.  If the
redo and checkpoint records were located on different WAL segments, it
would be possible to miss an entire range of WAL records that should
have been replayed but were just ignored.  The consequences of missing
the redo record depend on the version involved, becoming worse the older
the version used:
- On HEAD, v18 and v17, recovery fails with a pointer dereference at the
beginning of the redo loop, as the redo record is expected but cannot be
found.  These versions are good students, because we detect a failure
before doing anything, even if the failure is misleading, taking the
shape of a segmentation fault that gives no hint that the redo record is
missing.
- In v16 and v15, problems show up at the end of recovery within
FinishWalRecovery(), where the startup process uses a bogus LSN to
decide from where to start writing WAL.  The cluster gets corrupted, but
it is at least noisy about it.
- v14 and older versions are worse: the cluster gets corrupted, and it
is entirely silent about the matter.  The missing redo record causes the
startup process to skip recovery entirely, because a missing record is
treated the same as no redo being required at all.  This leads to data
loss, as everything between the redo record and the checkpoint record is
missed.

Note that I have tested that down to 9.4, reproducing the issue with a
slightly modified version of the author's reproducer.  The code has been
wrong since at least 9.2, but I did not look for the exact point of
origin.

This problem was found by debugging a cluster where the WAL segment
including the redo record was missing due to an operator error, leading
to a crash, based on an investigation in v15.

Requesting archive recovery by creating a recovery.signal or a
standby.signal, even without a backup_label, would mitigate the issue:
if the record cannot be found in pg_wal/, the missing segment can be
retrieved with a restore_command when checking that the redo record
exists.  This was already the case without this commit, where recovery
would re-fetch the WAL segment that includes the redo record.  The check
introduced by this commit causes the segment to be retrieved earlier,
making sure that the redo record can be found.

On HEAD, the code will be slightly changed in a follow-up commit to not
rely on a PANIC, to include a test able to emulate the original problem.
This is a minimal backpatchable fix, kept separated for clarity.

Reported-by: Andres Freund <andres@anarazel.de>
Analyzed-by: Andres Freund <andres@anarazel.de>
Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Discussion: https://postgr.es/m/20231023232145.cmqe73stvivsmlhs@awork3.anarazel.de
Discussion: https://postgr.es/m/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m=daGqiOuVdizYWYaA@mail.gmail.com
Backpatch-through: 14
2025-12-16 13:29:28 +09:00
Tom Lane
84a3778c79 Revert "Avoid requiring Spanish locale to test NLS infrastructure."
This reverts commit 7db6809ced.
That doesn't seem to work with recent (last couple of years)
glibc, and the reasons are obscure.  I can't let the farm stay
this broken for long.
2025-12-15 18:36:29 -05:00
Jacob Champion
f7fbd02d32 libpq: Align oauth_json_set_error() with other NLS patterns
Now that the prior commits have fixed missing OAuth translations, pull
the bespoke usage of libpq_gettext() for OAUTHBEARER parsing into
oauth_json_set_error() itself, and make that a gettext trigger as well,
to better match what the other sites are doing. Add an _internal()
variant to handle the existing untranslated case.

Suggested-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/0EEBCAA8-A5AC-4E3B-BABA-0BA7A08C361B%40yesql.se
Backpatch-through: 18
2025-12-15 13:22:04 -08:00
Jacob Champion
301a1dcf00 libpq-oauth: Don't translate internal errors
Some error messages are generated when OAuth multiplexer operations fail
unexpectedly in the client. Álvaro pointed out that these are both
difficult to translate idiomatically (as they use internal terminology
heavily) and of dubious translation value to end users (since they're
going to need to get developer help anyway). The response parsing engine
has a similar issue.

Remove these from the translation files by introducing internal variants
of actx_error() and oauth_parse_set_error().

Suggested-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAOYmi%2BkQQ8vpRcoSrA5EQ98Wa3G6jFj1yRHs6mh1V7ohkTC7JA%40mail.gmail.com
Backpatch-through: 18
2025-12-15 13:21:00 -08:00
Jacob Champion
ea3370b18e libpq: Add missing OAuth translations
Several strings that should have been translated as they passed through
libpq_gettext were not actually being pulled into the translation files,
because I hadn't directly wrapped them in one of the GETTEXT_TRIGGERS.

Move the responsibility for calling libpq_gettext() to the code that
sets actx->errctx. Doing the same in report_type_mismatch() would result
in double-translation, so mark those strings with gettext_noop()
instead. And wrap two ternary operands with gettext_noop(), even though
they're already in one of the triggers, since xgettext sees only the
first.
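
An illustrative fragment of the ternary case (names invented):
gettext_noop() is a no-op at runtime and exists purely so that xgettext
extracts both strings:

    const char *msg = libpq_gettext(verbose
                                    ? gettext_noop("a detailed error message")
                                    : gettext_noop("a terse error message"));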

Finally, fe-auth-oauth.c was missing from nls.mk, so none of that file
was being translated at all. Add it now.

Original patch by Zhijie Hou, plus suggested tweaks by Álvaro Herrera
and small additions by me.

Reported-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/TY4PR01MB1690746DB91991D1E9A47F57E94CDA%40TY4PR01MB16907.jpnprd01.prod.outlook.com
Backpatch-through: 18
2025-12-15 13:17:10 -08:00
Nathan Bossart
48d4a1423d Allow passing a pointer to GetNamedDSMSegment()'s init callback.
This commit adds a new "void *arg" parameter to
GetNamedDSMSegment() that is passed to the initialization callback
function.  This is useful for reusing an initialization callback
function for multiple DSM segments.
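
A hedged sketch of the reuse this enables; the callback signature and
the position of the new parameter are assumed from the description
above, not quoted from the header:

    #include "postgres.h"
    #include "storage/dsm_registry.h"

    typedef struct SegOptions
    {
        int         nslots;
    } SegOptions;

    static void
    init_segment(void *ptr, void *arg)
    {
        SegOptions *opts = (SegOptions *) arg;

        memset(ptr, 0, sizeof(int) * opts->nslots);
    }

    static int *
    attach_slots(const char *name, SegOptions *opts)
    {
        bool        found;

        /* the same callback serves segments of different shapes */
        return (int *) GetNamedDSMSegment(name, sizeof(int) * opts->nslots,
                                          init_segment, opts, &found);
    }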

Author: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAN4CZFMjh8TrT9ZhWgjVTzBDkYZi2a84BnZ8bM%2BfLPuq7Cirzg%40mail.gmail.com
2025-12-15 14:27:16 -06:00
Noah Misch
64bf53dd61 Revisit cosmetics of "For inplace update, send nontransactional invalidations."
This removes a never-used CacheInvalidateHeapTupleInplace() parameter.
It adds README content about inplace update visibility in logical
decoding.  It rewrites other comments.

Back-patch to v18, where commit 243e9b40f1
first appeared.  Since this removes a CacheInvalidateHeapTupleInplace()
parameter, expect a v18 ".abi-compliance-history" edit to follow.  PGXN
contains no calls to that function.

Reported-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reported-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Discussion: https://postgr.es/m/CA+renyU+LGLvCqS0=fHit-N1J-2=2_mPK97AQxvcfKm+F-DxJA@mail.gmail.com
Backpatch-through: 18
2025-12-15 12:19:49 -08:00
Noah Misch
0839fbe400 Correct comments of "Fix data loss at inplace update after heap_update()".
This corrects commit a07e03fd8f.

Reported-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reported-by: Surya Poondla <s_poondla@apple.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://postgr.es/m/CA+renyWCW+_2QvXERBQ+mna6ANwAVXXmHKCA-WzL04bZRsjoBA@mail.gmail.com
2025-12-15 12:19:49 -08:00