Instrument usage stack v16 #68

Open
lfittl wants to merge 10 commits into master from instrument-usage-stack-v16
Conversation

@lfittl
Owner

@lfittl lfittl commented Apr 7, 2026

No description provided.

@lfittl lfittl force-pushed the instrument-usage-stack-v16 branch 3 times, most recently from e99cc5c to 33ae950 Compare April 7, 2026 19:36
…eded

This introduces a new field, queryDesc->totaltime_options, that extensions
can use to indicate whether they need queryDesc->totaltime populated,
and with which instrumentation options. Extensions should take care to
only add options they need, instead of replacing the options of others.

This replaces the practice of extensions allocating queryDesc->totaltime
themselves, which required them to always use INSTRUMENT_ALL for the
options argument. Had they requested less, another extension could have
been silently affected. It also needlessly forced extension hooks to
take care to allocate in the per-query memory context.

Adjust pg_stat_statements and auto_explain to match, and lower the
requested instrumentation level for auto_explain to INSTRUMENT_TIMER,
since the only summary instrumentation it needs is runtime.
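
A minimal sketch of the "add, don't replace" rule described above. The struct is a heavily reduced, hypothetical stand-in for QueryDesc, the hook names are invented, and the flag values are redefined locally so the example compiles on its own; which options each real extension requests is an assumption here.

```c
#include <assert.h>

/*
 * Flag names mirror PostgreSQL's instrumentation options, but they are
 * redefined here as stand-ins so the sketch is self-contained.
 */
enum
{
    INSTRUMENT_TIMER = 1 << 0,
    INSTRUMENT_BUFFERS = 1 << 1,
    INSTRUMENT_WAL = 1 << 3
};

/* Hypothetical, reduced stand-in for QueryDesc. */
typedef struct SketchQueryDesc
{
    int totaltime_options;  /* union of what all extensions asked for */
} SketchQueryDesc;

/* An auto_explain-like hook: only total runtime is needed. */
static void
sketch_timer_hook(SketchQueryDesc *queryDesc)
{
    /* OR the bit in; a plain assignment would drop other requests */
    queryDesc->totaltime_options |= INSTRUMENT_TIMER;
}

/* A pg_stat_statements-like hook (assumed needs: timing, buffers, WAL). */
static void
sketch_stats_hook(SketchQueryDesc *queryDesc)
{
    queryDesc->totaltime_options |=
        INSTRUMENT_TIMER | INSTRUMENT_BUFFERS | INSTRUMENT_WAL;
}
```

Because both hooks only OR bits in, the result is the same regardless of the order in which they run.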

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by:
Discussion:
@lfittl lfittl force-pushed the instrument-usage-stack-v16 branch from 33ae950 to 3a264bb Compare April 7, 2026 19:58
lfittl added 9 commits April 7, 2026 12:59
This moves the implementation of ExecProcNodeInstr, the ExecProcNode
variant that gets used when instrumentation is on, to be defined in
instrument.c instead of execProcNode.c, and marks functions it uses
as inline.

This allows compilers to generate an optimized implementation, and
shows a 2 to 5% reduction in instrumentation overhead for queries
that move lots of rows.

Author: Lukas Fittl <lukas@fittl.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by:
Discussion: https://www.postgresql.org/message-id/flat/CAP53PkzdBK8VJ1fS4AZ481LgMN8f9mJiC39ZRHqkFUSYq6KWmg@mail.gmail.com
This replaces several repeated code blocks that read pgBufferUsage /
pgWalUsage, and in some cases also ran a timer to measure activity,
with the new Instrumentation struct and associated helpers.

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by:
Discussion:
…ith INSTR_* macros

This encapsulates the ownership of these globals better, and will allow
a subsequent refactoring.

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://www.postgresql.org/message-id/flat/CAP53PkzZ3UotnRrrnXWAv%3DF4avRq9MQ8zU%2BbxoN9tpovEu6fGQ%40mail.gmail.com#fc7140e8af21e07a90a09d7e76b300c4
This adds regression tests that cover some of the expected behaviour
around the buffer statistics reported in EXPLAIN ANALYZE, specifically
how they behave in parallel query, nested function calls and abort
situations.

Testing this is challenging because there can be different sources of
buffer activity, so we rely on temporary tables where we can to prove
that activity was captured and not lost. This supports a future commit
that will rework some of the instrumentation logic that could cause
areas covered by these tests to fail.

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by:
Discussion:
Previously, in order to determine the buffer/WAL usage of a given code
section, we utilized continuously incrementing global counters that get
updated when the actual activity (e.g. shared block read) occurred, and
then calculated a diff when the code section ended. This resulted in a
bottleneck for executor node instrumentation specifically, with the
function BufferUsageAccumDiff showing up in profiles and in some cases
adding up to 10% overhead to an EXPLAIN (ANALYZE, BUFFERS) run.
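
For illustration, the snapshot-and-diff pattern described above might be reduced to the following. The struct and function names are hypothetical stand-ins with only two counters; this is not the PostgreSQL code.

```c
#include <assert.h>

/* Reduced stand-in for a buffer usage struct (two counters only). */
typedef struct SketchBufferUsage
{
    long shared_blks_hit;
    long shared_blks_read;
} SketchBufferUsage;

/* Global running total, bumped wherever the activity actually happens. */
static SketchBufferUsage sketch_global_usage;

/*
 * dst += add - sub, field by field. In the diff-based scheme this runs
 * every time a measured section ends, touching every counter.
 */
static void
sketch_accum_diff(SketchBufferUsage *dst,
                  const SketchBufferUsage *add,
                  const SketchBufferUsage *sub)
{
    dst->shared_blks_hit += add->shared_blks_hit - sub->shared_blks_hit;
    dst->shared_blks_read += add->shared_blks_read - sub->shared_blks_read;
}
```

A measured section copies `sketch_global_usage` when it starts and calls `sketch_accum_diff` with the current global and that copy when it ends, so the per-section cost grows with the number of counters even when no buffer activity occurred.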

Instead, introduce a stack-based mechanism, where the actual activity
writes into the current stack entry. In the case of executor nodes, this
means that each node gets its own stack entry that is pushed at
InstrStartNode, and popped at InstrEndNode. Stack entries are zero
initialized (avoiding the diff mechanism) and get added to their parent
entry when they are finalized, i.e. no more modifications can occur.
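
A minimal sketch of the stack idea, under simplifying assumptions: a fixed-size array of entry pointers, a single counter, and invented `sketch_*` names. The real InstrStartNode/InstrEndNode signatures and data layout are not shown in this description and may differ.

```c
#include <assert.h>

/* One stack entry; the caller zero-initializes it, so no diff is needed. */
typedef struct SketchEntry
{
    long shared_blks_read;
} SketchEntry;

#define SKETCH_MAX_DEPTH 16

static SketchEntry *sketch_stack[SKETCH_MAX_DEPTH];
static int sketch_depth = 0;

/* Make an entry the current target for recorded activity. */
static void
sketch_push(SketchEntry *entry)
{
    assert(sketch_depth < SKETCH_MAX_DEPTH);
    sketch_stack[sketch_depth++] = entry;
}

/* Restore the previous target; entries must be popped in LIFO order. */
static void
sketch_pop(SketchEntry *entry)
{
    assert(sketch_depth > 0 && sketch_stack[sketch_depth - 1] == entry);
    sketch_depth--;
}

/* Called where the activity happens: write into the current entry only. */
static void
sketch_record_read(void)
{
    if (sketch_depth > 0)
        sketch_stack[sketch_depth - 1]->shared_blks_read++;
}

/* Once no more modifications can occur, fold the child into its parent. */
static void
sketch_finalize(SketchEntry *child, SketchEntry *parent)
{
    parent->shared_blks_read += child->shared_blks_read;
}
```

In this sketch a node entry can be pushed and popped many times (once per call into the node) and is added to its parent exactly once, at finalization.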

To correctly handle abort situations, any use of instrumentation stacks
must involve either a top-level QueryInstrumentation struct, and its
associated InstrQueryStart/InstrQueryStop helpers (which use resource
owners to handle aborts), or the Instrumentation struct itself with
dedicated PG_TRY/PG_FINALLY calls that ensure the stack is in a
consistent state after an abort.
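
The second option can be pictured roughly as follows. This sketch swaps PG_TRY/PG_FINALLY for plain setjmp/longjmp so that it compiles outside the PostgreSQL tree, and it tracks only a depth counter; all names are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

static int sketch_depth = 0;        /* current instrumentation stack depth */
static jmp_buf sketch_error_jmp;    /* stand-in for the error handler target */

/* Simulates nested work that pushes an entry and then errors out. */
static void
sketch_failing_work(void)
{
    sketch_depth++;                 /* nested entry pushed, never popped */
    longjmp(sketch_error_jmp, 1);   /* stand-in for an ERROR being raised */
}

/* Returns 1 if the guarded work failed, 0 otherwise. */
static int
sketch_run_guarded(void)
{
    volatile int saved_depth = sketch_depth;
    int failed = 0;

    sketch_depth++;                 /* push our own entry */

    if (setjmp(sketch_error_jmp) == 0)
        sketch_failing_work();
    else
        failed = 1;

    /*
     * The "finally" part: runs on both paths and discards any entries
     * left behind, so the stack matches what the caller expects.
     */
    sketch_depth = saved_depth;
    return failed;
}
```

The point of the sketch is only that the cleanup restores the stack to its state at entry, whether or not the guarded work completed.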

In tests, the stack-based instrumentation mechanism reduces the overhead
of EXPLAIN (ANALYZE, BUFFERS ON, TIMING OFF) for a large COUNT(*) query
from about 50% to 22% on top of the actual runtime.

This also drops the global pgBufferUsage; any callers interested in
measuring buffer activity should instead utilize InstrStart/InstrStop.

The related global pgWalUsage is kept for now due to its use in pgstat
to track aggregate WAL activity and heap_page_prune_and_freeze for
measuring FPIs.

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/CAP53PkxrmpECzVFpeeEEHDGe6u625s%2BYkmVv5-gw3L_NDSfbiA%40mail.gmail.com#cb583a08e8e096aa1f093bb178906173
This simplifies the DSM allocations a bit since we don't need to
separately allocate WAL and buffer usage, and makes it easier to add a
third stack-based struct, currently under discussion, in the future.

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by:
Discussion:
For most queries, the bulk of the overhead of EXPLAIN ANALYZE happens in
ExecProcNodeInstr when starting/stopping instrumentation for that node.

Previously, each ExecProcNodeInstr call would check which instrumentation
options are active in the InstrStartNode/InstrStopNode calls, and do the
corresponding work (timers, instrumentation stack, etc.). These
conditionals, checked for each tuple being emitted, add up and cause the
compiler to generate a non-optimal set of instructions.

Because we already have an existing mechanism to specify a function
pointer when instrumentation is enabled, we can instead create specialized
functions that are tailored to the instrumentation options enabled, and
avoid conditionals on subsequent ExecProcNodeInstr calls. This reduces
the overhead of EXPLAIN (ANALYZE, TIMING OFF, BUFFERS OFF) for a stress
test with a large COUNT(*) that does many ExecProcNode calls from ~ 20% on
top of actual runtime to ~ 3%. When using BUFFERS ON the same query goes
from ~ 20% to ~ 10% on top of actual runtime.
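
As a rough illustration of the specialization technique, the sketch below picks one of several wrapper functions once, based on the enabled options, so the per-tuple path contains no option checks. The node struct, option flags, and counters are hypothetical stand-ins, and the "work" each wrapper does is reduced to bumping a counter.

```c
#include <assert.h>

enum
{
    SKETCH_OPT_TIMER = 1 << 0,
    SKETCH_OPT_BUFFERS = 1 << 1
};

typedef struct SketchNode SketchNode;
typedef int (*SketchProcFn) (SketchNode *node);

struct SketchNode
{
    SketchProcFn proc;      /* chosen once, called per tuple */
    int remaining;          /* tuples the underlying node will return */
    long ntuples;
    long timer_calls;       /* stand-in for start/stop timer work */
    long stack_pushes;      /* stand-in for push/pop of a stack entry */
};

/* The uninstrumented node: returns 1 per tuple, 0 when exhausted. */
static int
sketch_real_proc(SketchNode *node)
{
    if (node->remaining <= 0)
        return 0;
    node->remaining--;
    return 1;
}

/* Rows only: no timer, no stack work, no branches on options. */
static int
sketch_proc_rows(SketchNode *node)
{
    int got = sketch_real_proc(node);

    node->ntuples += got;
    return got;
}

/* Rows plus buffer tracking. */
static int
sketch_proc_buffers(SketchNode *node)
{
    int got;

    node->stack_pushes++;
    got = sketch_real_proc(node);
    node->ntuples += got;
    return got;
}

/* Rows, buffer tracking and timing. */
static int
sketch_proc_all(SketchNode *node)
{
    int got;

    node->timer_calls++;
    node->stack_pushes++;
    got = sketch_real_proc(node);
    node->ntuples += got;
    return got;
}

/* The only place that inspects the options; runs once per node. */
static void
sketch_choose_proc(SketchNode *node, int options)
{
    if (options & SKETCH_OPT_TIMER)
        node->proc = sketch_proc_all;
    else if (options & SKETCH_OPT_BUFFERS)
        node->proc = sketch_proc_buffers;
    else
        node->proc = sketch_proc_rows;
}
```

A caller would run `sketch_choose_proc` during setup and then loop on `node->proc(node)` until it returns 0; the timer-only combination is folded into `sketch_proc_all` here purely to keep the sketch short.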

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://www.postgresql.org/message-id/flat/CAP53PkxFP7i7-wy98ZmEJ11edYq-RrPvJoa4kzGhBBjERA4Nyw%40mail.gmail.com#e8dfd018a07d7f8d41565a079d40c564

fix up execprocnode 2
This sets up a separate instrumentation stack that is used while an
Index Scan or Index Only Scan accesses the table, for example because
additional data is needed.

EXPLAIN ANALYZE will now show "Table Buffers" that represent such activity.
The activity is also included in regular "Buffers" together with index
activity and that of any child nodes.

Author: Lukas Fittl <lukas@fittl.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://www.postgresql.org/message-id/flat/CAP53PkxrmpECzVFpeeEEHDGe6u625s%2BYkmVv5-gw3L_NDSfbiA%40mail.gmail.com#cb583a08e8e096aa1f093bb178906173

Actually populate I(O)S table stack pre index prefetching merge
This is intended for testing instrumentation-related logic as it pertains
to the top-level stack that is maintained as a running total. There is
currently no in-core user that utilizes the top-level values in this
manner, and especially during abort situations this helps ensure we don't
lose activity due to incorrect handling of unfinalized node stacks.
@lfittl lfittl force-pushed the instrument-usage-stack-v16 branch from 3a264bb to 30d4492 Compare April 7, 2026 20:04