Fast timing for ARM (v1)#74

Open
lfittl wants to merge 1 commit into master from fast-timing-arm-v1

@lfittl lfittl commented Apr 11, 2026

TODO

  • Confirm how M3 hardware behaves on Linux (if the GitHub issue referenced below is to be believed, the difference between cores is a macOS-ism, not a hardware issue)

Test systems

Resources

Noteworthy:

On Apple Silicon based devices with M3 or later, and A16 Bionic or later, the values returned by reading the CNTFRQ_EL0 and CNTVCT_EL0 registers have been updated to 1 GHz, instead of the prior value of 24 MHz. It is still recommended for apps to use libsystem APIs like mach_absolute_time() for timekeeping. Your app will not be impacted by this change if it uses Apple’s timekeeping APIs. For compatibility purposes, this change will only be visible when using the SDK associated with this release or later. On macOS, applications running inside a Virtualization.framework VM will continue to receive the legacy behavior. (84639494)
https://developer.apple.com/documentation/macos-release-notes/macos-15-release-notes

On MacOS, when applications are built with a newer SDK (15.2 SDK I think?) then Apple's kernel when starting a process will program their AGTCNTRDIR_EL1 register, which scales the values returned by CNTFRQ_EL0 and CNTVCT_EL0 to operate at 1Ghz instead. The clock is still actually running at 24Mhz, but it increments larger amounts. So its granularity instead of being 1, is ~41.6.

This is required to be done according to ARMv8.6/v9.1 spec otherwise it isn't compliant hardware. From the ARM Architecture reference manual, From Armv8.6 the counter operates at a higher fixed frequency of 1GHz.
utmapp/UTM#6942 (comment)
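
The arithmetic behind the quoted "~41.6" granularity can be sketched as follows. This is only an illustration, not code from this PR: the helper names `ticks_to_ns` and `scaled_from_24mhz` are hypothetical, and truncating integer division is an assumption about how the scaled value is rounded.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC UINT64_C(1000000000)

/*
 * Convert counter ticks to nanoseconds for a counter running at freq_hz.
 * Whole seconds and the remainder are converted separately so that the
 * multiplication does not overflow for realistic frequencies (24 MHz, 1 GHz).
 * Hypothetical helper, not taken from the PR.
 */
static uint64_t
ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    assert(freq_hz > 0);

    uint64_t secs = ticks / freq_hz;
    uint64_t rem = ticks % freq_hz;

    return secs * NS_PER_SEC + rem * NS_PER_SEC / freq_hz;
}

/*
 * Value a counter reported at 1 GHz would show after raw_ticks ticks of an
 * underlying 24 MHz clock: 1e9 / 24e6 = 125/3, i.e. about 41.67 per tick.
 * Truncation is an assumption made for this sketch.
 */
static uint64_t
scaled_from_24mhz(uint64_t raw_ticks)
{
    return raw_ticks * 125 / 3;
}
```

Under these assumptions one 24 MHz tick corresponds to 41 ns after truncation, and three ticks to exactly 125 ns, so a counter reported at 1 GHz still cannot resolve anything shorter than about 41.67 ns.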


The lower frequency on older ARM (and current Apple Silicon) is likely a general problem. It can already be observed today by running pg_test_timing on ARM machines: the measured durations are either 0 ns or fall on multiples of roughly 41.67 ns (one tick at 24 MHz), with nothing in between:

Testing timing overhead for 3 seconds.

System clock source: clock_gettime (CLOCK_MONOTONIC_RAW)
Average loop time including overhead: 16.35 ns
Histogram of timing durations:
   <= ns   % of total  running %      count
       0      61.0142    61.0142  111945941
       1       0.0000    61.0142          0
       3       0.0000    61.0142          0
       7       0.0000    61.0142          0
      15       0.0000    61.0142          0
      31       0.0000    61.0142          0
      63      38.9193    99.9335   71407352
     127       0.0569    99.9905     104482
     255       0.0085    99.9990      15603
     511       0.0001    99.9991        221
    1023       0.0001    99.9992        127
    2047       0.0002    99.9993        299
    4095       0.0001    99.9995        215
    8191       0.0003    99.9998        561
   16383       0.0001    99.9999        271
   32767       0.0001   100.0000         97
   65535       0.0000   100.0000         51
  131071       0.0000   100.0000          4

Observed timing durations up to 99.9900%:
      ns   % of total  running %      count
       0      61.0142    61.0142  111945941
      41      12.9731    73.9873   23802478
      42      25.9462    99.9335   47604874
      83       0.0296    99.9631      54312
      84       0.0148    99.9779      27130
     125       0.0126    99.9905      23040
...
   77333       0.0000   100.0000          1
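
One plausible reading of the 41/42 and 83/84 split above is that timestamps are 24 MHz ticks truncated to whole nanoseconds. The sketch below simulates that; it is an illustration under that assumption, not code from this PR, and `tick_ns` and `count_deltas` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define HW_FREQ_HZ UINT64_C(24000000)   /* assumed underlying counter rate */
#define NS_PER_SEC UINT64_C(1000000000)

/* Nanosecond timestamp of hardware tick k, truncated to a whole number. */
static uint64_t
tick_ns(uint64_t k)
{
    return k * NS_PER_SEC / HW_FREQ_HZ;
}

/*
 * For start ticks 0 .. n_ticks - 1, count how often the difference between
 * the truncated timestamps of tick k + step and tick k equals delta_ns.
 */
static uint64_t
count_deltas(uint64_t n_ticks, uint64_t step, uint64_t delta_ns)
{
    uint64_t count = 0;

    assert(step > 0);

    for (uint64_t k = 0; k < n_ticks; k++)
    {
        if (tick_ns(k + step) - tick_ns(k) == delta_ns)
            count++;
    }
    return count;
}
```

With these assumptions, single-tick deltas come out as 41 ns one third of the time and 42 ns two thirds of the time, two-tick deltas as 83 ns two thirds and 84 ns one third, and three-tick deltas are always 125 ns. That matches the roughly 1:2 and 2:1 ratios in the observed counts.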

@lfittl lfittl force-pushed the fast-timing-arm-v1 branch from a80f6f0 to a0d5cd8 Compare April 11, 2026 00:53
Similar to the RDTSC/RDTSCP instructions on x86-64, this introduces
use of the cntvct_el0 register on ARM systems to access the generic
timer that provides a synchronized ticks value across CPUs.

Note this adds an exception for Apple Silicon CPUs, due to the observed
fact that M3 and newer have different timer frequencies for the Efficiency
and the Performance cores, and we can't be sure where we get scheduled.

To simplify the implementation this does not support MSVC, since Windows
on ARM is rarely used in practice.

Relies on the existing timing_clock_source GUC to control whether the
TSC-like timer gets used instead of the system timer.

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by:
Discussion:
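
For readers unfamiliar with the generic timer, reading it from user space on AArch64 with GCC or Clang might look roughly like the sketch below. This is not the code in this PR: the function names `read_ticks` and `read_ticks_frequency` are hypothetical, the `isb` barrier before the read is an assumption about the desired ordering, and the non-ARM fallback exists only so the example compiles elsewhere. MSVC is not covered, in line with the commit message.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
#define HAVE_CNTVCT 1
#endif

/* Read the current tick count (hypothetical helper). */
static inline uint64_t
read_ticks(void)
{
#ifdef HAVE_CNTVCT
    uint64_t val;

    /* isb keeps the counter read from being executed ahead of earlier code */
    __asm__ __volatile__("isb\n\tmrs %0, cntvct_el0" : "=r"(val) : : "memory");
    return val;
#else
    /* Fallback for other platforms: nanoseconds from the system clock. */
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * UINT64_C(1000000000) + (uint64_t) ts.tv_nsec;
#endif
}

/* Read the tick frequency in Hz (hypothetical helper). */
static inline uint64_t
read_ticks_frequency(void)
{
    uint64_t freq;

#ifdef HAVE_CNTVCT
    __asm__ __volatile__("mrs %0, cntfrq_el0" : "=r"(freq));
#else
    freq = UINT64_C(1000000000);    /* the fallback above counts nanoseconds */
#endif
    assert(freq > 0);
    return freq;
}
```

An elapsed time in nanoseconds would then be the tick difference multiplied by 1,000,000,000 and divided by the value of `read_ticks_frequency()`. That conversion is only meaningful if the frequency is the same on every core the process may run on, which is the concern behind the Apple Silicon exception described in the commit message.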
@lfittl lfittl force-pushed the fast-timing-arm-v1 branch from a0d5cd8 to 163b7b0 Compare April 11, 2026 01:00