diff --git a/.claude/sweep-security-state.csv b/.claude/sweep-security-state.csv
index dee6c0116..961376e78 100644
--- a/.claude/sweep-security-state.csv
+++ b/.claude/sweep-security-state.csv
@@ -16,7 +16,7 @@ emerging_hotspots,2026-04-25,1274,HIGH,1,,"HIGH (fixed #1274): emerging_hotspots
 erosion,2026-04-25,1275,HIGH,1;3;6,,"HIGH (fixed #1275): erode() accepted three user-controlled parameters with no upper bound. (1) iterations sized rng.random((iterations, 2)) on the host (16 B/particle) and was copied to the GPU via cupy.asarray, so iterations=10**12 attempted ~16 TB on each side. (2) params['radius'] drove _build_brush which iterates (2r+1)**2 cells and stores three arrays of the same length, so radius=10**6 allocated ~12 TB of brush data. (3) params['max_lifetime'] is the inner per-particle JIT loop in both _erode_cpu and _erode_gpu_kernel, so max_lifetime=10**12 with the default iterations=50000 ran 5e16 step iterations. The existing _check_erosion_memory helper only fired on dask paths and ignored the random_pos and brush working sets. Fixed by capping all three parameters at the public erode() entry via _validate_scalar(max_val=...) (_MAX_ITERATIONS=1e8, _MAX_RADIUS=1024, _MAX_LIFETIME=1e5), rewriting _check_erosion_memory to include the random_pos buffer and brush bytes in its budget, and wiring the guard into _erode_numpy and _erode_cupy so every backend benefits (the dask paths inherit it via their _erode_numpy/_erode_cupy calls). Mirrors diffuse #1268 pattern. Deferred follow-ups (separate PRs): Cat 3 HIGH NaN input is not guarded in _erode_cpu / _erode_gpu_kernel -- a NaN cell propagates through bilinear interpolation into dir_x/dir_y, NaN bounds checks fall through, and particles can deposit NaN into arbitrary cells via cuda.atomic.add. Cat 6 MEDIUM erode() does not call _validate_raster() on agg -- non-numeric or wrong-ndim input fails inside numba/cupy with a confusing error. No Cat 2 (no int32 flat-index math), no Cat 4 (GPU kernel has bounds guard at line 184 plus per-step bounds checks before every read/write, brush writes are explicitly bounds-checked, no shared memory), no Cat 5 (no file I/O)."
 fire,2026-04-25,,,,,"Clean. Despite the module's size hint, fire.py is purely per-cell raster ops -- not cellular-automaton or front-tracking. Seven public APIs: dnbr, rdnbr, burn_severity_class, fireline_intensity, flame_length, rate_of_spread, kbdi. No iteration, no queues, no multi-channel state, no random numbers, no file paths. Cat 1: every output allocation matches input shape (single buffer, bounded by caller). Anderson-13 fuel table is a fixed 13x8 constant. _rothermel_fuel_constants returns 12 scalars before dispatch (no per-pixel state). Cat 2: no flat-index math, all indexing is 2-D (y, x); no height*width multiplication. Cat 3: rdnbr guards denom < 1e-10; burn_severity_class is threshold-only; flame_length guards v <= 0.0 before fractional power; rate_of_spread guards M_x>0/beta>0/denom>0 and clamps eta_M, U_mmin, R; kbdi clamps Q to [0, 800] and net_P to >= 0. Adversarial wind=inf or T=inf would push exp/power to inf in rate_of_spread/kbdi but inputs are user-controlled rasters, fire model is research-quality (LOW only). Cat 4: all 7 CUDA kernels (_dnbr_gpu L157, _rdnbr_gpu L246, _bsc_gpu L362, _fli_gpu L455, _fl_gpu L552, _ros_gpu L681, _kbdi_gpu L870) have 'y < out.shape[0] and x < out.shape[1]' bounds guard; every kernel is point-wise (no neighbour stencil) so the simple guard is sufficient; no shared memory, no syncthreads needed. Cat 5: no file I/O. Cat 6: every public function calls _validate_raster on each input raster (dnbr/rdnbr/fireline_intensity/rate_of_spread/kbdi pass 2-3 rasters each, all validated), validate_arrays enforces equal shape, _validate_scalar gates heat_content/fuel_model (1-13)/annual_precip, and every input is .astype('f4') before reaching any kernel so dtype is normalized."
 flood,2026-05-03,1437,MEDIUM,3,,Re-audit 2026-05-03. MEDIUM Cat 3 fixed in PR #1438 (travel_time and flood_depth_vegetation now validate mannings_n DataArray values are finite and strictly positive via _validate_mannings_n_dataarray helper). No remaining unfixed findings. Other categories clean: every allocation is same-shape as input; no flat index math; NaN propagation explicit in every backend; tan_slope clamped by _TAN_MIN; no CUDA kernels; no file I/O; every public API calls _validate_raster on DataArray inputs.
-focal,2026-04-27,1284,HIGH,1,,"HIGH (fixed PR #1286): apply(), focal_stats(), and hotspots() accepted unbounded user-supplied kernels via custom_kernel(), which only checks shape parity. The kernel-size guard from #1241 (_check_kernel_memory) only ran inside circle_kernel/annulus_kernel, so a (50001, 50001) custom kernel on a 10x10 raster allocated ~10 GB on the kernel itself plus a much larger padded raster before any work -- same shape as the bilateral DoS in #1236. Fixed by adding _check_kernel_vs_raster_memory in focal.py and wiring it into apply(), focal_stats(), and hotspots() after custom_kernel() validation. All 134 focal tests + 19 bilateral tests pass. No other findings: 10 CUDA kernels all have proper bounds + stencil guards; _validate_raster called on every public entry point; hotspots already raises ZeroDivisionError on constant-value rasters; _focal_variety_cuda uses a fixed-size local buffer (silent truncation but bounded); _focal_std_cuda/_focal_var_cuda clamp the catastrophic-cancellation case via if var < 0.0: var = 0.0; no file I/O."
+focal,2026-06-10,3222,MEDIUM,1;6,3223,"Two MEDIUM findings, both fixed via rockout. Cat 6 (#3222): mean() GPU paths (_mean_cupy ~L261, _mean_dask_cupy ~L194) force float32 while CPU computes float64 (astype(float)); max abs diff 0.5 on values ~1e7; same class as #2769, which only covered apply()/focal_stats(). Cat 1 (#3223): _check_kernel_vs_raster_memory budgets 4 B/cell ('float32 internals') but #2805 made internals preserve float64, so the guard underestimates by 2x and a float64 combo can pass at ~100% of available RAM. Clean elsewhere: Cat 2 no int32 flat-index math; Cat 3 all divisions guarded (num>0, w_sum>0, var<0 clamp, variance_term where-guard, global_std==0 validated eagerly + lazily via _gistar_validate_lazy), NaN checks use v!=v idiom; Cat 4 all 10 CUDA kernels have bounds guards, validated under compute-sanitizer memcheck on shapes (1,1)/(7,1)/(1,7)/(97,89): 0 errors; Cat 5 no file I/O; all public APIs call _validate_raster."
 geodesic,2026-04-27,1283,HIGH,1,,"HIGH (fixed PR #1285): slope(method='geodesic') and aspect(method='geodesic') stack a (3, H, W) float64 array (data, lat, lon) before dispatch with no memory check. A large lat/lon-tagged raster passed to either function would OOM. Fixed by adding _check_geodesic_memory(rows, cols) in xrspatial/geodesic.py (mirrors morphology._check_kernel_memory): budgets 56 bytes/cell (24 stacked float64 + 4 float32 output + 24 padded copy + slack) and raises MemoryError when > 50% of available RAM; called from slope.py and aspect.py inside the geodesic branch before dispatch. No other findings: 6 CUDA kernels all have bounds guards (e.g. _run_gpu_geodesic_aspect at geodesic.py:395), custom 16x16 thread blocks avoid register spill, no shared memory, _validate_raster runs upstream in slope/aspect, all backends cast to float32, slope_mag < 1e-7 flat threshold prevents arctan2 NaN propagation, curvature correction uses hardcoded WGS84 R."
 geotiff,2026-06-09,3104,MEDIUM,3;6,,"Re-audit pass 20 2026-06-09 (deep-sweep). MEDIUM Cat 3/6: unpack=True accepted SCALE=0 / non-finite SCALE-OFFSET from GDAL_METADATA (silent data destruction on read: all pixels become offset or NaN) and _pack divided by scale_factor with no guard (pack=True round-trip wrote an all-sentinel file, confirmed by repro). Issue #3104, fixed on deep-sweep-security-geotiff-2026-06-09: reject in _extract_scale_offset (covers numpy+dask, both share it) plus _pack guard for hand-edited attrs; tests tests/read/test_scale_zero_3104.py. Audited 191 commits since b5bd2658 incl. reader split into _sources/_decode/_encode/_nodata/_overview, pack/unpack (#3064/#3075), sidecar origin threading (#3027), streaming write budget (#3008), _overview_kernels.py ngjit reducers (bounds-checked, clean), GPU overview path (cupy ufuncs, no raw kernels). Carried-forward guards verified post-refactor: SSRF UnsafeURLError+ALLOW_PRIVATE_HOSTS now in _sources.py, DOCTYPE rejection _safe_xml, realpath containment _vrt, decompression-bomb margins _compression, MAX_IFDS/MAX_IFD_ENTRY_COUNT/validate_tile_layout _header, max_pixels on all backends, _check_gpu_memory tile caps, sidecar max_cloud_bytes (#2121) with SSRF-validated _HTTPSource probes. CUDA available; no Cat 4 suspects required kernel execution."
 glcm,2026-04-24,1257,HIGH,1,,"HIGH (fixed #1257): glcm_texture() validated window_size only as >= 3 and distance only as >= 1, with no upper bound on either. _glcm_numba_kernel iterates range(r-half, r+half+1) for every pixel, so window_size=1_000_001 on a 10x10 raster ran ~10^14 loop iterations with all neighbors failing the interior bounds check (CPU DoS). On the dask backends depth = window_size // 2 + distance drove map_overlap padding, so a huge window also caused oversize per-chunk allocations (memory DoS). Fixed by adding max_val caps in the public entrypoint: window_size <= max(3, min(rows, cols)) and distance <= max(1, window_size // 2). One cap covers every backend because cupy and dask+cupy call through to the CPU kernel after cupy.asnumpy. No other HIGH findings: levels is already capped at 256 so the per-pixel np.zeros((levels, levels)) matrix in the kernel is bounded to 512 KB. No CUDA kernels. No file I/O. Quantization clips to [0, levels-1] before the kernel and NaN maps to -1 which the kernel filters with i_val >= 0. Entropy log(p) and correlation p / (std_i * std_j) are both guarded. All four backends use _validate_raster and cast to float64 before quantizing. MEDIUM (unfixed, Cat 1): the per-pixel np.zeros((levels, levels)) allocation inside the hot loop is a perf issue (levels=256 -> 512 KB alloc+free per pixel) but not a security issue because levels is bounded. Could be hoisted out of the loop or replaced with an in-place clear, but that is an efficiency concern, not security."
diff --git a/xrspatial/tests/test_focal.py b/xrspatial/tests/test_focal.py
index 69e64d8e6..324bfe201 100644
--- a/xrspatial/tests/test_focal.py
+++ b/xrspatial/tests/test_focal.py
@@ -98,9 +98,53 @@ def test_mean_transfer_function_dask_gpu():
     dask_cupy_mean = mean(dask_cupy_agg)
     general_output_checks(dask_cupy_agg, dask_cupy_mean)
 
+    # Tight tolerance: since #3222 every backend computes in float64, so
+    # the old float32-drift allowance (rtol=1e-4) is no longer needed.
     np.testing.assert_allclose(
         numpy_mean.data, dask_cupy_mean.data.compute().get(),
-        equal_nan=True, rtol=1e-4)
+        equal_nan=True)
+
+
+@cuda_and_cupy_available
+def test_mean_preserves_float64_gpu_3222():
+    # Regression for #3222: _mean_cupy forced float32, so a float64 raster
+    # came back float32 on GPU while the CPU paths returned float64. The
+    # +1e7 offset makes the float32 rounding visible (abs error ~0.5).
+    import cupy
+
+    data = np.arange(20, dtype=np.float64).reshape(4, 5) + 1e7
+    numpy_mean = mean(xr.DataArray(data))
+    cupy_mean = mean(xr.DataArray(cupy.asarray(data)))
+
+    assert numpy_mean.data.dtype == np.float64
+    assert cupy_mean.data.dtype == np.float64
+    np.testing.assert_allclose(
+        numpy_mean.data, cupy_mean.data.get(), equal_nan=True)
+
+    # The float32 error used to compound with passes > 1; multi-pass
+    # results must stay in lockstep too.
+    numpy_mean3 = mean(xr.DataArray(data), passes=3)
+    cupy_mean3 = mean(xr.DataArray(cupy.asarray(data)), passes=3)
+    np.testing.assert_allclose(
+        numpy_mean3.data, cupy_mean3.data.get(), equal_nan=True)
+
+
+@dask_array_available
+@cuda_and_cupy_available
+def test_mean_preserves_float64_dask_gpu_3222():
+    # Regression for #3222: same as above for the dask+cupy path, which
+    # cast every chunk to float32 in _mean_dask_cupy.
+    import cupy
+
+    data = np.arange(20, dtype=np.float64).reshape(4, 5) + 1e7
+    numpy_mean = mean(xr.DataArray(data))
+
+    dask_cupy_agg = xr.DataArray(
+        da.from_array(cupy.asarray(data), chunks=(2, 3)))
+    result = mean(dask_cupy_agg).data.compute().get()
+
+    assert result.dtype == np.float64
+    np.testing.assert_allclose(numpy_mean.data, result, equal_nan=True)
 
 
 @pytest.fixture