Skip to content

feat: Support for fault-tolerant execution#779

Merged
dervoeti merged 26 commits into
mainfrom
feat/fault-tolerant-execution
Aug 18, 2025
Merged

feat: Support for fault-tolerant execution#779
dervoeti merged 26 commits into
mainfrom
feat/fault-tolerant-execution

Conversation

@dervoeti

@dervoeti dervoeti commented Aug 4, 2025

Copy link
Copy Markdown
Member

Description

Implementation for #576

When providing credentials for S3, Azure or GCS we currently need a SecretClass. It would be easier if the user could just provide a Secret as an alternative, we had that discussion recently. To be consistent with the current state, I used SecretClass exclusively for now.

Things I did not do in this PR to ease the review process (we could tackle those as a separate PR):

  • Combine redundant logic from FTE and Catalog modules
  • Use config-utils instead of doing the load_env_from_files stunt

Definition of Done Checklist

  • Not all of these items are applicable to all PRs, the author should update this template to only leave the boxes in that are relevant
  • Please make sure all these things are done and tick the boxes

Author

  • Changes are OpenShift compatible
  • CRD changes approved
  • CRD documentation for all fields, following the style guide.
  • Helm chart can be installed and deployed operator works
  • Integration tests passed (for non trivial changes)
  • Changes need to be "offline" compatible
  • Links to generated (nightly) docs added
  • Release note snippet added

Reviewer

  • Code contains useful comments
  • Code contains useful logging statements
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Acceptance

  • Feature Tracker has been updated
  • Proper release label has been added
  • Links to generated (nightly) docs added
  • Release note snippet added
  • Add type/deprecation label & add to the deprecation schedule
  • Add type/experimental label & add to the experimental features tracker

Comment thread deploy/helm/trino-operator/crds/crds.yaml Outdated
@dervoeti dervoeti force-pushed the feat/fault-tolerant-execution branch 2 times, most recently from 4cc640f to 3d113df Compare August 5, 2025 18:44
@dervoeti dervoeti self-assigned this Aug 5, 2025
@dervoeti dervoeti moved this to Development: In Review in Stackable Engineering Aug 5, 2025
@dervoeti dervoeti force-pushed the feat/fault-tolerant-execution branch from fc684e3 to 85a09cc Compare August 5, 2025 19:40

@sbernauer sbernauer left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped some comments on the CRD code and YAML.
As an possible alternative I can propose a complex enum style, where we can together think about it, just pasting it here for reference for now

spec:
  clusterConfig:
    faultTolerantExecution:
      query:
        retryAttempts: 4 # maps to query-retry-attempts
        retryInitialDelay: 10s
        retryMaxDelay: 60s
        retryDelayScaleFactor: 2.0 # f32
        exchangeManager: # Optional
          deduplicationBufferSize: 64Mi # Quantity
          encryptionEnabled: true
          sinkBufferPoolMinSize: 20
          sinkBuffersPerPartition: 4
          sinkMaxFileSize: 2Gi # Quantity
          sourceConcurrentReaders: 8
          s3:
            baseDirectories:
              - s3://trino-exchange-bucket/spooling
            connection: # Mandatory
              reference: minio-connection
            maxErrorRetries: 10
            uploadPartSize: 10Mi # Quantity
      # OR
      task:
        retryAttemptsPerTask: 4 # maps to task-retry-attempts-per-task
        retryInitialDelay: 10s
        retryMaxDelay: 60s
        retryDelayScaleFactor: 2.0 # f32
        exchangeManager: # Mandatory
          # ... same struct as above

Comment thread docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc Outdated
Comment thread docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc Outdated
Comment thread docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc Outdated
Comment thread docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc Outdated
Comment thread examples/simple-trino-cluster-fault-tolerant-execution.yaml Outdated
Comment thread rust/operator-binary/src/crd/fault_tolerant_execution.rs Outdated
Comment thread rust/operator-binary/src/crd/fault_tolerant_execution.rs Outdated
Comment thread rust/operator-binary/src/crd/fault_tolerant_execution.rs Outdated
Comment thread rust/operator-binary/src/crd/fault_tolerant_execution.rs Outdated
@dervoeti dervoeti force-pushed the feat/fault-tolerant-execution branch 2 times, most recently from 2567920 to 843b18d Compare August 7, 2025 14:06
Comment thread rust/operator-binary/src/crd/fault_tolerant_execution.rs
@dervoeti dervoeti force-pushed the feat/fault-tolerant-execution branch from fd42c2b to af39e6b Compare August 12, 2025 12:28
@lfrancke lfrancke linked an issue Aug 13, 2025 that may be closed by this pull request
3 tasks
@dervoeti dervoeti force-pushed the feat/fault-tolerant-execution branch from eaaff12 to 9e587d8 Compare August 13, 2025 11:39
@dervoeti dervoeti requested a review from sbernauer August 13, 2025 11:44
@sbernauer sbernauer moved this from Development: In Review to Development: Waiting for Review in Stackable Engineering Aug 18, 2025
@sbernauer sbernauer requested review from maltesander and removed request for sbernauer August 18, 2025 07:07
@sbernauer sbernauer moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Aug 18, 2025
@dervoeti dervoeti force-pushed the feat/fault-tolerant-execution branch from 539473f to b9f5ca9 Compare August 18, 2025 08:20
Comment thread docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc Outdated
Comment thread docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc Outdated
Comment thread docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc Outdated
Comment thread docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc Outdated
Comment thread docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc
Comment thread tests/templates/kuttl/fault-tolerant-execution/04_check-fte.py Outdated
Comment thread tests/templates/kuttl/fault-tolerant-execution/05-run-tests.yaml Outdated
dervoeti and others added 6 commits August 18, 2025 14:20
…adoc

Co-authored-by: Malte Sander <malte.sander.it@gmail.com>
…adoc

Co-authored-by: Malte Sander <malte.sander.it@gmail.com>
…adoc

Co-authored-by: Malte Sander <malte.sander.it@gmail.com>
…adoc

Co-authored-by: Malte Sander <malte.sander.it@gmail.com>

@maltesander maltesander left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@dervoeti dervoeti added this pull request to the merge queue Aug 18, 2025
Merged via the queue into main with commit 28260dc Aug 18, 2025
16 of 17 checks passed
@dervoeti dervoeti deleted the feat/fault-tolerant-execution branch August 18, 2025 14:35
@dervoeti dervoeti moved this from Development: In Review to Development: Done in Stackable Engineering Aug 19, 2025
@dervoeti

Copy link
Copy Markdown
Member Author

Release note

SDP now supports configuring fault-tolerant execution for Trino via the TrinoCluster CRD

@lfrancke lfrancke added release-note Denotes a PR that will be considered when it comes time to generate release notes. release/25.11.0 and removed scheduled-for/25.11.0 labels Sep 8, 2025
@lfrancke

lfrancke commented Sep 8, 2025

Copy link
Copy Markdown
Member

Are there any docs for this we can link to?

@lfrancke lfrancke moved this from Development: Done to Acceptance: In Progress in Stackable Engineering Sep 8, 2025
@sbernauer

Copy link
Copy Markdown
Member

Yes, https://docs.stackable.tech/home/nightly/trino/usage-guide/fault-tolerant-execution/

@lfrancke lfrancke moved this from Acceptance: In Progress to Done in Stackable Engineering Sep 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release/25.11.0 release-note Denotes a PR that will be considered when it comes time to generate release notes.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Support Fault tolerant execution (FTE)

4 participants