Hi there, I was reading and saw:
Enabling flash attention reduces memory usage by at least 400 MB. At the moment, it is not supported when CUBLAS is enabled because the kernel implementation is missing.
But I'm curious: would it make sense to set -DSD_FLASH_ATTN=ON for the macOS, Linux, and other non-CUBLAS builds?
- build: "noavx"
  defines: "-DGGML_AVX=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF -DSD_BUILD_SHARED_LIBS=ON"
- build: "avx2"
  defines: "-DGGML_AVX2=ON -DSD_BUILD_SHARED_LIBS=ON"
- build: "avx"
  defines: "-DGGML_AVX2=OFF -DSD_BUILD_SHARED_LIBS=ON"
- build: "avx512"
  defines: "-DGGML_AVX512=ON -DSD_BUILD_SHARED_LIBS=ON"
- build: "cuda12"
Thanks!