kvarn_async::prelude::compact_str::core

Module arch

Since 1.27.0

SIMD and vendor intrinsics module.

This module is intended to be the gateway to architecture-specific intrinsic functions, typically related to SIMD (but not always!). Each architecture that Rust compiles to may contain a submodule here, which means that this is not a portable module! If you’re writing a portable library take care when using these APIs!

Under this module you’ll find an architecture-named module, such as x86_64. Each target_arch value that Rust can compile for may have a module here, present only on targets with that architecture. For example, the i686-pc-windows-msvc target has an x86 module here, whereas x86_64-pc-windows-msvc has x86_64.

§Overview

This module exposes vendor-specific intrinsics that typically correspond to a single machine instruction. These intrinsics are not portable: their availability is architecture-dependent, and not every machine of a given architecture necessarily provides a given intrinsic.

The arch module is intended to be a low-level implementation detail for higher-level APIs. Using it correctly can be quite tricky as you need to ensure at least a few guarantees are upheld:

  • The correct architecture’s module is used. For example the arm module isn’t available on the x86_64-unknown-linux-gnu target. This is typically done by ensuring that #[cfg] is used appropriately when using this module.
  • The CPU the program is currently running on supports the function being called. For example it is unsafe to call an AVX2 function on a CPU that doesn’t actually support AVX2.

As a result of the latter of these guarantees all intrinsics in this module are unsafe and extra care needs to be taken when calling them!
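As a rough illustration of the first guarantee, the sketch below gates an intrinsic-based function behind #[cfg(target_arch)] and pairs it with a scalar fallback for every other architecture. The function name add_pairs is hypothetical, and the sketch assumes the usual x86_64 targets, where SSE2 is part of the baseline feature set, so no further CPU check is made for it:

```rust
// Hypothetical helper: adds two pairs of 64-bit integers lane by lane.
#[cfg(target_arch = "x86_64")]
fn add_pairs(a: [i64; 2], b: [i64; 2]) -> [i64; 2] {
    use std::arch::x86_64::{__m128i, _mm_add_epi64, _mm_loadu_si128, _mm_storeu_si128};

    let mut out = [0i64; 2];
    // SAFETY: assumes SSE2 is available (baseline on standard x86_64 targets).
    // The unaligned load/store intrinsics accept any address, and each array
    // is exactly 16 bytes long.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        let sum = _mm_add_epi64(va, vb);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, sum);
    }
    out
}

// Portable fallback compiled on every other architecture.
// `wrapping_add` mirrors the wrap-around behavior of the SIMD addition.
#[cfg(not(target_arch = "x86_64"))]
fn add_pairs(a: [i64; 2], b: [i64; 2]) -> [i64; 2] {
    [a[0].wrapping_add(b[0]), a[1].wrapping_add(b[1])]
}
```

Callers see a single safe add_pairs function on every target; only the body differs per architecture.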

§CPU Feature Detection

To call these APIs safely, there are a number of mechanisms available to ensure that the required CPU feature is present before an intrinsic is called. Let’s consider, for example, the _mm256_add_epi64 intrinsic on the x86 and x86_64 architectures. This function requires the AVX2 feature, as documented by Intel, so to call it correctly we need to (a) guarantee we only call it on x86/x86_64 and (b) ensure that the CPU feature is available.
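One possible way to satisfy both conditions is sketched here: #[cfg(target_arch)] handles (a), and a run-time check with the standard is_x86_feature_detected! macro handles (b). The names add4 and add4_avx2 are hypothetical, and for brevity the sketch covers only x86_64, with a scalar fallback everywhere else:

```rust
// Hypothetical AVX2 implementation; only compiled on x86_64.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add4_avx2(a: [i64; 4], b: [i64; 4]) -> [i64; 4] {
    use std::arch::x86_64::{
        __m256i, _mm256_add_epi64, _mm256_loadu_si256, _mm256_storeu_si256,
    };

    let mut out = [0i64; 4];
    // SAFETY: the caller guarantees AVX2 support; the unaligned load/store
    // intrinsics accept any address, and each array is exactly 32 bytes long.
    unsafe {
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        let sum = _mm256_add_epi64(va, vb);
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, sum);
    }
    out
}

// Safe entry point: picks the AVX2 path only when the running CPU supports it.
fn add4(a: [i64; 4], b: [i64; 4]) -> [i64; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed on the line above.
            return unsafe { add4_avx2(a, b) };
        }
    }
    // Scalar fallback for other architectures and for CPUs without AVX2.
    [
        a[0].wrapping_add(b[0]),
        a[1].wrapping_add(b[1]),
        a[2].wrapping_add(b[2]),
        a[3].wrapping_add(b[3]),
    ]
}
```

The unsafe call is confined to one place, directly behind the feature check, so callers of add4 never need to reason about CPU support themselves.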

§Static CPU Feature Detection

The first option available to us is to conditionally compile code via the #[cfg] attribute. Each CPU feature has a corresponding target_feature cfg value, which can be used like so:

#[cfg(
    all(
        any(target_arch = "x86", target_arch = "x86_64"),
        target_feature = "avx2"
    )
)]
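fn foo() {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::{_mm256_add_epi64, _mm256_set1_epi64x};
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::{_mm256_add_epi64, _mm256_set1_epi64x};

    // SAFETY: this function is only compiled when AVX2 is statically enabled.
    unsafe {
        // Illustrative operands: every 64-bit lane set to 1 and to 2.
        let a = _mm256_set1_epi64x(1);
        let b = _mm256_set1_epi64x(2);
        let _sum = _mm256_add_epi64(a, b);
    }
}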

Here we’re using #[cfg(target_feature = "avx2")] to conditionally compile this function into our module. This means that if the