Additional classes, macros, and functions that help to work more easily with the main vector types.
Type | Name | Description
---|---|---
#define | Vc_foreach_bit(iterator, mask) | Loop over all set bits in the mask.
#define | foreach_bit(iterator, mask) | Alias for Vc_foreach_bit unless VC_CLEAN_NAMESPACE is defined.
void | forceToRegisters(const vec &, ...) | Force the vectors passed to the function into registers.
template<typename V, typename Parent, typename Dimension, typename RM> std::ostream & | operator<<(std::ostream &s, const Vc::MemoryBase<V, Parent, Dimension, RM> &m) | Prints the contents of a Memory object into a stream object.
const char * | versionString() | Returns the version string of the Vc headers.
unsigned int | versionNumber() | Returns the version of the Vc headers encoded in an integer.
template<typename T, Vc::MallocAlignment A> T * | malloc(size_t n) | Allocates memory on the heap with alignment and padding suitable for vectorized access.
template<typename T> void | free(T *p) | Frees memory that was allocated with Vc::malloc.
void | prefetchForOneRead(const void *addr) | Prefetch the cacheline containing addr for a single read access.
void | prefetchForModify(const void *addr) | Prefetch the cacheline containing addr for modification.
void | prefetchClose(const void *addr) | Prefetch the cacheline containing addr to L1 cache.
void | prefetchMid(const void *addr) | Prefetch the cacheline containing addr to L2 cache.
void | prefetchFar(const void *addr) | Prefetch the cacheline containing addr to L3 cache.
Type | Name | Description
---|---|---
#define | VC_IMPL | This macro is set to the value of Vc::Implementation that the current translation unit is compiled with.
#define | VC_IMPL_XOP | This macro is defined if the current translation unit is compiled with XOP instruction support.
#define | VC_IMPL_FMA4 | This macro is defined if the current translation unit is compiled with FMA4 instruction support.
#define | VC_IMPL_F16C | This macro is defined if the current translation unit is compiled with F16C instruction support.
#define | VC_IMPL_POPCNT | This macro is defined if the current translation unit is compiled with POPCNT instruction support.
#define | VC_IMPL_SSE4a | This macro is defined if the current translation unit is compiled with SSE4a instruction support.
#define | VC_IMPL_Scalar | This macro is defined if the current translation unit is compiled without any SIMD support.
#define | VC_IMPL_SSE | This macro is defined if the current translation unit is compiled with any version of SSE (but not AVX).
#define | VC_IMPL_SSE2 | This macro is defined if the current translation unit is compiled with SSE2 instruction support (excluding SSE3 and up).
#define | VC_IMPL_SSE3 | This macro is defined if the current translation unit is compiled with SSE3 instruction support (excluding SSSE3 and up).
#define | VC_IMPL_SSSE3 | This macro is defined if the current translation unit is compiled with SSSE3 instruction support (excluding SSE4.1 and up).
#define | VC_IMPL_SSE4_1 | This macro is defined if the current translation unit is compiled with SSE4.1 instruction support (excluding SSE4.2 and up).
#define | VC_IMPL_SSE4_2 | This macro is defined if the current translation unit is compiled with SSE4.2 instruction support (excluding AVX and up).
#define | VC_IMPL_AVX | This macro is defined if the current translation unit is compiled with AVX instruction support (excluding AVX2 and up).
Type | Name | Description
---|---|---
#define | VC_DOUBLE_V_SIZE | An integer (for use with the preprocessor) that gives the number of entries in a double_v.
#define | VC_FLOAT_V_SIZE | An integer (for use with the preprocessor) that gives the number of entries in a float_v.
#define | VC_SFLOAT_V_SIZE | An integer (for use with the preprocessor) that gives the number of entries in an sfloat_v.
#define | VC_INT_V_SIZE | An integer (for use with the preprocessor) that gives the number of entries in an int_v.
#define | VC_UINT_V_SIZE | An integer (for use with the preprocessor) that gives the number of entries in a uint_v.
#define | VC_SHORT_V_SIZE | An integer (for use with the preprocessor) that gives the number of entries in a short_v.
#define | VC_USHORT_V_SIZE | An integer (for use with the preprocessor) that gives the number of entries in a ushort_v.
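Because these sizes are plain integers rather than C++ constants, they can drive preprocessor conditionals. The following standalone sketch shows both uses for VC_FLOAT_V_SIZE; the fallback definition of 4 (the float_v width on SSE) is hypothetical and only there so the snippet compiles without Vc, and floatVectorWidth and floatVectorsNeeded are illustrative names, not Vc API.

```cpp
#include <cstddef>

// Hypothetical fallback so this sketch compiles without Vc; when Vc's
// headers are included first, they provide the real value.
#ifndef VC_FLOAT_V_SIZE
#define VC_FLOAT_V_SIZE 4
#endif

// Unlike a C++ constant expression, the macro can be tested with #if,
// so whole code paths can be selected per target.
#if VC_FLOAT_V_SIZE >= 8
constexpr const char *floatVectorWidth = "8 or more floats per vector";
#else
constexpr const char *floatVectorWidth = "fewer than 8 floats per vector";
#endif

// Number of float vectors needed to cover n scalars; the last one may be
// only partially filled.
constexpr std::size_t floatVectorsNeeded(std::size_t n)
{
    return (n + VC_FLOAT_V_SIZE - 1) / VC_FLOAT_V_SIZE;
}
```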
#define Vc_foreach_bit(iterator, mask)
Loop over all set bits in the mask.
The iterator variable is set to the position of each set bit in turn. A mask of e.g. 00011010 would result in the loop body being executed with the iterator set to 1, 3, and 4.
This allows you to write:
    // assuming a is a Vc::float_v filled with data
    Vc_foreach_bit(int i, a < 0.f) {
        std::cout << a[i] << "\n";
    }
The example prints all the values in a that are negative, and only those.
- Parameters
  iterator | The iterator variable, for example "int i". |
  mask | The mask to iterate over. You can also just write a vector operation that returns a mask. |
- Note
- Since Vc 0.7, break and continue are supported in foreach_bit loops.
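To make the iteration order concrete, here is a standalone sketch that computes the same positions from a plain integer mask, lowest bit first, as in the 00011010 example above. It does not use Vc and is not how Vc_foreach_bit is implemented; the function name setBitPositions is hypothetical.

```cpp
#include <vector>

// Collects the positions of all set bits in mask, lowest bit first.
std::vector<int> setBitPositions(unsigned int mask)
{
    std::vector<int> positions;
    for (int bit = 0; mask != 0; ++bit, mask >>= 1) {
        if (mask & 1u) {
            positions.push_back(bit);
        }
    }
    return positions;
}
```

For the mask 00011010 (0x1A) this yields the positions 1, 3 and 4.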
#define VC_VERSION_STRING
#define VC_VERSION_NUMBER
#define VC_VERSION_CHECK(major, minor, patch)
Helper macro to compare against an encoded version number.
Example:
    #if VC_VERSION_CHECK(0, 5, 1) >= VC_VERSION_NUMBER
    // the Vc headers are version 0.5.1 or older
    #endif
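The comparison works because an encoded version number is an integer whose ordering matches version ordering. A minimal sketch of such an encoding, assuming one byte per component; this layout and the name encodeVersion are assumptions for illustration, not Vc's documented format.

```cpp
// Hypothetical layout: one byte each for major, minor and patch. Vc's real
// encoding may differ; what matters is that integer order matches version order.
constexpr unsigned int encodeVersion(unsigned int majorVersion,
                                     unsigned int minorVersion,
                                     unsigned int patchLevel)
{
    return (majorVersion << 16) | (minorVersion << 8) | patchLevel;
}

// These hold as long as minor and patch stay below 256.
static_assert(encodeVersion(0, 5, 1) < encodeVersion(0, 7, 0), "minor outweighs patch");
static_assert(encodeVersion(1, 0, 0) > encodeVersion(0, 255, 255), "major outweighs minor");
```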
enum Vc::MallocAlignment
Enum that specifies the alignment and padding restrictions to use for memory allocation with Vc::malloc.
Enumerator | Description
---|---
AlignOnVector | Align on the boundary of the vector size (e.g. 16 bytes on SSE platforms) and pad to allow vector access to the end. Thus the allocated memory contains a multiple of VectorAlignment bytes.
AlignOnCacheline | Align on the boundary of the cache line size (e.g. 64 bytes on x86) and pad to allow full cache line access to the end. Thus the allocated memory contains a multiple of 64 bytes.
AlignOnPage | Align on the boundary of the page size (e.g. 4096 bytes on x86) and pad to allow full page access to the end. Thus the allocated memory contains a multiple of 4096 bytes.
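All three enumerators apply the same rounding step: the requested byte count is rounded up to the next multiple of the alignment. A standalone sketch of that arithmetic, assuming power-of-two alignments such as the 16, 64 and 4096 bytes quoted in the table; paddedBytes is a hypothetical helper, not part of Vc.

```cpp
#include <cassert>
#include <cstddef>

// Rounds the byte size of n objects of elemSize bytes up to the next
// multiple of alignment, which must be a power of two.
std::size_t paddedBytes(std::size_t n, std::size_t elemSize, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::size_t bytes = n * elemSize;
    return (bytes + alignment - 1) & ~(alignment - 1);
}
```

For 21 ints (84 bytes) this gives 96 bytes with 16-byte alignment, 128 bytes with 64-byte alignment and 4096 bytes with page alignment.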
enum Vc::Implementation
Enum to identify a certain SIMD instruction set.
You can use VC_IMPL for the currently active implementation.
- See also
- ExtraInstructions
Enumerator | Description
---|---
ScalarImpl | Uses only fundamental types.
SSE2Impl | x86 SSE + SSE2
SSE3Impl | x86 SSE + SSE2 + SSE3
SSSE3Impl | x86 SSE + SSE2 + SSE3 + SSSE3
SSE41Impl | x86 SSE + SSE2 + SSE3 + SSSE3 + SSE4.1
SSE42Impl | x86 SSE + SSE2 + SSE3 + SSSE3 + SSE4.1 + SSE4.2
AVXImpl | x86 AVX
AVX2Impl | x86 AVX + AVX2
enum Vc::ExtraInstructions
The set of available instructions cannot easily be described by a linear list of instruction sets.
On x86, the following instruction sets always include their predecessors: SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2.
However, some additional instructions are not implied by any level of this list. They are covered by this enum.
Enumerator | Description
---|---
Float16cInstructions | Support for float16 conversions in hardware.
Fma4Instructions | Support for FMA4 instructions.
XopInstructions | Support for XOP instructions.
PopcntInstructions | Support for the population count instruction.
Sse4aInstructions | Support for SSE4a instructions.
FmaInstructions | Support for FMA instructions (3-operand variant).
unsigned int Vc::extraInstructionsSupported()
Determines the extra instructions supported by the current CPU.
- Returns
- A combination of flags from Vc::ExtraInstructions that the current CPU supports.
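Since the extra instruction sets are independent of each other, a "combination of flags" is naturally a bitwise OR of distinct bits, and a test is a bitwise AND. A sketch under the assumption that each enumerator occupies its own bit; the sketch namespace, the numeric values and hasAll are hypothetical and not Vc's definitions.

```cpp
namespace sketch {

// Hypothetical values: one distinct bit per extra instruction set. The real
// Vc::ExtraInstructions enumerators may use different numbers.
enum ExtraInstructions : unsigned int {
    Float16cInstructions = 0x01,
    Fma4Instructions     = 0x02,
    XopInstructions      = 0x04,
    PopcntInstructions   = 0x08,
    Sse4aInstructions    = 0x10,
    FmaInstructions      = 0x20
};

// True if every flag set in `required` is also set in `supported`.
constexpr bool hasAll(unsigned int supported, unsigned int required)
{
    return (supported & required) == required;
}

} // namespace sketch
```

With Vc itself, the `supported` value would come from Vc::extraInstructionsSupported().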
Tests whether the given implementation is supported by the system the code is executing on.
- Returns
- true if the OS and hardware support execution of instructions defined by impl; false otherwise.
- Parameters
  impl | The SIMD target to test for. |
Determines the best supported implementation for the current system.
- Returns
- The enum value for the best implementation.
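Because the x86 levels listed in Vc::Implementation are cumulative, picking the best supported implementation can be pictured as a walk from the most capable level downwards until a test succeeds. This is a sketch of that idea only, not Vc's implementation: the sketch namespace, its copy of the enumerators and the predicate parameter are assumptions.

```cpp
#include <functional>

namespace sketch {

// Hypothetical mirror of the Vc::Implementation enumerators, ordered from
// least to most capable as in the enum table.
enum Implementation {
    ScalarImpl, SSE2Impl, SSE3Impl, SSSE3Impl, SSE41Impl, SSE42Impl, AVXImpl, AVX2Impl
};

// Returns the most capable implementation the predicate accepts; ScalarImpl
// needs no special hardware and serves as the fallback.
Implementation bestImplementation(const std::function<bool(Implementation)> &isSupported)
{
    for (int i = AVX2Impl; i > ScalarImpl; --i) {
        const Implementation impl = static_cast<Implementation>(i);
        if (isSupported(impl)) {
            return impl;
        }
    }
    return ScalarImpl;
}

} // namespace sketch
```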
bool Vc::currentImplementationSupported()
Tests whether the CPU and operating system support the vector unit the code was compiled for.
This function should be called before any other Vc functionality is used. It checks whether the program will work. If this function returns false, the program should exit with a useful error message before the OS has to kill it because of an invalid instruction exception.
If the program continues and makes use of any vector features not supported by the hardware or the OS, it will crash.
Example:
    int main()
    {
        if (!Vc::currentImplementationSupported()) {
            std::cerr << "CPU or OS requirements not met for the compiled in vector unit!\n";
            return 1;
        }
        ...
    }
- Returns
- true if the OS and hardware support execution of the currently selected SIMD instructions; false otherwise.
void Vc::forceToRegisters(const vec &, ...)
Force the vectors passed to the function into registers.
This can be useful after looking at the emitted assembly to force the compiler to optimize properly.
- Note
- Currently only has an effect for SSE vectors.
- MSVC does not support this function at all.
- Warning
- Be careful with this function: when compiling for 32-bit systems, forcing more than 8 vectors into registers can make compilation fail, since 32-bit x86 has only 8 SSE registers.
template<typename V, typename Parent, typename Dimension, typename RM>
std::ostream& operator<<(std::ostream &s, const Vc::MemoryBase<V, Parent, Dimension, RM> &m)
Prints the contents of a Memory object into a stream object.
    Vc::Memory<int_v, 10> m;
    for (int i = 0; i < m.entriesCount(); ++i) {
        m[i] = i;
    }
    std::cout << m << std::endl;
will output (with SSE):
    {[0, 1, 2, 3] [4, 5, 6, 7] [8, 9, 0, 0]}
- Parameters
  s | Any standard C++ ostream object, for example std::cout or a std::stringstream object. |
  m | Any Vc::Memory object. |
- Returns
- The ostream object, to allow chaining of stream operations.
- Note
- With the GNU standard library, this function checks whether the output stream is a tty. If so, it colorizes the output.
- Warning
- Please do not forget that printing a large memory object can take a long time.
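The output groups the padded storage by vector width. As a standalone illustration, this sketch reproduces the format shown in the example for a plain std::vector, assuming a width of 4 (int_v on SSE) and passing the padding entries explicitly as zeros; formatChunks is a hypothetical name and does not call into Vc.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Formats values as groups of `width` entries: {[a, b, c, d] [e, f, g, h] ...}.
std::string formatChunks(const std::vector<int> &values, std::size_t width)
{
    assert(width > 0);
    std::ostringstream out;
    out << '{';
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i % width == 0) {
            out << (i == 0 ? "[" : " [");  // start of a new vector
        } else {
            out << ", ";
        }
        out << values[i];
        if (i % width == width - 1 || i + 1 == values.size()) {
            out << ']';  // end of a vector
        }
    }
    out << '}';
    return out.str();
}
```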
const char* Vc::versionString()
- Returns
- The version string of the Vc headers.
- Note
- There exists a built-in check that ensures on application startup that the Vc version of the library (link time) and the headers (compile time) are equal. A mismatch between headers and library could lead to errors that are very hard to debug.
- If you need to disable the check (it costs a very small amount of application startup time), you can define VC_NO_VERSION_CHECK at compile time.
unsigned int Vc::versionNumber()
- Returns
- The version of the Vc headers encoded in an integer.
template<typename T, Vc::MallocAlignment A>
T* Vc::malloc(size_t n)
Allocates memory on the heap with alignment and padding suitable for vectorized access.
Memory that was allocated with this function must be released with Vc::free! Other methods might work but are not portable.
- Parameters
  n | Specifies the number of objects the allocated memory must be able to store. |
- Template Parameters
  T | The type of the allocated memory. Note that the constructor is not called. |
  A | Determines the alignment of the memory. See Vc::MallocAlignment. |
- Returns
- Pointer to memory of the requested type, or 0 on error. The allocated memory is padded at the end to be a multiple of the requested alignment A. Thus if you request memory for 21 int objects, aligned via Vc::AlignOnCacheline, you can safely read a full cacheline until the end of the array without generating an out-of-bounds access. For a cacheline size of 64 bytes and an int size of 4 bytes, the 84 requested bytes are rounded up to an array of 128 bytes to work with.
- Warning
- The standard malloc function specifies the number of bytes to allocate, whereas this function specifies the number of values; the two differ by a factor of sizeof(T).
- This function is mainly meant for use with builtin types. If you use a custom type with a sizeof that is not a multiple of 2, the results might not be what you expect.
- The constructor of T is not called. You can make up for this with placement new. Constructing element by element avoids the implementation-defined extra space that array placement new may request:
    SomeType *array = Vc::malloc<SomeType, Vc::AlignOnCacheline>(N);
    for (size_t i = 0; i < N; ++i) {
        new (&array[i]) SomeType;  // placement new, needs #include <new>
    }
- See also
- Vc::free
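For comparison, here is a portable sketch of the same allocation idea using C++17 std::aligned_alloc. It assumes the toolchain provides std::aligned_alloc (MSVC does not), and alignedAllocate and isAligned are hypothetical helpers, not Vc functions. Memory from this sketch is released with std::free, not Vc::free.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical malloc-like helper: raw storage for n objects of type T,
// aligned to `alignment` bytes (a power of two) and padded to a multiple of it.
// As with Vc::malloc, no constructors run.
template <typename T>
T *alignedAllocate(std::size_t n, std::size_t alignment)
{
    const std::size_t bytes = n * sizeof(T);
    // std::aligned_alloc requires the size to be a multiple of the alignment;
    // the padding guarantees exactly that.
    const std::size_t padded = (bytes + alignment - 1) / alignment * alignment;
    return static_cast<T *>(std::aligned_alloc(alignment, padded));
}

// True if p lies on an `alignment`-byte boundary.
inline bool isAligned(const void *p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Requesting 21 ints with 64-byte alignment yields 128 bytes, so indices 21 to 31 lie in the padding and can be touched without leaving the allocation.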
template<typename T>
void Vc::free(T *p)
Frees memory that was allocated with Vc::malloc.
- Parameters
  p | The pointer to the memory to be freed. |
- Template Parameters
  T | The type of the allocated memory. |
- Warning
- The destructor of T is not called. If needed, you can call the destructors before calling free:
    for (int i = 0; i < N; ++i) {
        p[i].~T();
    }
- See also
- Vc::malloc
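The construct/destroy pairing described in the malloc and free warnings can be checked end to end. This sketch uses C++17 aligned operator new and delete instead of Vc::malloc and Vc::free so that it compiles without Vc; Tracked and constructThenDestroy are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <new>

// Counts live objects so the construct/destroy pairing can be checked.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Constructs n objects in raw, 64-byte-aligned storage, destroys them again,
// releases the storage, and returns how many objects were alive in between.
int constructThenDestroy(std::size_t n)
{
    const std::align_val_t alignment{64};
    void *raw = ::operator new(n * sizeof(Tracked), alignment);
    Tracked *p = static_cast<Tracked *>(raw);

    for (std::size_t i = 0; i < n; ++i) {
        new (p + i) Tracked;  // run the constructors that raw allocation skips
    }
    const int aliveAfterConstruction = Tracked::alive;

    for (std::size_t i = 0; i < n; ++i) {
        p[i].~Tracked();  // run the destructors that raw deallocation skips
    }
    ::operator delete(raw, alignment);
    return aliveAfterConstruction;
}
```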
void Vc::prefetchForOneRead(const void *addr)
Prefetch the cacheline containing addr for a single read access.
This prefetch completely bypasses the cache, not evicting any other data.
- Parameters
  addr | The cacheline containing addr will be prefetched. |
void Vc::prefetchForModify(const void *addr)
Prefetch the cacheline containing addr for modification.
This prefetch evicts other data from the cache, so use it only for data you will actually use. When the target system supports it, the cacheline is marked as modified while prefetching, saving work later on.
- Parameters
  addr | The cacheline containing addr will be prefetched. |
void Vc::prefetchClose(const void *addr)
Prefetch the cacheline containing addr to L1 cache.
This prefetch evicts other data from the cache, so use it only for data you will actually use.
- Parameters
  addr | The cacheline containing addr will be prefetched. |
void Vc::prefetchMid(const void *addr)
Prefetch the cacheline containing addr to L2 cache.
This prefetch evicts other data from the cache, so use it only for data you will actually use.
- Parameters
  addr | The cacheline containing addr will be prefetched. |
void Vc::prefetchFar(const void *addr)
Prefetch the cacheline containing addr to L3 cache.
This prefetch evicts other data from the cache, so use it only for data you will actually use.
- Parameters
  addr | The cacheline containing addr will be prefetched. |
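A common pattern is to prefetch a fixed distance ahead of a streaming loop. This sketch uses the GCC/Clang builtin __builtin_prefetch rather than Vc. The mapping of the five Vc variants onto its rw and locality arguments is an assumption, the distance of 16 elements is arbitrary, and all sketch* names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// GCC and Clang provide __builtin_prefetch(addr, rw, locality): rw is 0 for a
// read and 1 for a write; locality runs from 0 (no temporal locality) to 3
// (keep in all cache levels). Other compilers get a no-op.
#if defined(__GNUC__)
#define SKETCH_PREFETCH(addr, rw, locality) __builtin_prefetch((addr), (rw), (locality))
#else
#define SKETCH_PREFETCH(addr, rw, locality) ((void)(addr))
#endif

// Assumed correspondence to the Vc functions above (not Vc's implementation).
inline void sketchPrefetchForOneRead(const void *p) { SKETCH_PREFETCH(p, 0, 0); }
inline void sketchPrefetchForModify(const void *p) { SKETCH_PREFETCH(p, 1, 3); }
inline void sketchPrefetchClose(const void *p) { SKETCH_PREFETCH(p, 0, 3); }
inline void sketchPrefetchMid(const void *p) { SKETCH_PREFETCH(p, 0, 2); }
inline void sketchPrefetchFar(const void *p) { SKETCH_PREFETCH(p, 0, 1); }

// Sums the data, prefetching `distance` elements ahead. Each element is read
// exactly once, so the read-once variant fits. Prefetching is only a hint and
// never changes the result.
long long sumWithPrefetch(const std::vector<int> &data, std::size_t distance)
{
    long long sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i + distance < data.size()) {
            sketchPrefetchForOneRead(&data[i + distance]);
        }
        sum += data[i];
    }
    return sum;
}
```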