#39565: Incorrect alignment of SIMD vectors returned by value from functions Open Date: 2019-09-12 07:47 Last Update: 2021-04-13 19:29 URL for this Ticket: https://osdn.net//projects/mingw/ticket/39565 RSS feed for this Ticket: https://osdn.net/ticket/ticket_rss.php?group_id=3917&tid=39565 --------------------------------------------------------------------- Last Changes/Comment on this Ticket: 2021-04-13 19:29 Updated by: keith * Status Update from Open to Closed * Resolution Update from None to Invalid Comment: Thank you for your report, but may I ask why you have raised the issue here? You are using a (trademark-infringing) x86_64-w64-mingw32 GCC compiler suite, (which, in spite of its anomalous mingw32 host identification, appears to generate 64-bit code); this compiler suite appears to originate from the MSYS2 project, which is not, in any way, associated with the MinGW Project, (the legitimate owner of the MinGW trademark). Consequently, I am inclined to close this, as an "invalid" ticket. Before I do close it, however, I will offer the following comments: If I compile your test case, with my own mingw32-g++ (GNU/Linux hosted GCC-9.2.0 cross-compiler), I see somewhat different assembly for your crash_vector_ret function: $ mingw32-g++ -S -O0 -DARRAY_SIZE=32 -o- -masm=intel -mavx2 main.cc | grep -v cfi | less -FX ... .text .globl __Z16crash_vector_retPh .def __Z16crash_vector_retPh; .scl 2; .type 32; .endef __Z16crash_vector_retPh: LFB5437: push ebp mov ebp, esp and esp, -32 sub esp, 64 mov eax, DWORD PTR [ebp+8] mov DWORD PTR [esp+28], eax mov eax, DWORD PTR [esp+28] vmovdqu xmm0, XMMWORD PTR [eax] vinserti128 ymm0, ymm0, XMMWORD PTR [eax+16], 0x1 vmovdqa YMMWORD PTR [esp+32], ymm0 vmovdqa ymm0, YMMWORD PTR [esp+32] leave ret Ignoring that this is 32-bit assembly, (and if I actually try to run it, it crashes, not with the SIGSEGV you report, but with a SIGILL "illegal instruction" exception), do note the additional: and esp, -32 instruction, at the beginning of the local stack frame set-up, for which no corresponding instruction appears in your GDB disassembly: this ensures that the local frame itself, and thus any variable addressed at a 32-byte offset within it, will be correctly aligned on a 32-byte boundary. Alternatively, if I compile with the Linux-native GCC-10.2.0 compiler, I see: $ g++ -S -O0 -DARRAY_SIZE=32 -o- -masm=intel -mavx2 main.cc | grep -v cfi | less -FX ... .text .size _ZlsRSoRKDv4_x, .-_ZlsRSoRKDv4_x .globl _Z16crash_vector_retPh .type _Z16crash_vector_retPh, @function _Z16crash_vector_retPh: .LFB5667: push rbp mov rbp, rsp and rsp, -32 mov QWORD PTR -56[rsp], rdi mov rax, QWORD PTR -56[rsp] mov QWORD PTR -40[rsp], rax mov rax, QWORD PTR -40[rsp] vmovdqu xmm0, XMMWORD PTR [rax] vinserti128 ymm0, ymm0, XMMWORD PTR 16[rax], 0x1 vmovdqa YMMWORD PTR -32[rsp], ymm0 vmovdqa ymm0, YMMWORD PTR -32[rsp] leave ret Obviously, this is now 64-bit Linux-native code, but it too has that corresponding: and rsp, -32 instruction, to achieve the required stack frame alignment. (Alas, if I try to run this, it too crashes with a SIGILL exception, on the first vmovdqu instruction; I guess the Intel Celeron N4000 64-bit processor, in my laptop, either doesn't support the AVX2 instructions, or they are disabled in the firmware configuration). --------------------------------------------------------------------- Ticket Status: Reporter: michal_fapso Owner: (None) Type: Issues Status: Closed Priority: 5 - Medium MileStone: (None) Component: MSYS Severity: 5 - Medium Resolution: Invalid --------------------------------------------------------------------- Ticket details: Summary of the Issue Working with AVX2 SIMD vectors. When a __m256i value is returned from a function, the SIMD register ymm0 is copied to memory using the aligned move instruction vmovdqa, so the destination memory address must be aligned to 32 bytes. This requirement is not met in my test case and the program crashes with SIGSEGV. More info follows in the "Minimal Test Case" section below. I tested it also on Ubuntu's gcc, but it didn't crash there, neither did with clang++ on Windows and Ubuntu, so therefore I am submitting this issue to the MinGW tracker. But I might be wrong and the issue may exist in gcc as well. Host Operating System Information and Version Windows 10 Home, Version 10.0.17134 Build 17134 with latest updated MSYS2. GCC Version $ gcc -v Using built-in specs. COLLECT_GCC=D:\msys64_latest\mingw64\bin\gcc.exe COLLECT_LTO_WRAPPER=D:/msys64_latest/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/lto-wrapper.exe Target: x86_64-w64-mingw32 Configured with: ../gcc-9.2.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=c,lto,c++,fortran,ada,objc,obj-c++ --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-filesystem-ts=yes --enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --enable-plugin --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev2, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld Thread model: posix gcc version 9.2.0 (Rev2, Built by MSYS2 project) Binutils Version $ ld -v GNU ld (GNU Binutils) 2.32 MinGW Version I don't know what should I look for in the mingw/include/_mingw.h file. Build Environment $ uname -a MINGW64_NT-10.0-17134 MISO-PC 3.0.7-338.x86_64 2019-07-11 10:58 UTC x86_64 Msys Minimal Self-Contained Test Case main.cpp: #include <iostream> #include <immintrin.h> #define DBG(var_name) std::cout<<#var_name": "<<(var_name)<<std::endl // Output operator for vector std::ostream& operator<<(std::ostream& oss, const __m256i& v) { constexpr size_t length_bytes = 32; unsigned char a[length_bytes]; _mm256_storeu_si256(reinterpret_cast<__m256i*>(a), v); oss << "["; std::string sep = ""; for (size_t i=0; i<length_bytes; i++) { oss << sep << int(a[i]); sep = " "; } return oss << "]"; } __m256i __attribute__ ((noinline)) crash_vector_ret(uint8_t* a) { __m256i v; v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a)); // Crash return v; } int main(int argc, char** argv) { // Setup memory from which the vector will be loaded const int a_size = ARRAY_SIZE; uint8_t a[a_size]; for (volatile int i=0; i<a_size; i++) { a[i] = i; } DBG(alignof(__m256i)); __m256i vr; vr = crash_vector_ret(a); DBG(vr); return 0; }Makefile: CXXFLAGS += -mavx2 CXXFLAGS += -std=c++17 CXXFLAGS += -g CXX = g++ main.exe: main.cpp $(CXX) $(CXXFLAGS) -O1 -DARRAY_SIZE=32 -o $@ $< test: main.cpp for COMPILER in g++; do \ ASM_FLAGS="-S -fverbose-asm -masm=intel"; \ for ARRAY_SIZE in 32 36; do \ for OPT in 0 1 2 3; do \ NAME="main_O$${OPT}_$${COMPILER}_s$${ARRAY_SIZE}"; \ echo "--------------------------------------------------"; \ echo "NAME:$${NAME}"; \ rm -f $${NAME}.exe; \ $${COMPILER} $(CXXFLAGS) -DARRAY_SIZE=$${ARRAY_SIZE} -O$${OPT} -o $${NAME}.exe main.cpp; \ $${COMPILER} $(CXXFLAGS) -DARRAY_SIZE=$${ARRAY_SIZE} -O$${OPT} $${ASM_FLAGS} -o $${NAME}.s main.cpp; \ ./$${NAME}.exe; \ done; \ done; \ doneTrying various compilation options: $ make test for COMPILER in g++; do \ ASM_FLAGS="-S -fverbose-asm -masm=intel"; \ for ARRAY_SIZE in 32 36; do \ for OPT in 0 1 2 3; do \ NAME="main_O${OPT}_${COMPILER}_s${ARRAY_SIZE}"; \ echo "--------------------------------------------------"; \ echo "NAME:${NAME}"; \ rm -f ${NAME}.exe; \ ${COMPILER} -mavx2 -std=c++17 -g -DARRAY_SIZE=${ARRAY_SIZE} -O${OPT} -o ${NAME}.exe main.cpp; \ ${COMPILER} -mavx2 -std=c++17 -g -DARRAY_SIZE=${ARRAY_SIZE} -O${OPT} ${ASM_FLAGS} -o ${NAME}.s main.cpp; \ ./${NAME}.exe; \ done; \ done; \ done -------------------------------------------------- NAME:main_O0_g++_s32 alignof(__m256i): 32 /bin/sh: line 3: 15627 Segmentation fault ./${NAME}.exe -------------------------------------------------- NAME:main_O1_g++_s32 alignof(__m256i): 32 /bin/sh: line 3: 15631 Segmentation fault ./${NAME}.exe -------------------------------------------------- NAME:main_O2_g++_s32 alignof(__m256i): 32 /bin/sh: line 3: 15635 Segmentation fault ./${NAME}.exe -------------------------------------------------- NAME:main_O3_g++_s32 alignof(__m256i): 32 /bin/sh: line 3: 15639 Segmentation fault ./${NAME}.exe -------------------------------------------------- NAME:main_O0_g++_s36 alignof(__m256i): 32 /bin/sh: line 3: 15643 Segmentation fault ./${NAME}.exe -------------------------------------------------- NAME:main_O1_g++_s36 alignof(__m256i): 32 vr: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31] -------------------------------------------------- NAME:main_O2_g++_s36 alignof(__m256i): 32 vr: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31] -------------------------------------------------- NAME:main_O3_g++_s36 alignof(__m256i): 32 vr: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31] GDB session Program compiled by g++ -mavx2 -std=c++17 -g -DARRAY_SIZE=32 -O0 -o main_O0_g++_s32.exe main.cpp $ gdb main_O0_g++_s32 GNU gdb (GDB) 8.3 Copyright (C) 2019 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-w64-mingw32". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from main_O0_g++_s32... (gdb) r Starting program: D:\SourceCode\cpp_simdpp_crash_loadu\main_O0_g++_s32.exe [New Thread 10112.0x4c8] [New Thread 10112.0x409c] [New Thread 10112.0x6bf0] alignof(__m256i): 32 Thread 1 received signal SIGSEGV, Segmentation fault. 0x00000000004016ea in crash_vector_ret (a=0x66fe10 "") at main.cpp:24 24 v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a)); (gdb) bt #0 0x00000000004016ea in crash_vector_ret (a=0x66fe10 "") at main.cpp:24 #1 0x000000000040179f in main (argc=1, argv=0xe64a60) at main.cpp:40 (gdb) set disassembly-flavor intel (gdb) disas Dump of assembler code for function crash_vector_ret(unsigned char*): 0x00000000004016b7 <+0>: push rbp 0x00000000004016b8 <+1>: mov rbp,rsp 0x00000000004016bb <+4>: sub rsp,0x10 0x00000000004016bf <+8>: mov QWORD PTR [rbp+0x10],rcx 0x00000000004016c3 <+12>: mov QWORD PTR [rbp+0x18],rdx 0x00000000004016c7 <+16>: mov rax,QWORD PTR [rbp+0x18] 0x00000000004016cb <+20>: mov QWORD PTR [rbp-0x8],rax 0x00000000004016cf <+24>: mov rax,QWORD PTR [rbp-0x8] 0x00000000004016d3 <+28>: vmovdqu xmm0,XMMWORD PTR [rax] 0x00000000004016d7 <+32>: vinserti128 ymm0,ymm0,XMMWORD PTR [rax+0x10],0x1 0x00000000004016de <+39>: vmovdqa ymm1,ymm0 0x00000000004016e2 <+43>: vmovdqa ymm0,ymm1 0x00000000004016e6 <+47>: mov rax,QWORD PTR [rbp+0x10] => 0x00000000004016ea <+51>: vmovdqa YMMWORD PTR [rax],ymm0 0x00000000004016ee <+55>: nop 0x00000000004016ef <+56>: mov rax,QWORD PTR [rbp+0x10] 0x00000000004016f3 <+60>: add rsp,0x10 0x00000000004016f7 <+64>: pop rbp 0x00000000004016f8 <+65>: ret End of assembler dump. (gdb) p $rax $1 = 6749616 (gdb) p $rax/32*32 $2 = 6749600 As you can see, the $rax address is not aligned to 32 bytes and therefore the vmovdqa crashes. I thought that because of the alignof(__m256i) == 32, the compiler should guarantee that also the return value is aligned to 32 bytes. I might be wrong. Could you please shed more light on this issue? Thanks for any hints. -- Ticket information of MinGW - Minimalist GNU for Windows project MinGW - Minimalist GNU for Windows Project is hosted on OSDN Project URL: https://osdn.net/projects/mingw/ OSDN: https://osdn.net URL for this Ticket: https://osdn.net/projects/mingw/ticket/39565 RSS feed for this Ticket: https://osdn.net/ticket/ticket_rss.php?group_id=3917&tid=39565