MOVSHDUP—Replicate Single FP Values

Opcode/Instruction	Op /En	64/32 bit Mode Support	CPUID Feature Flag	Description
F3 0F 16 /r MOVSHDUP xmm1, xmm2/m128	RM	V/V	SSE3	Move odd index single-precision floating-point values from xmm2/mem and duplicate each element into xmm1.
VEX.128.F3.0F.WIG 16 /r VMOVSHDUP xmm1, xmm2/m128	RM	V/V	AVX	Move odd index single-precision floating-point values from xmm2/mem and duplicate each element into xmm1.
VEX.256.F3.0F.WIG 16 /r VMOVSHDUP ymm1, ymm2/m256	RM	V/V	AVX	Move odd index single-precision floating-point values from ymm2/mem and duplicate each element into ymm1.
EVEX.128.F3.0F.W0 16 /r VMOVSHDUP xmm1 {k1}{z}, xmm2/m128	FVM	V/V	AVX512VL AVX512F	Move odd index single-precision floating-point values from xmm2/m128 and duplicate each element into xmm1 under writemask.
EVEX.256.F3.0F.W0 16 /r VMOVSHDUP ymm1 {k1}{z}, ymm2/m256	FVM	V/V	AVX512VL AVX512F	Move odd index single-precision floating-point values from ymm2/m256 and duplicate each element into ymm1 under writemask.
EVEX.512.F3.0F.W0 16 /r VMOVSHDUP zmm1 {k1}{z}, zmm2/m512	FVM	V/V	AVX512F	Move odd index single-precision floating-point values from zmm2/m512 and duplicate each element into zmm1 under writemask.

Instruction Operand Encoding

Op/En	Operand 1	Operand 2	Operand 3	Operand 4
RM	ModRM:reg (w)	ModRM:r/m (r)	NA	NA
FVM	ModRM:reg (w)	ModRM:r/m (r)	NA	NA

Description

Duplicates odd-indexed single-precision floating-point values from the source operand (the second operand) to adjacent element pair in the destination operand (the first operand). The source operand is an XMM, YMM or ZMM register or 128, 256 or 512-bit memory location and the destination operand is an XMM, YMM or ZMM register.

128-bit Legacy SSE version: Bits (MAX_VL-1:128) of the corresponding destination register remain unchanged.

VEX.128 encoded version: Bits (MAX_VL-1:128) of the destination register are zeroed.

VEX.256 encoded version: Bits (MAX_VL-1:256) of the destination register are zeroed.

EVEX encoded version: The destination operand is updated at 32-bit granularity according to the writemask.

Note: VEX.vvvv and EVEX.vvvv are reserved and must be 1111b otherwise instructions will #UD.

Operation

VMOVSHDUP (EVEX encoded versions)

(KL, VL) = (4, 128), (8, 256), (16, 512)
TMP_SRC[31:0] (cid:197) SRC[63:32]
TMP_SRC[63:32] (cid:197) SRC[63:32]
TMP_SRC[95:64] (cid:197) SRC[127:96]
TMP_SRC[127:96] (cid:197) SRC[127:96]
IF VL >= 256
    TMP_SRC[159:128] (cid:197) SRC[191:160]
    TMP_SRC[191:160] (cid:197) SRC[191:160]
    TMP_SRC[223:192] (cid:197) SRC[255:224]
    TMP_SRC[255:224] (cid:197) SRC[255:224]
FI;
IF VL >= 512
    TMP_SRC[287:256] (cid:197) SRC[319:288]
    TMP_SRC[319:288] (cid:197) SRC[319:288]
    TMP_SRC[351:320] (cid:197) SRC[383:352]
    TMP_SRC[383:352] (cid:197) SRC[383:352]
    TMP_SRC[415:384] (cid:197) SRC[447:416]
    TMP_SRC[447:416] (cid:197) SRC[447:416]
    TMP_SRC[479:448] (cid:197) SRC[511:480]
    TMP_SRC[511:480] (cid:197) SRC[511:480]
FI;
FOR j (cid:197) 0 TO KL-1
    i (cid:197) j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] (cid:197) TMP_SRC[i+31:i]
        ELSE
        IF *merging-masking*
            ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE
            ; zeroing-masking
            DEST[i+31:i] (cid:197) 0
        FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] (cid:197) 0

VMOVSHDUP (VEX.256 encoded version)

DEST[31:0] (cid:197) SRC[63:32]
DEST[63:32] (cid:197) SRC[63:32]
DEST[95:64] (cid:197) SRC[127:96]
DEST[127:96] (cid:197) SRC[127:96]
DEST[159:128] (cid:197) SRC[191:160]
DEST[191:160] (cid:197) SRC[191:160]
DEST[223:192] (cid:197) SRC[255:224]
DEST[255:224] (cid:197) SRC[255:224]
DEST[MAX_VL-1:256] (cid:197) 0

VMOVSHDUP (VEX.128 encoded version)

DEST[31:0] (cid:197) SRC[63:32]
DEST[63:32] (cid:197) SRC[63:32]
DEST[95:64] (cid:197) SRC[127:96]
DEST[127:96] (cid:197) SRC[127:96]
DEST[MAX_VL-1:128] (cid:197) 0

MOVSHDUP (128-bit Legacy SSE version)

DEST[31:0] (cid:197)SRC[63:32]
DEST[63:32] (cid:197)SRC[63:32]
DEST[95:64] (cid:197)SRC[127:96]
DEST[127:96] (cid:197)SRC[127:96]
DEST[MAX_VL-1:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

VMOVSHDUP __m512 _mm512_movehdup_ps( __m512 a);
VMOVSHDUP __m512 _mm512_mask_movehdup_ps(__m512 s, __mmask16 k, __m512 a);
VMOVSHDUP __m512 _mm512_maskz_movehdup_ps( __mmask16 k, __m512 a);
VMOVSHDUP __m256 _mm256_mask_movehdup_ps(__m256 s, __mmask8 k, __m256 a);
VMOVSHDUP __m256 _mm256_maskz_movehdup_ps( __mmask8 k, __m256 a);
VMOVSHDUP __m128 _mm_mask_movehdup_ps(__m128 s, __mmask8 k, __m128 a);
VMOVSHDUP __m128 _mm_maskz_movehdup_ps( __mmask8 k, __m128 a);
VMOVSHDUP __m256 _mm256_movehdup_ps (__m256 a);
VMOVSHDUP __m128 _mm_movehdup_ps (__m128 a);

SIMD Floating-Point Exceptions

None

Other Exceptions

Non-EVEX-encoded instruction, see Exceptions Type 4;

EVEX-encoded instruction, see Exceptions Type E4NF.nb.

If EVEX.vvvv != 1111B or VEX.vvvv != 1111B.