VINSERTI128/VINSERTI32x4/VINSERTI64x2/VINSERTI32x8/VINSERTI64x4—Insert Packed Integer Values

Opcode/Instruction	Op /En	64/32 bit Mode Support	CPUID Feature Flag	Description
VEX.NDS.256.66.0F3A.W0 38 /r ib VINSERTI128 ymm1, ymm2, xmm3/m128, imm8	RVMI	V/V	AVX2	Insert 128 bits of integer data from xmm3/m128 and the remaining values from ymm2 into ymm1.
EVEX.NDS.256.66.0F3A.W0 38 /r ib VINSERTI32X4 ymm1 {k1}{z}, ymm2, xmm3/m128, imm8	T4	V/V	AVX512VL AVX512F	Insert 128 bits of packed doubleword integer values from xmm3/m128 and the remaining values from ymm2 into ymm1 under writemask k1.
EVEX.NDS.512.66.0F3A.W0 38 /r ib VINSERTI32X4 zmm1 {k1}{z}, zmm2, xmm3/m128, imm8	T4	V/V	AVX512F	Insert 128 bits of packed doubleword integer values from xmm3/m128 and the remaining values from zmm2 into zmm1 under writemask k1.
EVEX.NDS.256.66.0F3A.W1 38 /r ib VINSERTI64X2 ymm1 {k1}{z}, ymm2, xmm3/m128, imm8	T2	V/V	AVX512VL AVX512DQ	Insert 128 bits of packed quadword integer values from xmm3/m128 and the remaining values from ymm2 into ymm1 under writemask k1.
EVEX.NDS.512.66.0F3A.W1 38 /r ib VINSERTI64X2 zmm1 {k1}{z}, zmm2, xmm3/m128, imm8	T2	V/V	AVX512DQ	Insert 128 bits of packed quadword integer values from xmm3/m128 and the remaining values from zmm2 into zmm1 under writemask k1.
EVEX.NDS.512.66.0F3A.W0 3A /r ib VINSERTI32X8 zmm1 {k1}{z}, zmm2, ymm3/m256, imm8	T8	V/V	AVX512DQ	Insert 256 bits of packed doubleword integer values from ymm3/m256 and the remaining values from zmm2 into zmm1 under writemask k1.
EVEX.NDS.512.66.0F3A.W1 3A /r ib VINSERTI64X4 zmm1 {k1}{z}, zmm2, ymm3/m256, imm8	T4	V/V	AVX512F	Insert 256 bits of packed quadword integer values from ymm3/m256 and the remaining values from zmm2 into zmm1 under writemask k1.

Instruction Operand Encoding

Op/En	Operand 1	Operand 2	Operand 3	Operand 4
RVMI	ModRM:reg (w)	VEX.vvvv	ModRM:r/m (r)	Imm8
T2, T4, T8	ModRM:reg (w)	EVEX.vvvv	ModRM:r/m (r)	Imm8

Description

VINSERTI32x4 and VINSERTI64x2 inserts 128-bits of packed integer values from the second source operand (the third operand) into the destination operand (the first operand) at an 128-bit granular offset multiplied by imm8[0] (256-bit) or imm8[1:0]. The remaining portions of the destination are copied from the corresponding fields of the first source operand (the second operand). The second source operand can be either an XMM register or a 128-bit memory location. The high 6/7bits of the immediate are ignored. The destination operand is a ZMM/YMM register and updated at 32 and 64-bit granularity according to the writemask.

VINSERTI32x8 and VINSERTI64x4 inserts 256-bits of packed integer values from the second source operand (the third operand) into the destination operand (the first operand) at a 256-bit granular offset multiplied by imm8[0]. The remaining portions of the destination are copied from the corresponding fields of the first source operand (the second operand). The second source operand can be either an YMM register or a 256-bit memory location. The upper bits of the immediate are ignored. The destination operand is a ZMM register and updated at 32 and 64-bit granularity according to the writemask.

VINSERTI128 inserts 128-bits of packed integer data from the second source operand (the third operand) into the destination operand (the first operand) at a 128-bit granular offset multiplied by imm8[0]. The remaining portions of the destination are copied from the corresponding fields of the first source operand (the second operand). The second source operand can be either an XMM register or a 128-bit memory location. The high 7 bits of the imme-diate are ignored. VEX.L must be 1, otherwise attempt to execute this instruction with VEX.L=0 will cause #UD.

Operation

VINSERTI32x4 (EVEX encoded versions)

(KL, VL) = (8, 256), (16, 512)
TEMP_DEST[VL-1:0] (cid:197) SRC1[VL-1:0]
IF VL = 256
    CASE (imm8[0]) OF
    0:
    TMP_DEST[127:0] (cid:197) SRC2[127:0]
    1:
    TMP_DEST[255:128] (cid:197) SRC2[127:0]
    ESAC.
FI;
IF VL = 512
    CASE (imm8[1:0]) OF
    00:  TMP_DEST[127:0] (cid:197) SRC2[127:0]
    01:  TMP_DEST[255:128] (cid:197) SRC2[127:0]
    10:  TMP_DEST[383:256] (cid:197) SRC2[127:0]
    11:  TMP_DEST[511:384] (cid:197) SRC2[127:0]
    ESAC.
FI;
FOR j (cid:197) 0 TO KL-1
    i (cid:197) j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] (cid:197) TMP_DEST[i+31:i]
        ELSE
        IF *merging-masking*
            ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE
            ; zeroing-masking
            DEST[i+31:i] (cid:197) 0
        FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] (cid:197) 0

VINSERTI64x2 (EVEX encoded versions)

(KL, VL) = (4, 256), (8, 512)
TEMP_DEST[VL-1:0] (cid:197) SRC1[VL-1:0]
IF VL = 256
    CASE (imm8[0]) OF
    0:
    TMP_DEST[127:0] (cid:197) SRC2[127:0]
    1:
    TMP_DEST[255:128] (cid:197) SRC2[127:0]
    ESAC.
FI;
IF VL = 512
    CASE (imm8[1:0]) OF
    00:  TMP_DEST[127:0] (cid:197) SRC2[127:0]
    01:  TMP_DEST[255:128] (cid:197) SRC2[127:0]
    10:  TMP_DEST[383:256] (cid:197) SRC2[127:0]
    11:  TMP_DEST[511:384] (cid:197) SRC2[127:0]
    ESAC.
FI;
FOR j (cid:197) 0 TO KL-1
    i (cid:197) j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] (cid:197) TMP_DEST[i+63:i]
        ELSE
        IF *merging-masking*
            ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE
            ; zeroing-masking
            DEST[i+63:i] (cid:197) 0
        FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] (cid:197) 0

VINSERTI32x8 (EVEX.U1.512 encoded version)

TEMP_DEST[VL-1:0] (cid:197) SRC1[VL-1:0]
CASE (imm8[0]) OF
0: TMP_DEST[255:0] (cid:197) SRC2[255:0]
1: TMP_DEST[511:256] (cid:197) SRC2[255:0]
ESAC.
FOR j (cid:197) 0 TO 15
    i (cid:197) j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] (cid:197) TMP_DEST[i+31:i]
        ELSE
        IF *merging-masking*
            ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE
            ; zeroing-masking
            DEST[i+31:i] (cid:197) 0
        FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] (cid:197) 0

VINSERTI64x4 (EVEX.512 encoded version)

VL = 512
TEMP_DEST[VL-1:0] (cid:197) SRC1[VL-1:0]
CASE (imm8[0]) OF
0: TMP_DEST[255:0] (cid:197) SRC2[255:0]
1: TMP_DEST[511:256] (cid:197) SRC2[255:0]
ESAC.
FOR j (cid:197) 0 TO 7
    i (cid:197) j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] (cid:197) TMP_DEST[i+63:i]
        ELSE
        IF *merging-masking*
            ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE
            ; zeroing-masking
            DEST[i+63:i] (cid:197) 0
        FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] (cid:197) 0

VINSERTI128

TEMP[255:0] (cid:197)SRC1[255:0]
CASE (imm8[0]) OF
0: TEMP[127:0] (cid:197)SRC2[127:0]
1: TEMP[255:128] (cid:197)SRC2[127:0]
ESAC
DEST (cid:197)TEMP

Intel C/C++ Compiler Intrinsic Equivalent

VINSERTI32x4 _mm512i _inserti32x4( __m512i a, __m128i b, int imm);
VINSERTI32x4 _mm512i _mask_inserti32x4(__m512i s, __mmask16 k, __m512i a, __m128i b, int imm);
VINSERTI32x4 _mm512i _maskz_inserti32x4( __mmask16 k, __m512i a, __m128i b, int imm);
VINSERTI32x4 __m256i _mm256_inserti32x4( __m256i a, __m128i b, int imm);
VINSERTI32x4 __m256i _mm256_mask_inserti32x4(__m256i s, __mmask8 k, __m256i a, __m128i b, int imm);
VINSERTI32x4 __m256i _mm256_maskz_inserti32x4( __mmask8 k, __m256i a, __m128i b, int imm);
VINSERTI32x8 __m512i _mm512_inserti32x8( __m512i a, __m256i b, int imm);
VINSERTI32x8 __m512i _mm512_mask_inserti32x8(__m512i s, __mmask16 k, __m512i a, __m256i b, int imm);
VINSERTI32x8 __m512i _mm512_maskz_inserti32x8( __mmask16 k, __m512i a, __m256i b, int imm);
VINSERTI64x2 __m512i _mm512_inserti64x2( __m512i a, __m128i b, int imm);
VINSERTI64x2 __m512i _mm512_mask_inserti64x2(__m512i s, __mmask8 k, __m512i a, __m128i b, int imm);
VINSERTI64x2 __m512i _mm512_maskz_inserti64x2( __mmask8 k, __m512i a, __m128i b, int imm);
VINSERTI64x2 __m256i _mm256_inserti64x2( __m256i a, __m128i b, int imm);
VINSERTI64x2 __m256i _mm256_mask_inserti64x2(__m256i s, __mmask8 k, __m256i a, __m128i b, int imm);
VINSERTI64x2 __m256i _mm256_maskz_inserti64x2( __mmask8 k, __m256i a, __m128i b, int imm);
VINSERTI64x4 _mm512_inserti64x4( __m512i a, __m256i b, int imm);
VINSERTI64x4 _mm512_mask_inserti64x4(__m512i s, __mmask8 k, __m512i a, __m256i b, int imm);
VINSERTI64x4 _mm512_maskz_inserti64x4( __mmask m, __m512i a, __m256i b, int imm);
VINSERTI128 __m256i _mm256_insertf128_si256 (__m256i a, __m128i b, int offset);

SIMD Floating-Point Exceptions

None

Other Exceptions

VEX-encoded instruction, see Exceptions Type 6; additionally

If VEX.L = 0.

EVEX-encoded instruction, see Exceptions Type E6NF.