.. _language-appendix:

Appendix
========

.. _language-appendix-simd:

SIMD Mode
---------

Many of the :ref:`builtins for DSD operations` have a SIMD (single instruction,
multiple data) mode, in which multiple operations can be performed in a single
cycle. Under appropriate conditions, these builtins automatically execute in
SIMD mode when operating on DSDs.

In particular, builtins can only operate at their full SIMD width if no bank
conflicts occur when fetching the operands from memory. The 48 KB of memory in
a PE are laid out in 8 banks of 6 KB each. Each successive 16 bits are located
in successive banks.

In a single cycle, the PE can perform two 32-bit reads and one 32-bit write.
However, the two reads must come from separate banks. More specifically, if the
8 banks are numbered 0 to 7, then the bank IDs ``bank_1`` and ``bank_2`` of the
two reads must satisfy ``bank_1 % 4 != bank_2 % 4``.

To best avoid bank conflicts with SIMD operations, the operand addresses should
be 32-bit aligned and should satisfy
``(src0_addr % 8) == ((src1_addr + 4) % 8)``, where ``src0_addr`` and
``src1_addr`` are the addresses of the operands.

Dumping the ELF file's symbol table with the ``--sym`` option of
``cs_readelf`` provides addresses and banking information for all symbols in a
compiled CSL program, and can be useful for determining whether bank conflicts
may occur.

Additionally, if the DSD operands have non-contiguous strided accesses, the
SIMD width may be limited:

- Strides of 0 and 1 can operate at full SIMD width.
- Strides such that ``stride % 8`` is 2, 3, 5, or 6 can operate at full SIMD
  width.
- Unless the stride is 1, strides such that ``stride % 8`` is 1 or 7 are
  limited to two operations per cycle, or a SIMD width of 2.
- Unless the stride is 0, strides such that ``stride % 8`` is 0 are limited to
  one operation per cycle, or a SIMD width of 1.
- Strides such that ``stride % 8`` is 4 are limited to two operations per
  cycle, or a SIMD width of 2.
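
The banking and stride rules above can be modeled with a short Python sketch.
This is an illustration only, not part of the CSL toolchain: the function names
are hypothetical, and it assumes that addresses are byte addresses and that
bank 0 begins at address 0. Use ``cs_readelf --sym`` for the actual banking
information of a compiled program.

.. code-block:: python

    NUM_BANKS = 8         # from the text: 8 banks per PE
    BANK_WIDTH_BYTES = 2  # each successive 16 bits lies in the next bank


    def bank_id(byte_addr):
        """Bank holding the 16-bit word at byte_addr.

        Assumption: byte addressing, with bank 0 starting at address 0.
        """
        return (byte_addr // BANK_WIDTH_BYTES) % NUM_BANKS


    def reads_conflict(src0_addr, src1_addr):
        """True if the two reads violate bank_1 % 4 != bank_2 % 4."""
        return bank_id(src0_addr) % 4 == bank_id(src1_addr) % 4


    def recommended_placement(src0_addr, src1_addr):
        """True if both operands are 32-bit aligned and satisfy
        (src0_addr % 8) == ((src1_addr + 4) % 8)."""
        aligned = src0_addr % 4 == 0 and src1_addr % 4 == 0
        return aligned and (src0_addr % 8) == ((src1_addr + 4) % 8)


    def stride_limited_width(stride, full_width):
        """SIMD width after applying the stride rules listed above.

        full_width is the builtin's maximum SIMD width on the target.
        """
        if stride in (0, 1):
            limit = full_width        # strides 0 and 1: full width
        else:
            m = stride % 8
            if m in (2, 3, 5, 6):
                limit = full_width    # full width
            elif m in (1, 7):
                limit = 2             # two operations per cycle
            elif m == 0:
                limit = 1             # one operation per cycle
            else:                     # m == 4
                limit = 2             # two operations per cycle
        # A builtin never exceeds its own maximum width.
        return min(limit, full_width)

Under these assumptions, operands placed at addresses 0 and 4 satisfy the
recommended placement and do not conflict, whereas operands at addresses 0 and
8 fall into banks 0 and 4 and therefore conflict.
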

The maximum SIMD widths of the builtins that can operate in SIMD mode are given
in the table below.

================= ================ ================
Builtin           WSE-2 SIMD Width WSE-3 SIMD Width
================= ================ ================
:ref:`@add16`     4                8
:ref:`@addc16`    1                8
:ref:`@and16`     4                8
:ref:`@fabsh`     4                8
:ref:`@fabss`     2                4
:ref:`@faddh`     4                8
:ref:`@faddhs`    2                4
:ref:`@fadds`     2                4
:ref:`@fnormh`    4                8
:ref:`@fnorms`    2                4
:ref:`@fh2s`      1                4
:ref:`@fh2xp16`   1                8
:ref:`@fmach`     4                8
:ref:`@fmachs`    2                4
:ref:`@fmaxh`     1                8
:ref:`@fmaxs`     1                4
:ref:`@fmovh`     4                8
:ref:`@fmovs`     2                4
:ref:`@fmulh`     4                8
:ref:`@fnegh`     4                8
:ref:`@fnegs`     2                4
:ref:`@fs2h`      1                4
:ref:`@fs2xp16`   1                4
:ref:`@fscaleh`   4                8
:ref:`@fscales`   2                4
:ref:`@fsubh`     4                8
:ref:`@fsubs`     2                4
:ref:`@mov16`     4                8
:ref:`@mov32`     2                4
:ref:`@or16`      4                8
:ref:`@sar16`     1                4
:ref:`@sll16`     1                4
:ref:`@slr16`     1                4
:ref:`@sub16`     4                8
:ref:`@xor16`     4                8
:ref:`@xp162fh`   1                8
:ref:`@xp162fs`   1                4
================= ================ ================
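
As a rough illustration of what the widths in the table mean for throughput,
the following Python sketch computes a lower bound on the number of cycles a
builtin needs to process a given number of elements. It is not a performance
model: the helper name ``min_cycles`` is hypothetical, only a few table rows
are copied in, and it assumes the builtin sustains its maximum SIMD width every
cycle, with no bank conflicts, stride limits, or other overheads.

.. code-block:: python

    import math

    # (WSE-2, WSE-3) maximum SIMD widths, copied from a few rows of the table.
    MAX_SIMD_WIDTH = {
        "@fadds": (2, 4),
        "@fmach": (4, 8),
        "@fmaxh": (1, 8),
        "@mov16": (4, 8),
    }

    # Column of MAX_SIMD_WIDTH that corresponds to each architecture.
    ARCH_COLUMN = {"WSE-2": 0, "WSE-3": 1}


    def min_cycles(builtin, num_elements, arch):
        """Lower bound on cycles, assuming full SIMD width on every cycle."""
        width = MAX_SIMD_WIDTH[builtin][ARCH_COLUMN[arch]]
        return math.ceil(num_elements / width)


    # 100 elements with @fmach: at least 25 cycles on WSE-2, 13 on WSE-3.
    print(min_cycles("@fmach", 100, "WSE-2"))
    print(min_cycles("@fmach", 100, "WSE-3"))

Builtins with a WSE-2 width of 1, such as ``@fmaxh``, gain nothing from SIMD
mode on that architecture under this estimate, while the same builtin on WSE-3
can process up to 8 elements per cycle.
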