.. _sdk-rel-notes-cumulative.rst:

SDK Release Notes
=================

The following are the release notes for the Cerebras SDK.

.. _v1-4-0:

Version 1.4.0
-------------

Released 26 May 2025

.. note::

   The Cerebras Wafer-Scale Cluster appliance running Cerebras ML Software 2.4
   supports SDK 1.3.
   `See here for SDK 1.3 documentation `_.

   The Cerebras Wafer-Scale Cluster appliance running Cerebras ML Software 2.5
   supports SDK 1.4, the current version of SDK software.

New features and enhancements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- (beta) New ``SdkLayout`` program layout specification API:

  - Introduces a new ``SdkLayout`` Python API for specifying program layout.
    This API allows the user to define rectangular code regions, define color
    routing and switching, automatically allocate colors, and automatically
    route between code regions.
  - Introduces several example programs demonstrating the use of the
    ``SdkLayout`` API. See the list of new example programs below.
  - Introduces new documentation for this API.
    See :ref:`sdklayout-api-reference`.
  - This API is in **beta**. The ``memcpy`` API for data transfers and
    remote kernel launches is not currently supported. CSL libraries with
    their own internal color routing are not currently supported.

- CSL language and compiler enhancements:

  - ``@map`` now supports explicit DSR arguments. DSR input arguments must be
    ``dsr_src1`` and DSR output arguments must be ``dsr_dest``. All DSR
    arguments should be loaded with the ``single_step`` property set.
    For example:

    .. code-block:: csl

       param inDSR: dsr_src1;
       param outDSR: dsr_dest;

       task foo() void {
         // Compute the square-root of each element of `memDSD` and
         // send it out to `faboutDSD`.
         @load_to_dsr(inDSR, memDSD, .{.single_step = true});
         @load_to_dsr(outDSR, faboutDSD, .{.single_step = true});
         @map(math_lib.sqrt_f16, inDSR, outDSR);
       }

  - Introduces support for ``cb16`` (``cbfloat16``) and ``bfloat16`` (bfloat)
    16-bit floating point types, and the associated ``@fp16()`` builtin.
    See :ref:`language-builtins-fp16` and :ref:`language-types`.
    ``cbfloat16`` is a Cerebras-specific 16-bit floating point format with a
    6-bit exponent and 9-bit explicit mantissa.
  - On WSE-3, introduces support for microthread priority via the
    ``.priority`` field in ``@get_dsd`` for ``fabin_dsd`` and ``fabout_dsd``,
    and in ``@allocate_fifo``. See :ref:`language-dsds`.

- CSL library enhancements:

  - Introduces a 3D FFT kernel library.
    See :ref:`language-libraries-kernels-fft`.
  - Introduces ``tile_config.input_queue_status`` and
    ``tile_config.output_queue_status`` to query input and output queue
    full/empty status registers.
    See :ref:`language-libraries-tile-config-input-queue-status` and
    :ref:`language-libraries-tile-config-output-queue-status`.

- ``SdkRuntime`` host runtime enhancements:

  - Introduces the ``SdkRuntime`` direct link API functions ``send`` and
    ``receive``, which are used to stream data into or out of the wafer
    via program input and output ports. This API can be used with
    ``SdkLayout`` as demonstrated in :ref:`sdkruntime-sdklayout-04-h2d-d2h`.
    See :ref:`sdkruntime-api-reference`.

- Example programs:

  - Introduces a series of example programs demonstrating the new
    ``SdkLayout`` API:

    - :ref:`sdkruntime-sdklayout-01-introduction` introduces the
      ``SdkLayout`` API with a single-PE program.
    - :ref:`sdkruntime-sdklayout-02-routing` demonstrates color routing with
      the ``SdkLayout`` API and automatic color allocation.
    - :ref:`sdkruntime-sdklayout-03-ports-and-connections` demonstrates
      automatic routing between code regions.
    - :ref:`sdkruntime-sdklayout-04-h2d-d2h` demonstrates the use of the
      ``SdkRuntime`` direct link API with ``SdkLayout`` to create
      host-to-device and device-to-host streams.
    - :ref:`sdkruntime-sdklayout-05-gemv` implements a full GEMV program with
      the ``SdkLayout`` API.

  - Introduces an example using the 3D FFT kernel library.
    See :ref:`sdkruntime-fft-3d`.
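The ``cbfloat16`` layout described above (6-bit exponent, 9-bit explicit mantissa) can be compared with other 16-bit formats using a small host-side sketch. This is an illustration only and not part of the SDK: the helper ``split_fields`` and the ``FORMATS`` table are hypothetical names, and the bit ordering (sign in the most significant bit, then exponent, then mantissa) is an assumption borrowed from IEEE-style formats rather than something stated in these notes. Only the field widths for ``cbfloat16`` come from the text above.

```python
from typing import NamedTuple


class Fp16Fields(NamedTuple):
    sign: int
    exponent: int
    mantissa: int


def split_fields(bits: int, exp_bits: int, man_bits: int) -> Fp16Fields:
    """Split a 16-bit pattern into raw (sign, exponent, mantissa) fields.

    Assumption: the sign occupies the most significant bit, followed by
    the exponent field and then the explicit mantissa field.
    """
    if not 0 <= bits <= 0xFFFF:
        raise ValueError("expected a 16-bit pattern")
    if 1 + exp_bits + man_bits != 16:
        raise ValueError("field widths must sum to 16 bits")
    mantissa = bits & ((1 << man_bits) - 1)
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    sign = bits >> 15
    return Fp16Fields(sign, exponent, mantissa)


# (exponent bits, explicit mantissa bits) for each 16-bit format.
FORMATS = {
    "f16": (5, 10),       # IEEE 754 binary16
    "bfloat16": (8, 7),
    "cbfloat16": (6, 9),  # widths from the release note; bit order assumed
}

if __name__ == "__main__":
    pattern = 0x3C00
    for name, (exp_bits, man_bits) in FORMATS.items():
        print(name, split_fields(pattern, exp_bits, man_bits))
```

The sketch extracts raw fields only. It does not decode a numeric value, because the exponent bias and special-value encodings of ``cbfloat16`` are not given in these notes; see :ref:`language-types` for the authoritative definition.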
Resolved issues
~~~~~~~~~~~~~~~

- Fixes incorrect parsing of CSL ``if`` statements whose body is an
  assignment without braces (e.g. ``if (cond) lhs = rhs;``).
- On WSE-2, fixes a bug in which ``@set_color_config`` did not support all
  six available filters. Previously, only the first four were available.
- Fixes a potential stall caused by sending many small data transfers via
  ``SdkRuntime``.
- Appliance mode compilation via ``SdkCompiler`` no longer allocates a
  system while compiling.
- Appliance mode SDK jobs launched via ``SdkCompiler``, ``SdkLauncher``,
  or ``SdkRuntime`` now exit gracefully.

Known issues
~~~~~~~~~~~~

- The ``25-pt-stencil``, ``histogram-torus``, and ``spmv-hypersparse``
  benchmark examples are not supported on WSE-3.
- Instruction traces in the SDK GUI are not supported on WSE-3.
- The bandwidth of memory transfers saturates at around 8 IO channels.

Deprecations
~~~~~~~~~~~~

- In CSL, calling a task is now an error. Only functions may be called.
  Tasks must be activated.
- In CSL, dereference or access of pointers into config space is now illegal.
  The ``@get_config`` and ``@set_config`` builtins should be used instead.
- WSE-1 is no longer supported.

.. _v1-3-0:

Version 1.3.0
-------------

Released 13 December 2024

New features and enhancements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- CSL language and compiler enhancements:

  - For DSD definitions, a tensor access expression is now shorthand for a
    ``comptime_struct`` with ``extent``, ``stride``, and ``base_address``
    fields. DSDs can now also be specified using these fields directly,
    for example:

    .. code-block:: csl

       // These two definitions are equivalent:
       var my_dsd = @get_dsd(mem1d_dsd, .{
         .extent = 10, .stride = 2, .base_address = &my_arr
       });

       var my_dsd = @get_dsd(mem1d_dsd, .{
         .tensor_access = |i|{10} -> my_arr[2*i]
       });

    ``stride`` is an optional parameter with default value 1.
    See :ref:`language-dsds-mem1d-tensor-access` for more information.
  - Memory DSD properties can now take runtime values when using the
    individual field specification format. However, ``mem4d_dsd`` extent and
    stride must still be comptime known.
  - Introduces inline functions, which are expanded during semantic analysis.
    See :ref:`language-syntax` for more information.
  - Introduces labeled ``break`` and the ability to break values from blocks.
    See :ref:`language-syntax` for more information.
  - Improves performance of CSL's parser, potentially improving program
    compile times.
  - Improves DSR allocation diagnostics when using DSDs. Upon failure to
    allocate, diagnostics now contain information about operations that
    prevent a DSR from being allocated.

- CSL library enhancements:

  - Introduces a ```` library which provides wrappers around DSD op builtins
    that select an appropriate builtin depending on the underlying data
    types, enabling more concise and flexible code when supporting multiple
    data types.
    See :ref:`language-libraries-dsd-ops` for more information.

- ``SdkRuntime`` host runtime enhancements:

  - Introduces a strided version of ``memcpy_h2d`` for strided
    host-to-device data transfers.
    See ``memcpy_h2d_stride`` in :ref:`sdkruntime-api-reference`.
  - Introduces row and column broadcast variants of ``memcpy_h2d`` for
    host-to-device row and column broadcasts. See ``memcpy_h2d_colbcast`` and
    ``memcpy_h2d_rowbcast`` in :ref:`sdkruntime-api-reference`.
    Also see the example program :ref:`sdkruntime-row-col-broadcast`.

- Example programs:

  - Introduces a new example program :ref:`sdkruntime-row-col-broadcast`
    to demonstrate row and column broadcasts for host-to-device data
    transfers.

Resolved issues
~~~~~~~~~~~~~~~

- Fixes an issue in the ```` library where messages were limited to only
  16 wavelets. The maximum message size is 32 wavelets.
- Fixes bugs in the ```` library in which ``encode_payload()`` could index
  out of bounds and fail to set the ``NOCE`` bit on unused commands.
- Fixes a bug in which sequential ``@map`` operations within a function
  would not be able to reuse DSRs.

Known issues
~~~~~~~~~~~~

- The ``25-pt-stencil``, ``histogram-torus``, and ``spmv-hypersparse``
  benchmark examples are not yet supported on WSE-3.
- Instruction traces in the SDK GUI are not yet supported on WSE-3.
- The bandwidth of memory transfers saturates at around 8 IO channels.

.. _v1-2-0:

Version 1.2.0
-------------

Released 28 June 2024

.. note::

   The Cerebras Wafer-Scale Cluster appliance running Cerebras ML Software 2.2
   supports SDK 1.1.
   `See here for SDK 1.1 documentation `_.

   The Cerebras Wafer-Scale Cluster appliance running Cerebras ML Software 2.3
   supports SDK 1.2, the current version of SDK software.

New features and enhancements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- CSL language and compiler enhancements:

  - Introduces ``inline`` ``for``-loops, which are unrolled at compile time.
    The body of an ``inline`` ``for``-loop may assign to a ``comptime``
    variable. For example:

    .. code-block:: csl

       fn length(comptime array: anytype) comptime_int {
         comptime var result = 0;
         // This loop will be inlined.
         inline for (array) |v| {
           result += 1;
         }
         return result;
       }

  - Introduces the ``@queue_flush`` and ``@set_empty_queue_handler``
    builtins for WSE-3. See :ref:`language-builtins-wse3-qflush`.
  - Runtime ``on_control`` values in DSD operations are now supported.
    For example:

    .. code-block:: csl

       fn f(out: fabout_dsd, in: fabin_dsd, act_id: local_task_id) void {
         @fmovh(out, in, .{ .async = true, .on_control = .{ .activate = act_id }});
       }

  - Improves ``void`` type semantics, enabling optionally specified module
    parameters and function arguments.
  - Significantly improves compile times for large programs. Compilation
    time for full-wafer programs may improve by as much as 10x.

- CSL library enhancements:

  - Introduces a ```` library for runtime debug printing to the simulator
    log. See :ref:`language-libraries-simprint`.
  - Introduces a ```` library for creating control wavelet payloads.
    See :ref:`language-libraries-control`.
  - Introduces a ```` library for WSE-3 point-to-point communication.
    See :ref:`language-libraries-message-passing`.
  - Introduces the ``queue_flush`` module within the ```` library for WSE-3,
    which can be used for querying when a queue is flushed and to exit the
    flushed state.
    See :ref:`language-libraries-wse3-tile-config-queue-flush`.
  - Adds WSE-3 support to the ``collectives_2d`` library.

- ``SdkRuntime`` host runtime enhancements:

  - Adds WSE-3 support for ``memcpy`` streaming mode.

- Example programs:

  - Reorganizes and updates all tutorial example programs with WSE-3 support.
  - Introduces two new tutorial examples for switches, demonstrating use of
    the ```` library. See :ref:`sdkruntime-topic-06-switches` and
    :ref:`sdkruntime-topic-07-switches-entrypt`.
  - Introduces a new tutorial example to demonstrate the ```` library.
    See :ref:`sdkruntime-topic-13-simprint`.
  - Introduces a new tutorial example to demonstrate color swapping on WSE-2.
    See :ref:`sdkruntime-topic-14-color-swap`.
  - Adds WSE-3 support to the ``wide-multiplication``, ``residual``,
    ``mandelbrot``, ``gemv-collectives_2d``, ``gemv-checkerboard-pattern``,
    ``gemm-collectives_2d``, ``7pt-stencil-spmv``, ``bicgstab``,
    ``conjugateGradient``, ``preconditionedConjugateGradient``, and
    ``powerMethod`` benchmark example programs.

Resolved issues
~~~~~~~~~~~~~~~

- Adds ``memcpy`` streaming support for WSE-3.
- Adds WSE-3 support for the ```` library.
- Fixes a potential bug in the ```` library related to reconfiguring the
  library's colors.
- Fixes a potential bug in the ```` library related to reconfiguring the
  library's colors.

Known issues
~~~~~~~~~~~~

- The ``25-pt-stencil``, ``histogram-torus``, and ``spmv-hypersparse``
  benchmark examples are not yet supported on WSE-3.
- The SDK GUI is not yet supported on WSE-3.
- The bandwidth of memory transfers saturates at around 8 IO channels.
Deprecations
~~~~~~~~~~~~

- The deprecated ``@get_color_id`` builtin to get the numerical value of a
  color is now removed. Use ``@get_int`` instead.
- Use of ``@get_color`` on any ID other than a routable color ID is no longer
  supported.
- ``tile_config.reg_ptr`` has been removed. Use ``@get_config`` and
  ``@set_config`` for direct manipulation of config space addresses.

.. _v1-1-0:

Version 1.1.0
-------------

Released 10 April 2024

This version of the Cerebras SDK is the first with experimental support for
the WSE-3, the third-generation Cerebras architecture. The WSE-3 is the
wafer-scale processor powering the CS-3 Cerebras system.

.. note::

   The Cerebras Wafer-Scale Cluster appliance running Cerebras ML Software 2.0
   supports SDK 0.9.
   `See here for SDK 0.9 documentation `_.

   The Cerebras Wafer-Scale Cluster appliance running Cerebras ML Software 2.1
   supports SDK 1.0.
   `See here for SDK 1.0 documentation `_.

   The Cerebras Wafer-Scale Cluster appliance running Cerebras ML Software 2.2
   supports SDK 1.1, the current version of SDK software.

New features and enhancements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- CSL language and compiler enhancements:

  - Introduces initial support for WSE-3.
  - Introduces ``ut_id`` type and ``@get_ut_id`` builtin for representing
    microthread IDs. This feature is WSE-3 only.
  - Introduces runtime ``@get_config`` and ``@set_config`` support.
  - Introduces ``i64`` and ``u64`` types, and support in ````, ````, and
    ```` libraries. Like ``i8`` and ``u8``, these types are not allowed in
    memory DSD tensors or ``@map``, nor as arguments to tasks.

- CSL ``memcpy`` library enhancements:

  - ``memcpy/get_params`` no longer requires specifying a ``LAUNCH`` color
    for host kernel launch support.
  - The ``@rpc`` builtin is no longer necessary for host kernel launch
    support. The RPC server is now created internally.

- Other CSL library enhancements:

  - Introduces ``reset_tsc_counter()`` function in ``