Nov 21, 2024 5 min read

Diagrams Explained (Quickly)

This post covers the foundations of diagrams—how they represent data types and the functions between them—with minimal fluff. There is no category theory in this post, which may encourage or dissuade you depending on your background. If details are missing, I've oversimplified; if too many details are still present, I've undersupplied. This work is the first in a series that previews a paper I am finishing up, which will go into how to use diagrams to derive optimal implementations of algorithms, how to analyze the performance models they produce, and how to configure algorithms for specific CUDA implementations.

Array types such as $\mathbb{R}^{a\times b\times c}$ are represented in diagrams by adjacent labeled wires arranged into a column, each of which represents an axis constituting its shape. Data types consist of lists of arrays of a particular shapes and are represented with dashed lines separating the constituent array shapes. Stacking data types with a dashed wire between them, therefore, concatenates them.

Figure 1 We represent arrays, of forms such as $\displaystyle \mathbb{R}^{a\times b\times c}$ , by labeling stacked wires in a column with $\displaystyle a$ , $\displaystyle b$ , and $\displaystyle c$ . To represent data types that consist of lists of arrays, such as $\displaystyle \mathbb{R}^{a\times b\times c} \times \mathbb{R}^{d\times e}$ , we place a dashed line between them.

Functions between data types are represented by labeled boxes or pictograms with their input/output shapes to the left/right. Sequential execution (composition) of functions is shown by horizontal placement, creating a diagram with alternating columns representing data type and functions. Parallel execution (concatenation) of functions stacks them with a dashed line in between. A concatenated function takes concatenated inputs and provides concatenated outputs. The diagram reflects the change in the input/output types.

Figure 2 Functions are represented by labeled boxes or pictograms, which aid intuition. These representations can be horizontally composed, which represents sequential execution and yields a diagram with alternating data type and function columns. We represent composition by $F;G=G\circ F$ .

Figure 3 Functions can also be stacked with a separating dashed line, which concatenates their inputs and outputs. Concatenating lists is associative and represented by $\displaystyle x\otimes y$ . If $\displaystyle x$ or $\displaystyle y$ are not lists, they are first assembled into a single member list such as $\displaystyle [ x]$ then concatenated. Concatenating functions $\displaystyle F$ and $\displaystyle G$ gives a function $\displaystyle F\otimes G$ such that, if $\displaystyle F( x) =x'$ and $\displaystyle G( y) =y'$ , then $\displaystyle ( F\otimes G)( x\otimes y) =F( x) \otimes G( y)$ .

We represent identity functions that leave inputs unchanged by extending the data type. This reflects that composition with identities leaves a function unchanged. Functions are stateless and are defined by how they map inputs to outputs. Therefore, we concatenate with identities to represent a function acting on some data but leaving the rest unchanged and available.

Figure 4 A compound diagram can be disassembled into columns representing alternating functions and data types. Stacked functions and data types can be further decomposed to find the core units concatenated to construct them. Identities are represented by continuing the representation of data types. This reflects that composition leaves functions unchanged and that concatenation with identities keeps unused data in memory.

Functions can be mapped over an additional axis, represented by weaving the axis over the function's outputs and a choice of the inputs. This diagrammatic implementation naturally updates the input and output sizes for the mapped function. When an input segment is not weaved, its data is copied to evaluate each index along the outputs of the new axis. The axis can be weaved into any location of the target segments. A thick dotted wire represents the "item" array, which contains a single member of $\displaystyle \mathbb{R}$ . Weaving a thick dotted wire by an axis $\displaystyle q$ results in an axis of size $\displaystyle \mathbb{R}^{q}$ , which can be represented by the weaved axis alone.

Figure 5 A function can be weaved, which adds an axis to the outputs and some of the inputs. The function is mapped over this axis. When we weave the ``item'' array represented by a thick dotted wire, we can remove it. Here, we provide a weaving for SoftMax, represented by a triangle, to have it act over each row of an array. We also show the weaving of the dot product, which provides matrix multiplication.

Weaving a function allows for complex mappings to be represented and avoids the ambiguity of typical expressions. We can weave primitives defined on items, such as multiplication, addition, and copying. We can also use weaving to represent the splitting and joining of shared axes, which overcomes the typical difficulties of expressing how arrays are partitioned and concatenated.

Figure 6 Weaving primitive functions lets us express the addition of vectors to each row of an array, multiplication over a specific axis, and copying arrays. We can use weaving to split an array along slices of an axis or show the axes over which arrays are joined. Here, we provide various primitive functions and examples of how they can be weaved.

This simple procedure—wires to represent axes, vertical stacking and dashed lines to represent concatenation, and weaving to represent mapping—can be used to describe the details of complex algorithms. Attention, for instance, is typically expressed by $\displaystyle O=\text{SoftMax}( Q\cdot K^{T} d^{-0.5}) \cdot V$ , which contains little information about the size of data or the axes over which we operate. With diagrams, we can express attention with all the details of axes by the figure below:

Figure 7 Attention can be expressed by using a cross over to represent a transpose, cups to represent matrix multiplication, and a triangle to represent SoftMax. Element wise multiplication by $\displaystyle d^{-0.5}$ is mapped over the $\displaystyle \overline{q}$ and $\displaystyle \overline{x}$ axis, meaning it is a hovering symbol. All details of axes are represented above.