Tapestry Graph Type Theory Explorations

I’m spending time exploring how to build the type theory for the graphs representing the internals of my Tapestry project.

Consider an expression graph like the following, representing a small expression:

It’s important that this graph lend itself to easy to understand and use APIs, while also being easy to implement fully and correctly. The Tapestry project lives and dies on the ease with which backend compiler and optimizer passes can be written; so the more investment put into getting an expressive yet concise representation I put in, the cheaper and faster everything that follows will be.

Much of that is working out what should be in the graph language; but a lot of that has been done in explorations and theory work on Tapestry so far; under concern at the moment is the internals of the type theory which structures the graph itself.

Node

Consider a simple Node, with an id, making it distinct from other nodes; this will form the basis of most of our type theory:

Note: We’ll use UUIDs for the id; because symbol generation is easy, and because we’d like to be able to compare graphs across a timeline history of mutations to the same graph, so ids in a global namespace work better; we won’t reuse them. Are UUIDs “unique enough”? Almost certainly, yes. We can make sure we don’t reuse them in the same graph, if we really care; but collisions are exceptionally unlikely, and impossible when generated by the same UUID library during the same program execution.

Tensor Node

We know we need Tensors holding values; but our graph model doesn’t see the values, so we mainly need a tensor to represent the space a value could be in. We can extend the Node and add a shape and a dtype:

Structured Edges

We need to establish links between nodes, and some of our links carry data. We can introduce a new kind of node, called an Edge Node, and give it a sourceId and a targetId:

Inputs and Results

We also know that some operations read Tensor Nodes, and some produce them.

We can introduce an Input Edge and a Result Edge, to create that link; but we’ll also include a key field on that edge so we can distinguish which input or output a given tensor is bound to.

We can also introduce some abstract types, HasInputs and HasResults, to describe nodes which are permitted to have these attached edges.

Operation Nodes

We run into problems when we begin to describe operation nodes, representing processes to be applied to tensors to produce new tensors.

We do know that all operations will want to have Parameters, which we can model as part of the node, or as an attached linked node; there are arguments for both, but since we’re aiming to shard operations, we may want to make a choice which permits them to share Parameter nodes after sharding, so we can more easily trace the evolution of sharding plans.

At issue is that we have a number of properties we’d like to be able to attach to an operation node, and the properties don’t fit into a simple type hierarchy.

  • Does this operation have external side effects, which we need to sequence appropriately into “Happens Before”/“Happens After” schedules relative to other operations with side effects?
  • Is this operation cell-aligned and intrinsically shardable?
  • If this operation is not cell-aligned, do we have a block index and shape signature which permit us to shard or slice the operation?

Additionally, we’d like to be able to include operation nodes which are Macro Nodes; which expand into subgraphs of other operations. These can be described by the same properties; side effects, cell-aligned, index and signature bearing.

We can say that “has IO”/“has no IO” is one type property; and “cell-aligned” / “signature” is another, and make the actual presence of the index and signature optional (if it’s missing, we can’t shard).

But a given operation could have either form of either property; determined not by the node type but by the intrinsics of the operation internals being described.

So we’re forced to either:

  • expand to the cross-product family of base node types, and special handle each of them;
  • annotate each operation with one or the other as a property;
  • treat operation classes as CSS-like union properties.

It’s very tempting to try and shoehorn these types into the type theory of existing languages, so that we can take advantage of the language’s static analysis tooling to help us autocomplete compiler backend code during writing, and detect bugs and errors; using either mix-ins or interface extensions; but there is a risk that similar type distinctions will arise in the future, forcing every more elaborate type hacks. And the larger risk is that these type hacks won’t be aligned between languages; the python api may have very different type hierarchies than the java api, for instance.

The alternative to this approach is to build a graph that works more like an HTML/XML DOM tree; where attaching properties and classes to nodes is independent of running schema validators over the tree. The api is somewhat more verbose, and we lose the language’s external static analysis tooling; but we get an api that can be implemented the same way across languages, and we can share schema specifications for type checking.

I’ve put some time into trying to solve this via embedding in Java; and I think I’ve reached the limits of expressibility trying to model the “is IO” relationship; and it’s leading to code that looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
@JsonTypeName("ResultOf")
@TTagBase.SourceType(TTensor.class)
@TEdgeBase.TargetType(TOperatorBase.class)
@NodeDisplayOptions.NodeAttributes(
value = {@NodeDisplayOptions.Attribute(name = "fillcolor", value = "#A7E1D5")})
public static final class TResultEdge
extends TTargetKeyedEdge<TResultEdge, TTensor, TOperatorBase> {
@JsonCreator
public TResultEdge(
@Nullable @JsonProperty(value = "id", required = true) UUID id,
@Nonnull @JsonProperty(value = "sourceId", required = true) UUID sourceId,
@Nonnull @JsonProperty(value = "targetId", required = true) UUID targetId,
@Nonnull @JsonProperty(value = "key", required = true) String key) {
super(id, sourceId, targetId, key);
}

public TResultEdge(@Nonnull UUID sourceId, @Nonnull UUID targetId, @Nonnull String name) {
this(null, sourceId, targetId, name);
}

public TResultEdge(@Nonnull TResultEdge source) {
this(source.id, source.sourceId, source.targetId, source.key);
}

@Override
public TResultEdge copy() {
return new TResultEdge(this);
}
}

So I’m going to explore the DOM tree model next.