Runtime definition of graph resources

Problem statement

There are times when a graph resource’s node and arc types may need to be extended at run-time rather than compile time. The current design provides type-safety but also requires all the node and arc types to be provided in a std::tuple<> at build time. There are two motivating capabilities that drive the need for run-time graph specification

  1. User-provided organization of data and
  2. Python-defined graph-based resources.

An example of the former is when a user wishes to provide an ontology (such as in the web ontology language format). An ontology is effectively a specification of node and arc types. A good example is the Uberon ontology for anatomy. It defines over 20,000 node types and many relationship types with predefined arcs between many of the nodes.

Complications

While run-time graphs are desirable feature, they introduce some complications to the design of graph resources.

Unlike compile-time, type-safe arcs and nodes, it is possible for instances of run-time nodes and arcs to change their types (and thus invalidate the graph). Keeping search/traversal indexes updated as instances of run-time nodes and arcs are modified is also a challenge.

Potential solutions

Force everything at compile time

One solution is to force everything to occur at compile time; provide a utility to read an ontology and create a C++ implementation. This might suffice for some users (when those providing an ontrology have the technical ability to build a redistributable plugin with their new ontology’s node-types). It does not address Python-based resources.

Special “programmable” arc and node classes

Resources might always accept these programmable types by default but allow the node and arc types themselves to decide whether a particular node- or arc-insertion is allowed.

class ExtensibleResource
  : public smtk::resource::DerivedFrom<
    ExtensibleResource, smtk::graph::Resource>
{
public:
  using Traits = ExtensibleTraits;
};

struct ExtensibleTraits
{
  using NodeTypes = std::tuple<ProgrammableNode, …>;
  using ArcTypes = std::tuple<ProgrammableArc, …>;
};

class ProgrammableNode : public smtk::graph::Component
{
  void typeName() override { return m_nodeType; }
};

The ProgrammableArc type would probably be an implicit arc; the current graph-resource design does not provide storage of data (such as arc “type”) on individual explicit arcs. If we use an explicit arc, then the resource API would need to be extended to deal with arc “type” separate from the existing endpoint interface API.

Resource maintains a “runtime whitelist”

In this case, the resource would keep a list of node and arc type-names that are allowed. The arcs would have to also provide a way to check that the from- and to- nodes are acceptable before insertion.

namespace smtk::graph {
class Resource
{
public:
    void addRuntimeNodeType(
      smtk::string::Token nodeTypeName,
      std::function<std::shared_ptr<Component>()> nodeCtor
    );
    void addRuntimeArcType(
      smtk::string::Token arcTypeName,
      GenericEndpointInterfaceConstructor ctor);
    // … and maybe:
    void removeRuntimeNodeType(smtk::string::Token nodeTypeName);
    void removeRuntimeArcType(smtk::string::Token arcTypeName);

    /// Non-templated API for adding nodes
    Component* createNodeOfType(smtk::string::Token nodeType);
};

// The base graph::Component class would need to be
// extended to include methods to access arcs at run time:
class Component
{
public:
  GenericEndpointInterface incomingArcs(smtk::string::Token arcType);
  ConstGenericEndpointInterface incomingArcs(smtk::string::Token arcType) const;

  GenericEndpointInterface outgoingArcs(smtk::string::Token arcType);
  ConstGenericEndpointInterface outgoingArcs(smtk::string::Token arcType) const;
};
} // namespace smtk::graph

Queries used to validate node and arc insertions

We could use the query mechanism that the base resource class exposes to delegate

  1. decisions about whether instances of a given node or arc type are allowed into a graph; and
  2. insertion, removal, and traversal/filtering of run-time nodes and arcs.

This solution would mean that the templated C++ API would be unable to perform these tasks for runtime nodes and arcs.

Another potential solution:

Labels stored in a new arc type

Normally, arc types have FromType, ToType, and Directed type-aliases. We could extend this with a LabelType alias:

struct LabeledArc
{
  using FromType = foo;  // Inherits smtk::graph::Component
  using ToType = bar;    // Inherits smtk::graph::Component
  using Directed = std::true_type;
  using LabelType = baz; // Options discussed below.
};

Label types

The options for LabelType include:

  1. LabelType is a graph node (inherits smtk::graph::Component just like nodes, but would be used as an arc label rather than an endpoint). In this case, each arc would be a 3-tuple (FromType*, LabelType*, ToType*) instead of a 2-tuple of (FromType, ToType).
  2. LabelType is a component, but not necessarily a graph component. For example, graph::Resource could provide a new smtk::graph::ArcLabel subclass of smtk::resource::Component for use by arcs.
  3. LabelType is any lightweight type to be stored by value (i.e., something that can be serialized and is not a component at all). An example would be std::string or smtk::string::Token.

The first two are not exactly mutually exclusive. For example, ArcLabel could serve as a graph node as well as an arc label. This might be useful for ontologies that connect classes to the types of arcs allowed on those classes.

The benefits of (1) and (2) are that arc labels can have properties, can easily be serialized, and can be unambiguously referenced by many arcs. Null pointers would be accepted for the label of any arc so that “unlabeled” arcs could be created.

The benefits of (3) are that arc labels would be easy to construct and simpler to use (the mental model for storing a string label is lower than components as labels). It does mean labels would be very lightweight but not as flexible. In this case, LabelType would be required to have a default constructor so that arcs with an “empty” label could be created.

Arc storage

All of the label type choices above would require a new arc template or changes to the ExplicitArcs template to store the labels, plus changes to the ArcImplementation template to expose additional API for connecting nodes with a label for the arc.

The current ExplicitArcs class has two member variables:

std::unordered_map<const FromType*, std::unordered_set<const ToType*>> m_forward;
std::unordered_map<const ToType*, std::unordered_set<const FromType*>> m_reverse;

Some options for labeled arcs include:

//  A. Map of maps of sets:
unordered_map<FromType*, unordered_map<LabelType*, unordered_set<ToType*>> m_forward;
unordered_map<ToType*, unordered_map<LabelType*, unordered_set<FromType*>> m_reverse;
//     ... plus optionally an index of labels:
unordered_map<LabelType*, unordered_set<std::pair<FromType*, ToType*>>> m_labelIndex;

// B. Map of sets of pairs:
unordered_map<FromType*, unordered_set<pair<LabelType*, ToType*>> m_forward;
unordered_map<ToType*, unordered_set<pair<LabelType*, FromType*>> m_reverse;
unordered_map<LabelType*, unordered_set<pair<FromType*, ToType*>> m_labelIndex;

// C. Map of maps of sets, value-based variant:
unordered_map<FromType*, unordered_map<LabelType, unordered_set<ToType*>> m_forward;
unordered_map<ToType*, unordered_map<LabelType, unordered_set<FromType*>> m_reverse;
unordered_map<LabelType, unordered_set<pair<FromType*, ToType*>> m_labelIndex;

Note that (C) stores LabelType values, not pointers like (A) and (B).

The arc-endpoint API on nodes would then have a connect(n1, n2) method which would create unlabeled arcs as well as a connectWithLabel(n1, label, n2) method.

Graph constraints

By extending labeled-arc typenames to include an additional type alias, we can provide labeled arcs with a validator to programmatically disable connections at run time:

struct LabeledArc
{
  using FromType = foo;  // Inherits smtk::graph::Component
  using ToType = bar;    // Inherits smtk::graph::Component
  using Directed = std::true_type;
  using LabelType = baz;
  // Provide a functor to disallow some arcs:
  static bool validateArc(
    const FromType*, const LabelType*, const ToType*);
};

The validator could inspect the endpoint nodes and label to decide whether a connection is allowed or not. This could be used to force additional constraints on nodes (e.g., by inspecting properties) as well as constraints on arc labels.