Artifact class

dcthomp · August 2, 2021, 4:15pm

A potentially large change I am considering is to add a class into the resource hierarchy to represent artifacts (i.e., persistent objects that are files but do not contain components – things that have non-native (i.e., non-SMTK) data in them). Use cases we design for include

simulation results (typically produced “offline” or in a long-running asynchronous operation and are read-only as far as SMTK is concerned),
meshes (typically produced by a long-running asynchronous operation and may have some edits made by SMTK operations), and
CAD models (may be produced offline or by interactive operations in SMTK, often edited by SMTK operations).

@johnt calls these “assets,” but that is a word I want to reserve for the information tasks exchange (“artifact” is used more frequently in the context of files than “asset,” which may refer to information more generally).

The current hierarchy is

smtk::resource::PersistentObject
  +-- smtk::resource::Component
  +-- smtk::resource::Resource
    +-- smtk::geometry::Resource (*)
      +-- smtk::attribute::Resource (*)
      ... + others ...

(* = via DerivedFrom)

Alternative 1 (Preferred)

If we insert a class between PersistentObject and Resource, we get a new branch point:

smtk::resource::PersistentObject
  +-- smtk::resource::Artifact
    +-- smtk::resource::Resource
      +-- smtk::geometry::Resource (*)
        +-- smtk::attribute::Resource (*)
        +-- smtk::model::Resource (*)
          +-- smtk::session::polygon::Resource (*)
          ... + others ...
        +-- smtk::mesh::Resource (*)
  +-- smtk::resource::Component

(* = via DerivedFrom)

Since artifacts would not be SMTK-formatted JSON files,

Artifacts would not have properties or links but could have queries associated with them (e.g., bounding box, filesystem size/SHA). They would have names and locations (URLs).
Resource would inherit its name, location, and queries from Artifact but add API for links, properties, and components (find/visit/filter).
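To make the split of responsibilities concrete, here is a minimal sketch of Alternative 1. Everything lives in a hypothetical `sketch` namespace; none of these declarations are the actual SMTK classes. Properties are reduced to a string map and the component API to a single `find`, both assumptions for illustration only. Queries and links are omitted.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the real SMTK declarations.
namespace sketch {

class PersistentObject {
public:
  virtual ~PersistentObject() = default;
  virtual std::string name() const = 0;
};

class Component : public PersistentObject {
public:
  explicit Component(std::string name) : m_name(std::move(name)) {}
  std::string name() const override { return m_name; }
private:
  std::string m_name;
};

// An artifact has a name and a location (URL) but no properties or components.
class Artifact : public PersistentObject {
public:
  Artifact(std::string name, std::string location)
    : m_name(std::move(name)), m_location(std::move(location)) {}
  std::string name() const override { return m_name; }
  const std::string& location() const { return m_location; }
private:
  std::string m_name;
  std::string m_location;
};

// A resource inherits name and location, then adds properties and components.
class Resource : public Artifact {
public:
  using Artifact::Artifact;
  std::unordered_map<std::string, std::string>& properties() { return m_properties; }
  void addComponent(std::shared_ptr<Component> c) { m_components.push_back(std::move(c)); }
  // Return the first component with a matching name, or nullptr if there is none.
  std::shared_ptr<Component> find(const std::string& name) const {
    for (const auto& c : m_components) {
      if (c->name() == name) { return c; }
    }
    return nullptr;
  }
private:
  std::unordered_map<std::string, std::string> m_properties;
  std::vector<std::shared_ptr<Component>> m_components;
};

} // namespace sketch
```

With this layout, any code that only needs a name and a location can accept an `Artifact&` and be handed either a plain artifact or a full resource.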

We could use Artifact without further inheritance, but in all likelihood we would derive SimulationArtifact, MeshArtifact, ModelArtifact, etc. since we may wish to provide import/export operations and (especially for model resources) have a native SMTK resource own a non-native artifact (e.g., a STEP or IGES file owned by an smtk::session::opencascade::Resource).

We might also introduce the concept of “remote-ness” (i.e., resources held on remote computers) to Artifact or a subclass. I don’t think that an object being remote should require it being an instance of a separate class – so I’m not sure it makes a difference to this change – but it’s worth bringing up.

The ramifications of this inheritance hierarchy are:

Artifacts can be held by the resource::Manager, loaded in/out of memory as needed, indexed with resource::Metadata, observed, locked-for/consumed-by operations, handled by qtReferenceItem, marked clean/dirty, and easily added to projects.
Significant API changes would be required since the resource manager and observers would provide pointers to Artifacts rather than Resources (requiring type casting). Some function names would be deprecated (e.g., Manager::registerResource, Manager::unregisterResource). Some changes we might get around by providing function overloads (i.e., accepting visitor functions that only visit resources, not artifacts).
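As a rough illustration of the overload idea, the sketch below has a manager that stores artifacts and offers two `visit` overloads, one of which does the downcast so that existing resource-only visitors keep working. The `Manager`, `ArtifactVisitor`, and `ResourceVisitor` names here are hypothetical and much simpler than the real resource manager.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the real SMTK declarations.
namespace sketch {

struct Artifact {
  virtual ~Artifact() = default;
  std::string name;
};

struct Resource : Artifact {};

using ArtifactVisitor = std::function<void(Artifact&)>;
using ResourceVisitor = std::function<void(Resource&)>;

class Manager {
public:
  void add(std::shared_ptr<Artifact> artifact) { m_artifacts.push_back(std::move(artifact)); }

  // Visit every managed object, whether or not it is a native resource.
  void visit(const ArtifactVisitor& fn) const {
    for (const auto& a : m_artifacts) { fn(*a); }
  }

  // Overload for legacy callers: the manager performs the cast and
  // skips anything that is not a Resource.
  void visit(const ResourceVisitor& fn) const {
    for (const auto& a : m_artifacts) {
      if (auto* r = dynamic_cast<Resource*>(a.get())) { fn(*r); }
    }
  }

private:
  std::vector<std::shared_ptr<Artifact>> m_artifacts;
};

} // namespace sketch
```

One caveat with this particular sketch: a bare lambda taking `Artifact&` converts to both visitor types, so as far as I can tell callers would need to name the visitor type (as the aliases allow) or the overloads would need distinct names.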

Alternative 2

Another option is to treat artifacts as components (i.e., subclass Component).

smtk::resource::PersistentObject
  +-- smtk::resource::Component
    +-- smtk::resource::Artifact
  +-- smtk::resource::Resource
    +-- smtk::geometry::Resource (*)
      +-- smtk::attribute::Resource (*)
      +-- smtk::model::Resource (*)
        +-- smtk::session::polygon::Resource (*)
        ... + others ...
      +-- smtk::mesh::Resource (*)

(* = via DerivedFrom)

The ramifications of this are:

Non-native artifacts would be owned by native SMTK resources (and/or projects, since Project inherits Resource).
Operations could produce artifacts, but there would be no read/write locking (which means if two resources owned artifacts that point to the same file, there would be no blocking to prevent simultaneous modification).
Artifacts would not be visible/queryable in the resource manager in the same way as other files; a new manager would be needed if this type of query were to be handled. This might cause trouble for the use case of meshes – because we don’t always want all large meshes loaded into memory but do want them managed.

Alternative 3

smtk::resource::PersistentObject
  +-- smtk::resource::Artifact
  +-- smtk::resource::Component
  +-- smtk::resource::Resource
    +-- smtk::geometry::Resource (*)
      +-- smtk::attribute::Resource (*)
      +-- smtk::model::Resource (*)
        +-- smtk::session::polygon::Resource (*)
        ... + others ...
      +-- smtk::mesh::Resource (*)

(* = via DerivedFrom)

We could have Artifact inherit from PersistentObject, but not have Resource inherit Artifact. The ramifications of this are:

Artifacts would not be managed by the resource manager.
Read and write locking would not apply to artifacts or would force operations to have lots of redundant code for locking both resources and artifacts.
Passing artifacts to operations might require a new smtk::attribute::ItemDefinition type and/or UI element.
New observers would be required for tasks or UI elements that wish to be notified when artifacts are added-to/removed-from an application.
Projects would have to explicitly consume artifacts, which could be awkward for model and mesh resources (since they should not need to be aware of projects but need some way to declare that they own artifacts).

johnt · August 2, 2021, 8:42pm

1. Yes, as expected, I would prefer not to use the term “artifacts” to describe simulation data. I suppose I am stuck on the earlier denotation that artifacts are byproducts or residual data. (Software artifacts used to mean development documents like use-case analyses, flow charts, and test reports.) I have never, for example, told anyone they can download modelbuilder packages from an “artifacts” folder at data.kitware.com. I also think it would be misleading to use the term artifacts to describe data generated by non-CMB software, i.e., simulation output, meshes, CAD files (maybe they are “external artifacts?”). Finally, using the term “artifact” here just so that we can use “asset” somewhere else might not be needed if we instead use namespaces; maybe smtk::resource::Asset and smtk::workflow::Asset?

2. But enough whining (at least for now). Considering the pending use of “asset” to represent data consumed and/or produced by tasks, I propose an equally generic alternative of “data object”, more specifically, smtk::workflow::DataObject, inspired by the vtkDataObject that has served VTK for quite a while. This also suggests an alternative for the proposed “artifact” class: smtk::resource::DataObject and/or smtk::resource::PersistentDataObject (the latter might be a renaming of our current PersistentObject?).

3. As for the proposed class hierarchy changes, I probably don’t fully understand. Referring to your Alternative 1:

Does this move links and properties OUT of PersistentObject and INTO the Resource and Component classes?
Does this also mean that Resource and Component don’t have a common class with links or properties methods? That seems awkward, though I am aware that a lot of downcasting already takes place in parts of smtk code that use instances of both types.
Where would these new use-cases – simulation results, mesh files, CAD model files – fit in this hierarchy? Would they be subclasses of the smtk::resource::Artifact class? If the Artifact class does not contain links, does that mean they cannot be assigned to ReferenceItems? That feels like a bummer, but I haven’t given it enough thought.

dcthomp · August 3, 2021, 2:04am

While that makes sense when the concepts in the different namespaces are the same, it does not when they are different. But I will find something else to call the information passed from task to task.

Because PersistentObject is neither long enough nor hard enough to type?

Nope. It means that Artifact (or Asset) will not provide a way to store properties (because there’s nowhere to store them and please don’t say “on the project” because that’s not required to exist). Trying to dereference the properties() method will cause a warning or error.

They only share a common API, not a common class now. If you look under the hood, there’s ResourceProperties and ComponentProperties that get accessed by the same API-only Properties class. (It just happens that ComponentProperties uses ResourceProperties under the hood to implement its storage.)

They would either be instances of Artifact or subclasses of it. As above, I think the primary consideration over whether a subclass is needed is whether I/O or UI components need a type to distinguish how to handle the Artifact. At least for STEP/IGES files – where operations modify the artifact contents – having import/export operations attached to the artifact’s type-metadata would be nice. The other use cases mostly treat artifacts as black boxes of data (at least until we start doing mesh editing).
Links are unidirectional. SMTK components and artifacts can link to artifacts, but the artifacts cannot link back (because the link data should live in the storage holding the “from” (lhs) side of the relationship and we have no control over the Artifact/Asset file format).
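A tiny sketch of what "the link lives on the from side" means in practice. The `NativeObject` class is hypothetical and stands in for any SMTK-formatted object; links are reduced to a set of target locations, which is an assumption and not how SMTK stores links.

```cpp
#include <set>
#include <string>

// Hypothetical stand-ins for illustration; not the real SMTK declarations.
namespace sketch {

// A non-native file: we cannot write link data into its format.
struct Artifact {
  std::string location;
};

// A native object keeps its outgoing links in its own storage.
class NativeObject {
public:
  void linkTo(const Artifact& target) { m_links.insert(target.location); }
  bool isLinkedTo(const Artifact& target) const { return m_links.count(target.location) > 0; }
private:
  std::set<std::string> m_links; // saved with the native object, never with the artifact
};

} // namespace sketch
```

Because `Artifact` has no link storage of its own, asking "what links to this artifact?" would mean searching the native objects, not the artifact.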

Besides what you’ve asked, we still need to work through how the Surrogate class would work, but I don’t think it’s a showstopper.

johnt · August 3, 2021, 3:24pm

This might be in the minutiae, but I’m still curious: The current PersistentObject class has pure virtual methods for links and properties. Are you going to make these regular (not pure) virtual methods in PersistentObject or move them down the class hierarchy?

dcthomp · August 3, 2021, 3:27pm

No, calling properties() or links() will return an implementation for Asset that does not accept any property settings (either emits a warning on attempted write or silently does nothing).
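For illustration, a minimal sketch of that approach: properties() stays pure virtual on PersistentObject, and the artifact returns an implementation that rejects writes with a warning. All class names are hypothetical (I use Artifact where the post above says Asset), and the string-only `set`/`contains` interface is an assumption, far simpler than SMTK's real Properties API.

```cpp
#include <iostream>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; not the real SMTK declarations.
namespace sketch {

class Properties {
public:
  virtual ~Properties() = default;
  // Returns true if the value was stored.
  virtual bool set(const std::string& key, const std::string& value) = 0;
  virtual bool contains(const std::string& key) const = 0;
};

// Backed by real storage; used by native resources.
class StoredProperties : public Properties {
public:
  bool set(const std::string& key, const std::string& value) override {
    m_data[key] = value;
    return true;
  }
  bool contains(const std::string& key) const override { return m_data.count(key) > 0; }
private:
  std::map<std::string, std::string> m_data;
};

// Has nowhere to store anything: warns on write and reports nothing on read.
class RejectingProperties : public Properties {
public:
  bool set(const std::string& key, const std::string&) override {
    std::cerr << "warning: cannot set property \"" << key << "\" on an artifact\n";
    return false;
  }
  bool contains(const std::string&) const override { return false; }
};

class PersistentObject {
public:
  virtual ~PersistentObject() = default;
  virtual Properties& properties() = 0; // still pure virtual
};

class Artifact : public PersistentObject {
public:
  Properties& properties() override { return m_rejecting; }
private:
  RejectingProperties m_rejecting;
};

class Resource : public Artifact {
public:
  Properties& properties() override { return m_stored; }
private:
  StoredProperties m_stored;
};

} // namespace sketch
```

Returning `false` from `set` gives callers a way to detect the rejected write; the silent variant mentioned above would simply drop the warning line.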

If it becomes important to implement properties or links for assets, we can provide an implementation that does something evil under the hood when it comes to storage.