Project and Project Management Design

Bob_Obara · November 1, 2019, 4:55pm

Proposed Plan for Project Support (Draft)

Project Definition

An SMTK project represents an instanced simulation workflow. It simplifies loading and saving a workflow by treating all of the necessary resources (and operations) atomically.

In a non-project workflow, the user is responsible for loading all of the resources (including the instantiation of attribute template files) required to implement the workflow. The user is also responsible for explicitly saving those resources and tracking them on disk.

In a project workflow, a project is created via a template or an operation. Relevant attribute templates are instantiated automatically when needed. Any resource required by the project but not automatically generated is presented to the user as a requirement to be fulfilled or assigned during project creation. When a project is saved, all required resources are saved atomically and in a known location as part of the project. Conversely, loading a project also loads all of the project’s resources, or at least registers them with the resource manager so they can be loaded later (this feature of the resource manager is not yet implemented).

Requirements

1. Ability to be used in operations as well as be returned as a result.
2. Ability to create a project for a specific type of workflow, using either a template or an operation.
3. Ability to define new types of projects at runtime. This will allow new project types to be created using Python operations as well as using a file-based specification.
4. Ability to refer to a collection of resources and associate a “role” with respect to them.
5. Ability to contain project-specific information not contained in any of the resources it is linked to.
6. Ability to specify either a white or black list of operations required by the project.
7. Ability to specify the list of SMTK plugins required by the project (as well as their versions).
8. Support for both a conceptual version identifier for its metadata and a user version identifier for the information contained within the project.
9. Projects should contain the following:
   - Name (non-unique)
   - Type (unique string)
   - ID (unique)
   - Location
10. Visualization/ParaView State (TBD)
11. Ability to hold a Workflow (TBD)
12. The ability to model other simulation assets such as simulation input decks and results. In theory these could be modeled as other resources (or components) so they could be used in operations and associated with or referenced by attributes. (TBD)
13. Ability to hold post-processing information such as XML representations of ParaView pipelines.
14. Projects should be relocatable within the user’s filesystem and packable if being moved to a different machine.
15. A project should describe how its required resources are named/saved to disk.
16. Ability to embed a change log into the project. When the project gets saved, the user should be prompted for a reason for the modification. If specified, it should be saved along with the rest of the information (ideally along with the current date/time).
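
To make the "role" requirement above more concrete, here is a minimal Python sketch of a role-to-resource mapping. It is only an illustration: `ResourceRoles` and its methods are hypothetical names, not SMTK API, and resources are represented by bare UUIDs rather than real resource objects.

```python
import uuid
from collections import defaultdict
from typing import Dict, List


class ResourceRoles:
    """Hypothetical helper: tracks which resource ids fill which role."""

    def __init__(self) -> None:
        # role name -> ordered list of resource ids assigned to that role
        self._by_role: Dict[str, List[uuid.UUID]] = defaultdict(list)

    def add(self, resource_id: uuid.UUID, role: str) -> None:
        # Ignore duplicates so that adding the same resource twice is harmless.
        if resource_id not in self._by_role[role]:
            self._by_role[role].append(resource_id)

    def find(self, role: str) -> List[uuid.UUID]:
        # Return a copy so callers cannot mutate the internal list.
        return list(self._by_role.get(role, []))
```

A project could then ask for, say, every resource in a role named "simulation attributes" without knowing file names or load order. The role name here is made up for the example.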

Possible Design - Deriving from Resource

After discussing it with other developers it sounds like Project should be modeled as a new kind of SMTK Resource. This will have the additional benefit of satisfying many of the above requirements with little additional code.

Other Benefits:

Projects could be easily incorporated into the GUI (for example, a Project could be shown within a Resource Browser along with its related resources).

Project Contents

Name (string)
Type (string that is set on creation)
ID (uuid that is assigned on creation)
Location of Project Directory (set when reading or doing a save as)
Conceptual Version (assigned when a project is created) - this identifies the version of the meta-data used to create the project and its required resources
Project Version - assigned when the project is saved
Internal Attribute Resource for Project Specifics (#4 & #5) - this would also simplify GUI support for projects since its internal attributes could be easily rendered.
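
As a rough illustration of the fields listed above, the following Python sketch models them as a plain dataclass. `ProjectRecord` is a hypothetical name and not an SMTK class. Treating the project version as an integer that increments on each save is an assumption; the proposal only says the version is assigned when the project is saved. The internal attribute resource is omitted.

```python
import uuid
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class ProjectRecord:
    """Hypothetical stand-in for the proposed project contents (not SMTK API)."""

    name: str                 # user-facing, need not be unique
    type_name: str            # set once, when the project is created
    conceptual_version: str   # version of the metadata used to create the project
    id: uuid.UUID = field(default_factory=uuid.uuid4)  # assigned on creation
    location: Optional[Path] = None  # set when reading or doing a "save as"
    project_version: int = 0         # assumption: bumped on every save

    def save_as(self, directory: Path) -> None:
        # Bookkeeping only: record the new location and bump the version.
        # Writing the project and its resources to disk is out of scope here.
        self.location = Path(directory)
        self.project_version += 1
```

Keeping the location out of the constructor mirrors the list above, where the directory is only known after a read or a "save as".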

Add Project Item and Project Item Definition to Attribute Resource

This is required to satisfy requirements 1 & 2. Project Item Definitions should include the type in order to restrict which projects can be referenced by a Project Item as well as the corresponding Qt classes.

What is the state of a project?

Currently a project’s state would include the information on the project itself as well as the state of the project’s resources.

Project Management

Similar to the Resource Manager, the Project Manager allows the registration of new project types and provides the following functionality:

Construction of new projects of a specified type
Ability to read in an existing project
Ability to import into a new project
Ability to write a project
Ensures that the required plugins are loaded when creating/loading a project
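
The registration and construction duties listed above could be sketched as follows. This is a simplified Python illustration, not the SMTK project manager: `ProjectManagerSketch`, the type names, and the plugin names are all hypothetical. Projects are plain dictionaries, and the plugin check only reports missing plugins instead of loading them.

```python
from typing import Callable, Dict, Iterable, List


class ProjectManagerSketch:
    """Hypothetical registry of project types (not SMTK API)."""

    def __init__(self, loaded_plugins: Iterable[str] = ()) -> None:
        self._factories: Dict[str, Callable[[], dict]] = {}
        self._required_plugins: Dict[str, List[str]] = {}
        self._loaded_plugins = set(loaded_plugins)

    def register_type(
        self,
        type_name: str,
        factory: Callable[[], dict],
        required_plugins: Iterable[str] = (),
    ) -> None:
        # Project type strings are unique, so refuse duplicate registrations.
        if type_name in self._factories:
            raise ValueError(f"project type already registered: {type_name}")
        self._factories[type_name] = factory
        self._required_plugins[type_name] = list(required_plugins)

    def create(self, type_name: str) -> dict:
        if type_name not in self._factories:
            raise KeyError(f"unknown project type: {type_name}")
        # Check plugins before constructing anything.
        missing = [
            plugin
            for plugin in self._required_plugins[type_name]
            if plugin not in self._loaded_plugins
        ]
        if missing:
            raise RuntimeError(f"required plugins not loaded: {missing}")
        project = self._factories[type_name]()
        project["type"] = type_name
        return project
```

Reading, importing, and writing would follow the same pattern of dispatching on the registered type name.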

Project Instantiation

List of Operations required (or not required) (tags? categories?)
Creation of autogenerated resources (like templated attribute resources)
Required Input from user -> if using a Project creation operation
Customization of UI
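
One way to read the "required (or not required)" operations item is as a filter over the operations an application would otherwise expose. The Python sketch below assumes a project supplies either a list of operations to keep or a list to hide, but not both. The function name and the operation names used with it are hypothetical.

```python
from typing import Iterable, List, Optional


def visible_operations(
    available: Iterable[str],
    allow: Optional[Iterable[str]] = None,
    deny: Optional[Iterable[str]] = None,
) -> List[str]:
    """Filter operation names by an allow (white) list or a deny (black) list."""
    if allow is not None and deny is not None:
        raise ValueError("specify either an allow list or a deny list, not both")
    if allow is not None:
        allowed = set(allow)
        # Keep only listed operations, preserving the original order.
        return [op for op in available if op in allowed]
    if deny is not None:
        denied = set(deny)
        return [op for op in available if op not in denied]
    # No list given: the project does not restrict operations.
    return list(available)
```

Tags or categories, as floated above, would replace the exact-name match with a match on operation metadata.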

File System Structure

Projects are represented by a directory (or by a zip archive of a directory) and have a .smtk extension.
A project directory should have a project.smtk file to indicate that it is an SMTK project directory. The file contains all project-related information, including its internal attribute resource, as well as a File Version indicating the format of the project file.
A project should indicate how it wishes to deal with existing resources or native model files. The options could include:
- Relocate both the SMTK and native files into the project directory
- Relocate only the SMTK file into the project directory
  Originally I had a “move neither” option, but if a Project sets properties on the resources to indicate their role in the workflow, then you would at least always need to write out the SMTK resource.
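
The layout described above might look like the following Python sketch. Several details are assumptions made for illustration: the project file is written as JSON, resource files are copied (rather than moved) into a `resources` subdirectory, and the key names (`fileVersion`, `project`, `resources`) are invented. Only the `project.smtk` file name and the presence of a file version come from the proposal.

```python
import json
import shutil
from pathlib import Path
from typing import Iterable


def write_project(
    project_dir: Path,
    info: dict,
    resource_files: Iterable[Path],
    file_version: int = 1,
) -> Path:
    """Write a project directory containing project.smtk plus resource copies."""
    project_dir = Path(project_dir)
    resources_dir = project_dir / "resources"  # assumed subdirectory name
    resources_dir.mkdir(parents=True, exist_ok=True)

    relative_paths = []
    for source in resource_files:
        destination = resources_dir / Path(source).name
        shutil.copy2(source, destination)
        # Store paths relative to the project directory so it stays relocatable.
        relative_paths.append(destination.relative_to(project_dir).as_posix())

    index = {
        "fileVersion": file_version,
        "project": info,
        "resources": relative_paths,
    }
    index_path = project_dir / "project.smtk"
    index_path.write_text(json.dumps(index, indent=2))
    return index_path
```

Because every stored path is relative, moving or zipping the directory does not invalidate the index.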

johnt · November 1, 2019, 7:29pm

I think there is an additional project requirement for tracking simulation data that are not smtk resources, the most obvious being simulation input decks and output datasets. This has been suggested as a “simulation asset” class that is somewhere in the PersistentObject class hierarchy. This was previously discussed in the spring (Proposed Change To PersistentObject), but I dropped that ball.

A simulation asset class would share at least some of the behaviors of persistent objects, in particular (i) ability for projects to use links to assets, and (ii) the ability to use assets in operations, presumably by assigning them to ReferenceItem instances.

johnt · November 5, 2019, 3:59am

My comments refer to the updated Requirements list:

I strongly recommend adding a future/TBD requirement to the list, for the ability to track simulation assets that are not smtk resources, such as input decks and output datasets. Although this feature is being deferred in the first implementation, we should nonetheless consider it a core requirement, and we should be aware that support for non-smtk assets will be essential for developing an effective workflow capability.
I also recommend adding a portability requirement, that is: (i) the ability to copy a project from one host machine to another, and (ii) the ability to move/copy a project from one directory to another (like a “save as” function).
I would also consider adding a TBD requirement to save/restore the UI layout for projects. This could be deferred until we design the workflow capability (where it will be a core requirement).

dcthomp · November 5, 2019, 1:16pm

I would prefer to handle this in a way that does not introduce new file extensions. We already have sbt, sbi, smtk, and used to have crf. Applications should strive to have a single native file format.

Some alternatives:

the directory name itself should have a .smtk extension. ParaView’s file browser now supports choosing directories, not just files.
the directory could contain an index.smtk file in JSON format that deserializes into the project.

tj.corona · November 5, 2019, 1:18pm

+1.

I like either of these ideas.

tj.corona · November 5, 2019, 1:25pm

Given the above requirements for a project, we can satisfy 1, 2, 4, 8, and 9 without writing any code if we were to have Project be a type of Resource. The remaining requirements would simply be implementation.

dcthomp · November 5, 2019, 1:34pm

+1

Haocheng_Liu · November 5, 2019, 1:59pm

Comparing session and project, is it fair to say that a project consists of one or more sessions, and they are independent of each other?

For instance, per our last conversation, where we mentioned keeping things simple, we can have an active project notion which dictates all other information.

dcthomp · November 5, 2019, 2:09pm

Yes

I am not sure this is the case. Yes in the sense that a project might add or remove a particular model resource (session). But no in the sense that the project owns the model resource, so there is a relationship. Removing a model resource might invalidate or alter a project’s data.

tj.corona · November 5, 2019, 2:11pm

I don’t remember this comment, but it makes me apprehensive. “Active XXXX” implies global access to XXXX, which is hard to undo in the future.

Haocheng_Liu · November 5, 2019, 2:37pm

That being said, shall the project enforce the existence of certain model resources (sessions) so that users are prevented from taking illegal actions?

Haocheng_Liu · November 5, 2019, 2:38pm

+1

johnt · November 5, 2019, 3:46pm

From the perspective at LANL and SLAC, I think we are better off only supporting one project loaded in modelbuilder at a time, and deferring multiple projects. But I’m not sure that other outside developers feel the same way.

johnt · November 5, 2019, 4:05pm

Kinda-related notes:

To modelbuilder end users, a serialized project should look like a directory (whether it is actually a physical/filesystem directory or libarchive file).
The directory name should be the user-provided project name (or a sluggified version if we need to support project names that are not valid identifiers).
Adding our extension to the directory name is OK.
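
For the "sluggified" directory name, a small Python sketch follows. The exact rules are assumptions chosen for illustration (ASCII only, lowercase, hyphen-separated, `.smtk` appended); the function name is hypothetical.

```python
import re
import unicodedata


def project_directory_name(project_name: str, extension: str = ".smtk") -> str:
    """Derive a filesystem-friendly directory name from a user-provided name."""
    # Decompose accented characters, then drop anything that is not ASCII.
    normalized = unicodedata.normalize("NFKD", project_name)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse each run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", ascii_only).strip("-").lower()
    if not slug:
        raise ValueError("project name contains no usable characters")
    return slug + extension
```

With these rules, a project named "My Project: Run #2" would be stored in a directory named `my-project-run-2.smtk`, while the original name stays in the project metadata.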

Bob_Obara · November 5, 2019, 4:59pm

I would probably have a project.smtk file included in the project directory and it would represent the project itself.

jacob.vaverka · November 6, 2019, 2:47pm

Would a change log also be useful? Something similar to a git commit message.

dcthomp · November 6, 2019, 11:13pm

Sorry I could not be there for the call today. Would the change log be for developer changes to the project template or user changes to the project?

If the former, that sounds like a lot of extra boilerplate to include ‒ but a URL to the changelog would be nice so users could decide on upgrading (or not).
If the latter, is it something you imagine users editing or generated automatically based on edits to the project’s resources (4 new attributes, 3 category changes, …)?

johnt · November 7, 2019, 2:20pm

I think that our working definition for “resource” is that it represents a fundamental data type used in simulation-based analysis. Projects, being essentially meta data, don’t fit that definition. But if piggy-backing on smtk Resource expedites implementation as much as it appears, then let’s go with it. (I’ll probably want to revisit this topic when we get to other simulation assets.)

By “session” do you mean SMTK session? If so, the relationship between projects and sessions is indirect at best. When importing models to create a new project, the current project code uses vtk session by default, but the end user can switch to using mesh session. We should keep project and session as independent as possible.

dcthomp · November 7, 2019, 2:41pm

I have not been thinking of resource::Resource that way… to me, it is just a “quantum of independent persistence” (components are not persisted individually, so they are not independent). The subclasses of resource::Resource are what imbue it with intent. While I suppose it would be possible to put a new subclass between resource::Resource and its current subclasses to indicate that they are direct inputs to a simulation, I am not a fan of using C++ classes for administrative or ontological purposes; classes should be used to organize information for efficient processing… even if that does not fit the mental model presented to users of the software. I have yet to encounter a computer language that is capable of representing the ways people organize information and would probably not want to use it if it existed.

I think Haocheng is referring more to the namespace than to the classes named Session within that namespace. @Haocheng_Liu is that correct?

Bob_Obara · November 7, 2019, 3:01pm

With respect to sessions, I think John’s POV is more correct. A Project is concerned with a set of resources that may require different sessions to be in core, so the relationship is indirect rather than a project directly requiring a session. It’s a minor conceptual point.

A Project will require session plugins to be loaded (in order to support the required resources) and can either white or black list operations with a session.

With respect to change logs, this is a log of changes made to the project’s data. So every time a project is saved, the user would be prompted to enter a log message equivalent to a git commit message. The project would then provide the ability to view (and possibly modify) this log.
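
A minimal Python sketch of such a log is shown below. The class and method names are hypothetical, timestamps are stored as ISO 8601 strings in UTC by assumption, and an empty message is treated as the user declining to give a reason.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ChangeLogEntry:
    message: str
    timestamp: str  # ISO 8601 string, recorded when the project is saved


@dataclass
class ChangeLog:
    """Hypothetical per-project log of user-supplied save messages."""

    entries: List[ChangeLogEntry] = field(default_factory=list)

    def record_save(
        self, message: Optional[str], now: Optional[datetime] = None
    ) -> Optional[ChangeLogEntry]:
        # The message is optional: saving without a reason adds no entry.
        if message is None or not message.strip():
            return None
        when = now if now is not None else datetime.now(timezone.utc)
        entry = ChangeLogEntry(message.strip(), when.isoformat())
        self.entries.append(entry)
        return entry
```

Passing `now` explicitly keeps the sketch testable; a real save operation would simply use the current time.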