Resource Save-As Feature

johnt · March 17, 2021, 1:36pm

Resource Save-As

This topic is prompted by a recent, broader discussion about implementing a save-as functionality with the new smtk::project code (which is not yet merged into smtk:master). But before discussing projects, I think we need to first clarify what save-as behavior should be for baseline SMTK resources.

Current Behavior

When the modelbuilder “Save As” menu item is invoked, the current software writes the selected resource to a new location on the local file system. The written resource is identical to the original in terms of the UUIDs used to identify the resource and its components, which leaves two copies of the same resource on disk. If either is then modified, in modelbuilder or otherwise, the result is two resources with the same UUIDs but different contents. To be clear, that is not what we want.

Side notes:

The two .smtk files themselves are not always identical because resource contents are not always written in the same order.
If a user then tries to open two .smtk files with the same resource UUID into modelbuilder, my understanding is that the second “open” will be ignored by the application because a resource with the same UUID is already loaded.
We went to some effort to support the copying of a resource by copying the .smtk file and any native/auxiliary files that go with it. Although that is a very useful feature for internal development, I don’t consider it mandatory to support this as a long-term requirement.

Requirements

The basic requirement is to duplicate a resource so that the copy has the same external behavior as the original but is a distinct/different smtk resource.

The implementation is envisioned as an in-memory operation analogous to the document save-as feature in applications such as Google Docs. The copy of the resource is serialized to a new location in the file system and, from the end-user’s perspective, the original resource is removed from memory and replaced by the copy.
In terms of resource links, my opinion is that associations of the copy should track the original. For example, if an attribute resource is associated to a model resource, then a copy of the attribute resource should also be associated to the same model resource and the copy’s component associations should behave the same as the original’s.
In contrast, because associations are not symmetric, a copy of the model resource in the same example is not associated to the original attribute resource.
Whether or not the attribute resource’s associations can be readily “switched” from the original model resource to a copy of the model resource isn’t an immediate requirement, but will be needed when the scope of this discussion expands to include SMTK projects.
In a similar way, if a mesh resource is classified on a model resource, then a copy of the mesh resource should also be classified on the (same) model resource. But a copy of a model resource has no relation to meshes classified on the original model, though there is motivation to reclassify mesh resources onto copies of the model resource.
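To make the one-directional rule concrete, here is a minimal C++ sketch. Resource, associatedTo, newId() and saveAsCopy() are hypothetical stand-ins rather than SMTK classes, and a counter replaces real UUID generation. The copy gets a new identity but keeps its outgoing associations; nothing refers to a fresh copy until that link is explicitly added. The same reasoning would apply to mesh classification.

```cpp
#include <set>
#include <string>

// Hypothetical, simplified model (not SMTK's API): a resource has an id and
// the ids of the resources it is associated *to*. Associations are
// one-directional, so the target does not record who points at it.
struct Resource
{
  std::string id;
  std::set<std::string> associatedTo;
};

// Stand-in for a UUID generator; a counter keeps the sketch deterministic.
std::string newId()
{
  static int counter = 0;
  return "copy-" + std::to_string(++counter);
}

// Save-as copy: a new identity, but the outgoing associations are kept, so a
// copy of an attribute resource still points at the same model resource.
Resource saveAsCopy(const Resource& original)
{
  Resource copy = original;
  copy.id = newId();
  return copy;
}
```

Copying the model resource, by contrast, leaves the attribute resource pointing at the original model only.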

Technical Approaches

The two approaches discussed most are:

1. Create the copy of the resource with a new UUID, making the original and copy different resources with no direct relation between them. Both can be loaded into modelbuilder and used independently. This would mimic the document save-as paradigm.
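A rough sketch of why approach 1 lets both resources coexist, assuming (as described in the side notes above) that the manager refuses a second resource whose UUID is already loaded. Resource, Manager and saveAsCopy() are hypothetical simplifications, not the smtk::resource::Manager API.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified resource: an id plus the file it lives in.
struct Resource
{
  std::string id;
  std::string location;
};

// Hypothetical manager keyed by resource id. A resource whose id is already
// registered is refused, so a plain file copy cannot be loaded next to its
// original.
class Manager
{
public:
  bool add(const std::shared_ptr<Resource>& resource)
  {
    return m_resources.emplace(resource->id, resource).second;
  }
  std::size_t size() const { return m_resources.size(); }

private:
  std::map<std::string, std::shared_ptr<Resource>> m_resources;
};

// Approach 1: duplicate the contents, then give the copy a fresh id and a
// new location so the manager treats it as an unrelated resource.
std::shared_ptr<Resource> saveAsCopy(const Resource& original, std::string newId,
                                     std::string newLocation)
{
  auto copy = std::make_shared<Resource>(original);
  copy->id = std::move(newId);
  copy->location = std::move(newLocation);
  return copy;
}
```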

2. Add a “version tag” to the smtk::resource::Resource class and update it each time the resource is written to disk. Conceptually, the original and copy would be variants of one resource and the version tag (alternate name suggestions are invited) would provide an initial resource provenance capability.

The version tag could be implemented as a UUID (perhaps one based on date-time?)
The resource could potentially store an ordered list of version tags to reflect the history of changes.
I think that SMTK could support loading multiple variants of the same resource, perhaps with only one being editable and the others available for view/comparison?
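Approach 2 might look roughly like the following sketch. VersionedResource, onWrite() and nextTag() are hypothetical names, and the counter stands in for whatever (possibly time-based) UUID would actually be used. The resource id never changes; each write appends a new tag to an ordered history, so two variants share an id and a history prefix but differ in their latest tag.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of approach 2: the resource id is stable across
// copies, and every write appends a fresh version tag to an ordered history.
class VersionedResource
{
public:
  explicit VersionedResource(std::string id)
    : m_id(std::move(id))
  {
  }

  const std::string& id() const { return m_id; }

  // Latest tag, or an empty string if the resource has never been written.
  std::string versionTag() const { return m_history.empty() ? std::string() : m_history.back(); }

  const std::vector<std::string>& history() const { return m_history; }

  // Called whenever the resource is written to disk.
  void onWrite() { m_history.push_back(nextTag()); }

private:
  // Stand-in for a UUID generator (a counter keeps the sketch deterministic).
  static std::string nextTag()
  {
    static int counter = 0;
    return "v" + std::to_string(++counter);
  }

  std::string m_id;
  std::vector<std::string> m_history;
};
```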

Both cases require TBD updates to the SMTK resource manager.

C_Wetterer-Nelson · March 18, 2021, 5:03pm

What would be the utility of keeping an ordered list of version tags? Would it be desirable to also store change history?

ben.boeckel · March 18, 2021, 8:31pm

Say you have a file. It has versions 1, 2, 3, and 4 in its history log. I open another file. It has 1, 2, 3, 5, 6 in its history. I know they share a common lineage (say, a template file) up to some point. I'm not sure one wants to save the entire history into the files, since that would be very complicated and may require certain plugins to be loaded to do/undo some operations.

I'm not commenting on the use case for having this information available without knowing what versions 1, 2, 3, or 5 are, but there is some signal in the data.
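The "signal" in the example above is the common prefix of the two ordered histories. A small sketch, assuming histories are stored oldest-first; sharedLineage() is a hypothetical helper, not an SMTK function.

```cpp
#include <cstddef>
#include <vector>

// Returns the common prefix of two ordered (oldest-first) version histories,
// i.e. the versions the two files shared before they diverged.
template <typename Tag>
std::vector<Tag> sharedLineage(const std::vector<Tag>& a, const std::vector<Tag>& b)
{
  std::vector<Tag> common;
  for (std::size_t i = 0; i < a.size() && i < b.size() && a[i] == b[i]; ++i)
  {
    common.push_back(a[i]);
  }
  return common;
}
```

For histories {1, 2, 3, 4} and {1, 2, 3, 5, 6} this yields {1, 2, 3}, without needing to know what any of those versions contained.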

C_Wetterer-Nelson · March 18, 2021, 8:43pm

Right, I am mostly curious about what capability or functionality inspires such a version log. It seems like Git (with or without LFS) gives you that data and the actual history. I definitely understand the need to keep similar resources (or copies of them) straight, but from an end-user perspective, I feel like a single version number would be sufficient. (Further, it seems like that mechanism already exists in the UUID concept.)

johnt · March 18, 2021, 8:45pm

CC @chart3388 @amuhsin in case you want to weigh in

amuhsin · March 18, 2021, 10:06pm

Something like that would be useful when a user wants to modify a simulation they built previously while keeping the two connected. For example, I submit Project A and review the results, and in doing so I realize that I need to change a parameter and resubmit. However, the important things are that:

I do not want to modify Project A directly (I want to keep Project A as a snapshot)
I do not want to make a disconnected copy of Project A (I want to be able to tell that Project B evolved from Project A)

Nevertheless, it seems to me we may be looking at 3 distinct pieces of functionality rather than one:

Project History
- This is the ability to view how a single project has evolved over time
- The user makes modifications to the same project over time, overwriting previous states
- This is where something like Git could help us, or even deliver the feature completely
Project Branching aka Simulation Studies
- This is the use case that I discussed above; the user doesn’t want to modify the original project but wants to keep note of how it relates to the original project.
- For this we would need to think about things like resource and artifact sharing, etc.
- This would require something like what @johnt laid out in his second technical approach.
- Each branched-off project has its own unique history, whose starting point is the state of the parent project at the time of branching.
Project Copying/Cloning
- This is the traditional Save As functionality that we see in most applications.
- The copied project lives in its own universe and has no idea that it was copied from another project.
- This would require something like what @johnt laid out in his first technical approach.
- The cloned project also clones the (GIT) history of the project it was cloned from.
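The difference between branching and cloning can be captured in a few lines. In this sketch Project, branch() and clone() are hypothetical, and history is reduced to a list of change labels: both operations give the new project its own id and a copy of the existing history, but only a branch remembers where it came from.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: each project has an id, an ordered history, and (only
// for branches) the id of the project it was branched from.
struct Project
{
  std::string id;
  std::vector<std::string> history;
  std::optional<std::string> branchedFrom;
};

// Branch: new id, history starts at the parent's current state, parent recorded.
Project branch(const Project& parent, std::string newId)
{
  return Project{std::move(newId), parent.history, parent.id};
}

// Clone: new id, history copied, but no record of the source project.
Project clone(const Project& source, std::string newId)
{
  return Project{std::move(newId), source.history, std::nullopt};
}
```

In the combined workflow below, Project B would be created with branch() and Project C with clone(), so B can report that it evolved from A while C cannot.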

Furthermore, a user might want to use a combination of all three functionalities like this:

A user starts Project A
They make many modifications to Project A over time
The user uses the history of Project A to revert some of the changes they made
They submit Project A
They branch Project B off of Project A
They make modifications to Project B
They Submit Project B
They Clone Project B to make Project C

johnt · March 19, 2021, 5:34pm

Thanks for the clarification/organization, Ahmed. I also consider the save-as feature to be separate from project “versioning”.

Bob_Obara · March 24, 2021, 7:33pm

Responding to Corey’s and Ben’s comments/questions:

This is why versioning is a somewhat complex topic that deals with data evolution, especially if resources are shared between multiple workflows or if a single workflow is being worked on by multiple people concurrently. Versioning could potentially notify users when resources/components referenced by another resource have been modified and require attention.

Use Cases:

Loading in Copies of the same Resource

The user creates project P1 that contains a model resource M1
The user then copies P1

P1 and P1-Copy would both point to the same M1 (UUID-wise), though M1 and M1-Copy would have different URLs. Let's assume P1-Copy is later loaded into memory and M1-Copy is modified. Now M1 and M1-Copy would be different, as indicated by their different versions. But we would know they are related because M1’s history is a subset of M1-Copy’s.

The problem I see is: what if both P1 and P1-Copy are in memory at the same time, and M1 and M1-Copy are currently the same? The Resource Manager only needs to load M1 or M1-Copy, since they have the same UUIDs. If the user then wants to modify M1-Copy, what should happen?

Ideally, if a Resource is shared between different projects, then the moment the resource is about to be changed it would need to be cloned with a new version UUID. This would allow the Resource Manager to have 2 copies of M1 in memory. Alternatively, when P1 was copied, it could force M1-Copy (and for that matter P1-Copy) to have a new version UUID. If we did that, the above issue would never occur.

Comparison of 2 Variations

Say I have a workflow containing an attribute resource A1.
The user then makes a variation of the workflow producing a new version of A1 (call it A1:A).
Later, the user wants to compare the two versions of A1 to see what has changed.

If SMTK sets the version of each modified component to that of the resource, then all of the changed components {Cm} would have version A, making a gross comparison possible. Of course, the user may still not know how the components were modified.

So what I’m thinking is a Resource has a modifiedVersion() method that returns a uuid to be used to represent the modified resource. Initially it could be set to an invalid uuid. When a component or the resource is modified, this method is called to get the version uuid. On the first call, a valid uuid is set and is added to the resource’s version history. The component will then set its version to this uuid.

When a resource is asked to save itself, if the modifiedVersion UUID is still invalid, we know that the resource (and its components) were not modified.
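The proposed modifiedVersion() behaviour can be sketched as lazy version assignment. Only the name modifiedVersion() comes from the proposal; the empty string standing in for an invalid UUID, newId(), needsSave(), markSaved() and Component are hypothetical, and resetting the version after a save is an assumption not stated above.

```cpp
#include <string>
#include <vector>

// Sketch of lazy version assignment. An empty string plays the role of an
// invalid UUID; newId() is a hypothetical counter-based stand-in.
class Resource
{
public:
  // Called whenever the resource or one of its components is modified. The
  // first call in a cycle creates a version and records it in the history.
  const std::string& modifiedVersion()
  {
    if (m_modifiedVersion.empty())
    {
      m_modifiedVersion = newId();
      m_versionHistory.push_back(m_modifiedVersion);
    }
    return m_modifiedVersion;
  }

  // On save: a still-invalid (empty) version means nothing was modified.
  bool needsSave() const { return !m_modifiedVersion.empty(); }

  // Assumption: after a successful write, a new modification cycle begins.
  void markSaved() { m_modifiedVersion.clear(); }

  const std::vector<std::string>& versionHistory() const { return m_versionHistory; }

private:
  static std::string newId()
  {
    static int counter = 0;
    return "version-" + std::to_string(++counter);
  }

  std::string m_modifiedVersion;
  std::vector<std::string> m_versionHistory;
};

// A component stamps itself with its resource's current modified version.
struct Component
{
  std::string version;
  void modify(Resource& owner) { version = owner.modifiedVersion(); }
};
```

All components touched between two saves share one version, which is what makes the gross comparison described above possible.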

Documenting Versions

There should be the ability to “Name” a Version
The ability to make notes on a Version
The ability to compress the Version History
- Remove all un-named versions (except the latest version if unnamed)
- Remove all history
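The first compression rule is simple enough to sketch. Version and dropUnnamed() are hypothetical; an empty name means the version was never named. "Remove all history" would just clear the list.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical version record: a tag plus an optional user-given name and notes.
struct Version
{
  std::string tag;
  std::string name;  // empty means unnamed
  std::string notes;
};

// Compress a history (oldest first) by dropping unnamed versions, but always
// keep the latest version even if it is unnamed.
std::vector<Version> dropUnnamed(const std::vector<Version>& history)
{
  std::vector<Version> kept;
  for (std::size_t i = 0; i < history.size(); ++i)
  {
    const bool isLatest = (i + 1 == history.size());
    if (!history[i].name.empty() || isLatest)
    {
      kept.push_back(history[i]);
    }
  }
  return kept;
}
```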

johnt · March 25, 2021, 12:19am

A couple comments

I am not sure that we need to (nor should) support sharing the same resource between two or more projects.
I would prefer we limit further discussion in this thread to the “resource save-as” topic, and continue discussing versioning in another thread(s).

Bob_Obara · March 25, 2021, 1:26pm

I completely agree on point #2 - I was just answering a question raised on a previous comment.

In terms of point #1, that all depends on how Save As is implemented. If it is simply a copy function (like cp -R), then you will have two projects conceptually sharing the same resources. Along those lines, I think we have to face the fact that users will use cp -R (or equivalent) on their projects and other resources, resulting in resource duplication. In the near term, as long as the application loads only one project at a time, we can worry about this later, or at least implement some kind of versioning.

johnt · April 12, 2021, 6:43pm

To anyone still interested in this topic, I think I have working code to replicate an attribute resource. The logic relies on our I/O code to copy the original plus new code to modify reference items. The basic steps are

1. Create a smtk::attribute::Resource instance to be the replica, and save its UUID.
2. Serialize the input resource to a json object and then deserialize the json object into the replica. I think this is safer, and certainly easier, than adding new code to do a deep copy.
3. Restore the replica’s UUID and also clear its location string.
4. Traverse all attributes in the replica and, for each, find all ReferenceItem instances (including associations). For each ReferenceItem value:
   a. If the value points to an attribute in the input resource, replace it with the corresponding attribute in the replica resource.
   b. If the value points to the input resource itself, replace it with the replica resource.
5. If the input resource is associated to itself, remove it from the replica and associate the replica to itself.
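The steps above can be sketched on a toy, id-based data model. Everything here (Reference, Attribute, AttributeResource, replicate()) is hypothetical and not SMTK's classes; the json round trip is modelled as a plain deep copy, and because attribute ids are unchanged at this stage, redirecting a reference only means swapping its resource id. Real SMTK ReferenceItems hold objects (and a private cache), so actual code has to substitute the replica's objects instead.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical data model: a reference names a resource and, optionally,
// one of its components (empty componentId means the resource itself).
struct Reference
{
  std::string resourceId;
  std::string componentId;
};

struct Attribute
{
  std::string id;
  std::vector<Reference> references;  // includes associations in this sketch
};

struct AttributeResource
{
  std::string id;
  std::string location;
  std::map<std::string, Attribute> attributes;  // keyed by attribute id
  std::vector<std::string> associatedResources;
};

AttributeResource replicate(const AttributeResource& input, const std::string& replicaId)
{
  // Create + serialize/deserialize: modelled here as a plain deep copy.
  AttributeResource replica = input;

  // Restore the replica's own id and clear its location.
  replica.id = replicaId;
  replica.location.clear();

  // Redirect references to the input resource or its attributes. Attribute
  // ids match between input and replica, so only the resource id changes.
  for (auto& entry : replica.attributes)
  {
    for (Reference& ref : entry.second.references)
    {
      if (ref.resourceId == input.id)
      {
        ref.resourceId = replica.id;
      }
    }
  }

  // A self-association of the input becomes a self-association of the replica.
  for (std::string& associated : replica.associatedResources)
  {
    if (associated == input.id)
    {
      associated = replica.id;
    }
  }
  return replica;
}
```

References into other resources (such as a model resource) are left untouched, which matches the requirement that a copy's associations track the original's.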

The code needs more testing, but it passes the tests I am most concerned with.

There should probably also be a final step to replace all of the attribute (component) UUIDs in the replica.
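That final step could look like this sketch: give every attribute in the replica a fresh id, then rewrite only those references that point into the replica itself. The types and renumberComponents() are hypothetical, and a counter stands in for UUID generation. References into other resources are untouched even if a component id happens to coincide.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical data model (not SMTK's classes).
struct Reference
{
  std::string resourceId;
  std::string componentId;  // empty means the resource itself
};

struct Attribute
{
  std::string id;
  std::vector<Reference> references;
};

// Assign fresh ids to all attributes, then remap references that target the
// replica (resourceId == replicaId). Returns the old-to-new id mapping.
std::map<std::string, std::string> renumberComponents(std::vector<Attribute>& attributes,
                                                      const std::string& replicaId)
{
  static int counter = 0;  // stand-in for a UUID generator
  std::map<std::string, std::string> oldToNew;
  for (Attribute& att : attributes)
  {
    const std::string fresh = "att-" + std::to_string(++counter);
    oldToNew[att.id] = fresh;
    att.id = fresh;
  }
  for (Attribute& att : attributes)
  {
    for (Reference& ref : att.references)
    {
      auto it = oldToNew.find(ref.componentId);
      if (ref.resourceId == replicaId && it != oldToNew.end())
      {
        ref.componentId = it->second;
      }
    }
  }
  return oldToNew;
}
```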

amuhsin · April 14, 2021, 6:45pm

Wouldn’t we need to traverse all links instead of just ReferenceItems since we might have links made manually using operations?

johnt · April 14, 2021, 7:12pm

Well I’m not sure. I think the only persistent links that attribute resources can have are resource associations and ResourceItem instances.

amuhsin · April 14, 2021, 7:36pm

ReferenceItem uses the public links API:

myAtt->guardedLinks()->addLinkTo(component, def->role())
https://gitlab.kitware.com/cmb/smtk/-/blob/master/smtk/attribute/ReferenceItem.cxx#L496

I think any operation can use the same API to assign a link with any role it wants.

johnt · April 14, 2021, 8:21pm

I might be getting out of my depth here, but I don’t think Attribute::guardedLinks() should be a public method. Shouldn’t it be protected in Attribute, with access granted only to smtk::attribute::Resource and smtk::attribute::ReferenceItem? @Bob_Obara and/or @dcthomp, do you have any insight on this?

johnt · April 15, 2021, 3:31pm

A couple more comments

Thanks, Ahmed, for pointing out the Attribute::guardedLinks() method. I wasn’t aware of it.
Whether it’s a public or protected method, this API appears to be the most appropriate place to update self-references in the replica resource.
But having said that, I know that ReferenceItem has a private cache for persistent objects, so I suspect we would still need to traverse all ReferenceItem instances and explicitly update their data.
Maybe the plan should be to traverse all attribute links after the ReferenceItem instances have been updated and verify there are no remaining links to the original attribute resource.
As for the public API question, let me try a different tack: @Bob_Obara @dcthomp, would you have any objection to an MR making Attribute::guardedLinks() protected?
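The proposed verification pass (after the ReferenceItem updates, check that no links still reach the original resource) might be sketched as follows. Link, its integer role, and danglingLinks() are hypothetical simplifications of SMTK's links; the point is that the check ignores roles, so links created by operations with their own roles are caught as well.

```cpp
#include <string>
#include <vector>

// Hypothetical flattened link record. The role distinguishes associations,
// reference items, expressions, or any role an operation chose.
struct Link
{
  std::string fromComponent;
  std::string toResource;
  std::string toComponent;
  int role;
};

// Return every replica link that still targets the original resource,
// whatever its role. An empty result means no stale links remain.
std::vector<Link> danglingLinks(const std::vector<Link>& replicaLinks,
                                const std::string& originalResourceId)
{
  std::vector<Link> stale;
  for (const Link& link : replicaLinks)
  {
    if (link.toResource == originalResourceId)
    {
      stale.push_back(link);
    }
  }
  return stale;
}
```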

amuhsin · April 16, 2021, 8:56pm

We use links to track relationships between persistent objects that are not associations. For example, when evaluating expressions, if att A is a variable used in att B, then we link the two together using a special role dedicated to expressions.