Windows/Docker build errors with cmb-superbuild:master

Hrm. Recent VTK changes affected SMTK as well, it seems :confused:. I'll look at it Monday.

I hit the target export-set problem as well (on macOS) after updating ParaView in my superbuild from a45459775a1421068e69c378d272d67f3a8fb6c6 to e7805d820b7b83315eac2ea613e9199b1e12b2b2.

Many thanks David, I'll add that to my branch with fixes & (mostly) workarounds.

After pinning cmb-superbuild to use paraview: a4545977 (23-Dec), there was some progress last week:

  • There is now an option to install the "full" python on Windows (in lieu of our no-ssl one). This is needed so that our export scripts can connect to girder/cumulus servers.
  • And with the full python, we can now install girder-client from pypi (in lieu of requiring individual projects for each prerequisite module)
  • And the ACE3P plugin will now install and get bundled into the modelbuilder package.

Issues remain, of course:

  • Perhaps the only critical one is that python operators won't load: the dreaded ImportPythonOperation failed. I hope I just overlooked something.
  • We'll also need some instructions for Windows users on how to install OpenSSL.
  • I also got some help on how build arguments are parsed in docker/windows. They are based on the shell (link), which makes sense in hindsight. I was interpreting the docs too literally.

Another increment today: I modified smtk to print out the python exception that occurs when importing python modules (with thanks to TJ):

ImportPythonOperation.cxx:155 No module named 'smtk'

So I tried adding logger messages to PythonInterpreter.cxx to see what paths are getting added. It builds with no problem on linux but, of course, not on windows. The error message is for good old _snprintf. I tried the usual fix plus a few others that google suggested, with no success. If anyone knows the fix for this, please pass it along. Otherwise, if it is just intractable, I'll go another way (even more kludgy).

[ 18%] Building CXX object smtk/CMakeFiles/smtkCore.dir/common/PythonInterpreter.cxx.obj
PythonInterpreter.cxx
C:/Users/ContainerUser/build/cmb-superbuild/install/include/boost-1_71\boost/system/detail/system_category_win32.hpp(52): error C2039: '_snprintf': is not a member of 'std'
c:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.23.28105\include\array(18): note: see declaration of 'std'
make[2]: *** [smtk/CMakeFiles/smtkCore.dir/build.make:297: smtk/CMakeFiles/smtkCore.dir/common/PythonInterpreter.cxx.obj] Error 2
make[1]: *** [CMakeFiles/Makefile2:2231: smtk/CMakeFiles/smtkCore.dir/all] Error 2
make: *** [Makefile:118: all] Error 2
The command 'powershell -NoLogo -NoProfile -Command cd C:/Users/ContainerUser/build/cmb-superbuild/superbuild/smtk/build; make -j6 install' returned a non-zero code: 1

This is mostly to document things. The problem arises in PythonInterpreter::initialize() when this smtk lib path is obtained from boost:

SMTK: INFO: C:/Users/ContainerUser/build/cmb-superbuild/superbuild/smtk/src/smtk/common/PythonInterpreter.cxx:128 smtkLibDir = \\?\C:\Users\john.tourtellott\Documents\docker-191230\modelbuilder-6.3.0-rc1-Windows-64bit\bin\smtk-3.3.0\smtkAttributePlugin

Note the \\?\ at the front of the path.

To continue capturing where we're at:

  • At TJ's suggestion, I ran modelbuilder from a powershell with PYTHONPATH set to the known path (which bypasses the boost::dll::symbol_location logic). In this setup, the python interpreter could find the smtk module.

  • The next error was that the smtk module could not find/import smtk.operation._smtkPybindOperation. Looking in Windows Explorer, it turns out that the module file name was all lowercase, i.e., _smtkpybindoperation.cp37-win_amd64.pyd. Fixing the capitalization (i.e., _smtkPybindOperation.cp37-win_amd64.pyd) resolved that. We'll need to look into why this one module's filename is all lowercase.

Anyway, with those two workarounds, the packaged modelbuilder was able to load our test_op.py operator, a first! We're getting there...

Oh, well, as everyone knows, the \\?\ prefix is used to specify an extended-length path in Windows (here's a link). It is not clear whether things changed in smtk because the boost::dll authors changed their win api call, or maybe this is a windows 10 change? Whatever the case, I think it is safe to check for this prefix and remove it if necessary.
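For reference, the prefix check described above could look roughly like the sketch below. This is only an illustration, not the change that went into smtk: stripExtendedLengthPrefix and checkPrefixExamples are hypothetical names, and the sketch assumes the path arrives as a narrow std::string with backslash separators. It also handles the \\?\UNC\ form, which maps back to a regular \\server\share path.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of SMTK or Boost): return the path without
// a leading Windows extended-length prefix. Other paths are returned as-is.
std::string stripExtendedLengthPrefix(const std::string& path)
{
  const std::string uncPrefix = "\\\\?\\UNC\\"; // \\?\UNC\server\share
  const std::string prefix = "\\\\?\\";         // \\?\C:\dir\file

  // Check the longer UNC form first, since it also starts with \\?\ .
  if (path.compare(0, uncPrefix.size(), uncPrefix) == 0)
  {
    // \\?\UNC\server\share becomes \\server\share
    return "\\\\" + path.substr(uncPrefix.size());
  }
  if (path.compare(0, prefix.size(), prefix) == 0)
  {
    return path.substr(prefix.size());
  }
  return path;
}

// Small self-check showing the intended behavior on made-up paths.
void checkPrefixExamples()
{
  assert(stripExtendedLengthPrefix("\\\\?\\C:\\pkg\\bin") == "C:\\pkg\\bin");
  assert(stripExtendedLengthPrefix("C:\\pkg\\bin") == "C:\\pkg\\bin");
}
```

One caveat with this approach: the prefix exists so that paths longer than MAX_PATH can be passed to the Windows API, so stripping it is only reasonable when the remaining path is short enough to be used in the standard form.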

As for the all lowercase smtkpybindoperation filename:

  • The file in the build tree is OK but the install tree is all lowercase.
  • This library is one of the few that get rebuilt during my hacking-to-see-wtf experiments. Earlier packages are OK, so I'm going on the assumption that a clean build will be OK. (We'll see...)

Another milestone: Ben made this cmb-superbuild MR to package the smtk DLLs in the bin directory instead of every plugin directory. That had several positive impacts:

  • The package size was reduced from 386 MB to 340 MB
  • The boost/windows-api call that was returning extended-file syntax (\\?\) went back to returning the standard format.
  • As a result, the latest Windows/modelbuilder can now find the smtk python modules and import our test_op.py script.

So next I'll be building the full package with plugins (project-manager & ace3p) to see if we can export some data. Thanks TJ & Ben.

News update: With the fixes on Tuesday, I can now build and package modelbuilder with plugins on Windows and, in fact, I was able to run modelbuilder and submit a job to NERSC. A number of new problems appeared, two of which are potentially serious:

  • Selecting the Edit => Settings... menu item causes modelbuilder to close, presumably because it crashed. This doesn't occur for the paraview build, only modelbuilder. Also no problems like this on my linux build of modelbuilder.
  • Loading simulation results with the SLAC Tools plugin wreaks havoc on the display. I guess the good news is that this occurs for Windows builds of both ParaView (screenshot) and ModelBuilder, but again, doesn't occur with my linux build.

Man, it's never easy...

John, that second bug is a known ParaView bug, and it's a problem with the Intel display driver on Windows. If you can run with an nvidia card instead, it doesn't occur. It may also be possible to downgrade the Intel driver, but I've just been running with nvidia instead.

Thanks Aron. I had created a separate ParaView issue, but I will close that as a duplicate.

As for debugging the settings crash, last time I checked, the superbuild doesn't support debug builds on windows...

Oh, and for completeness, I want to add the other issues with the current windows build:

  • When first creating a project, the model resource doesn't get added to the rendering pipeline. No such problem on linux, and I have not tried on macOS.
  • The Output Messages panel doesn't display any text printed by python code running in operators. In fact, the sys.stdout object is null (I presume sys.stderr is too). Not a deal-breaker, but this will make debugging export scripts a lot more difficult. If this is a feature, I'll need to come up with some alternate way to get logger data from scripts.
  • Related to that, I have observed inconsistency when writing to Operator::log() from python scripts. It seems as though sometimes the messages make it back to the c++ code and sometimes they don't. So maybe we need an alternate I/O scheme for this reason as well.
  • The new smtkACE3PPlugin does not show up in the Plugin Manager; instead, the user has to find it the first time. I am still looking for what is different between that plugin and, for example, smtkProjectManagerPlugin.

This is not a feature, it is a bug. Please do not come up with an alternate way to get logger data, this should instead be fixed.

Again, this should be debugged, not worked around.

@ben.boeckel Any progress on this? I would like to bump cmb-superbuild to catch the fix for conflicting vtkClientServerWrapper filenames but cannot. This in turn prevents me from merging SMTK MR 1903 without reverting some changes.

Never mind, I see that you have already fixed things! Thanks!

I might be jumping the gun, but maybe my Windows journey is near the end:

The last critical issue on Windows was a modelbuilder crash that occurs when opening the settings dialog. After learning how to use procdump and windbg for the first time, and lots of trial & error (i.e., lots of docker build/package runs), I found that the crash is triggered by some interesting code in the pqSettingsDialog constructor.

  // A hack to move color palette to back of the list of proxies.
  for (auto piter = proxies_to_show.begin(); piter != proxies_to_show.end(); ++piter)
  {
    if (strcmp((*piter)->GetXMLName(), "ColorPalette") == 0)
    {
      auto proxy = *piter;
      proxies_to_show.erase(piter);
      proxies_to_show.push_back(proxy);
    }
  }

I'll leave it to the Qt/C++ enthusiasts to opine if the QList::end() behavior is well defined when this code erases then pushes back its last item, but I personally would avoid such logic. Nonetheless, the current code works fine for my linux builds as well as my windows paraview build. It only consistently fails, throwing an invalid pointer exception, with the windows modelbuilder builds.

The crash might be due to a Qt bug, but that wouldn't answer why it only crashes for modelbuilder and not paraview. (Our windows CI baseline is Qt 5.12.5, btw.) Fortunately for me, the crash can be resolved by adding a break statement after the push_back() call. I submitted ParaView MR 3798 for this change; hope it will get approved...
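To make the break fix concrete, here is a minimal standalone sketch of the same erase-then-push_back pattern. It deliberately uses std::vector<std::string> rather than QList and proxy pointers, so it is an analogy under that assumption, not the ParaView code; moveToBack, moveToBackWithRotate and checkMoveExamples are hypothetical names. With std::vector, the iterator passed to erase() is invalid afterwards and push_back() may invalidate all iterators, so the loop has to stop as soon as the element has been moved.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Erase-then-append, as in the snippet above, but leaving the loop right
// after the container is modified so the stale iterator is never reused.
void moveToBack(std::vector<std::string>& names, const std::string& target)
{
  for (auto it = names.begin(); it != names.end(); ++it)
  {
    if (*it == target)
    {
      const std::string found = *it; // copy before erasing
      names.erase(it);               // 'it' is invalid from here on
      names.push_back(found);        // may reallocate the storage
      break;                         // never increment or compare 'it' again
    }
  }
}

// Alternative that avoids mutating the container while iterating over it.
void moveToBackWithRotate(std::vector<std::string>& names, const std::string& target)
{
  auto it = std::find(names.begin(), names.end(), target);
  if (it != names.end())
  {
    // Shift everything after 'it' one slot left; *it ends up last.
    std::rotate(it, it + 1, names.end());
  }
}

// Small self-check covering the case where the target is already last.
void checkMoveExamples()
{
  std::vector<std::string> names{ "General", "ColorPalette" };
  moveToBack(names, "ColorPalette");
  assert(names.back() == "ColorPalette" && names.size() == 2);
}
```

Both versions move only the first match, which is the behavior the break statement gives the original loop as well.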

Looking for mo' help. Since the last post, the settings crash did get resolved and I seem to be able to consistently build modelbuilder packages for windows via docker. (Takes a while, but that's ok...)

It turns out, however, there is a problem with the viewing that I cannot duplicate on linux (haven't tried macOS). The main symptoms are:

  • Models are displayed in the resource panel without the visibility (eyeball) icons, so users cannot control visibility.
  • The resource panel always displays the "hierarchical" style even when the setting is "two-level". (And yes, I have restarted the app.)
  • On startup, there are 4 error messages of the form "Could not set settings proxy property for" (i) DefaultFaceColor, (ii) SMTKDefaultValueBackground, (iii) SMTKHighlightColor, (iv) SMTKInvalidValueBackground.
  • I can click the cube-shaped icon in the resource panel and change the color assignment. The color is updated in the resource panel but not the render view.
  • When I hover over items in the resource panel, they do highlight in the render view but not the resource panel. On linux, the model entities are highlighted in both the render view and resource panel.

@dcthomp any ideas what might be going wrong here?

@johnt I have seen (even before my most recent merge) that there are some issues with the highlight-on-hover and selection colors. I have not had time to diagnose, but both @tj.corona and I have seen that the selection color now appears to be yellow and hover color cyan, but don't see any commits that would change the defaults. I would not be surprised if these things are related and will do what I can to investigate.

Regarding the missing visibility icons (eyeballs):

  • At static initialization, pqSMTKResourceBrowser calls setDecorator() on the phrase model. This decorator is a lambda function that calls smtk::view::VisibilityContent::decoratePhrase(), passing it another lambda function that assigns a pqSMTKResourceBrowser::panelPhraseDecorator instance.
  • When executed from a developer build, the setDecorator lambda function gets called, setting things up so that the resource panel ends up decorated with the visibility icons/eyeballs.
  • When executed from a packaged build, the same lambda function is NOT called, so that the resource panel does not include the visibility icons.

Two additional notes:

  • This is NOT a windows issue. I can duplicate this behavior with linux (haven't tried macOS).
  • There are two main differences between the "works" and "doesn't work" cases: (i) the doesn't-work cases are release builds and the works cases are debug builds; (ii) the doesn't-work cases are currently generated using docker containers.

So I guess I will try a packaged-debug build next, at least for comparison...