MantisBT - Community
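ID: 0024042
Project: Community
Category: [OCCT] OCCT:Foundation Classes
View Status: public
Date Submitted: 2013-06-25 19:08
Last Update: 2013-12-19 13:56
Reporter: Roman Lygin
Priority: normal
Severity: integration request
Product Version: [OCCT] 6.6.0
Target Version: [OCCT] 6.7.0
Fixed in Version: [OCCT] 6.7.0
Test case number: Not needed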
0024042: Performance improvements: Foundation Classes
A series of performance improvements, to help speed up surface-surface intersection. The set will be split into a few chunks to ease integration.

Key details:

* TCollection - more inlining
In TCollection_Array1::Init(), the loop now uses [i] indexing, which is better for compiler vectorization and consistent with Array2::Init()

* NCollection_BaseCollection
- Added Allocator() to return the allocator. This is STL-consistent and allows the allocator to be reused

* Bnd_Box2d
- Removed a redundant copy (which had become a hotspot)
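
To illustrate the first two items, here is a minimal sketch. It is not the OCCT code: SketchArray1, SketchAllocator and SketchCollection are hypothetical stand-ins, and std::shared_ptr is used in place of an OCCT handle. It only shows the shape of an index-based Init() loop and of an Allocator() accessor that lets a second collection reuse the allocator of the first.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stand-in for an array class; not the real TCollection_Array1.
template <class T>
class SketchArray1 {
public:
  explicit SketchArray1(std::size_t theSize)
  : mySize(theSize), myData(new T[theSize]) {}

  // Index-based fill: a plain counted loop over p[i], a form that
  // compilers can usually analyze easily for vectorization.
  void Init(const T& theValue) {
    T* const p = myData.get();
    for (std::size_t i = 0; i < mySize; ++i) {
      p[i] = theValue;
    }
  }

  const T& Value(std::size_t theIndex) const {
    assert(theIndex < mySize);
    return myData[theIndex];
  }

private:
  std::size_t mySize;
  std::unique_ptr<T[]> myData;
};

// Hypothetical allocator type; an assumption made for this sketch only.
struct SketchAllocator {};

// Hypothetical stand-in for a base collection that exposes its allocator.
class SketchCollection {
public:
  explicit SketchCollection(std::shared_ptr<SketchAllocator> theAllocator)
  : myAllocator(std::move(theAllocator)) {}

  // Accessor in the spirit of get_allocator() on STL containers.
  const std::shared_ptr<SketchAllocator>& Allocator() const {
    return myAllocator;
  }

private:
  std::shared_ptr<SketchAllocator> myAllocator;
};
```

With such an accessor, a new collection can be constructed as SketchCollection second(first.Allocator()), so both share one allocator instead of creating a second one.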
Tags: No tags attached.
Relationships:
related to 0024072 | closed | Roman Lygin | Community | VC9 64-bit compiler crashes while compiling IntPoly_ShapeSection.cxx
related to 0024043 | closed | Roman Lygin | Community | Performance improvements: Modeling Algorithms
related to 0024044 | closed | bugmaster | Community | Performance improvements: Foundation Classes (math)
related to 0024071 | closed | abv | Open CASCADE | VC 2009 64-bit compiler crashes while compiling IntPoly_ShapeSection.cxx
Issue History
Date Modified | Username | Field | Change
2013-06-25 19:08 | Roman Lygin | New Issue
2013-06-25 19:08 | Roman Lygin | Assigned To | => abv
2013-06-25 19:13 | Roman Lygin | Note Added: 0024864
2013-06-25 19:13 | Roman Lygin | Status | new => resolved
2013-06-25 21:26 | Roman Lygin | Relationship added | related to 0024043
2013-06-25 22:00 | Roman Lygin | Relationship added | related to 0024044
2013-06-26 06:52 | abv | Note Added: 0024868
2013-06-26 06:52 | abv | Assigned To | abv => Roman Lygin
2013-06-26 06:52 | abv | Status | resolved => assigned
2013-06-27 22:30 | Roman Lygin | Note Added: 0024906
2013-06-27 22:31 | Roman Lygin | Assigned To | Roman Lygin => ifv
2013-06-27 22:31 | Roman Lygin | Status | assigned => resolved
2013-06-27 22:32 | Roman Lygin | Assigned To | ifv => abv
2013-06-28 15:47 | abv | Note Added: 0024915
2013-06-28 15:47 | abv | Assigned To | abv => bugmaster
2013-06-28 15:47 | abv | Status | resolved => reviewed
2013-07-01 15:42 | mkv | Assigned To | bugmaster => mkv
2013-07-03 11:42 | mkv | Note Added: 0024951
2013-07-03 11:42 | mkv | Test case number | => Not needed
2013-07-03 11:42 | mkv | Assigned To | mkv => bugmaster
2013-07-03 11:42 | mkv | Status | reviewed => tested
2013-07-03 11:44 | apn | Product Version | => 6.6.0
2013-07-03 11:44 | apn | Target Version | => 6.7.0
2013-07-05 11:57 | Roman Lygin | Changeset attached | => occt master 1145e2bc
2013-07-05 11:57 | Roman Lygin | Assigned To | bugmaster => Roman Lygin
2013-07-05 11:57 | Roman Lygin | Status | tested => verified
2013-07-05 11:57 | Roman Lygin | Resolution | open => fixed
2013-07-12 16:09 | abv | Note Added: 0025053
2013-07-12 16:09 | abv | Assigned To | Roman Lygin => apn
2013-07-12 16:22 | apn | Relationship added | related to 0024071
2013-07-12 16:36 | apn | Relationship added | related to 0024072
2013-07-16 21:50 | Roman Lygin | Note Added: 0025089
2013-07-16 21:51 | Roman Lygin | Note Edited: 0025089 | bug_revision_view_page.php?bugnote_id=25089#r5646
2013-07-19 18:53 | barbier | Note Added: 0025125
2013-07-19 20:13 | Roman Lygin | Note Added: 0025126
2013-07-19 23:58 | barbier | Note Added: 0025127
2013-12-19 13:52 | bugmaster | Status | verified => closed
2013-12-19 13:56 | bugmaster | Fixed in Version | => 6.7.0

Roman Lygin   
2013-06-25 19:13   
The fix has been pushed into the repository.
abv
2013-06-26 06:52   
One remark: the code of the array constructors / destructors that is now inline contains conditional compilation triggered by the macro __OPTIM_ARRAY. I believe this is potentially dangerous: even if this macro is not defined in OCCT, what if some application code defines it? I propose removing the code activated by this macro completely.
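
The concern can be illustrated with a minimal sketch. SketchArray and SKETCH_OPTIM_ARRAY are hypothetical names standing in for an array class and a macro like __OPTIM_ARRAY; the flag set inside the constructor is invented for illustration. Because the constructor body lives in the header, it is compiled in every translation unit that includes it, with whatever macros that unit happens to define.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical header content. SKETCH_OPTIM_ARRAY is an invented macro name.
class SketchArray {
public:
  // Inline constructor: its body is compiled wherever this header is
  // included, so the branch taken depends on the includer's macros.
  explicit SketchArray(std::size_t theSize) : mySize(theSize) {
    assert(theSize > 0);  // this sketch assumes a non-empty array
#ifdef SKETCH_OPTIM_ARRAY
    myUsesOptimizedLayout = true;   // variant seen by code that defines the macro
#else
    myUsesOptimizedLayout = false;  // variant seen by code that does not
#endif
  }

  bool UsesOptimizedLayout() const { return myUsesOptimizedLayout; }
  std::size_t Size() const { return mySize; }

private:
  std::size_t mySize;
  bool myUsesOptimizedLayout;
};
```

If a library is built without the macro while an application defines it, the two sides end up with different definitions of the same inline constructor, which is the kind of mismatch that removing the conditional code avoids.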

Besides, if you have some numbers indicating how performance improved after this fix in your application, I encourage you to share these in a note to this issue -- it will help provide more meaningful information about this fix in the Release Notes.
Roman Lygin   
2013-06-27 22:30   
Thanks Andrey. The code under __OPTIM_ARRAY has been removed (the branch has been re-pushed into the repository).

As for performance gains: I have been measuring the cumulative effect of all changes made incrementally (including those in the related trackers 0024043 and 0024044). The CAD Exchanger code that invokes multiple intersection computations (which account for about 80% of this code's run time) speeds up by 20%-30%, depending on the workload. That corresponds to a speedup of ~25%-35% in the intersection code itself.
Of course, the effect differs per workload, as different code branches get executed, but this gives an approximation.
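
The relation between the two ranges can be sketched as simple arithmetic. This assumes that "speeds up by" means a reduction in run time and that the code outside the intersection part is unchanged; HotPartReduction is a hypothetical helper name. If the hot part takes a fraction f of the total time and the total time drops by r, the hot part itself drops by r / f.

```cpp
#include <cassert>

// Run-time reduction inside the hot part implied by an overall run-time
// reduction, assuming the rest of the code takes the same time as before.
// Both arguments and the result are fractions (0.20 means 20%).
inline double HotPartReduction(double theOverallReduction, double theHotFraction) {
  assert(theHotFraction > 0.0 && theHotFraction <= 1.0);
  return theOverallReduction / theHotFraction;
}
```

With f = 0.80, an overall reduction of 0.20 gives 0.25 and one of 0.30 gives 0.375, which is in line with the approximate range quoted above.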

There are opportunities for greater speedups, which would involve algorithm redesigns or other extensions (e.g. computing multiple surface points at once instead of looping over poles). But this is currently beyond the scope of my effort.
abv
2013-06-28 15:47   
Reviewed, please test
mkv
2013-07-03 11:42   
Dear BugMaster,

Branch CR24042 (and products from GIT master) was compiled on Linux and Windows platforms and tested.
SHA-1: db24b0959bce4a26d40b88b3e46d07f8495daa8e

Number of compiler warnings:

occt component :
Linux: 2 (2 on master)
Windows: 7 (7 on master)

products component :
Linux: 0 (0 on master)
Windows: 63 (63 on master)

No regressions

No improvements

Testing cases:
Not needed

Testing on Linux:
Total MEMORY difference: 365028944 / 366290416
Total CPU difference: 44231.47000000074 / 43406.39000000102

Testing on Windows:
Total MEMORY difference: 424202976 / 424097456
Total CPU difference: 40968.0625 / 44235.734375

No differences in images were found by testdiff.
abv
2013-07-12 16:09   
This fix breaks compilation by MSVC++ 9.0 (MSVC 2008) in 64-bit mode, see [^]
Roman Lygin   
2013-07-16 21:50   
(edited on: 2013-07-16 21:51)
The fix for 0024072 partially rolls back the modifications made in TCollection, to be more conservative.
Note for the future: if some workloads demonstrate that inlining Array2 helps performance without inflating the code size, then the constructor/destructor can be inlined again.

barbier
2013-07-19 18:53   
Reading the figures above, it looks like performance is worse on Linux with this patch, doesn't it?
Roman Lygin   
2013-07-19 20:13   
If you refer to the following excerpts:
Testing on Linux:
Total MEMORY difference: 365028944 / 366290416
Total CPU difference: 44231.47000000074 / 43406.39000000102

Testing on Windows:
Total MEMORY difference: 424202976 / 424097456
Total CPU difference: 40968.0625 / 44235.734375

then they cannot currently be trusted for fine- or medium-grained performance differences, as the testing is executed on virtual machines, per the OCC guys.
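
For scale, the relative size of the quoted CPU differences can be computed with a small sketch. RelativeDifferencePercent is a hypothetical helper, and the report does not say which number of each pair belongs to the branch and which to master, so the sign alone does not indicate which one is faster.

```cpp
#include <cassert>

// Relative difference of theSecond with respect to theFirst, in percent.
inline double RelativeDifferencePercent(double theFirst, double theSecond) {
  assert(theFirst != 0.0);
  return 100.0 * (theSecond - theFirst) / theFirst;
}
```

For the Linux pair (44231.47, 43406.39) the second figure is about 1.9% lower than the first, while for the Windows pair (40968.0625, 44235.734375) it is about 8% higher. The two platforms move in opposite directions, which fits the point that these totals are too noisy for fine-grained comparison.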
barbier
2013-07-19 23:58   
Yes Roman, this is what I had in mind. Thanks for your explanations.