Roman Lygin 
0024042: Performance improvements: Foundation Classes
A series of performance improvements, to help speed up surface-surface intersection. The set will be split into a few chunks to ease integration.

Key details:

* TCollection - more inlining
- In TCollection_Array1::Init(), switched to [i] indexing: better for compiler vectorization and consistent with Array2::Init()

* NCollection_BaseCollection
- Added Allocator() to return the allocator. This is STL-consistent and allows the allocator to be reused

* Bnd_Box2d
- Removed a redundant copy (which becomes a hotspot)
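
To illustrate the Init() change, here is a minimal sketch of the two loop styles. SketchArray1 and its methods are hypothetical stand-ins, not the OCCT classes, and the note above does not say what the [i] form replaced; a pointer-walking loop is assumed here purely for comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a fixed-size array class; NOT TCollection_Array1.
template <typename T>
class SketchArray1 {
public:
  explicit SketchArray1(std::size_t length) : myData(length) {}

  // Assumed "before" form: fill by walking a pointer through the buffer.
  void InitByPointer(const T& value) {
    T* p = myData.data();
    T* const end = p + myData.size();
    while (p != end) {
      *p++ = value;
    }
  }

  // "After" form: fill through an index with a loop bound fixed up front,
  // a shape that compilers commonly recognize when auto-vectorizing.
  void InitByIndex(const T& value) {
    T* const data = myData.data();
    const std::size_t n = myData.size();
    for (std::size_t i = 0; i < n; ++i) {
      data[i] = value;
    }
  }

  // Bounds-checked read access, used only to compare results.
  const T& Value(std::size_t i) const {
    assert(i < myData.size());
    return myData[i];
  }

  std::size_t Length() const { return myData.size(); }

private:
  std::vector<T> myData;  // do not instantiate with T = bool (no data())
};
```

Both methods produce the same contents; whether the indexed form is actually vectorized depends on the compiler, flags, and element type, so it should be confirmed on the target toolchain.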
Roman Lygin   
2013-06-25 19:13   
The fix has been pushed into the repository.
2013-06-26 06:52   
One remark: the code of the array constructors / destructors that is now inline contains conditional compilation triggered by the macro __OPTIM_ARRAY. I believe this is potentially dangerous: even if this macro is not defined in OCCT, what if some application code defines it? I propose removing the code activated by this macro completely.

Besides, if you have some numbers indicating how performance improved after this fix in your application, I encourage you to share them in a note to this issue. That will help provide more meaningful information about this fix in the Release Notes.
Roman Lygin   
2013-06-27 22:30   
Thanks Andrey. The code under __OPTIM_ARRAY has been removed (the branch repushed into the repository).

As for performance gains: I have been measuring the cumulative effect of all changes done incrementally (including those in the related trackers 0024043 and 0024044). The CAD Exchanger code that invokes multiple intersection computations (which take about 80% of this code's run time) speeds up by 20%-30% depending on the workload. That corresponds to a ~25%-35% speed-up of the intersection code itself.
Of course, the effect differs per workload, as different code branches get executed, but this gives an approximation.
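
The step from the overall gain to the intersection-only gain can be sketched as simple arithmetic. This assumes that "speeds up by X%" means an X% reduction in elapsed time and that everything outside the intersection code is unchanged; the function name is hypothetical.

```cpp
#include <cassert>

// Fraction of the component's own run time that was removed, given the
// fraction removed from the whole run and the component's share of the run.
// Assumption: the rest of the program takes the same time before and after.
inline double ComponentTimeReduction(double overallReduction,
                                     double componentShare) {
  assert(componentShare > 0.0 && componentShare <= 1.0);
  assert(overallReduction >= 0.0 && overallReduction <= componentShare);
  return overallReduction / componentShare;
}
```

With a share of 0.80, an overall reduction of 0.20 gives 0.25 and an overall reduction of 0.30 gives 0.375, which is in the same range as the ~25%-35% quoted above.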

There are opportunities for a greater speed-up, which would involve algorithm redesigns or other extensions (e.g. computing multiple surface points at once instead of looping over poles). But this is currently beyond the scope of my effort.
2013-06-28 15:47   
Reviewed, please test
2013-07-03 11:42   
Dear BugMaster,

Branch CR24042 (and products from GIT master) was compiled on Linux and Windows platforms and tested.
SHA-1: db24b0959bce4a26d40b88b3e46d07f8495daa8e

Number of compiler warnings:

occt component :
Linux: 2 (2 on master)
Windows: 7 (7 on master)

products component :
Linux: 0 (0 on master)
Windows: 63 (63 on master)

No regressions

No improvements

Testing cases:
Not needed

Testing on Linux:
Total MEMORY difference: 365028944 / 366290416
Total CPU difference: 44231.47000000074 / 43406.39000000102

Testing on Windows:
Total MEMORY difference: 424202976 / 424097456
Total CPU difference: 40968.0625 / 44235.734375

No differences in images were found by testdiff.
2013-07-12 16:09   
This fix breaks compilation by MSVC++ 9.0 (MSVC 2008) in 64-bit mode, see [^]
Roman Lygin   
2013-07-16 21:50   
(edited on: 2013-07-16 21:51)
Fix for 0024072 partially rolls back modifications made in TCollection to be more conservative.
Note for the future: if some workloads demonstrate that inlining Array2 helps performance without inflating the code size, then the constructor/destructor can be inlined again.

2013-07-19 18:53   
Reading the figures above, it looks like performance is worse on Linux with this patch, doesn't it?
Roman Lygin   
2013-07-19 20:13   
If you refer to the following excerpts:
Testing on Linux:
Total MEMORY difference: 365028944 / 366290416
Total CPU difference: 44231.47000000074 / 43406.39000000102

Testing on Windows:
Total MEMORY difference: 424202976 / 424097456
Total CPU difference: 40968.0625 / 44235.734375

then they cannot currently be trusted for fine- or medium-grained performance differences, because the tests are executed on virtual machines, according to the OCC team.
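
For reference, the size of the quoted CPU differences can be computed as below. This is only a sketch: the test report does not state which figure of each pair is the branch and which is master, so the first-is-branch reading is an assumption, and the function name is hypothetical.

```cpp
#include <cassert>

// Signed relative difference of `measured` against `reference`.
// For CPU time, a positive result means `measured` took longer.
inline double RelativeDifference(double measured, double reference) {
  assert(reference > 0.0);
  return (measured - reference) / reference;
}
```

Applied to the totals above, this gives roughly +1.9% on Linux and roughly -7.4% on Windows. The two platforms move in opposite directions, which fits the point that these totals are too noisy for fine-grained conclusions.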
2013-07-19 23:58   
Yes Roman, this is what I had in mind. Thanks for your explanations.