View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0022719||Community||OCCT:Foundation Classes||public||2011-09-05 10:44||2017-01-22 16:14|
|Status||feedback||Resolution||no change required|
|Summary||0022719: Performance problem of TCollection_AsciiString and OCCT memory manager|
|Description||Post from the Forum - http://www.opencascade.org/org/forum/thread_21696/.|
The initial message concerning a memory leak is a false statement. However, there are several reasonable ideas from RLN to be considered:
2. TCollection_AsciiString to use std::string instead of the current OCC implementation from the 1990s. That could make sense, indeed. The only thing to keep in mind is that it should rather be a basic_string that accepts an allocator object. That allocator object shall implement the C++ allocator interface using Standard::Allocate() and Standard::Free(). This is important to preserve the flexibility of using an arbitrary memory allocator.
Performance testing is needed to confirm that this new implementation is actually more efficient.
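The allocator idea in point 2 can be sketched roughly as follows. This is only an illustration, not OCCT code: occ_allocate() and occ_free() are hypothetical stand-ins for Standard::Allocate() and Standard::Free() (here they simply wrap malloc/free and count live blocks), and the names OccAllocator and OccString are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical stand-ins for Standard::Allocate() / Standard::Free().
// In OCCT they would forward to whichever memory manager is configured.
static std::size_t g_live_blocks = 0;  // blocks currently allocated

inline void* occ_allocate(std::size_t size) {
  void* p = std::malloc(size);
  if (p == nullptr) throw std::bad_alloc();
  ++g_live_blocks;
  return p;
}

inline void occ_free(void* p) {
  if (p != nullptr) {
    --g_live_blocks;
    std::free(p);
  }
}

// Minimal C++11 allocator that routes all requests to the stand-ins above.
template <class T>
struct OccAllocator {
  using value_type = T;

  OccAllocator() noexcept = default;
  template <class U>
  OccAllocator(const OccAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    return static_cast<T*>(occ_allocate(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t) noexcept { occ_free(p); }
};

// The allocator is stateless, so all instances compare equal.
template <class T, class U>
bool operator==(const OccAllocator<T>&, const OccAllocator<U>&) noexcept {
  return true;
}
template <class T, class U>
bool operator!=(const OccAllocator<T>&, const OccAllocator<U>&) noexcept {
  return false;
}

// basic_string parameterized with the custom allocator.
using OccString =
    std::basic_string<char, std::char_traits<char>, OccAllocator<char>>;

// Self-check: a string too long for any small-string buffer must obtain
// its storage through the allocator and release it on destruction.
inline void check_occ_string() {
  {
    OccString s(100, 'x');
    assert(g_live_blocks >= 1);
  }
  assert(g_live_blocks == 0);
}
```

With such a type, swapping the memory manager would only change what occ_allocate()/occ_free() forward to; the string code itself stays untouched.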
a. With MMGT_OPT=0 (i.e. system allocator) the test runs 100-1000x faster than with MMGT_OPT=1 (OCC allocator).
I used Intel Parallel Amplifier to find the root cause of this huge slowdown. The root cause is that starting from i=3333 (i.e. ~97% of the time!) a special branch of the OCC allocator is taken, which uses mmap() on Linux and CreateFileMapping() on Windows (see Standard_MMgrOpt::AllocMemory()). This imposes a huge overhead, as it is invoked on each call of operator+=().
This could be a hint for the OCC team to revisit the handling of large blocks.
b. With MMGT_OPT=2 (i.e. TBB allocator) the code runs ~2x faster than with MMGT_OPT=0. It makes sense to increase n to 1000000 to get meaningful times (mine were ~13 sec vs ~25 sec). Thus, TBB is the winner in this case ;-).
Windows7, VS2005 SP1.
AFAICT, the reason is TBB's efficient handling and caching of large blocks: it hoards large blocks and reuses them for numerous iterations before the next larger block is allocated.
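The original test from the forum is not reproduced in this issue, so the following is only a rough sketch of the access pattern described above: every append allocates a new, exactly sized buffer, so once the string grows past the allocator's large-block threshold, almost every iteration takes the expensive path. The function name run_append_test, the appended piece and the large_threshold parameter are all assumptions made for illustration; malloc/free stand in for the allocator under test.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

struct AppendStats {
  std::size_t allocations = 0;        // total allocator calls
  std::size_t large_allocations = 0;  // calls at or above the threshold
  double seconds = 0.0;               // wall-clock time of the loop
};

// Appends `piece` n times, allocating an exactly sized buffer on every
// append (one allocator call per append, as the report describes for
// operator+=()). `large_threshold` is a hypothetical block size from which
// the allocator would switch to its large-block branch.
inline AppendStats run_append_test(std::size_t n, const char* piece,
                                   std::size_t large_threshold) {
  AppendStats stats;
  const std::size_t step = std::strlen(piece);
  char* buf = nullptr;
  std::size_t len = 0;

  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t new_size = len + step + 1;  // +1 for the terminator
    char* grown = static_cast<char*>(std::malloc(new_size));
    if (grown == nullptr) {
      std::free(buf);
      throw std::bad_alloc();
    }
    ++stats.allocations;
    if (new_size >= large_threshold) ++stats.large_allocations;

    if (len != 0) std::memcpy(grown, buf, len);  // copy old contents
    std::memcpy(grown + len, piece, step + 1);   // append piece and '\0'
    std::free(buf);
    buf = grown;
    len += step;
  }
  const auto stop = std::chrono::steady_clock::now();

  std::free(buf);
  stats.seconds = std::chrono::duration<double>(stop - start).count();
  return stats;
}
```

Comparing large_allocations with allocations for a given n shows what share of the iterations would hit the large-block branch; the seconds field only reflects the stand-in allocator, not the OCC or TBB one.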
4. Custom allocators vs existing system allocators.
I perfectly agree with Denis' vision here in the long term. At some point system allocators will become good enough to make custom allocators pointless. The only exception would likely be memory regions (allocation of a large block and then deallocation all at once). Windows 2008 vs 2003 proves the trend (I intentionally left both charts in the quoted blog post to show the difference). The problem, however, is that we are far from there yet. OS vendors still do not offer optimal allocators, especially for *threaded* scenarios. Therefore the TBB allocator, for instance, is often the winner here. In recent experiments with Salome, TBB gave an extra 3x speed-up vs the system allocator, especially on large blocks. So there is a lot of demand for that.
|Tags||No tags attached.|
|Test case number|
||This issue does not look relevant anymore.|
Point 2 has been addressed by the fix for #11758: we now use plain C functions for string operations in TCollection_AsciiString instead of tricky custom code.
Point 4 has come true: system allocators now behave better than the OCCT custom allocator (MMGT_OPT=1), and we use the system one by default (MMGT_OPT=0).
Note that the system allocator behaves better than TBB on this test case unless a really huge number of steps is used:
|Nb. steps||System||OCCT||TBB|
|100 K||0 sec||8 sec||0.015 sec|
|1 M||2.6 sec||>10 min||2.2 sec|
However, point 3 is still relevant: if MMGT_OPT is set to 1 (the OCCT "optimized" allocator), the test case becomes very slow. Setting MMGT_MMAP to 0 makes it slightly worse still. Thus the problem is still there.
Branch CR22719 has been created by abv.
Detailed log of new commits:
Date: Tue Jan 17 09:33:45 2017 +0300
Test command for 22719: Performance problem of TCollection_ASciiString and OCCT memory manager
|2011-09-05 10:44||szy||New Issue|
|2011-09-05 10:44||szy||Assigned To||=> abv|
|2011-09-21 11:10||bugmaster||Target Version||6.5.2 => 6.5.3|
|2011-09-22 16:50||szy||Status||new => assigned|
|2012-02-09 09:22||abv||Target Version||6.5.3 => 6.5.4|
|2012-02-09 09:24||abv||Target Version||6.5.4 => Unscheduled|
|2013-06-18 17:41||abv||Assigned To||abv => eap|
|2017-01-13 19:30||kgv||Note Added: 0062595|
|2017-01-13 19:30||kgv||Assigned To||eap => abv|
|2017-01-13 19:30||kgv||Status||assigned => feedback|
|2017-01-13 19:30||kgv||Resolution||open => no change required|
|2017-01-13 19:30||kgv||Target Version||Unscheduled => 7.2.0|
|2017-01-17 08:32||abv||Description Updated|
|2017-01-17 09:10||abv||Note Added: 0062671|
|2017-01-17 09:11||abv||Note Edited: 0062671|
|2017-01-17 09:33||git||Note Added: 0062673|
|2017-01-17 11:22||kgv||Summary||Performance problem of TCollection_ASciiString and OCCT memory manager => Performance problem of TCollection_AsciiString and OCCT memory manager|
|2017-01-22 16:14||abv||Target Version||7.2.0 => Unscheduled|