View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0023541 | Community | OCCT:Coding | public | 2012-11-11 14:21 | 2016-07-14 14:38 |
Reporter | Roman Lygin | Assigned To | Roman Lygin | ||
Priority | normal | Severity | minor | ||
Status | closed | Resolution | fixed | ||
Platform | any | OS | Linux | ||
Product Version | 6.5.3 | ||||
Target Version | 6.6.0 | Fixed in Version | 6.6.0 | ||
Summary | 0023541: On Linux OCC links to release mode TBB leading to unspecified behavior | ||||
Description | Configure produces makefiles that always link to release mode TBB (libtbb.so, libtbbmalloc.so), even when OCC is built in debug mode. This leads to unspecified behavior, e.g. sporadic crashes of the application in debug mode. Debugging reveals that memory gets corrupted when de-allocating neighboring objects, at the point where the TBB allocator tries to update its lists of free/allocated objects. Given that release mode TBB comes without symbol files, it is impossible to debug the TBB source code. However, using gdb's 'watch' on the memory cell of interest reveals that the corruption happens only once and that the call stack leads to the TBB allocator. Release mode functions fine. In addition, another component (CAD Exchanger SDK in this case) was linked to libtbb_debug.so and libtbbmalloc_debug.so, so overall there is an obvious conflict of symbols when both flavors are mixed in one application. Perhaps the code to fix is in configure.ac around the following line: CSF_TBB_LIB="-L$tbb_lib -ltbb -ltbbmalloc" | ||||
Steps To Reproduce | Configure and build OCC in debug mode, then inspect the link line of TKernel and TKMesh. | ||||
Tags | No tags attached. | ||||
Test case number | |||||
related to | 0025972 | closed | There is no way to link occ with debug tbb library using cmake |
|
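The description points at the `CSF_TBB_LIB` assignment in configure.ac as the likely place for a fix. A minimal shell sketch of the idea, choosing the TBB library names from the build mode, might look like the following. The helper name `tbb_link_flags`, its `mode` argument and the `/opt/tbb/lib` path are assumptions made for illustration; the fix actually integrated into occt-wok may differ.

```shell
# Hypothetical helper: pick TBB link flags that match the OCC build mode.
# $1 = build mode ("debug", anything else is treated as release)
# $2 = directory containing the TBB libraries
tbb_link_flags() {
  mode="$1"
  tbb_lib="$2"
  if [ "$mode" = "debug" ]; then
    suffix="_debug"   # TBB's naming convention: libtbb_debug.so, libtbbmalloc_debug.so
  else
    suffix=""         # release: libtbb.so, libtbbmalloc.so
  fi
  printf '%s\n' "-L$tbb_lib -ltbb$suffix -ltbbmalloc$suffix"
}

# Example: a debug build with an assumed TBB location.
CSF_TBB_LIB=$(tbb_link_flags debug /opt/tbb/lib)
echo "$CSF_TBB_LIB"
```

With a release mode argument the same helper yields the original `-ltbb -ltbbmalloc` pair, so the release link line stays unchanged.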
Investigating the original issue that led to this bug report revealed that the root cause lies in gcc 4.1.2. More precisely, it is in its std::basic_string implementation, which uses a static template member _S_empty_rep_storage; this leads to a crash when a string is created in one library and destroyed in another. (Curious minds can find more details on the internet.) However, the incorrectly linked TBB allocator masked the real issue, making it look like a problem with an inconsistently linked allocator. Relinking the debug OCC binaries to the debug TBB allocator (by simply renaming libtbbmalloc_debug to libtbbmalloc temporarily) helped to reveal the real issue, as the TBB allocator asserted on the first attempt to free an invalid pointer. Thus, ensuring correct linking of TBB libraries by OCC is still a must. Meanwhile, the issue is downgraded from major/always to minor/random to reflect the severity. |
|
Sorry for the late comment: I believe that using different library names in different configurations is bad practice, precisely because it makes it possible to load several different instances of the code at runtime, and I see no good reason to follow it. On Windows, the MS VS CRT follows this approach and it is a permanent source of trouble, so why should we replicate the same troublesome approach on Linux? |
|
Andrey, different names for release vs debug libraries is a separate issue. Let's not bring it in here, to avoid further complication (for the record, I am in favor of distinct names ;-)). What needs to be done here (in 22315) is to make sure OCC links to the proper version of the 3rd party prerequisite (TBB in this case), which already follows a certain naming convention, regardless of whether anyone likes this convention or not. |
|
One more example of the consequences of this issue: a sporadic deadlock in code which uses tbb::task_group. The root cause is that, due to simultaneous use of both libtbb_debug.so (used by the debug version of CAD Exchanger) and libtbb.so (enforced by OCC), different instances of static basic_tls<generic_scheduler*> theTLS; get created. This leads to two distinct tbb masters being created in the same thread, and eventually to a deadlock. (More details are available upon request.) Investigating the issue took me 20(!) person-hours of work (which I would obviously prefer to have spent elsewhere). If this issue leaked into the field and happened at a customer site, the cost would be multiples of this, if it could ever be triaged at all. So please DO consider this issue really severe and get it fixed! The source file to be modified is apparently outside of the git source tree; if that is not the case, please tell me, so I could offer you a fix. Thank you, Roman P.S. The same applies to any 3rd party library which OCC links to (FreeType, etc.) and which can potentially be used by any other library/application that co-exists with OCC in one process. OCC should offer a user-defined option to choose which library to link with during its build process. |
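A rough way to spot this kind of mixing is to scan the list of shared objects for both TBB flavors. The sketch below classifies `ldd`-style output (or the contents of `/proc/<pid>/maps` on Linux) read from stdin; the function name `tbb_flavours`, the sample paths and the `.so.2` version suffix are assumptions for illustration, not part of OCC or TBB.

```shell
# Hypothetical helper: read a shared-object listing on stdin and report
# which TBB flavours appear: "release", "debug", "mixed" or "none".
tbb_flavours() {
  awk '
    /libtbb(malloc)?_debug\.so/ { debug = 1; next }   # libtbb_debug.so, libtbbmalloc_debug.so
    /libtbb(malloc)?\.so/       { release = 1 }       # libtbb.so, libtbbmalloc.so
    END {
      if (debug && release) print "mixed"
      else if (debug)       print "debug"
      else if (release)     print "release"
      else                  print "none"
    }'
}

# Sample listing resembling the situation in this report (paths are made up).
printf '%s\n' \
  'libtbb_debug.so.2 => /opt/tbb/lib/libtbb_debug.so.2 (0x0)' \
  'libtbb.so.2 => /opt/tbb/lib/libtbb.so.2 (0x0)' | tbb_flavours
```

On a real build one could pipe in `ldd` output for the application binary, or `cat /proc/<pid>/maps` for a running process; a result of "mixed" would indicate that both flavors are present in the same process.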
|
Roman, if you wish to propose a fix to correct project files generation, please check WOK sources at http://git.dev.opencascade.org/gitweb/ ; clone URL is gitolite@git.dev.opencascade.org:occt-wok.git Use scripts collect_binary.* in the root folder to rebuild WOK with your changes in the scripts (I suppose you can keep the binaries the same as you use). |
|
Thanks, Andrey. I did not realize WOK sources are available via another git repository (by the way, the correct url is gitolite@git.dev.opencascade.org:occt-wok, i.e. no .git suffix). The fix has been pushed into the git repository. Reassigned back to bugmaster. |
|
Fix has been tested and integrated into master of occt-wok repository |
Date Modified | Username | Field | Change |
---|---|---|---|
2012-11-11 14:21 | Roman Lygin | New Issue | |
2012-11-11 14:21 | Roman Lygin | Assigned To | => bugmaster |
2012-11-12 17:21 | bugmaster | Target Version | 6.5.4 => 6.6.0 |
2012-11-20 16:10 | Roman Lygin | Note Added: 0022312 | |
2012-11-20 16:10 | Roman Lygin | Severity | major => minor |
2012-11-20 16:10 | Roman Lygin | Reproducibility | always => random |
2012-11-20 18:17 | | Note Added: 0022315 | |
2012-11-20 18:51 | Roman Lygin | Note Added: 0022316 | |
2013-02-26 15:56 | | Target Version | 6.6.0 => 6.7.0 |
2013-03-16 11:29 | Roman Lygin | Note Added: 0023763 | |
2013-03-16 12:22 | | Note Added: 0023764 | |
2013-03-16 12:22 | | Assigned To | bugmaster => Roman Lygin |
2013-03-16 12:22 | | Status | new => assigned |
2013-03-18 21:14 | Roman Lygin | Note Added: 0023786 | |
2013-03-18 21:14 | Roman Lygin | Assigned To | Roman Lygin => bugmaster |
2013-03-18 21:14 | Roman Lygin | Status | assigned => resolved |
2013-03-18 23:12 | | Target Version | 6.7.0 => 6.6.0 |
2013-03-22 12:55 | bugmaster | Status | resolved => reviewed |
2013-03-22 12:57 | bugmaster | Note Added: 0023856 | |
2013-03-22 12:57 | bugmaster | Status | reviewed => verified |
2013-03-22 12:57 | bugmaster | Resolution | open => fixed |
2013-03-22 12:57 | bugmaster | Assigned To | bugmaster => Roman Lygin |
2013-04-23 13:35 | | Status | verified => closed |
2013-04-29 15:24 | | Fixed in Version | => 6.6.0 |
2014-01-11 11:58 | | Category | OCCT Release:BUILD => OCCT:Coding |
2016-07-14 14:38 | Roman Lygin | Relationship added | related to 0025972 |