View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0022939 | Open CASCADE | OCCT:Modeling Data | public | 2012-01-30 17:45 | 2012-11-16 13:16 |
Reporter | epv | Assigned To | |||
Priority | high | Severity | major | ||
Status | closed | Resolution | fixed | ||
Platform | A | OS | L | ||
Product Version | 6.5.2 | ||||
Target Version | 6.5.4 | Fixed in Version | 6.5.4 | ||
Summary | 0022939: Make B-Spline internal cache thread-safe to be used in multy-threaded mode | ||||
Description | Now B-Spline internal cache is not thread-safe. The parallel BrepMesh algorithm generates exceptions because of this problem. It is necessary to improve the thread-safety of B-Spline cache to avoid data races in future. See issues #0022835 and 0022850 for additional information and test models. | ||||
Tags | No tags attached. | ||||
Test case number | |||||
|
Branch http://svn.nnov.opencascade.com/svn/occt/branches/OCC22939 is ready to be reviewed. Please don't forget to call method SetReentrant(true) while testing. This can be done by the command mpparallel 1. |
|
Approach to bind cash to thread ID does not look feasilbe frome the industrial point of view. How the cash will be released in case of significan number of threads. What would be pleased to see: 1) test case when 10-100 bspline surfaces is created and 10-100 threads work with them. 2) just to put siple mutext to define if this code is real bottleneck. |
|
Now the problem is solved by mutex(http://svn.nnov.opencascade.com/svn/occt/branches/OCC22939). Tests is showed that the usage of mutex doesn't affect performance when operating in single threaded mode (tests measured time spent for loading of the shape and its triangulation). Also important to note that this approach can cause performance degradation for a specific face which shared the same B-spline curves or surfaces (when running in parallel mode). |
|
Can I get a preview of the fix before it is committed ? I reported this issue (http://opencascade.blogspot.com/2010/05/open-cascade-and-multi-threading-again.html), so obviously have some bias in it. Beware of solutions involving mutexes. By definition, this creates a non-scalable code which will result in performance degradation for highly contended situations. To minimize this effect, a spin mutex which involves busy waiting instead of immediate going to sleep, can be used. Consider using tbb::spin_mutex. OCC Standard_Mutex does not provide busy waiting. Another idea to consider is to have user-defined flag disabling use of caching whatsover. Avoiding using cache can be beneficial in other circumstances not just in multi-threaded configurations. For example, workloads used in Shape Healing revealed that there were >50% cache misses, so time spent for cache population was effectively wasted. Likely, this flag should be per instance-defined. To avoid increase of data size this flag can be made bit field with other boolean flags. Extra convenience, to have a global flag that would define default flags for all B-Splines created. |
|
The proposed improvement protects functions rather than cache. Please re-implement this so as to have one instance of mutex per instance of the cache (i.e. mutex should be the field of the relevant class instead of static variable in the function). |
|
Roman, I still consider protection by mutex as being better than no protection at all. It may be activated only if Standard::IsReentrant() is true; however I doubt if this would make sense taking that this will become the default mode soon. This change does not close the issue for sure; possibility to improve further (extract cache to separate class or else) is open. |
|
Now mutex is a field of class. |
2012-05-03 12:59 developer |
test_report.doc (69,120 bytes) |
|
The branch CR22939 is ready to be reviewed. The attached file test_report.doc describes measuring performance with mutexes and without them. |
|
Notes/questions/suggestions for test case: 1. Can you please add a test case source code for Test_Report.doc? I bet results strongly depend upon the algorithm (be it meshing, projection or other algorithm) and workload - in particular, the probability of contention. 2. It would also be helpful if you could try more cores than 2 3.There is basically no speed up. What is the root-cause? This can be for instance, because serial part is overwhelming (Ahmdal's law) or because the algorithm cannot concurrently run on this model. You might want to try Intel Amplifier XE in concurrency mode to analyze concurrency. |
|
Please disregard (3) above - invalid comment. Sorry for confusion. |
|
Review remarks (minor): 1. There seems to be no need to protect methods InvalidateCache(): these methods are called only when b-spline data are modified, when parallel access to the same b-spline is invalid anyway (i.e. regardless of cache). I suggest this protection should be removed. 2. Fields myMutex and *cache could be declared mutable, in order to avoid casting 'this' to non-const pointer in const methods. 3. Please avoid changes of formatting in the places not affected by actual fix, to minimize amount of modified code I suggest the branch will be tested after these remarks are taken into account (even if partially). |
|
1. Re Andrey's (1) above - you should always be careful when concurrently writing to shared data. You can break machine word boundaries and hence result in inconsistent value (neither of two writes). Of course, this won't apply in this case, and writing validcache=0 will produce consistent result. 2. For high contention algorithms Standard_Mutex should be bad as it will go to slip right away. In such cases spinning-like mutexes are better alternative. Consider use tbb::spin_mutex. If you want to have TBB-independent API you might want to have wrappers #if HAVE_TBB which will use system API (like Windows critical section) |
|
Roman, On (1): my point was that these methods change fields of the b-spline, and this is not protected, thus protecting only cache in this case is useless. On (2): I agree that in highly contended scenarios busy-waiting is a choice, however our current goal is to implement protection against very rare but possible contention. I believe that high contention on b-spline cache is quite unlikely to happen in our typical scenarios (parallel processing of faces). When we go to situation where b-spline cache contention is an issue, we will likely need to revise this approach completely. |
|
The changes were implemented according to remarks. The branch CR22939 is ready to be reviewed. |
|
No remarks, please test. Note that I have rebased the fix on current master and cleaned it from irrelevant changes; use branch CR22939_1 (previous branch CR22939 has been removed) |
|
Dear BugMaster, Workbench KAS:dev:apn-22939-occt was created from git branch CR22939_1 (and apn-22939-products from svn trunk) and compiled on Linux platform. There are not regressions in apn-22939-products regarding to KAS:dev:products-20120415-opt: chl 937 A6 - non stable test case There are not improvements in apn-22939-products regarding to KAS:dev:products-20120415-opt See results in /QADisk/occttests/results/KAS/dev/apn-22939-products_04052012/lin See reference results in /QADisk/occttests/results/KAS/dev/products-20120415-opt_13042012/lin See test cases in /QADisk/occttests/tests/ED |
occt: master 83ada95b 2012-05-04 11:51:20
Committer: |
0022939: Make B-Spline internal cache thread-safe to be used in multy-threaded mode Internal cache in classes implementing b-spline curves and surface in Geom and Geom2d packages is protected from possible concurrency by mutex (added as a class field in each instance). |
Affected Issues 0022939 |
|
mod - src/Geom/Geom_BSplineCurve.cdl | Diff File | ||
mod - src/Geom/Geom_BSplineCurve_1.cxx | Diff File | ||
mod - src/Geom/Geom_BSplineSurface.cdl | Diff File | ||
mod - src/Geom/Geom_BSplineSurface_1.cxx | Diff File | ||
mod - src/Geom2d/Geom2d_BSplineCurve.cdl | Diff File | ||
mod - src/Geom2d/Geom2d_BSplineCurve_1.cxx | Diff File |
Date Modified | Username | Field | Change |
---|---|---|---|
2012-01-30 17:45 | epv | New Issue | |
2012-01-30 17:45 | epv | Assigned To | => azn |
2012-01-30 17:53 |
|
Project | Internal => Open CASCADE |
2012-02-02 10:08 |
|
Target Version | 6.5.3 => 6.5.4 |
2012-02-15 14:43 |
|
Note Added: 0019574 | |
2012-02-15 14:44 |
|
Note Edited: 0019574 | |
2012-02-15 14:49 |
|
Note Edited: 0019574 | |
2012-02-15 14:57 |
|
Assigned To | azn => abv |
2012-02-15 15:12 |
|
Assigned To | abv => pdn |
2012-02-15 15:12 |
|
Status | new => resolved |
2012-02-20 14:25 |
|
Note Added: 0019666 | |
2012-02-20 14:25 |
|
Assigned To | pdn => azn |
2012-02-20 14:25 |
|
Status | resolved => assigned |
2012-03-16 17:35 |
|
Note Added: 0020022 | |
2012-03-16 17:35 |
|
Note Edited: 0020022 | |
2012-03-16 17:36 |
|
Status | assigned => resolved |
2012-03-16 17:37 |
|
Assigned To | azn => pdn |
2012-03-17 18:31 | Roman Lygin | Note Added: 0020032 | |
2012-03-18 10:53 |
|
Note Added: 0020034 | |
2012-03-18 10:53 |
|
Assigned To | pdn => azn |
2012-03-18 10:53 |
|
Status | resolved => assigned |
2012-03-18 10:58 |
|
Note Added: 0020035 | |
2012-03-19 14:24 |
|
Note Added: 0020038 | |
2012-03-19 14:25 |
|
Assigned To | azn => abv |
2012-03-19 15:10 |
|
Note Edited: 0020038 | |
2012-05-03 12:59 |
|
File Added: test_report.doc | |
2012-05-03 12:59 |
|
Note Added: 0020465 | |
2012-05-03 13:40 |
|
Status | assigned => resolved |
2012-05-03 14:15 | Roman Lygin | Note Added: 0020466 | |
2012-05-03 14:48 | Roman Lygin | Note Added: 0020467 | |
2012-05-03 15:25 |
|
Note Added: 0020468 | |
2012-05-03 15:25 |
|
Assigned To | abv => azn |
2012-05-03 15:25 |
|
Status | resolved => assigned |
2012-05-03 15:57 | Roman Lygin | Note Added: 0020469 | |
2012-05-03 16:00 |
|
Note Edited: 0020469 | |
2012-05-03 16:00 |
|
Note Edited: 0020467 | |
2012-05-03 16:06 |
|
Note Added: 0020470 | |
2012-05-04 15:23 |
|
Note Added: 0020476 | |
2012-05-04 15:23 |
|
Assigned To | azn => abv |
2012-05-04 15:23 |
|
Status | assigned => resolved |
2012-05-04 15:59 |
|
Note Added: 0020479 | |
2012-05-04 15:59 |
|
Assigned To | abv => azn |
2012-05-04 15:59 |
|
Status | resolved => reviewed |
2012-05-04 16:10 | apn | Assigned To | azn => apn |
2012-05-05 10:06 | apn | Note Added: 0020483 | |
2012-05-05 10:26 | apn | Assigned To | apn => azn |
2012-05-05 10:26 | apn | Status | reviewed => assigned |
2012-05-10 15:43 | apn | Note Edited: 0020483 | |
2012-05-10 15:44 | apn | Status | assigned => resolved |
2012-05-10 15:44 | apn | Status | resolved => reviewed |
2012-05-10 15:45 | apn | Status | reviewed => tested |
2012-05-10 15:45 | apn | Assigned To | azn => bugmaster |
2012-05-14 12:41 |
|
Changeset attached | => occt master 83ada95b |
2012-05-14 12:41 |
|
Assigned To | bugmaster => azn |
2012-05-14 12:41 |
|
Status | tested => verified |
2012-05-14 12:41 |
|
Resolution | open => fixed |
2012-11-16 13:15 | bugmaster | Fixed in Version | => 6.5.4 |
2012-11-16 13:16 | bugmaster | Status | verified => closed |