View Issue Details

IDProjectCategoryView StatusLast Update
0022939Open CASCADEOCCT:Modeling Datapublic2012-11-16 13:16
Reporterepv Assigned Toazn 
PriorityhighSeveritymajor 
Status closedResolutionfixed 
PlatformAOSL 
Product Version6.5.2 
Target Version6.5.4Fixed in Version6.5.4 
Summary0022939: Make B-Spline internal cache thread-safe to be used in multy-threaded mode
DescriptionNow B-Spline internal cache is not thread-safe. The parallel BrepMesh algorithm generates exceptions because of this problem. It is necessary to improve the thread-safety of B-Spline cache to avoid data races in future. See issues #0022835 and 0022850 for additional information and test models.
TagsNo tags attached.
Test case number

Attached Files

  • test_report.doc (69,120 bytes)

Activities

azn

2012-02-15 14:43

developer   ~0019574

Last edited: 2012-02-15 14:49

Branch http://svn.nnov.opencascade.com/svn/occt/branches/OCC22939 is ready to be reviewed.

Please don't forget to call method SetReentrant(true) while testing.
This can be done by the command mpparallel 1.

pdn

2012-02-20 14:25

reporter   ~0019666

Approach to bind cash to thread ID does not look feasilbe frome the industrial point of view. How the cash will be released in case of significan number of threads.

What would be pleased to see:
1) test case when 10-100 bspline surfaces is created and 10-100 threads work with them.
2) just to put siple mutext to define if this code is real bottleneck.

azn

2012-03-16 17:35

developer   ~0020022

Last edited: 2012-03-16 17:35

Now the problem is solved by mutex(http://svn.nnov.opencascade.com/svn/occt/branches/OCC22939).
Tests is showed that the usage of mutex doesn't affect performance when operating in single threaded mode (tests measured time spent for loading of the shape and its triangulation). Also important to note that this approach can cause performance degradation for a specific face which shared the same B-spline curves or surfaces (when running in parallel mode).

Roman Lygin

2012-03-17 18:31

updater   ~0020032

Can I get a preview of the fix before it is committed ? I reported this issue (http://opencascade.blogspot.com/2010/05/open-cascade-and-multi-threading-again.html), so obviously have some bias in it.
Beware of solutions involving mutexes. By definition, this creates a non-scalable code which will result in performance degradation for highly contended situations. To minimize this effect, a spin mutex which involves busy waiting instead of immediate going to sleep, can be used. Consider using tbb::spin_mutex. OCC Standard_Mutex does not provide busy waiting.
Another idea to consider is to have user-defined flag disabling use of caching whatsover. Avoiding using cache can be beneficial in other circumstances not just in multi-threaded configurations. For example, workloads used in Shape Healing revealed that there were >50% cache misses, so time spent for cache population was effectively wasted.
Likely, this flag should be per instance-defined. To avoid increase of data size this flag can be made bit field with other boolean flags.
Extra convenience, to have a global flag that would define default flags for all B-Splines created.

abv

2012-03-18 10:53

manager   ~0020034

The proposed improvement protects functions rather than cache. Please re-implement this so as to have one instance of mutex per instance of the cache (i.e. mutex should be the field of the relevant class instead of static variable in the function).

abv

2012-03-18 10:58

manager   ~0020035

Roman, I still consider protection by mutex as being better than no protection at all. It may be activated only if Standard::IsReentrant() is true; however I doubt if this would make sense taking that this will become the default mode soon.

This change does not close the issue for sure; possibility to improve further (extract cache to separate class or else) is open.

azn

2012-03-19 14:24

developer   ~0020038

Last edited: 2012-03-19 15:10

Now mutex is a field of class.

azn

2012-05-03 12:59

developer  

test_report.doc (69,120 bytes)

azn

2012-05-03 12:59

developer   ~0020465

The branch CR22939 is ready to be reviewed.
The attached file test_report.doc describes measuring performance with mutexes and without them.

Roman Lygin

2012-05-03 14:15

updater   ~0020466

Notes/questions/suggestions for test case:
1. Can you please add a test case source code for Test_Report.doc? I bet results strongly depend upon the algorithm (be it meshing, projection or other algorithm) and workload - in particular, the probability of contention.
2. It would also be helpful if you could try more cores than 2
3.There is basically no speed up. What is the root-cause? This can be for instance, because serial part is overwhelming (Ahmdal's law) or because the algorithm cannot concurrently run on this model. You might want to try Intel Amplifier XE in concurrency mode to analyze concurrency.

Roman Lygin

2012-05-03 14:48

updater   ~0020467

Last edited: 2012-05-03 16:00

Please disregard (3) above - invalid comment. Sorry for confusion.

abv

2012-05-03 15:25

manager   ~0020468

Review remarks (minor):

1. There seems to be no need to protect methods InvalidateCache(): these methods are called only when b-spline data are modified, when parallel access to the same b-spline is invalid anyway (i.e. regardless of cache). I suggest this protection should be removed.

2. Fields myMutex and *cache could be declared mutable, in order to avoid casting 'this' to non-const pointer in const methods.

3. Please avoid changes of formatting in the places not affected by actual fix, to minimize amount of modified code

I suggest the branch will be tested after these remarks are taken into account (even if partially).

Roman Lygin

2012-05-03 15:57

updater   ~0020469

Last edited: 2012-05-03 16:00

1. Re Andrey's (1) above - you should always be careful when concurrently writing to shared data. You can break machine word boundaries and hence result in inconsistent value (neither of two writes). Of course, this won't apply in this case, and writing validcache=0 will produce consistent result.

2. For high contention algorithms Standard_Mutex should be bad as it will go to slip right away. In such cases spinning-like mutexes are better alternative. Consider use tbb::spin_mutex. If you want to have TBB-independent API you might want to have wrappers #if HAVE_TBB which will use system API (like Windows critical section)

abv

2012-05-03 16:06

manager   ~0020470

Roman,

On (1): my point was that these methods change fields of the b-spline, and this is not protected, thus protecting only cache in this case is useless.

On (2): I agree that in highly contended scenarios busy-waiting is a choice, however our current goal is to implement protection against very rare but possible contention. I believe that high contention on b-spline cache is quite unlikely to happen in our typical scenarios (parallel processing of faces). When we go to situation where b-spline cache contention is an issue, we will likely need to revise this approach completely.

azn

2012-05-04 15:23

developer   ~0020476

The changes were implemented according to remarks. The branch CR22939 is ready to be reviewed.

abv

2012-05-04 15:59

manager   ~0020479

No remarks, please test. Note that I have rebased the fix on current master and cleaned it from irrelevant changes; use branch CR22939_1 (previous branch CR22939 has been removed)

apn

2012-05-05 10:06

administrator   ~0020483

Last edited: 2012-05-10 15:43

Dear BugMaster,
       Workbench KAS:dev:apn-22939-occt was created from git branch CR22939_1 (and apn-22939-products from svn trunk) and compiled on Linux platform.
   
       There are not regressions in apn-22939-products regarding to KAS:dev:products-20120415-opt:

       chl 937 A6 - non stable test case
      
       There are not improvements in apn-22939-products regarding to KAS:dev:products-20120415-opt
  
       See results in /QADisk/occttests/results/KAS/dev/apn-22939-products_04052012/lin
       See reference results in /QADisk/occttests/results/KAS/dev/products-20120415-opt_13042012/lin
       See test cases in /QADisk/occttests/tests/ED

Related Changesets

occt: master 83ada95b

2012-05-04 11:51:20

abv


Committer: azn Details Diff
0022939: Make B-Spline internal cache thread-safe to be used in multy-threaded mode

Internal cache in classes implementing b-spline curves and surface in Geom and Geom2d packages is protected from possible concurrency by mutex (added as a class field in each instance).
Affected Issues
0022939
mod - src/Geom/Geom_BSplineCurve.cdl Diff File
mod - src/Geom/Geom_BSplineCurve_1.cxx Diff File
mod - src/Geom/Geom_BSplineSurface.cdl Diff File
mod - src/Geom/Geom_BSplineSurface_1.cxx Diff File
mod - src/Geom2d/Geom2d_BSplineCurve.cdl Diff File
mod - src/Geom2d/Geom2d_BSplineCurve_1.cxx Diff File

Issue History

Date Modified Username Field Change
2012-01-30 17:45 epv New Issue
2012-01-30 17:45 epv Assigned To => azn
2012-01-30 17:53 abv Project Internal => Open CASCADE
2012-02-02 10:08 abv Target Version 6.5.3 => 6.5.4
2012-02-15 14:43 azn Note Added: 0019574
2012-02-15 14:44 azn Note Edited: 0019574
2012-02-15 14:49 azn Note Edited: 0019574
2012-02-15 14:57 azn Assigned To azn => abv
2012-02-15 15:12 abv Assigned To abv => pdn
2012-02-15 15:12 abv Status new => resolved
2012-02-20 14:25 pdn Note Added: 0019666
2012-02-20 14:25 pdn Assigned To pdn => azn
2012-02-20 14:25 pdn Status resolved => assigned
2012-03-16 17:35 azn Note Added: 0020022
2012-03-16 17:35 azn Note Edited: 0020022
2012-03-16 17:36 azn Status assigned => resolved
2012-03-16 17:37 azn Assigned To azn => pdn
2012-03-17 18:31 Roman Lygin Note Added: 0020032
2012-03-18 10:53 abv Note Added: 0020034
2012-03-18 10:53 abv Assigned To pdn => azn
2012-03-18 10:53 abv Status resolved => assigned
2012-03-18 10:58 abv Note Added: 0020035
2012-03-19 14:24 azn Note Added: 0020038
2012-03-19 14:25 azn Assigned To azn => abv
2012-03-19 15:10 azn Note Edited: 0020038
2012-05-03 12:59 azn File Added: test_report.doc
2012-05-03 12:59 azn Note Added: 0020465
2012-05-03 13:40 azn Status assigned => resolved
2012-05-03 14:15 Roman Lygin Note Added: 0020466
2012-05-03 14:48 Roman Lygin Note Added: 0020467
2012-05-03 15:25 abv Note Added: 0020468
2012-05-03 15:25 abv Assigned To abv => azn
2012-05-03 15:25 abv Status resolved => assigned
2012-05-03 15:57 Roman Lygin Note Added: 0020469
2012-05-03 16:00 abv Note Edited: 0020469
2012-05-03 16:00 abv Note Edited: 0020467
2012-05-03 16:06 abv Note Added: 0020470
2012-05-04 15:23 azn Note Added: 0020476
2012-05-04 15:23 azn Assigned To azn => abv
2012-05-04 15:23 azn Status assigned => resolved
2012-05-04 15:59 abv Note Added: 0020479
2012-05-04 15:59 abv Assigned To abv => azn
2012-05-04 15:59 abv Status resolved => reviewed
2012-05-04 16:10 apn Assigned To azn => apn
2012-05-05 10:06 apn Note Added: 0020483
2012-05-05 10:26 apn Assigned To apn => azn
2012-05-05 10:26 apn Status reviewed => assigned
2012-05-10 15:43 apn Note Edited: 0020483
2012-05-10 15:44 apn Status assigned => resolved
2012-05-10 15:44 apn Status resolved => reviewed
2012-05-10 15:45 apn Status reviewed => tested
2012-05-10 15:45 apn Assigned To azn => bugmaster
2012-05-14 12:41 azn Changeset attached => occt master 83ada95b
2012-05-14 12:41 azn Assigned To bugmaster => azn
2012-05-14 12:41 azn Status tested => verified
2012-05-14 12:41 azn Resolution open => fixed
2012-11-16 13:15 bugmaster Fixed in Version => 6.5.4
2012-11-16 13:16 bugmaster Status verified => closed