View Issue Details

IDProjectCategoryView StatusLast Update
0022850Open CASCADEOCCT:Meshpublic2012-11-16 13:18
ReporterpdnAssigned Todbv 
PriorityurgentSeveritymajor 
Status closedResolutionfixed 
Product Version6.5.2 
Target Version6.5.4Fixed in Version6.5.4 
Summary0022850: Not stable fix 22735
DescriptionCase chl/934/C8 is not stable to be fixed in parallel mode.

Data race has been detected in the BRepMesh_FastDiscretFace::AddInShape() method. The exact reason consists in modification of shared data by the code of BRep_Builder::UpdateEdge() method called inside AddInShape().

The proposed technical solution is to create a special map containing pairs <TopoDS_Edge(TopoDS_Shape), Standard_Mutex>. This map must be initialized before parallel processing of the face triangulation (for example in BRepMesh_IncrementalMesh class. The BRepMesh_FastDiscretFace class will use the reference to this map in such case). Next it is necessary to disable concurrent access to the corresponding edge from different threads by corresponding mutex from the map before calling an UpdateEdge() method.
TagsNo tags attached.
Test case numberchl 934 C8

Relationships

parent of 0023260 closeddbv Regression: Instability in parallel incmesh on Linux. 

Activities

dbv

2012-06-07 15:16

developer   ~0020658

Fix has been pushed to the Git branch CR22850

Dear Andrey,
Please review

abv

2012-06-07 15:41

manager   ~0020660

Some remarks:

1. Instead of defining a method AssignMapOfMutexes() and replicating twice the same block of code for filling the map, I suggest you add method that constructs that map given a shape, and use it in two places.

2. As soon as method ClearMapOfMutexes() is defined, I suggest you use it in BRepMesh_FastDiscret.cxx instead of direct Clear(), for consistency

3. In BRepMesh_FastDiscretFace::AddInShape(), please use Standard_Mutex::SentryNested to ensure that mutex gets unlocked even if exception is raised in the protected code. In order to correctly sequential execution (when mutex map is empty), use dummy instance of mutex and create sentry with doLock = false in that case.

4. Please double-check that there are no other places in the code where edges or vertices can be modified in parallel threads

dbv

2012-06-13 14:14

developer   ~0020687

Added method BRepMesh_FastDiscret::CreateMutexes() which creates mutexes for sub-shapes of a given shape.
Added method BRepMesh_FastDiscret::RemoveAllMutexes() - removes all created mutexes.

Methods BRepMesh_FastDiscretFace::Add()and BRepMesh_FastDiscretFace::AddInShape() now accept additional argument. The argument is a reference to BRepMesh_DataMapOfShapeMutex. This map is used to prevent data race in parallel mode.

Fix has been pushed to the Git branch CR22850

Dear Andrey,
Please review

abv

2012-06-14 16:55

manager   ~0020694

No remarks, please test

mkv

2012-06-18 18:06

tester   ~0020719

Dear BugMaster,
The workbenches
KAS:dev:mkv-22850-occt (GIT branch CR22850_1)
KAS:dev:mkv-22850-products (GIT master)
were compiled on Linux platform and tested.

Regression:
Not detected

Improvements:
Not detected

Testing case:
chl 934 C8 testing case was created for issue.
It is OK in this branch.
It is OK in current master (KAS:dev:products-20120608-opt).

See results in /QADisk/occttests/results/KAS/dev/mkv-22850-products-1_18062012/lin
See reference results in /QADisk/occttests/results/KAS/dev/products-20120608-opt_08062012/lin
See test cases in /QADisk/occttests/tests/ED
N.B. In order to launch testing case you can make use the following instructions
http://doc/doku.php?id=occt:certification

san

2012-06-22 18:59

developer   ~0020773

Last edited: 2012-06-22 19:20

The following very simple test in DRAW still crashes on branch CR22840_1 under Windows 7 64-bit:

pload ALL
box b 1 1 1
incmesh b 0.1 1

The crash is caused by the following code fragment in BRepMesh_IncrementalMesh::Update() method:

  if (myInParallel)
  {
    myMesh->CreateMutexes(S, TopAbs_EDGE);
  #ifdef HAVE_TBB
    // mesh faces in parallel threads using TBB
    tbb::parallel_for_each (aFaces.begin(), aFaces.end(), *myMesh.operator->());
  #else
    // alternative parallelization not yet available
    for (std::vector<TopoDS_Face>::iterator it(aFaces.begin()); it != aFaces.end(); it++)
      myMesh->Process (*it);
  #endif
    myMesh->RemoveAllMutexes(); // This line crashes at DeleteCriticalSection() API call
  }

Access violation in DeleteCriticalSection() typically means double deletion of the critical section object.
Setting a breakpoint in Standard_Mutex::~Standard_Mutex () discovers the reason.


1. Bitwise-copying a temporary Standard_Mutex to the map node and then destroying it results in the first DeleteCriticalSection() call. But critical section data is now in the map node, and the code thinks it is valid:

void BRepMesh_FastDiscret::CreateMutexes(const TopoDS_Shape& theShape,
                                         const TopAbs_ShapeEnum theType)
{
  for(TopExp_Explorer ex(theShape, theType); ex.More(); ex.Next())
  {
    myMapOfMutexes.Bind(ex.Current(), Standard_Mutex()) ; // This temporary is destroyed immediately after copying the data into the map
  }
}

2. When triangulation completes, myMesh->RemoveAllMutexes() kills the app by calling DeleteCriticalSection().

Conclusions:
a. Make copy constructor for Standard_Mutex private to avoid unintentional bitwise copying of WinAPI data that is invalidated immediately after that.
b. Try to avoid calling ~Standard_Mutex() several times by managing the mutex objects by pointers.

Other ideas are welcome!

san

2012-06-22 19:43

developer   ~0020777

Last edited: 2012-06-22 19:46

I have tried a different approach: moving critical section initialization and de-initialization out of Standard_Mutex constructor/destructor:

//=============================================
// Standard_Mutex::Standard_Mutex
//=============================================

Standard_Mutex::Standard_Mutex ()
{
}

//=============================================
// Standard_Mutex::~Standard_Mutex
//=============================================

Standard_Mutex::~Standard_Mutex ()
{
}

void Standard_Mutex::Attach()
{
#ifdef WNT
  InitializeCriticalSection( &myMutex );
#else
  pthread_mutex_init( &myMutex, 0 );
#endif
}

void Standard_Mutex::Detach()
{
#ifdef WNT
  DeleteCriticalSection( &myMutex );
#else
  pthread_mutex_destroy( &myMutex );
#endif
}


BRepMesh_FastDiscrete class was changed accordingly:

//=======================================================================
//function : CreateMutexes
//purpose :
//=======================================================================
void BRepMesh_FastDiscret::CreateMutexes(const TopoDS_Shape& theShape,
                                         const TopAbs_ShapeEnum theType)
{
  for(TopExp_Explorer ex(theShape, theType); ex.More(); ex.Next())
  {
    myMapOfMutexes.Bind(ex.Current(), Standard_Mutex()) ;
    myMapOfMutexes.ChangeFind(ex.Current()).Attach();
  }
}

//=======================================================================
//function : RemoveAllMutexes
//purpose :
//=======================================================================
void BRepMesh_FastDiscret::RemoveAllMutexes()
{
  BRepMesh_DataMapIteratorOfDataMapOfShapeMutex anIt(myMapOfMutexes);
  for (; anIt.More(); anIt.Next())
  {
    const_cast<Standard_Mutex&>(anIt.Value()).Detach();
  }
  myMapOfMutexes.Clear();
}

After this simple correction incmesh DRAW command seems to work.
Of course, this is just a prototype, an it is necessary to think how to implement a robust solution instead.

msv

2012-06-25 09:46

developer   ~0020780

I would prefer the first solution: forbid copy constructor and assign operator, putting pointers to mutexes into map nodes.

dbv

2012-06-25 18:36

developer   ~0020791

Redefined copy constructor and assign operator. Now they creates and initializes new instance of critical section instead of copying.

Fix pushed to the branch CR22850_2

Dear Andrey,
Please review.

abv

2012-06-25 20:07

manager   ~0020793

Dmitry: for consistency, assignment operator should destroy myMutex (previously initialized in constructor) instead of overriding it by new temporary object.

After revising this version, I agree with Mikhail: it is better to forbid copying of mutexes as it is too uncertain and dangerous. Then you will have to store pointers in the map. I suggest that you create vector of mutexes to avoid allocating each one in the heap individually, and automate destruction.

(Further we can think of separating this map and vector to dedicated tool that could be reused in other parts of OCCT.)

dbv

2012-06-28 17:16

developer   ~0020816

Class TopTools_MutexForShapeProvider has been created
Class contain methods:
TopTools_MutexForShapeProvider::CreateMutexesForSubShapes - Creates and associates mutexes with each sub-shape of type theType in theShape.
TopTools_MutexForShapeProvider::CreateMutexForShape - Creates and associates mutex with theShape
TopTools_MutexForShapeProvider::GetMutex - Returns pointer to mutex associated with theShape. In case when mutex not found returns NULL.

This class used in BRepMesh to avoid data race.

Changes have been pushed to the Git branch CR22850_3

Dear Andrey,
Please review.

dbv

2012-06-28 18:44

developer   ~0020824

Additional changes has been pushed

Dear Andrey,
Please review.

dbv

2012-06-28 19:16

developer   ~0020825

Added method RemoveAllMutexes to TopTools_MutexForShapeProvider

Dear Andrey,
Please review.

abv

2012-06-28 19:35

manager   ~0020826

Please make copy operator = private in Provider class, and also make copy constructor and operator = private in Mutex (recall previous version of the fix).

In BRepMesh_FastDiscretFace::AddInShape(), I suggest to simplify the code where Sentry is constructed to have single call to GetMutex, as follows:

      // lock mutex to prevent parallel change of the same data
      Standard_Mutex* aMutex = theMutexProvider.GetMutex(It.Key());
      Standard_Mutex::SentryNested (aMutex == NULL ? DummyMutex : *aMutex,
                                    aMutex != NULL);

It could be also useful to provide constructor in SentryNested from pointer, so that it is armed only if pointer is nun-null. This then will be single line:

      Standard_Mutex::SentryNested (theMutexProvider.GetMutex(It.Key()));

dbv

2012-06-29 11:13

developer   ~0020830

Fix for remarks have been pushed to Git

Dear Andrey,
Please review.

abv

2012-06-29 11:36

manager   ~0020831

The fix looks Ok, please test.

Some remarks:
(a) in BRepMesh_FastDiscret.cxx, comments for two new methods (at the end of the file) contain wrong names (from previous version).
(b) in the comment before class header in TopTools_MutexForShapeProvider.hxx, lines ======== can probably break Doxygen generation (association of comment with class); it is better to avoid this kind of unnecessary lines

dbv

2012-06-29 12:01

developer   ~0020832

Cosmetic changes according to the remark have been pushed

mkv

2012-06-29 15:53

tester   ~0020834

Dear BugMaster,
The workbenches
KAS:dev:mkv-22850-occt-3 (GIT branch CR22850_3)
KAS:dev:mkv-22850-products-3 (GIT master)
were compiled on Linux platform.
There is cyclic dependency:

=====> DRAWEXE:exec.libs
 
Info : ========> DRAWEXE
Error : Cyclic dependency detected between: TopExp TopTools
Error : Failed during execution
Error : ========> DRAWEXE failed
Error : Failed during execution
Info : Step exec.libs
Error : Step exec.libs failed
Error : Step exec.tks not done : almost DRAWEXE:exec.libs failed
Error : Step exec.link not done : almost DRAWEXE:exec.tks failed
Info : ------------------ Process report ------------------
Info : Failed DRAWEXE (exec.libs exec.tks exec.link )
Info : ----------------------------------------------------

dbv

2012-07-03 11:41

developer   ~0020845

Replaced TopExp_Explorer with TopoDS_Iterator to avoid cyclic dependence

Git branch CR22850_3 is ready to be reviewed

abv

2012-07-03 12:52

manager   ~0020846

Dmitry, I believe if iterator is used then function CreateMutexesForSubshape should be recursive; otherwise please explain how it is intended to work for theShape being face and type = edge.

dbv

2012-07-03 14:06

developer   ~0020847

Sorry, it was a stupid mistake.

Changes have been pushed to the git.

Dear Andrey,
Please review.

abv

2012-07-03 15:36

manager   ~0020848

Reviewed, please test

mkv

2012-07-04 18:26

tester   ~0020858

Dear BugMaster,
The workbenches
KAS:dev:mkv-22850-occt-3 (GIT branch CR22850_3)
KAS:dev:mkv-22850-products-3 (GIT master)
were compiled on Linux platform and tested.

Regression:
Not detected

Improvements:
Not detected

Testing case:
chl 934 C8 testing case was created for issue.
It is OK in this branch.

See results in /QADisk/occttests/results/KAS/dev/mkv-22850-products-3_03072012/lin
See reference results in /QADisk/occttests/results/KAS/dev/products-20120622-opt_22062012/lin
See test cases in /QADisk/occttests/tests/ED
N.B. In order to launch testing case you can make use the following instructions
http://doc/doku.php?id=occt:certification

Related Changesets

occt: master d00cba63

2012-07-06 12:08:21

dbv

Details Diff
0022850: Not stable fix 22735

Class TopTools_MutexForShapeProvider has been created
Class contain methods:
TopTools_MutexForShapeProvider::CreateMutexesForSubShapes - Creates and associates mutexes with each sub-shape of type theType in theShape.
TopTools_MutexForShapeProvider::CreateMutexForShape - Creates and associates mutex with theShape
TopTools_MutexForShapeProvider::GetMutex - Returns pointer to mutex associated with theShape. In case when mutex not found returns NULL.

Added method RemoveAllMutexes to TopTools_MutexForShapeProvider
Assign operator in MutexProvider, constructor and operator and assign operator in Standard_Mutex now private
Replaced TopExp_Explorer with TopoDS_Iterator to avoid cyclic dependence
Affected Issues
0022850
mod - src/BRepMesh/BRepMesh_FastDiscret.cdl Diff File
mod - src/BRepMesh/BRepMesh_FastDiscret.cxx Diff File
mod - src/BRepMesh/BRepMesh_FastDiscretFace.cdl Diff File
mod - src/BRepMesh/BRepMesh_FastDiscretFace.cxx Diff File
mod - src/BRepMesh/BRepMesh_IncrementalMesh.cxx Diff File
mod - src/Standard/Standard_Mutex.hxx Diff File
add - src/TopTools/FILES Diff File
mod - src/TopTools/TopTools.cdl Diff File
add - src/TopTools/TopTools_MutexForShapeProvider.cxx Diff File
add - src/TopTools/TopTools_MutexForShapeProvider.hxx Diff File

Issue History

Date Modified Username Field Change
2011-12-16 10:09 pdn New Issue
2011-12-16 10:09 pdn Assigned To => bugmaster
2011-12-16 10:12 pdn Assigned To bugmaster => epv
2011-12-16 10:12 pdn Status new => assigned
2012-03-22 06:40 abv Target Version 6.5.3 => 6.5.4
2012-04-28 12:20 epv Assigned To epv => azn
2012-05-04 19:37 azn Description Updated
2012-05-04 19:38 azn Assigned To azn => abv
2012-06-04 15:26 pdn Assigned To abv => ssv
2012-06-05 12:40 bugmaster Project LPKF => Open CASCADE
2012-06-05 13:34 abv Assigned To ssv => dbv
2012-06-07 15:16 dbv Note Added: 0020658
2012-06-07 15:16 dbv Assigned To dbv => abv
2012-06-07 15:16 dbv Status assigned => resolved
2012-06-07 15:41 abv Note Added: 0020660
2012-06-07 15:41 abv Assigned To abv => dbv
2012-06-07 15:41 abv Status resolved => assigned
2012-06-13 14:14 dbv Note Added: 0020687
2012-06-13 14:14 dbv Assigned To dbv => abv
2012-06-13 14:14 dbv Status assigned => resolved
2012-06-14 16:55 abv Note Added: 0020694
2012-06-14 16:55 abv Assigned To abv => bugmaster
2012-06-14 16:55 abv Status resolved => reviewed
2012-06-14 18:02 mkv Assigned To bugmaster => mkv
2012-06-15 14:11 mkv Assigned To mkv => dbv
2012-06-15 14:12 mkv Status reviewed => assigned
2012-06-18 08:43 abv Status assigned => resolved
2012-06-18 08:43 abv Status resolved => reviewed
2012-06-18 12:18 mkv Assigned To dbv => mkv
2012-06-18 18:06 mkv Note Added: 0020719
2012-06-18 18:07 mkv Test case number => chl 934 C8
2012-06-18 18:07 mkv Assigned To mkv => bugmaster
2012-06-18 18:07 mkv Status reviewed => tested
2012-06-22 18:59 san Note Added: 0020773
2012-06-22 18:59 san Assigned To bugmaster => dbv
2012-06-22 18:59 san Status tested => assigned
2012-06-22 19:00 san Note Edited: 0020773
2012-06-22 19:04 san Note Edited: 0020773
2012-06-22 19:20 san Note Edited: 0020773
2012-06-22 19:43 san Note Added: 0020777
2012-06-22 19:46 san Note Edited: 0020777
2012-06-25 09:46 msv Note Added: 0020780
2012-06-25 18:36 dbv Note Added: 0020791
2012-06-25 18:36 dbv Assigned To dbv => abv
2012-06-25 18:36 dbv Status assigned => resolved
2012-06-25 20:07 abv Note Added: 0020793
2012-06-25 20:07 abv Assigned To abv => dbv
2012-06-25 20:07 abv Status resolved => assigned
2012-06-28 17:16 dbv Note Added: 0020816
2012-06-28 17:16 dbv Assigned To dbv => abv
2012-06-28 17:16 dbv Status assigned => resolved
2012-06-28 18:44 dbv Note Added: 0020824
2012-06-28 19:16 dbv Note Added: 0020825
2012-06-28 19:35 abv Note Added: 0020826
2012-06-28 19:35 abv Assigned To abv => dbv
2012-06-28 19:35 abv Status resolved => assigned
2012-06-29 11:13 dbv Note Added: 0020830
2012-06-29 11:13 dbv Assigned To dbv => abv
2012-06-29 11:13 dbv Status assigned => resolved
2012-06-29 11:36 abv Note Added: 0020831
2012-06-29 11:36 abv Status resolved => reviewed
2012-06-29 11:37 abv Assigned To abv => bugmaster
2012-06-29 12:01 dbv Note Added: 0020832
2012-06-29 14:28 mkv Assigned To bugmaster => mkv
2012-06-29 15:53 mkv Note Added: 0020834
2012-06-29 15:54 mkv Assigned To mkv => dbv
2012-06-29 15:54 mkv Status reviewed => assigned
2012-07-03 11:41 dbv Note Added: 0020845
2012-07-03 11:41 dbv Assigned To dbv => abv
2012-07-03 11:41 dbv Status assigned => resolved
2012-07-03 12:52 abv Note Added: 0020846
2012-07-03 12:52 abv Assigned To abv => dbv
2012-07-03 12:52 abv Status resolved => assigned
2012-07-03 14:06 dbv Note Added: 0020847
2012-07-03 14:06 dbv Assigned To dbv => abv
2012-07-03 14:06 dbv Status assigned => resolved
2012-07-03 15:36 abv Note Added: 0020848
2012-07-03 15:36 abv Assigned To abv => dbv
2012-07-03 15:36 abv Status resolved => reviewed
2012-07-03 16:31 mkv Assigned To dbv => mkv
2012-07-04 18:26 mkv Note Added: 0020858
2012-07-04 18:28 mkv Assigned To mkv => bugmaster
2012-07-04 18:28 mkv Status reviewed => tested
2012-07-10 14:33 dbv Changeset attached => occt master d00cba63
2012-07-10 14:33 dbv Assigned To bugmaster => dbv
2012-07-10 14:33 dbv Status tested => verified
2012-07-10 14:33 dbv Resolution open => fixed
2012-07-23 13:01 abv Relationship added parent of 0023260
2012-11-09 09:53 abv Category OCCT:Visualization => OCCT:Mesh
2012-11-16 13:13 bugmaster Fixed in Version => 6.5.4
2012-11-16 13:18 bugmaster Status verified => closed