MantisBT - Community
View Issue Details
0025545Community[OCCT] OCCT:Modeling Datapublic2014-12-03 08:252015-05-14 15:32
Roman Lygin 
bugmaster 
normalminor 
closedfixed 
[OCCT] 6.8.0 
[OCCT] 6.9.0[OCCT] 6.9.0 
0025545: TopLoc_Location::Transformation() provokes data races
Up to 6.8.0 TopLoc_Location::Transformation() uses delayed construction of the gp_Trsf: initially only a null pointer gets stored and the object is dynamically allocated and created upon first call to Transformation().

Therefore accessing the same location from multiple threads creates a data race if the object has not been created yet.

The fix will enforce creation of the object right away, without delay and extra pointer. Not only will this remove data race but also reduce total memory footprint (object size only vs object size + size of pointer) and increase performance (no extra dynamic allocation). This is reasonable given that the object gets created anyway, upon any access to geometry, i.e. via BRep_Tool.
The below code is inserted in the command OCC25545.
Test case is bugs fclasses bug25545.

/*! Tests data-race when concurrently accessing TopLoc_Location::Transformation().*/
void TopLoc_LocationTest::Multithreading()
{
    size_t n = 1000;
    std::vector<TopoDS_Shape> aShapeVec (n);
    std::vector<TopLoc_Location> aLocVec (n);
    TopoDS_Shape aShape = BRepBuilderAPI_MakeVertex (gp::Origin ());
    aShapeVec[0] = aShape;
    for (size_t i = 1; i < n; ++i) {
        gp_Trsf aT;
        aT.SetTranslation (gp_Vec (1, 0, 0));
        aLocVec[i] = aLocVec[i - 1] * aT;
        aShapeVec[i] = aShape.Moved (aLocVec[i]);
    }

    //have to use an external flag as QCOMPARE() inside lambda does not return from it
    tbb::atomic<bool> anIsRaceDetected;
    anIsRaceDetected = false;

    //concurrently process
    tbb::parallel_for (size_t (0), n, [&](size_t i) {
        if (!anIsRaceDetected) {
            const TopoDS_Vertex& aV = TopoDS::Vertex (aShapeVec[i]);
            gp_Pnt aP = BRep_Tool::Pnt (aV);
            if (aP.X () != static_cast<double> (i)) {
                anIsRaceDetected = true;
            }
        }
    }, tbb::simple_partitioner ());
    QVERIFY (!anIsRaceDetected);
}
The class TopLoc_Location is made thread-safe. Earlier the method Transformation provoked data races.
No tags attached.
related to 0024826closed bugmaster Wrapping of parallelisation algorithms 
Issue History
2014-12-03 08:25Roman LyginNew Issue
2014-12-03 08:25Roman LyginAssigned To => Roman Lygin
2014-12-04 10:34gitNote Added: 0034992
2014-12-04 10:41Roman LyginNote Added: 0034993
2014-12-04 10:41Roman LyginAssigned ToRoman Lygin => msv
2014-12-04 10:41Roman LyginStatusnew => resolved
2014-12-06 17:57msvNote Added: 0035095
2014-12-06 17:58msvAssigned Tomsv => Roman Lygin
2014-12-06 17:58msvStatusresolved => feedback
2014-12-06 23:55Roman LyginNote Added: 0035097
2014-12-06 23:55Roman LyginAssigned ToRoman Lygin => msv
2014-12-07 14:35gitNote Added: 0035102
2014-12-07 14:41msvNote Added: 0035103
2014-12-07 14:41msvAssigned Tomsv => abv
2014-12-07 14:41msvStatusfeedback => resolved
2014-12-07 14:41msvSteps to Reproduce Updatedbug_revision_view_page.php?rev_id=8764#r8764
2014-12-07 16:31Roman LyginNote Added: 0035104
2014-12-08 09:41abvNote Added: 0035114
2014-12-08 09:41abvAssigned Toabv => msv
2014-12-08 09:41abvStatusresolved => assigned
2014-12-08 13:24gitNote Added: 0035125
2014-12-08 13:30msvSteps to Reproduce Updatedbug_revision_view_page.php?rev_id=8767#r8767
2014-12-08 13:30msvAdditional Information Updatedbug_revision_view_page.php?rev_id=8769#r8769
2014-12-08 13:31msvNote Added: 0035126
2014-12-08 13:31msvAssigned Tomsv => abv
2014-12-08 13:31msvStatusassigned => resolved
2014-12-08 13:34msvRelationship addedrelated to 0024826
2014-12-08 13:36msvNote Added: 0035127
2014-12-08 13:59abvNote Added: 0035133
2014-12-08 13:59abvAssigned Toabv => bugmaster
2014-12-08 13:59abvStatusresolved => reviewed
2014-12-08 16:41mkvAssigned Tobugmaster => apv
2014-12-10 17:58apvNote Added: 0035280
2014-12-10 17:58apvAssigned Toapv => msv
2014-12-10 17:58apvStatusreviewed => assigned
2014-12-10 19:13msvAssigned Tomsv => Roman Lygin
2014-12-10 19:14msvAssigned ToRoman Lygin => apv
2014-12-10 19:16msvNote Added: 0035285
2014-12-17 11:29apvNote Added: 0035444
2014-12-17 11:29apvAssigned Toapv => bugmaster
2014-12-17 11:29apvStatusassigned => feedback
2014-12-25 12:32apvNote Added: 0035708
2014-12-25 12:32apvStatusfeedback => tested
2014-12-26 13:33bugmasterChangeset attached => occt master ee6bb37b
2014-12-26 13:33bugmasterStatustested => verified
2014-12-26 13:33bugmasterResolutionopen => fixed
2015-01-19 16:23bugmasterTarget Version7.0.0 => 6.9.0
2015-01-26 12:35gitNote Added: 0036587
2015-01-26 12:36gitNote Added: 0036598
2015-05-14 15:29aivStatusverified => closed
2015-05-14 15:32aivFixed in Version => 6.9.0

Notes
(0034992)
git   
2014-12-04 10:34   
Branch CR25545 has been created by Roman Lygin.

SHA-1: 7ea5c1cc3309780e07ceecf0dbba10fbcec612bc


Detailed log of new commits:

Author: Roman Lygin
Date: Thu Dec 4 10:34:20 2014 +0400

    0025545: TopLoc_Location::Transformation() provokes data races
(0034993)
Roman Lygin   
2014-12-04 10:41   
The fix had to introduce a boolean flag TopLoc_ItemLocation::myIsComputed as otherwise it is impossible to distinguish cases when myTrsf corresponds to computed or not yet computed value. This collision caused regressions on a few cases (e.g. blend bfuseblend A8) and required a rework.

Thus, there is a footprint of 2 bytes (sizeof(short)) vs saved 4 or 8 bytes (sizeof(gp_Trsf*)). To reduce this new footprint a single int field could be used for power and boolean flag. This modification has not been done for the sake of simplicity.

Other notes:
gp_Trsf::Powered() has been made const as it was non-const by mistake in the past and thus caused a compilation error in modified TopLoc_ItemLocation ctor.
(0035095)
msv   
2014-12-06 17:57   
Dear Roman,

It seems there is a way to go further in this solution and to get rid of myIsComputed flag. (BTW, footprint mentioned in the previous post is not 2 bytes, but 4 bytes, as Standard_Boolean is defined as unsigned int.) I have analyzed the usage of TopLoc_SListOfItemLocation class and its methods and found out that once the chain of item locations is constructed it is not changed (nor power nor datum nor tail) during its life. Yes, there are methods that can turn this statement to false. They are ChangeValue, SetValue, ChangeTail, SetTail. But these methods are not used anywhere. Therefore I propose to remove them from the interface as unused and dangerous.

So, my main idea is to move the computation of transformation into the following constructor of the class TopLoc_SListOfItemLocation:

TopLoc_SListOfItemLocation::TopLoc_SListOfItemLocation
  (const TopLoc_ItemLocation& anItem,
   const TopLoc_SListOfItemLocation& aTail);


This ctor is called only from within the methods Construct and Constructed, i.e. when new chain is constructed from an old one by putting a new item location on top. If at this moment we compute transformation for the new item location already knowing the tail of previous items this transformation will be valid till the end of life cycle of this chain.

Please, tell me if you know any obstacles on this way.

BR,
Mikhail
(0035097)
Roman Lygin   
2014-12-06 23:55   
Dear Mikhail,

Thank you for your notes.
This was my way of thinking as well when I implemented the first version of the fix. However it did cause a regression on a few cases:
blend bfuseblend A8 A9
boolean bopcommon_simple ZF3 ZI1 ZI2
and others.
After initial triaging I isolated it to the case when there is internal location, two ones on the parent shape and then scaling. Not sure but it could be something like this:
line l 0 0 0 1 0 0
vertex v1 0 0 0
vertex v2 0 0 0
translate v2 1 0 0
edge e l v1 v2
wire w e
trotate w 0 0 0 0 1 0 90
trotate w 0 0 0 1 0 0 90
tscale w 0 0 0 10

The parent shape gets invalid transformation but when you save the shape to .brep and restore back then it gets fine. After triaging it looked like the chain of items was fine (therefore storing/restoring to/from .brep is fine) but the value in one chain node is updated more than needed.

I ran out of time to further triage that and had to fall back to a limited version as explained above. I am not certain if avoiding the flag is really feasible.

Yes, you are right that sizeof(Standard_Boolean) is 4 (I believed it was unsigned short). This decreases the gain but the new version is still better in terms of footprint and performance given that no extra dynamic allocation happens.

If you/your team have spare cycles you might want to further investigate this to reach the best possible fix. If not then I suggest that you integrate the fix and create a follow up tracker to get back to this later. I am afraid I ran out of cycles on this task.

Thank you again,
Roman
(0035102)
git   
2014-12-07 14:35   
Branch CR25545_1 has been created by msv.

SHA-1: d18744ea904ec351493863aedebecc357c879fac


Detailed log of new commits:

Author: Roman Lygin
Date: Thu Dec 4 10:34:20 2014 +0400

    0025545: TopLoc_Location::Transformation() provokes data races
    
    Get rid of postponed calculation of transformation.
    Remove unused methods.
    Add command OCC25545 to reproduce the bug with data races.
(0035103)
msv   
2014-12-07 14:41   
I have reworked the patch of Roman. It has no regressions pointed by him.
Andrey, please review.
(0035104)
Roman Lygin   
2014-12-07 16:31   
Hello Mikhail,
Many thanks for completing the patch. Great to see you managed to get rid of this extra field. Apparently I missed some obvious stuff.
Roman
(0035114)
abv   
2014-12-08 09:41   
Mikhail, it's great that you have added DRAW command for testing the fix! Please just correct it so it can be compiled without TBB (use HAVE_TBB macro) and by compilers with no support of C++11 features like lambda functions. Then, I strongly recommend you to add a test case script -- it should be pretty easy and will allow the patch to pass without additional efforts by QA team.
(0035125)
git   
2014-12-08 13:24   
Branch CR25545_1 has been updated by msv.

SHA-1: a8682d99e563bd750a9aad6e50f663547a23054e


Detailed log of new commits:

Author: msv
Date: Mon Dec 8 13:23:57 2014 +0300

    - Get rid of C++11 lambda construction
    - make code compilable with no HAVE_TBB defined
    - add test case bugs/fclasses/bug25545

(0035126)
msv   
2014-12-08 13:31   
Please review new changes.
(0035127)
msv   
2014-12-08 13:36   
The parallel framework developed in scope of 0024826 is to be applied to the command OCC25545, so as to make it functioning in absence of TBB.
(0035133)
abv   
2014-12-08 13:59   
No remarks, please test
(0035280)
apv   
2014-12-10 17:58   
Dear BugMaster,

Branch CR25545_1 has been rejected due compilation errors on all platforms:
Linux:
http://jenkins-test-02.nnov.opencascade.com/user/mnt/my-views/view/CR25545_1/job/mnt-CR25545_1-master_build_occt_linux/1/parsed_console/ [^]
Windows:
http://jenkins-test-02.nnov.opencascade.com/user/mnt/my-views/view/CR25545_1/job/mnt-CR25545_1-master_build_occt_windows/1/parsed_console/ [^]
MacOS:
http://jenkins-test-02.nnov.opencascade.com/user/mnt/my-views/view/CR25545_1/job/mnt-CR25545_1-master_prepare_build_occt_MacOS/1/parsed_console/ [^]
(0035285)
msv   
2014-12-10 19:16   
Dear apv,
The errors occur due to use of obsolete version of tbb (3.0 on win and lin; and 4.0 on macOS). Please, update environment to use tbb 4.2 or 4.3.
(0035444)
apv   
2014-12-17 11:29   
Please, process request.
(0035708)
apv   
2014-12-25 12:32   
Dear BugMaster,

Branch CR25545_1 from occt git-repository (and master from products git-repository) was compiled on Linux, MacOS and Windows platforms and tested.
SHA-1: a8682d99e563bd750a9aad6e50f663547a23054e

Number of compiler warnings:
occt component:
   Linux: 18 (18 on master)
   Windows: 0 (0 on master)
products component :
   Linux: 11 (11 on master)
   Windows: 1 (1 on master)

Regressions/Differences:
Not detected

Testing cases:
bugs fclasses bug25545 - OK
http://occt-tests/CR25545-1-master-occt/Debian60-64/bugs/fclasses/bug25545.html [^]
http://occt-tests/CR25545-1-master-occt/Windows-32-VC10/bugs/fclasses/bug25545.html [^]

Testing on Linux:
Total MEMORY difference: 364789752 / 363562240
Total CPU difference: 46090.75000000002 / 47537.4299999999

Testing on Windows:
Total MEMORY difference: 276830884 / 277181316
Total CPU difference: 33849.90625 / 41681.890625
(0036587)
git   
2015-01-26 12:35   
Branch CR25545_1 has been deleted by inv.

SHA-1: a8682d99e563bd750a9aad6e50f663547a23054e
(0036598)
git   
2015-01-26 12:36   
Branch CR25545 has been deleted by inv.

SHA-1: 7ea5c1cc3309780e07ceecf0dbba10fbcec612bc