Menu

#526 Expressions miscompiled: incorrect algebraic rearrangement

closed
Matthew
compiler (247)
2013-02-14
2011-01-10
TeeEmCee
No

The following expression:

dim as integer h = 0
? (h - (h+1) * 2 + h)

Miscompiles. It prints 2 when it should print -2. Looking at the assembly, I see that it has been compiled to "h - h * 2 + h + 2". The order of the terms matters, but the particular constants and the use of variables does not (any value which is not a compile-time constant seems to do).

Tested with fbc 0.21.1 and SVN revision 5450.

Discussion

  • Matthew

    Matthew - 2011-01-14

    Thanks, nice find. It looks like a faulty optimisation routine. I've disabled the optimisation as a temporary fix in #5455:
    http://fbc.svn.sf.net/viewvc/fbc?view=revision&revision=5455
    Not properly investigated the routine yet, but I think I know roughly what the bug will be and what the fix should look like.

     
  • Matthew

    Matthew - 2011-04-27

    Update: looks like I was wrong about the routine, the bug, and the fix. After searching for a likely cause without success, I gave up for a while.

    Technical explanation: It turns out now that the routine I disabled - hOptConstDistMul() - isn't causing the bug, but helping to bring about the right conditions for it to occur: a-m(b+n) is optimised by it to give a-(mb+mn), but doing that directly would have been optimised (presumably by hOptAssocADD(), which runs before it) to a-mb+(-mn).

    Recently coming back to it and attacking it with some different debugging methods, I've isolated the bug - I think - to a pair of routines, hOptConstAccum1() and hConstAccumADDSUB(). The former calls the latter. It can't (easily) be isolated to the callee because it effectively returns two values which the caller sticks together. Somewhere along the line, the code is getting confused with the mix of adding and subtracting.

    Anyway, I suspect the end is in sight for this bug. It looks like I will have to dig deeper into the logic of the code though...

     
  • Matthew

    Matthew - 2011-04-27

    EDIT: hOptConstAccum2(), not hOptConstAccum1().
    While I'm here, I'll also report that the surrounding additions don't have to be with a var - in fact they can even be zero (i.e. 0 - (h+1)*2 + 0) - but they do have to be there.

     
  • Matthew

    Matthew - 2011-06-14

    I've uploaded a complete fix for the problem, but not a very satisfactory one.

    Given the amount of changes I made while researching, I'm not sure everything I said in my comments was accurate, but the problem seems to be of the case (n op (n op (n op ... (n op c)...))), where op is +/-, and n is some non-constant numeric expression (possibly all identical), and c is a numeric constant.

    The problem seems to occur when the ops are a mixture of '+' and '-'. There is a vague pattern, but not enough for me to understand how the logic is faulty.
    So anyway, I've effectively just disabled (n - (expr)), which prevents the mixing.

    Note: I'm not sure now what I meant about (0 - (h+1)*2 + 0), I can't replicate it now.

    I just hope I have actually isolated the problem area and squashed the bug. But I don't think I'll make any progress with it unless I can get my head into the logic sometime. So far it has eluded me and I think I'll be moving on to other things for now.

    Anyway, I'll upload a program I wrote to generate some simple test expressions of the form I gave above (i.e. each left branch terminates at n, each right branch continues except the last which terminates at c.) The results showed a pattern, but not enough for me to discern the problem.

     
  • Matthew

    Matthew - 2011-06-14

    Program to generate test expression trees of a certain form

     
  • dkl

    dkl - 2011-06-16

    Awesome, it seems to fix this problem too:
    http://www.freebasic.net/forum/viewtopic.php?t=18001
    (looks like an unfortunate 0.22 regression)

    Here's a test:

    '' '@array(a,b)' seems to be miscompiled,
    '' only seems to happen with global multi-dimensional array,
    '' when the 1st dimension is indexed beyond the first element,
    '' and the 2nd dimension has 3 elements each.
    ''
    '' Works though when the array offset is constant & evaluated at compile-time.
    '' Works with data type byte.
    ''
    '' Test with -exx to make sure the array bounds are ok

    print "(ok)", "(mismatch)"

    dim as integer a = 0, b = 1

    dim shared as integer x(0 to 1, 0 to 2) = {{1, 2, 3}, {4, 5, 6}}
    dim shared as integer y(1 to 1, 0 to 2) = {{1, 2, 3}}
    dim shared as integer z(0 to 0, 0 to 2) = {{1, 2, 3}}

    print "-----------------------"
    print @x(0,0), @x(1,0)
    print @x(a,0), @x(b,0)
    print "-----------------------"
    print , @y(1,0)
    print , @y(b,0)
    print "-----------------------"
    print @z(0,0)
    print @z(a,0)

     
  • dkl

    dkl - 2011-06-16

    I have no other explanation anyways...

     
  • Matthew

    Matthew - 2011-06-24

    It seems to be the reenabling of hOptConstDistMUL() that's fixing the array access.

    Not sure exactly what it's doing, but the end result is to emit:
    imul eax, 12
    lea ebx, [_X+eax]
    Instead of:
    lea ebx, [_X+eax*2+eax]
    So, eax12 instead of eax3.

    Using a modified astDumpTree, this is how the IDX node gets optimised:

    Without hOptConstDistMUL:

    Before:
    [IDX: ofs=0 mult=1 l=(((B * 3) + 1) * 4)]
    After:
    [IDX: ofs=0 mult=4 l=((B * 3) + 1)]

    With hOptConstDistMUL:

    Before:
    [IDX: ofs=0 mult=1 l=(((B * 3) + 1) * 4)]
    After:
    [IDX: ofs=4 mult=1 l=(B * 12)]

    So perhaps the emitter is ignoring the mult=4 somewhere. I don't know whether that would be "by design" or not, perhaps hOptConstDistMUL was specifically added to prevent this occurance.

     
  • dkl

    dkl - 2013-02-14

    I believe I found the real cause of this bug. hConstAccumADDSUB() is used to accumulate the constants in a tree +/- BOPs. For the rhs of '-' BOPs, the sign must be inverted when pulling the constants out. However, this "context sign" was only taken into account for the second and following constants added/subtracted on top of the current accumulated value, but not for the first constant found.

    Given this expression:

    a - 1 - 2
    
         -
        / \
       -   2
      / \
     a   1
    

    hConstAccumADDSUB() did (without its rhs walking disabled by the EXIT SELECT originally added to fix this bug):
    1. See the constant rhs "2"
    2. Set accumulator to "2" (while it should have been -2 because it's a subtraction)
    3. Walk into the lhs
    4. See the '-' BOP
    5. Walk into the lhs and find nothing, return
    6. Invert the context sign from + to - since it's a subtraction
    7. Walk into the rhs
    8. See the constant "1", and due to the now negative context sign, add "-1" (or subtract "1") on top of the accumulator, making it "1" (while it would have been -3 if the sign were handled correctly for the first constant)

    It should be fixed in [361ee9], including re-enabling hConstAccumADDSUB()'s rhs walking. Also, I added a huge amount of auto-generated test cases, covering lots of +/-/* expression combinations. (actually, the added tests are pretty much the smallest I could make them while still auto-generating and triggering the bug; the generator program can be used to create millions of expression resulting in about thousands of CUnit test case files with 70KB each, that's way too much for regular development, but it may be used to test fbc before making a release)

     

    Related

    Commit: [361ee9]

  • dkl

    dkl - 2013-02-14

    Note: Turning off astNewBOP()'s "l - const" -> "l + -const" optimization seems to be necessary to directly test this. Apparently the only other way a '-' with a constant rhs can appear is as result of other optimizations.

     

Log in to post a comment.