|
This whitepaper on the History of Truth and Logical Expression Evaluation in Microsoft Basic was written, and copyright is held, by Microsoft Visual Basic MVP: |
Dan Barclay Copyright ©2001, Barclay Software, Inc. |
This document describes logical expression evaluation in Microsoft Basic. The primary purpose is to describe expression evaluation from a language standpoint. That is, what the programmer should have in mind when writing the logical expression. This document was originally targeted at answering a few frequently asked questions, then wandered into the territory of a tutorial. I stopped adding (perhaps too late) when it seemed to be heading off in the direction of a dissertation.
Scope: This description applies to currently or previously shipped versions of MS Basic including 8 bit (CP/M TRSDOS), 16bit DOS (QuickBasic, Qbasic, IBM Basic Compiler, BASICA, PDS, VB for DOS), OS/2 (PDS), 16bit for Windows (Visual Basic versions 1-3), 32bit for Windows (Visual Basic versions 4-6). There are likely other MS Basic derivative products to which this applies that I have forgotten or am unaware of. Note: This behavior does not apply to Beta1 of VB.Net. It also will not apply to some other versions of Basic (some early Apple versions, for example, even when authored by Microsoft).
In addition to the language view, some effort is spent to describe optimizations that have been made in previous versions of MS Basic to emphasize some points in the language view.
Much discussion has occurred recently concerning logical expression evaluation. Questions such as "why are logical operators bitwise?"; "Why is the value of true -1 and not simply 1?"; "Why doesn't Basic include short circuiting?" are common. These are natural questions for those whose background is predominately another language with different characteristics. Basic has different syntax and different behavior than other languages… or it would not be another language. These questions are answered below.
First, understand that logical operations and tests are two different things.
For tests only (that is, the expression tested by IF and whatnot), a zero value is judged to be false, any nonzero value is judged to be the truth (well, True).
Logical operations have both numeric operands (we'll get to Boolean Data Type in a minute) and numeric results. The operators are bitwise on the numeric operands. That is, they operate on each corresponding bit of each operand. For example:
Assignment Bits Value As Integer a = 0 0000000000000000 0 b = Not a 1111111111111111 -1
Assignment
Bits
Value As Integera = 1 0000000000000001 1 b = 3 0000000000000011 3 a And b 0000000000000001 1 a Or b 0000000000000011 3 a Xor b 0000000000000010 2 Not b 1111111111111100 -4 (Trust me on this one!) That is And, Or, Xor and Not compare each pair of corresponding bits in the operands and perform the corresponding logical operation. For those familiar with bitwise operations, I apologize for unneeded detail. For those wishing an even more detailed explanation of what is meant by "bitwise operations", you may want to refer to Ethan Winer's book Basic Techniques and Utilities "Bit Operations" in Chapter 3 (out of print, but available for free download at http://www.ethanwiner.com).
If you use
MyFalse% = 0 MyTrue% = Not MyFalse%
you get all bits set in MyTrue%. (That happens to represent -1, by the way). Now, since all bits behave the same way (since they're all the same), you can use the bitwise Boolean operators to simulate value level Boolean operations.
This is why, in the olden days, you would find statements at the top of most routines such as:
False% = 0 True% = Not False%It is also why, when the Boolean "constants" False and True were introduced their representation was as 0 and -1. Likewise, when the Boolean Data Type was introduced, even later, it had the behavior of representing false/true as 0/-1. In addition, coercion rules used the test rules to set the value. That is
Dim MyBool As Boolean MyBool = 1 ' 1 *tests* as True, so MyBool is set to True (-1) MyInt% = MyBool ' Now MyInt% equals -1
You could use the value of 1 for True if you want, since it will test properly, but then True is not equal to Not False, which will seem a bit strange. As such, it cannot be used to represent the Boolean value of True in value level Boolean expressions. Example:
a = 1 ' true, in the testing sense b = 2 ' also true using the <>0 criteria If a And b Then ' this is false because: ' 0000000000000001 ' 0000000000000010 ' ---------------- ' 0000000000000000 AndWhen you are trying to deal with value level Boolean expressions, be certain you are using the values 0 and -1 so that all bits are either zero or one. Some have said the result of ignoring this advice will be unpredictable. That is incorrect. The result of doing this is entirely predictable but may be undesirable from your standpoint.
Enter the Boolean data type. If you are trying to accomplish value-level Boolean expression evaluation you can assure you stick with the proper values (zero and minus one) by using Boolean variables. In fact, if you do you will find that the bitwise operators behave as value level Boolean operators.
Still, it's bits. Only zero and 1 are allowed (exist). The key here is that Boolean operators are bitwise and it is important what all the bits do. With the Boolean data type you are simply assured they are all doing the same thing.
Remember that Boolean Expressions are numeric in Basic. The operators take numeric operands (all bits) and produce a numeric result. The following is perfectly valid:
A = 1 B = 2 C = B > AIn this fragment, the result left in C is -1. Consider also:
A = 1 B = 2 C = B Or A
In this fragment, the result left in C is 3. Further, the following is perfectly valid:
C = 1.23 * (B > A) + 1000 * A
A few other things you should know about logical expression evaluation:
Order of Execution
Operator Precedence
Precedence of operations is Comparison (>,<,= and friends) then Logical (And, Or and friends). Within the same operation level, precedence is left to right. Example:
If MyValue% And MyBitmask% And A > B ThenIn the above expression, the bitwise And comparison of MyValue% and MyBitMask% is Anded (on a bitwise basis) to the result of A>B. In this case A>B is a switch turning on or off all the resulting bits.
Other (order of appearance)
Order of expression execution based on order of appearance is not guaranteed, even though precedence is guaranteed. Compiler optimizations may change the execution order.
Short Circuiting
Short Circuiting is a process by which execution of parts of a logical expression may be skipped entirely. For example:
If A > B And B < C ThenIn processing the above fragment, if A<=B then it simply does not matter if B<C so the compiler can skip that comparison. (Likewise, if B=>C it does not matter if A>B). The process of skipping that part of the expression is called "Short Circuiting" the expression. Short circuiting may be implemented for a number of reasons including performance (not having to perform comparisons as shown above) or flow control. A good flow control example would be:
If A <> 0 And ((B / A) > 10) ThenIn this case, if short circuiting were implemented for flow control, the expression B/A would avoid divide by zero because it would never be executed. Note that parenthesis are added in the above expression for clarity even though they are not required.
While it is possible that future versions of MS Basic will provide guaranteed execution order, current versions do not (as discussed above). Short circuiting is provided as a performance enhancement in the Basic PDS and VB for DOS optimizing compilers, but reliance on it for flow control is not supported.
In short, in Basic short circuiting is provided as an optimization by some compilers, but in currently available products neither short circuiting itself nor the order of execution is defined as a behavior of the language. The way to short circuit for flow control is to use explicit short circuiting (nested If statements). Similarly in other languages such as C the way you assure execution of expression fragments is to explicitly promote them out of logical expressions.
Embedded functions
Functions embedded in logical expressions are guaranteed to be evaluated, in no particular order. That is, with the expression:
If A > MyFunction(Param1) And B = MyOtherFunction(Param2) ThenBoth MyFunction and MyOtherFunction are evaluated, regardless of any short circuiting of the logical expression by the compiler. In addition, since execution order is not guaranteed the order of function execution is not guaranteed.
So, with all these numeric conversions and tests, isn't performance impacted? Not necessarily. All isn't as it may appear to be (to the compiler anyway). The purpose of discussing a few compiler optimizations is to show that actual implementation is not necessarily a rote implementation of the language view. The developer must always keep the language view in mind, as this determines behavior and is the same for all MS Basic products. Optionally, the developer should consider the actual implementation (which will vary from product to product) in order to take advantage of performance enhancements.
The VB for DOS compiler is used because it is convenient and illustrates a wide variety cases in which the language view and actual optimized implementation are different.
Direct Logical evaluation.
One way performance is maintained is by good expression evaluation by the compiler. An example would be a simple logical comparison. Consider the fragment:
If A% > B% Then B% = 1 End IfViewing this from the language standpoint the developer should assume the following evaluation is occurring: First, the comparison A% > B% occurs, yielding a zero or -1. Next, the value of that expression is tested by If for Truth (nonzero).
However, the compiler sees that only a simple comparison is being made and can optimize the entire expression to a comparison and a jump over the assignment of B if warranted as follows:
0030 0006 If A% > B% Then 0030 0006 B% = 1 0030 0006 End If 0030 ** I00002: mov ax,B% 0033 ** cmp ax,A% 0037 ** jnl I00003 0039 ** mov B%,0001h 003F 000A 003F 000A 003F ** I00003:Short Circuiting
Short circuiting has been provided in some previous versions of MS Basic products, in a way that maintains language behavior but optimizes performance. Consider the fragment:
If A% > B% And B% > C% And D% > E% Then B% = 1 End IfViewing this from the language standpoint, each comparison (A>B; B>C; D>E) yields a true or false numeric value (0 or -1). Each of the comparison results is then Anded to yield a numeric result for If to test.
However, the compiler sees that these are simple comparisons and furthermore short circuits them for performance:
0030 0006 If A% > B% And B% > C% And D% > E% Then 0030 0006 B% = 1 0030 0006 End If 0030 ** I00002: mov ax,B% 0033 ** cmp ax,A% 0037 ** jnl I00003 0039 ** cmp ax,C% 003D ** jng I00003 003F ** mov ax,E% 0042 ** cmp ax,D% 0046 ** jnl I00003 0048 ** mov B%,0001h 004E 0010 004E 0010 004E ** I00003:Note the short circuiting (multiple jumps to I00003) as well as register reuse. The point of short circuiting expression evaluation is to not waste time comparing expressions that don't matter anymore, since the first test has already failed.
Order of Execution
If MyValue% And MyMask% And A% > B% Then B% = 1 End IfThis is a mix of comparison and numeric logicals. Viewing this from the language standpoint, a value (MyValue%) containing binary data is masked using the bitmask MyMask% and then is switched using the comparison A% > B%. This occurs using bitwise Anding of the value and the mask, followed by bitwise Anding of the result of the A>B comparison.
However, the compiler sees this a bit differently and reorders the expression a bit:
0030 0006 If MyValue% And MyMask% And A% > B% Then 0030 0006 B% = 1 0030 0006 End If 0030 ** I00002: mov ax,B% 0033 ** cmp ax,A% 0037 ** mov cx,0000h 003A ** jnl $+03h 003C ** dec cx 003D ** and cx,MyValue% 0041 ** and cx,MyMask% 0045 ** and cx,cx 0047 ** je I00003 0049 ** mov B%,0001h 004F 000E 004F 000E 004F ** I00003:Note that the comparison is promoted to the top of the expression (as opposed to the order in which it was written).
It is not predictable which expressions will be executed and in what order, the compiler has full freedom to optimize at will. Even the use of parentheses do not guarantee order of execution, though they change precedence and certainly will make complex expressions more easily read.
If (A% > B% And (B% > C% And (D% > E% Or F% > G%))) Then B% = 1 End IfThough parentheses are used, the order of execution of the comparisons is neither fully inside out or outside in nor is it left to right or right to left:
0030 0006 If (A% > B% And (B% > C% And (D% > E% Or F% > G%))) Then 0030 0006 B% = 1 0030 0006 End If 0030 ** I00002: mov ax,B% 0033 ** cmp ax,A% 0037 ** jnl I00003 0039 ** mov ax,E% 003C ** cmp ax,D% 0040 ** jl I00004 0042 ** mov ax,G% 0045 ** cmp ax,F% 0049 ** jnl I00003 004B ** I00004: mov ax,C% 004E ** cmp ax,B% 0052 ** jnl I00003 0054 ** mov B%,0001h 005A 0014 005A 0014 005A ** I00003:It is key to note that the compiler has freedom to execute the expression in any way it wishes, but no freedom to change the precedence or logical meaning of the expression.
Function Promotion
The following fragment contains embedded functions.
If A% > B% And B% > MyFunc1%(C%) And D% > MyFunc1%(E%) Then B% = 1 End IfEarly behavior established that functions embedded in logical expressions are always executed. This behavior is preserved by the compiler, even during short circuiting optimizations. This expression compiles to:
004E 0006 If A% > B% And B% > MyFunc1%(C%) And D% > MyFunc1%(E%) Then 004E 0006 B% = 1 004E 0006 End If 004E ** I00003: mov ax,offset C% 0051 ** push ax 0052 ** call MYFUNC1% 0057 ** mov _T0008%,ax 005A ** mov ax,offset E% 005D ** push ax 005E ** call MYFUNC1% 0063 ** mov _T000C%,ax 0066 ** mov ax,B% 0069 ** cmp ax,A% 006D ** jnl I00005 006F ** cmp ax,_T0008% 0073 ** jng I00005 0075 ** mov ax,_T000C% 0078 ** cmp ax,D% 007C ** jnl I00005 007E ** mov B%,0001hThe calls to user functions are promoted to the top of the evaluation using hidden variables to hold intermediate results, followed by short circuit evaluation of the logical expression itself.
The behavior of logical expressions in MS Basic follows well established and clear rules. Knowing this behavior is critical to proper execution of any application. Knowledge of the behavior includes both the defined and undefined characteristics.
Misunderstanding the defined behavior, or dependence on undefined behavior, lead to unwanted (even if predictable) behavior.
-- D.A. Barclay