Wednesday, March 7, 2012

Reproducable bug, yay!

Hi Ian,
Here is a good article about changes in behavior for this function, between
versions 2000 and 2005.
Change in Behavior of RAND and NEWID in SQL Server 2005
http://www.sqlmag.com/Articles/ArticleID/97032/97032.html?Ad=1
Enjoy it,
AMB
"Ian Boyd" wrote:

> Steps to reproduce:
> --Create test table of one uniqueidentifier column
> CREATE TABLE [dbo].[UniverseCollapse] (
> [SomeID] [uniqueidentifier] NOT NULL
> )
> --Add clustered index
> CREATE CLUSTERED INDEX [IX_UniverseCollapse_1] ON
> [dbo].[UniverseCollapse]([SomeID])
> --Add test row
> INSERT INTO UniverseCollapse (SomeID) VALUES
> ('F9CB8628-571D-463B-9E82-0AC4D0A3C2EB')
> --Run the select over and over. Sometimes returns zero rows, sometimes
> returns one.
> SELECT SomeID
> FROM UniverseCollapse
> WHERE SomeID = 'F9CB8628-571D-463B-9E82-0AC4D0A3C2EB'
> AND '6c03c85d-8151-43ed-858f-7ec5d887689b' <> newID()
>
> Run the select over and over and sometimes you get zero rows, sometimes you
> get one.
> Without the clustered index, it never fails.
> Looking closer at the two WHERE clause conditions:
> 1) SomeID = '...'
> This is always the case, as it is the value that was inserted.
> 2) '6c03...' <> newid()
> This is a trueism, since guid's don't conflict very that often.
> If we run just:
> SELECT SomeID
> FROM UniverseCollapse
> WHERE SomeID = 'F9CB8628-571D-463B-9E82-0AC4D0A3C2EB'
> we always get one row. If we run:
> SELECT SomeID
> FROM UniverseCollapse
> WHERE '6c03c85d-8151-43ed-858f-7ec5d887689b' <> newID()
> we always get one row. Have both criteria together, and we sometimes get
> zero rows.
> Execution Plan:
> |--Filter(
> WHERE
> [UniverseCollapse].[SomeID]<newid()
> OR
> [UniverseCollapse].[SomeID]>newid() ))
> |--Clustered Index Seek(
> OBJECT
> [foo].[dbo].[UniverseCollapse].[IX_UniverseCollapse_1]),
> SEEK[UniverseCollapse].[SomeID]=F9CB8628-571D-463B-9E82-0AC4D0A3C2EB)
> ORDERED FORWARD)
> The "ID < newid OR ID > newid" is very interesting. If "newid()" is
> evaluated twice, once for each half of the WHERE clause, and it happens to
> get a GUID that has swung the other way, it could then eliminate the row
> when it shouldn't.
>
> If i only execute:
> SELECT SomeID
> FROM UniverseCollapse
> WHERE '6c03c85d-8151-43ed-858f-7ec5d887689b' <> newID()
> i get the plan:
> |--Filter(
> WHERE
> STARTUP
> R(6C03C85D-8151-43ED-858F-7EC5D887689B<>newid()) ))
> |--Clustered Index Scan(
> OBJECT[foo].[dbo].[UniverseCollapse].[IX_UniverseCollapse_1]))
> And it's only evaluating newid() once. So the question then becomes, "Why
> does it fail when a clustered index is there?" Looking at the execution
> plan if i remove the clustered index:
> |--Filter(
> WHERESTARTUP EXPR(6C03C85D-8151-43ED-858F-7EC5D887689B<>newid())))
> |--Table Scan(
> OBJECT[foo].[dbo].[UniverseCollapse]),
> WHERE[UniverseCollapse].[SomeID]=F9CB8628-571D-463B-9E82-0AC4D0A3C2EB))
> It doesn't do a "guid > newid or guid < newid", but instead does a "Startup
> Expression". Which means that the problem is in fact that newID is being
> evaluated twice for the "field <> newid()"
>
> Yay! i found a bug! So do i get a cheque or something?
>
>
> Here is a good article about changes in behavior for this function,
> between
> versions 2000 and 2005.
> Change in Behavior of RAND and NEWID in SQL Server 2005
> http://www.sqlmag.com/Articles/ArticleID/97032/97032.html?Ad=1
It should be mentioned that these changes to newid() are not intended for
this case. These changes ensure that Rand() and Newid() are only evaluated
once for each reference. Before if you aliased a column that was a Rand or a
NewID, it would generate a new value for each time the alias is used.
For example in:
SELECT
CASE
WHEN TheValue < 0.5 THEN 'LessThanPoint5'
ELSE 'GreaterThanPoint5'
END AS Test1,
CASE
WHEN TheValue < 0.5 THEN 'LessThanPoint5'
ELSE 'GreaterThanPoint5'
END AS Test2,
CASE
WHEN TheValue < 0.5 THEN 'LessThanPoint5'
ELSE 'GreaterThanPoint5'
END AS Test3
FROM (SELECT rand() as TheValue) ADerivedTable
it happens that for any given execution, Test1 Test2 and Test3 need not be
equal, even through you're selecting rand() only once. Because the rand
column is aliased, it generates a new value.
This is also a bug, but not quite the situation i have, since i am not
referencing the rand or newid column more than once. The optimizer is
internally turning my one reference into two references - and then exposing
the bug.

No comments:

Post a Comment