I have discovered what looks like a bug in the optimiser. I've posted it at https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=288243 but I wonder if any of you with SQL 2005 RTM, 2005 SP1 or 2008 CTP could confirm when this was introduced and whether it is still an issue?
Code Snippet
-- Bug report
-- 2007/07/19
-- Alasdair Cunningham-Smith
-- alasdair at acs-solutions dot co dot uk
set nocount on
go
-- example date in in British date format
set dateformat dmy
go
use tempdb
go
create table foo( bar varchar( 30 ) not null )
go
insert into foo( bar ) values ( 'fishy' )
insert into foo( bar ) values ( '19/07/2007' )
go
-- this works fine in all versions - only valid dates are passed to the convert function
select
convert( smalldatetime, bar, 103 ) as bardate
from
foo
where
bar like '__/__/_'
go
-- this works on SQL 2000, but fails on SQL 2005 SP2 (I've not tried other SPs of SQL 2005):
-- Msg 295, Level 16, State 3, Line 2
-- Conversion failed when converting character string to smalldatetime data type.
--
-- I believe the query is rewritten as if the derived table query contained
-- "and convert( smalldatetime, bar, 103 ) < getdate()"
-- which would expose the convert to the invalid data
select
*
from
(
select
convert( smalldatetime, bar, 103 ) as bardate
from
foo
where
bar like '__/__/_'
) as derived
where
bardate < getdate()
go
-- Workaround:
-- Use a case statement to protect the convert operator from the invalid data
select
*
from
(
select
case when bar like '__/__/_' then
convert( smalldatetime, bar, 103 )
else
null
end as bardate
from
foo
where
bar like '__/__/_'
) as derived
where
bardate < getdate()
go
drop table foo
go
The workaround I discovered is simple but ugly. I invite your comments...
alasdair.
Same observation on SQL-2005 SP1. Occures the same way with datetime / smalldatetime.|||Hey - thanks S StLouis! I'd really like to know if its fixed in 2008, but dont have enough hardware or time to try it just yet.
Regards,
A.
|||
Alasdair, as described in the resolution of your feedback posting, the query optimizer doesn't provide guarantees for the order of predicate evaluation. In general, the SQL language is declarative and how you construct a SQL statement doesn't guarantee you a certain execution order. Although the execution order may be predictable a certain release, it is something that your code should depend on.
You could use the query rewrite with the CASE statement or split the query in two using a temp table as a workaround. I would also suggest that you consider changing the schema of the application (if that is possible) so that date data is stored in a datetime column. The current column in your schema seems to contain multiple pieces of information (date and non-date data) and will provide much worse statistical information to the query optimizer than a regular datetime column would do. Inaccurate statistics may result to limited use of indices and bad plan choice for some of your queries.
Regards,
Leo Giakoumakis
SQL Server Engine
|||Leo - thanks for your post. I haven't read the SQL-92 spec, but I feel from a programmers perspective that the derived table should behave logically as if it were a base table, even if the optimizer get up to some tricks behind the scenes, but in this example it doesn't behave like a base table. It's the old "if it quacks like a duck" argument...
I understand the need to re-order predicates to optimise performance, but I don't (didn't) regard the predicates inside a derived table to be freely exchangable with those outside - I believe that a logical dependency is being broken here for the sake of performance. Perhaps some kind of constraint can be placed on the predicate exchange between a derived table and its enclosing query to prevent re-ordering when a function (such as case, convert, etc) has been applied so as to ensure logical consistency?
I had independently come up with the CASE solution and its working fine for me - just a bit ugly.
As for the schema - the thing came about because the real code which triggered this posting was dealing with data extracted from somebody else's schema in a Progress database, complete with array fields - ugh! I take your point about the stats and I also understand that this contravenes the normalisation rules, but this is a consequence of the hiddeous design of the source database. As it happens, the real schema is well indexed and, as ever with SQL Server, provides exceptionally good performance with around 6.5 million rows - no complaints there!
Thanks for your time,
Alasdair Cunningham-Smith
ACS Solutions Limited
No comments:
Post a Comment