Navigation in a Graph: Need help about how to avoid bad plan used by the optimizer

Hi,
We had the big table containing about 600M records and the following query
used to take long time (this is subset of the query but this part takes the
most time and is part of this question/discussion):
Select top 1001 col3, col1, col4, col5
From tab1 au --with (index (idx_tab1))
Where au.col6 = 44431 and au.col7 = 857669662 and au.col8 is null
This table have clustered index on col1 and non-clustered index on (col6,
col7, col3).
Since this col6 = 44431 hold about 120M records in this table so whenever we
run the above query with this col6 as predicate, SQL Server always do
clustered index san and takes about 4 min but we apply the non-clustered
index in the query and time reduces to 3 sec. So the bottom line is due to
the skewed statistics SQL Server feels that index scan will be better that
doing the combination of index seek and bookmark lookup and forgets about
that we specified the TOP 1001 and it will be better to do index seek than
clustered scan.
So far this was working and now we have deployed the partition view in the
database and partition this table on (col7, col1) and now the data is spread
out on different partitions that all are on different databases. Now since w
e
are running this query against view so we can’t apply index hints so we ar
e
wondering what are the other ways we can force optimizer to use non-clustere
d
seek or come up with better execution plan by seeing that TOP 1001 hint is
specified.
Isn't it the kind of a bug that SQL Server optimizer don't always use the
index seek when TOP clause is specified with small number of rows to be
returned?
Thanks
--HarvinderHi,
If you place a covering index on
Col6,col7,col8,col3,col1,col4,col5
on each of your partition tables . Then the optimizer will use this index
to select your row.
One question though which 1001 rows do you want? because the order is no
guaranteed here and you are not ordering or grouping. So sql will return
the first 1001 rows it finds and the order will be first found first get and
depending on the index used this could be different rows.
kind regards
Greg O
--
Looking to use CLR in SQL 2005. Try some pre-build CLR Functions and SP
AGS SQL 2005 Utilities, over 20+ functions
http://www.ag-software.com/?tabid=38
"harvinder" <harvinder@.discussions.microsoft.com> wrote in message
news:4367F496-593E-4118-8128-E4F1FA8815AA@.microsoft.com...
> Hi,
> We had the big table containing about 600M records and the following query
> used to take long time (this is subset of the query but this part takes
> the
> most time and is part of this question/discussion):
> Select top 1001 col3, col1, col4, col5
> From tab1 au --with (index (idx_tab1))
> Where au.col6 = 44431 and au.col7 = 857669662 and au.col8 is null
> This table have clustered index on col1 and non-clustered index on (col6,
> col7, col3).
> Since this col6 = 44431 hold about 120M records in this table so whenever
> we
> run the above query with this col6 as predicate, SQL Server always do
> clustered index san and takes about 4 min but we apply the non-clustered
> index in the query and time reduces to 3 sec. So the bottom line is due to
> the skewed statistics SQL Server feels that index scan will be better that
> doing the combination of index seek and bookmark lookup and forgets about
> that we specified the TOP 1001 and it will be better to do index seek than
> clustered scan.
> So far this was working and now we have deployed the partition view in the
> database and partition this table on (col7, col1) and now the data is
> spread
> out on different partitions that all are on different databases. Now since
> we
> are running this query against view so we can't apply index hints so we
> are
> wondering what are the other ways we can force optimizer to use
> non-clustered
> seek or come up with better execution plan by seeing that TOP 1001 hint is
> specified.
> Isn't it the kind of a bug that SQL Server optimizer don't always use the
> index seek when TOP clause is specified with small number of rows to be
> returned?
> Thanks
> --Harvinder
>|||On Thu, 17 Nov 2005 12:50:16 -0800, harvinder
<harvinder@.discussions.microsoft.com> wrote:
>We had the big table containing about 600M records and the following query
>used to take long time (this is subset of the query but this part takes the
>most time and is part of this question/discussion):
> Select top 1001 col3, col1, col4, col5
> From tab1 au --with (index (idx_tab1))
> Where au.col6 = 44431 and au.col7 = 857669662 and au.col8 is null
>This table have clustered index on col1 and non-clustered index on (col6,
>col7, col3).
>Since this col6 = 44431 hold about 120M records in this table so whenever w
e
>run the above query with this col6 as predicate, SQL Server always do
>clustered index san and takes about 4 min but we apply the non-clustered
>index in the query and time reduces to 3 sec.
...
>Isn't it the kind of a bug that SQL Server optimizer don't always use the
>index seek when TOP clause is specified with small number of rows to be
>returned?
When you ask for 1001 records, this may be enough for the optimizer to
decide to scan rather than index. What happens if you ask for top 10?
A covering index to fit the query would of course work (we hope!), but
if you have many queries, such combinatorics can kill you. Another
way to go is to put separate indexes on col6 alone and col7 alone and
col8 alone, and see if SQLServer can order them by selectivity, do a
hash match, and get you good results.
Josh

Wednesday, March 28, 2012

Need help about how to avoid bad plan used by the optimizer

No comments:

Post a Comment

Navigation in a Graph

Blog Archive

About Me