Navigation in a Graph: Need a fast queue using a table

I am trying to implement a very fast queue using SQL Server.

The queue table will contain tens of millions of records.

The problem I have is the more records completed, the the slower it
gets. I don't want to remove data from the queue because I use the
same table to store results. The queue handles concurrent requests.

The status field will contain the following values:
0 = Waiting
1 = Started
2 = Finished

Any help would be greatly appreciated.

Here is a simplified script to demonstrate what has been done.

CREATE TABLE [dbo].[Queue] (
[ID] [int] IDENTITY (1, 1) NOT NULL ,
[JobID] [int] NOT NULL ,
[Status] [tinyint] NOT NULL
) ON [PRIMARY]
GO

CREATE INDEX [Status] ON [dbo].[Queue]([Status]) ON [PRIMARY]
GO

CREATE PROCEDURE dbo.NextItem
@.JobID integer,
@.ID integer output
AS
SELECT TOP 1 @.ID = [ID]
FROM Queue WITH (READPAST, XLOCK)
WHERE (Status = 0) AND (JobID = @.JobID)
RETURN
GOmy idea: i would use 3 different tables, one for waiting entries,
one for the running entries and one for the finished.
And when status is changing instead of updating field "status"
(which is now not longer necessary) move the record from one
table to the other.

hth,
Helmut

"Chris Foster" <chrisfoster@.btinternet.com> schrieb im Newsbeitrag
news:b311a0b8.0307020655.9a5beeb@.posting.google.co m...
> I am trying to implement a very fast queue using SQL Server.
> The queue table will contain tens of millions of records.
> The problem I have is the more records completed, the the slower it
> gets. I don't want to remove data from the queue because I use the
> same table to store results. The queue handles concurrent requests.
> The status field will contain the following values:
> 0 = Waiting
> 1 = Started
> 2 = Finished
> Any help would be greatly appreciated.
> Here is a simplified script to demonstrate what has been done.
> CREATE TABLE [dbo].[Queue] (
> [ID] [int] IDENTITY (1, 1) NOT NULL ,
> [JobID] [int] NOT NULL ,
> [Status] [tinyint] NOT NULL
> ) ON [PRIMARY]
> GO
> CREATE INDEX [Status] ON [dbo].[Queue]([Status]) ON [PRIMARY]
> GO
> CREATE PROCEDURE dbo.NextItem
> @.JobID integer,
> @.ID integer output
> AS
> SELECT TOP 1 @.ID = [ID]
> FROM Queue WITH (READPAST, XLOCK)
> WHERE (Status = 0) AND (JobID = @.JobID)
> RETURN
> GO|||[posted and mailed, please reply in public]

Chris Foster (chrisfoster@.btinternet.com) writes:
> The queue table will contain tens of millions of records.
> The problem I have is the more records completed, the the slower it
> gets. I don't want to remove data from the queue because I use the
> same table to store results. The queue handles concurrent requests.
> The status field will contain the following values:
> 0 = Waiting
> 1 = Started
> 2 = Finished
>...
> CREATE TABLE [dbo].[Queue] (
> [ID] [int] IDENTITY (1, 1) NOT NULL ,
> [JobID] [int] NOT NULL ,
> [Status] [tinyint] NOT NULL
> ) ON [PRIMARY]
> GO
> CREATE INDEX [Status] ON [dbo].[Queue]([Status]) ON [PRIMARY]
> GO
> CREATE PROCEDURE dbo.NextItem
> @.JobID integer,
> @.ID integer output
> AS
> SELECT TOP 1 @.ID = [ID]
> FROM Queue WITH (READPAST, XLOCK)
> WHERE (Status = 0) AND (JobID = @.JobID)
> RETURN
> GO

Since you have no index on JobID, I am not surprise if this is running
slow beyond all belief. The index you have on Status is likely to be
worthless, not being selective enough.

Helmut Wss suggested using three tables, and this may be worth considering.

However, you should start with getting a decent index structure. It
is from the table definition unclear to be whether there can be more
than one row for the same JobID. If there is, add a clustered index on
JobID and Status. If JobID is infact unique, get rid of that Identity
column. If your aim is to get high speed, then you should trim the
table, since the smaller the rows, the more rows you can fit on a page,
and the faster you can do I/O.

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Thank you for your reply

The script I submitted is a smaller sample of what I have. I forgot to
state that the JobID has a foreign key to another table, there will be
many JobIds. Am I correct in assuming that a foreign key is indexed?
I can't set the index to be clustered, because it will only allow one
clustered index, ID is the primary key and is clustered.

The final queue will contain more fields than stated, if this effects
the speed I guess I will have to use a separate queuing table.

My first thought was that if status is indexed and has a value between
0 and 2, the index would be sorted ascending and 0's would be at the
top, therefore would remain the same speed throughout the queue. I am
not sure exactly what indexing does behind the scenes, I should
probably look into this. What difference would it make creating a
composite index of JobID and Status?

Erland Sommarskog <sommar@.algonet.se> wrote in message news:<Xns93AD579174Yazorman@.127.0.0.1>...
> [posted and mailed, please reply in public]
> Chris Foster (chrisfoster@.btinternet.com) writes:
> > The queue table will contain tens of millions of records.
> > The problem I have is the more records completed, the the slower it
> > gets. I don't want to remove data from the queue because I use the
> > same table to store results. The queue handles concurrent requests.
> > The status field will contain the following values:
> > 0 = Waiting
> > 1 = Started
> > 2 = Finished
> >...
> > CREATE TABLE [dbo].[Queue] (
> > [ID] [int] IDENTITY (1, 1) NOT NULL ,
> > [JobID] [int] NOT NULL ,
> > [Status] [tinyint] NOT NULL
> > ) ON [PRIMARY]
> > GO
> > CREATE INDEX [Status] ON [dbo].[Queue]([Status]) ON [PRIMARY]
> > GO
> > CREATE PROCEDURE dbo.NextItem
> > @.JobID integer,
> > @.ID integer output
> > AS
> > SELECT TOP 1 @.ID = [ID]
> > FROM Queue WITH (READPAST, XLOCK)
> > WHERE (Status = 0) AND (JobID = @.JobID)
> > RETURN
> > GO
> Since you have no index on JobID, I am not surprise if this is running
> slow beyond all belief. The index you have on Status is likely to be
> worthless, not being selective enough.
> Helmut Wss suggested using three tables, and this may be worth considering.
> However, you should start with getting a decent index structure. It
> is from the table definition unclear to be whether there can be more
> than one row for the same JobID. If there is, add a clustered index on
> JobID and Status. If JobID is infact unique, get rid of that Identity
> column. If your aim is to get high speed, then you should trim the
> table, since the smaller the rows, the more rows you can fit on a page,
> and the faster you can do I/O.|||Hi Chris

If you are after speed and with those sorts of volumes, I would not be
recommending that you used SQL Server to implement a TP queue in the
first place to store requests that are either waiting or started.
There are other options worth considering including IBM WebSphere
MQSeries.

chrisfoster@.btinternet.com (Chris Foster) wrote in message news:<b311a0b8.0307020655.9a5beeb@.posting.google.com>...
> I am trying to implement a very fast queue using SQL Server.
> The queue table will contain tens of millions of records.
> The problem I have is the more records completed, the the slower it
> gets. I don't want to remove data from the queue because I use the
> same table to store results. The queue handles concurrent requests.
> The status field will contain the following values:
> 0 = Waiting
> 1 = Started
> 2 = Finished
> Any help would be greatly appreciated.
> Here is a simplified script to demonstrate what has been done.
> CREATE TABLE [dbo].[Queue] (
> [ID] [int] IDENTITY (1, 1) NOT NULL ,
> [JobID] [int] NOT NULL ,
> [Status] [tinyint] NOT NULL
> ) ON [PRIMARY]
> GO
> CREATE INDEX [Status] ON [dbo].[Queue]([Status]) ON [PRIMARY]
> GO
> CREATE PROCEDURE dbo.NextItem
> @.JobID integer,
> @.ID integer output
> AS
> SELECT TOP 1 @.ID = [ID]
> FROM Queue WITH (READPAST, XLOCK)
> WHERE (Status = 0) AND (JobID = @.JobID)
> RETURN
> GO|||Chris Foster (chrisfoster@.btinternet.com) writes:
> The script I submitted is a smaller sample of what I have. I forgot to
> state that the JobID has a foreign key to another table, there will be
> many JobIds. Am I correct in assuming that a foreign key is indexed?

No. There are no automatic indexes created on foriegn-key columns. You need
to add any index yourself.

> I can't set the index to be clustered, because it will only allow one
> clustered index, ID is the primary key and is clustered.

On second thought, a non-clustered index on (JobId, Status) is likely
to be ideal. You could add ID explicitly to this index, but it is in fact
already there, because for a non-clustered index, SQL Server uses the
clustered key as the address to the data page.

The point here is that you get a *covering index*, which means that
SQL Server can resolve the query from the index alone. This is good
for speed. Since the index nodes are smaller than the complete rows,
you get more rows per page, and fewer pages to read.

> My first thought was that if status is indexed and has a value between
> 0 and 2, the index would be sorted ascending and 0's would be at the
> top, therefore would remain the same speed throughout the queue. I am
> not sure exactly what indexing does behind the scenes, I should
> probably look into this.

Yes, you should. :-) In a non-clustered index on Status only, SQL
Server needs to go to the data pages to find JobId so it can compare
this condition. This can be a more expensive operation that scanning
the table from left to right, because some pages may have to be read
more than twice. SQL Server uses the statistics is has on Status
to determine whether using the index is a viable way to go. You can
force SQL Server to use the index, by means of an index hint. You may find
that this gives even worse performance.

> What difference would it make creating a composite index of JobID and
> Status?

Because now SQL Server has all the information to resolve the query in
the index.

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Indexing both JobID and Status has made the queue work at an acceptable speed.

Thanks you

Chris

Erland Sommarskog <sommar@.algonet.se> wrote in message news:<Xns93AF515713E6Yazorman@.127.0.0.1>...
> Chris Foster (chrisfoster@.btinternet.com) writes:
> > The script I submitted is a smaller sample of what I have. I forgot to
> > state that the JobID has a foreign key to another table, there will be
> > many JobIds. Am I correct in assuming that a foreign key is indexed?
> No. There are no automatic indexes created on foriegn-key columns. You need
> to add any index yourself.
> > I can't set the index to be clustered, because it will only allow one
> > clustered index, ID is the primary key and is clustered.
> On second thought, a non-clustered index on (JobId, Status) is likely
> to be ideal. You could add ID explicitly to this index, but it is in fact
> already there, because for a non-clustered index, SQL Server uses the
> clustered key as the address to the data page.
> The point here is that you get a *covering index*, which means that
> SQL Server can resolve the query from the index alone. This is good
> for speed. Since the index nodes are smaller than the complete rows,
> you get more rows per page, and fewer pages to read.
> > My first thought was that if status is indexed and has a value between
> > 0 and 2, the index would be sorted ascending and 0's would be at the
> > top, therefore would remain the same speed throughout the queue. I am
> > not sure exactly what indexing does behind the scenes, I should
> > probably look into this.
> Yes, you should. :-) In a non-clustered index on Status only, SQL
> Server needs to go to the data pages to find JobId so it can compare
> this condition. This can be a more expensive operation that scanning
> the table from left to right, because some pages may have to be read
> more than twice. SQL Server uses the statistics is has on Status
> to determine whether using the index is a viable way to go. You can
> force SQL Server to use the index, by means of an index hint. You may find
> that this gives even worse performance.
> > What difference would it make creating a composite index of JobID and
> > Status?
> Because now SQL Server has all the information to resolve the query in
> the index.

Saturday, February 25, 2012

Need a fast queue using a table

No comments:

Post a Comment

Navigation in a Graph

Blog Archive

About Me