I'm looking for advice from people with more experience using SQL Server. I have a situation where, from my perspective, I have 3 ways to accomplish my task, but I don't know which would be the most efficient and performant.
Here's the deal: I have a table which holds millions of records. This table will eventually receive 30-40 insertions a second, so it's pretty busy. To display the data correctly in reports over the web, I need to manipulate the data in such a way that certain transactions are modified and others are completely eliminated, depending on a set of preferences chosen at the moment the report is requested. Obviously I don't see any way to do this with just one SQL query, so my choices are the following:
1. Create a second table to hold the manipulated data, and create a trigger that manipulates the data and updates the second table on each insertion (a rough sketch follows this list).
2. Create a stored procedure that queries my records, creates a temp table, manipulates the data, and returns it to the user.
3. Simply fetch the data, return it to the IIS server and process it there before returning the result to the client.
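For option 1, here is a minimal sketch of what such a trigger might look like. Every name in it (Readings, ReadingsProcessed, the columns) is hypothetical, invented for illustration; the thread doesn't give a schema.

    -- Hypothetical sketch: keep a pre-manipulated copy in sync via a trigger.
    CREATE TRIGGER trg_Readings_AfterInsert ON Readings
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        -- "inserted" holds the batch of new rows; process it as a set,
        -- not row by row, since this table is written 30-40 times a second.
        INSERT INTO ReadingsProcessed (DeviceId, ReadingValue, ReadingTime)
        SELECT DeviceId, ReadingValue, ReadingTime
        FROM inserted;
    END;

The trade-off is that the trigger runs inside every insert transaction, so it adds cost to the busiest write path in the system.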
The manipulation process consists of querying about 1000 to 2000 records, eliminating duplicate results (or similar results within a chosen range) by comparing each one to the last, and indicating for each record how many other records were similar. In other words, it's a basic loop where I compare each record to the previous one to see if it changed.
Personally I think choice #2 would be better, but I could be wrong. I'd like to hear your comments and suggestions.
Thanks,
Would the resultset start with 1000-2000 rows, or would that be the total after you've manipulated the data?
If you go with choice 2, would you use a cursor? If you must use a cursor, then I would go with choice 3 (given that the results going to your aspx page are 1000-2000 rows).
The resultset would start at 1000-2000 rows, but then would shrink down to 100-500 rows after processing. If I did go with choice #2, I don't see any other way but to use a cursor to fetch my initial dataset (~1500 rows), loop through it, and insert into a temp table. But I just found that I can use the table data type (a table variable) instead of a temp table, and it's supposed to be more efficient than a temp table. What other options would I have besides a cursor?
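For concreteness, the cursor-plus-table-variable approach being described might look something like the sketch below. All table, column, and variable names are hypothetical, and strict equality stands in for the "similar within a chosen range" comparison.

    -- Hypothetical sketch of option 2: a cursor filling a table variable
    -- with one row per run of unchanged readings.
    DECLARE @results TABLE (
        DeviceId     int,
        ReadingValue int,
        SimilarCount int    -- how many consecutive rows looked the same
    );

    DECLARE @deviceId int, @value int;
    DECLARE @prevDevice int, @prevValue int, @similar int;

    DECLARE readings_cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT DeviceId, ReadingValue
        FROM Readings
        ORDER BY DeviceId, ReadingTime;

    OPEN readings_cur;
    FETCH NEXT FROM readings_cur INTO @deviceId, @value;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- real logic might compare within a chosen tolerance
        -- instead of strict equality
        IF @deviceId = @prevDevice AND @value = @prevValue
            SET @similar = @similar + 1;    -- same as the last row: count it
        ELSE
        BEGIN
            IF @prevDevice IS NOT NULL      -- flush the previous run
                INSERT INTO @results VALUES (@prevDevice, @prevValue, @similar);
            SELECT @prevDevice = @deviceId, @prevValue = @value, @similar = 1;
        END;

        FETCH NEXT FROM readings_cur INTO @deviceId, @value;
    END;

    IF @prevDevice IS NOT NULL              -- flush the final run
        INSERT INTO @results VALUES (@prevDevice, @prevValue, @similar);

    CLOSE readings_cur;
    DEALLOCATE readings_cur;

    SELECT DeviceId, ReadingValue, SimilarCount FROM @results;

Whether a table variable actually beats a temp table depends on the workload; for a few hundred rows either is fine. The bigger cost here is the row-by-row loop itself.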
So your thoughts are that if my cursor fetches ~1500 rows, that's less efficient than doing the processing directly on my IIS server? Can I ask why? I've heard that cursors are to be avoided at all costs, but I'm not really sure why.
Thanks for the input! :)
Why do you think this cannot be done with one SQL query? You'd be surprised what can be done in a single statement.
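To make that concrete: collapsing runs of unchanged rows is a classic "gaps and islands" problem, and on SQL Server 2005 or later it can indeed be done in one statement. A hedged sketch, reusing the same hypothetical names as above:

    -- Hypothetical single-statement version of the run collapse
    -- (requires SQL Server 2005+ for ROW_NUMBER()).
    SELECT DeviceId,
           ReadingValue,
           COUNT(*)         AS SimilarCount,  -- rows collapsed into this one
           MIN(ReadingTime) AS RunStart
    FROM (
        SELECT DeviceId, ReadingValue, ReadingTime,
               -- rows in an unbroken run of the same value share the same
               -- difference between these two row numbers
               ROW_NUMBER() OVER (PARTITION BY DeviceId
                                  ORDER BY ReadingTime)
             - ROW_NUMBER() OVER (PARTITION BY DeviceId, ReadingValue
                                  ORDER BY ReadingTime) AS RunGroup
        FROM Readings
    ) AS runs
    GROUP BY DeviceId, ReadingValue, RunGroup
    ORDER BY DeviceId, RunStart;

Each unbroken run of the same value for a device gets the same RunGroup number, so the outer GROUP BY collapses a run to one row carrying its count.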
|||Having 30-40 concurrent users each fetching a large dataset would tax the system. I do avoid cursors at all costs (they force SQL Server to process one row at a time, with per-row overhead, instead of letting the engine optimize a set-based operation) and have always managed to stay away from them, except this week; I had no way out and used one for a particular solution. :(
One other question: are you inserting any data back into the database as you loop through the resultset?|||
No, I'm not inserting data back into the database from within my loop
...
and the reason I think it's not possible to do in a single query is that I've tried. I'm not necessarily an expert in SQL, but I consider myself pretty good at it, more with Oracle than SQL Server. But I am willing to agree that it could be possible, if someone wants to show me! :)
After careful thought, I'll probably do it on the web-server side. The load on my IIS server will be less than on my database server, and I think .NET will process it faster than SQL Server would with a cursor.
Thanks
|||Since you're not doing any inserts, I agree that you should let .NET handle it.
|||Is SQL Server running on the same machine as IIS? You say there are 30-40 inserts a second, but how often is that data likely to be queried?
Also, don't forget that a combination of all 3 options might be the most efficient. There may be some changes that can be done on insert by triggers, some when querying the data, and some in .NET itself. Personally, I would start to think something was wrong with your database design if you were returning 1000-2000 records in a query and then throwing most of the rows away.|||
Currently, for our development phase, yes, both IIS and SQL Server are on the same machine, but that's not going to be the case in production.
As far as database design goes, I think it's good. We're not throwing away any rows...simply eliminating them from the resultset, but we still need them to calculate various results for the rows we do want to show. You could call it process control: devices send me data every few seconds, and I need to process that data to visually display each device's behavior. And when a device's properties don't change, instead of showing 50 rows with the same data, I'll only show 1 row indicating the device and its accumulated data and results.
I'll keep in mind that I can combine all 3 options, though; that could help.
Thanks
|||
XiNull wrote:
... instead of showing 50 rows with the same data, I'll only show 1 row indicating the device and its accumulated data and results.
It sounds like you're aggregating; could you not use a GROUP BY in your SELECT?
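A minimal sketch of that suggestion, with the same hypothetical names as above. Note that a plain GROUP BY collapses all matching rows for a device regardless of order, so preserving consecutive runs would still need the row-numbering trick sketched earlier in the thread:

    -- Simple aggregation per device and value (hypothetical names).
    -- All rows with the same value collapse together, not just consecutive ones.
    SELECT DeviceId,
           ReadingValue,
           COUNT(*) AS SimilarCount
    FROM Readings
    GROUP BY DeviceId, ReadingValue
    ORDER BY DeviceId, ReadingValue;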