
Monday, March 26, 2012

need faster data transfer :: oledb

I need to fetch data into multiple tables from a remote server (SQL Server 2005). Inside a sequence container I have placed several data flow tasks, and each task moves data from an OLE DB source to an OLE DB destination. There is minimal transformation involved, but the data is huge and the transfer is taking unexpectedly long. (I'm only pulling fresh data, identified by datetime fields on the source tables; that's the only check.)

Any way to make it faster? The source database is not under my control, so I can't add indexes on it. :(

There are way too many possible reasons for it to be slow. It could be your server, their server, the network, etc.

The best performance you will get would be from an "Execute SQL Task" rather than a Data Flow Task. You might need to do a "SELECT * INTO temptable FROM REMOTE.table" and then do your transforms into the real target table on your server.
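
For illustration only, that Execute SQL Task statement could look something like the sketch below. The linked server name (REMOTESRV), database, table and column names are all placeholders, not anything from the original post:

-- Pull only the fresh rows across in one set-based statement;
-- SELECT ... INTO creates the local staging table on the fly.
SELECT *
INTO   dbo.Staging_Customer
FROM   REMOTESRV.SourceDb.dbo.Customer
WHERE  ModifiedDate > ?   -- map the parameter to the last successful load time

-- Then do the (minimal) transforms from the staging table into the real target table.
INSERT INTO dbo.Customer (CustomerID, Title, FirstName, LastName, ModifiedDate)
SELECT CustomerID, Title, FirstName, LastName, ModifiedDate
FROM   dbo.Staging_Customer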

When I have to do this kind of work, I usually just transfer the entire table to a new database and then do my work on my server.

|||

Nitin Khurana wrote:

I need to fetch data into multiple tables from a remote server (SQL Server 2005). Inside a sequence container I have placed several data flow tasks, and each task moves data from an OLE DB source to an OLE DB destination. There is minimal transformation involved, but the data is huge and the transfer is taking unexpectedly long. (I'm only pulling fresh data, identified by datetime fields on the source tables; that's the only check.)

Any way to make it faster? The source database is not under my control, so I can't add indexes on it. :(

There are so many "it depends" answers to this that it's hardly worth replying.

There is a webcast on performance tuning that you may wish to watch:

Donald Farmer's Technet webcast
(http://blogs.conchango.com/jamiethomson/archive/2006/06/14/4076.aspx)

And info here about a performance tuning whitepaper:

Whitepaper on Performance Tuning Techniques
(http://blogs.conchango.com/jamiethomson/archive/2006/04/09/3594.aspx)

-Jamie

|||

Nitin Khurana wrote:

I need to fetch data into multiple tables from a remote server (SQL Server 2005). Inside a sequence container I have placed several data flow tasks, and each task moves data from an OLE DB source to an OLE DB destination. There is minimal transformation involved, but the data is huge and the transfer is taking unexpectedly long. (I'm only pulling fresh data, identified by datetime fields on the source tables; that's the only check.)

Any way to make it faster? The source database is not under my control, so I can't add indexes on it. :(

Be sure you are using a SQL SELECT statement in your OLE DB Source. Select just the columns you need, and if you need them all, "SELECT *" has been shown to be faster than simply selecting the table from the drop-down box.
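
For example, the OLE DB Source's SQL command could look something like this (table, column and parameter are made up for illustration; the ? would be mapped to a package variable holding the last load time):

SELECT CustomerID, Title, FirstName, LastName, ModifiedDate
FROM   dbo.Customer
WHERE  ModifiedDate > ?   -- only the fresh rows, per the datetime check

rather than picking the table from the drop-down.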

Are your data flow tasks hooked sequentially to each other, or are you running as many as you can in parallel? If you're running them in parallel, try completing one first before moving on to the next one.

|||

OK, let me tell you a bit more. Network/server configuration etc. is not the source of the problem; other packages are performing well. To be specific, I am using the fast load option (OLE DB source and destination). In a few tables the number of rows is over a few million (after filtering, i.e. the rows which need to be transferred), and these few tables are the bottleneck. Consider both source and destination to be high-end servers. What options can I set for rows per batch, commit size, etc.? The row size is not much either, say 200 - 300 KB. So what are the options I have?

|||

Nitin Khurana wrote:

OK, let me tell you a bit more. Network/server configuration etc. is not the source of the problem; other packages are performing well. To be specific, I am using the fast load option (OLE DB source and destination). In a few tables the number of rows is over a few million (after filtering, i.e. the rows which need to be transferred), and these few tables are the bottleneck. Consider both source and destination to be high-end servers. What options can I set for rows per batch, commit size, etc.? The row size is not much either, say 200 - 300 KB. So what are the options I have?

Are there indexes in place on the destination tables?

Try this:
Copy your package and in the new one (package_copy), take your OLE DB source and go right into a row counter and only a row counter -- nothing else. Run the package. How long does it take?

Then create another copy of your "master" package and in that copy (package_copy2) replace your destination with a row counter. How long does that take?

Create a copy of your "package_copy" package and remove the row counter and replace it with a raw file destination. Run the package. Then, delete everything and start over with a raw file source (pointing to the file just created) and then attach it to a row counter. How long does this new package take to run?

=================

The first test tells you how long it takes to get data from the source as fast as you can. (VALUEA)
The second test tells you how long it takes to get through the data flow, not counting the destination. (VALUEB)
The third test tells you how efficient the source pull is. (VALUEC)

Take the total package run time (your problem) and subtract from it VALUEB. This number tells you how long it takes to put data into the destination table. (VALUED)

Take VALUEB and subtract from it VALUEA. This tells you how long it takes to transform the data.

Take VALUEA and subtract from it VALUEC. This tells you how inefficient the source pull is if the difference is not zero. (not likely)
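
To make the arithmetic concrete, here is a worked example with made-up numbers (the timings below are purely illustrative, not measurements):

Total package run time = 60 minutes
VALUEA (source straight into a row count) = 15 minutes
VALUEB (full flow with the destination replaced by a row count) = 20 minutes
VALUEC (raw file source into a row count) = 14 minutes

VALUED = total - VALUEB = 60 - 20 = 40 minutes spent loading the destination table
Transform time = VALUEB - VALUEA = 20 - 15 = 5 minutes
Source overhead = VALUEA - VALUEC = 15 - 14 = 1 minute

In this example the destination load dominates, so that is where the tuning effort should go.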

*** NOTE: I first learned of this technique by watching a webcast by Donald Farmer titled "SQL Server 2005 Integration Services: Performance and Scale."

|||

Nitin Khurana wrote:

OK, let me tell you a bit more. Network/server configuration etc. is not the source of the problem; other packages are performing well. To be specific, I am using the fast load option (OLE DB source and destination). In a few tables the number of rows is over a few million (after filtering, i.e. the rows which need to be transferred), and these few tables are the bottleneck. Consider both source and destination to be high-end servers. What options can I set for rows per batch, commit size, etc.? The row size is not much either, say 200 - 300 KB. So what are the options I have?

Only you can answer those questions. Test and measure is my advice to you.

There is some useful performance tuning info in the whitepaper that I linked to earlier.

You might also want to look at using the SQL Server Destination if at all possible.

-Jamie

|||

<<after filtering, i.e. the rows which need to be transferred>>

This is where I would look first. That and network lag.

|||

I am having the same problem. I am taking data from one table and copying it to another table. The table only consists of 3 columns, one of which is an image data type. In SSMS it also takes forever to open the table, so I can see part of the reason why this package is taking so long. But it is taking about 20 minutes to get the data from the source and another 20 minutes to commit the data. The old DTS package used a Copy SQL Server Objects task and the whole job ran in about 20 minutes. So I also tried using a Transfer SQL Server Objects task to copy the data from this one table, and it is also taking around 40 minutes.

I have no idea why it would take so much longer.

Sam

|||

samssb wrote:

I am having the same problem. I am taking data from one table and copying it to another table. The table only consists of 3 columns, one of which is an image data type. In SSMS it also takes forever to open the table, so I can see part of the reason why this package is taking so long. But it is taking about 20 minutes to get the data from the source and another 20 minutes to commit the data. The old DTS package used a Copy SQL Server Objects task and the whole job ran in about 20 minutes. So I also tried using a Transfer SQL Server Objects task to copy the data from this one table, and it is also taking around 40 minutes.

I have no idea why it would take so much longer.

Sam

Did you run the test that I indicated above? That will help determine whether it's the source pull or the destination push that is causing the bottleneck. However, ensure that you are using FAST LOAD in the OLE DB destination; otherwise you'll get slow performance.
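
For reference, the fast load behaviour is controlled by a handful of OLE DB Destination properties. The values below are only an illustration of the kind of settings people tune (rows per batch, commit size); the right numbers depend entirely on your own data and have to be tested:

AccessMode                  : OpenRowset Using FastLoad
FastLoadKeepIdentity        : False
FastLoadKeepNulls           : False
FastLoadMaxInsertCommitSize : 100000   -- commit in chunks rather than one huge transaction
FastLoadOptions             : TABLOCK, ROWS_PER_BATCH = 100000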


Friday, March 23, 2012

Need dynamic SQL

Hi all,

I'm a newbie to MS-SQL UDFs and seem to have a really big problem. I need to
implement logic that returns an address built out of various user-definable
fields from various user-defined tables. The function is already implemented
in the client software and as a UDF in MySQL and Oracle. Now there's
just MS-SQL left...

The problem, for sure, is that I need a scalar return value (a
varchar) composed from a dynamic SQL context. The whole information is
stored in a definition table and results in a SQL string like "select
@cFieldlist from @cTable where @cWhere", e.g. "Select
title+firstname+char(13)+char(10)+lastname+char(13)+char(10)+officepost+
char(13)+char(10)+street+char(13)+char(10)+zip+city+char(13)+char(10)
from customer where customer.tooffice=1"...

The problem is of course calling the dynamic SQL statement within my
function. As I have read in different articles, this is impossible in MS-SQL.
Is there any possible workaround, like creating a stored procedure with
output variables and calling it within my function?

Any hint is welcome
Ralph

|||

> Is there any possible workaround, like creating a stored procedure with
> output variables and calling it within my function?

You can use OPENQUERY.

|||

[posted and mailed, please reply in news]

Ralph (Ralph.Backes@.web.de) writes:
> I'm a newbie to MS-SQL UDFs and seem to have a really big problem. I need
> to implement logic that returns an address built out of various
> user-definable fields from various user-defined tables. The function is
> already implemented in the client software and as a UDF in MySQL
> and Oracle. Now there's just MS-SQL left...
> The problem, for sure, is that I need a scalar return value (a
> varchar) composed from a dynamic SQL context. The whole information is
> stored in a definition table and results in a SQL string like "select
> @cFieldlist from @cTable where @cWhere", e.g. "Select
> title+firstname+char(13)+char(10)+lastname+char(13)+char(10)+officepost+
> char(13)+char(10)+street+char(13)+char(10)+zip+city+char(13)+char(10)
> from customer where customer.tooffice=1"...
> The problem is of course calling the dynamic SQL statement within
> my function. As I have read in different articles, this is impossible in
> MS-SQL. Is there any possible workaround, like creating a stored
> procedure with output variables and calling it within my function?

You cannot call stored procedures from UDFs. The idea is that a UDF
should not change the state of the database. Therefore you are severely
constrained in what you can do.

You can call extended stored procedures, though, so you could call
xp_cmdshell to do what you need from a second connection, or a custom-
written XP for the task. Yura suggested OPENQUERY, and OPENQUERY is indeed
another loophole. But since OPENQUERY requires a constant SQL string,
you are still in a dead end.

And I would not really encourage the XP solution either. It may be
logically correct, but performance will be poor. And if you use
xp_cmdshell there are security considerations. (And you have to be
really desperate to write your own XP.)

As I didn't really understand the problem, it's difficult to come up
with suggestions, but you should probably use a stored procedure instead.
I have an article on my web site, which discusses the techniques
around this: http://www.sommarskog.se/share_data.html

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

|||

[posted and mailed, please reply in news]

Ralph (Ralph.Backes@.web.de) writes:
> So this piece of code should be called in the fieldlist, somehow like
> 'select customer.cust_no, dbo.adress('Bill', customer.cust_no) from customer
> where deleted=0' and give in return a char (varchar(4000)?) like
> 'Mr.<cr>Hans Mustermann<cr>somewhere street 10 <cr><cr>55555 ZipLand'. To
> find the fields and the correct records, this code has to scan the
> definition table, read the fields to use and under which conditions to use
> and perhaps the tables which should be joined to the query, all written
> there.
> Is this really impossible ?

Generally, if you want to do something in an RDBMS, and also insist on a
certain syntax, it is not possible, unless you happen to pick the syntax
of the RDBMS in question.

For SQL Server, write a stored procedure that returns the address in
output parameters. If you need to handle many at a time, you can use a
temp table. See further here: http://www.sommarskog.se/share_data.html.
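
If it helps to see the shape of that, here is a minimal sketch of such a procedure, assuming the field list, table name and WHERE clause have already been read from the definition table (that lookup, and any validation of the assembled string, is left out; all names are made up for illustration):

CREATE PROCEDURE dbo.GetAddress
    @cFieldlist nvarchar(2000),
    @cTable     sysname,
    @cWhere     nvarchar(2000),
    @Address    varchar(4000) OUTPUT
AS
BEGIN
    DECLARE @sql nvarchar(4000)
    -- Assemble the dynamic statement and capture its scalar result in an output parameter.
    -- If the WHERE clause matches several rows, the last row read wins.
    SET @sql = N'SELECT @Address = ' + @cFieldlist +
               N' FROM ' + QUOTENAME(@cTable) +
               N' WHERE ' + @cWhere
    EXEC sp_executesql @sql, N'@Address varchar(4000) OUTPUT', @Address OUTPUT
END

-- Example call:
DECLARE @addr varchar(4000)
EXEC dbo.GetAddress 'title+char(13)+char(10)+firstname+char(13)+char(10)+lastname',
                    'customer', 'customer.tooffice=1', @addr OUTPUT
SELECT @addr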

Sharing code between RDBMSs is very difficult, because there is a great
variety in what they support. It may simplify programming and maintenance
to get the data to the client using basic SQL which works in any RDBMS.
This may not be good for performance, though, as more data will be shoveled
across the network.

Then again, your complex and dynamic data model is apparently not designed
for performance anyway. RDBMSs work from the idea that tables and columns
are stable, and that your schema does not change dynamically. So one idea
could be to have custom fields as rows rather than columns. This can lead
to unwieldy self-joins, but at least you will not need dynamic SQL.
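
A very rough sketch of that idea, with invented table and column names (this is not from the original posts):

-- Custom fields stored as rows: one row per (customer, field).
CREATE TABLE dbo.CustomerField (
    cust_no     int          NOT NULL,
    field_name  varchar(50)  NOT NULL,
    field_value varchar(200) NULL,
    CONSTRAINT PK_CustomerField PRIMARY KEY (cust_no, field_name)
)

-- Building an address becomes a self-join per field instead of dynamic SQL:
SELECT t.field_value + char(13) + char(10) +
       f.field_value + ' ' + l.field_value
FROM dbo.CustomerField t
JOIN dbo.CustomerField f ON f.cust_no = t.cust_no AND f.field_name = 'firstname'
JOIN dbo.CustomerField l ON l.cust_no = t.cust_no AND l.field_name = 'lastname'
WHERE t.field_name = 'title'
  AND t.cust_no = 4711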

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

|||

Erland Sommarskog <esquel@.sommarskog.se> wrote in message news:<Xns952F32E37589Yazorman@.127.0.0.1>...
> [posted and mailed, please reply in news]
> Generally, if you want to do something in an RDBMS, and also insist on a
> certain syntax, it is not possible, unless you happen to pick the syntax
> of the RBDMS in question.
Misunderstanding: we don't insist on syntax, we insist on
functionality. The implementation is of course different on every
RDBMS. But until now every RDBMS has given us a possible way to
implement this with more or less knowledge and effort. Hard to believe
MS-SQL can't...

> For SQL Server, write a stored procedure that returns the address in
> output parameters. If you need to handle many at a time, you can use a
> temp table. See further here: http://www.sommarskog.se/share_data.html.
A stored procedure isn't useful; I need a scalar return value... This
function is called from within a report engine and there is no other
process handling than doing a straightforward SELECT.

> Sharing code between RBDMSs is very difficult, because there is a great
> variety in what they support. It may simplify programming and maintenance,
> to get the data to the client, using basic SQL which works in any RDBMS.
> This may be not be good for performance, as more data will be shoveled
> across the network.
There is a lot of functionality on the client side, even the discussed
functionality is implemented at the client. But this really is a
dead-end for performance tuning, so a server-based solution is
necessary.

> Then again, your complex and dynamic data model is apparently not designed
> for performance anyway. RBDMSs work from the idea that tables and columns
> are stable, and that your schema does not change dynamically. So one idea
> could be to have custom fields as rows rather than columns. This can lean
> to unwieldy self-join, but at least you will not need dynamic SQL.
It's simply not true that a flexible and dynamic system can't have
good performance. It's just more and harder work to implement. But the
maintenance and most of all the customers (and this software is very,
very successful) are the best reason to do this work.

So much for that; back to the technical problem. Just one question
remaining: are these so-called extended procedures only procedures, or
are functions like we need also possible within the C DLL? To be more
precise, is it possible to call an XP function like 'select
MyXpFunc('Test') from customer'?

|||

Ralph (Ralph.Backes@.web.de) writes:
> Misunderstanding: we don't insist on syntax, we insist on
> functionality. The implementation is of course different on every
> RDBMS. But until now every RDBMS has given us a possible way to
> implement this with more or less knowledge and effort. Hard to believe
> MS-SQL can't...

Well, MS-SQL can, but...

> A stored procedure isn't useful; I need a scalar return value... This
> function is called from within a report engine and there is no other
> process handling than doing a straightforward SELECT.

...you are insisting on a specific syntax.

You would have to invoke a batch that first calls the stored procedure,
which puts the data in a variable or a temp table (have you looked at
http://www.sommarskog.se/share_data.html yet?), and then SELECTs the
data to the client. Presuming that you are looking at many rows
at a time, this is the only way to a scalable solution.
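
A minimal sketch of such a batch, reusing the hypothetical dbo.GetAddress procedure sketched earlier (the names and the single hard-coded customer are purely illustrative; for many rows you would fill a temp table in a loop and SELECT from it at the end):

DECLARE @addr varchar(4000)

EXEC dbo.GetAddress
     @cFieldlist = 'title+char(13)+char(10)+firstname+char(13)+char(10)+lastname',
     @cTable     = 'customer',
     @cWhere     = 'cust_no = 4711',
     @Address    = @addr OUTPUT

-- The final SELECT is what the report engine actually receives.
SELECT 4711 AS cust_no, @addr AS address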

So that batch will look different from what it does for MySQL,
but the syntax you wished to use included dbo.yourfunc(), and I don't
think you have the dbo bit on MySQL, so you are already prepared to
handle different syntax.

> So much for that; back to the technical problem. Just one question
> remaining: are these so-called extended procedures only procedures, or
> are functions like we need also possible within the C DLL? To be more
> precise, is it possible to call an XP function like 'select
> MyXpFunc('Test') from customer'?

No, there are no extended functions, but you can call extended stored
procedures from a user-defined function. So you could write an extended
stored procedure which returns the value in an OUTPUT parameter, and
then the scalar UDF could return that value.

But: 1) An error like an access violation in an extended stored procedure
is no small accident - the entire SQL Server perishes.
2) You will get one more - and odd - component to deploy.
3) This is definitely not going to scale.

In short: don't go there.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp