Monday, March 12, 2012

Need advice on ETL architecture...

I am about to design (I think it could be considered as) ETL system and here
are my thoughts on how it should work at a very high level.
My intention in this post is to find out if there is a better way to design
this.
Data is originally entered in an as400 system.
On nightly basis, openquery from sql server is used to get all the tables
into "extract_database" as is from as400.
"User_database" takes data on nightly basis from "extract_database"
and converts data into user friendly tables. These tables could be
used by front end programmers to write tsql to drive the front end.
If we have a table called "Sales" then on nightly basis "Sales" table would
get refreshed from as400 and if there is a change in particluar sales order
then I want "user_database" to have the latest change in Sales table.
Some how I also want to document the change in Sales history. In other words
I don't want to lose any data change from day to day.
I also would like to have "history_database" which would keep track of
record changes or deletion.
In summary, "extract_database" has the copy of what exactly got imported,
"user_database" has the converted data so that front end application could
tap into it, and "history_database" is used for reporting.
For daily data refresh, I would like to daily keep track of data imported
for each table.
Are there any white papers, articles, web sites, books that address this
nightly database refresh design and best practices'
Please excuse me in advance if I missed some thing.
TIA..Read about the audit tools that you can buy to do this.|||What you are describing is a transactional data warehouse. The ETL (Extract
Transform Load) programming is just the tip of the iceburg.
"sqlster" <nospam@.nospam.com> wrote in message
news:AB2E77C4-8B79-4CC4-8B0D-2FF079239E1C@.microsoft.com...
>I am about to design (I think it could be considered as) ETL system and
>here
> are my thoughts on how it should work at a very high level.
> My intention in this post is to find out if there is a better way to
> design
> this.
> Data is originally entered in an as400 system.
> On nightly basis, openquery from sql server is used to get all the tables
> into "extract_database" as is from as400.
> "User_database" takes data on nightly basis from "extract_database"
> and converts data into user friendly tables. These tables could be
> used by front end programmers to write tsql to drive the front end.
> If we have a table called "Sales" then on nightly basis "Sales" table
> would
> get refreshed from as400 and if there is a change in particluar sales
> order
> then I want "user_database" to have the latest change in Sales table.
> Some how I also want to document the change in Sales history. In other
> words
> I don't want to lose any data change from day to day.
> I also would like to have "history_database" which would keep track of
> record changes or deletion.
> In summary, "extract_database" has the copy of what exactly got imported,
> "user_database" has the converted data so that front end application could
> tap into it, and "history_database" is used for reporting.
> For daily data refresh, I would like to daily keep track of data imported
> for each table.
> Are there any white papers, articles, web sites, books that address this
> nightly database refresh design and best practices'
> Please excuse me in advance if I missed some thing.
> TIA..
>

No comments:

Post a Comment