Monday, March 26, 2012

Need feedback on my plan to import a terribly formatted Excel spreadsheet

Good morning, all,

I have an Excel workbook that needs to be imported. It has three
sheets, but it's really the first that is giving me fits. Each of the
three worksheets have header info and instructions on the first 8
rows. Worksheet 1 then has, on row 9, the column names for the group
informtion. Row 10 has the group information. Row 11 has detail
column headers. Row 12 and later have detail information. Worksheets
2 and three do not have detail information, just row 9 with the column
names for the group informtion and Row 10 with group information.

Here is how I am thinking of handling this.
Run a script, outside of SSIS to save each sheet as a CSV file to a
folder. I believe that this must be done because some of the first 8
rows are blank and according to the docs, SSIS cannot have blank rows
in imported Excel sheets.
Loop over the files in the folder.
For each file, exclude the first 8 rows.
if the file name is the first worksheet then
get the next two rows and process group info
get the rest of the worksheet and process detail information
if the file name is not the first worksheet then
get the next two rows and process group info

My questions are: Does this seem feasible? Is there an easier way to
do this? Any hints or tricks that might be helpful? Any pitfalls
that I should watch out for?

Thanks so much for any insights,
Kathryn

It sounds like you have two record types to import, group and detail. I would load the data into a staging table designed to handle both or multiple record types as a first pass. Once the data is loaded to the staging table, you can pass through the table and load each record type into separate tables and process from there.

However, your approach should work as well.

No comments:

Post a Comment