Thursday, March 29, 2012

ETL / Data mart / Cube question

We are in process of using new Kimball toolkit book as our guide t
o create another DM. Since we only have one data source, we think we need t
o pull in all the tables into the staging area. At first we wanted to just
pull in minimal data to populate the Dimension/Fact tables and any informati
on needed to derive columns.
However the business area wants a complete snapshot of ALL THE dat
a for this given subject matter. Their reason is that sometimes our CUBES d
on't have all the information they want, and they want to be able to query t
he data mart tables against the staging area tables using SQL or Access. Si
nce our data mart load times is first Saturday of the month - this argues in
favor of this. If the load process was every day or once a week - it might
be better to just query across the data mart and the reporting (replicated p
roduction) database - the the data mart tables would be older than the infor
mation they were linking against on the replicated report DB using the 1st S
aturday approach.
Questions:
1. Is this the correct approach?
2. According to a Microsoft diagram, the load goes from Source to Stag
ing to OLAP DB - which is really just another OLTP DB but the tables are mod
eled as Dimension/Fact tables, right?
3. Our existing data mart which everyone mostly uses CUBES to access,
has a big manual step. After all the ETL happens and the Dim/Fact tables ha
ve been updated - we are having to open up SSAS and process the cubes, the f
irst business day after the 1st Saturday, to get them refreshed - can't this
be automated within SSIS?
4. There have been some major production changes to our existing data
source for our existing production data mart. All the ETL was done on SQL2k
with stored proc's and DTS was only used as the policeman to control the fl
ow. All the SCD processes were done and are working fine using SP's. We no
w have to make changes and are faced with having two complete different syst
ems to manage - one with an ETL using the old method and the 2nd using SSIS.
Some of the CUBES from the original data mart are going to be combined wit
h our new cubes we are creating in our new system. To try and make it clear
er - we have for example a Client dimension and a Client fact table that fir
st is refreshed monthly with the old ETL, and now our new subject matter is
now also going to affect the Client dimension and Client fact table that was
created the first project. The old production changes it appears may be m
any weeks to adjust using the old ETL methods - so it begs the question, sin
ce all the ETL is working fine with DTS as flow master - is it perhaps bette
r to get it all working with SSIS calling all the SP's and incorporate all o
ur new ETL within SSIS using the new features that SSIS can offer?
5. When adding a 3rd subject matter data mart - is it a 3rd ETL packag
e or just enhancing the one we have?
____________________
Joe Horton
SQL 2005 Business Intelligence
Aging & Disability Services AdministrationI think that for 1, 2 and 3 you're right - you can schedule cube
process withing SSIS (there is a specific task to do that).
For 4 and 5 it's hard to say - it depends from the cost of
implementation and the cost of maintenance... SSIS packages are easier
to maintain in the long term, but you could have an higher cost in the
short term for the initial design. Mixing SSIS and SP's is certainly
feasible, even if it makes the solution harder to manage and also
performance can suffert compared to a native SSIS solution. Anyway, it
all depends by other constraints (volume of data, time of development
available, and so on).
Marco Russo
http://www.sqlbi.eu
http://www.sqljunkies.com/weblog/sqlbi
On Feb 1, 6:16 pm, "Joe" <hortoris...@.gmail.dot.com> wrote:
> =B7 We are in process of using new Kimball toolkit book as our gu=
ide to create another DM. Since we only have one data source, we think we =
need to pull in all the tables into the staging area. At first we wanted t=
o just pull in minimal data to populate the Dimension/Fact tables and any i=
nformation needed to derive columns. =20
> =B7 However the business area wants a complete snapshot of ALL TH=
E data for this given subject matter. Their reason is that sometimes our C=
UBES don't have all the information they want, and they want to be able to =
query the data mart tables against the staging area tables using SQL or Acc=
ess. Since our data mart load times is first Saturday of the month - this =
argues in favor of this. If the load process was every day or once a week -=
it might be better to just query across the data mart and the reporting (r=
eplicated production) database - the the data mart tables would be older th=
an the information they were linking against on the replicated report DB us=
ing the 1st Saturday approach.
> Questions:
> 1. Is this the correct approach? =20
> 2. According to a Microsoft diagram, the load goes from Source to S=
taging to OLAP DB - which is really just another OLTP DB but the tables are=
modeled as Dimension/Fact tables, right? =20
> 3. Our existing data mart which everyone mostly uses CUBES to acces=
s, has a big manual step. After all the ETL happens and the Dim/Fact table=
s have been updated - we are having to open up SSAS and process the cubes, =
the first business day after the 1st Saturday, to get them refreshed - can'=
t this be automated within SSIS?
> 4. There have been some major production changes to our existing da=
ta source for our existing production data mart. All the ETL was done on S=
QL2k with stored proc's and DTS was only used as the policeman to control t=
he flow. All the SCD processes were done and are working fine using SP's. =
We now have to make changes and are faced with having two complete differe=
nt systems to manage - one with an ETL using the old method and the 2nd usi=
ng SSIS. Some of the CUBES from the original data mart are going to be com=
bined with our new cubes we are creating in our new system. To try and mak=
e it clearer - we have for example a Client dimension and a Client fact tab=
le that first is refreshed monthly with the old ETL, and now our new subjec=
t matter is now also going to affect the Client dimension and Client fact t=
able that was created the first project. The old production changes it ap=
pears may be many weeks to adjust using the old ETL methods - so it begs th=
e question, since all the ETL is working fine with DTS as flow master - is =
it perhaps better to get it all working with SSIS calling all the SP's and =
incorporate all our new ETL within SSIS using the new features that SSIS ca=
n offer? =20
> 5. When adding a 3rd subject matter data mart - is it a 3rd ETL pac=
kage or just enhancing the one we have?
> ____________________
> Joe Horton
> SQL 2005 Business Intelligence
> Aging & Disability Services Administration|||Hi Joe,
I'm involved in a Data Warehouse project. Client is already invested in SQL
Server 2005.
Directly coming to the point..
You must have setup server(s) for your data warehouse. I wanted to know
whether you were involved in Server Sizing / Data Sizing. If yes, then can
you please guide me as to which method / template did you use to size your
data and based on it what hardware (with configuration) did you suggest /
decide?
Please help this is very critical and urgent!
Regards,
Bharat Tamhankar
"Joe" wrote:

>
> We are in process of using new Kimball toolkit book as our guide
> to create another DM. Since we only have one data source, we think we
> need to pull in all the tables into the staging area. At first we wanted
> to just pull in minimal data to populate the Dimension/Fact tables and any
> information needed to derive columns.
> However the business area wants a complete snapshot of ALL THE
> data for this given subject matter. Their reason is that sometimes our
> CUBES don't have all the information they want, and they want to be able
> to query the data mart tables against the staging area tables using SQL or
> Access. Since our data mart load times is first Saturday of the month -
> this argues in favor of this. If the load process was every day or once a
> week - it might be better to just query across the data mart and the
> reporting (replicated production) database - the the data mart tables
> would be older than the information they were linking against on the
> replicated report DB using the 1st Saturday approach.
>
> Questions:
> 1. Is this the correct approach?
> 2. According to a Microsoft diagram, the load goes from Source to
> Staging to OLAP DB - which is really just another OLTP DB but the tables
> are modeled as Dimension/Fact tables, right?
> 3. Our existing data mart which everyone mostly uses CUBES to
> access, has a big manual step. After all the ETL happens and the Dim/Fact
> tables have been updated - we are having to open up SSAS and process the
> cubes, the first business day after the 1st Saturday, to get them
> refreshed - can't this be automated within SSIS?
> 4. There have been some major production changes to our existing
> data source for our existing production data mart. All the ETL was done
> on SQL2k with stored proc's and DTS was only used as the policeman to
> control the flow. All the SCD processes were done and are working fine
> using SP's. We now have to make changes and are faced with having two
> complete different systems to manage - one with an ETL using the old
> method and the 2nd using SSIS. Some of the CUBES from the original data
> mart are going to be combined with our new cubes we are creating in our
> new system. To try and make it clearer - we have for example a Client
> dimension and a Client fact table that first is refreshed monthly with the
> old ETL, and now our new subject matter is now also going to affect the
> Client dimension and Client fact table that was created the first project.
> The old production changes it appears may be many weeks to adjust using
> the old ETL methods - so it begs the question, since all the ETL is
> working fine with DTS as flow master - is it perhaps better to get it all
> working with SSIS calling all the SP's and incorporate all our new ETL
> within SSIS using the new features that SSIS can offer?
> 5. When adding a 3rd subject matter data mart - is it a 3rd ETL
> package or just enhancing the one we have?
>
>
>
> ____________________
> Joe Horton
> SQL 2005 Business Intelligence
> Aging & Disability Services Administration
>
>
>|||I'm sorry - the existing archetecture was in place before I arrived and as
my expertise is in coding not hardware - I wouldn't have been much help in
that area anyhow.
"Bharat Tamhankar" <BharatTamhankar@.discussions.microsoft.com> wrote in
message news:2B8F4494-BADC-4F47-86BC-383C5C07F0E6@.microsoft.com...[vbcol=seagreen]
> Hi Joe,
> I'm involved in a Data Warehouse project. Client is already invested in
> SQL
> Server 2005.
> Directly coming to the point..
> You must have setup server(s) for your data warehouse. I wanted to know
> whether you were involved in Server Sizing / Data Sizing. If yes, then can
> you please guide me as to which method / template did you use to size your
> data and based on it what hardware (with configuration) did you suggest /
> decide?
> Please help this is very critical and urgent!
> Regards,
> Bharat Tamhankar
>
> "Joe" wrote:
>

No comments:

Post a Comment