Thursday, March 29, 2012

ETL Package Problem

I'm going through the Integration Services tutorial, Lesson 1: Creating a Simple ETL Package. Everything goes fine until I run the package, when I get an error about inserting NULL into the CurrencyKey field, which is set to NOT NULL. The flat file looks okay.

Any help would be appreciated.

SSIS package "Lesson 1.dtsx" starting.
Information: 0x4004300A at Extract Sample Currency Data, DTS.Pipeline: Validation phase is beginning.
Information: 0x4004300A at Extract Sample Currency Data, DTS.Pipeline: Validation phase is beginning.
Information: 0x40043006 at Extract Sample Currency Data, DTS.Pipeline: Prepare for Execute phase is beginning.
Information: 0x40043007 at Extract Sample Currency Data, DTS.Pipeline: Pre-Execute phase is beginning.
Information: 0x402090DC at Extract Sample Currency Data, Extract Sample Currency Data [1]: The processing of file "C:\Program Files\Microsoft SQL Server\90\Samples\Integration Services\Tutorial\Creating a Simple ETL Package\Sample Data\SampleCurrencyData.txt" has started.
Information: 0x400490F4 at Extract Sample Currency Data, Lookup Currency Key [30]: component "Lookup Currency Key" (30) has cached 105 rows.
Information: 0x400490F4 at Extract Sample Currency Data, Lookup Date Key [124]: component "Lookup Date Key" (124) has cached 992 rows.
Information: 0x4004300C at Extract Sample Currency Data, DTS.Pipeline: Execute phase is beginning.
Information: 0x402090DE at Extract Sample Currency Data, Extract Sample Currency Data [1]: The total number of data rows processed for file "C:\Program Files\Microsoft SQL Server\90\Samples\Integration Services\Tutorial\Creating a Simple ETL Package\Sample Data\SampleCurrencyData.txt" is 1097.
Information: 0x402090DF at Extract Sample Currency Data, Sample OLE DB Destination [158]: The final commit for the data insertion has started.
Error: 0xC0202009 at Extract Sample Currency Data, Sample OLE DB Destination [158]: An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available. Source: "Microsoft SQL Native Client" Hresult: 0x80004005 Description: "The statement has been terminated.".
An OLE DB record is available. Source: "Microsoft SQL Native Client" Hresult: 0x80004005 Description: "Cannot insert the value NULL into column 'CurrencyKey', table 'AdventureWorksDW.dbo.FactCurrencyRate'; column does not allow nulls. INSERT fails.".
Information: 0x402090E0 at Extract Sample Currency Data, Sample OLE DB Destination [158]: The final commit for the data insertion has ended.
Error: 0xC0047022 at Extract Sample Currency Data, DTS.Pipeline: The ProcessInput method on component "Sample OLE DB Destination" (158) failed with error code 0xC0202009. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.
Error: 0xC0047021 at Extract Sample Currency Data, DTS.Pipeline: Thread "WorkThread0" has exited with error code 0xC0202009.
Information: 0x40043008 at Extract Sample Currency Data, DTS.Pipeline: Post Execute phase is beginning.
Information: 0x402090DD at Extract Sample Currency Data, Extract Sample Currency Data [1]: The processing of file "C:\Program Files\Microsoft SQL Server\90\Samples\Integration Services\Tutorial\Creating a Simple ETL Package\Sample Data\SampleCurrencyData.txt" has ended.
Information: 0x40043009 at Extract Sample Currency Data, DTS.Pipeline: Cleanup phase is beginning.
Information: 0x4004300B at Extract Sample Currency Data, DTS.Pipeline: "component "Sample OLE DB Destination" (158)" wrote 1097 rows.
Task failed: Extract Sample Currency Data
Warning: 0x80019002 at Lesson 1: The Execution method succeeded, but the number of errors raised (3) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
SSIS package "Lesson 1.dtsx" finished: Failure.
The program '[3372] Lesson 1.dtsx: DTS' has exited with code 0 (0x0).

The CurrencyKey column is set to NOT NULL, which means it cannot accept NULL values.

If I remember right, the CurrencyKey column of the AdventureWorksDW database is a primary key.

Thanks,

S Suresh

|||Then there is a problem with the tutorial. Do you know how to fix the problem?
|||

Hi kinny_k,

I was just wondering if you ever got an answer to this problem with the SSIS ETL tutorial. I've run into the same issue and I've checked and rechecked my steps and cannot see anything wrong.

If you have the solution/workaround I'd appreciate the input!

-mayalu

|||

Did you check whether any of the mappings in your destination component are missing?

Rafael Salas

|||

I had a problem with this too. It turns out that the Flat File dates are in a MM/DD/YYYY format and since my default is DD/MM/YYYY most dates aren't correctly identified.

To resolve this:

1. In the ‘Flat File Connection Manager’, change the CurrencyDate DataType to string [DT_STR].

2. In the ‘Data Flow’ diagram, a warning displays on the ‘Extract Sample Currency Data’ data flow. Double-click to open it. A message is displayed warning that the format has changed; click Yes, and when the ‘Flat File Source Editor’ is displayed, click OK to close it.

3. In the Date Lookup Transformation Editor, Reference Table tab, select 'Use results of an SQL query' and paste in:

SELECT TimeKey, FullDateAlternateKey,
CAST(CAST(CONVERT(VARCHAR(2), FullDateAlternateKey, 101) AS INT) AS VARCHAR) + '/' +
CAST(CAST(SUBSTRING(CONVERT(VARCHAR(5), FullDateAlternateKey, 101), 4,2) AS INT) AS VARCHAR) + '/' +
CAST(CAST(RIGHT(CONVERT(VARCHAR(10), FullDateAlternateKey, 101),4) AS INT) AS VARCHAR) + ' 0:00' USFormatDate
FROM dbo.DimTime

Check that the output looks OK by clicking the Preview button.

4. In the Date Lookup Transformation Editor, Columns tab, link the CurrencyDate to the new USFormatDate field.

I'd imagine there's a nicer way of fixing this but I'm not at all familiar with SSIS yet.

|||

Hi Tim,

Thanks for the input. I tried it and all goes well until the last step, where I try to map the CurrencyDate to the new USFormatDate field and get a data type mismatch. I guess there's something else peculiar to my setup.

At least I know where to look closer!

-Elsie

|||

Elsie,

Hmm, I think I'd check the first step again. You have to change the flat file CurrencyDate to be a string, otherwise the comparison won't work and you'll get a type mismatch. Other than that, I'm afraid I don't know.

You could always change the FactCurrencyRate table definition to allow NULL in the TimeKey field so you don't get the error. In retrospect I should have done that, as it's a lot quicker to get the lesson working.
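For reference, that change would be something along these lines (a sketch only; it assumes TimeKey is an integer column and is not part of the table's primary key or other constraints):

ALTER TABLE dbo.FactCurrencyRate ALTER COLUMN TimeKey int NULL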

Tim

|||It seems that the only solution is to use SSIS on a DB instance with English (US) as the default language.
At least for DB instances with a non-English default language (I have a Russian one), nothing else works:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1566456&SiteID=1&mode=1|||

In both step 9's of adding the lookup transformations, be sure to click the box to select the (CurrencyKey) lookup value. You should have a checked box before moving on. Then you can map this lookup to the column in the later step.

Enjoy!

|||I am sure I verified that (and everything else) dozens of times.
BTW, it does not address the error and the problem (incompatible datetime fields).|||

To get this fully working, the following steps are required:

1. In the Sample Flat File Source Data connection manager, change the CurrencyDate data type to string [DT_STR].

2. In the Data Flow task editor, double-click the data flow source 'Extract Sample Currency Data', then accept the warning.

3. In the Lookup Time Key task, on the Reference Table tab, choose 'Use results of an SQL query', then use the following SQL statement, which I modified from an earlier post:

SELECT TimeKey, FullDateAlternateKey,
CAST(CAST(CONVERT(VARCHAR(2), FullDateAlternateKey, 101) AS INT) AS VARCHAR) + '/' +
CAST(CAST(SUBSTRING(CONVERT(VARCHAR(5), FullDateAlternateKey, 101), 4,2) AS INT) AS VARCHAR) + '/' +
CAST(CAST(RIGHT(CONVERT(VARCHAR(10), FullDateAlternateKey, 101),4) AS INT) AS VARCHAR) + ' 00:00:00' USFormatDate
FROM dbo.DimTime

4. In the Lookup Time Key task, link CurrencyDate from Available Input Columns to USFormatDate in Available Lookup Columns.

5. Make sure you link the output of the lookup to the destination columns.

Good luck,

Celine

|||

Another solution:

on http://www.microsoft.com/downloads/details.aspx?familyid=e719ecf7-9f46-4312-af89-6ad8702e4e6e&displaylang=en

you can find databases for both the case-sensitive and the case-insensitive collation. I attached the other database and now it works.

|||

That link says the samples download has moved to CodePlex.

And CodePlex has hundreds of samples.

Which one?

|||http://www.codeplex.com/MSFTDBProdSamples


ETL library

We are looking for an ETL library or callable application that would
allow us to transfer and transform data between platforms but mainly
into a SQL server database. We would be calling this tool from
javascript programs. Is there something out there in the midrange
price-wise? So far I am finding just high-end ETL tools that are a
little overkill for us.

Thanks.|||"elizabeth" <ezelasky@.hotmail.com> wrote in message
news:78393913.0409150439.818af56@.posting.google.com...
> We are looking for an ETL library or callable application that would
> allow us to transfer and transform data between platforms but mainly
> into a SQL server database. We would be calling this tool from
> javascript programs. Is there something out there in the midrange
> price-wise? So far I am finding just high-end ETL tools that are a
> little over kill for us.
>
> Thanks.

Have you looked at DTS, which is included with MSSQL? It has a command-line
executable and a COM interface available. See "Data Transformation Services"
in Books Online for more details.

If DTS isn't suitable for you, you might want to be more specific about what
features you require.

Simon|||Folks here felt that DTS was slow (it copied record by record) and buggy;
that is why I was asked to search for something else. What we need is
fairly basic: to be able to move large amounts of data (in some
cases > 2 gigs) between a variety of sources (flat files, FoxPro &
dBase) into a SQL Server database. We need to be able to set or
specify the table structure "on-the-fly", thus the reason for calling
it from a scripting environment or another environment.

Thanks.
> Have you looked at DTS, which is included with MSSQL? It has a command-line
> executable and a COM interface available. See "Data Transformation Services"
> in Books Online for more details.
> If DTS isn't suitable for you, you might want to be more specific about what
> features you require.
> Simon|||DTS can use 'fast load' bulk insert techniques to import data. This is the
fastest way to get data into SQL Server. You can create DTS packages 'on
the fly' programmatically and have control over the object model.
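As a side note, if the goal is simply a fast load of flat files and you want to stay outside DTS, the same bulk-load path is also reachable from plain T-SQL; a minimal sketch with a made-up file path and target table:

-- Bulk-load a delimited text file using the fast (minimally logged) path.
BULK INSERT dbo.ImportTarget
FROM 'C:\Data\extract.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK)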

--
Hope this helps.

Dan Guzman
SQL Server MVP

"elizabeth" <ezelasky@.hotmail.com> wrote in message
news:78393913.0409151028.7fcc0aca@.posting.google.com...
> Folks here felt that DTS was slow (copied record by record) and buggy
> that is why I was asked to search for something else. What we need is
> fairly basic...to be able to move large amounts of data (in some
> cases > 2 gigs) between a variety of sources (flat files, FoxPro &
> Dbase) into a SQL Server database. We need to be able to set or
> specify the table structure "on-the-fly" thus the reason for calling
> it from a scripting environment or another environment.
> Thanks.
>>
>> Have you looked at DTS, which is included with MSSQL? It has a
>> command-line
>> executable and a COM interface available. See "Data Transformation
>> Services"
>> in Books Online for more details.
>>
>> If DTS isn't suitable for you, you might want to be more specific about
>> what
>> features you require.
>>
>> Simon|||One alternative is a "roll your own" solution. Create linked servers and
just run UPDATE/INSERT/SELECT INTO statements. The problem is always
making sure you have the relevant drivers on the dev and production
environments. The enclosed link gives most of the connection strings needed.

http://www.able-consulting.com/ADO_Conn.htm

I'll warn you before you start this can get messy.
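For illustration, a rough sketch of that linked-server approach against a folder of dBase/FoxPro files (the server name, folder path, table and column names are made up, and the exact provider string depends on the drivers you have installed):

-- Register a linked server over the Jet provider.
EXEC sp_addlinkedserver
    @server = N'LEGACY_DBF',
    @srvproduct = N'',
    @provider = N'Microsoft.Jet.OLEDB.4.0',
    @datasrc = N'C:\Data\DbfFiles',
    @provstr = N'dBase 5.0'

-- Create the destination table "on the fly" from the source structure.
SELECT *
INTO dbo.Customers_Staging
FROM OPENQUERY(LEGACY_DBF, 'SELECT * FROM customers')

-- Or append into an existing table.
INSERT INTO dbo.Customers (CustomerId, CustomerName)
SELECT cust_id, cust_name
FROM OPENQUERY(LEGACY_DBF, 'SELECT cust_id, cust_name FROM customers')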

Regards

Adrian

elizabeth wrote:

> Folks here felt that DTS was slow (copied record by record) and buggy
> that is why I was asked to search for something else. What we need is
> fairly basic...to be able to move large amounts of data (in some
> cases > 2 gigs) between a variety of sources (flat files, FoxPro &
> Dbase) into a SQL Server database. We need to be able to set or
> specify the table structure "on-the-fly" thus the reason for calling
> it from a scripting environment or another environment.
> Thanks.
>>Have you looked at DTS, which is included with MSSQL? It has a command-line
>>executable and a COM interface available. See "Data Transformation Services"
>>in Books Online for more details.
>>
>>If DTS isn't suitable for you, you might want to be more specific about what
>>features you require.
>>
>>Simon

ETL Guidance

Hi,

We've decided to use SSIS for an ETL type project. We are going to convert some files to our special flat file format. Our special flat file format will not change, but we will need to convert about 40 different style flat files into this format. We will receive any of these files at random intervals. We will likely have to add more file formats to the 40 at some point, too.

I'd like to use a flexible design, but I just started with SSIS and it's going to be hard to test all the possible combinations, so I hope I can find some guidance in this forum.

What would be a good approach to allow for the most reuse of work to support these various file formats?

Jamie Thomson's excellent SSIS blog has an "SSIS Best Practices" list that might be a good place to start: http://blogs.conchango.com/jamiethomson/archive/2006/01/05/2554.aspx

|||At worst you'll need to make a conversion from each style of incoming file to your formatted style. More details would be helpful.|||Thanks, I read Jamie's blog every day and I recommend it to everyone, too.

ETL from SQL server to SQL CE

How can I transform data from an existing SQL Server 2005 DB to a (.sdf) SQL Server Compact Edition DB?

I don't know the data provider name for connecting to a SQL CE DB.

Any suggestions?

You can use the SQL Server Compact Edition Destination component (with a File Connection Manager) in a Data Flow Task to "export" data to SQL Server Mobile. You can place some transformations between your data source and the destination component.

ETL from multiple Access databases -SSIS?

Howdy Folks,

I have an ETL project that I want to ask a question about. I want to loop through a folder containing several Access databases, and extract and load each one into a single SQL Server 2005 database. The table layout is the same for all the Access dbs, as well as the SQL db. The number of Access dbs in the folder, their records, and their names will change. Is there a way I can use the SSIS Foreach tool to move through the folder and set the connection string inside the loop?

Thanks for your reply,

Chris

Yes, you can do this.

Use the ForEach File Enumerator to enumerate your list of files. You can use wildcards (e.g. *.mdb).

Store the file path of each file in a variable. You can then use the contents of that variable to dynamically build your connection string using an expression.

Lots of information via Google about this stuff if you want to go hunting or ask here with any pertinent questions.

Regards

-Jamie

|||

Thanks Jamie (a lot) for your suggestion. I tried using the Foreach File Enumerator with the root path for the Access files and a *.mdb wildcard. Then I created a mapped string variable varFileName to hold the file path. Inside the loop I have my Data Flow Task. In the data source for the task, I was using the variable in an expression as the ConnectionString property, which didn't work. So I added an OLE DB connection string in the expression like "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @[User::varFileName] but I don't think that's the answer.

I've been googling on this but can't find solutions for multiple .mdb's; most of what I find is dealing with flat files. Great blogs btw.

|||

Should be able to find all you need here: http://www.connectionstrings.com/

-Jamie

|||

Yep, I went there. Thanks Jamie. My string was OK. Actually what got me running was to set the Foreach Loop Container > Properties > Execution > DelayValidation = True. It works great running the SSIS package on the box with SQL Server (as opposed to OLE DB) destinations.

ETL Delta Pulling Huge data.. Right approach ?

Hi all,

In building an ETL tool, we are in a situation where a table has to be loaded on an incremental basis. On the first run all the records (approx 100 lacs, i.e. about 10 million) have to be loaded. From the next run onward, only the records that were updated since the last run of the package, or newly added, are to be pulled from the source database. One idea we had was to have two OLE DB Source components: in one, get the records that were updated or newly added (since we have update columns in the DB, getting them is fairly simple); in the other OLE DB source, load all the records from the destination; then pass them into a Merge Join and have a Conditional Split down the pipeline to handle the updates and inserts.

Now the question is, how slow is this going to be? Could there be a case where the source DB returns records pretty fast and the Merge Join fails in anticipation of all the records from the destination?

What might be the ideal way to go about my scenario? Please advise.

Thanks in advance.
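For reference, a minimal sketch of that incremental pull, assuming the source table carries an audit column such as LastUpdated and the last successful load time is stored in an ETL control table (all object names here are hypothetical):

DECLARE @LastRun datetime

SELECT @LastRun = LastSuccessfulRun
FROM dbo.EtlControl
WHERE PackageName = 'LoadFactTable'

-- Only rows inserted or updated since the previous run feed the data flow.
SELECT s.*
FROM dbo.SourceTable AS s
WHERE s.LastUpdated > @LastRun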

Or maybe just use a Lookup component to check whether there are any updates and redirect the errored rows as new records to the destination. Sounds good?

|||

MShetty wrote:

Or may be just use a lookup component to check if there are any updates and redirect the errored rows as New records to the destination. Sounds good ?

The lookup component might be the way to go; or at least it has been the most popular approach for most scenarios. However, there have been several interesting discussions around this issue.

If you decide to use lookup transforms, be sure to understand how that component uses memory resources. Also, be aware that the lookup transform is case sensitive when 'matching' the columns in the pipeline against the lookup table.

Jamie has an interesting article on his blog; Greg has also written about an alternative approach using a script component:

http://blogs.conchango.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_-has-it-changed.aspx

Using two OLE DB source components and a merge join would, I think, be slower; but you can test it and gain a better understanding of that.

|||

Rafael Salas wrote:

The lookup component might be the way to go; or at least it has been the most popular approach for most scenarios. However, there have been several interesting discussions around this issue.

If you decide to use lookup transforms, be sure to understand how that component uses memory resources. Also, be aware that the lookup transform is case sensitive when 'matching' the columns in the pipeline against the lookup table.

It looks like memory usage is going to be an issue with the lookup, especially when the record count is pretty huge.

|||

Maybe it is not an issue at all; my advice on that: test it!

Make sure you provide a query in the lookup transform with only the columns to be used for the lookup operation, so the result set is as small as possible column-wise.
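For example, against a hypothetical currency dimension the reference query would carry only the surrogate key you need back and the business key used for matching:

SELECT CurrencyKey, CurrencyAlternateKey
FROM dbo.DimCurrency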

|||

Rafael Salas wrote:

Maybe it is not an issue at all; my advice on that: test it!

Make sure you provide a query in the lookup transform with only the columns to be used for the lookup operation, so the result set is as small as possible column-wise.

I strongly agree. Lookup transform performs very well if you can use it right.

|||I did a test with about 100,000+ rows in my forum post Merge Join vs. Lookup vs. Custom Script.

This should give you an overview - but please do your own testing before going live, as the performance may vary across configurations.|||Thanks for all the valuable input. I think the Lookup is not the worry; it's the components down the pipeline, the OLE DB Command, that I am worried about. If it's going to update a lot of rows it's going to hit the performance big time Sad|||

TheViewMaster wrote:

I did a test with about 100,000+ rows in my forum post Merge Join vs. Lookup vs. Custom Script.

This should give you an overview - but please do your own testing before going live, as the performance may vary across configurations.

Amen to that. It's no good asking on here - the only person that can provide a definitive answer is yourself.

-Jamie

|||

MShetty wrote:

Thanks for all the valuable input. I think the Lookup is not the worry; it's the components down the pipeline, the OLE DB Command, that I am worried about. If it's going to update a lot of rows it's going to hit the performance big time

Yup - updating a large number of rows can take forever.
I also came to the conclusion that with a large number of records the OLE DB Command object won't perform well. The solution I use is to load the data which needs to be updated into a staging table, and then use an Execute SQL task to update,
e.g.
UPDATE Prod_Table
SET Field = a2.Field
FROM Prod_Table a1
JOIN Staging_Table a2 ON a1.PrimaryKey = a2.PrimaryKey|||

TheViewMaster wrote:

MShetty wrote:

Thx for all the valuable inputs.. I think Lookup is not a fear, but its the components down the pipeline, OLEDB Command I am worried about,if its going to update lot of rows its going to hit the perfomance big time

Yup - updating large amount of rows can take forever.
I also came to conclusion that with large number records - OLE DB Command object won't perform well. The solution I use is to load the data which needs to be updated to staging table - and then Execute SQL to update
e.g.
UPDATE Prod_Table
SET Field = a2.Field
FROM Prod_Table a1
JOIN Staging_Table a2 ON a1.PrimaryKey = a2.PrimaryKey

The approach ViewMaster is describing is exactly what I do in such cases. OLE DB commands are simply slow by nature; the command gets executed once for every row, so its implementation is just not adequate for a large number of rows. You can use a lookup to separate your inserts from updates; the insert pipe can go directly to the destination table and the other one to a stage table to be used back in the control flow in an Execute SQL task (one-time update).

|||Thanks for the input... Will definitely post the test results once I test out the packages Smile

etl best practice

Hi
With AS there's a lot of ETL.
How much is there with Report Builder? I've seen the upcoming version
of the ERP system Dynamics Ax and it looks as if there is no ETL at all -
they just build a report model on top of the production database.
Is this OK? It gives real time, but is the transaction database structure okay
for building larger reporting solutions?
/Michael V.|||I think you need to perform the ETL to get to the data source view state;
then you may want to build an AS 2005 cube, or a report object model, or
both. My point is the ETL gets you to the intermediary point. ETL may not
be required, but may be warranted for the same reasons we used ETL processes
in AS 2000. I'm not familiar with Dynamics Ax.
Steve MunLeeuw
"michael v" <test@.test.com> wrote in message
news:OxB7TjZzGHA.3440@.TK2MSFTNGP06.phx.gbl...
> Hi
> With As there's a lot of ETL.
> How much is there with report builder ? I've seen the up-coming versionn
> of the erp system Dynamics Ax and it looks as if there is no ETL at all -
> they
> just build a report model ontop of the production database.
> Is this ok ? Give's realtime but is the transaction database structue okay
> for building larger reporting solutions ?
> /Michael V.
>

ETL and SSIS/SSAS roles in it all.

I am a student in a data warehousing class. This is my only experience using
SQL Server 2005 BI ETL tools. I need to extract, transform, and load into a
data warehouse using SQL Server 2005. I've extracted all tables from my 3nf
d.b. into flat files using SSIS. My question is about the connection
between SSIS and SSAS, and how I create the time dimension (in SSAS?) and
then have it connect back to my process in SSIS so I can do the final load
into it.
I thought I could use SSIS for the entire ETL process. Now I'm discovering
I somehow need to bring in SSAS to define the dimensions and somehow finish
up in SSIS again.
I defined my source d.b. and final destination data warehouse in regular SQL
Server tables. Do I need to delete the final destination data warehouse
database and redefine it as dimensions in SSAS?
Any help on this is appreciated. If you even have a recommended tutorial
that would help as well.
Hello Laura,
there are a number of ways to skin this cat.
The way I usually do it is:
Build a relational DB (star schema).
Build an SSAS cube over the top of that schema.
Use SSIS to load up the relational DB.
In 2K5 you can also load the SSAS partitions and dimensions directly
through SSIS, which is way cool.
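On the time-dimension part of the question, a common pattern is to populate a date dimension table in the relational star schema first (with T-SQL or an SSIS task) and then define the SSAS time dimension over that table. A minimal sketch, using hypothetical table and column names:

DECLARE @d datetime, @end datetime
SET @d = '20000101'
SET @end = '20101231'
WHILE @d <= @end
BEGIN
    -- yyyymmdd integer surrogate key, e.g. 20000101
    INSERT INTO dbo.DimDate (DateKey, FullDate, CalendarYear, CalendarMonth, DayOfMonth)
    VALUES (CONVERT(int, CONVERT(varchar(8), @d, 112)), @d, YEAR(@d), MONTH(@d), DAY(@d))
    SET @d = DATEADD(day, 1, @d)
END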

Allan Mitchell
http://wiki.sqlis.com | http://www.sqlis.com | http://www.sqldts.com |
http://www.konesans.com

> I am a student in a data warehousing class. This is my only experience
> using SQL Server 2005 BI ETL tools. I need to extract, transform, and
> load into a data warehouse using SQL Server 2005. I've extracted all
> tables from my 3nf d.b. into flat files using SSIS. My question is
> about the connection between SSIS and SSAS, and how I create the time
> dimension (in SSAS?) and then have it connect back to my process in
> SSIS so I can do the final load into it.
> I thought I could use SSIS for the entire ETL process. Now I'm
> discovering I somehow need to bring in SSAS to define the dimensions
> and somehow finish up in SSIS again.
> I defined my source d.b. and final destination data warehouse in
> regular SQL Server tables. Do I need to delete the final destination
> data warehouse database and redefine it as dimensions in SSAS?
> Any help on this is appreciated. If you even have a recommended
> tutorial that would help as well.
>

ETL : rows with Errors

I'm using an "Execute SQL Task" to run a stored procedure that populates a fact table in our dimensional data warehouse. There is one row which violates a foreign key constraint, which causes the entire task to fail, so zero rows are loaded. Is there any way to grab the offending row, send it off to some holding ground, and go ahead and load the rest of the rows?

I'm using an Execute SQL Task mostly because I am very comfortable with writing SQL whereas the rest of SSIS is a bit unfamiliar to me, but I'm guessing that to handle error rows I might have to change to a different kind of task?

Thanks

Richard

The best way to do this is to use a lookup in the dataflow to determine if the incoming record violates foreign key rules.

As far as your stored procedure goes, head over to the Transact-SQL forum for help with that.
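If you prefer to stay with the stored procedure, a set-based split can achieve the same effect; a rough sketch with hypothetical staging, fact and dimension names:

-- Divert rows whose foreign key has no match in the dimension to a holding table...
INSERT INTO dbo.FactSales_Errors (CustomerKey, SaleAmount, ErrorReason)
SELECT s.CustomerKey, s.SaleAmount, 'Unknown CustomerKey'
FROM staging.Sales AS s
LEFT JOIN dbo.DimCustomer AS c ON c.CustomerKey = s.CustomerKey
WHERE c.CustomerKey IS NULL

-- ...then load only the rows that satisfy the constraint.
INSERT INTO dbo.FactSales (CustomerKey, SaleAmount)
SELECT s.CustomerKey, s.SaleAmount
FROM staging.Sales AS s
JOIN dbo.DimCustomer AS c ON c.CustomerKey = s.CustomerKey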

ETL / Data mart / Cube question

We are in the process of using the new Kimball toolkit book as our guide to create another DM. Since we only have one data source, we think we need to pull all the tables into the staging area. At first we wanted to pull in just minimal data to populate the dimension/fact tables and any information needed to derive columns.
However, the business area wants a complete snapshot of ALL THE data for this given subject matter. Their reason is that sometimes our cubes don't have all the information they want, and they want to be able to query the data mart tables against the staging area tables using SQL or Access. Since our data mart load time is the first Saturday of the month, this argues in favor of it. If the load process were every day or once a week, it might be better to just query across the data mart and the reporting (replicated production) database - with the 1st Saturday approach, the data mart tables would be older than the information they were linking against on the replicated report DB.
Questions:
1. Is this the correct approach?
2. According to a Microsoft diagram, the load goes from Source to Staging to OLAP DB - which is really just another OLTP DB but the tables are modeled as Dimension/Fact tables, right?
3. Our existing data mart which everyone mostly uses CUBES to access, has a big manual step. After all the ETL happens and the Dim/Fact tables have been updated - we are having to open up SSAS and process the cubes, the first business day after the 1st Saturday, to get them refreshed - can't this be automated within SSIS?
4. There have been some major production changes to our existing data source for our existing production data mart. All the ETL was done on SQL2k with stored proc's and DTS was only used as the policeman to control the flow. All the SCD processes were done and are working fine using SP's. We now have to make changes and are faced with having two complete different systems to manage - one with an ETL using the old method and the 2nd using SSIS. Some of the CUBES from the original data mart are going to be combined with our new cubes we are creating in our new system. To try and make it clearer - we have for example a Client dimension and a Client fact table that first is refreshed monthly with the old ETL, and now our new subject matter is now also going to affect the Client dimension and Client fact table that was created the first project. The old production changes it appears may be many weeks to adjust using the old ETL methods - so it begs the question, since all the ETL is working fine with DTS as flow master - is it perhaps better to get it all working with SSIS calling all the SP's and incorporate all our new ETL within SSIS using the new features that SSIS can offer?
5. When adding a 3rd subject matter data mart - is it a 3rd ETL package or just enhancing the one we have?
____________________
Joe Horton
SQL 2005 Business Intelligence
Aging & Disability Services Administration
I think that for 1, 2 and 3 you're right - you can schedule cube
processing within SSIS (there is a specific task to do that).
For 4 and 5 it's hard to say - it depends on the cost of
implementation and the cost of maintenance... SSIS packages are easier
to maintain in the long term, but you could have a higher cost in the
short term for the initial design. Mixing SSIS and SP's is certainly
feasible, even if it makes the solution harder to manage and also
performance can suffer compared to a native SSIS solution. Anyway, it
all depends on other constraints (volume of data, time of development
available, and so on).
Marco Russo
http://www.sqlbi.eu
http://www.sqljunkies.com/weblog/sqlbi
On Feb 1, 6:16 pm, "Joe" <hortoris...@.gmail.dot.com> wrote:
> We are in process of using new Kimball toolkit book as our guide to create another DM. Since we only have one data source, we think we need to pull in all the tables into the staging area. At first we wanted to just pull in minimal data to populate the Dimension/Fact tables and any information needed to derive columns.
> However the business area wants a complete snapshot of ALL THE data for this given subject matter. Their reason is that sometimes our CUBES don't have all the information they want, and they want to be able to query the data mart tables against the staging area tables using SQL or Access. Since our data mart load times is first Saturday of the month - this argues in favor of this. If the load process was every day or once a week -it might be better to just query across the data mart and the reporting (replicated production) database - the the data mart tables would be older than the information they were linking against on the replicated report DB using the 1st Saturday approach.
> Questions:
> 1. Is this the correct approach?
> 2. According to a Microsoft diagram, the load goes from Source to Staging to OLAP DB - which is really just another OLTP DB but the tables aremodeled as Dimension/Fact tables, right?
> 3. Our existing data mart which everyone mostly uses CUBES to access, has a big manual step. After all the ETL happens and the Dim/Fact tables have been updated - we are having to open up SSAS and process the cubes, the first business day after the 1st Saturday, to get them refreshed - can't this be automated within SSIS?
> 4. There have been some major production changes to our existing data source for our existing production data mart. All the ETL was done on SQL2k with stored proc's and DTS was only used as the policeman to control the flow. All the SCD processes were done and are working fine using SP's. We now have to make changes and are faced with having two complete different systems to manage - one with an ETL using the old method and the 2nd using SSIS. Some of the CUBES from the original data mart are going to be combined with our new cubes we are creating in our new system. To try and make it clearer - we have for example a Client dimension and a Client fact table that first is refreshed monthly with the old ETL, and now our new subject matter is now also going to affect the Client dimension and Client fact table that was created the first project. The old production changes it appears may be many weeks to adjust using the old ETL methods - so it begs the question, since all the ETL is working fine with DTS as flow master - is it perhaps better to get it all working with SSIS calling all the SP's and incorporate all our new ETL within SSIS using the new features that SSIS can offer?
> 5. When adding a 3rd subject matter data mart - is it a 3rd ETL package or just enhancing the one we have?
> ____________________
> Joe Horton
> SQL 2005 Business Intelligence
> Aging & Disability Services Administration
|||Hi Joe,
I'm involved in a data warehouse project. The client is already invested in SQL
Server 2005.
Coming directly to the point:
you must have set up server(s) for your data warehouse. I wanted to know
whether you were involved in server sizing / data sizing. If yes, can
you please guide me as to which method / template you used to size your
data, and based on it, what hardware (with configuration) you suggested /
decided on?
Please help, this is very critical and urgent!
Regards,
Bharat Tamhankar
"Joe" wrote:

>
> We are in process of using new Kimball toolkit book as our guide
> to create another DM. Since we only have one data source, we think we
> need to pull in all the tables into the staging area. At first we wanted
> to just pull in minimal data to populate the Dimension/Fact tables and any
> information needed to derive columns.
> However the business area wants a complete snapshot of ALL THE
> data for this given subject matter. Their reason is that sometimes our
> CUBES don't have all the information they want, and they want to be able
> to query the data mart tables against the staging area tables using SQL or
> Access. Since our data mart load times is first Saturday of the month -
> this argues in favor of this. If the load process was every day or once a
> week - it might be better to just query across the data mart and the
> reporting (replicated production) database - the the data mart tables
> would be older than the information they were linking against on the
> replicated report DB using the 1st Saturday approach.
>
> Questions:
> 1. Is this the correct approach?
> 2. According to a Microsoft diagram, the load goes from Source to
> Staging to OLAP DB - which is really just another OLTP DB but the tables
> are modeled as Dimension/Fact tables, right?
> 3. Our existing data mart which everyone mostly uses CUBES to
> access, has a big manual step. After all the ETL happens and the Dim/Fact
> tables have been updated - we are having to open up SSAS and process the
> cubes, the first business day after the 1st Saturday, to get them
> refreshed - can't this be automated within SSIS?
> 4. There have been some major production changes to our existing
> data source for our existing production data mart. All the ETL was done
> on SQL2k with stored proc's and DTS was only used as the policeman to
> control the flow. All the SCD processes were done and are working fine
> using SP's. We now have to make changes and are faced with having two
> complete different systems to manage - one with an ETL using the old
> method and the 2nd using SSIS. Some of the CUBES from the original data
> mart are going to be combined with our new cubes we are creating in our
> new system. To try and make it clearer - we have for example a Client
> dimension and a Client fact table that first is refreshed monthly with the
> old ETL, and now our new subject matter is now also going to affect the
> Client dimension and Client fact table that was created the first project.
> The old production changes it appears may be many weeks to adjust using
> the old ETL methods - so it begs the question, since all the ETL is
> working fine with DTS as flow master - is it perhaps better to get it all
> working with SSIS calling all the SP's and incorporate all our new ETL
> within SSIS using the new features that SSIS can offer?
> 5. When adding a 3rd subject matter data mart - is it a 3rd ETL
> package or just enhancing the one we have?
>
>
>
> ____________________
> Joe Horton
> SQL 2005 Business Intelligence
> Aging & Disability Services Administration
>
>
>
|||I'm sorry - the existing architecture was in place before I arrived and, as
my expertise is in coding, not hardware, I wouldn't have been much help in
that area anyhow.
"Bharat Tamhankar" <BharatTamhankar@.discussions.microsoft.com> wrote in
message news:2B8F4494-BADC-4F47-86BC-383C5C07F0E6@.microsoft.com...
> Hi Joe,
> I'm involved in a Data Warehouse project. Client is already invested in
> SQL
> Server 2005.
> Directly coming to the point..
> You must have setup server(s) for your data warehouse. I wanted to know
> whether you were involved in Server Sizing / Data Sizing. If yes, then can
> you please guide me as to which method / template did you use to size your
> data and based on it what hardware (with configuration) did you suggest /
> decide?
> Please help this is very critical and urgent!
> Regards,
> Bharat Tamhankar
>
> "Joe" wrote:

ETL / Data mart / Cube question

We are in process of using new Kimball toolkit book as our guide t
o create another DM. Since we only have one data source, we think we need t
o pull in all the tables into the staging area. At first we wanted to just
pull in minimal data to populate the Dimension/Fact tables and any informati
on needed to derive columns.
However the business area wants a complete snapshot of ALL THE dat
a for this given subject matter. Their reason is that sometimes our CUBES d
on't have all the information they want, and they want to be able to query t
he data mart tables against the staging area tables using SQL or Access. Si
nce our data mart load times is first Saturday of the month - this argues in
favor of this. If the load process was every day or once a week - it might
be better to just query across the data mart and the reporting (replicated p
roduction) database - the the data mart tables would be older than the infor
mation they were linking against on the replicated report DB using the 1st S
aturday approach.
Questions:
1. Is this the correct approach?
2. According to a Microsoft diagram, the load goes from Source to Stag
ing to OLAP DB - which is really just another OLTP DB but the tables are mod
eled as Dimension/Fact tables, right?
3. Our existing data mart which everyone mostly uses CUBES to access,
has a big manual step. After all the ETL happens and the Dim/Fact tables ha
ve been updated - we are having to open up SSAS and process the cubes, the f
irst business day after the 1st Saturday, to get them refreshed - can't this
be automated within SSIS?
4. There have been some major production changes to our existing data
source for our existing production data mart. All the ETL was done on SQL2k
with stored proc's and DTS was only used as the policeman to control the fl
ow. All the SCD processes were done and are working fine using SP's. We no
w have to make changes and are faced with having two complete different syst
ems to manage - one with an ETL using the old method and the 2nd using SSIS.
Some of the CUBES from the original data mart are going to be combined wit
h our new cubes we are creating in our new system. To try and make it clear
er - we have for example a Client dimension and a Client fact table that fir
st is refreshed monthly with the old ETL, and now our new subject matter is
now also going to affect the Client dimension and Client fact table that was
created the first project. The old production changes it appears may be m
any weeks to adjust using the old ETL methods - so it begs the question, sin
ce all the ETL is working fine with DTS as flow master - is it perhaps bette
r to get it all working with SSIS calling all the SP's and incorporate all o
ur new ETL within SSIS using the new features that SSIS can offer?
5. When adding a 3rd subject matter data mart - is it a 3rd ETL packag
e or just enhancing the one we have?
____________________
Joe Horton
SQL 2005 Business Intelligence
Aging & Disability Services AdministrationI think that for 1, 2 and 3 you're right - you can schedule cube
process withing SSIS (there is a specific task to do that).
For 4 and 5 it's hard to say - it depends from the cost of
implementation and the cost of maintenance... SSIS packages are easier
to maintain in the long term, but you could have an higher cost in the
short term for the initial design. Mixing SSIS and SP's is certainly
feasible, even if it makes the solution harder to manage and also
performance can suffert compared to a native SSIS solution. Anyway, it
all depends by other constraints (volume of data, time of development
available, and so on).
Marco Russo
http://www.sqlbi.eu
http://www.sqljunkies.com/weblog/sqlbi
On Feb 1, 6:16 pm, "Joe" <hortoris...@.gmail.dot.com> wrote:
> =B7 We are in process of using new Kimball toolkit book as our gu=
ide to create another DM. Since we only have one data source, we think we =
need to pull in all the tables into the staging area. At first we wanted t=
o just pull in minimal data to populate the Dimension/Fact tables and any i=
nformation needed to derive columns. =20
> =B7 However the business area wants a complete snapshot of ALL TH=
E data for this given subject matter. Their reason is that sometimes our C=
UBES don't have all the information they want, and they want to be able to =
query the data mart tables against the staging area tables using SQL or Acc=
ess. Since our data mart load times is first Saturday of the month - this =
argues in favor of this. If the load process was every day or once a week -=
it might be better to just query across the data mart and the reporting (r=
eplicated production) database - the the data mart tables would be older th=
an the information they were linking against on the replicated report DB us=
ing the 1st Saturday approach.
> Questions:
> 1. Is this the correct approach? =20
> 2. According to a Microsoft diagram, the load goes from Source to S=
taging to OLAP DB - which is really just another OLTP DB but the tables are=
modeled as Dimension/Fact tables, right? =20
> 3. Our existing data mart which everyone mostly uses CUBES to acces=
s, has a big manual step. After all the ETL happens and the Dim/Fact table=
s have been updated - we are having to open up SSAS and process the cubes, =
the first business day after the 1st Saturday, to get them refreshed - can'=
t this be automated within SSIS?
> 4. There have been some major production changes to our existing da=
ta source for our existing production data mart. All the ETL was done on S=
QL2k with stored proc's and DTS was only used as the policeman to control t=
he flow. All the SCD processes were done and are working fine using SP's. =
We now have to make changes and are faced with having two complete differe=
nt systems to manage - one with an ETL using the old method and the 2nd usi=
ng SSIS. Some of the CUBES from the original data mart are going to be com=
bined with our new cubes we are creating in our new system. To try and mak=
e it clearer - we have for example a Client dimension and a Client fact tab=
le that first is refreshed monthly with the old ETL, and now our new subjec=
t matter is now also going to affect the Client dimension and Client fact t=
able that was created the first project. The old production changes it ap=
pears may be many weeks to adjust using the old ETL methods - so it begs th=
e question, since all the ETL is working fine with DTS as flow master - is =
it perhaps better to get it all working with SSIS calling all the SP's and =
incorporate all our new ETL within SSIS using the new features that SSIS ca=
n offer? =20
> 5. When adding a 3rd subject matter data mart - is it a 3rd ETL pac=
kage or just enhancing the one we have?
> ____________________
> Joe Horton
> SQL 2005 Business Intelligence
> Aging & Disability Services Administration|||Hi Joe,
I'm involved in a Data Warehouse project. Client is already invested in SQL
Server 2005.
Directly coming to the point..
You must have setup server(s) for your data warehouse. I wanted to know
whether you were involved in Server Sizing / Data Sizing. If yes, then can
you please guide me as to which method / template did you use to size your
data and based on it what hardware (with configuration) did you suggest /
decide?
Please help this is very critical and urgent!
Regards,
Bharat Tamhankar
"Joe" wrote:

>
> We are in process of using new Kimball toolkit book as our guide
> to create another DM. Since we only have one data source, we think we
> need to pull in all the tables into the staging area. At first we wanted
> to just pull in minimal data to populate the Dimension/Fact tables and any
> information needed to derive columns.
> However the business area wants a complete snapshot of ALL THE
> data for this given subject matter. Their reason is that sometimes our
> CUBES don't have all the information they want, and they want to be able
> to query the data mart tables against the staging area tables using SQL or
> Access. Since our data mart load times is first Saturday of the month -
> this argues in favor of this. If the load process was every day or once a
> week - it might be better to just query across the data mart and the
> reporting (replicated production) database - the the data mart tables
> would be older than the information they were linking against on the
> replicated report DB using the 1st Saturday approach.
>
> Questions:
> 1. Is this the correct approach?
> 2. According to a Microsoft diagram, the load goes from Source to
> Staging to OLAP DB - which is really just another OLTP DB but the tables
> are modeled as Dimension/Fact tables, right?
> 3. Our existing data mart which everyone mostly uses CUBES to
> access, has a big manual step. After all the ETL happens and the Dim/Fact
> tables have been updated - we are having to open up SSAS and process the
> cubes, the first business day after the 1st Saturday, to get them
> refreshed - can't this be automated within SSIS?
> 4. There have been some major production changes to our existing
> data source for our existing production data mart. All the ETL was done
> on SQL2k with stored proc's and DTS was only used as the policeman to
> control the flow. All the SCD processes were done and are working fine
> using SP's. We now have to make changes and are faced with having two
> complete different systems to manage - one with an ETL using the old
> method and the 2nd using SSIS. Some of the CUBES from the original data
> mart are going to be combined with our new cubes we are creating in our
> new system. To try and make it clearer - we have for example a Client
> dimension and a Client fact table that first is refreshed monthly with the
> old ETL, and now our new subject matter is now also going to affect the
> Client dimension and Client fact table that was created the first project.
> The old production changes it appears may be many weeks to adjust using
> the old ETL methods - so it begs the question, since all the ETL is
> working fine with DTS as flow master - is it perhaps better to get it all
> working with SSIS calling all the SP's and incorporate all our new ETL
> within SSIS using the new features that SSIS can offer?
> 5. When adding a 3rd subject matter data mart - is it a 3rd ETL
> package or just enhancing the one we have?
>
>
>
> ____________________
> Joe Horton
> SQL 2005 Business Intelligence
> Aging & Disability Services Administration
>
>
>|||I'm sorry - the existing architecture was in place before I arrived and as
my expertise is in coding not hardware - I wouldn't have been much help in
that area anyhow.
"Bharat Tamhankar" <BharatTamhankar@.discussions.microsoft.com> wrote in
message news:2B8F4494-BADC-4F47-86BC-383C5C07F0E6@.microsoft.com...
> Hi Joe,
> I'm involved in a Data Warehouse project. Client is already invested in
> SQL
> Server 2005.
> Directly coming to the point..
> You must have setup server(s) for your data warehouse. I wanted to know
> whether you were involved in Server Sizing / Data Sizing. If yes, then can
> you please guide me as to which method / template did you use to size your
> data and based on it what hardware (with configuration) did you suggest /
> decide?
> Please help this is very critical and urgent!
> Regards,
> Bharat Tamhankar
>
> "Joe" wrote:
>

ETL "tool cost" question

I have been told that many ETL tools used to cost hundreds of thousands of dollars. Is this true, or were they referring to the ETL and analysis tools together? Can anyone provide some actual product names and estimated "costs" that hit these marks?

TIA,

barkingdog

Hi,

I know one. DataStage costs just $180,000 for a 3-user license.

|||

It's difficult to get license costs for ETL products, as vendors typically offer a range of services / licenses / connectors / training etc.

Forrester Research published a good survey of the state of the market in 2004 - before SSIS was released. You can read a copy on Informatica's site - thanks INFA! http://www.informatica.com/solutions/resource_center/forrester_etl_wave04.pdf

The author of that research also published an interesting note on SSIS - thanks Phil!: http://www.microsoft.com/sql/technologies/integration/foresterreport.mspx

Prices (and companies) have changed since then. However, SSIS still shows an exceptional price / performance and price / features ratio. (And don't forget that free database in every box.)

Donald


ETL & BINARY_CHECKSUM?

We are currently populating our WH/DM using only the tools available in SQL2K
(i.e. no Syncsort etc). Some of our modules (controlled via DTS) take,
understandably, a fair amount of time to process and I was wondering if there
was a trick I was missing to speed things up. Example:
In a data acquisition step for a Junk dimension (containing various state
flags) I extract CustomerURN and 15 flags into a work table. I then
select the DISTINCT 15 flags into a table that will eventually be the
dimension for publication. I now need a cross-reference table so that other
processes can link on CustomerURN and pick up the dimension key. The obvious
SQL method is simply to join the tables on the 15 columns and output the xref
table - which takes a fair amount of time, but it does work.
Finally to the question: is there a quicker way of doing this? I have looked
at using BINARY_CHECKSUM but believe that this will not guarantee a unique
value that can be used for the match. Anybody got any ideas how to improve
on the method I am using?
Thanks,
Paul
Hi Paul,
1. Checksum etc...A common problem.
We have released free utilities for win2000+ and one of them is a
utility to generate deltas given two versions of a file. You give it
yesterdays file and todays file (or just current version and previous
version) and it spits out Insert/Update/Delete records which must have
been applied to previous version to get current version. It is fully
null aware if you have nulls in your source data. We also publish the
source code for our free software and it is on the web site too. ( See
http://www.instantbi.com/Default.aspx?tabid=30 and the section on
Instant Data Warehouse utilities. Anyone is welcome to use our free
software...the utilities are things we have developed over many years
for our own projects so we think they are still useful...)
Checksums (CRCs) are frequently used but I choose not to use them
because there is still a chance to lose a change. And my feeling is
that even though it is 1 in 4 Billion I'd rather be sure.....;-)
2. Processing speeds.
Well depends on your tools...and how you design your jobs....we have
fee software in this area.....the biggest speed up you can get is
using files rather than using the database...
Files vs database is about 10 to 1. When we need to we use files as
much as possible for ETL. Life is much better now that machines are
faster but files are still the way to go if you need the
speed.......also for the really expensive processing we typically
load data into memory and use binary searches...in our fee software we
even do things like load data into memory mapped IO and have many
processes share the memory in the memory mapped IO...this gives us
great speed......
In our free software we do things like reformat the data so it can be
loaded via the database loaders...we also do things like run through
the input file and delete any rows that exist in the target table so
that the loader can be certain no constraint clashes will happen when
the loader runs....
These are all things that you have to write into DTS...I'm not sure
what is included in the next version of DTS...not keen to load 2005
betas onto my laptop.. ;-)
Hope this helps...
Peter Nolan
www.peternolan.com
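
A middle ground worth noting between the full 15-column join and trusting BINARY_CHECKSUM alone is to persist the checksum on both tables, index it on the dimension, and use it only to narrow the join while still comparing all of the columns to confirm the match. A rough T-SQL sketch, with invented table and column names and only three of the fifteen flags shown:

-- Assumes the flag columns are NOT NULL; wrap them in ISNULL() otherwise.
ALTER TABLE dbo.CustomerFlagsWork ADD FlagChecksum INT NULL
ALTER TABLE dbo.JunkDim ADD FlagChecksum INT NULL
GO
UPDATE dbo.CustomerFlagsWork SET FlagChecksum = BINARY_CHECKSUM(Flag1, Flag2, Flag3)
UPDATE dbo.JunkDim SET FlagChecksum = BINARY_CHECKSUM(Flag1, Flag2, Flag3)
CREATE INDEX IX_JunkDim_FlagChecksum ON dbo.JunkDim (FlagChecksum)
GO
INSERT INTO dbo.CustomerJunkXref (CustomerURN, JunkKey)
SELECT w.CustomerURN, d.JunkKey
FROM dbo.CustomerFlagsWork AS w
JOIN dbo.JunkDim AS d
  ON  d.FlagChecksum = w.FlagChecksum   -- cheap narrowing filter
  AND d.Flag1 = w.Flag1                 -- comparing the real columns still
  AND d.Flag2 = w.Flag2                 -- guards against checksum collisions
  AND d.Flag3 = w.Flag3                 -- ...and so on for all 15 flags

The checksum equality prunes almost all non-matching rows cheaply, and the remaining column comparisons make any collisions harmless.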

ETL - cancel changes and going back to a previous state

On Dec 18 2006, 9:43 pm, "Chris Le Roquais"
<christophe.leroqu...@.gmail.com> wrote:
> Hi there,
> I would like to know if this is a usual practice to having an ETL able to
> cancel (rollback) the move of data that has been done into a destination.
> In other words, after having inserted data into a destination, can we go
> back to a previous state?
> Is that common to do that? How to do that? What do we need to consider
> before implementing such a feature? What tools/solutions are available to
> activate this feature?
> I personaly use MS SSIS as an ETL tool
> Thanks,
>
Christophe,
Common practice (See Kimball DW Toolkit) is to maintain an audit table
in the datawarehouse. Each row inserted/updated in the datawarehouse
is then assigned an AuditKey such that all records associated with a
Load can be identified.
Rgds,
Garrett
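
As a rough illustration of the audit-key pattern described above (all names here are invented, and the fact table is assumed to carry an AuditKey column), something along these lines works for loads that only insert rows:

-- One row per load run; every row written by the load carries its AuditKey.
CREATE TABLE dbo.AuditLoad (
    AuditKey    INT IDENTITY(1,1) PRIMARY KEY,
    PackageName VARCHAR(100) NOT NULL,
    LoadStart   DATETIME NOT NULL DEFAULT GETDATE(),
    LoadStatus  VARCHAR(20) NOT NULL DEFAULT 'Running'
)
GO
DECLARE @AuditKey INT
INSERT INTO dbo.AuditLoad (PackageName) VALUES ('LoadSalesMart.dtsx')
SET @AuditKey = SCOPE_IDENTITY()
-- The load stamps every inserted row, e.g.
-- INSERT INTO dbo.FactSales (..., AuditKey) SELECT ..., @AuditKey FROM staging ...
-- Backing the load out is then just:
DELETE FROM dbo.FactSales WHERE AuditKey = @AuditKey
UPDATE dbo.AuditLoad SET LoadStatus = 'Backed out' WHERE AuditKey = @AuditKey

Rows that were updated rather than inserted (SCD changes) are harder, because the prior values have to be preserved before they can be restored - which is the point made in the next reply.
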
I'm just guessing here - but I would imagine SCD transforms would be
difficult to rollback - but not identify.
<garrett.fitzsimons@.gmail.com> wrote in message
news:1170776521.710994.166840@.m58g2000cwm.googlegroups.com...
> On Dec 18 2006, 9:43 pm, "Chris Le Roquais"
> <christophe.leroqu...@.gmail.com> wrote:
> Christophe,
> Common practice (See Kimball DW Toolkit) is to maintain an audit table
> in the datawarehouse. Each row inserted/updated in the datawarehouse
> is then assigned an AuditKey such that all records associated with a
> Load can be identified.
> Rgds,
> Garrett
>
>

ETL - cancel changes and going back to a previous state

Hi there,
I would like to know if it is a usual practice to have an ETL able to
cancel (rollback) the move of data that has been done into a destination.
In other words, after having inserted data into a destination, can we go
back to a previous state?
Is it common to do that? How do we do that? What do we need to consider
before implementing such a feature? What tools/solutions are available to
activate this feature?
I personally use MS SSIS as an ETL tool.
Thanks,
Christophe
On Dec 18 2006, 9:43 pm, "Chris Le Roquais"
<christophe.leroqu...@.gmail.com> wrote:
> Hi there,
> I would like to know if this is a usual practice to having an ETL able to
> cancel (rollback) the move of data that has been done into a destination.
> In other words, after having inserted data into a destination, can we go
> back to a previous state?
> Is that common to do that? How to do that? What do we need to consider
> before implementing such a feature? What tools/solutions are available to
> activate this feature?
> I personaly use MS SSIS as an ETL tool
> Thanks,
>
Christophe,
Common practice (See Kimball DW Toolkit) is to maintain an audit table
in the datawarehouse. Each row inserted/updated in the datawarehouse
is then assigned an AuditKey such that all records associated with a
Load can be identified.
Rgds,
Garrett|||I'm just guessing here - but I would imagine SCD transforms would be
difficult to rollback - but not identify.
<garrett.fitzsimons@.gmail.com> wrote in message
news:1170776521.710994.166840@.m58g2000cwm.googlegroups.com...
> On Dec 18 2006, 9:43 pm, "Chris Le Roquais"
> <christophe.leroqu...@.gmail.com> wrote:
> Christophe,
> Common practice (See Kimball DW Toolkit) is to maintain an audit table
> in the datawarehouse. Each row inserted/updated in the datawarehouse
> is then assigned an AuditKey such that all records associated with a
> Load can be identified.
> Rgds,
> Garrett
>
>

eternal lock?

Hi,

does sb recognize (aspects of) the following problem? Or better, know a
solution or direction to search?

At work I've inherited a series of delphi applications that access a common
database using SQL Server 2000 (sp3, sp4 update in preparation due to this
problem). Applications run on one server, db on the second. Both are dual
xeon 2.8 or 3 GHz with 2 GB ram. The apps use about one GB (800MB) memory,
the db too (is configured to use more, 1.8GB, but doesn't.) The db is also
replicated to a third machine.

The problem is that sometimes, after a cascade of query timeouts (recorded
by the apps in the eventlog; the cause is the commandtimeout set on all
components), the whole application seems to stop responding. Restarting the
apps doesn't solve the problem, but rebooting the application server does, which
leads me to believe the problem is in MDAC on the app server? The app server
has its own unused sql server instance (used in migrations) btw.

The problems occur during busier times, but nothing spectacular (up to
ten-thousand of queries per hour maybe).

The problem sometimes goes away after a few minutes in about half of the
cases, but if not, it seems perpetual till reboot (at least 13 hours).

Another notable point is that not all queries time out, most writes (which
append a row or change a row) seem to go ok, same with selects that get a
record for a primary key value, and pure read selects flagged with NOLOCK.
The queries that go wrong all get lists that touch central tables (either
directly or via joins).

The behaviour is consistent with an external row/page lock somewhere that
doesn't go away.

Database layout is fairly uninteresting. A db or 3 (one read-only), the
larger one having say 30 tables,

cardinality of the tables is not that much of a problem. Tens of thousands
of rows max, except a logging table with maybe 300000 tuples. (which is only
traversed for mgmnt info, and not during busy hours) No binary or other
disproportionally large fields, Most db access done based on primary/foreign
keys.

Other details:
- Replication overhead can be considered low (we are talking about
thousand(s) mutationsper day, nothing significant.
- commandtimeout on all db components is set (to 30s)
- all cursors are clientside, except the component used for getting lists,
that has
location=cluseserver; cursortype=ctopenforwardonly;
cachesize=250; locktype=readonly
- the apps are not threaded.
- D6 patched with all three patches

Thanks in advance
Marco van de Voort (marcov@.stack.nl) writes:
> The problem is that sometimes, after a cascade of query timeouts
> (recorded by the apps in the eventlog, cause is the commandtime set on
> all components), the whole applications seems to stop responding.
> Restarting the apps doesn't solve the problem, rebooting the application
> server does, which leads me to believe the problem is in MDAC on the app
> server? The app server has an own unused sql server instance (used in
> migrations) btw.

Have you examined blocking?

With this superficial information about the system it is difficult to
say for sure, but it does smell of a well-known gotcha (been there, done
that myself).

To wit, if a query times out, and there is a transaction in progress,
the transaction is not rolled back automatically. It is irrelevant
whether the transaction was started prior to the submission of the
query batch, or started within the query batch that timed out.

The application must handle this by submitting

IF @@trancount > 0 ROLLBACK TRANSACTION

in case of a query timeout.

If the application fails to observe this, the result is chaos.
Transactions never commits, which means that processes keeps on
acquiring more and more locks, and you get blocking galore. And
when you finally restart something, you lose all the updates...
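
While the hang is in progress, a quick (if crude) way to check for blocking and for orphaned transactions on SQL 2000 is to query sysprocesses directly, for example:

-- Sessions that are blocked, and which spid is blocking them.
SELECT spid, blocked, waittime, lastwaittype, hostname, program_name
FROM master.dbo.sysprocesses
WHERE blocked <> 0

-- Idle sessions still holding an open transaction: the classic sign of a
-- timed-out batch whose transaction was never rolled back.
SELECT spid, open_tran, status, last_batch, hostname, program_name
FROM master.dbo.sysprocesses
WHERE open_tran > 0 AND status = 'sleeping'

-- For a suspect session, show the last statement it submitted.
DBCC INPUTBUFFER (53)   -- replace 53 with the spid in question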

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||On 2006-05-22, Erland Sommarskog <esquel@.sommarskog.se> wrote:
> To wit, if a query times out, and there is a transaction in progress,
> the transaction is not rolled back automatically. It is irrelevant
> whether the transaction was started prior to the submission of the
> query batch, or started within the query batch that timed out.
> The application must handle this by submitting
> IF @@trancount > 0 ROLLBACK TRANSACTION

I call the rollback of the relevant ADO connection on the exception caused
by the timeout.|||Marco van de Voort (marcov@.stack.nl) writes:
>> To wit, if a query times out, and there is a transaction in progress,
>> the transaction is not rolled back automatically. It is irrelevant
>> whether the transaction was started prior to the submission of the
>> query batch, or started within the query batch that timed out.
>>
>> The application must handle this by submitting
>>
>> IF @@trancount > 0 ROLLBACK TRANSACTION
> I call the rollback of the relevant ADO connection on the exception caused
> by the timeout.

So, did you investigate whether you have any blocking?

Also, have you examined the SQL Server error log?

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||On 2006-05-23, Erland Sommarskog <esquel@.sommarskog.se> wrote:
> Marco van de Voort (marcov@.stack.nl) writes:
>>> To wit, if a query times out, and there is a transaction in progress,
>>> the transaction is not rolled back automatically. It is irrelevant
>>> whether the transaction was started prior to the submission of the
>>> query batch, or started within the query batch that timed out.
>>>
>>> The application must handle this by submitting
>>>
>>> IF @@trancount > 0 ROLLBACK TRANSACTION
>>
>> I call the rollback of the relevant ADO connection on the exception caused
>> by the timeout.

(for the record: I already did this all the time, it is not a change)

> So, did you investigate whether you have any blocking?

It occurs less than once a month (unfortunately on a painful moment). IOW, I
can't reproduce it at will. Which is why I asked if sb recognized the
problems.

> Also, have you examined the SQL Server error log?

Yes, nothing. But maybe my logging settings are wrong.|||Marco van de Voort (marcov@.stack.nl) writes:
> It occurs less than once a month (unfortunately on a painful moment).
> IOW, I can't reproduce it at will. Which is why I asked if sb recognized
> the problems.

I'm afraid that without access to the real situation, it is difficult
to say anything very intelligent. The fact that it occurs only rarely
indicates that it is not a general problem with unhandled query timeouts.
But it still smells of transactions that are not committed/rolled back
properly.

The only thing I can suggest is that when it happens, try to collect as
much data as you can before the reboot panic sets in. I have a stored
procedure on my web site which is good for this purpose:
http://www.sommarskog.se/sqlutil/aba_lockinfo.html.

>> Also, I have you examined the SQL Server error log?
> Yes, nothing. But maybe my logging settings are wrong.

It's always good to have trace flags 1204 and 3605 enabled to get
deadlock information written to the error log, but that was not what I
had in mind. I was thinking of error 17883, which indicates that
SQL Server is in bad shape at the moment. This diagnostic message
was added in SQL 2000 SP3, and was augmented in some hotfixes soon
thereafter. SP4 has an even wider set of these messages.

The fact that you don't have message 17883 in the log amplifies the
impression that the problem is related to the application.
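
For reference, the deadlock trace flags can be switched on without a restart (or added as -T1204 -T3605 startup parameters so they survive one):

DBCC TRACEON (1204, 3605, -1)   -- -1 = enable globally, for all connections
DBCC TRACESTATUS (-1)           -- verify which trace flags are active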

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

Eternal 'Enable Pending' on SQL2005

Hi!

I have this problem... im kinda new to sql2005, and im responsible for creating some notification services.

Ive done a lot of tutorials, books and samples and every time, when i reach the step of enabling the instance, i get this...

Generator-Enable Pending

Distributor-Enable Pending

Even doing the Microsoft's Notification Services Tutorial.

Ive tried everything I could figure, registering the instance with a user with every permission there could be... doing this local or on a client... and ive been 2 weeks stuck here...

Any tip or clue will be appreciated.

M.Eng.Leandro Melendez

Have you started the Windows Service for the instance? If you enable an instance when the service hasn't been started, it'll report a status of pending.

HTH...

|||

Well, i thought that i should start the windows service after enabling the instance... but ive tried also that, ive already tried to start it from the 'SQL Server Management Studio' and from 'Services' and both of them give me this message:

TITLE: System.ServiceProcess
Cannot start service NS$Tutorial on computer 'CROSS-AIPZ'.

ADDITIONAL INFORMATION:

The service did not respond to the start or control request in a timely fashion

Any clue what does this means?

|||

That's usually a permissions problem. Check to make sure that the service account has the appropriate permissions in the NS databases.

HTH...

|||Ive tried to give the user the permissions that the tutorial says, also ive tried to give the user every permission i could imagine... and nothing.... do you know what permissions i must give? maybe the tutorial is missing one|||

Generally speaking, you want to make the login account used by the NS$InstanceName service (either Windows or SQL Server) a member of the NSRunService database role.
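
For what it's worth, on SQL Server 2005 that boils down to something like the following (the database name and the account are placeholders for the Notification Services instance database and the service account):

USE TutorialNSMain   -- placeholder: the NS instance database
GO
-- Assumes the login already exists at the server level (CREATE LOGIN otherwise).
CREATE USER [MYDOMAIN\NSService] FOR LOGIN [MYDOMAIN\NSService]
GO
EXEC sp_addrolemember N'NSRunService', N'MYDOMAIN\NSService'
GO

Depending on the configuration, the application database(s) may need the same treatment.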

Check out this page in BOL for more detail.

ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/sqlntsv9/html/5a70a24c-25cf-4b77-a3a3-1cdf696d9262.htm

HTH...

Joe

|||

Here's another BOL link that may help, too.

ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/sqlntsv9/html/c20ece76-eca0-463d-a226-ba8e93c9fcb9.htm

Joe

|||

Yeah, i did that and everything i could...

After many searches ive found a comment saying that everything worked after reinstalling sql server 2005...

I did it, uninstall and install... and guess what... it just worked smoothly... any idea why this happens?

Anyway, thanks all

|||

Ouch! That's a pretty drastic step! Glad it worked and thanks for posting the solution.

Haven't had that problem before (thank goodness), so no...no idea what's going on there.

Joe

Estimated Time for SP

Hi,
How can i know estimated time taken by a SP before actual execution ?
thanks
Check from the Execution Plan.
best Regards,
Chandra
---
"DMP" wrote:

> Hi,
> How can i know estimated time taken by a SP before actual execution ?
> thanks
>
>|||Sorry, I think you don't understand my requirement.
Basically, I am calling a SP from VB and showing a progress bar while the SP is
running.
How can I control progressBar.Value while the SP is running?
Any idea how to increase the ProgressBar symmetrically......
thanks,
"Chandra" <Chandra@.discussions.microsoft.com> wrote in message
news:C44599AA-74D3-40B8-BC59-3F0C44B140C8@.microsoft.com...
> check from the Execution Plan.
>
> --
> best Regards,
> Chandra
> ---
>
> "DMP" wrote:
>|||Sorry, I think you don't understand my requirement.
Basically, I am calling a SP from VB and showing a progressbar till SP is
running.
How can I increase the progressbar value symmetrically?
Any idea ?
"Chandra" <Chandra@.discussions.microsoft.com> wrote in message
news:C44599AA-74D3-40B8-BC59-3F0C44B140C8@.microsoft.com...
> check from the Execution Plan.
>
> --
> best Regards,
> Chandra
> ---
>
> "DMP" wrote:
>|||DMP wrote:
> Hi,
> How can i know estimated time taken by a SP before actual execution ?
> thanks
There is no way to know how long a stored procedure (or any query) is
going to take to execute, short of performing some tests. In order to
display a progress bar, you'll need to use async execution. If it's a
short query, you're better off not worrying about the progress bar
(whether you use an async call or not). If you're using .Net, then an
async call is not possible. You'll have to run it on another thread if
you don't want to tie up the main application thread during execution.
If you're fetching rows, you can provide some feedback on screen while
the fetching is taking place since by that time control has returned to
the application.
David Gugick
Imceda Software
www.imceda.com|||Hi
I don't think this is possible. What you can do is gather information about
how long it will normally take and then use that time as a guideline. If this
is taking a long time you may think of changing the method of implementation.
You may be able to split the process into smaller units or use a different
delivery mechanism that does not require the user to watch a progress bar
tick away!
John
"DMP" wrote:

> Sorry,Ithink you don't understand my requirement.
> Basically, I am calling a SP from VB and showing a progressbar till SP is
> running.
> How can i control to increase the progressbar value symetrically ?
> Any idea ?
>
> "Chandra" <Chandra@.discussions.microsoft.com> wrote in message
> news:C44599AA-74D3-40B8-BC59-3F0C44B140C8@.microsoft.com...
>
>|||Basically I call a SP from VB and show a ProgressBar during SP execution.
How can I increase the progressbar symmetrically?
That is why I needed the sp execution time.
Or any good sugg.
Thanks,
"DMP" <debdulal.mahapatra@.fi-tek.co.in> wrote in message
news:O0BsDvfUFHA.3624@.TK2MSFTNGP10.phx.gbl...
> Hi,
> How can i know estimated time taken by a SP before actual execution ?
> thanks
>|||Although it doesn't give you percentage completion in time, you could track
where a proc is in its execution by having it write to a table at certain
points in the code. If you run it async as other people have pointed out,
then another thread could query your "progress" table and determine how far
along the first proc is.
Pretty kludgy but might be what you want.
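
A bare-bones version of that idea, with invented names: the caller registers a run, the procedure bumps a counter after each logical step, and the progress-bar code polls the table from a second connection.

CREATE TABLE dbo.ProcProgress (
    RunId      INT PRIMARY KEY,
    StepsDone  INT NOT NULL,
    TotalSteps INT NOT NULL,
    UpdatedAt  DATETIME NOT NULL
)
GO
DECLARE @RunId INT
SET @RunId = 1

-- The caller registers the run before starting the procedure.
INSERT INTO dbo.ProcProgress (RunId, StepsDone, TotalSteps, UpdatedAt)
VALUES (@RunId, 0, 10, GETDATE())

-- Inside the stored procedure, after each logical step:
UPDATE dbo.ProcProgress
SET StepsDone = StepsDone + 1, UpdatedAt = GETDATE()
WHERE RunId = @RunId

-- The progress-bar thread polls on its own connection:
SELECT CAST(StepsDone AS FLOAT) / TotalSteps AS FractionComplete
FROM dbo.ProcProgress
WHERE RunId = @RunId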
"DMP" wrote:

> Basically I call a SP from VB and showing a ProgressBar till SP execution.
> How can I increase the progressbar symetrically ?
> That is why I needed the sp execution time.
> Or any good sugg.
> Thanks,
> "DMP" <debdulal.mahapatra@.fi-tek.co.in> wrote in message
> news:O0BsDvfUFHA.3624@.TK2MSFTNGP10.phx.gbl...
>
>

estimation of the buffer cache hit ratio

Hi

I have trouble with MSSQL2000 SP4 (without any hotfixes). During the last two
weeks it started working abnormally. After the last optimization (a few months
ago) it worked well (fast, without blocks). Its buffer cache hit ratio was
about 99.7-99.8. Yesterday it started working slowly; there were many blocks
and deadlocks. No queries, jobs or applications were added. Now the
buffer cache hit ratio oscillates around 95-98. I tried updating statistics and
reindexing some heavily used tables, but there is no effect, or the effect is
very short (after a few hours the problem returns).

Maybe someone knows what it could be?

Is it possible to estimate (using DBCC SHOW_STATISTICS or
DBCC SHOWCONTIG or others) how much each table affects the total buffer cache
hit ratio?

Marek

--
www.programowanieobiektowe.pl
Marek Wierzbicki (marek.wierzbickiiiii@.azymuttttt.pl) writes:

Quote:

Originally Posted by

I have trouble with MSSQL2000 SP4 (without any hotfixes). During the last
two weeks it started working abnormally. After the last optimization (a few
months ago) it worked well (fast, without blocks). Its buffer cache hit
ratio was about 99.7-99.8. Yesterday it started working slowly; there were
many blocks and deadlocks. No queries, jobs or applications were
added. Now the buffer cache hit ratio oscillates around 95-98. I tried
updating statistics and reindexing some heavily used tables, but there is no
effect, or the effect is very short (after a few hours the problem returns).
>
Maybe someone knows what it could be?


I would run Profiler and look for long-running queries. As your amount of
data grows and statistics changes, the optimizer may go for a new plan.

Theoretically, you could also run into this when the amount of data
increases over a threshold value and the memory no longer suffices for the
typical mix of queries.
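
As an aside, the counter itself can be watched from T-SQL on SQL 2000 via sysperfinfo; the ratio has to be computed from the raw counter and its base, roughly:

SELECT CAST(100.0 * r.cntr_value / NULLIF(b.cntr_value, 0) AS DECIMAL(5,2)) AS BufferCacheHitRatio
FROM master.dbo.sysperfinfo r
JOIN master.dbo.sysperfinfo b ON b.object_name = r.object_name
WHERE r.object_name = 'SQLServer:Buffer Manager'   -- 'MSSQL$InstanceName:Buffer Manager' on a named instance
  AND r.counter_name = 'Buffer cache hit ratio'
  AND b.counter_name = 'Buffer cache hit ratio base'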

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

Estimation of Full Backup size

Has anyone come across a more accurate method than sp_spaceused to estimate the size of a full database backup for SQL Server 2000 ?

I have found this to have too great a variance (even after running updateusage) to rely on any accuracy for it. I have also looked at perhaps using the ALLOCATED Pages indicated in the GAM pages but this also seems to be pretty inaccurate.

I have a number of servers where space can be limited and backups using Maintenance Plans have occasionally failed because they delete the old backups AFTER they do the latest one. I am writing a script which can check the space remaining and adjust the backup accordingly but the variance I have observed so far with sp_spaceused is too great.

Any ideas welcomed.

Full backups copy all pages within allocated extents. So, there may be a few pages which are not allocated, but which are copied to the backup. Also, there is a small percentage of overhead in the backup format, but this should be predictable.

The most accurate estimate would be to use updateusage and then look at the reserved column of sp_spaceused.
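
In practice that amounts to something like this, run in the database being sized (sp_spaceused reports in KB):

DBCC UPDATEUSAGE (0)   -- 0 = the current database; corrects the allocation counts
EXEC sp_spaceused      -- 'reserved' in the first result set approximates the data
                       -- portion of the backup; the log generated during the
                       -- backup is added on top of that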

What is the variance that you are seeing? It should be fairly small.

The real bottom line is that if you have servers which you know occasionally run out of backup space, they need to have storage added so that you can ensure the safety of your databases. You can minimize the space requirements by compressing the backups and/or copying them off the server as soon as they are created, but you need the space there.

|||

Thanks for that Kevin

I need to do a few more tests to confirm the variance that I am seeing - it seemed to be inconsistent for a couple of databases (shooting up to 4MB) after being within about 85KB - 90KB for all of the others so there may have been something else happening.

You mention all pages within allocated extents - does that mean that I could use the GAM or SGAM pages to count the number of allocated extents and use this as an estimate? I already have a method of doing this for estimating differential backups using the DIFF_MAP pages.

Martin

|||

The other component of a backup is that the portion of the log created during the execution of the backup is also backed up. This is required in order to get a consistent view of the data at a single point in time.

Depending on the activity level and size of your database, this could easily account for this much variability and more.

Lets look at this from a different angle:

What are your business requirements around backups that are driving your backup plans? If we understand those, we can perhaps better optimize your strategy. I find that the best way to craft a backup strategy is to start from the recovery requirements and work backwards to come up with a plan which will support those needs.

What are the sizes of the databases you are backing up?