Saturday, November 9, 2013

Choosing a Data Type for Keys (Identity, Sequence object, Nonsequential GUI, Sequential GUI)

The typical options people use to generate key columns are:
  • The identity column property A property that automatically generates keys in an attribute of a numeric type with a scale of 0; namely, any integer type (TINYINT, SMALLINT, INT, BIGINT) or NUMERIC/DECIMAL with a scale of 0.
  • The sequence object An independent object in the database from which you can obtain new sequence values. Like identity, it supports any numeric type with a scale of 0. Unlike identity, it’s not tied to a particular column; instead, as mentioned, it is an independent object in the database. You can also request a new value from a sequence object before using it.
  • Nonsequential GUI Ds You can generate nonsequential global unique identifiers to  be stored in an attribute of a UNIQUEIDENTIFIER type. You can use the T-SQL function NEWID to generate a new GUID, possibly invoking it with a default expression attached to the column. You can also generate one from anywhere—for example, the client  by using an application programming interface (API) that generates a new GUID. The GUIDs are guaranteed to be unique across space and time.
  • Sequential GUI Ds You can generate sequential GUIDs within the machine by using the T-SQL function NEWSEQUENTIALID.
  • Custom solutions If you do not want to use the built-in tools that SQL Server provides to generate keys, you need to develop your own custom solution. The data type for the key then depends on your solution.

Why include another conversion function– PARSE () and TRY_PARSE ()

PARSE is new T-SQL and relies on the .NET framework Common language Runtime (CLR). This function same as CAST and CONVERT with the additional parameter called culture.
Syntax:
There are two required arguments and one optional argument that are passed into the function.
  • String_value: nvarchar(4000) value representing the formatted value to parse into the specified data type.
  • Data_type: Literal value representing the data type requiested for the result.
  • Culture: (Optional) string that identifies the culture in which string_value is formatted.





here is a another new function introduce with SQL 2012:  TRY_PARSE()

The different between PARSE() and TRY_PARSE()  is, if we get one or more incorrect values, PARSE() function will through an error.  However, if we use TRY_PARSE() function, it will not throw error but will return the result as NULL.




Why include another conversion function? The PARSE over CAST or CONVERT is the ability of convert the string to a numeric or date and time value without have to use string functions to manipulate the input. For example, if you take date, based on region position of the day, month, and year may change. The following example clearly illustrates the usefulness. Even first two string values are different; due to culture option output is same. In second and third query both value are same but the output different, due to the different culture.
us_english : en-US
British : en-GB



I also would write other conversion functions as well:
Syntax for the CAST and CONVET






Note: TRY_CONVERT() also new function introduced with SQL Server 2012 version.
References:



Cheers!

Friday, November 8, 2013

MDX for KPI

Returns the member that calculates the value of the specified Key Performance Indicator (KPI)


Common syntax : KPIValue(KPI_Name)


Example 1: Here “Internet Revenue” is a KPI name.




Example 2: 


Cheers!

Thursday, November 7, 2013

Basic MDX Queries

The general syntax for an MDX query is as follows:



As my mention in my previous post, the terms Columns and Rows are simply aliases for the true names of the axes: Axis(0) and Axis(1) respectively. Technically, an MDX query can have up to 128 axes, with aliases for the first five:
1.                   Columns
2.                   Rows
3.                   Pages
4.                   Sections
5.                   Chapters



The following all three queries are same:

SELECT
Measures.[Internet Sales Amount] ON COLUMNS,
[Customer].[Country].MEMBERS ON ROWS,
[Product].[Product Line].MEMBERS ON PAGES
FROM [Adventure Works]


SELECT
Measures.[Internet Sales Amount] ON 0,
[Customer].[Country].MEMBERS ON 1,
[Product].[Product Line].MEMBERS ON 2
FROM [Adventure Works]


SELECT
Measures.[Internet Sales Amount] ON AXIS(0),
[Customer].[Country].MEMBERS ON AXIS(1),
[Product].[Product Line].MEMBERS ON AXIS(2)
FROM [Adventure Works]


Note: In MDX always lower axes is must, without COLUMNS can’t have ROWS or PAGES.





Run the below query, you will find the error: 



Here [M] is mentioning [Mountain]
The above query is return error, because we can use the same dimension in ROWS axis, so we can filter there.
For slicing, we can use other dimensions. For example if you want to filter for year 2007 sales amount.

The WITH Clause and Calculated Members



The MDX with clause provides you with the ability to create calculations and use them within the context of the query. In addition, you can also retrieve data from outside the context of the current cube using LookupCube MDX function.
Named Sets


This query is not too lengthy, but you can imagine a query that would contain a lot of members and functions being applied to this specific set several times within the query. Instead of specifying the complete set every time it ’ s used in the query, you can create a named set and then use it in the query as follows:

Calculated Members


Calculated members are calculations specified by MDX expressions. They are resolved as a result of MDX expression evaluation rather than just by the retrieval of the original fact data.


One other example: Ordering inside the name set


Cheers!

MDX basic concepts and syntax

In cube unique name for a member is dependent upon the dimension properties such as MemberUniqueNameStyle and HierarchyUniqueNameStyle. The algorithm is determining the unique name of members.


We can access members of a dimension using the name path (using name of the member) or the key path (using the key of the member).



We can access a member in a dimension with its dimension name, hierarchy name, and level name.


For example, member Q1 CY 2004 in the Calendar hierarchy is represented as:
[Date].[Calendar].[Calendar Quarter].[Q1 CY 2004]


The following three representations are also valid:
  • Hierarchy.Memeber name can be used whereas there are no two members with the same name. If Quarter name is called Q1 we can’t use this, because every year this Q1 repeats, so we should use level name.
                [Date].[Calendar].[Q1 CY 2004]


  • This is using all the members in the path.
            [Date].[Calendar].[CY 2004].[H1 CY 2004].[Q1 CY 2004]

  • Uses the key path.
            [Date].[Calendar].[Calendar Quarter].&[2004]&[1]

The keys of the members in a path represented as &[membername]. When we use the key path, the members are always preceded with the ampersand (&) symbol.
Ex: To access Canada member
[Customer].[Country].&[Canada]





Tuples: A tuple uniquely identifies a cell or a section of cube. A tuple is represented by one member from each dimension, separated by a comma, and is enclosed within parenthesis ().

For Example: Where Country=Australia AND Product=Mountain AND Period= Q1 CY 2004
([Date].[Calendar].[2004].[H1 CY 2004].[Q1 CY 2004], [Product].[Product Line].[Mountain],[Customer].[Country].[Australia])


A collection of tuples forms a new object called a set.


Set: An MDX set is a collection of tuples that are defined using the exact same set of dimensions, both in type and member. A set is specified within curly brace character ({ })
Ex: {(customer.Country.Australia),(Customer.Country.Canada)}

Wednesday, November 6, 2013

Dimension table design - when should have Parent Child structure

One of the important parts in OLAP design is to design the dimension table structure, because this design affects the Hierarchies in the cube. While I go through the AdventureWorksDW2012, I noticed that the difference table structure between the DimEmployee and DimGeography dimensions, the difference is DimGeography table not contains Parent Key replationship, but DimEmployee has the parent key relationship.

The reason that I found that, in a hierarchy like Geography would provide balanced hierarchy for the United States. Completely balanced hierarchy means that all the leaf (bottom level) notes would be an equal distance of top level. Some hierarchies are in dimensions can have an unbalanced distribution of leaf nodes relative to the top level. Such hierarchies are called unbalanced hierarchies. Employee structure is an example of unbalanced hierarchy. There are different depths to the hierarchy of the Employee, such as General Manager might have unit managers and administrative assistance, under unit managers might have supervisors, whereas administrative assistant would not have any direct reports. Some hierarchies are typically balanced but are missing a unique characteristic of some members in a level. Such hierarchies are called ragged hierarchies. If you take United States, Country (USA)-->State(Washington)-->City(Seattle), whereas in some countries like Greece only Country --> City, so here State level is missing.

Some example of completely balanced hierarchy: Geography, Date

Some example of unbalanced hierarchies: Organization, Employee

Cheers!

Finding difference between two dates considering Month and Date

In this post, I am going to explain about problem of calculating duration from the two dates. One of the real scenarios I faced was to calculate the Tenure (End Date - Start Date) and categorized into some grouping as follow:


'<3 months'     DATEDIFF(MM,StartDate,ISNULL(EndDay,GETDATE())) <3
'3-6 months'   DATEDIFF(MM,StartDate,ISNULL(EndDay,GETDATE())) <6
'6-12 months' DATEDIFF(MM,StartDate,ISNULL(EndDay,GETDATE())) <12
'1-2 years'    DATEDIFF(MM,StartDate,ISNULL(EndDay,GETDATE())) <24
'2-3 years'    DATEDIFF(MM,StartDate,ISNULL(EndDay,GETDATE())) <36
'3-5 years'    DATEDIFF(MM,StartDate,ISNULL(EndDay,GETDATE())) <60
'5-10 years'   DATEDIFF(MM,StartDate,ISNULL(EndDay,GETDATE())) <120
'>10 years'    DATEDIFF(MM,StartDate,ISNULL(EndDay,GETDATE())) >120


For example, if you look at the below query, it returns 12 so this is go under the 1-2 years group, but if you look at the dates fatefully, you will see that actual difference is less than year, because starting date is 28th Feb and End Date is 10th Feb, so different is less than a year. Because this query is not consider the date part, it only consider the Month part.





To solve this problem, instead of writing a function, can write a nested CASE statement as below:

CASE
WHEN DATEPART(day, StartDate) > DATEPART(day, EndDate)
THEN
CASE
WHEN DATEDIFF(MM,StartDate,ISNULL(EndDate,GETDATE()))-1 <3 THEN '<3 months'
WHEN DATEDIFF(MM,StartDate,ISNULL(EndDate,GETDATE()))-1  <6  THEN '3-6 months'
WHEN DATEDIFF(MM,StartDate,ISNULL(EndDate,GETDATE()))-1  <12  THEN '6-12 months'
WHEN DATEDIFF(MM,StartDate,ISNULL(EndDate,GETDATE()))-1  <24  THEN '1-2 years'
WHEN DATEDIFF(MM,StartDate,ISNULL(EndDate,GETDATE()))-1  <36  THEN '2-3 years'
WHEN DATEDIFF(MM,StartDate,ISNULL(EndDate,GETDATE()))-1  <60  THEN '3-5 years'
WHEN DATEDIFF(MM,StartDate,ISNULL(EndDate,GETDATE()))-1  <120  THEN '5-10 years'
ELSE  '>10 years'
END
ELSE
CASE
WHEN DATEDIFF(MM,StartDate,ISNULL(EndDate,GETDATE())) <3 THEN '<3 months'
WHEN DATEDIFF(MM,StartDate,ISNULL(EndDate,GETDATE())) <6  THEN '3-6 months'
WHEN DATEDIFF(MM,StartDate,ISNULL(EndDate,GETDATE())) <12  THEN '6-12 months'
WHEN DATEDIFF(MM,StartDate,ISNULL(EndDate,GETDATE())) <24  THEN '1-2 years'
WHEN DATEDIFF(MM,StartDate,ISNULL(EndDate,GETDATE()))<36  THEN '2-3 years'
WHEN DATEDIFF(MM,StartDate,ISNULL(EndDate,GETDATE())) <60  THEN '3-5 years'
WHEN DATEDIFF(MM,StartDate,ISNULL(EndDate,GETDATE()))<120  THEN '5-10 years'
ELSE  '>10 years'
END
END
AS Tenure
Cheers!