NT Conference 2015, Portorož, Slovenia

Join us on May 18th, 2015 at  20th  NTK Conference in  Portorož, Slovenia.

I will be presenting a “Dive into SQL Azure Machine Learning” exploring in cloud based brother of SQL Server SSAS and exploring the ML algorithms supported with practical use cases.

After the session,… yes. Drinks, swimming and sun bathing and evening partieeeee 😉

See you in Portorož!

Fastest date tally table

This is by far the fastest way for a date tally table to create:

select
convert(nvarchar(10), dateadd(day, row_number() over (order by (select 1)) -1 , getdate()) , 104)
from master..spt_values

And the best part it can be used with any table.

Happy querying.

Unicorn injected my T-SQL code

SQL Injection is a very known problem. After brief unicorn discussion on twitter with @DBArgenis and @way0utwest. I came to work today to find my T-SQL Code being injected by a unicorn!

Yes. By a unicorn. I quickly managed to find a work-around.

unicorn

and a work-in-progress T-SQL code:

DECLARE @string VARCHAR(1000);

SET @string = '

						  *
                         *
                    *** *
                  ,******
       ,,...     .,***"**)
      *  ****sp_who***** ""
      .* *****************  
     "  .**"**      *  .*
        *** (*      *  "
         *   *)      *
         *,   **     **   
'			    

			
SET @STRING =  LTRIM(RTRIM(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@string, ',',''),'(',''),')',''),'.',''),'"',''),'*',''),CHAR(32),''),CHAR(9),''),CHAR(10),''),CHAR(13),'')))
EXEC @STRING

I hope pixies and fairies will not pay me a visit tomorrow.

Connection Manager in SSIS

Connection Manager is SSIS technology being used and a location of the data being used.

connection_manager

It has a type of data source (or OLE DB Provider), Authentication (which is Identity to be used to connect to this data Source) and initial catalog. This connection manager has Properties window,

detailed_properties_of_connection_manager

where ConnectionString is defined, DataSourceID, InitialCatalog and Authentication.

This connection manager is used to connect to several different providers and connection types. It can be OLE DB connection, ODBC connection, a flat file connection, ADO.NET connection, different Analysis services connection, or any other connection (DQS, FTP, HTTP, etc.) . By default connection manager in SSIS supports:

Connection_managers_types_SSIS

Also new connections can be added here, based on your local ODBC Data Source Administrator.

In SSIS there is also Project Connection manager available, which enables several different packages to use same data source connection. On the other hand you can have a package level connection manager, which will be used only for this particular package.

In terms of architecture, deployment and big picture, a parametrization of connection strings. This is new to SQL Server 2012 SSIS and deserves a special post.

Measure types in fact table

Fact table holds the measures data for measuring the performance of your business. Your business might be sales, purchasing, inventory, logistics, banking, telephony data, and many more. Normally, one fact table represent a line of core business and it usually takes more fact tables to cover more complex business aspects (purchasing, selling, sales, etc.) Nevertheless, a fact table holds the necessary key metrics (based on business rules) for measuring the a line of business performance and setting up main and relevant key performance indexes (KPI).

Based on line of business, measures types in fact tables can be: fully-additive, semi-additive or non-additive.

Fully-additive (short: additive) fact table holds measures that are (and can be) grouped, summed through all the dimensions.

Semi-additive fact table holds measures that can be grouped, summed or aggregated through some dimensions but not all the dimensions (commonly: time dimension.

Non-additive fact table holds measures that can not be grouped, summed or aggregated in any aspect. These measures are normally derived and calculated measures, such as percentages, ratios, running sums  or any similar measures that are typical done on BI layer against the cube using MDX and not stored as a measure in fact table.

Be aware, a fact table can have all three types of measures (additive, semi-additive and non-additive) and therefore, BW designer must know the business in order to have measure type correctly defined.

Of course, all measures can be summed or grouped, but the outcome and results will be absurd. Usually – a rule of thumb – holds, that semi-additive or non-additive already hold some business rule, because both already have some logic applied to.

For better clarification, this is an example of customer purchases from a normal fact table.

Measure_types

All measures can be summed, but do you want to do this? NO! So clearly the only fully-additive measure is transaction_value, which can be summed and grouped along all dimensions (Date, CustomerID,…).

Semi-additive measure is credit_left, because it can not be grouped or summed along Date  (time) dimension, because each value of this measure is a snapshot from a certain time period. Imagine, grouping this measure, the more customer would be buying, the more credit he would have.

And non-additive is in this case a calculated measure Percent_of_ABC_products_bought, which can not be aggregated on none of the dimensions, but is represented for each line of fact table as it is.

Type of dimensions in Business Warehouse (BW)

Several types of dimensions in your Business warehouse:

Conformed: dimension shared among several fact tables (e.g.: a single product categorization that is used by all business aspects).

Non-Conformed: dimension targeted to a single fact table dimension where part of attributes is used in some part of business and different attributes used in different part of business. Cross-overs are possible. (e.g.: a buying product dimension and sales product dimension, where different aspects of business are using different attributes of a same entity – in this case same product).

Shared: dimension is used by several fact tables. Dimension key is stored in fact table but the value is stored in dimension table with other attributes. It is used by multiple measures.

Degenerated: dimension is when dimension attribute is stored directly into fact table and not in dimension table. It has no specific key and no other attribute.

Slowly changing: dimension where attribute value is changed over time and the change is retained in the dimension table based on type of slowly changing dimension (CSD):

Type I : Attributes history is not retained but updated. E.g.: Customer moved from place A to place B and all purchases will be connected to place B; also purchases from place A. This is suitable for time series and forecasts.

Type II : Change in attribute value creates a new record; Two records are showing purchases connected to place A and purchases connected to place B. In data analysis, user has option of selecting based on type of analysis (historical vs. forecasting).

Type III : Change in attribute creates only one record; original and last (current or the latest) value to this attribute. E.g.: this record would hold the date and name of the first place; place A and the latest move to place B. If more moves have occurred between place A and B, only the latest would be stored, namely place B.

Time dimension, hierarchies, …

Speaking at SQL Saturday Lisbon 2015

So excited to be speaking at SQL Saturday Lisbon this year. I am sure, it will be a great event in a beautiful city, lovely people and great food. (not to mention my favorite – coffee).

My session will be about making customer segmentation using SSAS and SQL Server. One of these things, people usually like to talk about, but nobody is doing it. Hands on session will explore all the steps and statistics and mathematics behind.

sqlsatLisbon

Everybody is welcome to participate, especially listeners who will come with a great Portuguese coffee 🙂

See you!