Using Python with Microsoft Reporting Services (SSRS)

Using Python with SSRS in SQL Server 2017 should not be an obstacle, since the framework and the technology are the same as with the R language. With SQL Server 2017, Python got full and functional support and works with native SSRS.

The usability and functionality of Python is simply immense. This blog post is dedicated to using Python for data science in SSRS.

Most of the samples will be derived from the AdventureWorks sample databases, which are available on Microsoft's GitHub.

Selecting Python results with SSRS

The external procedure sp_execute_external_script, when run within the SQL Server database engine with the Python language, does not need ODBC drivers or Python modules like pyodbc or sqlalchemy to read or write data between the SQL Server engine and the Python engine. The only module required is pandas, since SQL Server exposes data to Python (and receives it back) as data frames.

-- Check Python runtime
EXECUTE sp_execute_external_script
       @language =N'Python'
      ,@script=N'OutputDataSet = InputDataSet'
      ,@input_data_1 = N'SELECT 1 AS result'
WITH RESULT SETS
((
result INT
))
GO

Please note: the Python code, as specified in the @script parameter, must be valid and is sensitive to indentation, so a single missing (or excessive) indent can cause the whole query to fail.
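To illustrate the indentation rule with the same runtime check (a minimal variant of the query above): both top-level Python statements start at the first column of the @script literal, and indenting either of them would raise an IndentationError and fail the whole batch.

EXECUTE sp_execute_external_script
       @language =N'Python'
      ,@script=N'
# top-level statements must start at column 0 inside the @script string
df = InputDataSet
OutputDataSet = df'
      ,@input_data_1 = N'SELECT 1 AS result'
WITH RESULT SETS
((
result INT
))
GO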

We will now introduce an input parameter for better data selection in Python. The same parameter will later be used as an input parameter in SSRS. The T-SQL query with the Python script:

-- query parametrization
DECLARE @MaritalStatusIN CHAR(1) = 'S' -- S/M Single/Married

EXECUTE sp_execute_external_script
@language =N'Python',
@script = N'
import pandas as pd
df = InputDataSet
df_gender = df[''MaritalStatus''] == MaritalStatus
df_gen = df[df_gender]
correlation = df_gen.corr(method=''pearson'')
OutputDataSet = pd.DataFrame(correlation, columns=["nof","age"])',
@input_data_1 = N'SELECT 
               COUNT(*) AS nof,MaritalStatus,age 
               FROM AdventureWorksDW2014.dbo.vTargetMail
               WHERE age < 100
               GROUP BY maritalstatus,age'
,@params = N'@MaritalStatus CHAR(1)'
,@MaritalStatus = @MaritalStatusIN

WITH RESULT SETS 
(( 
   CountObs FLOAT
  ,Age FLOAT 
));
GO

For the purposes of SSRS, I will encapsulate the code above in a stored procedure with one input parameter – the marital status. The same parameter will be propagated into the Python script.
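The wrapper procedure itself is available at GitHub; a minimal sketch, assuming the same query and Python script as above, could look like this:

-- Sketch of the wrapper procedure (see GitHub for the exact definition)
CREATE OR ALTER PROCEDURE dbo.SelectingPythonResults
(
   @MaritalStatusIN CHAR(1) -- S/M Single/Married
)
AS
BEGIN
EXECUTE sp_execute_external_script
  @language = N'Python'
 ,@script = N'
import pandas as pd
df = InputDataSet
df_gen = df[df["MaritalStatus"] == MaritalStatus]
correlation = df_gen.corr(method="pearson")
OutputDataSet = pd.DataFrame(correlation, columns=["nof","age"])'
 ,@input_data_1 = N'SELECT COUNT(*) AS nof, MaritalStatus, age
                    FROM AdventureWorksDW2014.dbo.vTargetMail
                    WHERE age < 100
                    GROUP BY MaritalStatus, age'
 ,@params = N'@MaritalStatus CHAR(1)'
 ,@MaritalStatus = @MaritalStatusIN
WITH RESULT SETS
((
   CountObs FLOAT
  ,Age FLOAT
));
END;
GO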

EXEC SelectingPythonResults
      @MaritalStatusIN = 'S'

The results are then presented through SSRS, with the option of selecting the parameter values.

(Screenshot: 01. Selecting Python Results - SQL Server 2017 Reporting Services)

Predicting and scoring with Python in SSRS

In addition to selecting parameter-filtered results in SSRS, we can also check how our prediction model is performing.

For this purpose, we will create a slightly different subset and try to predict the number of cars owned by the customers in the dataset, based on their age and daily commute distance. The query for the dataset is the following:

--- Query for scoring the model
SELECT 
   age 
  ,NumberCarsOwned
  ,CAST(REPLACE(LEFT(CommuteDistance,2),'-','') AS TINYINT) as CommuteDistance
FROM AdventureWorksDW2014.dbo.vTargetMail
WHERE
    age < 100

(Screenshot: query results in SSMS)

We will run a simple linear regression with NumberCarsOwned as the dependent (y) variable and age and CommuteDistance as the X variables. We will store a couple of different versions of the prediction model in the database, trained on splits where 20%, 30%, 40% and 50% of the original dataset is held out for testing. All four models will be serialized and stored in a table.

Let’s create a table to store serialized models:

-- creating table for storing models
CREATE TABLE PredictingWithPy_models
(model VARBINARY(MAX)
,modelName VARCHAR(100)
,trainSize FLOAT
)

Now we will use the scikit-learn Python module to train the models on the different splits of the original dataset.

CREATE OR ALTER PROCEDURE [dbo].[RunningPredictionsPy] 
(
  @size FLOAT --- format: 0.3 or 0.4 or 0.5 
 ,@name VARCHAR(100) 
 ,@trained_model varbinary(max) OUTPUT
)
AS
BEGIN
EXEC sp_execute_external_script
  @language = N'Python'
 ,@script = N'
import numpy as np
import pandas as pd
import pickle
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

##Prepare features (X) and target (y)
X = df2[["age", "CommuteDistance"]]
y = np.ravel(df2[["nofCars"]])
name = name   # model name passed in from T-SQL
##Create training (and testing) variables based on test_size
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=size)

##Create the scikit-learn linear regression model
SKL_lr = LinearRegression()
linRegObj = SKL_lr.fit(X_train, y_train)

##Serialize the trained model
trained_model = pickle.dumps(linRegObj)'

 ,@input_data_1 = N'SELECT * FROM vTargetMail'
 ,@input_data_1_name = N'df2'
 ,@params = N'@trained_model varbinary(max) OUTPUT, @size FLOAT, @name VARCHAR(100)'
 ,@trained_model = @trained_model OUTPUT
 ,@size = @size
 ,@name = @name
END;
GO

And we execute this procedure to store a trained model in the previously created table:

DECLARE @model VARBINARY(MAX);
EXEC [RunningPredictionsPy] 
    @size = 0.2  
   ,@name = 'Ln_20'
   ,@trained_model = @model OUTPUT
INSERT INTO PredictingWithPy_models (model, modelName, trainSize) 
VALUES(@model, 'Ln_20', 0.2);
GO

We repeat this for the remaining splits (20, 30, 40 and 50 percent); the complete list of calls is available at GitHub. In the end, the table of stored models should hold 4 rows with serialized models.
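The remaining calls follow the same pattern (a sketch; the Ln_40 and Ln_50 model names are assumptions following the naming used above):

DECLARE @model VARBINARY(MAX);
EXEC [RunningPredictionsPy] 
    @size = 0.3  
   ,@name = 'Ln_30'
   ,@trained_model = @model OUTPUT
INSERT INTO PredictingWithPy_models (model, modelName, trainSize) 
VALUES(@model, 'Ln_30', 0.3);
GO
-- ... and likewise with @size = 0.4 / 'Ln_40' and @size = 0.5 / 'Ln_50'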

(Screenshot: the PredictingWithPy_models table holding four serialized models)

Now that we have trained the models, we want to run predictions using SSRS.

For this, we will need another stored procedure that takes the X variables (independent variables: age and commute distance) and the selected model as input parameters, and returns the prediction (the Y variable – number of cars owned).

The following procedure will be used in SSRS:

-- CREATE Procedure to predict the number of cars owned
CREATE OR ALTER PROCEDURE [dbo].[RunningPredictionWithValesPy] 
(
   @model varchar(100)
  ,@age INT
  ,@commuteDistance INT
)
AS
BEGIN

DECLARE @modelIN VARBINARY(MAX) = 
    (SELECT model FROM PredictingWithPy_models WHERE modelName = @model)
-- Create a T-SQL Query 
DECLARE @q NVARCHAR(MAX) = N'SELECT '+CAST(@age AS VARCHAR(5))+' AS age, 
         '+ CAST(@commuteDistance AS VARCHAR(5))+' AS CommuteDistance'

-- Store the new values for prediction in temp table
DROP TABLE IF EXISTS #t
CREATE TABLE #t (age INT, CommuteDistance INT)
INSERT INTO #t (age, CommuteDistance)
EXEC sp_executesql @q

EXEC sp_execute_external_script
@language = N'Python'
,@script = N'
import pickle
import numpy as np
import pandas as pd
from sklearn import metrics

##Deserialize model
mod = pickle.loads(modelIN)
X = InputDataSet[["age", "CommuteDistance"]]

##Predict; returns a numpy array (accepts multiple rows at once for bulk prediction)
predArray = mod.predict(X)
#predList = []
#for i in range(len(predArray)):
#     predList.append((predArray[i])[1])

OutputDataSet = pd.DataFrame(data = predArray, columns = ["predictions"])' 
  ,@input_data_1 = N'SELECT * FROM #t'
  ,@input_data_1_name = N'InputDataSet'
  ,@params = N'@modelIN varbinary(max)'
  ,@modelIN = @modelIN
WITH RESULT SETS 
((
prediction_Score FLOAT
));
END
GO

To check that the procedure works, the following T-SQL call runs a prediction against new values:

EXEC [RunningPredictionWithValesPy]
   @model = 'Ln_30'
  ,@age = 44
  ,@commuteDistance = 1

We can integrate the procedure into Reporting Services (SSRS) and have the combination of the input values displayed together with the selected model (i.e. the size of the test set):

(Screenshot: 02. Running Predictions - SQL Server 2017 Reporting Services)

with the results of the predictions shown in SSRS or in SSMS; both return the same results:

(Screenshot: prediction results in SSMS)
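In Report Builder, the report dataset behind this can be a plain text query that calls the stored procedure, with the values supplied by the report parameters (a sketch; the report parameter names are my assumption):

-- SSRS dataset query (query type: Text); @model, @age and @commuteDistance
-- are mapped to the corresponding report parameters
EXEC [RunningPredictionWithValesPy]
   @model = @model
  ,@age = @age
  ,@commuteDistance = @commuteDistance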


All reports featured here will also be available at GitHub.

Visualizing results with Python in SSRS

As we have created four different models, we would also like to have the accuracy of the models visually represented in SSRS.

Showing plots created with Python is not as straightforward as with the R language.

The following procedure extracts the data from the database and generates a plot that can be used and visualized in SSRS.

CREATE OR ALTER PROCEDURE [dbo].[VisualizeWithPyR2] 
(
     @inputVariable VARCHAR(100)
)
AS
BEGIN

DECLARE @q NVARCHAR(MAX) = N'SELECT '+CAST(@inputVariable AS VARCHAR(50))+' AS val1 FROM vTargetMail'
-- Store the values in temp table
DROP TABLE IF EXISTS #t
CREATE TABLE #t (val1 FLOAT)
INSERT INTO #t (val1)
EXEC sp_executesql @q

EXEC sp_execute_external_script
  @language = N'Python'
  ,@script = N'
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12, 5))
plt.plot(plotdataset)
fig.savefig(''C:\\PyGraphs\\firstGraph.png'')
OutputDataSet = pd.DataFrame(data =[1], columns = ["plot"])' 
,@input_data_1 = N'SELECT * FROM #t'
,@input_data_1_name = N'plotdataset'
WITH RESULT SETS 
((
    plot INT
));
END
GO


So far, I have found this to be the simplest way – saving the plot on the same server where SSRS is running and exposing the disk path to SSRS. Python stores the graph to disk on the host / client / server with the following command:

fig = plt.figure(figsize=(12, 5))
plt.plot(plotdataset)
fig.savefig('C:\\PyGraphs\\firstGraph.png')

Resulting in SSRS as:

(Screenshot: 03. Visualizing with Python - SQL Server 2017 Reporting Services)

In Report Builder, one additional setting worth mentioning is the image properties, where the path to the saved plot must match the one used in the Python code. The savefig path points to the same location as the external source set for this image, expressed as: file://C://PyGraphs//firstGraph.png.

(Screenshot: Image Properties dialog in Report Builder)

All four linear regression models can be evaluated, based on the size of the test sample, using R2 and the accompanying Python code:

import numpy as np
import pandas as pd
import pyodbc
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

sql_conn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER=TOMAZK\\MSSQLSERVER2017;DATABASE=SQLPY;Trusted_Connection=yes') 

# Base dataset; X and y are assumed to be the same features/target used when training the stored models
query2 = '''
SELECT age, CommuteDistance, nofCars FROM vTargetMail
'''
df2 = pd.read_sql(query2, sql_conn)
X = df2[["age", "CommuteDistance"]]
y = np.ravel(df2[["nofCars"]])

def R2EvalMod(test_sizeIN):
    # Re-fit a linear regression on a split with the given test size and return R2 on the hold-out set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_sizeIN)
    SKL_lr = LinearRegression()
    linRegObj = SKL_lr.fit(X_train, y_train)
    predArray = linRegObj.predict(X_test)
    R2 = r2_score(y_test, predArray)
    return R2

query3 = '''
SELECT trainSize FROM [dbo].[PredictingWithPy_models]
'''
df3 = pd.read_sql(query3, sql_conn)

R2List = []
for i in range(len(df3)):
    z = df3.at[i, 'trainSize']
    R2List.append((z, R2EvalMod(z)))

labels = ['trainsize', 'R2']
df33 = pd.DataFrame.from_records(R2List, columns=labels)

Since a higher R2 means a better model (at most 1; it can also be negative for poor fits), the results show that R2 was best when using 30% of the dataset for testing and 70% for training.

(Screenshot: R2 values per test-set size)

As always, code is available at GitHub.

Happy Pythoning.
