Passing two SQL queries to sp_execute_external_script

Recently, I got a question on one of my previous blog posts asking whether it is possible to pass two queries in the same run-time as arguments to the external procedure sp_execute_external_script.

Some of the arguments of the procedure sp_execute_external_script are enumerated. This is valid for the input dataset, and as the name of the argument @input_data_1 suggests, one can easily (and this is a valid doubt) think that there could also be an @input_data_2 argument, and so on. Unfortunately, this is not true. The external procedure can hold only one T-SQL dataset, passed in through this parameter.
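A quick way to see this for yourself: calling the procedure with a hypothetical @input_data_2 parameter is simply rejected, with an error along the lines of "@input_data_2 is not a parameter for procedure sp_execute_external_script" (the exact message may vary by version):

EXEC sp_execute_external_script
     @language = N'R'
    ,@script = N'OutputDataSet <- InputDataSet;'
    ,@input_data_1 = N'SELECT 1 AS v1;'
    ,@input_data_2 = N'SELECT 2 AS v2;' -- fails: no such parameter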

There are many reasons for that; one would be the cost of sending several datasets to the external process and back. In effect, this forces the user to rethink and pre-prepare the dataset (meaning, do all the data munging beforehand) prior to sending it into the external procedure.

But there are workarounds for passing an additional query (or queries) to sp_execute_external_script. I am not advocating this, and I strongly disagree with such usage, but here it is.

First, I will create two small datasets using T-SQL:

USE SQLR;
GO

DROP TABLE IF EXISTS dataset;
GO

CREATE TABLE dataset
(ID INT IDENTITY(1,1) NOT NULL
,v1 INT
,v2 INT
CONSTRAINT pk_dataset PRIMARY KEY (id)
)

SET NOCOUNT ON;
GO

-- insert one row of random values per batch; GO 50 repeats the batch 50 times
INSERT INTO dataset(v1,v2)
SELECT TOP 1
 (SELECT TOP 1 number FROM master..spt_values WHERE type IN ('EOB') ORDER BY NEWID()) AS v1
,(SELECT TOP 1 number FROM master..spt_values WHERE type IN ('EOD') ORDER BY NEWID()) AS v2
FROM master..spt_values
GO 50

This dataset will be used directly in the @input_data_1 argument. The next one will be fetched from within the R code:

CREATE TABLE external_dataset
(ID INT IDENTITY(1,1) NOT NULL
,v1 INT
CONSTRAINT pk_external_dataset PRIMARY KEY (id)
)

SET NOCOUNT ON;
GO

-- again, GO 50 repeats the single-row insert 50 times
INSERT INTO external_dataset(v1)
SELECT TOP 1
 (SELECT TOP 1 number FROM master..spt_values WHERE type IN ('EOB') ORDER BY NEWID()) AS v1
FROM master..spt_values
GO 50

Normally, one would use a single dataset, like:

EXEC sp_execute_external_script
     @language = N'R'
    ,@script = N'OutputDataSet <- data.frame(MySet);'
    ,@input_data_1 = N'SELECT TOP 5 v1, v2 FROM dataset;'
    ,@input_data_1_name = N'MySet'
WITH RESULT SETS
((
 Val1 INT
 ,Val2 INT
))

But by “injecting” ODBC into the R code, we can allow the external procedure to connect back to your SQL Server and fetch an additional dataset.
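Note that the example below connects back as a SQL login named RR; a minimal, hypothetical setup for such a login (adjust the names and the password to your environment) could look like this:

USE [master];
CREATE LOGIN RR WITH PASSWORD = 'Read!2$16';
GO

USE SQLR;
GO

CREATE USER RR FOR LOGIN RR;
GRANT SELECT ON dbo.dataset TO RR;
GRANT SELECT ON dbo.external_dataset TO RR;
GO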

This can be done as follows:

EXECUTE AS USER = 'RR'; 
GO

DECLARE @Rscript NVARCHAR(MAX);
SET @Rscript = '
    library(RODBC)
    # open a connection back to the same SQL Server instance
    myconn <- odbcDriverConnect("driver={SQL Server};
        Server=SICN-KASTRUN;database=SQLR;uid=RR;pwd=Read!2$16")
    # fetch the second dataset
    External_source <- sqlQuery(myconn, "SELECT v1 AS v3
                                         FROM external_dataset")
    close(myconn)
    Myset <- data.frame(MySet)
    # merge both datasets
    mergeDataSet <- data.frame(cbind(Myset, External_source));';

EXEC sp_execute_external_script
    @language = N'R'
   ,@script = @Rscript
   ,@input_data_1 = N'SELECT v1, v2 FROM dataset;'
   ,@input_data_1_name = N'MySet'
   ,@output_data_1_name = N'mergeDataSet'
WITH RESULT SETS
((
    Val1 INT
   ,Val2 INT
   ,Val3 INT
))

REVERT;
GO

And the result will be the two datasets merged, with three columns in total:

[Screenshot: the three-column result set in SSMS]

which correspond to the two source datasets:

-- Check the results!
SELECT * FROM dataset
SELECT * FROM external_dataset

There are, as already mentioned, several arguments against this approach, and I would not recommend it (a cleaner, single-query alternative is sketched after this list). Some are:

  • validating and keeping R code in one place
  • performance issues
  • additional costs of data transferring
  • using ODBC connectors
  • installing additional R packages (in my case RODBC package)
  • keeping different datasets in one place
  • security issues
  • additional login/user settings
  • firewall inbound/outbound rules setting
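For contrast, here is a minimal sketch of the approach I would recommend instead: do the merge in T-SQL and pass a single, pre-joined dataset (the join on ID works here because both tables were filled with 50 identity rows):

EXEC sp_execute_external_script
     @language = N'R'
    ,@script = N'OutputDataSet <- MySet;'
    ,@input_data_1 = N'SELECT d.v1, d.v2, e.v1 AS v3
                       FROM dataset AS d
                       INNER JOIN external_dataset AS e
                       ON e.ID = d.ID;'
    ,@input_data_1_name = N'MySet'
WITH RESULT SETS
((
     Val1 INT
    ,Val2 INT
    ,Val3 INT
))

No ODBC round trip, no extra login, and all the querying stays in one place.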

Fetching the second dataset can, of course, also be achieved with the *.XDF file format, if the files are stored locally or on the server.
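For illustration, a minimal sketch of that variant, assuming the second dataset was previously exported to an XDF file (the path C:/temp/external_dataset.xdf is hypothetical) and that the RevoScaleR package shipped with R Services is available, would only change the R script:

DECLARE @RscriptXdf NVARCHAR(MAX);
SET @RscriptXdf = '
    # read the second dataset from a local XDF file instead of via ODBC
    External_source <- rxDataStep(inData = RxXdfData("C:/temp/external_dataset.xdf"))
    Myset <- data.frame(MySet)
    # merge both datasets
    mergeDataSet <- data.frame(cbind(Myset, External_source));';

The call to sp_execute_external_script then stays exactly the same as above.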

As always, the code is available on GitHub.

Happy R-SQLing! 🙂


#SQLSatDenmark 2016 wrap up

SQLSatDenmark 2016 took place in Lyngby at the Microsoft Denmark headquarters. Apart from the fact that Lyngby is an absolutely cute town, the Microsoft HQ is nice as well.

At Microsoft Denmark: [photo]

Lyngby: [photo]

On the evening before the precon day, a dinner at the MASH restaurant was arranged for the precon speakers (Tim Chapman, Andre Kamman, Kevin Kline and myself) and our hosts Regis Baccaro and Kenneth M. Nielsen.


After delicious steaks, pints of beer and interesting conversations, the precon day started.

My precon room was full: I had 30 attendees, lots of material, and I ended with demos of R and SQL Server integration from the field. The feedback was great, obviously. The problem I had was that I had prepared too much material (all the code was handed over anyway, for people to learn more back at home) and that I was focusing too much on statistics. But I finished at 16:30 and was available at the Microsoft HQ until 17:30 for any other questions.

The next day, SQLSaturday Denmark started early, and in total 280 attendees showed up. It was an easygoing and well-organized event: great sponsors, nice swag, a raffle and all the other good stuff one can find at such an event, like a juice bar and good food, ending with a hot dog stand and SQL Beer. Yes, the traditional Danish SQL Saturday beer 🙂


I delivered a session on Machine Learning algorithms in Microsoft Azure, explaining which algorithm to use with which dataset and what kind of statistical problem each can solve. Great feedback from the crowd and very interesting questions, mostly statistical and data mining ones. And I truly loved it.

Thanks to all the sponsors, organizers, attendees and the SQL family. It was great to see you.