FileTable and storing graphs from Microsoft R Server

FileTable has been around for quite some time now, and it is useful for storing files, documents, pictures and other binary files in a designated SQL Server table, a FileTable. The best part is that the files can be accessed from Windows or any other application as if they were stored on the file system (because they are), without any changes on the client.

This feature is very handy for storing outputs from Microsoft R Server. In this blog post I will focus mainly on persistently storing charts from statistical analysis.

First we need to make sure that FILESTREAM is enabled. Open SQL Server Configuration Manager and navigate to your running SQL Server instance. Right-click it, select Properties, open the FILESTREAM tab and enable FILESTREAM for Transact-SQL access and for file I/O access. In addition, allow remote clients access to FILESTREAM data as well.

The next step is to set the FILESTREAM access level in Management Studio. Level 2 enables both T-SQL access and Win32 streaming access.

EXEC sp_configure 'filestream_access_level' , 2;
GO
RECONFIGURE;
GO
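
To confirm that the setting took effect, the instance properties can be queried. This is only a minimal sketch and not part of the original setup; the column aliases are my own. Note that the effective level can differ from the configured one while an instance restart is still pending.

```sql
-- Compare the configured and the effective FILESTREAM access level.
-- 0 = disabled, 1 = T-SQL access only, 2 = T-SQL and Win32 streaming access.
SELECT
     SERVERPROPERTY('FilestreamConfiguredLevel') AS ConfiguredLevel
    ,SERVERPROPERTY('FilestreamEffectiveLevel')  AS EffectiveLevel
    -- Name of the Windows share that exposes FILESTREAM data
    ,SERVERPROPERTY('FilestreamShareName')       AS ShareName;
```

The share name returned here is the second element of the UNC path that clients use to reach the files.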

For this purpose I have decided to use a dedicated database for storing the charts created in R, and this database will have FileTable enabled.

USE master;
GO

CREATE DATABASE FileTableRChart 
ON PRIMARY  (NAME = N'FileTableRChart', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\FileTableRChart.mdf' , SIZE = 8192KB , FILEGROWTH = 65536KB ),
FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM( NAME = ChartsFG, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\RCharts')
LOG ON (NAME = N'FileTableRChart_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\FileTableRChart_log.ldf' , SIZE = 8192KB , FILEGROWTH = 65536KB )
GO

ALTER DATABASE FileTableRChart
    SET FILESTREAM ( NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'RCharts' )
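
A quick way to check that non-transacted access and the directory name were applied is to read the catalog view for the database. A minimal sketch, assuming the database name used above; the column alias is illustrative.

```sql
-- Show the FILESTREAM options of the chart database.
-- non_transacted_access_desc should read FULL and directory_name RCharts.
SELECT
     DB_NAME(database_id)        AS DatabaseName
    ,non_transacted_access_desc
    ,directory_name
FROM sys.database_filestream_options
WHERE database_id = DB_ID(N'FileTableRChart');
```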

The folder RCharts is now available as BLOB storage for my FileTableRChart SQL Server database. Next, I add a FileTable that will hold the charts and all the needed information about them.

USE FileTableRChart;
GO

CREATE TABLE ChartsR AS FILETABLE
WITH (
 FileTable_Directory = 'DocumentTable'
,FileTable_Collate_Filename = database_default  
);
GO
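
Any client that writes files into the FileTable needs its UNC path. Rather than typing that path by hand, it can be looked up from the database. The following is a hedged sketch using the table created above; the column aliases are my own.

```sql
USE FileTableRChart;
GO

-- Root UNC path of the FileTable, e.g. \\<server>\<share>\RCharts\DocumentTable
SELECT FileTableRootPath(N'dbo.ChartsR') AS FileTableUncPath;

-- List the FileTables in this database with their directory names
SELECT
     OBJECT_NAME(object_id) AS FileTableName
    ,is_enabled
    ,directory_name
FROM sys.filetables;
```

The first query returns the path that should be used wherever the FileTable directory is referenced from outside SQL Server.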

With the BLOB storage set up, we can now focus on the R code within T-SQL. The following R code will be used to generate histograms with a normal curve for a quick data overview (note, this is just a sample):

library(ggplot2)

x <- data.frame(val = c(1,2,3,6,3,2,3,4,5,6,7,7,6,6,6,5,5,4,8))
y <- data.frame(val = c(1,2,5,8,5,4,2,4,5,6,3,2,3,5,5,6,7,7,8))
x$class <- 'XX'
y$class <- 'YY'
d <- rbind(x,y)

#normal function with counts
gghist <- ggplot(d, aes(x=val)) + geom_histogram(binwidth=2, 
                  aes(y=..density.., fill=..count..))
gghist <- gghist + stat_function(fun=dnorm, args=list(mean=mean(d$val), 
                   sd=sd(d$val)), colour="red")
gghist <- gghist + ggtitle("Histogram of val with normal curve")  + 
                   xlab("Variable Val") + ylab("Density of Val")

This returns a diagram, which will be parametrized further once the R code is embedded in T-SQL.

Besides parametrization, I will add a loop that goes through all the input variables and generates a diagram for each variable/column of the SQL Server query passed to the sp_execute_external_script stored procedure.

Final code:

DECLARE @SQLStat NVARCHAR(4000)
SET @SQLStat = 'SELECT
                     fs.[Sale Key] AS SalesID
                    ,c.[City] AS City
                    ,c.[State Province] AS StateProvince
                    ,c.[Sales Territory] AS SalesTerritory
                    ,fs.[Customer Key] AS CustomerKey
                    ,fs.[Stock Item Key] AS StockItem
                    ,fs.[Quantity] AS Quantity
                    ,fs.[Total Including Tax] AS Total
                    ,fs.[Profit] AS Profit

                    FROM [Fact].[Sale] AS  fs
                    JOIN dimension.city AS c
                    ON c.[City Key] = fs.[City Key]
                    WHERE
                        fs.[customer key] <> 0'

DECLARE @RStat NVARCHAR(4000)
SET @RStat = 'library(ggplot2)
              library(stringr)
              cust_data <- Sales
              n <- ncol(cust_data)
              # UNC prefix of the FileTable directory; adjust to your server and instance
              path_prefix <- ''\\\\SICN-KASTRUN\\mssqlserver\\RCharts\\DocumentTable\\Plot_''
              for (i in 1:n)
                        {
                          colname <- names(cust_data)[i]
                          # [[i]] extracts the column as a vector, so the new column is really named val
                          colid   <- data.frame(val = cust_data[[i]])
                          # histograms need numeric input, so text columns such as City are skipped
                          if (!is.numeric(colid$val)) next
                          gghist <- ggplot(colid, aes(x=val)) + geom_histogram(binwidth=2, aes(y=..density.., fill=..count..))
                          gghist <- gghist + stat_function(fun=dnorm, args=list(mean=mean(colid$val), sd=sd(colid$val)), colour="red")
                          gghist <- gghist + ggtitle("Histogram of val with normal curve")  + xlab("Variable Val") + ylab("Density of Val")
                          # build the file name and strip the spaces that paste() inserts
                          path <- paste(path_prefix,colname,''.jpg'')
                          path <- str_replace_all(path," ","")
                          # pass the plot explicitly; ggsave opens and closes its own device
                          ggsave(path, plot = gghist, width = 4, height = 4)
                        }';

EXECUTE sp_execute_external_script
     @language = N'R'
    ,@script = @RStat
    ,@input_data_1 = @SQLStat
    ,@input_data_1_name = N'Sales'

I am using the ggsave function, but the jpeg() graphics device that ships with R is also an option; it is a matter of taste. The path in the script should point to your own FileTable directory.
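
As a rough sketch of the jpeg alternative mentioned above, the call below draws a base R histogram through the jpeg() device instead of ggsave. Everything specific in it is an assumption for illustration: the file name Plot_demo.jpg is hypothetical, the input query is a handful of made-up values, and the UNC path must be replaced with your own FileTable directory.

```sql
EXECUTE sp_execute_external_script
     @language = N'R'
    ,@script = N'
        # hypothetical output file; point it at your own FileTable directory
        path <- "\\\\SICN-KASTRUN\\mssqlserver\\RCharts\\DocumentTable\\Plot_demo.jpg"
        # open the JPEG device, draw the plot, then close the device to write the file
        jpeg(filename = path, width = 400, height = 400)
        hist(InputDataSet$val, main = "Histogram of val", xlab = "Variable Val")
        dev.off()'
    -- illustrative sample values only; InputDataSet is the default input name
    ,@input_data_1 = N'SELECT CAST(v AS FLOAT) AS val
                       FROM (VALUES (1),(2),(3),(6),(3),(2),(3),(4)) AS t(v)';
```

With a graphics device the file is only complete after dev.off(), whereas ggsave handles opening and closing the device itself.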

Now the graphs and charts are stored persistently in the FileTable, and information on the files can be retrieved with a simple query:

SELECT FT.Name
,IIF(FT.is_directory=1,'Directory','Files') [File Category]
,FT.file_type [File Type]
,(FT.cached_file_size)/1024.0 [File Size (KB)]
,FT.creation_time [Created Time]
,FT.file_stream.GetFileNamespacePath(1,0) [File Path]
,ISNULL(PT.file_stream.GetFileNamespacePath(1,0),'Root Directory') [Parent Path]
FROM [dbo].[ChartsR] FT
LEFT JOIN [dbo].[ChartsR] PT
ON FT.path_locator.GetAncestor(1) = PT.path_locator
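
Because the charts live in a regular table, the image bytes themselves can be read with T-SQL as well, for instance to feed a report. A minimal sketch, assuming the charts were saved with a .jpg extension; the column aliases are illustrative.

```sql
USE FileTableRChart;
GO

-- file_stream holds the content of each file as VARBINARY(MAX);
-- file_type is the file extension without the leading dot.
SELECT
     FT.name                     AS ChartName
    ,DATALENGTH(FT.file_stream)  AS SizeInBytes
    ,FT.last_write_time          AS LastWritten
    ,FT.file_stream              AS ChartImage
FROM dbo.ChartsR AS FT
WHERE FT.is_directory = 0
  AND FT.file_type = N'jpg'
ORDER BY FT.last_write_time DESC;
```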

Browsing through the charts is now much easier, whatever the purpose.

There might be some security issues: I have used mklink to create a symbolic link pointing to the FileTable directory.

You might also need to use the Local Group Policy Editor to grant MSSQLLaunchpad write permissions on the FileTable directory.

Code is available at GitHub.

Happy R-SQLing!

 

