Power BI Measures in slicer

Power BI measures are mainly used to show and visualize data that can be measured. But from time to time, end-users want to be able to select measures easily and have the graphs or tables change in accordance with the selection. And I have seen this request numerous times.

Measures_in_slicer

For example, let’s suppose we have a sample data:

Data_currency

In this sample, we clearly see that the sales columns are expressed in different currencies. End-users, in this particular case, would want the currency to be easily selectable, with all the underlying graphs or tables changing accordingly. In a way, we want the data transposed so that the columns can be selected through a slicer and not through filters in Power BI.

Let’s get the data from Github and import the file “SalesData.csv” into a new Power BI document. As the next step, we need to convert the integer columns (in essence, the columns that we want to use in the slicer) to measures.

SalesCHF_Sum = SUM(SalesData[SalesCHF])
SalesEUR_Sum = SUM(SalesData[SalesEUR])
SalesUSD_Sum = SUM(SalesData[SalesUSD])
Once we have this, let’s create a table (in Power BI: Home -> Enter Data) named MeasuresTable with a single Measure column and enter:
Measure
SalesCHF_Sum
SalesEUR_Sum
SalesUSD_Sum
The names in the Measure column must correspond to the newly created measures. Once you have this, create a new measure in the SalesData dataset and copy/paste this portion of DAX:
SelectedCurrencyMeasure =
VAR MyCurrency =
    SELECTEDVALUE ( 'MeasuresTable'[Measure]; "SalesCHF_Sum" )
RETURN
    SWITCH (
        TRUE ();
        MyCurrency = "SalesCHF_Sum"; [SalesCHF_Sum];
        MyCurrency = "SalesEUR_Sum"; [SalesEUR_Sum];
        MyCurrency = "SalesUSD_Sum"; [SalesUSD_Sum];
        [SalesCHF_Sum]
    )

The only thing left now is to create a visual and a slicer (on MeasuresTable[Measure]) using this SelectedCurrencyMeasure measure to get the full effect:

2

The dataset and PBIX file are available on Github for you to play with.

Happy Power BIng! 🙂


Essential list of useful R packages for data scientists

I have written a couple of blog posts on R packages (here | here), and this blog post is a compendium of the most needed packages for data science, statistical work and everyday use of R.

There are thousands of R packages available on CRAN (with all its mirror sites), on Github and in individual developers' repositories.

Many useful functions are available in several different R packages, and the same functionality is often implemented in more than one of them, so it boils down to personal preference and the work at hand which package one decides to use. From the perspective of a statistician and data scientist, I will cover the essential and major packages in sections. This is by no means a definitive list, only a personal preference.

Screenshot 2020-04-26 at 07.44.18

1. Loading and importing data

Loading and reading data into the R environment is most likely one of the first steps, if not the most important one. Data is the fuel.

Breaking it into further sections: reading data from binary files, from ODBC drivers and from SQL databases.

 

1.1. Importing from binary files

# Reading from SAS and SPSS
install.packages("Hmisc", dependencies = TRUE)
# Reading from Stata, Systat and Weka
install.packages("foreign", dependencies = TRUE)
# Reading from KNIME
install.packages(c("protr","foreign"), dependencies = TRUE)
# Reading from EXCEL
install.packages(c("readxl","xlsx"), dependencies = TRUE)
# Reading from TXT, CSV
install.packages(c("csv","readr","tidyverse"), dependencies = TRUE)
# Reading from JSON
install.packages(c("jsonlite","rjson","RJSONIO","jsonvalidate"), dependencies = TRUE)
# Reading from AVRO
install.packages("sparkavro", dependencies = TRUE)
# Reading from Parquet file
install.packages("arrow", dependencies = TRUE)
devtools::install_github("apache/arrow/r")
# Reading from XML
install.packages("XML", dependencies = TRUE)
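
For a quick illustration (a minimal sketch of my own; the file names are placeholders), reading a CSV and an Excel file once the packages are installed:

# read a CSV file (readr) and an Excel sheet (readxl) into data frames
library(readr)
library(readxl)
df_csv  <- read_csv("sample.csv")
df_xlsx <- read_excel("sample.xlsx", sheet = 1)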

 

1.2. Importing from ODBC

This will cover most of the work with ODBC drivers:

install.packages(c("odbc", "RODBC"), dependencies = TRUE)
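
A minimal connection sketch using the DBI interface on top of the odbc driver (the driver name, server, database and credentials below are placeholders for your own environment):

library(DBI)
library(odbc)
con <- dbConnect(odbc::odbc(),
                 Driver   = "ODBC Driver 17 for SQL Server",
                 Server   = "localhost",
                 Database = "master",
                 UID      = "user",
                 PWD      = "password")
dbGetQuery(con, "SELECT TOP 5 name FROM sys.databases")
dbDisconnect(con)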

 

1.3. Importing from SQL Databases

Accessing a SQL database with a dedicated package can bring great benefits when pulling data from the database into an R data frame. In addition, I have added some useful R packages that make querying data in R much easier (RSQL) or even let you write SQL statements directly (sqldf), among other great features.

#Microsoft MSSQL Server
install.packages(c("mssqlR", "RODBC"), dependencies = TRUE)
#MySQL 
install.packages(c("RMySQL","dbConnect"), dependencies = TRUE)
#PostgreSQL
install.packages(c("postGIStools","RPostgreSQL"), dependencies = TRUE)
#Oracle
install.packages(c("ODBC"), dependencies = TRUE)
#Amazon
install.packages(c("RRedshiftSQL"), dependencies = TRUE)
#SQL Lite
install.packages(c("RSQLite","sqliter","dbflobr"), dependencies = TRUE)
#General SQL packages
install.packages(c("RSQL","sqldf","poplite","queryparser"), dependencies = TRUE)
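
For example, sqldf lets you run plain SQL directly against an R data frame (a quick sketch on the built-in iris dataset):

library(sqldf)
sqldf("SELECT Species, COUNT(*) AS n FROM iris GROUP BY Species")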

 

2. Manipulating Data

Data engineering, copying, wrangling and manipulating data is the very next task in the journey.

2.1. Cleaning data

Data cleaning is essential for handling outliers, NULL and N/A values and wrong values, for doing imputation or replacement, for checking frequencies and descriptive statistics, and for applying different uni-, bi- and multivariate analyses to tackle these issues. The list is by no means complete, but it is a good starting point:

install.packages(c("janitor","outliers","missForest","frequency","Amelia",
                   "diffobj","mice","VIM","Bioconductor","mi",
                    "wrangle"), dependencies = TRUE)

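A small sketch of a typical first cleaning pass, using janitor for column names and base R for a quick missing-value check:

library(janitor)
df <- data.frame("First Name" = c("Ana", NA), "Sales EUR" = c(100, NA),
                 check.names = FALSE)
df <- clean_names(df)   # columns become first_name, sales_eur
colSums(is.na(df))      # missing values per column
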
2.2. Dealing with R data types and formats

Working with the correct data types and knowing your way around formatting your dataset is easily overlooked and yet important. A list of must-have packages:

install.packages(c("stringr","lubridate","glue",
                   "scales","hablar","readr"), dependencies = TRUE)

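For instance, parsing dates with lubridate and tidying strings with stringr:

library(lubridate)
library(stringr)
d <- ymd("2020-04-26")
month(d)
wday(d, label = TRUE)
str_trim(str_to_lower("  Power BI  "))   # "power bi"
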
2.3. Wrangling, subsetting and aggregating data

There are many packages available for wrangling, engineering and aggregating data; in particular, the {base} R package should not be overlooked, since it offers a lot of great and powerful features. The following are the packages most widely used in the R community for maneuvering data easily:

install.packages(c("dplyr","tidyverse","purrr","magrittr",
                   "data.table","plyr","tidyr","tibble",
                   "reshape2"), dependencies = TRUE)

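A minimal dplyr sketch of the usual filter / group / summarise flow on the built-in mtcars dataset:

library(dplyr)
mtcars %>%
  filter(cyl > 4) %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), n = n())
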
 

3. Statistical tests and Sampling Data

3.1. Statistical tests

Many statistical tests (Shapiro, t-test, Wilcoxon, tests of equality, …) are available in the base and stats packages that ship with the R engine. Which is great, because R is primarily a statistical language and many of the tests are already included. Some additional packages that I have used:

install.packages(c("stats","ggpubr","lme4","MASS","car"), 
                   dependencies = TRUE)
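
Two of the most common tests, straight from the stats package that ships with R (a normality test and a two-sample t-test):

shapiro.test(rnorm(100))          # Shapiro-Wilk normality test
t.test(mpg ~ am, data = mtcars)   # two-sample t-test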

3.2. Data Sampling

Data sampling, working with samples and populations, inference, weights and the different types of statistical sampling can be found in these brilliant packages, including some that are great for survey data.

install.packages(c("sampling","icarus","sampler","SamplingStrata",
                    "survey","laeken","stratification","simPop"), 
                     dependencies = TRUE)
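
A quick base-R sketch of drawing a simple random sample (the packages above add stratified, weighted and survey designs on top of this):

set.seed(2908)
idx <- sample(seq_len(nrow(iris)), size = 30)   # simple random sample of 30 rows
iris_sample <- iris[idx, ]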

4. Statistical Analysis

Regardless of the type of variable, the type of analysis and the results a statistician wants to get, here is a list of packages that should be part of the daily R environment when it comes to statistical analysis.

4.1. Regression Analysis

Frankly, one of the most important analyses:

install.packages(c("stats","Lars","caret","survival","gam","glmnet",
                  "quantreg","sgd","BLR","MASS","car","mlogit","earth",
                  "faraway","nortest","lmtest","nlme","splines",
                  "sem","WLS","OLS","pls","2SLS","3SLS","tree","rpart"), 
dependencies = TRUE)
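
As a minimal example, an ordinary linear regression with base R:

fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)                            # coefficients, R-squared, p-values
predict(fit, newdata = mtcars[1:3, ])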

4.2. Analysis of variance

Distribution and data dispersion are core to understanding the data. Many of the tests for variance are already built into the R engine (package stats), but here are some more that might be useful for analyzing variance.

install.packages(c("caret","rio","car","MASS","FuzzyNumbers",
                   "stats","ez"), dependencies = TRUE)

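For illustration, a one-way analysis of variance with the built-in aov function:

fit_aov <- aov(Sepal.Length ~ Species, data = iris)
summary(fit_aov)
TukeyHSD(fit_aov)   # pairwise comparisons
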
4.3. Multivariate analysis

Working with more than two variables is considered multivariate analysis. Excluding regression analysis and analysis of variance (covered in the sections above), this covers statistical analysis with many variables, such as factor analysis, principal component analysis, canonical analysis, discriminant analysis and others:

install.packages(c("psych","CCA","CCP","MASS","icapca","gvlma","smacof",
                 "MVN","rpca","gpca","EFA.MRFA","MFAg","MVar","fabMix",
                 "fad","spBFA","cate","mnlfa","CSFA","GFA","lmds","SPCALDA",
                 "semds", "superMDS", "vcd", "vcdExtra"), 
 dependencies = TRUE)
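
For example, a principal component analysis with base R:

pca <- prcomp(iris[, 1:4], scale. = TRUE)
summary(pca)    # variance explained per component
head(pca$x)     # component scores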

4.4. Classification and Clustering

Depending on the type of clustering or classification, there are many packages covering both. Some of the essential packages for clustering:

install.packages(c("fpc","cluster","treeClust","e1071","NbClust","skmeans",
                "kml","compHclust","protoclust","pvclust","genie", "tclust",
                "ClusterR","dbscan","CEC","GMCM","EMCluster","randomLCA",
                "MOCCA","factoextra","poLCA"), dependencies = TRUE)

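A minimal k-means sketch on the iris measurements:

set.seed(2908)
km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)
table(km$cluster, iris$Species)   # compare clusters with the true species
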
and for classification:

install.packages(c("tree", "e1071"), dependencies = TRUE)

4.5. Analysis of Time-series

Analysing time series and time-series-like data is easier with the following packages:

install.packages(c("ts","zoo","xts","timeSeries","tsModel", "TSMining",
              "TSA","fma","fpp2","fpp3","tsfa","TSdist","TSclust","feasts",
              "MTS", "dse","sazedR","kza","fable","forecast","tseries",
              "nnfor","quantmod"), dependencies = TRUE)

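For illustration, fitting and forecasting a classic monthly series with the forecast package:

library(forecast)
fit_ts <- auto.arima(AirPassengers)
plot(forecast(fit_ts, h = 12))   # forecast the next 12 months
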
4.6. Network analysis

Analyzing networks is also part of statistical analysis. And some of the relevant packages:

install.packages(c("fastnet","tsna","sna","networkR","InteractiveIGraph",
                 "SemNeT","igraph","NetworkToolbox","dyads", 
                  "staTools","CINNA"), dependencies = TRUE)

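A tiny igraph sketch, building a small undirected graph and computing basic metrics:

library(igraph)
g <- graph_from_literal(A-B, B-C, C-D, A-D, A-C)
degree(g)
betweenness(g)
plot(g)
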
4.7. Analysis of text

Besides analyzing open text, one can analyze any kind of text, including word corpora, semantics and much more. A couple of packages to get started:

install.packages(c("tm","tau","koRpus","lexicon","sylly","textir",
         "textmineR","MediaNews", "lsa","SemNeT","ngram","ngramrr",
         "corpustools","udpipe","textstem", "tidytext","text2vec"), 
          dependencies = TRUE)
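
As a quick sketch, tokenizing a short piece of text with tidytext and counting the words:

library(dplyr)
library(tidytext)
txt <- tibble(line = 1, text = "the quick brown fox jumps over the lazy dog")
txt %>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE)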

5. Machine Learning

R has a variety of good machine learning packages that are powerful and cover the full machine learning cycle. Breaking the topic down into its natural sections.

5.1. Building and validating  the models

Once you build one or more models and compare their results, it is also important to validate the models against a test set or any other dataset. Here are powerful packages for model validation.

install.packages(c("tree", "e1071","crossval","caret","rpart","bcv",
                  "klaR","EnsembleCV","gencve","cvAUC","CVThresh",
                  "cvTools","dcv","cvms","blockCV"), dependencies = TRUE)

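A minimal caret sketch of 5-fold cross-validation on a decision tree:

library(caret)
ctrl <- trainControl(method = "cv", number = 5)
model <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl)
model$results   # accuracy per value of the complexity parameter
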
5.2. Random forests packages

Random forests and other tree-based ensemble methods are covered by the following packages:

install.packages(c("randomForest","grf","ipred","party","randomForestSRC",
                  "grf","BART","Boruta","LTRCtrees","REEMtree","refr",
                  "binomialRF","superml"), dependencies = TRUE)

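And a minimal randomForest example:

library(randomForest)
set.seed(2908)
rf <- randomForest(Species ~ ., data = iris, ntree = 200)
rf               # OOB error and confusion matrix
importance(rf)   # variable importance
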
5.3. Regression-type (regression, boosting, gradient descent) algorithm packages

There are many regression-type machine learning algorithms, with boosting or gradient variants on top. Some very usable packages:

install.packages(c("earth", "gbm","GAMBoost", "GMMBoost", "bst","superml",
                   "sboost"), dependencies = TRUE)

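A short gbm sketch for a regression problem (squared-error loss); the tuning values below are just placeholders:

library(gbm)
set.seed(2908)
boost <- gbm(mpg ~ ., data = mtcars, distribution = "gaussian",
             n.trees = 500, interaction.depth = 2, shrinkage = 0.05)
summary(boost)   # relative influence of the predictors
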
5.4. Classification algorithms

There are many packages for classification problems, and many are also great for general machine learning cases. A handful:

install.packages(c("rpart", "tree", "C50", "RWeka","klaR", "e1071",
                   "kernlab","svmpath","superml","sboost"), 
dependencies = TRUE)
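
For example, a support vector machine classifier with e1071:

library(e1071)
svm_fit <- svm(Species ~ ., data = iris, kernel = "radial")
table(predict(svm_fit, iris), iris$Species)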

5.5. Neural networks

There are many types of neural networks, and different packages will give you all of them. Here are just a couple of very useful R packages to tackle neural networks.

install.packages(c("nnet","gnn","rnn","spnn","brnn","RSNNS","AMORE",
                   "simpleNeural","ANN2","yap","yager","deep","neuralnet",
                   "nnfor","TeachNet"), dependencies = TRUE)

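A minimal single-hidden-layer network with nnet:

library(nnet)
set.seed(2908)
nn <- nnet(Species ~ ., data = iris, size = 5, decay = 0.01, maxit = 200)
table(predict(nn, iris, type = "class"), iris$Species)
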
5.6. Deep Learning

R has embraced deep learning, and many of the powerful SDKs and packages have been ported to R, making it very usable for R developers and the R machine learning community.

install.packages(c("deepnet","RcppDL","tensorflow","h2o","kerasR",
                   "deepNN", "Buddle","automl"), dependencies = TRUE)

5.7. Reinforcement Learning

Reinforcement learning is gaining popularity and more and more packages are being developed in R as well. Some very useful packages:

devtools::install_github("nproellochs/ReinforcementLearning")
install.packages(c("RLT","ReinforcementLearning","MDPtoolbox"), 
dependencies = TRUE)

5.8. Model interpretability and explainability

The results of machine learning models can be a black box. Many packages aim to turn the black box into more of a “glass box”, making the models more understandable, interpretable and explainable. Very powerful packages to do just that for many different machine learning algorithms:

install.packages(c("lime","localModel","iml","EIX","flashlight",
                    "interpret","outliertree","breakDown"), 
dependencies = TRUE)

 

6. Visualisation

Visualisation of the data is not only the final step in understanding the data; it can also bring clarity to interpretation and help build a mental model around the data. A couple of packages that will boost your visualizations:

install.packages(c("ggvis","htmlwidgets","maps","sunburstR", "lattice",
  "predict3d","rgl","rglwidget","plot3Drgl","ggmap","ggplot2","plotly",
  "RColorBrewer","dygraphs","canvasXpress","qgraph","moveVis","ggcharts",
  "igraph","visNetwork","visreg", "VIM", "sjPlot", "plotKML", "squash",
  "statVisual", "mlr3viz", "klaR","DiagrammeR","pavo","rasterVis",
  "timelineR","DataViz","d3r","d3heatmap","dashboard", "highcharter",
  "rbokeh"), dependencies = TRUE)

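And the inevitable ggplot2 example:

library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Weight", y = "Miles per gallon", colour = "Cylinders")
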
7. Web Scraping

Many R packages are specifically designed to scrape (harvest) data from a particular website, API or archive. Here are just a couple of very generic ones:

install.packages(c("rvest","Rcrawler","ralger","scrapeR"), 
             dependencies = TRUE)
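
A minimal rvest sketch (the URL is just a placeholder for any page you are allowed to scrape):

library(rvest)
page <- read_html("https://www.r-project.org/")
page %>% html_nodes("h1, h2") %>% html_text()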

8. Documents and books organisation

For organizing your documents (files, code, packages, diagrams, pictures) into a readable document, dashboard or book view, there are a couple of packages for this purpose:

install.packages(c("devtools","usethis","roxygen2","knitr",
                    "rmarkdown","flexdashboard","Shiny",
                    "xtable","httr","profvis"), dependencies = TRUE)

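For instance, rendering an R Markdown file to HTML from the console (report.Rmd is a placeholder file name):

library(rmarkdown)
render("report.Rmd", output_format = "html_document")
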
Wrap up

The R script for loading and installing the packages is available at Github. Make sure to check the Github repository for the latest updates to the list. And as always, feel free to fork the code or commit updates, add essential packages to the list, comment, improve, and agree or disagree.

You can also run the following command to install all of the packages in a single run:

install.packages(c("Hmisc","foreign","protr","readxl","xlsx",
                 "csv","readr","tidyverse","jsonlite","rjson",
                 "RJSONIO","jsonvalidate","sparkavro","arrow","feather",
                 "XML","odbc","RODBC","mssqlR","RMySQL",
                 "dbConnect","postGIStools","RPostgreSQL","ODBC",
                 "RSQLite","sqliter","dbflobr","RSQL","sqldf",
                 "poplite","queryparser","influxdbr","janitor","outliers",
                 "missForest","frequency","Amelia","diffobj","mice",
                 "VIM","Bioconductor","mi","wrangle","mitools",
                 "stringr","lubridate","glue","scales","hablar",
                 "dplyr","purrr","magrittr","data.table","plyr",
                 "tidyr","tibble","reshape2","stats","Lars",
                 "caret","survival","gam","glmnet","quantreg",
                 "sgd","BLR","MASS","car","mlogit","RRedshiftSQL",
                 "earth","faraway","nortest","lmtest","nlme",
                 "splines","sem","WLS","OLS","pls",
                 "2SLS","3SLS","tree","rpart","rio",
                 "FuzzyNumbers","ez","psych","CCA","CCP",
                 "icapca","gvlma","smacof","MVN","rpca",
                 "gpca","EFA.MRFA","MFAg","MVar","fabMix",
                 "fad","spBFA","cate","mnlfa","CSFA",
                 "GFA","lmds","SPCALDA","semds","superMDS",
                 "vcd","vcdExtra","ks","rrcov","eRm",
                 "MNP","bayesm","ltm","fpc","cluster",
                 "treeClust","e1071","NbClust","skmeans","kml",
                 "compHclust","protoclust","pvclust","genie","tclust",
                 "ClusterR","dbscan","CEC","GMCM","EMCluster",
                 "randomLCA","MOCCA","factoextra","poLCA","ts",
                 "zoo","xts","timeSeries","tsModel","TSMining",
                 "TSA","fma","fpp2","fpp3","tsfa",
                 "TSdist","TSclust","feasts","MTS","dse",
                 "sazedR","kza","fable","forecast","tseries",
                 "nnfor","quantmod","fastnet","tsna","sna",
                 "networkR","InteractiveIGraph","SemNeT","igraph",
                 "dyads","staTools","CINNA","tm","tau","NetworkToolbox",
                 "koRpus","lexicon","sylly","textir","textmineR",
                 "MediaNews","lsa","ngram","ngramrr","corpustools",
                 "udpipe","textstem","tidytext","text2vec","crossval",
                 "bcv","klaR","EnsembleCV","gencve","cvAUC",
                 "CVThresh","cvTools","dcv","cvms","blockCV",
                 "randomForest","grf","ipred","party","randomForestSRC",
                 "BART","Boruta","LTRCtrees","REEMtree","refr",
                 "binomialRF","superml","gbm","GAMBoost","GMMBoost",
                 "bst","sboost","C50","RWeka","klaR",
                 "kernlab","svmpath","nnet","gnn","rnn",
                 "spnn","brnn","RSNNS","AMORE","simpleNeural",
                 "ANN2","yap","yager","deep","neuralnet",
                 "TeachNet","deepnet","RcppDL","tensorflow","h2o",
                 "kerasR","deepNN","Buddle","automl","RLT",
                 "ReinforcementLearning","MDPtoolbox","lime","localModel",
                 "iml","EIX","flashlight","interpret","outliertree",
                 "dockerfiler","azuremlsdk","sparklyr","cloudml","ggvis",
                 "htmlwidgets","maps","sunburstR","lattice","predict3d",
                 "rgl","rglwidget","plot3Drgl","ggmap","ggplot2",
                 "plotly","RColorBrewer","dygraphs","canvasXpress","qgraph",
                 "moveVis","ggcharts","visNetwork","visreg","sjPlot",
                 "plotKML","squash","statVisual","mlr3viz","DiagrammeR",
                 "pavo","rasterVis","timelineR","DataViz","d3r","breakDown",
                 "d3heatmap","dashboard","highcharter","rbokeh","rvest",
                 "Rcrawler","ralger","scrapeR","devtools","usethis",
                 "roxygen2","knitr","rmarkdown","flexdashboard","Shiny",
                 "xtable","httr","profvis"), dependencies = TRUE)

 

Happy R-ing. 🙂

 

 


Row-Level security in Power BI

Row-Level Security, or managing roles in Power BI, is not something new. But environments where there is a need to secure read access for end-users based on their account name are very frequent. Row-Level Security limits and controls access for a user or group (or a distribution group in Active Directory) to rows in a single dataset (or table in SQL Server) and to everything related to that dataset through relationships.

2020-04-05 23_27_50-Window

 

Let's create a simple dataset using a SQL Server database. This dataset uses three tables:

  •  RLS_Sales (a typical sales table containing purchase orders with the sales representative name)
  • RLS_Users (a shortened version of Active Directory with a connection to the sales representative)
  • RLS_Manager (a relationship table between managers and sales representatives)
DROP TABLE IF EXISTS dbo.rls_sales
DROP TABLE IF EXISTS dbo.rls_users
DROP TABLE IF EXISTS dbo.RLS_Manager

CREATE TABLE [dbo].[RLS_Sales](
ID INT IDENTITY(1,1) NOT NULL,
OrderID INT NULL,
SalesRep VARCHAR(20) NOT NULL,
Product VARCHAR(100) NULL,
Qty INT NULL,
Price INT NOT NULL
)

INSERT INTO dbo.RLS_Sales (OrderID,SalesRep,Product,Qty, price)
SELECT 1023, 'sntk\tomaz', 'lamp', 1,12
UNION ALL SELECT 1024, 'sntk\john', 'Key', 2,24
UNION ALL SELECT 1025, 'sntk\john', 'knife', 4,42
UNION ALL SELECT 1026, 'sntk\tomaz', 'pear', 3,2
UNION ALL SELECT 1026, 'sntk\tomaz', 'Key', 1,12
UNION ALL SELECT 1027, 'sntk\jeff', 'bag', 3,42
UNION ALL SELECT 1027, 'sntk\mike', 'paper', 10,19
UNION ALL SELECT 1027, 'sntk\mike', 'phone', 1,62
UNION ALL SELECT 1027, 'sntk\stone', 'bottle', 10,23

CREATE TABLE dbo.RLS_Users
(
ID INT IDENTITY(1,1) NOT NULL
,SamAccountName VARCHAR(20) NOT NULL
,Email VARCHAR(200) NULL)


INSERT INTO dbo.RLS_Users
SELECT 'sntk\tomaz','tomaz@gmail.com'
UNION ALL SELECT 'sntk\john','john@gmail.com'
UNION ALL SELECT 'sntk\jeff','jeff@gmail.com'
UNION ALL SELECT 'sntk\mike','mike@gmail.com'
UNION ALL SELECT 'sntk\stone','stone@gmail.com'
UNION ALL SELECT 'sntk\anna','anna@gmail.com'


CREATE TABLE dbo.RLS_Manager
(
ID INT IDENTITY(1,1) NOT NULL
,SamAccountName VARCHAR(20) NOT NULL
,ManagerSamAccountName VARCHAR(20) NULL)


INSERT INTO dbo.RLS_Manager
SELECT 'sntk\tomaz','sntk\tomaz'
UNION ALL SELECT 'sntk\john','sntk\tomaz'
UNION ALL SELECT 'sntk\jeff','sntk\tomaz'
UNION ALL SELECT 'sntk\mike','sntk\stone'
UNION ALL SELECT 'sntk\stone','sntk\tomaz'
UNION ALL SELECT 'sntk\anna','sntk\tomaz'

Please note that primary and foreign keys could be added; I deliberately left them out in order to handle data integrity in Power BI.

Once we have the dataset, we can fire up Power BI. In Power BI, simply import all three tables. In addition, I have set the relationships between the tables as follows:

2020-04-05 23_28_22-Window

The reason behind this is mainly that I wanted to cover two use-cases in one example:

  •  a single user can see their own sales
  •  managers can see the sales of all their employees

In this way, a manager can see all the employees' and their own sales statistics, whereas employees (that are not managers) can only see their own sales statistics.

Once the relationships are finished, go to Manage Roles:

2020-04-05 23_30_06-Window

And create a new role; mine is called simply “Security”.

2020-04-05 23_28_39-Window

Navigate to RLS_Manager and add DAX expression:

[ManagerSamAccountName] = USERNAME() || [SamAccountName] = USERNAME()

Why am I putting security on the Manager table and not the User table? This way, I don't need a lookup to the Manager table or a bridge table between managers and sales; the Manager table is itself already a “kind of” bridge table. In order to cover both scenarios (a single user sees their own sales, and a manager sees their employees' sales plus their own), I am using the OR logical operator, so whether I enter a manager name or a sales representative name, either scenario will return the sales.

Save the DAX code, click on “View As” (next to Manage Roles), select the “Security” role (the one we created beforehand) together with “Other user”, and type in one of the user names (the user names are in the RLS_Manager table).

2020-04-05 23_29_12-Window

This will view the Power BI document as if the security policy were in place. Since we have selected “sntk\tomaz”, the complete document shows only the sales statistics relevant for this particular user.

2020-04-05 23_29_46-Window

 

Use it wisely. In this blogpost I have used the DAX function USERNAME(), but if you are using Active Directory directly, I recommend using USERPRINCIPALNAME().

 

As always, the code and the Power BI file for this example are available at Github.

 

Happy Power BI-ing! 🙂


Custom Power BI visual for Line chart with two Y-Axis

Power BI supports certain types of visuals that are available in the document by default. These are absolutely great, work perfectly fine and have a lot of capabilities for setting properties and changing the settings.

But every so often in the past year, I have come across the need to visualize a typical line chart with two lines, each with its own Y-axis.

Let me give you a quick R example. First, very simple dataset, where there are two values, each with a slightly different scale: quantity in range from 499 to 760 and temperature  in range from 15 to 24.

2020-04-02 22_10_44-RStudio

So what would normally happen when we put these two ranges on the same Y-axis:

# Both on the same y-axis
plot(dataset[,3], type="l", col="red", lty=1, ylim=c(0,1000), 
ylab = "Qty and Temp")
# Add second line
lines(dataset[,2], col="brown",lty=1)
legend("topleft", c("Temperature", "Quantity"),
       col = c("red", "brown"), lty = c(1, 1))

With the following results:

2020-04-02 22_21_47-Plot Zoom

We see that the temperature has lost almost all of its visible variance, whereas the quantity still has some.

To deal with this, we need to shift one of the lines to the right Y-axis. This takes a little additional coding:

par(mar = c(5, 5, 3, 5))
plot(dataset[, 2], type ="l", ylab = "Quantity",
main = "Quantity and temperature", xlab = "Date", col = "brown")
par(new = TRUE)
plot(dataset[,3], type = "l", xaxt = "n", yaxt = "n",
ylab = "", xlab = "", col = "red", lty = 1)
axis(side = 4)
mtext("temperature", side = 4, line = 3)
legend("topleft", c("Quantity", "Temperature"), col = c("brown", "red"),
   lty = c(1, 1))

And the result is much more obvious:

2020-04-02 22_26_10-Plot Zoom

And now we can see how the temperature fluctuates (by more than 10°C).

So now that we have a solution, let’s go back to Power BI.

1. Default Visual

With the same dataset, this would be “out-of-the-box” visual, available to you:

2020-04-02 22_34_34-twolines_chart - Power BI Desktop

A combination of a bar chart and a line chart, each with its own y-axis. Great. But if I want two lines, this is not possible with the out-of-the-box visuals.

2. Using R or Python

Copying the R code from the introduction example above into Power BI makes it literally the same in Power BI. The same logic applies to Python.

2020-04-02 22_40_23-twolines_chart - Power BI Desktop

It does the job beautifully. So where is the catch with R or Python visuals? I have a couple of concerns:

  •  too many times I have seen that, for data engineers starting to use Power BI, R or Python is just an overkill to adopt quickly,
  • it takes coding to plot a graph and not everyone has a good idea of how to tackle this, and
  • publishing and deploying Power BI with R or Python to an on-prem Power BI Report Server will not work with either visual – R or Python.

This said, there is a lot of room for improvement.

3. Downloading custom visuals

Yes, you can download a custom visual. Go ahead 🙂

4. Building your own custom visual

This area is still under-explored among data engineers and business intelligence people. The purpose of this post is not only to point them to start exploring on their own, but also to show them that it is not a big deal to tinker with it.

I have built mine on this website: Charts PowerBI.

So, let’s walk through how to do it.

1. Get a sample dataset. You can download (data_twoCharts.csv) mine as well from Github.

2. Go to https://charts.powerbi.tips/ and select New.

3. Drag and drop the csv file into desired Data field, and you should get the preview of data:

2020-04-02 23_14_38-PowerBi.Tips - Charts

4. Click Done.

5. The editor consists of 1) a blank canvas pane on the right-hand side, 2) a Glyph pane, 3) a Data pane and 4) a Layers pane.

2020-04-02 23_17_11-PowerBi.Tips - Charts

The Canvas pane can hold multiple plot segments, which we will use to generate two plots, one on top of the other.

6. Drag and drop the individual data columns (from the Data pane) onto the canvas pane. I first took the date column and dragged it on top of the X-axis (you will see it snaps onto it automatically), and then Quantity onto the Y-axis.

2020-04-02 23_23_02-PowerBi.Tips - Charts

7. Drag a Symbol into the Glyph pane, then click on Links to Create Links. This will automatically connect all the dots.

2020-04-02 23_25_22-PowerBi.Tips - Charts2020-04-02 23_25_38-PowerBi.Tips - Charts

8. You should get a graph like this.

2020-04-02 23_28_06-PowerBi.Tips - Charts

9. Great, halfway done. Now resize the Plot Segment to reduce it to 50%. Click on the blank canvas and add another Glyph that will be associated with the new Plot segment.

2020-04-02 23_36_27-PowerBi.Tips - Charts

10. Add another Plot segment (remember we are building two plot graphs, one on top of each other.)

2020-04-02 23_28_32-PowerBi.Tips - Charts2020-04-02 23_30_13-PowerBi.Tips - Charts

11. With the new Plot segment, repeat steps 6, 7 and 8. Drag the Date (to the X-axis) and Temperature (not Quantity) (to the Y-axis) onto the canvas pane, drag a Symbol into the Glyph pane and click Links to Create Links. You should end up with:

2020-04-02 23_38_00-PowerBi.Tips - Charts

12. On the new Plot (on the right-hand side), we want to switch the Y-axis to the right side by clicking in the Layers pane for the selected Plot segment.

2020-04-02 23_40_31-PowerBi.Tips - Charts

13. The last part is a bit finicky. Hover your mouse over the corner of the left plot (and repeat with the right plot), grab the green point and drag it over the other Plot. A yellow dotted line will appear to denote that you are expanding the graph.

2020-04-02 23_44_02-PowerBi.Tips - Charts

14. Once you do this for both, there will be only “one” X-axis (one on top of the other), and both Plot segments will be represented as a layer on top of a layer.

2020-04-02 23_48_44-PowerBi.Tips - Charts

15. Optionally, some colour coding and graph design is highly desirable. This can be changed in the Layers pane. Once you are happy and satisfied with your visual, export it as a Power BI custom visual:

2020-04-02 23_48_56-PowerBi.Tips - Charts

and give it a proper name with labels for X and Y axis. These names will be visible in the Power BI document. And also, give a Custom Visual a proper name 🙂

2020-04-02 23_49_07-C__DataTK_git_DAX_Functions_04_Custom_Visual

 

Once you have done this, open Power BI and add it, the same way as all other additional/custom visuals:

2020-04-02 23_56_15-Useful_DAX_and_Power_BI_examples_for_everyday_usage.pptx - PowerPoint

From here on, you can use this visual with any dataset you want (it is not limited to the sample dataset you used for creating the custom visual), and it is also ready to be published / deployed to an on-prem Power BI Report Server.

 

All code, data sample and Power BI document with custom visual are available on GitHub.

Happy PowerBI-ing. 🙂

 

 

 

 


What and who is IT community? What does it take to be part?

This blog post is long overdue and has been rattling around in my head for a long time. Simply and boldly put, the community is everyone involved behind the results of your search for a particular problem. And by that I am referring to the IT community.

Community (from now on, I am referring only to IT community) consists of people that are contributing and sharing their knowledge, experience, questions, solutions, FAQ, etc to broader group of people. Many (if not most) of us are doing this:

  • for 0 monetary exchange
  • in free (or extra) time
  • for building up a wider knowledge base

In return many of us expect from others:

  • same behavior – sharing when they find an interesting solution to a problem
  • Sharing their time, energy with helping others
  • participating by reading and gradually answering questions from other users
  • commitment
  • gratitude and respect to others.

In the past couple of years I have seen decline in the above sharing-caring type of exchange and interaction. Yet, what I have seen more and more is:

  • decline in awareness of how much community is worth
  • decline in respect (disrespect) and gratitude to other fellow and community people
  • aging of the community
  • decline in general interest

Let me address each of the topics separately.

 

1. How much is the community worth?

Yes, what a bizarre question. And for sure, I cannot answer it. But I will paint a different picture. We have all heard sayings like: “programming is just learning how to google things”, but not everyone has asked themselves what that means. And where do all the hits, results and solutions come from? From the community. From people stepping forward and taking the time to invest into understanding someone's problem and solving it. Or people investing time and giving a lecture for free at the local meeting. Or people taking time to write a blog or post a youtube video on how to do something. These are all the people contributing to this knowledge base. And next time, when you have a question or a problem that you need to solve, remember that there was a person or a group of people who invested their time into solving this issue before you. Remember this. This is how much the community is worth, and much more.

 

2. Decline in respect

Pay respect to the community people. You ask yourself, how to pay respect? It is very easy; here is how to show respect:

  • Say thanks, thank you (at the end of the forum post, at the end of the blog post, after listening to video online or after community event) and be grateful.
  • If you disagree or if you have a better solution, engage into conversation and propose the solution. But be polite, nice and respectful. Your improvement will be highly appreciated.
  • Give feedback, your words can count, especially when you describe your point of view
  • Don’t be afraid to give praise or to give critique. In general, when giving a positive or negative review, keep in mind to be precise and objective. Think about the others when writing either. Will it be helpful and worth to others or not?
  • Get involved and start helping others. Become an active member, start with observing, reading, listening, conversating and after time, start giving back; answers, ideas, blog posts, videos, speeches, etc. Don’t be afraid, if you are polite, kind, sharing and warm, community will embrace you.
  • Words like “sorry” or “I don’t know”, or admitting that you didn’t know or that you were wrong, are not a sign of weakness but a sign of learning. This is the path to gaining more knowledge and getting on track to start helping others.
  • Respect the general conduct of the website, of the event or of the product. Play by the rules.

 

3. Aging of the community

The fact that the community is aging has to do with a social phenomenon – the lack of available literature before the popularity of the internet. Those who spent long periods of time in libraries, analyzing every single page of an available book, know and understand the importance of available materials. The majority of these same IT people are the community contributors themselves.

These people have been growing with the community over the past 20 years (the massive emergence of the internet and the democratization of mass media), and these same people are also the large majority of the community that is still giving back. Draw a line through any type of IT event that has been around for more than 10 years, and you will find the same people at the first iteration of those events 10 years earlier.

Teaching the community to give back knowledge and encouraging them to participate more and more in any kind of community work should start at a young age. Convincing the younger generation to start participating in and enjoying the community should also be introduced and discussed. Only in this manner will the knowledge be returned and not taken for granted.

4. Decline in general

How we live our lives and how technology has changed our habits directly (or indirectly) influences the community as well. With more and more options for the same subject matter, many people have the ability to choose. Which is absolutely great, but it can have negative aspects:

  •  people sign up for way too many free videos, webinars and events, far more than they are able to attend,
  • people subscribe to way too many free and open posts, forums and aggregators, which makes choosing confusing,
  • more options is not necessarily a better option,
  • to study one topic, people still need to take the time and study it thoroughly,
  • fast-changing technology and market needs leave users hindered,
  • too many technologies, too little time.

 

I have been involved in the community for over 20 years, covering a variety of topics, programming languages (Delphi, .NET, Python, C#), statistical languages and fields (R, Python, SAS, IBM, SAP, machine learning, data science) and database software (Microsoft SQL Server, SAP Hana, Oracle), and I have seen a decline in general, especially due to lack of interest, lack of time, the age gap and the content gap. But at the same time, I have also seen many new – more practical – events, blogs, vlogs and articles that try not only to make the existing community stay and tickle their brains, but also to engage new, younger people and teach them that sharing is caring, and caring is building new connections, new ties and new possibilities.

Lastly, this is my personal view on the community, community work, evangelists and knowledge builders. Many of them / us do this out of sheer enthusiasm, energy, passion and drive, and not because – as many would have thought – of recognition or financial aspects. I love to share the passion, the energy, the enthusiasm and the drive with anyone who wants to get started on a particular topic. But you must find this in yourself; otherwise it is useless.

Feel free to share your personal view, your opinion, your positive or negative feedback, I would love to hear your view. Much appreciated.

With love!


Copying Measures between Power BI datasets and Power BI files

Measures in a Power BI document are DAX-based aggregations for slicing and dicing tabular datasets using the evaluation and filter context and the data relationships. In short, measures are DAX formulas stored in Power BI as part of the dataset.

To show how easy it is (once you know how) to copy measures between different datasets, I have created a sample Power BI file with the diamonds dataset (also available on Github).

2020-02-11 21_17_59-

Measures have little calculator icons in front of their names, denoting their purpose. Now let's do the copying part 🙂

1. Copy dataset

The first thing we need to do is to copy the dataset in Power BI. There are many ways to copy a dataset; I have chosen the following:

Go to Modeling -> select New Table -> and simply write the name of the dataset in the DAX line: Table = diamonds. I have later renamed the new dataset from “Table” to “diamonds_2”.

2020-02-11 21_18_59-Normal_PBI_with_Measures - Power BI Desktop

You will notice that none of the measures have been copied, leaving the new dataset free of measures. So now we need to get the measures into the new dataset.

2. Save as PBIT

The second step is to save the Power BI document as Template.  Go to File -> Save As -> and select Save as Type: Power BI Template File (*.pbit).

2020-02-11 21_20_04-Normal_PBI_with_Measures - Power BI Desktop

The biggest difference between a normal PBIX file and a PBIT – template – file is that the latter does not hold data. What a PBIT file does hold is:

  • query definitions, query parameters, M and DAX code,
  • all information about the data model definition,
  • all the visuals, elements, bookmarks, filters, etc.

And it can be easily shared in collaborative environment, since it is usually very small file.

3. Open Tabular Editor

Now download the Tabular Editor from Github. It is a small but powerful program. Once the installation has completed, open the program and open the PBIT (template) file saved in the previous step.

2020-02-11 21_21_02-Tabular Editor 2.9.1

You will be presented with semantic description of your dataset, model and all the definitions. You can see, there are two datasets, one holding measures and the other one not.

2020-02-11 21_23_30-C__DataTK_git_DAX_Functions_03_Measures_Copy_Normal_PBI_with_Measures.pbit - Tab

4. Copy measures

Once in Tabular Editor, select all the measures you want to copy (they should be highlighted), right-click and select Duplicate 3 measures.

2020-02-11 21_23_47-C__DataTK_git_DAX_Functions_03_Measures_Copy_Normal_PBI_with_Measures.pbit - Tab

 

The program has generated exact copies of your selected measures. Select the copied measures (make sure they are highlighted).

2020-02-11 21_24_08-C__DataTK_git_DAX_Functions_03_Measures_Copy_Normal_PBI_with_Measures.pbit_ - Ta

And now simply drag them into the new dataset (dataset: diamonds_2).

Alternatively, you can use CTRL + C and CTRL + V (copy/paste) and delete copies in original dataset. Both ways are very straight-forward.

2020-02-11 21_24_36-C__DataTK_git_DAX_Functions_03_Measures_Copy_Normal_PBI_with_Measures.pbit_ - Ta

 

5. Editing the measures

The last touch is to change the dataset for each of the measures – yes, manually. For each of the measures, in its expression, I need to change the dataset from diamonds -> diamonds_2.

2020-02-11 21_27_11-C__DataTK_git_DAX_Functions_03_Measures_Copy_Normal_PBI_with_Measures.pbit_ - Ta

Or to put it in DAX perspective; from:

IF(SUM(diamonds[price])>1000000 ,SUM('diamonds'[price])*1.2, 
SUM('diamonds'[price]))

to:

IF(SUM(diamonds_2[price])>1000000 ,SUM('diamonds_2'[price])*1.2, 
SUM('diamonds_2'[price]))

 

And you are finished. If you want to do a bulk export of all the measures, I strongly recommend using DAX Studio.

Comparing the original and copied measure definitions, you will see that the copy process added some brackets.

2020-02-11 21_32_36-Presentation1 - PowerPoint

 

As always, the files (PBIX, PBIT and CSV) are available at Github.

 

Happy Power BI-ing 🙂

 


The palindrome of 02.02.2020

As of writing this blog post, today is February 2nd, 2020. Or as I would say it, the 2nd of February, 2020. There is nothing magical about it, it is just a sequence of numbers. But on a boring Sunday evening, what could be more thrilling than looking into this a little bit further 🙂

Let's fire up RStudio and start writing a lot of useless stuff. First, we don't really need a function, but since this is all about useless stuff, let's make a useless palindrome function:

palindrome <- function(date){
identical(date, paste(rev(strsplit(date, "")[[1]]), collapse=""))
}
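
A quick sanity check of the function (my own test calls), using today's date in both DDMMYYYY and YYYYMMDD form and one non-palindrome date:

palindrome("02022020")   # TRUE
palindrome("20200202")   # TRUE
palindrome("03022020")   # FALSE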

Once we have the function, let’s create a range of dates we want to explore. The range will be set from 1st of January 1000 until 1st of January 2500.

dd <- data.frame(seq(as.Date("1000/01/01"), as.Date("2500/01/01"), "days"))

I don’t want to get into useless debate whether the Gregorian Calendar was already accepted worldwide or not, but if you want to read the 16th Century history, click the link.

Now, the most useless part is a loop, that will take us from the pre-historic age all the way to the future:

#empty df
df <- data.frame(paldate=as.character())

#loop through all the dates
for (i in 1:nrow(dd)){ 
  dat <- dd[i,1]
   #Year Month Day format
   dat <- format(dat, "%Y%m%d")

   #Year Day month format
   #dat <- format(dat, "%Y%d%m")

  if (palindrome(dat) == TRUE) {
     df <- rbind(df, data.frame(paldate=as.character(dat)))
                              }
}

 

Issues

Following are the issues I have with this pretty useless palindrome fact.

  1.  Year Format  – Abbreviated or non-abbreviated format; ’20 or 2020?
  2. Leading zeros – Today’s date (02.02.) would have never been a palindrome if not for the leading zeros. Otherwise, it would have just been 2.2.
  3. American vs. European Date format. Today’s date is fully interchangeable, but what if the date would have been February 22nd, of 2022?
    1. European: 22.02.2022 -> Is Palindrome
    2. American: 02.22.2022 -> Is not Palindrome.

Useless statistics

Issues aside, let's do the comparison, just for fun, between the American and European date formats.

Date range will remain the same; from 01/01/1000 until 01/01/2500.

The number of palindromes in the US vs. the EU date format is 79 vs. 121:

01

 

Let’s check the distribution between the two formats and put them on graphs for easier visualization with ggplot:

library(plyr)
library(ggplot2)
library(gridExtra)

# median day per region (EU vs. US date format)
df_ALL_m_d <- ddply(df_ALL, "Region", summarise, grp.mean = median(Day))

# simple histogram of the days
ggplot(df_ALL, aes(x = Day, color = Region, fill = Region)) +
  geom_histogram(fill = "white", position = "identity") +
  theme(legend.position = "top")

# histogram of the days with the per-region median as a dashed line
pd <- ggplot(df_ALL, aes(x = Day, color = Region, fill = Region)) +
  geom_histogram(fill = "white", position = "identity", binwidth = 2) +
  geom_vline(data = df_ALL_m_d, aes(xintercept = grp.mean, color = Region),
             linetype = "dashed") +
  theme(legend.position = "top")
pd

# p (years) and pm (months) are built the same way; see the full code on Github
grid.arrange(p, pm, pd, ncol = 1)

And the distribution comparison is:

2

It is clear from the graphs that the date formatting plays a significant role in whether a particular date is a palindrome or not.

We can also see from the graphs that days and months are significantly different between the EU and US formats (months in the EU format appear only in January, February, November and December, whereas the US format ranges through all the months; for days it is the exact opposite).

With the following R code:

## Get distribution for the days in the year (13th day of the year, 
## 241st day of the year, etc)

v <- ggplot(df_ALL, aes(y = DaysDiff, x = Region)) + 
    geom_violin(trim = FALSE) + 
    geom_dotplot(binaxis = 'y', stackdir = 'center', dotsize = 1)

b <- ggplot(df_ALL, aes(x=Region, y=DaysDiff)) + geom_boxplot() 
b <- b + geom_jitter(shape=16, position=position_jitter(0.2))

grid.arrange(v, b, nrow = 1)

We can generate this:

3

 

We can again see that the distribution of the differences from the palindrome date until the end of the year (the difference is the number of days between December 31st and the palindrome date) is bimodal in the EU date format, whereas in the US format it is more evenly distributed.

Pretty useless, I guess.

For those who want to dig into more useless stuff, complete R code is here.

Happy R-coding.


My MSSQL database is corrupt and I can not get to my predictive models. Now what? Stellar to the rescue.

A small project database, with an old backup, ended up in a corrupt state. This happened. A corrupted database; all-in-all 150MB. I was super annoyed. I had some R and Python code there and some – yes 🙂 – predictive models, and I had forgotten to set up an automated backup. The data stored in the database was not the problem, but the table with the last stored model held the row I needed.

The problem with serialized models stored in a table is that they need to be fully compatible and restored correctly to the last bit in order for R / Python to read them correctly. Miss a single bit, and the model is literally useless.

My models are stored in T-SQL Table:

SELECT  
        [model_name]
       ,[model]
       ,[accuracy]
       ,[model_date]
       ,[version_Model]
       ,[Approved_by]
FROM [SQLR].[dbo].[Predictive_models]

Once I calmed myself down, I went through my options. Worst case scenario, I can restore a two-month-old backup, but what are my other options?

Using the Microsoft built-in Database Console Command

DBCC CHECKDB

utility. The utility comes with multiple repair options to resolve corruption errors in an MS SQL database. It is not super challenging; I started the repair with the option:

DBCC CHECKDB (N'SQLR', REPAIR_ALLOW_DATA_LOSS)

and it ended with data loss. Hurray for the repair, but it was still not successful enough, precisely because of that data loss.

Digging further, I used the PowerShell Restore-SqlDatabase cmdlet for restoring a SQL Server database. But I was left empty-handed, since this only works when you have a restorable backup.

Digging for further information, I googled around and came across this blogpost by Stellar. Next thing you know, I am downloading the software, and looking back now, it was the best choice.

sql-box

The benefits of using Stellar Repair for MS SQL Server:

  • Comes with a simple to use GUI which makes the repair process straightforward.
  • Helps repair SQL database quickly in three simple steps: 1. Select database file. 2. Repair database file. 3. Preview and Save repaired file.
  • Recovers all major database components of SQL server including tables, views, indexes, stored procedures, etc.
  • Resolves corruption errors in databases running on MS SQL Server 2019, 2017, 2016, and older versions

Additional Benefits:

  • Repairs SQL tables with PAGE and ROW compression.
  • Allows saving the repaired file in MS SQL, HTML, XLS, and CSV formats
  • Provides Windows and SQL authentication options to connect to the server
  • Supports Standard Compression Scheme for Unicode (SCSU) for SQL Server 2008 R2.

(based on the info/sales text from their website)

Step-by-step to repair corrupt .mdf file with Stellar Repair for MS SQL Server

After downloading, stop the MS SQL Server service and copy the database (.mdf file) to a different location.

1. The Instruction window is displayed. Click OK.

1-Software Instruction

2.  Select Database window appears. Click Browse to select .mdf file you want to repair.

2-Main Interface

A great feature is also the ability to recover deleted database records; in this case select the ‘Include Deleted Records’ checkbox (and make sure that you have the appropriate database recovery model selected).

After selecting the desired file, click repair to engage the process. Hopefully only one mdf file is corrupt 🙂 But you can also select multiple.

3-Selected database

There is also the ability to select the SQL Server version if the software is not able to detect the MSSQL Server database file version.  In this case, just  select the version of the SQL Server where your database was originally created.

4-Select Version

After the process completes with recovery, you can inspect what has been recovered by Stellar.

2020-01-16 08_48_03-Photos

 

And I can recover only the selected table; prior to recovering / saving, I can also inspect the content of my data in the right pane. Since the models are serialized and stored as a blob, I can only hope that R will be able to de-serialize them and use them again via sp_execute_external_script.

Save the data back to an MDF file and later attach it to the SQL Server; or, in my case, exporting to a CSV file was also a possibility since the table was not that big.

After attaching back new MDF file with table holding the models:

2020-01-16 08_55_13-SQLQuery8.sql - SNTK.UmbriaPrediction (SNTK_Tomaz (77))_ - Microsoft SQL Server

I'm able to successfully run the R script and score new values.

After all, I am happy that I found this software; it helped me deal with the corrupt table / database, and I can score the data with my older model that has great accuracy.

As always, happy coding!


Sudoku game with R

Sudoku is a classic logic game based on a combinatorial number-placement puzzle. The objective is to fill a 9×9 matrix with digits so that each column, each row and each box (3×3 sub-grid) contains all of the digits from 1 to 9.

Solving sometimes can be a nagging process. For this purpose, here is the R helper function for you to solve this with R.

grid

Let’s get the Sudoku grid we want to solve into R:

sudoku <- matrix(data=c(
6,0,0,2,1,0,0,3,0,
5,0,9,0,0,0,6,0,0,
2,0,0,9,7,0,0,0,4,
0,0,2,3,0,4,0,0,0,
0,6,0,0,5,0,0,9,0,
0,0,0,1,0,9,7,0,0,
9,0,0,0,3,8,0,0,6,
0,0,7,0,0,0,2,0,5,
0,8,0,0,4,2,0,0,9), nrow=9, ncol=9, byrow=FALSE
)

 

Now, we will need a function that will find all the 0 values – these are the values we need to work on.

get_zeros <- function(board_su){
  #empty df
  df <- data.frame(i=NULL,j=NULL)
  for (i in 1:nrow(board_su)){
    for (j in 1:ncol(board_su)){
      if (board_su[i,j] == 0) {
        a <- data.frame(i,j)
        #names(a) <- c("i", "j")
        #df <- rbind(df, a)
        df <- a
        return(df)
      } 
    }
  }
}

In addition, we will need functions to solve the puzzle and to validate a solution.

Function validater will validate for the sudoku board a particular solution at a particular position:

validater(sudoku, 1, c(1,4))

In the matrix, at position x=1, y=4, where there is a 0, it will test whether the number 1 is valid or not. If the number is valid, it returns TRUE to the outer function that searches for the complete solution.

The solver function iterates through all the 0-positions and, for each, tries the candidate digits that are still allowed by the rules (a sketch of such a solver is shown after the validater function below):

  •  each row can contain only one number in range of 1..9
  • each column can contain only one number in range of 1..9
  • each sub-grid of 3×3 can contain only one number in range of 1..9

And the nuts and bolts of the validater function:

validater <- function(board_su, num, pos=c(NULL,NULL)){
  status <- FALSE
  a <- as.integer(pos[1])
  b <- as.integer(pos[2])
  num <- as.integer(num)
  while (status == FALSE) {
    for (i in 1:9) {    
      if ((board_su[a,i] == num & b != i) == TRUE) {
        status <- FALSE
        return(status)
      }
    }
    
    for (i in 1:9) {    
      if ((board_su[i,b] == num & a != i) == TRUE) {
        status <- FALSE
        return(status)
      }
    }
    
    # which 3x3 box are we in (0-based box coordinates)
    boxNx <- as.integer((b - 1) / 3)
    boxNy <- as.integer((a - 1) / 3)
    
    # looping through the box, skipping the cell we are validating
    for (i in (boxNy * 3 + 1):(boxNy * 3 + 3)) {
      for (j in (boxNx * 3 + 1):(boxNx * 3 + 3)) {
        if ((board_su[i, j] == num & !(i == a & j == b)) == TRUE) {
          status <- FALSE
          return(status)
        }
      }
    }
    status <- TRUE
    return(status) 
  }
}
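
Putting the helper functions together, a recursive backtracking solver could look roughly like the sketch below (the function name solve_sudoku and its exact structure are my simplified illustration; the full solver is in the Github code):

solve_sudoku <- function(board_su) {
  pos <- get_zeros(board_su)
  # no empty cell left - the board is solved
  if (is.null(pos)) {
    return(board_su)
  }
  for (num in 1:9) {
    if (validater(board_su, num, c(pos$i, pos$j))) {
      board_su[pos$i, pos$j] <- num
      solved <- solve_sudoku(board_su)
      if (!is.null(solved)) {
        return(solved)
      }
      # dead end - remove the digit and backtrack
      board_su[pos$i, pos$j] <- 0
    }
  }
  return(NULL)
}

solved_grid <- solve_sudoku(sudoku)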

With the following solution:

solution

For sure, this is not to be taken too seriously; you can get an application for your mobile phone where you take a photo of your grid and the phone solves it for you, using a library like OpenCV. The code was created just for fun (and because the Advent of Code for 2019 is over).

Happy R coding 🙂

As always, the code is available at Github.


Working with Windows CMD system commands in R

From time to time, when developing in R, working with and wrangling data, or preparing machine learning projects, the moment comes when one still needs to access operating system commands from/in R.

In this blog post, let's take a look at some of the most useful CMD commands when using R. Please note that the CMD commands apply only to the Windows environment; for Linux/MacOS the system commands should be slightly changed, but the wrapper R code should remain the same.

1

1. R commands to access the CMD

R commands that are used for accessing, running and doing changes through cmd are:

  • system
  • system2
  • shell
  • shell.exec

All invoke the OS command that is specified and followed. All of the stated functions are part of the base R, so no additional packages are needed.

Let’s say, we want to start with DIR command and return the results in RStudio.

## 1 Simple dir command with order switch
system("dir /o")
system2("dir /o")
shell("dir /o")
shell.exec("dir /o")

and you will see that the shell command outputs the results back to the console pane in RStudio, whereas the others will not give you the immediate desired output. The main difference is that the shell function runs the desired command under a shell (following the POSIX standard, or here you can find POSIX functions), whereas the system command invokes the OS command directly.

In the R base package, a set of dedicated functions is also available to create, open, append and close file and URL connections, as well as to open and load compressed files. Just listing the functions:

  • object connections: file, url
  • compression and encoding: gzfile, bzfile, xzfile, unz, pipe
  • connection functions: open,close,flush
  • boolean functions: isOpen, isIncomplete

As a rule of thumb (with default settings) on when to use the system or shell function: shell will always work, while system will only work when the system settings (paths) are correctly prepared. Example:

### Checking the usage
Sys.which("copy")
Sys.which("ping")

and the result is that the system and system2 functions will work for the ping CMD command, but not for the copy CMD command and, later in the blog post, also not for the rename and move commands.

4
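A small, hedged helper along those lines (the run_cmd function name is just for illustration): check Sys.which() first and fall back to shell() for the CMD built-ins.

# use system() for real executables (e.g. ping.exe) and fall back to
# shell() for cmd built-ins such as copy, ren or dir
run_cmd <- function(cmd_line) {
  exe <- strsplit(cmd_line, " ")[[1]][1]
  if (nzchar(Sys.which(exe))) {
    system(cmd_line, intern = TRUE)
  } else {
    shell(cmd_line, intern = TRUE)
  }
}

run_cmd("ping -n 1 localhost")  # found on the PATH -> system()
run_cmd("dir /o")               # shell built-in    -> shell()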

2. Chaining the CMD commands in R

For the simplicity, let’s continue with shell function and try how chaining can be achieved.

The following example will not work, because every time the shell function is called, a new environment is initiated. So running shell calls sequentially will not work!

setwd("C:\\Users\\Tomaz")

shell("dir")
shell("cd ..")
shell("dir")

Chaining the CMD commands together in one call, however, will work:

shell("dir && cd .. && dir")

Note that this will not alter your working directory; every time the shell call completes, you are returned to the initial environment. By checking:

getwd()

you can see that the working directory has not changed. Checking the R environment variables before and after, these also remain intact:

Sys.getenv(c("R_HOME","HOME"))

You can also define variables, use the echo command, use built-in system variables like %TIME% and many more; see the sketch below. More is available at Github.
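For example, a hedged sketch that combines echo, the built-in %TIME% variable and a directory listing in a single chained call:

# chaining echo, %TIME% and dir in one shell() call
shell("echo started at %TIME% && dir /b && echo finished at %TIME%", intern = TRUE)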

3. Most typical CMD commands executed from R

Here is the list of some CMD commands we will encapsulate in R:

  • fc / compare
  • rename
  • move / copy
  • ping (pingpath)
  • systeminfo
  • tasklist (taskkill)
  • ipconfig / netstat

 

3.1. Comparing two files in CMD

Having the capability to compare two files in CMD is just another of the many ways to check whether your datasets are the same.

Let’s create two *.csv files with iris dataset and run the file compare in cmd:

setwd("C:\\Users\\Tomaz")

#create two files
write.csv(iris, file="iris_file1.csv")
write.csv(iris, file="iris_file2.csv")

#compare both files
shell("FC /a C:\\Users\\Tomaz\\iris_file1.csv 
             C:\\Users\\Tomaz\\iris_file2.csv")

The returned information is FC: no differences encountered, the same as if the command were run from the command line:

2

The same result can be achieved with an R function that comes in the base build: all.equal.

file1 <- "C:\\Users\\Tomaz\\iris_file1.csv"
file2 <- "C:\\Users\\Tomaz\\iris_file2.csv"

# or in R using all.equal
all.equal(readLines(file1), readLines(file2))

Since we are in R, we would like to have the results stored in a data.frame. To do so, let’s use the previously mentioned system2 function. This is a newer version of the system function that takes the command and its arguments separately and supports some additional settings.

# run the same command using newer function "system2" and set the arguments

cmd_command <- "FC" 
#cmd_args <- "/a C:\\Users\\Tomaz\\iris_file1.csv 
                 C:\\Users\\Tomaz\\iris_file2.csv"
cmd_args <- c('/a', file1, file2)

rr <- system2(command=cmd_command,
                      args= cmd_args, 
                      stdout=TRUE,
                      stderr=TRUE, 
                       wait = TRUE)

#suppose we want to store the results in data.frame
#empty dataframe
df_rr <- data.frame(file1 = character(),
                    file2 = character(),
                    fcompare = character(),
                    stringsAsFactors=FALSE)

#temporary results
temp <- data.frame(file1=file1, file2=file2, fcompare=rr[2])

#bind all into dataframe
df_rr <- rbind(df_rr, setNames(temp, names(df_rr)))

 

with the results stored in a data.frame when comparing multiple files (a loop sketch follows the screenshot).

3
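A hedged sketch of that loop over several file pairs (the pairs below are illustrative and reuse the df_rr data.frame defined above):

# compare several file pairs and collect the FC verdicts
file_pairs <- list(
  c("C:\\Users\\Tomaz\\iris_file1.csv", "C:\\Users\\Tomaz\\iris_file2.csv"),
  c("C:\\Users\\Tomaz\\iris_file2.csv", "C:\\Users\\Tomaz\\iris_file1.csv")
)

for (p in file_pairs) {
  rr <- system2("FC", args = c("/a", p[1], p[2]), stdout = TRUE, stderr = TRUE)
  temp <- data.frame(file1 = p[1], file2 = p[2], fcompare = rr[2])
  df_rr <- rbind(df_rr, setNames(temp, names(df_rr)))
}

df_rr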

3.2. Renaming a file in CMD

Keeping the file with the iris dataset, C:\\Users\\Tomaz\\iris_file1.csv, let’s do the renaming.

# renaming the file using shell command
shell("cd c:\\users\\tomaz && ren iris_file1.csv iris_file1_copy.csv")

and with the parametrized shell function (remember that, just as with copy, Sys.which("rename") returns empty, so system2 will not work here):

file_old_name <- "c:\\users\\Tomaz\\iris_file2.csv"
file_new_name <- "iris_file2_new.csv"

cmd_command <- paste("RENAME", file_old_name, file_new_name) 

# System2 Does not work
# system2(command=cmd_command)
shell(cmd_command)

Either way, the files get renamed. Fortunately or not, there is no output in this case as long as there is no syntax error and the file exists.

3.3. Copy a file in CMD

Keeping the file with the iris dataset, C:\\Users\\Tomaz\\iris_file1.csv, let’s also do the copying.

#Copying file
shell("copy c:\\Users\\Tomaz\\iris_file1.csv iris_file1_copy.csv")

and with the parametrized shell function:

orig_file <- "c:\\Users\\Tomaz\\iris_file1.csv"
copy_file <- "iris_file1_copy1.csv"
command <- "copy" 

cmd_command <- paste(command, orig_file, copy_file)
shell(cmd_command)

Now check your files.
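A quick, hedged way to do that check without leaving R (paths as used above):

# confirm the copies exist straight from R
file.exists("c:\\Users\\Tomaz\\iris_file1_copy.csv")
list.files("c:\\Users\\Tomaz", pattern = "^iris_file")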

3.4. Ping

Since the ping CMD command is found by the Sys.which() function, we can use it directly:

URL <- 'tomaztsql.wordpress.com'

cmd_command <- "Ping" 
cmd_args <- c('-n 1', URL)

system2(command=cmd_command,
           args=cmd_args, 
           stdout=TRUE,
           stderr=TRUE, 
           wait = TRUE)

And since we want to check many URLs, let’s create an empty data.frame to store the results and a function to walk through the list.

URLs <- c("google.com", "tomaztql.wordpress.com", "youtube.com")

#empty dataframe
df_rr <- data.frame(URL = character(),
                   reply = character(),
                   package = character(),
                   stringsAsFactors=FALSE)

ping_fn <- function(url) {
         system2("ping",c(url,' -n 1'),
         stdout=TRUE,
         stderr=TRUE)
}

for (i in 1:length(URLs)){
       site <- print(URLs[i])
       rr <- ping_fn(site)
       temp <- data.frame(URL=rr[2], reply=rr[3], package=rr[6])
       df_rr <- rbind(df_rr, setNames(temp, names(df_rr)))
}

head(df_rr)

and the results are bound into a single data.frame:

5

 

3.5. Systeminfo

Assume you need information about the system, not only about the R environment. Some R functions are available for this:

# R
R.Version()
Sys.info()
.Platform

And all are very useful. But if you need the systeminfo CMD information in your R IDE environment, saved in a data.frame, the system2() function is the way to go.

# Using system2

cmd_command <- "systeminfo" 
rr <- system2(command = cmd_command,
              stdout = TRUE,
              stderr = TRUE,
              wait = TRUE)

and getting the result:

6

So to grab, e.g., the “System Manufacturer” entry, just walk to the 13th position in the vector:

rr[13]
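Since the exact position differs between machines, a safer, hedged sketch is to split the lines into a property/value data.frame and search by name:

# split each "Property:   Value" line and search by property name instead of position
parts <- strsplit(rr, ":", fixed = TRUE)
df_sys <- data.frame(
  property = trimws(sapply(parts, `[`, 1)),
  value    = trimws(sapply(parts, function(x) paste(x[-1], collapse = ":"))),
  stringsAsFactors = FALSE
)
df_sys[grep("System Manufacturer", df_sys$property), ]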

 

3.6. Tasklist

Getting the list of all the running tasks:

cmd_command <- "tasklist" 
rr <- system2(command=cmd_command,
               stdout=TRUE,
               stderr=TRUE, 
               wait = TRUE)

# getting the results into a more readable format
dd <- tibble::enframe(rr)
stack(dd)[1]

and you can get all the additional information from the data.frame.
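An alternative, hedged approach is to ask tasklist for CSV output and let read.csv do the parsing:

# ask tasklist for CSV output and parse it straight into a data.frame
rr_csv <- system2("tasklist", args = c("/FO", "CSV"), stdout = TRUE, stderr = TRUE)
tasks  <- read.csv(text = rr_csv, stringsAsFactors = FALSE)
head(tasks)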

In addition, you can also kill a specific task:

# kill a specific task
shell("taskkill /F /PID 6816")

with the returned information: “SUCCESS: Sent termination signal to the process with PID 6816.”
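If you do not know the PID up front, taskkill can also target a process by its image name (notepad.exe below is purely illustrative):

# kill a process by image name instead of a hard-coded PID
shell("taskkill /F /IM notepad.exe")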

3.7. Ipconfig / netstat

Getting the ipconfig information or all the netstat information, the task is the same:

cmd_command <- "netstat" 
rr <- system2(command=cmd_command,
               stdout=TRUE,
               stderr=TRUE, 
               wait = TRUE)

and storing the results with tibble:

dd <- tibble::enframe(rr)

and the results are the same as in CMD:

7
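And the same pattern, sketched for ipconfig with the /all switch:

# ipconfig follows the exact same pattern
rr_ip <- system2("ipconfig", args = "/all", stdout = TRUE, stderr = TRUE)
dd_ip <- tibble::enframe(rr_ip)
head(dd_ip)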

4. Conclusion

Using CMD from the R IDE will surely not be your daily driver when crunching numbers and wrangling data. But when you need to work closely with your files, datasets, system or tasks, the system2 and shell functions are absolute winners for the job.

 

As always, complete set of the code is available on Github.

Happy R-coding! 🙂

 

Tagged with: , , , , , , , , , , , , , , ,
Posted in Uncategorized