Custom Power BI visual for a line chart with two Y-axes

Power BI supports a set of visuals that are available in the document by default. These are absolutely great, work perfectly fine and offer plenty of capabilities for setting properties and changing the settings.

But every so often in the past year, I have come across the need to visualize a typical line chart with two lines, each with its own Y-axis.

Let me give you a quick R example. First, a very simple dataset with two values, each on a rather different scale: quantity in the range from 499 to 760 and temperature in the range from 15 to 24.
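If you want to follow along without the original file, here is a minimal sketch of such a dataset (the column names and generated values are my stand-ins, not the original file):

# hypothetical stand-in for the dataset: a date column, a quantity
# in the 499-760 range and a temperature in the 15-24 range
set.seed(2908)
dataset <- data.frame(
  date        = seq(as.Date("2020-03-01"), by = "day", length.out = 30),
  quantity    = sample(499:760, 30, replace = TRUE),
  temperature = runif(30, min = 15, max = 24)
)
head(dataset)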

[Screenshot: dataset preview in RStudio]

So what would normally happen when we put these two ranges on the same Y-axis:

# Both on the same y-axis
plot(dataset[,3], type="l", col="red", lty=1, ylim=c(0,1000), 
ylab = "Qty and Temp")
# Add second line
lines(dataset[,2], col="brown",lty=1)
legend("topleft", c("Temperature", "Quantity"),
       col = c("red", "brown"), lty = c(1, 1))

With the following results:

[Plot: both series on the same Y-axis]

We see that the temperature has lost almost all of its visible variance, whereas quantity still shows some.

To deal with this, we need to shift one of the lines to the right Y-axis. This takes a little additional coding:

par(mar = c(5, 5, 3, 5))
plot(dataset[, 2], type = "l", ylab = "Quantity",
     main = "Quantity and temperature", xlab = "Date", col = "brown")
par(new = TRUE)
plot(dataset[, 3], type = "l", xaxt = "n", yaxt = "n",
     ylab = "", xlab = "", col = "red", lty = 1)
axis(side = 4)
mtext("temperature", side = 4, line = 3)
legend("topleft", c("Quantity", "Temperature"), col = c("brown", "red"),
       lty = c(1, 1))

And the result is much more obvious:

[Plot: Quantity and temperature, each on its own Y-axis]

And now we can see how the temperature fluctuates (by almost 10°C).

So now that we have a solution, let’s go back to Power BI.

1. Default Visual

With the same dataset, this would be the closest "out-of-the-box" visual available to you:

[Screenshot: default combo visual in Power BI Desktop]

A combination of a bar chart and a line chart, each with its own Y-axis. Great. But if I want two lines, this is not possible with the out-of-the-box visuals.

2. Using R or Python

Copying the R code from the introduction example above into a Power BI R visual produces literally the same chart in Power BI. The same logic applies to Python.

[Screenshot: R visual in Power BI Desktop]

It does the job beautifully. So where is the catch with R or Python visuals? I have a couple of objections:

  • too many times I have seen that for data engineers who are starting to use Power BI, R or Python is simply overkill to adopt quickly,
  • it takes coding to plot a graph, and not everyone has a clear idea how to tackle this issue, and
  • publishing and deploying Power BI with R or Python to an on-prem Power BI Report Server will not work with either visual type – R or Python.

This said, there is a lot of room for improvement.

3. Downloading custom visuals

Yes, you can download a custom visual. Go ahead 🙂

4. Building your own custom visual

This area is still under-explored among data engineers and business intelligence people. The purpose of this post is not only to encourage them to start exploring on their own, but also to show them that it is not a big deal to tinker with custom visuals.

I have built mine on this website: Charts PowerBI.

So, let’s walk through how to do it.

1. Get a sample dataset. You can also download mine (data_twoCharts.csv) from Github.

2. Go to https://charts.powerbi.tips/ and select New.

3. Drag and drop the CSV file into the Data field, and you should get a preview of the data:

[Screenshot: data preview in PowerBI.Tips Charts]

4. Click Done.

5. The editor consists of 1) a blank Canvas pane on the right-hand side, 2) a Glyph pane, 3) a Data pane and 4) a Layers pane.

[Screenshot: the Charts editor panes]

The Canvas pane can hold multiple plot segments, which we will use to generate two plots, one on top of the other.

6. Drag and drop the individual data columns (from the Data pane) onto the Canvas pane. I dragged the Date column first, onto the X-axis (you will see it snap onto it automatically), and Quantity onto the Y-axis.

[Screenshot: columns dropped onto the axes]

7. Drag a Symbol into the Glyph pane, then click Links to Create Links. This will automatically connect all the dots.

[Screenshots: dragging the symbol glyph and creating links]

8. You should get a graph like this:

[Screenshot: the first line chart]

9. Great, halfway done. Now resize the plot segment to reduce it to 50%. Click on the blank canvas and add another Glyph, which will be associated with the new plot segment.

[Screenshot: resized plot segment with the new glyph]

10. Add another plot segment (remember, we are building two plots, one on top of the other).

[Screenshots: adding the second plot segment]

11. With the new plot segment, repeat steps 6, 7 and 8. Drag Date (to the X-axis) and Temperature (not Quantity; to the Y-axis) onto the Canvas pane, drag a Symbol into the Glyph pane and click Links to Create Links. You should end up with:

[Screenshot: the second line chart]

12. On the new plot (on the right-hand side), we want to switch the Y-axis to the right side, by clicking in the Layers pane for the selected plot segment.

[Screenshot: switching the Y-axis to the right side]

13. The last part is a bit finicky. Hover with your mouse over the corner of the left plot (and repeat with the right plot), grab the green point and drag it over the other plot. A yellow dotted line will appear to denote that you are expanding the graph.

[Screenshot: dragging the plot corner to overlap the segments]

14. Once you do this for both, there will be only "one" X-axis (one on top of the other), and both plot segments will be represented as a layer on top of a layer.

[Screenshot: overlapped plot segments]

15. Optionally, some colour coding and graph design is highly desirable; this can be changed in the Layers pane. Once you are happy with your visual, export it as a Power BI custom visual:

[Screenshot: exporting as a Power BI custom visual]

and give it a proper name, along with labels for the X and Y axes. These names will be visible in the Power BI document. And also, give the custom visual itself a proper name 🙂

[Screenshot: naming the custom visual and the axis labels]

 

Once you have done this, open Power BI and add it the same way as any other additional/custom visual:

[Screenshot: importing the custom visual in Power BI]

From here on, you can use this visual with any dataset you want (it is not limited to the sample dataset you used for creating the custom visual), and it is also ready to be published / deployed to an on-prem Power BI Report Server.

 

All the code, the data sample and the Power BI document with the custom visual are available on GitHub.

Happy PowerBI-ing. 🙂


What and who is the IT community? What does it take to be part of it?

This blog post is long overdue and has been rattling around in my head for a long time. Simply and boldly put, the community is everyone involved behind the results of your search for a particular problem. And by that I am referring to the IT community.

The community (from now on, I am referring only to the IT community) consists of people who contribute and share their knowledge, experience, questions, solutions, FAQs, etc. with a broader group of people. Many (if not most) of us are doing this:

  • for zero monetary exchange
  • in our free (or extra) time
  • to build up a wider knowledge base

In return, many of us expect from others:

  • the same behaviour – sharing when they find an interesting solution to a problem
  • sharing their time and energy by helping others
  • participating by reading and gradually answering questions from other users
  • commitment
  • gratitude and respect towards others.

In the past couple of years I have seen a decline in this sharing-caring type of exchange and interaction. What I have seen more and more of instead is:

  • a decline in awareness of how much the community is worth
  • a decline in respect and gratitude towards fellow community people
  • an aging of the community
  • a decline in general interest

Let me address each of the topics separately.

 

1. How much is the community worth?

Yes, what a bizarre question. And for sure, I cannot answer it. But I will paint a different picture. We have all heard sayings like "programming is just learning how to google things", but not everyone has asked themselves what that means. Where do all the hits, results and solutions come from? From the community. From people stepping forward and taking the time to understand someone's problem and solve it. From people investing time and giving a lecture for free at a local meeting. From people taking the time to write a blog or post a YouTube video on how to do something. These are all people contributing to this knowledge base. And next time you have a question or a problem to solve, remember that there was a person or a group of people who invested their time into solving this issue before you. Remember this. This is how much the community is worth, and much more.

 

2. Decline in respect

Pay respect to community people. You ask yourself: how? It is very easy to show respect:

  • Say thanks or thank you (at the end of a forum post, at the end of a blog post, after watching a video online or after a community event) and be grateful.
  • If you disagree or have a better solution, engage in conversation and propose the solution. But be polite, nice and respectful; your improvement will be highly appreciated.
  • Give feedback; your words count, especially when you describe your point of view.
  • Don't be afraid to give praise or critique. In general, when giving a positive or negative review, keep in mind to be precise and objective. Think about the others when writing either: will it be helpful and worthwhile to others or not?
  • Get involved and start helping others. Become an active member; start with observing, reading, listening and conversing, and over time start giving back: answers, ideas, blog posts, videos, speeches, etc. Don't be afraid: if you are polite, kind, sharing and warm, the community will embrace you.
  • Words like "sorry" and "I don't know", or admitting that you didn't know or that you were wrong, are not a sign of weakness but a sign of learning. This is the path to acquiring more knowledge and getting on track to start helping others.
  • Respect the general code of conduct of the website, the event or the product. Play by the rules.

 

3. Aging of the community

The fact that the community is aging has to do with a social phenomenon: the lack of available literature before the internet became popular. Those who spent long periods of time in libraries, analyzing every single page of every available book, know and understand the importance of available materials. The majority of these same IT people are the community contributors themselves.

These people have been growing with the community over the past 20 years (the massive emergence of the internet and the democratization of mass media), and they still make up the big majority of the community that is giving back. Take any IT event that has been around for more than 10 years and you will find the same people at its first iteration 10 years earlier.

Teaching people to give knowledge back and encouraging them to participate more and more in any kind of community work should start at a young age. Convincing the younger generation to start participating in and enjoying the community should also be introduced and discussed. Only in this manner will the knowledge be returned and not taken for granted.

4. Decline in general

How we live our lives and how technology has changed our habits directly (or indirectly) influences the community as well. With more and more options for the same subject matter, people have the ability to choose. This is absolutely great, but it can have a negative side:

  • people sign up for way too many free videos, webinars and events, far more than they are able to attend,
  • people are subscribed to way too many free and open posts, forums and aggregators, which makes choosing confusing,
  • more options are not necessarily a good thing,
  • to study one topic, people still need to take the time to study it thoroughly,
  • fast-changing technology and market needs leave users hindered,
  • too many technologies, too little time.

 

I have been involved in the community for over 20 years, covering a variety of topics: programming languages (Delphi, .NET, Python, C#), statistical languages and fields (R, Python, SAS, IBM, SAP, machine learning, data science) and database software (Microsoft SQL Server, SAP HANA, Oracle). I have seen a decline in general, especially due to lack of time, the age gap and the content gap. But at the same time, I have also seen many new – more practical – events, blogs, vlogs and articles that try not only to make the existing community stay and tickle their brains, but also to engage new, younger people and teach them that sharing is also caring, and caring builds new connections, new ties and new possibilities.

Lastly, this is my personal view on the community, community work, evangelists and knowledge builders. Many of them / us do this out of sheer enthusiasm, energy, passion and drive, and not because of – as many would have thought – recognition or financial aspects. I love to share the passion, energy, enthusiasm and drive with anyone who wants to get started on a particular topic. But you must find this in yourself, otherwise it is useless.

Feel free to share your personal view, your opinion, your positive or negative feedback; I would love to hear it. Much appreciated.

With love!


Copying Measures between Power BI datasets and Power BI files

Measures in a Power BI document are DAX-based aggregations for slicing and dicing tabular datasets using evaluation and filter context and data relationships. In short, measures are DAX formulas stored in Power BI as part of the dataset.

To show how easy it is (once you know how) to copy measures between different datasets, I have created a sample Power BI document with the diamonds dataset (also available on Github).

[Screenshot: diamonds dataset with measures]

Measures have a little calculator icon in front of their name, denoting their purpose. Now let's do the copying part 🙂

1. Copy dataset

The first thing we need to do is to copy the dataset in Power BI. There are many ways to copy a dataset; I have chosen the following one:

Go to Modeling -> select New Table -> and simply write the name of the dataset in the DAX line: Table = diamonds. I later renamed the new dataset from "Table" to "diamonds_2".

[Screenshot: creating the duplicate table]

You will notice that none of the measures have been copied, leaving the new dataset free of measures. So now we need to get the measures into the new dataset.

2. Save as PBIT

The second step is to save the Power BI document as a template. Go to File -> Save As -> and select Save as type: Power BI Template File (*.pbit).

[Screenshot: saving as a Power BI Template]

The biggest difference between your normal PBIX file and a PBIT – template – file is that the latter does not hold data. What a PBIT file does hold is:

  • query definitions, query parameters, and the M and DAX code
  • all information about the data model definition
  • all the visuals, elements, bookmarks, filters, etc.

It can be easily shared in a collaborative environment, since it is usually a very small file.

3. Open Tabular Editor

Now, download Tabular Editor from Github. It is a small but powerful program. Once the installation has completed, open the program and open the PBIT (template) file saved in the previous step.

[Screenshot: Tabular Editor]

You will be presented with a semantic description of your dataset, model and all the definitions. You can see there are two datasets, one holding the measures and the other one not.

[Screenshot: both datasets in Tabular Editor]

4. Copy measures

Once in Tabular Editor, select all the measures you want to copy (they should be highlighted), right-click and select Duplicate 3 measures.

[Screenshot: Duplicate 3 measures]

 

The program has generated exact copies of your selected measures. Select the copied measures (make sure they are highlighted).

[Screenshot: copied measures selected]

And now simply drag them into the new dataset (dataset: diamonds_2).

Alternatively, you can use CTRL + C and CTRL + V (copy/paste) and delete the copies in the original dataset. Both ways are very straightforward.

[Screenshot: measures moved to diamonds_2]

 

5. Editing the measures

The last touch is to change the dataset reference for each of the measures – yes, manually. For each measure, in the expression, I need to change the dataset from diamonds -> diamonds_2.

[Screenshot: editing the measure expressions]

Or to put it in DAX terms; from:

IF(SUM(diamonds[price]) > 1000000, SUM('diamonds'[price]) * 1.2,
   SUM('diamonds'[price]))

to:

IF(SUM(diamonds_2[price]) > 1000000, SUM('diamonds_2'[price]) * 1.2,
   SUM('diamonds_2'[price]))

 

And you are finished. If you want to do a bulk export of all the measures, I strongly recommend using DAX Studio.

Comparing the original and copied measure definitions, you will see that the copy process added some brackets.

[Screenshot: comparing the original and copied measure definitions]

 

As always, the files (PBIX, PBIT and CSV) are available at Github.

 

Happy Power BI-ing 🙂

 


The palindrome of 02.02.2020

At the time of writing this blog post, today is February 2nd, 2020. Or as I would say it, the 2nd of February, 2020. There is nothing magical about it; it is just a sequence of numbers. On a boring Sunday evening, what could be more thrilling than to look into this a little bit further 🙂

Let's fire up RStudio and start writing a lot of useless stuff. First, we don't strictly need a function, but since this is all about useless stuff, let's make a useless palindrome function:

# TRUE if a date string reads the same forwards and backwards
palindrome <- function(date){
  identical(date, paste(rev(strsplit(date, "")[[1]]), collapse=""))
}
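A quick sanity check of the function on today's date:

palindrome("02022020")   # TRUE  - reads the same in both directions
palindrome("02032020")   # FALSE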

Once we have the function, let's create the range of dates we want to explore. The range is set from the 1st of January 1000 until the 1st of January 2500.

dd <- data.frame(seq(as.Date("1000/01/01"), as.Date("2500/01/01"), "days"))

I don't want to get into a useless debate on whether the Gregorian calendar was already accepted worldwide or not, but if you want to read the 16th-century history, click the link.

Now, the most useless part is a loop that will take us from the distant past all the way to the future:

#empty df
df <- data.frame(paldate = as.character())

#loop through all the dates
for (i in 1:nrow(dd)) {
  dat <- dd[i, 1]

  #Year Month Day format
  dat <- format(dat, "%Y%m%d")

  #Year Day Month format
  #dat <- format(dat, "%Y%d%m")

  if (palindrome(dat) == TRUE) {
    df <- rbind(df, data.frame(paldate = as.character(dat)))
  }
}
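As a side note, the same result can be computed without the explicit loop. A vectorized sketch (same logic, just my own rephrasing of it):

# format all dates at once and compare each string with its reverse
dates_chr <- format(dd[, 1], "%Y%m%d")   # or "%Y%d%m" for the other format
reversed  <- sapply(strsplit(dates_chr, ""),
                    function(x) paste(rev(x), collapse = ""))
df_vec <- data.frame(paldate = dates_chr[dates_chr == reversed])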

 

Issues

Here are the issues I have with this pretty useless palindrome fact.

  1. Year format – abbreviated or non-abbreviated: '20 or 2020?
  2. Leading zeros – today's date (02.02.) would never have been a palindrome if not for the leading zeros; otherwise it would just have been 2.2.
  3. American vs. European date format. Today's date is fully interchangeable, but what if the date were February 22nd, 2022 (checked in the snippet below)?
    1. European: 22.02.2022 -> is a palindrome
    2. American: 02.22.2022 -> is not a palindrome
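The third point is easy to verify with the palindrome function from above:

# European day-month order vs. American month-day order for 2022-02-22
palindrome(format(as.Date("2022-02-22"), "%d%m%Y"))   # "22022022" -> TRUE
palindrome(format(as.Date("2022-02-22"), "%m%d%Y"))   # "02222022" -> FALSE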

Useless statistics

Issues aside, let's compare the American and European date formats just for fun.

The date range remains the same: from 01/01/1000 until 01/01/2500.

The number of palindromes in the US vs. the EU date format is 79 vs. 121:

[Output: palindrome counts, US vs. EU date format]

 

Let's check the distribution of the two formats and put them on graphs for easier visualization with ggplot:

library(plyr)
library(ggplot2)
library(gridExtra)

# df_ALL (with Region, Day, Month and Year columns) is built in the full
# script on Github by combining the US and EU palindrome dates
df_ALL_m_d <- ddply(df_ALL, "Region", summarise, grp.mean = median(Day))

pd <- ggplot(df_ALL, aes(x = Day, color = Region, fill = Region)) +
  geom_histogram(fill = "white", position = "identity", binwidth = 2) +
  geom_vline(data = df_ALL_m_d, aes(xintercept = grp.mean, color = Region),
             linetype = "dashed") +
  theme(legend.position = "top")
pd

# p and pm (the analogous histograms for Year and Month) are built the
# same way in the full script
grid.arrange(p, pm, pd, ncol = 1)

And the distribution comparison is:

[Plot: histograms of palindrome days, EU vs. US format]

It is clear from the graphs that the date formatting plays a significant role in whether a particular date is a palindrome or not.

We can also see from the graphs that days and months are distributed significantly differently between the EU and US formats: months in the EU format appear only in January, February, November and December, whereas in the US format they range through all the months, and for days it is the exact opposite.

With the following R code:

## Get the distribution of the differences to the end of the year
## (13th day of the year, 241st day of the year, etc.)

# df_ALL with the DaysDiff column comes from the full script on Github
v <- ggplot(df_ALL, aes(y = DaysDiff, x = Region)) +
  geom_violin(trim = FALSE) +
  geom_dotplot(binaxis = 'y', stackdir = 'center', dotsize = 1)

b <- ggplot(df_ALL, aes(x = Region, y = DaysDiff)) + geom_boxplot()
b <- b + geom_jitter(shape = 16, position = position_jitter(0.2))

grid.arrange(v, b, nrow = 1)

We can generate this:

[Plot: violin and box plots of DaysDiff by region]

 

We can again see that the distribution of the differences from the palindrome date until the end of the year (the difference is the number of days between the palindrome date and December 31st) is bimodal in the EU date format, whereas in the US format it is fairly evenly distributed.

Pretty useless, I guess.

For those who want to dig into more useless stuff, the complete R code is here.

Happy R-coding.


My MSSQL database is corrupt and I cannot get to my predictive models. Now what? Stellar to the rescue.

A small project database, with an old backup, ended up in a corrupt state. It happened. A corrupted database; all in all, 150 MB. I was super annoyed. On top of that, I had some R and Python code in there and some – yes 🙂 – predictive models, and I had forgotten to set up an automated backup. The data stored in the database was not the problem, but the table with the last stored model held the row of information I needed.

The problem with serialized models stored in a table is that they need to be fully compatible and correctly restored to the last bit in order for R / Python to read them correctly. One missing bit, and the model is literally useless.
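To illustrate what "restored to the last bit" means, here is a small R sketch (a toy model, not my original scoring code) showing that a model survives a store/retrieve round-trip only if the raw bytes are intact:

# a toy model, serialized to raw bytes the same way it would be stored
# in a VARBINARY(MAX) column
model <- lm(mpg ~ wt, data = mtcars)
bytes <- serialize(model, connection = NULL)

# intact bytes deserialize back into a usable model ...
model_restored <- unserialize(bytes)
predict(model_restored, newdata = data.frame(wt = 3))

# ... but corrupting even a single byte will typically make unserialize() fail
bytes_corrupt <- bytes
bytes_corrupt[1000] <- as.raw(0xFF)
try(unserialize(bytes_corrupt))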

My models are stored in a T-SQL table:

SELECT  
        [model_name]
       ,[model]
       ,[accuracy]
       ,[model_date]
       ,[version_Model]
       ,[Approved_by]
FROM [SQLR].[dbo].[Predictive_models]

Once I calmed myself down, I went through my options. Worst-case scenario, I could restore the two-month-old backup, but what were my other options?

Using the Microsoft built-in Database Console Command

DBCC CHECKDB

utility. The utility comes with multiple repair options to resolve corruption errors in an MS SQL database. Not that it is super challenging, but I started the repair with the option:

DBCC CHECKDB (N'MyDatabase', REPAIR_ALLOW_DATA_LOSS)

(the database name here is a placeholder) and it ended with data loss. So much for the better restore; it was still not successful enough, due to the data loss.

Digging further, I tried the PowerShell Restore-SqlDatabase cmdlet for restoring a SQL Server database, but I was left empty-handed, since this only works when you have a restorable backup.

Digging for further information, I googled around and came across this blogpost by Stellar. Next thing you know, I was downloading the software, and looking back now, it was the best choice.

[Image: Stellar Repair for MS SQL]

The benefits of using Stellar Repair for MS SQL Server:

  • Comes with a simple-to-use GUI which makes the repair process straightforward.
  • Helps repair a SQL database quickly in three simple steps: 1. Select the database file. 2. Repair the database file. 3. Preview and save the repaired file.
  • Recovers all major database components of SQL Server, including tables, views, indexes, stored procedures, etc.
  • Resolves corruption errors in databases running on MS SQL Server 2019, 2017, 2016 and older versions.

Additional Benefits:

  • Repairs SQL tables with PAGE and ROW compression.
  • Allows saving the repaired file in MS SQL, HTML, XLS, and CSV formats
  • Provides Windows and SQL authentication options to connect to the server
  • Supports Standard Compression Scheme for Unicode (SCSU) for SQL Server 2008 R2.

(based on the info/sales text from their website)

Step-by-step: repairing a corrupt .mdf file with Stellar Repair for MS SQL Server

After downloading, stop the MS SQL Server service and copy the database (.mdf file) to a different location.

1. The Instruction window is displayed. Click OK.

[Screenshot: software instruction window]

2. The Select Database window appears. Click Browse to select the .mdf file you want to repair.

[Screenshot: main interface]

A great feature is also the ability to recover deleted database records; in this case, select the 'Include Deleted Records' checkbox (and make sure that you have the appropriate database recovery model selected).

After selecting the desired file, click Repair to start the process. Hopefully only one .mdf file is corrupt 🙂 but you can also select multiple files.

[Screenshot: selected database]

There is also the ability to select the SQL Server version, in case the software is not able to detect the MSSQL Server database file version. In this case, just select the version of SQL Server where your database was originally created.

[Screenshot: selecting the SQL Server version]

After the recovery process completes, you can inspect what Stellar has recovered.

[Screenshot: recovered objects preview]

 

And I can recover only the selected table; prior to recovering / saving, I can also inspect the content of my data in the right pane. Since the models are serialized and stored as a blob, I can only hope that R will be able to deserialize them and use them again via sp_execute_external_script.

Save the data back to an MDF file and later attach it to the SQL Server; in my case, exporting to a CSV file was also a possibility, since the table was not that big.

After attaching the new MDF file with the table holding the models:

[Screenshot: querying the restored table in SSMS]

I am able to successfully run the R script and score new values.

All in all, I am happy that I found this software that helped me deal with the corrupt table / database, and that I can score data with my older model, which has great accuracy.

As always, happy coding!


Sudoku game with R

Sudoku is a classical logic game, a combinatorial number-placement puzzle. The objective is to fill a 9×9 matrix with digits so that each column, each row and each box (3×3 sub-grid) contains all of the digits from 1 to 9.

Solving it can sometimes be a nagging process. For this purpose, here are R helper functions to solve it with R.

[Image: the Sudoku grid to solve]

Let’s get the Sudoku grid we want to solve into R:

# note: byrow = FALSE, so the vector below fills the grid column by column
sudoku <- matrix(data=c(
  6,0,0,2,1,0,0,3,0,
  5,0,9,0,0,0,6,0,0,
  2,0,0,9,7,0,0,0,4,
  0,0,2,3,0,4,0,0,0,
  0,6,0,0,5,0,0,9,0,
  0,0,0,1,0,9,7,0,0,
  9,0,0,0,3,8,0,0,6,
  0,0,7,0,0,0,2,0,5,
  0,8,0,0,4,2,0,0,9), nrow=9, ncol=9, byrow=FALSE)

 

Next, we need a function that finds the next 0 value in the grid – these are the cells we still have to fill in (the function returns the first empty cell it encounters, or NULL when there is none):

get_zeros <- function(board_su){
  # return the position (i, j) of the first empty (0) cell, or NULL if none
  for (i in 1:nrow(board_su)){
    for (j in 1:ncol(board_su)){
      if (board_su[i, j] == 0) {
        return(data.frame(i, j))
      }
    }
  }
}

In addition, we need a function that validates a candidate digit against the board.

The validater function checks, for the given Sudoku board, whether a particular digit is valid at a particular position:

validater(sudoku, 1, c(1,4))

At position x=1, y=4 in the matrix, where there is a 0, it tests whether the digit 1 is valid or not. If the digit is valid, it returns TRUE to the outer function that searches for the complete solution.

The solver iterates through all the 0-positions and through the digits that are still available based on the rules:

  • each row may contain each digit from 1 to 9 only once,
  • each column may contain each digit from 1 to 9 only once,
  • each 3×3 sub-grid may contain each digit from 1 to 9 only once.

And the nuts and bolts of the validater function:

validater <- function(board_su, num, pos = c(NULL, NULL)) {
  a <- as.integer(pos[1])
  b <- as.integer(pos[2])
  num <- as.integer(num)

  # the row may not already contain the digit (the cell itself is exempt)
  for (i in 1:9) {
    if (board_su[a, i] == num & b != i) {
      return(FALSE)
    }
  }

  # the column may not already contain the digit
  for (i in 1:9) {
    if (board_su[i, b] == num & a != i) {
      return(FALSE)
    }
  }

  # which 3x3 box are we in (0-based box coordinates)
  boxNx <- (b - 1) %/% 3
  boxNy <- (a - 1) %/% 3

  # looping through the box; only the cell (a, b) itself is exempt
  for (i in (boxNy * 3 + 1):(boxNy * 3 + 3)) {
    for (j in (boxNx * 3 + 1):(boxNx * 3 + 3)) {
      if (board_su[i, j] == num & !(i == a & j == b)) {
        return(FALSE)
      }
    }
  }

  TRUE
}
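The solver itself is not shown here; a minimal backtracking sketch built on the two helpers above (my assumption of how the complete version on Github works) could look like this:

# backtracking: find the next empty cell, try digits 1..9 that pass
# validater(), recurse; undo and try the next digit on a dead end
solve_sudoku <- function(board_su) {
  pos <- get_zeros(board_su)
  if (is.null(pos)) return(board_su)          # no empty cells left: solved
  for (num in 1:9) {
    if (validater(board_su, num, c(pos$i, pos$j))) {
      board_su[pos$i, pos$j] <- num
      res <- solve_sudoku(board_su)
      if (!is.null(res)) return(res)          # solved further down
      board_su[pos$i, pos$j] <- 0             # backtrack
    }
  }
  NULL                                        # no digit fits: dead end
}

solve_sudoku(sudoku)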

With the following solution:

[Image: the solved Sudoku grid]

For sure, this is not to be taken seriously; there are mobile phone applications where you take a photo of your grid and the phone solves it for you, using a library like OpenCV. The code was created just for fun (and because the Advent of Code for 2019 is over).

Happy R coding 🙂

As always, the code is available at Github.


Working with Windows CMD system commands in R

From time to time when developing in R – working with and wrangling data, preparing machine learning projects – the time comes when one still needs to access operating system commands from/in R.

In this blog post, let's take a look at some of the most useful CMD commands when using R. Please note that the CMD commands apply only to the Windows environment; for Linux/macOS the system commands need to be changed slightly, but the wrapper R code remains the same.


1. R commands to access the CMD

The R commands used for accessing, running and making changes through CMD are:

  • system
  • system2
  • shell
  • shell.exec

Each of them invokes the OS command that is specified. All of the stated functions are part of base R, so no additional packages are needed.

Let's say we want to start with the DIR command and return the results in RStudio.

## 1 Simple dir command with order switch
system("dir /o")      # fails: dir is a cmd built-in, not an executable
system2("dir /o")     # same problem
shell("dir /o")       # works: runs the command under cmd
shell.exec("dir /o")  # opens a file with its associated program, not a command

and you will see that the shell command outputs the results back to the console pane in RStudio, whereas the others will not give you the desired output immediately. The main difference is that the shell function runs the desired command under the shell (following the POSIX standard; here you can find the POSIX functions), whereas the system command invokes the OS command directly.

In the R base package, a set of dedicated functions is also available to create, open, append and close file and URL connections, as well as to open and load compressed files. Just listing the functions:

  • object connections: file, url
  • compression and encoding: gzfile, bzfile, xzfile, unz, pipe
  • connection functions: open,close,flush
  • boolean functions: isOpen, isIncomplete

A rule of thumb (with default settings) on when to use the system or the shell function: shell will always work, while system will only work when the system settings (paths) are correctly prepared. Example:

### Checking the usage
Sys.which("copy")
Sys.which("ping")

The result is that the system and system2 functions will work for the ping CMD command, but not for the copy CMD command (and, later in the blog post, also not for the rename and move commands), because copy is a cmd built-in with no executable on the path.

[Output: Sys.which results for copy and ping]

2. Chaining the CMD commands in R

For simplicity, let's continue with the shell function and see how chaining can be achieved.

The following example will not work, as every time the shell function is called, a new environment is initiated. So running shell sequentially will not work!

setwd("C:\\Users\\Tomaz")

shell("dir")
shell("cd ..")
shell("dir")

Chaining the cmd commands together in one command will work:

shell("dir && cd .. && dir")

Note that this will not alter your working directory; every time the shell completes, you are returned to the initial environment. By checking:

getwd()

you can see that the working directory has not changed. Checking the R environment variables before and after, these also remain intact:

Sys.getenv(c("R_HOME","HOME"))

You can also define variables, use the echo command, use built-in system variables like %TIME% and much more; more is available at Github, and a quick example follows below.
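For instance (a quick sketch, assuming a Windows cmd shell):

# built-in cmd variables are expanded inside the spawned shell
shell("echo Current time is %TIME%")
shell("echo Profile folder: %USERPROFILE%")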

3. Most typical CMD commands executed from R

Here is the list of some CMD commands we will encapsulate in R:

  • fc / compare
  • rename
  • move / copy
  • ping (pingpath)
  • systeminfo
  • tasklist (taskkill)
  • ipconfig / netstat

 

3.1. Comparing two files in CMD

Having the capability to compare two files in CMD is just another of many ways to check whether your datasets are the same.

Let's create two *.csv files with the iris dataset and run the file compare in CMD:

setwd("C:\\Users\\Tomaz")

#create two files
write.csv(iris, file="iris_file1.csv")
write.csv(iris, file="iris_file2.csv")

#compare both files
shell("FC /a C:\\Users\\Tomaz\\iris_file1.csv 
             C:\\Users\\Tomaz\\iris_file2.csv")

The resulting information is: FC: no differences encountered. The same as if the command were run from the command line:

[Screenshot: FC output in the command prompt]

The same result can be achieved with an R function from base R: all.equal.

file1 <- "C:\\Users\\Tomaz\\iris_file1.csv"
file2 <- "C:\\Users\\Tomaz\\iris_file2.csv"

# or in R using all.equal
all.equal(readLines(file1), readLines(file2))

Since we are in R, we would like to have the results stored in a data.frame. To do so, let's use the previously mentioned system2 function. This is a newer version of the system function that takes the command and its arguments separately, plus some additional settings.

# run the same command using the newer function "system2" and set the arguments
cmd_command <- "FC"
cmd_args <- c('/a', file1, file2)

rr <- system2(command = cmd_command,
              args = cmd_args,
              stdout = TRUE,
              stderr = TRUE,
              wait = TRUE)

#suppose we want to store the results in a data.frame
#empty dataframe
df_rr <- data.frame(file1 = character(),
                    file2 = character(),
                    fcompare = character(),
                    stringsAsFactors = FALSE)

#temporary results
temp <- data.frame(file1 = file1, file2 = file2, fcompare = rr[2])

#bind all into the dataframe
df_rr <- rbind(df_rr, setNames(temp, names(df_rr)))

 

with the results stored in a data.frame when comparing multiple files.
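To compare many file pairs, the calls above can be wrapped into a small helper; a sketch following the same pattern:

# compare one pair of files with FC and return a one-row data.frame
compare_files <- function(file1, file2) {
  rr <- system2(command = "FC", args = c("/a", file1, file2),
                stdout = TRUE, stderr = TRUE, wait = TRUE)
  data.frame(file1 = file1, file2 = file2, fcompare = rr[2],
             stringsAsFactors = FALSE)
}

# stack the results for several pairs
df_rr <- rbind(compare_files(file1, file2),
               compare_files(file1, file1))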

[Screenshot: comparison results in a data.frame]

3.2. Renaming a file in CMD

Keeping the file with the iris dataset (C:\\Users\\Tomaz\\iris_file1.csv), let's do the renaming.

# renaming the file using shell command
shell("cd c:\\users\\tomaz && ren iris_file1.csv iris_file1_copy.csv")

and with the parametrized shell function (remember, this will not work via system2, since Sys.which("copy") and the like return empty):

file_old_name <- "c:\\users\\Tomaz\\iris_file2.csv"
file_new_name <- "iris_file2_new.csv"

cmd_command <- paste("RENAME", file_old_name, file_new_name) 

# System2 Does not work
# system2(command=cmd_command)
shell(cmd_command)

Either way, the files are renamed. Fortunately or not, there is no output in this case when there is no syntax error and the file exists.

3.3. Copy a file in CMD

Keeping the same file (C:\\Users\\Tomaz\\iris_file1.csv), let's also do the copying.

#Copying file
shell("copy c:\\Users\\Tomaz\\iris_file1.csv iris_file1_copy.csv")

and with the parametrized shell function:

orig_file <- "c:\\Users\\Tomaz\\iris_file1.csv"
copy_file <- "iris_file1_copy1.csv"
command <- "copy" 

cmd_command <- paste(command, orig_file, copy_file)
shell(cmd_command)

Now check your files.

3.4. Ping

Since the ping command is found by the Sys.which() function, we can use system2 directly:

URL <- 'tomaztsql.wordpress.com'

cmd_command <- "Ping" 
cmd_args <- c('-n 1', URL)

system2(command=cmd_command,
           args=cmd_args, 
           stdout=TRUE,
           stderr=TRUE, 
           wait = TRUE)

Since we want to check many URLs, let's create an empty data.frame to store the results and a function to walk through the list.

URLs <- c("google.com", "tomaztsql.wordpress.com", "youtube.com")

#empty dataframe
df_rr <- data.frame(URL = character(),
                    reply = character(),
                    package = character(),
                    stringsAsFactors = FALSE)

ping_fn <- function(url) {
  system2("ping", c(url, ' -n 1'),
          stdout = TRUE,
          stderr = TRUE)
}

for (i in 1:length(URLs)) {
  site <- print(URLs[i])
  rr <- ping_fn(site)
  temp <- data.frame(URL = rr[2], reply = rr[3], package = rr[6])
  df_rr <- rbind(df_rr, setNames(temp, names(df_rr)))
}

head(df_rr)

and the results are bound into a single data.frame:

[Output: ping results data.frame]

 

3.5. Systeminfo

Assume you need information about the system, not only about the R environment. Some R functions are available for this:

# R
R.Version()
Sys.info()
.Platform

All are very useful. But if you need the systeminfo CMD information in your R IDE environment, saved in a data.frame, the system2() function is the way to go.

# Using system2
cmd_command <- "systeminfo"
rr <- system2(command = cmd_command,
              stdout = TRUE,
              stderr = TRUE,
              wait = TRUE)

and getting the result:

[Output: systeminfo results in R]

So to grab e.g. "System Manufacturer", just walk to the 13th position in the vector:

rr[13]
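Since the exact position can shift between machines, a more robust way (a small sketch) is to search the vector by the row label instead:

# position-independent lookup of a systeminfo row
rr[grep("^System Manufacturer", rr)]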

 

3.6. Tasklist

Getting the list of all the running tasks:

cmd_command <- "tasklist" 
rr <- system2(command=cmd_command,
               stdout=TRUE,
               stderr=TRUE, 
               wait = TRUE)

# getting the results into something more readable format
dd <- tibble::enframe(rr)
stack(dd)[1]

and we can get all the additional information from the data.frame.
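For example, to find the PID of a specific process before killing it (the process name here is just a hypothetical example):

# find rows of the tasklist output that mention a given process
dd[grep("rstudio", dd$value, ignore.case = TRUE), ]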

In addition, you can also kill a specific task:

# kill a specific task
shell("taskkill /F /PID 6816")

with the returned information: "SUCCESS: Sent termination signal to the process with PID 6816."

3.7. Ipconfig / netstat

Getting the ipconfig information or all the netstat information follows the same pattern:

cmd_command <- "netstat" 
rr <- system2(command=cmd_command,
               stdout=TRUE,
               stderr=TRUE, 
               wait = TRUE)

and storing the results with tibble:

dd <- tibble::enframe(rr)

and the results are the same as in CMD:

[Output: netstat results]

4. Conclusion

Using CMD from the R IDE will for sure not be your daily driver when crunching numbers and wrangling data. But when you need to work closely with your files, datasets, system or tasks, the system2 and shell functions are absolute winners for the job.

 

As always, the complete set of code is available on Github.

Happy R-coding! 🙂

 
