Monday, December 10, 2018

Part 1: Questions and answer about R service in SQL Server

How to verify R exists in your SQL instance
Run the code below, sometime you might need to enable external scripting and SQL Server Launchpad service to start.
EXECUTE sp_execute_external_script
@language =N'R',
@script=N'print(version)';
GO
You can see that SQL Server in this case have R version 3.3.3


How to get the list of R packages that are already installed in your SQL Server instance
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'
OutputDataSet <- data.frame(installed.packages()[,c("Package", "Version", "Depends", "License","LibPath")]);'
WITH result sets((Package NVARCHAR(255), Version NVARCHAR(100), Depends NVARCHAR(4000), License NVARCHAR(1000), LibPath NVARCHAR(2000)));


What are the parameters available can be used to execute sp_execute_external_script
sp_execute_external_script   
   @language = N'language',   
   @script = N'script'  
   [ , @input_data_1 = N'input_data_1' ]   
   [ , @input_data_1_name = N'input_data_1_name' ]  
   [ , @output_data_1_name = N'output_data_1_name' ]  
   [ , @parallel = 0 | 1 ]  
   [ , @params = N'@parameter_name data_type [ OUT | OUTPUT ] [ ,...n ]' ]
   [ , @parameter1 = 'value1' [ OUT | OUTPUT ] [ ,...n ] ]
WITH RESULT SET ()
Let’s looks a simple example:
EXECUTE sp_execute_external_script
   @language = N'R'
   , @script = N'OutputDataSet <- InputDataSet;'
   , @input_data_1 = N'SELECT ''Uma'',''Bale'',38;'
WITH RESULT SETS (([FirstName] NVARCHAR(5),[LastName] NVARCHAR(5),[Age] INT));

@language = N'language'
Indicates the script language. The valid values are R (SQL Server 2016 and later), Python (SQL Server 2017 and later), and Java (SQL Server 2019 preview)
@script = N'script'
External language script specified as a literal or variable input. Script is nvarchar(max). Default input and output variables of sp_execute_external_script: InputDataSet and OutputDataSet
@input_data_1
specifies the input data used by the external script in the form of a Transact-SQL query. The data type of input_data_1 is nvarchar(max).
@input_data_1_name
Specifies the name of the variable used to represent the query defined by @input_data_1. The data type of the variable in the external script depends on the language. In case of R, the input variable is a data frame. In the case of Python, input must be tabular. input_data_1_name is sysname. Default value is InputDataSet.
@output_data_1_name
Specifies the name of the variable in the external script that contains the data to be returned to SQL Server upon completion of the stored procedure call. The data type of the variable in the external script depends on the language. For R, the output must be a data frame. For Python, the output must be a pandas data frame. output_data_1_name is sysname. Default value is OutputDataSet.
@parallel
Enable parallel execution of R scripts by setting the @parallel parameter to 1. The default for this parameter is 0 (no parallelism).
WITH RESULT SET
By default, result sets returned by this stored procedure are output with unnamed columns. Column names used within a script are local to the scripting environment and are not reflected in the outputted result set. To name result set columns, use the WITH RESULT SET clause of EXECUTE.
EXECUTE sp_execute_external_script
 @language = N'R'
 , @script = N' SQL_out <- SQL_in * 5;'
 , @input_data_1 = N'SELECT 2 as Col;'
 , @input_data_1_name  = N'SQL_in'
 , @output_data_1_name =  N'SQL_out'
 WITH RESULT SETS (([Output] INT NOT NULL));
Cheers!
Uma