6月 272017
 

When I teach my Data Cleaning course, the last topic I cover in the two-day course is SAS Integrity Constraints.  I find that most of the students, who are usually quite advanced programmers, have never heard of Integrity Constraints (abbreviated ICs).  I decided a short discussion on this topic would [...]

The post Keeping your data set clean: Integrity constraints appeared first on SAS Learning Post.

6月 262017
 

The role of analytics in combating terrorism Earlier this spring, I found myself walking through a quiet and peaceful grove of spruce trees south of the small hamlet of Foy outside of Bastogne, Belgium.  On travel in Europe, I happened to have some extra time before heading to London.  I [...]

The analytics of evil was published on SAS Voices by Steve Bennett

6月 262017
 

The role of analytics in combating terrorism Earlier this spring, I found myself walking through a quiet and peaceful grove of spruce trees south of the small hamlet of Foy outside of Bastogne, Belgium.  On travel in Europe, I happened to have some extra time before heading to London.  I [...]

The analytics of evil was published on SAS Voices by Steve Bennett

6月 262017
 

My presentation at SAS Global Forum 2017 was "More Than Matrices: SAS/IML Software Supports New Data Structures." The paper was published in the conference proceedings several months ago, but I recently recorded a short video that gives an overview of using the new data structures in SAS/IML 14.2:


If your browser does not support embedded video, you can go directly to the video on YouTube.

I know that customers will be excited about programming with lists and tables. Now you can create complex data structures in the SAS/IML language, including in-memory data tables, arrays of matrices, lists of inhomogeneous data, and hierarchical data structures.

If you prefer to read about lists and tables, the following blog posts give an overview of these data structures:

For more details about tables and lists, see the following chapters in the SAS/IML User's Guide.

The post Video: Create and use lists and tables in SAS/IML appeared first on The DO Loop.

6月 242017
 

To demonstrate the power of text mining and the insights it can uncover, I used SAS Text Mining technologies to extract the underlying key topics of the children's classic Alice in Wonderland. I want to show you what Alice in Wonderland can tell us about both human intelligence and artificial [...]

AI in wonderland was published on SAS Voices by Frederic Thys

6月 232017
 

Dictionary tables are one of the things I love most about SQL! What a useful thing it is to be able to programmatically determine what your data looks like so you can write self-modifying and data-driven programs. While PROC SQL has a great set of dictionary tables, they all rely [...]

The post Jedi SAS Tricks - FedSQL Dictionary Tables appeared first on SAS Learning Post.

6月 232017
 
1. 准备工作
1.1 安装python
目前支持python3.6,直接下载anaconda3,
Anaconda 4.4.0 For Windows python3.6 64位
https://www.continuum.io/downloads
下载完后,直接安装,在配置环境变量环节,选择需要配置环境变量
1.2 下载 visual studio community 2017安装VC编译环境
https://www.visualstudio.com/zh-hans/, 下载 visual studio community 2017
安装的时候,在工作负载的tab页面,选择使用c++的桌面开发
环境变量设置
在PATH环境变量添加
D:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\bin\HostX64\x64
D:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE
新建LIB环境变量
D:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\lib\x64
新建INCLUDE环境变量
D:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\include
1.3 cuda安装
下载cuda https://developer.nvidia.com/cuda-downloads
直接安装,自动配置cuda相应的环境变量,可以去环境变量配置页面看看CUDA_PATH的设置
下载cuDNN,https://developer.nvidia.com/cudnn
cudnn-8.0-windows10-x64-v5.1,解压文件,将文件夹放到cuda的安装文件夹下。拷贝cudnn bin下的cudnn64_5.dll至
cuda安装文件夹下的bin下面。
2. tensorflow-gpu安装
pip install --upgrade -I setuptools
pip install --upgrade tensorflow-gpu
3. 测试
import tensorflow as tf
a = tf.constant(2.0, dtype=tf.float32)
b = tf.constant(3.0, dtype=tf.float32)
sum = tf.add(a, b)
with tf.Session() as sess:
    print(sess.run([sum]))
运行日志:
D:\Anaconda3\python.exe E:/PycharmProject/deeplearning/demo.py
2017-06-19 11:18:46.312109: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-06-19 11:18:46.312532: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-19 11:18:46.312924: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-19 11:18:46.313309: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-19 11:18:46.313686: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-19 11:18:46.314053: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-19 11:18:46.314408: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-19 11:18:46.314767: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-06-19 11:18:48.199218: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:940] Found device 0 with properties:
name: GeForce 940MX
major: 5 minor: 0 memoryClockRate (GHz) 1.189
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.36GiB
2017-06-19 11:18:48.199659: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:961] DMA: 0
2017-06-19 11:18:48.199868: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0:   Y
2017-06-19 11:18:48.200090: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0)
[5.0]
Process finished with exit code 0

 
 Posted by at 9:47 下午
6月 232017
 

Here in the US, we typically use top level domains such as .com, .gov, and .org. I guess we were one of the first countries to start using web domains in a big way, and therefore we kind of got squatter's rights. As other countries started using the web, they [...]

The post A map of country code top-level domains (ccTLD) appeared first on SAS Learning Post.

6月 212017
 

IT organizations today are constantly challenged to do more with less. Reusing data processing jobs and employing best practices in monitoring the health of your data are proven ways to improve the productivity of data professionals. Dataflux Data Management Studio is a component of both the SAS Data Quality and the SAS Data Management offerings that allows you to create data processing jobs to integrate, cleanse and monitor your data.

You can write global functions for SAS Data Management jobs that can be reused in any expression in the system, in either data or process flow jobs. Global functions can be called from expression nodes, monitor rules, profile filters, surviving record indicators, process flow if-nodes and more.

Global functions are defined in a text file and saved in the Data Management installation directory under “etc/udf” in Data Management Studio or Data Management Server respectively.

Each global function has to have a unique name and is wrapped with a function / end function block code, can process any number of input parameters and returns a single value of either integer, real, date, string or boolean type.

Hello World

For a start, let’s create a “hello world” function.

  • If it does not exist, create a folder in the installation directory under “etc/udf” (DM Studio and DM Server).
  • In “etc/udf” create a file named hello_world.txt.
  • In the hello_word file create the function as follows:
function hello_world return string
return “hello world!”
end function
  • Save the file and restart DM Studio, if necessary, in order to use hello_world().

The new function is fully integrated in Data Management Studio. You can see the new function in an expression node under Function->Other or as expression language help in the expression node editor.

Handling Parameters

Global functions can handle any number of parameters. Parameter helper functions are available to access input parameters inside a function:

  • paramatercount() returns the number parameters that have been passed into the function call.  This is helpful if the incoming parameters are unknown.
integer i
for i = 1 to parametercount() 
begin
   // process each parameter
end

 

  • parametertype(integer) returns the type of the parameter for the given parameter position. The first parameter is 1. The return value will either be integer, real, date, string or Boolean.
  • parameterstring(integer), parameterinteger(integer), parameterboolean(integer), parameterdate(integer), parameterreal(integer) these functions return the value of the parameter as specified by position, or null if the parameter doesn’t exist. You can use these functions if you know the incoming parameter type at a given position.
  • parameter(integer) returns the value of the parameter as specified by position, or null if the parameter doesn’t exist. If you don’t know the incoming parameter type you can use this function. Note: Using the parameter() function may require additional overhead to coerce the values to the field type. Using the specific data type parameter functions above will eliminate the cost of coercion.

Global Function Example

This global function will check if password rules are followed.

//////////////////////////////////////////////////////////////////////////
// Function:     check_psw_rule
// Inputs:       string
// Output:       boolean -> true == passed check; false == failed check
// Description:  Check the rules for password. The rules are:
//               Need to be at least 8 characters long
//               Need to have at least one lower case character
//               Need to have at least one upper case character
//               Need to have at least one number
//////////////////////////////////////////////////////////////////////////
function check_psw_rule return boolean
	string check_str
	boolean rc
	regex r
 
	check_str= parameterstring(1)   //copy input parameter to variable
 
	rc= false                       //set default return value to failed (false)
	if(len(check_str) < 8)          //check if at least 8 characters
		return rc
	r.compile("[a-z]")              
	if (!r.findfirst(check_str))    //check if at least one lower case character
		return rc
	r.compile("[A-Z]")
	if (!r.findfirst(check_str))    //check if at least one upper case character
		return rc
	r.compile("[0-9]")
	if (!r.findfirst(check_str))    //check if at least one number
		return rc
	rc= true                        //return true if all checks passed
	return rc
end function

 

This function can be called from any expression in a Data Management job:

boolean  check_result
check_result= check_psw_rule(password)

Global function can also call other global function

Just a few things to be aware of. There is a late binding process, which means if function B() wants to call function A(), then function A() needs to be loaded first. The files global functions are stored in are loaded alphabetically by file name. This means the file name containing function A() has to occurs alphabetically before file name containing function B().

Best Practices

Here are some best practice tips which will help you to be most successful writing global functions:

  1. Create one file per expression function.
    This allows for global functions to easily be deployed and shared.
  2. Use lots of comments.
    Describe what the function’s purpose, expected parameters, and outputs and improve the readability and reusability of your code
  3. Test the expressions in data jobs first.
    Write a global function body as an expression first and test it via preview. This way it is easier to find typos, syntax errors and to ensure that the code is doing what you would like it to do.
  4. Debugging - If the global function is not loading, check the platform_date.log.  For Studio, this could for example be found under: C:\Users\<your_id>\AppData\Roaming\DataFlux\DMStudio\studio1

You now have a taste of how to create reusable functions in Data Management Studio to help you both improve the quality of your data as well as improve the productivity of your data professionals. Good luck and please let us know what kind of jobs you are using to help your organization succeed.

Writing your own functions in SAS Data Quality using Dataflux Data Management Studio was published on SAS Users.

6月 212017
 

IT organizations today are constantly challenged to do more with less. Reusing data processing jobs and employing best practices in monitoring the health of your data are proven ways to improve the productivity of data professionals. Dataflux Data Management Studio is a component of both the SAS Data Quality and the SAS Data Management offerings that allows you to create data processing jobs to integrate, cleanse and monitor your data.

You can write global functions for SAS Data Management jobs that can be reused in any expression in the system, in either data or process flow jobs. Global functions can be called from expression nodes, monitor rules, profile filters, surviving record indicators, process flow if-nodes and more.

Global functions are defined in a text file and saved in the Data Management installation directory under “etc/udf” in Data Management Studio or Data Management Server respectively.

Each global function has to have a unique name and is wrapped with a function / end function block code, can process any number of input parameters and returns a single value of either integer, real, date, string or boolean type.

Hello World

For a start, let’s create a “hello world” function.

  • If it does not exist, create a folder in the installation directory under “etc/udf” (DM Studio and DM Server).
  • In “etc/udf” create a file named hello_world.txt.
  • In the hello_word file create the function as follows:
function hello_world return string
return “hello world!”
end function
  • Save the file and restart DM Studio, if necessary, in order to use hello_world().

The new function is fully integrated in Data Management Studio. You can see the new function in an expression node under Function->Other or as expression language help in the expression node editor.

Handling Parameters

Global functions can handle any number of parameters. Parameter helper functions are available to access input parameters inside a function:

  • paramatercount() returns the number parameters that have been passed into the function call.  This is helpful if the incoming parameters are unknown.
integer i
for i = 1 to parametercount() 
begin
   // process each parameter
end

 

  • parametertype(integer) returns the type of the parameter for the given parameter position. The first parameter is 1. The return value will either be integer, real, date, string or Boolean.
  • parameterstring(integer), parameterinteger(integer), parameterboolean(integer), parameterdate(integer), parameterreal(integer) these functions return the value of the parameter as specified by position, or null if the parameter doesn’t exist. You can use these functions if you know the incoming parameter type at a given position.
  • parameter(integer) returns the value of the parameter as specified by position, or null if the parameter doesn’t exist. If you don’t know the incoming parameter type you can use this function. Note: Using the parameter() function may require additional overhead to coerce the values to the field type. Using the specific data type parameter functions above will eliminate the cost of coercion.

Global Function Example

This global function will check if password rules are followed.

//////////////////////////////////////////////////////////////////////////
// Function:     check_psw_rule
// Inputs:       string
// Output:       boolean -&gt; true == passed check; false == failed check
// Description:  Check the rules for password. The rules are:
//               Need to be at least 8 characters long
//               Need to have at least one lower case character
//               Need to have at least one upper case character
//               Need to have at least one number
//////////////////////////////////////////////////////////////////////////
function check_psw_rule return boolean
	string check_str
	boolean rc
	regex r
 
	check_str= parameterstring(1)   //copy input parameter to variable
 
	rc= false                       //set default return value to failed (false)
	if(len(check_str) &lt; 8)          //check if at least 8 characters
		return rc
	r.compile("[a-z]")              
	if (!r.findfirst(check_str))    //check if at least one lower case character
		return rc
	r.compile("[A-Z]")
	if (!r.findfirst(check_str))    //check if at least one upper case character
		return rc
	r.compile("[0-9]")
	if (!r.findfirst(check_str))    //check if at least one number
		return rc
	rc= true                        //return true if all checks passed
	return rc
end function

 

This function can be called from any expression in a Data Management job:

boolean  check_result
check_result= check_psw_rule(password)

Global function can also call other global function

Just a few things to be aware of. There is a late binding process, which means if function B() wants to call function A(), then function A() needs to be loaded first. The files global functions are stored in are loaded alphabetically by file name. This means the file name containing function A() has to occurs alphabetically before file name containing function B().

Best Practices

Here are some best practice tips which will help you to be most successful writing global functions:

  1. Create one file per expression function.
    This allows for global functions to easily be deployed and shared.
  2. Use lots of comments.
    Describe what the function’s purpose, expected parameters, and outputs and improve the readability and reusability of your code
  3. Test the expressions in data jobs first.
    Write a global function body as an expression first and test it via preview. This way it is easier to find typos, syntax errors and to ensure that the code is doing what you would like it to do.
  4. Debugging - If the global function is not loading, check the platform_date.log.  For Studio, this could for example be found under: C:\Users\<your_id>\AppData\Roaming\DataFlux\DMStudio\studio1

You now have a taste of how to create reusable functions in Data Management Studio to help you both improve the quality of your data as well as improve the productivity of your data professionals. Good luck and please let us know what kind of jobs you are using to help your organization succeed.

Writing your own functions in SAS Data Quality using Dataflux Data Management Studio was published on SAS Users.