April 10, 2017

Earth is an explosive world! Data from the Smithsonian Institution's Global Volcanism Program (GVP) documents Earth's volcanoes and their eruptive history over the past 10,000 years. The GVP database includes the names, locations, types, and features of more than 1,500 volcanoes. Let's look closer into volcanic eruptions across the globe [...]

How to design an infographic about volcanic eruptions using SAS Visual Analytics was published on SAS Voices by Falko Schulz

April 9, 2017

Thanks to a new open source project from SAS, Python coders can now bring the power of SAS into their Python scripts. The project is SASPy, and it's available on the SAS Software GitHub. It works with SAS 9.4 and higher, and requires Python 3.x.

I spoke with Jared Dean about the SASPy project. Jared is a Principal Data Scientist at SAS and one of the lead developers on SASPy and a related project called Pipefitter. Here's a video of our conversation, which includes an interactive demo. Jared is obviously pretty excited about the whole thing.

Use SAS like a Python coder

SASPy brings a "Python-ic" sensibility to this approach for using SAS. That means that all of your access to SAS data and methods is surfaced using objects and syntax that are familiar to Python users. This includes the ability to exchange data via pandas, the ubiquitous Python data analysis framework. Even the native SAS objects are accessed in a very "pandas-like" way.

import saspy
import pandas as pd

# Start a SAS session using the 'winlocal' configuration,
# then attach to the SASHELP.CARS sample data set
sas = saspy.SASsession(cfgname='winlocal')
cars = sas.sasdata("CARS", "SASHELP")

The output is what you expect from pandas...but with statistics that SAS users are accustomed to. PROC MEANS anyone?

In[3]: cars.describe()
       Variable Label    N  NMiss   Median          Mean        StdDev  
0         MSRP     .   428      0  27635.0  32774.855140  19431.716674   
1      Invoice     .   428      0  25294.5  30014.700935  17642.117750   
2   EngineSize     .   428      0      3.0      3.196729      1.108595   
3    Cylinders     .   426      2      6.0      5.807512      1.558443   
4   Horsepower     .   428      0    210.0    215.885514     71.836032   
5     MPG_City     .   428      0     19.0     20.060748      5.238218   
6  MPG_Highway     .   428      0     26.0     26.843458      5.741201   
7       Weight     .   428      0   3474.5   3577.953271    758.983215   
8    Wheelbase     .   428      0    107.0    108.154206      8.311813   
9       Length     .   428      0    187.0    186.362150     14.357991   

       Min       P25      P50      P75       Max  
0  10280.0  20329.50  27635.0  39215.0  192465.0  
1   9875.0  18851.00  25294.5  35732.5  173560.0  
2      1.3      2.35      3.0      3.9       8.3  
3      3.0      4.00      6.0      6.0      12.0  
4     73.0    165.00    210.0    255.0     500.0  
5     10.0     17.00     19.0     21.5      60.0  
6     12.0     24.00     26.0     29.0      66.0  
7   1850.0   3103.00   3474.5   3978.5    7190.0  
8     89.0    103.00    107.0    112.0     144.0  
9    143.0    178.00    187.0    194.0     238.0  
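For readers without a SAS installation, plain pandas produces the same style of summary. A minimal sketch (the values below are invented for illustration, not pulled from SASHELP.CARS):

```python
import pandas as pd

# Small stand-in DataFrame; cars.describe() in SASPy returns a
# similar tabular summary for the real SASHELP.CARS data.
df = pd.DataFrame({"EngineSize": [1.3, 3.0, 3.9, 8.3],
                   "Cylinders": [3, 6, 6, 12]})
summary = df.describe()
print(summary.loc["50%", "EngineSize"])  # the median row
```

The point is the shared idiom: whether the summary comes from PROC MEANS under the covers or from pandas itself, you interact with it the same way.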

SASPy also provides high-level Python objects for the most popular and powerful SAS procedures. These are organized by SAS product, such as SAS/STAT, SAS/ETS and so on. To explore, issue a dir() command on your SAS session object. In this example, I've created a sasstat object and I used dot<TAB> to list the available SAS analyses:

SAS/STAT object in SASPy

The SAS Pipefitter project extends the SASPy project by providing access to advanced analytics and machine learning algorithms. In our video interview, Jared presents a cool example of a decision tree applied to the passenger survival factors on the Titanic. It's powered by PROC HPSPLIT behind the scenes, but Python users don't need to know all of that "inside baseball."

Installing SASPy and getting started

Like most things Python, installing the SASPy package is simple. You can use the pip installation manager to fetch the latest version:

pip install saspy

However, since you need to connect to a SAS session to get to the SAS goodness, you will need some additional files to broker that connection. Most notably, you need a few Java jar files that SAS provides. You can find these in the SAS Deployment Manager folder for your SAS installation:

The jar files are compatible between Windows and Unix, so if you find them in a Unix SAS install, you can still copy them to your Windows Python client. You'll need to modify the sascfg.py file (installed with the SASPy package) to point to where you've stashed these jars. If you're using local SAS on Windows, you also need to make sure that sspiauth.dll is in your Windows system PATH. The easiest method is to add SASHOME\SASFoundation\9.4\core\sasexe to your system PATH variable.
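As an illustration only, a local-Windows entry in the SASPy configuration file might look like the sketch below. Every path is a placeholder, not something from the original post; substitute the locations where you actually stashed the jars.

```python
# Hypothetical sascfg.py entries for a local Windows SAS install.
# All paths below are placeholders -- point them at your own copies.
SAS_config_names = ['winlocal']

cpath = ';'.join([
    r'C:\jars\sas.svc.connection.jar',
    r'C:\jars\log4j.jar',
    r'C:\jars\sas.security.sspi.jar',
    r'C:\jars\sas.core.jar',
    r'C:\jars\saspyiom.jar',
])

winlocal = {'java': 'java',              # java executable on the PATH
            'encoding': 'windows-1252',
            'classpath': cpath}
```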

All of this is documented in the "Installation and Configuration" section of the project documentation. The connectivity options support an impressively diverse set of SAS configs: Windows, Unix, SAS Grid Computing, and even SAS on the mainframe!

Download, comment, contribute

SASPy is an open source project, and all of the Python code is available for your inspection and improvement. The developers at SAS welcome you to give it a try and enter issues when you see something that needs to be improved. And if you're a hotshot Python coder, feel free to fork the project and issue a pull request with your suggested changes!

The post Introducing SASPy: Use Python code to access SAS appeared first on The SAS Dummy.

4月 082017

from https://github.com/m2dsupsdlclass/lectures-labs
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. LeNet
Simonyan, Karen, and Zisserman. "Very deep convolutional networks for large-scale image recognition." (2014) VGG-16
Simplified version of Krizhevsky, Alex, Sutskever, and Hinton. "Imagenet classification with deep convolutional neural networks." NIPS 2012 AlexNet
He, Kaiming, et al. "Deep residual learning for image recognition." CVPR. 2016. ResNet
Szegedy, et al. "Inception-v4, inception-resnet and the impact of residual connections on learning." (2016)
Canziani, Paszke, and Culurciello. "An Analysis of Deep Neural Network Models for Practical Applications." (May 2016).

classification and localization
Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." CVPR (2016)
Liu, Wei, et al. "SSD: Single shot multibox detector." ECCV 2016
Girshick, Ross, et al. "Fast r-cnn." ICCV 2015
Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." NIPS 2015
Redmon, Joseph, et al. "YOLO9000: Better, Faster, Stronger." 2017

Long, Jonathan, et al. "Fully convolutional networks for semantic segmentation." CVPR 2015
Noh, Hyeonwoo, et al. "Learning deconvolution network for semantic segmentation." ICCV 2015
Pinheiro, Pedro O., et al. "Learning to segment object candidates" / "Learning to refine object segments", NIPS 2015 / ECCV 2016
Li, Yi, et al. "Fully Convolutional Instance-aware Semantic Segmentation." Winner of COCO challenge 2016.

Weak supervision
Joulin, Armand, et al. "Learning visual features from large weakly supervised data." ECCV, 2016
Oquab, Maxime, "Is object localization for free? – Weakly-supervised learning with convolutional neural networks", 2015

Self-supervised learning
Doersch, Carl, Abhinav Gupta, and Alexei A. Efros. "Unsupervised visual representation learning by context prediction." ICCV 2015.

Ren, Mengye, et al. "Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes." 2017
Salimans, Tim, and Diederik P. Kingma. "Weight normalization: A simple reparameterization to accelerate training of deep neural networks." NIPS 2016.
Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." 2016.
Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." ICML 2015
Understanding deep learning requires rethinking generalization, C. Zhang et al., 2016.
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, N. S. Keskar et al., 2016
1. A strong optimizer is not necessarily a strong learner.
2. DL optimization is non-convex but bad local minima and saddle structures are rarely a problem (on common DL tasks).
3. Neural Networks are over-parametrized but can still generalize.
4. Stochastic Gradient is a strong implicit regularizer.
5. Variance in gradient can help with generalization but can hurt final convergence.
6. We need more theory to guide the design of architectures and optimizers that make learning faster with fewer labels.
7. Overparametrize deep architectures
8. Design architectures to limit conditioning issues:
(1)Use skip / residual connections
(2)Internal normalization layers
(3)Use stochastic optimizers that are robust to bad conditioning
9. Use small minibatches (at least at the beginning of optimization)
10. Use validation set to anneal learning rate and do early stopping
11. It is very often possible to trade more compute for less overfitting, using data augmentation and stochastic regularizers (e.g. dropout).
12. Collecting more labelled data is the best way to avoid overfitting.
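Points 9 and 10 above can be sketched as a generic training loop that anneals the learning rate on a validation-loss plateau and stops early. This is a framework-agnostic illustration; the function names and thresholds are invented for the sketch:

```python
def train_with_early_stopping(train_step, val_loss_fn, max_epochs=100,
                              patience=5, lr=0.1, anneal_factor=0.5):
    """Anneal the learning rate when validation loss stalls, and stop
    early once `patience` consecutive epochs show no improvement."""
    best = float("inf")
    bad_epochs = 0
    for epoch in range(max_epochs):
        train_step(lr)                   # one pass over the training data
        loss = val_loss_fn()             # evaluate on the held-out set
        if loss < best - 1e-6:
            best, bad_epochs = loss, 0   # improvement: reset the counter
        else:
            bad_epochs += 1
            lr *= anneal_factor          # anneal on plateau
            if bad_epochs >= patience:   # early stopping
                return epoch, best
    return max_epochs, best
```

The same skeleton applies regardless of the model; only `train_step` and `val_loss_fn` change.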

 Posted at 4:04 PM
April 7, 2017

“This conference has been one of my best because I’ve learned about GatherIQ, a social, good cause initiative that’s made me think about the bigger picture, and how I can help people who need help,” said a SAS Global Forum attendee who heard about SAS’ new data-for-good crowdsourcing app at [...]

Using data for good with GatherIQ was published on SAS Voices by Becky Graebe

April 6, 2017

For those of you who don't have SAS/Graph's Proc GMap, I recently showed how to 'fake' a variety of maps using Proc SGplot polygons. So far I've written blogs on creating: pretty maps, gradient shaded choropleth maps, and maps with markers at zip codes. And now (by special request from [...]

The post Drawing paths on a map using SGplot appeared first on SAS Learning Post.

April 6, 2017

The widespread adoption of the term "analytics" reminds me of the evolution of the term "supply chain management." Initially the term focused on supply chain planning. It involved demand and supply balancing and the heuristics and optimization tools that came out of advanced planning and scheduling. Over time practically everything was included [...]

When everything is analytics, nothing is analytics...? was published on SAS Voices by Scott Nalick

April 5, 2017


1. Sift Science

Covers account takeover, payment fraud, spam content, account impersonation, marketing/promotion abuse, and device fingerprinting.

2. Forter

Services: all e-commerce needs, including marketplaces, digital goods, services, physical goods, travel, mobile (SDK & API), and alternative payments.

Technology: machine learning with a human touch, understanding the context of a transaction, and real-time approve/decline decisions.

3. DataVisor

Customers include Yelp, Momo, Changba, and others.

The DataVisor blog (https://www.datavisor.com/blog/) covers its big-data risk-control technology, products, and architecture, including machine learning, the rules engine, and the decision platform.

4. PatternEx

Services: data analytics, account-takeover detection, and an AI risk-control assistant; uses AI driven by big-data analysis to provide risk-control services.
 Posted at 8:55 PM
April 5, 2017

Most regression models try to model a response variable by using a smooth function of the explanatory variables. However, if the data are generated from some nonsmooth process, then it makes sense to use a regression function that is not smooth. A simple way to model a discontinuous process in SAS is to use spline effects and specify repeated values for the knots.

Discontinuous processes: More common than you might think

The classical ANOVA is one way to analyze data that are collected before and after a known event. For example, you might record gas mileage for a car before and after a tune-up. You might collect patient data before and after they undergo a medical or surgical treatment. You might have data about real estate prices before and after some natural disaster. In all these cases, you might suspect that the response changes abruptly because of the event.

To give a simple example, suppose that a driver records the fuel economy (in miles per gallon) for a car for 12 weeks. Because the car engine is coughing and knocking, the owner brings the car to a mechanic for maintenance. After the maintenance, the car seems to run better, and the owner records the fuel economy for another six weeks. The hypothetical data are below:

data MPG;
input mpg @@;
week = _N_;
period = ifc(week > 12, "After ", "Before");
label mpg = "Miles per Gallon";
datalines;
30.5 28.1 27.1 31.2 25.2 31.1 27.7 28.2 29.6 30.6 28.9 25.9 
30.6 33.0 31.2 29.7 32.7 31.1 
;
run;

Notice that the data contain a binary indicator variable (period) that records whether the data are from before or after the tune-up. You can use PROC GLM to run a simple ANOVA to determine whether there was a significant change in the mean fuel economy after the maintenance. The following call to PROC GLM indicates that the mean fuel economy is about 2.7 mpg better after the tune-up.

proc glm data=MPG plots=fitplot;
   class period / ref=first;
   model mpg = period / solution;
   output out=Out predicted=Pred;
run;
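As a quick cross-check of that 2.7 mpg figure, the difference in group means can be computed with plain Python arithmetic on the data above, independent of SAS:

```python
# Fuel economy before and after the maintenance (from the DATA step above)
before = [30.5, 28.1, 27.1, 31.2, 25.2, 31.1, 27.7, 28.2, 29.6, 30.6, 28.9, 25.9]
after = [30.6, 33.0, 31.2, 29.7, 32.7, 31.1]

# Difference in group means: this is exactly the "solution" estimate
# that the one-way ANOVA reports for the period effect.
diff = sum(after) / len(after) - sum(before) / len(before)
print(round(diff, 2))  # about 2.7 mpg improvement after maintenance
```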

Graphically, this regression analysis is usually visualized by using two box plots. (PROC GLM creates the box plots automatically when ODS graphics are enabled.) However, because the independent variable is time, you could also use a series plot to show the observed data and the mean response before and after the maintenance. By using the GROUP= option on the SERIES statement, you can get two lines for the "before" and "after" time periods.

title "Piecewise Constant Regression with Jump Discontinuity";
proc sgplot data=Out;
   block x=week block=period / transparency=0.8;
   scatter x=week y=mpg / markerattrs=(symbol=CircleFilled color=black);
   series x=week y=pred / group=period lineattrs=(thickness=3);
run;
Piecewise constant regression function with jump discontinuity

The graph shows that the model has a jump discontinuity at the time point at which the maintenance intervention occurred. If you include the WEEK variable in the analysis, you can model the response as a linear function of time, with a jump at the time of the tune-up.

All this is probably very familiar. However, did you know that you can use splines to model the data as a continuous function that has a kink or "corner" at the time of the maintenance event? You can use this feature when the model is continuous, but the slope changes at a known time.

Splines for nonsmooth models

Several SAS procedures support the EFFECT statement, which enables you to build spline effects. The paper "Rediscovering SAS/IML Software" (Wicklin 2010, p. 4) has an example where splines are used to construct a highly nonlinear curve for a scatter plot.

A spline effect is determined by the placement of certain points called "knots." Often knots are evenly spaced within the range of the explanatory variable, but the EFFECT statement supports many other ways to position the knots. In fact, the documentation for the EFFECT statement says: "If you remove the restriction that the knots of a spline must be distinct and allow repeated knots, then you can obtain functions with less smoothness and even discontinuities at the repeated knot location. For a spline of degree d and a repeated knot with multiplicity m ≤ d, the piecewise polynomials that join such a knot are required to have only d – m matching derivatives."

The degree of a linear regression is d=1, so if you specify a knot position once you obtain a piecewise linear function that contains a "kink" at the knot. The following call to PROC GLIMMIX demonstrates this technique. (I use GLIMMIX because neither PROC GLM nor PROC GENMOD support the EFFECT statement.) You can manually specify the position of knots by using the KNOTMETHOD=LIST(list) option on the EFFECT statement.

proc glimmix data=MPG;
   effect spl = spline(week / degree=1 knotmethod=list(1 13 18));      /* piecewise linear */
   *effect spl = spline(week / degree=2 knotmethod=list(1 13 13 18)); /* piecewise quadratic */
   model mpg = spl / solution;
   output out=Out predicted=Pred;
run;
title "Piecewise Linear Regression with Kink";
proc sgplot data=Out noautolegend;
   block x=week block=period / transparency=0.8;
   scatter x=week y=mpg / markerattrs=(symbol=CircleFilled color=black);
   series x=week y=pred / lineattrs=(thickness=3);
run;
Piecewise Linear Regression with Kink

The graph shows that the model is piecewise linear, but that the slope of the model changes at week=13. In contrast, the second EFFECT statement in the PROC GLIMMIX code (which is commented out) specifies piecewise quadratic polynomials (d=2) and repeats the knot at week=13. That results in two quadratic models that give the same predicted value at week=13, but the model is not smooth at that location. Try it out!
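Outside SAS, the same degree-1 spline with a single knot can be sketched as an ordinary least-squares fit on a truncated power ("hinge") basis. This NumPy sketch mirrors the construction; it is not SASPy, and the basis is built by hand:

```python
import numpy as np

# Fuel-economy data from the DATA step above; knot at week = 13.
mpg = np.array([30.5, 28.1, 27.1, 31.2, 25.2, 31.1, 27.7, 28.2, 29.6,
                30.6, 28.9, 25.9, 30.6, 33.0, 31.2, 29.7, 32.7, 31.1])
week = np.arange(1.0, 19.0)
knot = 13.0

# Basis: intercept, week, and the hinge term (week - knot)_+ .
# The hinge lets the slope change at the knot while the fitted
# function stays continuous there -- the "kink".
X = np.column_stack([np.ones_like(week), week,
                     np.maximum(week - knot, 0.0)])
coef, *_ = np.linalg.lstsq(X, mpg, rcond=None)

slope_before = coef[1]
slope_after = coef[1] + coef[2]  # the slope changes by coef[2] at week 13
```

Repeating the knot in the basis (adding a second hinge column of the same form for a degree-2 fit) is the by-hand analogue of the repeated-knot trick that the EFFECT statement automates.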

If you are using a SAS procedure that does not support the EFFECT statement, you can use the GLIMMIX procedure to output the dummy variables that are associated with the spline effects. A nice paper by David Pasta (2003) describes how to use dummy variables in a variety of models. The paper was written before the EFFECT statement existed; many of the ideas in the paper are easier to implement by using the EFFECT statement.

Lastly, the TRANSREG procedure in SAS supports spline effects but has its own syntax. See the TRANSREG documentation, which includes an example of repeating knots to build a regression model for discontinuous data.

Have you ever needed to construct a nonsmooth regression model? Tell your story by leaving a comment.

The post Nonsmooth models and spline effects appeared first on The DO Loop.