June 14, 2017
 

There's been a lot of talk in the media lately about the death of retail. Every week, it seems, another retailer announces store closings, an acquisition, or even going out of business. Many attribute it to the growing competitive landscape, with the convenience of online shopping and the lure [...]

Retail -- more alive than ever was published on SAS Voices by Brittany Bullard

June 14, 2017
 

In a previous article, I showed two ways to define a log-likelihood function in SAS. This article shows two ways to compute maximum likelihood estimates (MLEs) in SAS: the nonlinear optimization subroutines in SAS/IML and the NLMIXED procedure in SAS/STAT. To illustrate these methods, I will use the same data sets from my previous post. One data set contains binomial data, the other contains data that are lognormally distributed.

Maximum likelihood estimates for binomial data from SAS/IML

I previously wrote a step-by-step description of how to compute maximum likelihood estimates in SAS/IML. SAS/IML contains many algorithms for nonlinear optimization, including the NLPNRA subroutine, which implements the Newton-Raphson method.

In my previous article I used the LOGPDF function to define the log-likelihood function for the binomial data. The following statements define bounds for the parameter (0 < p < 1) and provide an initial guess of p0 = 0.5:

/* Before running program, create Binomial and LN data sets from previous post */
 
/* Example 1: MLE for binomial data */
/* Method 1: Use SAS/IML optimization routines */
proc iml;
/* log-likelihood function for binomial data */
start Binom_LL(p) global(x, NTrials);
   LL = sum( logpdf("Binomial", x, p, NTrials) );
   return( LL );
finish;
 
NTrials = 10;    /* number of trials (fixed) */
use Binomial; read all var "x"; close;
 
/* set constraint matrix, options, and initial guess for optimization */
con = { 0,      /* lower bounds: 0 < p     */
        1};     /* upper bounds:     p < 1 */
opt = {1,       /* find maximum of function   */
       2};      /* print some output      */
p0  = 0.5;      /* initial guess for solution */
call nlpnra(rc, p_MLE, "Binom_LL", p0, opt, con);
print p_MLE;
[Figure: Maximum likelihood estimate for binomial data]

The NLPNRA subroutine computes that the maximum of the log-likelihood function occurs for p=0.56, which agrees with the graph in the previous article. We conclude that the parameter p=0.56 (with NTrials=10) is "most likely" to be the binomial distribution parameter that generated the data.

Maximum likelihood estimates for binomial data from PROC NLMIXED

If you've never used PROC NLMIXED before, you might wonder why I am using that procedure, since this problem is not a mixed modeling regression. However, you can use the NLMIXED procedure for general maximum likelihood estimation. In fact, I sometimes joke that SAS could have named the procedure "PROC MLE" because it is so useful for solving maximum likelihood problems.

PROC NLMIXED has built-in support for computing maximum likelihood estimates of data that follow the Bernoulli (binary), binomial, Poisson, negative binomial, normal, and gamma distributions. (You can also use PROC GENMOD to fit these distributions; I have shown an example of fitting Poisson data.)

The syntax for PROC NLMIXED is very simple for the binomial data. You use the PARMS statement to supply an initial guess for the parameter p. On the MODEL statement, you declare that you want to model the X variable as Binom(p), where NTrials=10. Always check the documentation for the correct argument order for the binomial distribution: functions such as PDF, CDF, and RAND take the p parameter as the first argument, Binom(p, NTrials), whereas some procedures (such as PROC MCMC and PROC NLMIXED) take the p parameter as the second argument, Binom(NTrials, p).

/* Method 2: Use PROC NLMIXED and its built-in modeling syntax */
proc nlmixed data=Binomial;
   parms p = 0.5;             * initial value for parameter;
   NTrials = 10;
   model x ~ binomial(NTrials, p);
run;
[Figure: Maximum likelihood estimates for binomial data by using PROC NLMIXED in SAS]

Notice that the output from PROC NLMIXED contains the parameter estimate, standard error, and 95% confidence intervals. The parameter estimate is the same value (0.56) as was found by the NLPNRA routine in SAS/IML. The confidence interval confirms what we previously saw in the graph of the log-likelihood function: the function is somewhat flat near the optimum, so a 95% confidence interval is wide: [0.49, 0.63].
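As a cross-check outside of SAS, the binomial MLE and its large-sample Wald interval have closed forms: p&#770; = &Sigma;x&#8336; / (n&middot;NTrials) and p&#770; &plusmn; 1.96&middot;sqrt(p&#770;(1-p&#770;)/(n&middot;NTrials)). Here is a pure-Python sketch; the 20 counts below are hypothetical stand-ins (the article's Binomial data set comes from the previous post), so the printed numbers illustrate the formulas rather than reproduce the article's output.

```python
import math

# Hypothetical data: counts of heads in NTrials=10 tosses, repeated 20 times
# (chosen for illustration; not the article's Binomial data set)
x = [5, 6, 4, 7, 5, 6, 6, 5, 7, 4, 6, 5, 8, 6, 5, 7, 5, 6, 4, 6]
NTrials = 10
n_obs = len(x) * NTrials            # total number of Bernoulli trials

p_hat = sum(x) / n_obs              # closed-form MLE: overall proportion of heads
se = math.sqrt(p_hat * (1 - p_hat) / n_obs)
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)
print(p_hat, ci)
```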

Maximum likelihood estimates for lognormal data

You can use similar syntax to compute MLEs for lognormal data. The SAS/IML syntax is similar to the binomial example, so it is omitted. To view it, download the complete SAS program that computes these maximum likelihood estimates.

PROC NLMIXED does not support the lognormal distribution as a built-in distribution, which means that you need to explicitly write out the log-likelihood function and specify it in the GENERAL function on the MODEL statement. Whereas in SAS/IML you have to use the SUM function to sum the log-likelihood over all observations, the syntax for PROC NLMIXED is simpler. Just as the DATA step has an implicit loop over all observations, the NLMIXED procedure implicitly sums the log-likelihood over all observations. You can use the LOGPDF function, or you can explicitly write the log-density formula for each observation.

If you look up the lognormal distribution in the list of "Standard Definition" in the PROC MCMC documentation, you will see that one parameterization of the lognormal PDF in terms of the log-mean μ and log-standard-deviation σ is
f(x; μ, σ) = 1/(sqrt(2π) σ x) exp(-(log(x)-μ)**2 / (2σ**2))
When you take the logarithm of this quantity, you get two terms, or three if you use the rules of logarithms to isolate quantities that do not depend on the parameters:

proc nlmixed data=LN;
   parms mu 1 sigma 1;                 * initial values of parameters;
   bounds 0 < sigma;                   * bounds on parameters;
   sqrt2pi = sqrt(2*constant('pi'));
   LL = -log(sigma) 
        - log(sqrt2pi*x)               /* this term is constant w/r/t (mu, sigma) */
        - (log(x)-mu)**2  / (2*sigma**2);
   /* Alternative: LL = logpdf("Lognormal", x, mu, sigma); */
   model x ~ general(LL);
run;
[Figure: Maximum likelihood estimates for lognormal data by using PROC NLMIXED in SAS]

The parameter estimates are shown, along with standard errors and 95% confidence intervals. The maximum likelihood estimates for the lognormal data are (μ, σ) = (1.97, 0.50). You will get the same answer if you use the LOGPDF function (inside the comment) instead of the "manual calculation." You will also get the same estimates if you omit the term log(sqrt2pi*x) because that term does not depend on the MLE parameters.
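For the lognormal distribution, the MLEs also have closed forms: &mu;&#770; is the sample mean of log(x) and &sigma;&#770; is the square root of the average squared deviation of log(x) from &mu;&#770;. The following pure-Python sketch simulates its own Lognormal(2, 0.5) sample (it does not use the article's LN data set) and applies those formulas:

```python
import math
import random

random.seed(1)
# Simulate 200 lognormal observations: log(X) ~ N(mu=2, sigma=0.5)
data = [math.exp(random.gauss(2, 0.5)) for _ in range(200)]

logs = [math.log(v) for v in data]
mu_hat = sum(logs) / len(logs)                 # mean of log(x)
sigma_hat = math.sqrt(sum((w - mu_hat) ** 2 for w in logs) / len(logs))
print(mu_hat, sigma_hat)                       # estimates should be near (2, 0.5)
```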

In conclusion, you can use nonlinear optimization in the SAS/IML language to compute MLEs. This approach is especially useful when the computation is part of a larger computational program in SAS/IML. Alternatively, the NLMIXED procedure makes it easy to compute MLEs for discrete and continuous distributions. For some simple distributions, the log-likelihood functions are built into PROC NLMIXED. For others, you can specify the log likelihood yourself and find the maximum likelihood estimates by using the GENERAL function.

The post Two ways to compute maximum likelihood estimates in SAS appeared first on The DO Loop.

June 14, 2017
 

In SAS Viya 3.2, the Self-Service Import provides a mechanism for a user to import (copy) data into the SAS Cloud Analytic Services (CAS) environment. The data is copied as a .sashdat file into the selected CAS Library location when it is imported.  Self-Service Import data can only be imported into CAS libraries of type PATH, HDFS, or DNFS.

The Self-Service Import functionality is available in the following applications:

  • SAS Visual Data Builder
  • SAS Visual Analytics
  • SAS Environment Manager – Data

To have access to Self-Service Import, the end user must be granted the Read permission on the /casManagement_capabilities/importData object URI in the Security ⇨ Rules area of SAS Environment Manager.

Self-Service Import supports importing data to CAS from Local, Server, and Social Media sources.

[Image: Self-Service Import in SAS Viya]

SAS Viya 3.2: Self-Service Import

Local

Local file data can be imported from Microsoft Excel (.XLSX or .XLS), text file (.TXT or .CSV), the clipboard, or a SAS Data Set (SASHDAT or SAS7BDAT). The file(s) must exist on a file system available to your PC.

Server

After providing the appropriate server connection information, a table from LASR or select database types can be imported. The currently supported database types are:  Oracle, Teradata, Hadoop, PostgreSQL, and Impala. The Server selections displayed are dependent on your licensing and configuration.

Social Media

After authentication with the social media provider (Twitter, Facebook, Google Analytics, or YouTube), data can be imported through the social media provider’s public API. Access to these APIs is subject to the social media provider’s applicable license terms, terms of use, and other usage terms and policies.

Currently, there is a size limit for file imports; it is set on the CAS Management service Configuration screen in SAS Environment Manager, and the default is 4GB. The local file importer defaults to 4GB because that is the limit imposed by the most restrictive browser (Internet Explorer); Chrome and other browsers allow larger files, which is why a property lets an administrator set a higher limit. A modification to the max-file-size property requires a restart of the casManagement service.

Social Media and DBMS importers have no explicit limits. However, they are constrained by the disk space on the server where casManagement runs, because the uploaded file is written to a temporary file relative to casManagement.

For more information, refer to the Self-Service Import section of the documentation.

The Self-Service Import in SAS Viya 3.2 was published on SAS Users.

June 13, 2017
 
Keras is a very convenient tool for building your deep learning model from scratch. It is so easy to use that it has almost become the de facto deep learning modeling framework in Kaggle competitions.

Keras used to support only TensorFlow and Theano. Now CNTK (recently rebranded in full as the Microsoft Cognitive Toolkit) is joining as a new backend choice, although at the moment you need to download Microsoft's private fork of Keras while Microsoft works with the Keras author to fully integrate CNTK as a backend.

CNTK is a very good deep learning tool: (1) it is super fast, especially for RNN-type models, and (2) it scales well across multiple GPUs. Here is our own speed comparison (numbers in seconds) on the Windows platform using an NVIDIA Titan Xp, YMMV though:
If you want to give CNTK a try, you can follow the installation instructions here. To install the version of Keras that supports CNTK, follow the instructions here.
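At the time of writing, the backend that Keras uses is selected in the ~/.keras/keras.json configuration file (or via the KERAS_BACKEND environment variable). As a hedged sketch, a keras.json that selects CNTK might look like the following; the exact backend string accepted by Microsoft's fork is an assumption here:

```json
{
    "backend": "cntk",
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32"
}
```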
 Posted at 9:44 AM
June 12, 2017
 
Deep learning course: http://www.samuelcheng.info/deeplearning_2017/
Docker: from beginner to practice: https://www.gitbook.com/book/yeasy/docker_practice/details
LONG SHORT-TERM MEMORY: http://www.bioinf.jku.at/publications/older/2604.pdf
http://seanlook.com/tags/docker/
docker run -ti --volume=$(pwd):/workspace caffe:cpu bash  # start Docker with the working directory mapped; to detach from the container back to the host shell, press Ctrl-P, then Ctrl-Q
Container lifecycle management: docker [run|start|stop|restart|kill|rm|pause|unpause]
Container operations and maintenance: docker [ps|inspect|top|attach|events|logs|wait|export|port]
Container rootfs commands: docker [commit|cp|diff]
Image registry: docker [login|pull|push|search]
Local image management: docker [images|rmi|tag|build|history|save|import]
Other commands: docker [info|version]
https://huangying-zhan.github.io/ Faster R-CNN, Fast R-CNN, R-CNN
Transferrable Representations for Visual Recognition https://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-106.pdf

https://yahooeng.tumblr.com/post/151148689421/open-sourcing-a-deep-learning-solution-for
1. Install Docker on macOS: download the installer from https://store.docker.com/editions/community/docker-ce-desktop-mac and install it directly
2. Check the Docker version: docker --version
Docker version 17.03.1-ce, build c6d412e
3. Test a web server:
docker run -d -p 80:80 --name webserver nginx, then open localhost in a browser
4. Download the nsfw code: git clone https://github.com/yahoo/open_nsfw
5. Enter the nsfw code directory: cd open_nsfw/
6. Download the Caffe Dockerfile: wget https://github.com/BVLC/caffe/raw/master/docker/cpu/Dockerfile
7. Build the Caffe image: docker build -t caffe:cpu ./
8. Start Docker:
docker run -ti caffe:cpu caffe --version
9. Map the working directory:
docker run -ti --volume=$(pwd):/workspace caffe:cpu bash
10. Test NSFW image detection:
wget http://image.tianjimedia.com/uploadImages/2015/288/26/R99Q7A2345V5.jpg
mv R99Q7A2345V5.jpg test3.jpg
python ./classify_nsfw.py \
--model_def nsfw_model/deploy.prototxt \
--pretrained_model nsfw_model/resnet_50_1by2_nsfw.caffemodel \
test3.jpg
Run log:
I0605 12:36:59.237032    11 upgrade_proto.cpp:77] Attempting to upgrade batch norm layers using deprecated params: nsfw_model/resnet_50_1by2_nsfw.caffemodel
I0605 12:36:59.237094    11 upgrade_proto.cpp:80] Successfully upgraded batch norm layers using deprecated params.
I0605 12:36:59.242766    11 net.cpp:744] Ignoring source layer loss
NSFW score:   0.970513343811

https://github.com/alex-paterson/Barebones-Flask-and-Caffe-Classifier  


 
 Posted at 6:05 PM
June 12, 2017
 

Maximum likelihood estimation (MLE) is a powerful statistical technique that uses optimization techniques to fit parametric models. The technique finds the parameters that are "most likely" to have produced the observed data. SAS provides many tools for nonlinear optimization, so often the hardest part of maximum likelihood is writing down the log-likelihood function. This article shows two simple ways to construct the log-likelihood function in SAS. For simplicity, this article describes fitting the binomial and lognormal distributions to univariate data.

Always use the log-likelihood function!

Although the method is known as maximum likelihood estimation, in practice you should optimize the log-likelihood function, which is numerically superior to work with. For an introduction to MLE, including the definitions of the likelihood and log-likelihood functions, see the Penn State Online Statistics Course, which is a wonderful reference.

MLE assumes that the observed data x = {x1, x2, ..., xn} are independently drawn from some population. You want to find the most likely parameters θ = (θ1, ..., θk) such that the data are fit by the probability density function (PDF) f(x; θ). Since the data are independent, the probability of observing the data is the product Πi f(xi; θ), which is the likelihood function L(θ | x). If you take the logarithm, the product becomes a sum. The log-likelihood function is
LL(θ | x) = Σi log( f(xi; θ) )

This formula is the key. It says that the log-likelihood function is simply the sum of the log-PDF function evaluated at the data values. Always use this formula. Do not ever compute the likelihood function (the product) and then take the log, because the product is prone to numerical errors, including overflow and underflow.

Two ways to construct the log-likelihood function

There are two simple ways to construct the log-likelihood function in SAS:

  • Use the LOGPDF function to evaluate the log-density at each observation and sum the values.
  • Look up the formula for the PDF, take its logarithm manually, and sum the result over the observations.

Example: The log-likelihood function for the binomial distribution

A coin was tossed 10 times and the number of heads was recorded. This was repeated 20 times to get a sample. A student wants to fit the binomial model X ~ Binom(p, 10) to estimate the probability p of the coin landing on heads. For this problem, the vector of MLE parameters θ is merely the one parameter p.

Recall that if you are using SAS/IML to optimize an objective function, the parameter that you are trying to optimize should be the only argument to the function, and all other parameters should be specified on the GLOBAL statement. Thus one way to write a SAS/IML function for the binomial log-likelihood function is as follows:

proc iml;
/* Method 1: Use LOGPDF. This method works in DATA step as well */
start Binom_LL1(p) global(x, NTrials);
   LL = sum( logpdf("Binomial", x, p, NTrials) );
   return( LL );
finish;
 
/* visualize log-likelihood function, which is a function of p */
NTrials = 10;    /* number of trials (fixed) */
use Binomial; read all var "x"; close;
 
p = do(0.01, 0.99, 0.01);      /* vector of parameter values */
LL = j(1, ncol(p), .);
do i = 1 to ncol(LL);
   LL[i] = Binom_LL1( p[i] );  /* evaluate LL for a sequence of p */
end;
 
title "Graph of Log-Likelihood Function";
title2 "Binomial Distribution, NTrials=10";
call series(p, LL) grid={x y} xvalues=do(0,1,0.1)
                   label={"Probability of Success (p)", "Log Likelihood"};
[Figure: Graph of log-likelihood function for the binomial distribution]

Notice that the data are fixed and do not change. The log likelihood is considered to be a function of the parameter p. Therefore you can graph the function for representative values of p, as shown. The graph clearly shows that the log likelihood is maximal near p=0.56, which is the maximum likelihood estimate. The graph is fairly flat near its optimal value, which indicates that the estimate has a wide standard error, and consequently a 95% confidence interval for the parameter is also wide. If the sample contained 100 observations instead of only 20, the log-likelihood function might have a narrower peak.

Notice also that the LOGPDF function made this computation very easy. You do not need to worry about the actual formula for the binomial density. All you have to do is sum the log-density at the data values.
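The "sum the log-PDF at the data values" recipe is easy to mirror outside of SAS as well. This pure-Python sketch (with a hypothetical sample, since the Binomial data set is defined in a previous post) evaluates the summed binomial log-PDF on a grid of p values, just like the SAS/IML loop above, and reports where the maximum occurs:

```python
import math

# Hypothetical sample: counts of heads in NTrials=10 tosses (not the article's data)
x = [5, 6, 4, 7, 5, 6, 6, 5, 7, 4, 6, 5, 8, 6, 5, 7, 5, 6, 4, 6]
NTrials = 10

def binom_logpdf(k, p):
    # log of C(NTrials, k) * p^k * (1-p)^(NTrials-k), computed via lgamma
    return (math.lgamma(NTrials + 1) - math.lgamma(k + 1)
            - math.lgamma(NTrials - k + 1)
            + k * math.log(p) + (NTrials - k) * math.log(1 - p))

def binom_LL(p):
    return sum(binom_logpdf(k, p) for k in x)   # sum the log-PDF over the data

grid = [i / 1000 for i in range(1, 1000)]       # p = 0.001, 0.002, ..., 0.999
p_best = max(grid, key=binom_LL)
print(p_best)
```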

In contrast, the second method requires a little more work, but can handle any distribution for which you can compute the density function. If you look up the formula for the binomial PDF in the MCMC documentation, you see that
PDF(x; p, NTrials) = comb(NTrials, x) * p**x * (1-p)**(NTrials-x)
where the COMB function computes the binomial coefficient "NTrials choose x." There are three terms in the PDF that are multiplied together. Therefore when you apply the LOG function, you get the sum of three terms. You can use the LCOMB function in SAS to evaluate the logarithm of the binomial coefficients in an efficient manner, as follows:

/* Method 2: Manually compute log likelihood by using formula */
start Binom_LL2(p) global(x, NTrials);
   LL = sum(lcomb(NTrials, x)) + log(p)*sum(x) + log(1-p)*sum(NTrials-x);
   return( LL );
finish;
 
LL2 = Binom_LL2(p);      /* vectorized function, so no need to loop */

The second formulation has an advantage in a vector language such as SAS/IML because you can write the function so that it can evaluate a vector of values with one call, as shown. It also has the advantage that you can modify the function to eliminate terms that do not depend on the parameter p. For example, if your only goal is maximize the log-likelihood function, you can omit the term sum(lcomb(NTrials, x)) because that term is a constant with respect to p. That reduces the computational burden. Of course, if you omit the term then you are no longer computing the exact binomial log likelihood.
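You can verify numerically that dropping the constant term does not move the maximum. In this pure-Python sketch (again with hypothetical data), the full log likelihood and the kernel without the sum(lcomb(NTrials, x)) term attain their maximum at the same grid value of p:

```python
import math

# Hypothetical counts of heads in NTrials=10 tosses (not the article's data)
x = [5, 6, 4, 7, 5, 6, 6, 5, 7, 4, 6, 5, 8, 6, 5, 7, 5, 6, 4, 6]
n, NTrials = len(x), 10
sum_x = sum(x)

# Constant term: sum of log binomial coefficients, via lgamma
log_comb = sum(math.lgamma(NTrials + 1) - math.lgamma(k + 1)
               - math.lgamma(NTrials - k + 1) for k in x)

def LL_full(p):
    return log_comb + math.log(p) * sum_x + math.log(1 - p) * (n * NTrials - sum_x)

def LL_kernel(p):
    # same formula with the constant lcomb term omitted
    return math.log(p) * sum_x + math.log(1 - p) * (n * NTrials - sum_x)

grid = [i / 1000 for i in range(1, 1000)]
print(max(grid, key=LL_full), max(grid, key=LL_kernel))   # identical argmax
```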

Example: The log-likelihood function for the lognormal distribution

In a similar way, you can use the LOGPDF or the formula for the PDF to define the log-likelihood function for the lognormal distribution. For brevity, I will only show the SAS/IML functions, but you can download the complete SAS program that defines the log-likelihood function and computes the graph.

The following SAS/IML modules show two ways to define the log-likelihood function for the lognormal distribution. For the lognormal distribution, the vector of parameters θ = (μ, σ) contains two parameters.

/* Method 1: use LOGPDF */
start LogNormal_LL1(param) global(x);
   mu = param[1];
   sigma = param[2];
   LL = sum( logpdf("Lognormal", x, mu, sigma) );
   return( LL );
finish;
 
/* Method 2: Manually compute log likelihood by using formula
   PDF(x; mu, sigma) = 1/(sqrt(2*pi)*sigma*x) * exp(-(log(x)-mu)##2 / (2*sigma##2))
*/
start LogNormal_LL2(param) global(x);
   mu = param[1];
   sigma = param[2];
   twopi = 2*constant('pi');
   LL = -nrow(x)/2*log(twopi*sigma##2) 
        - sum( (log(x)-mu)##2 )/(2*sigma##2)
        - sum(log(x));  /* this term is constant w/r/t (mu, sigma) */
   return( LL );
finish;

The function that uses the LOGPDF function is simple to write. The second method is more complicated because the lognormal PDF is more complicated than the binomial PDF. Nevertheless, the complete log-likelihood function only requires a few SAS/IML statements.

For completeness, the contour plot on this page shows the log-likelihood function for 200 simulated observations from the Lognormal(2, 0.5) distribution. The parameter estimates are (μ, σ) = (1.97, 0.5).

[Figure: Graph of the log-likelihood function for the lognormal distribution]

Summary

This article has shown two simple ways to define a log-likelihood function in SAS. You can sum the values of the LOGPDF function evaluated at the observations, or you can manually apply the LOG function to the formula for the PDF function. The log likelihood is regarded as a function of the parameters of the distribution, even though it also depends on the data. For distributions that have one or two parameters, you can graph the log-likelihood function and visually estimate the value of the parameters that maximize the log likelihood.

Of course, SAS enables you to numerically optimize the log-likelihood function, thereby obtaining the maximum likelihood estimates. My next blog post shows two ways to obtain maximum likelihood estimates in SAS.

The post Two simple ways to construct a log-likelihood function in SAS appeared first on The DO Loop.

June 10, 2017
 


The most essential short Q&A list on the way to learning SAS

by sxlion

Editor's note: Today I met Dr. Gu Hongqiu, author of a best-selling SAS book ("best-selling" is a "prediction"; there is no data to support it yet). After a hurried meeting and a brief chat, I was left with many thoughts. On a whim (too much coffee, can't sleep), and to make up for the regret of posting zero blog entries in 2016 (to pad the count), I am writing this post. It is purely thrown together; feel free to laugh it off.

Note: the key points are highlighted; see the bold text.

Q: Why should I learn SAS?

A: Because the world is random, but humans always want to understand it.

Q: If I learn SAS well, how exactly will it change things for me?

A: If SAS is all you know, you will never lack for a job, but at best you will be a senior programmer. If you are also an expert in your own field and its theory, then you will be the strongest SAS user in your field and the strongest in your field among SAS users; you cannot help but excel.

Q: How do I install SAS?

A: Search the web. SAS is hard to install, but learning SAS is far harder than installing it.

Q: Is SAS really that hard to learn?

A: Hard, very hard. Anyone who claims you can master the SAS language in 30 days is simply lying, never mind those who promise SAS programming in 7 days.

Q: If SAS is so hard, why learn it at all?

A: Learning SAS is also a process of systematically learning statistics, especially for people without a statistics background. Besides, the reward matches the effort (assuming, of course, normal intelligence).

Q: How can I truly master SAS?

A: I don't know. Very few people in the world have truly mastered SAS, and why would you need to? Learn just enough SAS (not only the programming) to solve your actual problems. As the saying goes, "half the Analects is enough to govern the empire": knowing a little SAS is enough to make a living. Learn the specific features of the specific modules that your work requires.

Q: How can I use SAS well?

A: Whenever I read Jin Yong's or Gu Long's novels and reach the long, overwrought plots about fighting over secret martial-arts manuals, I silently thank SAS for its generosity. SAS Help is as good as any legendary manual, but don't browse it for its own sake; getting on with life matters more. Just consult the Help for the specific module and feature when you need it. If you really tried to study the Help from cover to cover in order to master SAS, a lifetime would not be enough. And even if you absorbed all of it, you would only know how to operate SAS; what matters is whether you solved your problem. SAS is, at its core, a tool.

Q: What is so great about SAS Help?

A: The usage of every SAS module and feature is in the Help.

Q: Does SAS Help have any shortcomings?

A: Yes. SAS Help only tells you how to use things. It explains how, not why.

...... I can't make this up any longer ......

Original article: "A short Q&A list of essential questions in the SAS learning process." When reprinting, please credit: reprinted from the SAS Resource & Information List (SAS资源资讯列表)

Permalink: http://saslist.net/archives/445


June 9, 2017
 

If you were a fan of the original Star Trek television series, you probably remember lots of little details about the show. And you might even feel sorry for the people who don't get the clever references you make to things from the show. If you're that person, then you'll [...]

The post Star Trek (the original series) - the infographic! appeared first on SAS Learning Post.