7月 052017
 

A SAS customer asked how to use SAS to conduct a Z test for the equality of two proportions. He was directed to the SAS Usage Note "Testing the equality of two or more proportions from independent samples." The note says to "specify the CHISQ option in the TABLES statement of PROC FREQ to compute this test," and then adds "this is equivalent to the well-known Z test for comparing two independent proportions."

You might wonder why a chi-square test for association is equivalent to a Z test for the equality of proportions. You might also wonder if there is a direct way to test the equality of proportions. This article implements the well-known test for proportions in the DATA step and compares the results to the chi-square test results. It also shows how to get this test directly from PROC FREQ by using the RISKDIFF option.

A chi-square test for association in SAS

The SAS Usage Note poses the following problem: Suppose you want to compare the proportions responding "Yes" to a question in independent samples of 100 men and 100 women. The number of men responding "Yes" is observed to be 30 and the number of women responding Yes was 45.

You can create the data by using the following DATA step, then call PROC FREQ to analyze the association between the response variable and gender.

data Prop;
length Group $12 Response $3;
input Group Response N;
datalines;
Men          Yes  30
Men          No   70
Women        Yes  45
Women        No   55
;
 
proc freq data=Prop order=data;
   weight N;
   tables Group*Response / chisq;
run;
Test for association of two categorical variables in SAS

As explained in the PROC FREQ documentation, the Pearson chi-square statistic indicates an association between the variables in the 2 x 2 table. The results show that the chi-square statistic (for 1 degree of freedom) is 4.8, which corresponds to a p-value of 0.0285. The test indicates that we should reject the null hypothesis of no association at the 0.05 significance level.

As stated in the SAS Usage Note, this association test is equivalent to a Z test for whether the proportion of males who responded "Yes" equals the proportion of females who responded "Yes." The equivalence relies on a fact from probability theory: a chi-square random variable with 1 degree of freedom is the square of a random variable from the standard normal distribution. Thus the square root of the chi-square statistic is the Z statistic (up to a sign) that you get from the test of equality of two proportion. Therefore the Z statistic should be z = ±sqrt(4.8) = ±2.19. The p-value is unchanged.

Z test for the equality of two proportions: A DATA step implmentation

For comparison, you can implement the classical Z test by applying the formulas from a textbook or from the course material from Penn State, which includes a section about comparing two proportions. The following DATA step implements the Z test for equality of proportions:

/* Implement the Z test for pre-summarized statistics. Specify the group proportions and sizes. 
   For formulas, see https://onlinecourses.science.psu.edu/stat414/node/268 */
%let alpha = 0.05;
%let N1    = 100;   /* total trials in Group1 */
%let Event1=  30;   /* Number of events in Group1  */
%let N2    = 100;   /* total trials in Group2 */
%let Event2=  45;   /* Number of events in Group2  */
 
%let Side  =   2;   /* use L, U, or 2 for lower, upper, or two-sided test */
title "Test of H0: p1=p2 vs Ha: p1^=p2"; /* change for Side=L or U */
 
data zTestProp;
p1Hat = &Event1 / &N1;                /* observed proportion in Group1 */
var1  = p1Hat*(1-p1Hat) / &N1;        /* variance in Group1 */
p2Hat = &Event2 / &N2;                /* observed proportion in Group2 */
var2  = p2Hat*(1-p2Hat) / &N2;        /* variance in Group2 */
/* use pooled estimate of p for test */
Diff = p1Hat - p2Hat;                 /* estimate of p1 = p2 */
pHat = (&Event1 + &Event2) / (&N1 + &N2);
pVar = pHat*(1-pHat)*(1/&N1 + 1/&N2); /* pooled variance */
SE   = sqrt(pVar);                    /* estimate of standard error */
Z = Diff / SE;    
 
Side = "&Side";
if Side="L" then                      /* one-sided, lower tail */
   pValue = cdf("normal", z);
else if Side="U" then                 /* one-sided, upper tail */
   pValue = sdf("normal", Z);         /* SDF = 1 - CDF */
else if Side="2" then
   pValue = 2*(1-cdf("normal", abs(Z))); /* two-sided */
format pValue PVALUE6.4 Z 7.4;
label pValue="Pr < Z";
drop var1 var2 pHat pVar;
run;
 
proc print data=zTestProp label noobs; run;
Test for equality of two independent proportions in SAS

The DATA step obtains a test statistic of Z = –2.19, which is one of the square roots of the chi-square statistic in the PROC FREQ output. Notice also that the p-value from the DATA step matches the p-value from the PROC FREQ output.

Test equality of proportions by using PROC FREQ

There is actually a direct way to test for the equality of two independent proportions: use the RISKDIFF option in the TABLES statement in PROC FREQ. In the documentation, binomial proportions are called "risks," so a "risk difference" is a difference in proportions. (Also, a "relative risk" (the RELRISK option) measures the ratio of two proportions.) Equality of proportions is equivalent to testing whether the difference of proportions (risks) is zero.

As shown in the documentation, PROC FREQ supports many options for comparing proprtions. You can use the following suboptions to reproduce the classical equality of proportions test:

  1. EQUAL requests an equality test for the difference in proportion. By default, the Wald interval (METHOD=WALD) is used, but you can choose other intervals.
  2. VAR=NULL specifies how to estimate the variance for the Wald interval.
  3. (optional) CL=WALD outputs the Wald confidence interval for the difference.

Combining these options gives the following direct computation of the difference between two proportions:

proc freq data=Prop order=data;
   weight N;
   tables Group*Response / riskdiff(equal var=null cl=wald); /* Wald test for equality */
run;
Test for difference of proportions in SAS

The 95% (Wald) confidence interval is shown in the first table. The confidence interval is centered on the point estimate of the difference (-0.15). The interval does not contain 0, so the difference is significantly different from 0 at the 0.05 significance level.

The second table show the result of the Wald equality test. The "ASE (H0)" row gives the estimate for the (asymptotic) standard error, assuming the null hypothesis. The Z score and the two-sided p-value match the values from the DATA step computation, and the interpretation is the same.

Summary

In summary, the SAS Usage Note correctly states that the chi-square test of association is equivalent to the Z test for the equality of proportion. To run the Z test explicitly, this article uses the SAS DATA step to implement the test when you have summary statistics. As promised, the Z statistic is one of the square roots of the chi-square statistic and the p-values are the same. The DATA step removes some of the mystery regarding the equivalence between these two tests.

However, writing DATA step code cannot match the convenience of a procedure. For raw or pre-summarized data, you can use the RISKDIFF option in PROC FREQ to run the same test (recast as a difference of proportions or "risks"). To get exactly the same confidence intervals and statistics as the classical test (which is called the Wald test), you need to add a few suboptions. The resulting output matches the DATA step computations.

The post Test for the equality of two proportions in SAS appeared first on The DO Loop.

7月 042017
 

When I arrived for a training course with the name, "Powerful storytelling for tech," I was surprised to discover that most of the attendees worked in sales, rather than the marketing and communications crowd I'd been expecting. They were all there to learn how best to engage prospects – how [...]

Powerful storytelling for tech was published on SAS Voices by David Smith

7月 032017
 

If you spend a lot of time in SAS Enterprise Guide (as I do), you probably get to know its features pretty well. But we don't always take the time to explore as we should, so there might be a few golden nuggets of editor knowledge that have escaped you so far. Here are 10 program editor features that I've found essential while writing, editing, and debugging SAS programs. How many of these do you already know and use?

1. Turn on the line numbers

We programmers like to count lines of code. The SAS log often uses line numbers to reference problems in WARNINGs and ERRORs. So of course, you should have line numbers displayed in the program editor. But they aren't on by default. Go to Program → Editor Options and select "Show line numbers" to turn them on.

line numbers

2. Get the tabs out (or leave them in)

Tabs or spaces? Your choice here can have a significant effect on your earning potential, and perhaps even on your love life. Most code editors have options that support your choice, regardless of which camp you choose. SAS Enterprise Guide offers these:

  • Tab size - width of a tab character, represented in number of spaces. Default is 4, but I like to use 2 as it makes my program lines less wide.
  • Insert spaces for tabs - when you press the TAB key, don't add a TAB character but instead add the specified number of space characters.
  • Replace tabs with spaces on file open - a perfect passive-aggressive option when working with team members who disagree with your TAB world view. This option will change TAB characters to spaces when you open the program file. If you must retain the TAB characters...well, my main advice is do not rely on TAB characters in your code file. But if you must for some crazy reason, don't select this option and sign a pact with your teammates for the same.

tabs vs spaces

3. Define abbreviations for commonly used code

The most common code snippet that I reuse is a LIBNAME statement that points to my project data. To save on typing and mistakes, I've saved this as an editor abbreviation. When I begin to type the alias I've assigned to the snippet, the program editor offers to complete it for me. The custom abbreviation is presented along side all of the other built-in syntax suggestions.

abbrev animation

See more about editor abbreviations in this article.

4. Let the editor format your code

As shown in the vigorous "TABS vs spaces" debate, programmers care deeply about how their code is formatted. Individuals and teams adopt various standards for line breaks and indenting, and these are usually particular to the programming language. In general, SAS doesn't care how your code is laid out -- statements are delimited by semicolons, and that's the only cue that SAS needs. However, your teammates (and your future YOU, rereading your code) might appreciate something a little more readable.

Press Ctrl+I to format your entire program, applying some reasonable readability rules to indent code lines with conditionals and looping logic. Or select just a portion of the program and press Ctrl+I to affect a smaller part of the program. You can adjust some of the formatting rules by visiting Program → Editor Options, the Indenter tab.

autoformat code

5. Zoom out for the big picture

Some SAS programs are long -- hundreds (or thousands!) of lines of code. Sometimes it's helpful to get a birds-eye view of your code to understand its structure and to help you navigate. The Zoom feature is super helpful for this. Simply press Ctrl+- (control-minus) until you get the view you need. Press Ctrl++ (control-plus) to zoom back in, or press Ctrl+0 to get to the 100% view.

This trick works for SAS logs as well, and also data sets and ODS output (including text listing, which uses the program editor in a special mode for viewing SAS output).

zoom out

6. Change the program editor font

Want to waste an afternoon? Search the Internet for "best font for programmers" and experiment with all of the results that you find. I discovered Consolas (built into Microsoft Windows) a decade ago, and I've yet to find anything better. I use it for all of my "fixed font" needs: programming, terminal windows, command consoles, etc. But you can choose your own favorite -- just don't feel that you're stuck with the default "Courier" that seems to be standard issue.

Change your font in Programs → Editor Options, Appearance tab. You'll find lots of elements that you can tweak for typeface, size and color.

7. Select columns of content with block selection

Even though column block selection -- also known as "Alt+Select" -- is a standard feature in most advanced text editors, many programmers don't know about it. It's the perfect trick for selecting just a few columns of your text without including the content that's on the rest of the line. In SAS programming, this can be handy for selecting columns of values from the text listing output and pasting somewhere else, such as into a DATALINES block. It takes a little practice to master the Alt+Select, but once you do you'll find all sorts of uses for it. To get started, simply hold down the Alt key and click-drag to highlight a vertical column of text within the editor.

column selection animation

8. Find (and replace) using regular expressions

Regular expressions are a powerful, if confusing, method for finding and replacing text that matches certain patterns. The Find/Find and Replace window in SAS Enterprise Guide supports "Regular expression search" as a checkbox option.

Here's an example. Suppose I wanted to find all occurrences of 3 numbers after the thousands separator (comma) at the end of each data line -- and I wanted to turn those digits into zeros. (I don't know why--but just stick with me here.) A regex pattern to match this is ",\d\d\d\n" (comma, followed by 3 occurrences of numeric digits, followed by a line ending). Here's an animation of this in action.

regex replace animation

For more, select Help→SAS Enterprise Guide help and search for "regular expressions". The help topics contain several examples of useful patterns.

9. Scroll just part of your document using a split view

Do you find yourself scrolling back and forth in your program view? Trying to remember what was in that DATA step at the top of your program so you can reference the proper variable in another part of your code? Instead of dealing with "scrolling whiplash", you can split the program editor view to keep one part of your code always visible while you work on another code segment that's hundreds of lines away from it.

split view

There are several ways to split your view of SAS code, log output, and listing. Check out the article here for details.

10. Break out to your other favorite editor

Please don't tell anyone, but I have a secret: SAS Enterprise Guide is not my default application associated with .SAS files. When I double-click on a .SAS file in Windows Explorer, I like to use Notepad++ to provide a quick view of what's in that program file. Don't get me wrong: I use SAS Enterprise Guide for all of my serious SAS programming work. With syntax suggestions, color coding, built-in DATA step debugger, and more -- there just isn't a better, more full-featured environment. (No, I'm not trying to troll you, diehard SAS display manager users -- you keep using what makes you happy.) But Notepad++ has a deep set of text editing features, and sometimes I like to use it for hardcore find/replace functions, deeper inspection of special characters in my files, and more.

You can launch your program into your other favorite editor from SAS Enterprise Guide. Simply right-click on the program node in your process flow, select Open → Open <program name> with Windows Default. And make sure your other editor is registered in Windows as the default "Open with" action for SAS programs. Note: this trick works only with SAS programs that you've saved locally on your Windows file system.

Open with default

More than editing -- this is your workbench

The program editor isn't just about "editing programs." It's also the launchpad for several other programmer-centric features, such as debugging your DATA step, comparing your SAS programs, viewing program history and source control, and more. If you use SAS Enterprise Guide, take the time to learn about all of its programming features -- you'll become a more productive programmer as a result.

The post Ten SAS Enterprise Guide program editor tricks appeared first on The SAS Dummy.

7月 032017
 

Students in introductory statistics courses often use summary statistics (such as sample size, mean, and standard deviation) to test hypotheses and to compute confidence intervals. Did you know that you can provide summary statistics (rather than raw data) to PROC TTEST in SAS and obtain hypothesis tests and confidence intervals? This article shows how to create a data set that contains summary statistics and how to call PROC TTEST to compute a two-sample or one-sample t test for the mean.


Did you know you that PROC TTEST in #SAS can analyze summary statistics?
Click To Tweet


Run a two-sample t test for difference of means from summarized statistics

The documentation for PROC TTEST includes an example that shows how to compute a two-sample t test for the difference between the means of two groups. Rather than repeat the documentation example, let's compare the mean heights of 19 students based on gender. The data are contained in the Sashelp.Class data set.

To use PROC TTEST on summary statistics, the statistics must be in a SAS data set that contains a character variable named _STAT_ with values 'N', 'MEAN', and 'STD'. Because we are interested in a two-sample test, the data must also contain a grouping variable. The following SAS statements sort the data by the grouping variable, call PROC MEANS to write the summary statistics to a data set, and print a subset of the statistics:

proc sort data=sashelp.class out=class; 
   by sex;                                /* sort by group variable */
run;
proc means data=class noprint;           /* compute summary statistics by group */
   by sex;                               /* group variable */
   var height;                           /* analysis variable */
   output out=SummaryStats;              /* write statistics to data set */
run;
proc print data=SummaryStats label noobs; 
   where _STAT_ in ("N", "MEAN", "STD");
   var Sex _STAT_ Height;
run;

The table shows the structure of the SummaryStats data set for two-sample tests. The two samples are defined by the levels of the Sex variable ('F' for females and 'M' for males). The _STAT_ variable specifies the name of the statistics that are used in standard formulas for computing confidence intervals and hypothesis tests. The Height column shows the value of the statistics for each group.

You can use a data set like this one to conduct a two-sample t test of independent means. In a textbook, the problem is usually accompanied by a little story, like this:
The heights of sixth-grade students are normally distributed. Random samples of n1=9 females and n2=10 males are selected. The mean height of the female sample is m1=60.5889 with a standard deviation of s1=5.0183. The mean height of the male sample is m2=63.9100 with a standard deviation of s2=4.9379. Is there evidence that the mean height of sixth-grade students depends on gender?

The story suggests running a two-tailed test of the null hypothesis μ1 = μ2 against the alternative hypothesis that μ1 ≠ μ2. You do not have to do anything special to get PROC TTEST to use the summary statistics: if the procedure sees that the input data set contains a special variable named _STAT_ and the special values 'N', 'MEAN', and 'STD', then the procedure assumes that the data set contains summarized statistics. The following statements compare the mean heights of males and females for these students:

proc ttest data=SummaryStats order=data
           alpha=0.05 test=diff sides=2; /* two-sided test of diff between group means */
   class sex;
   var height;
run;

Notice that the output includes 95% confidence intervals for the group means, an estimate for the difference in means (-3.3211), and a confidence interval for the difference. You also get confidence intervals for the standard deviations.

In the second table, the "Pooled" row gives the t test under the assumption that the variances of the two groups are approximately equal, which seems to be true for these data. The value of the t statistic is t = -1.45 with a two-sided p-value of 0.1645. With this small sample, we fail to reject the null hypothesis at the 0.05 significance level. (For unequal group variances, use the "Satterthwaite" row.)

The syntax for the PROC TTEST statement enables you change the significance level and the type of hypothesis test. For example, to run a one-sided test for the alternative hypothesis μ1 < μ2 at the 0.10 significance level, you can use:

proc ttest ... alpha=0.10 test=diff sides=L;  /* Left-tailed test */

Run a one-sample t test of the mean from summarized statistics

In the previous section, PROC MEANS generated the summary statistics. However, you can also create the summary statistics manually, which is useful when you do not have access to the original data. As before, the key requirements are a variable named _STAT_ that has values 'N', 'MEAN', and 'STD'.

For example, a Penn State online statistics course states the following problem:
A research study measured the pulse rates of 57 college men and found a mean pulse rate of 70.4211 beats per minute with a standard deviation of 9.9480 beats per minute. Researchers want to know if the mean pulse rate for all college men is different from the current standard of 72 beats per minute.

The following SAS DATA step writes the summary statistics to a data set, then calls PROC TTEST to run a one-sample test of the null hypothesis μ = 72 against a two-sided alternative hypothesis:

data SummaryStats;
  infile datalines dsd truncover;
  input _STAT_:$8. X;
datalines;
N, 57
MEAN, 70.4211
STD, 9.9480
;
 
proc ttest data=SummaryStats alpha=0.05 H0=72 sides=2; /* H0: mu=72 vs two-sided alternative */
   var X;
run;

The results show that a 95% confidence interval for the mean contains the value 72. The value of the t statistic is t = -1.20, which corresponds to a p-value of 0.2359. Consequently, the data fails to reject the null hypothesis at the 0.05 significance level. These pulse rates are consistent with a random sample from a normal population with mean 72.

Summary

Although most people run PROC TTEST on raw data, you can also run PROC TTEST on summarized data. The procedure outputs the same tables and statistics as for raw data. You can run one- or two-sided tests for the mean value, or you can compare group means. When comparing the difference between group means, you can assume equal group variance ("Pooled" estimates) or unequal group variance ("Satterthwaite" estimates).

The only limitation when analyzing summary statistics is that PROC TTEST cannot produce ODS graphics because the graphics require the raw data.

The post Summary statistics and t tests in SAS appeared first on The DO Loop.

7月 012017
 


导读:SASor论坛上经典对话很多,以前用爬虫把整个网站down下来过一次,可惜没时间整理,然后电脑升级,最后就丢失了。每每想重温论坛,都无法得到原文。网络上仅仅转载了部分的对话,想起来真是可惜。现在仅凭一些时有时无的记忆点,在网络重搜出来一些经典对话,整理一下,分享一下。

QKAIWEI:
1

1
2
3
%let rc=%sysfunc(open(yourTable));
%let nobs=%sysfunc(attrn(&amp;rc,nobs));
%let rc=%sysfunc(close(&amp;rc));

2

1
2
3
proc sql noprint;
select count(1) into:nobs from yourTable;
quit;

3

1
2
3
4
5
6
data _null_;
point=0;
set yourTable point=point nobs=n;
call symput('nobs',n);
stop;
run;

4

others???

SAS_DREAM:

1

1
2
3
%let rc=%sysfunc(open(yourTable));
%let nobs=%sysfunc(attrn(&amp;rc,nobs));
%let rc=%sysfunc(close(&amp;rc));

– open, attrn是SCL函数在base里的调用方式,需要测试一下对AF license的依赖性,该方法从表头meta区读取信息,与数据大小无关,效率较高。不过attrn要建立scl list,稍费周折。

2

1
2
3
proc sql noprint;
select count(1) into:nobs from yourTable;
quit;

– 顺序遍历表,数数记录,不从表头取信息,表越大,时间越长。

3

1
2
3
4
5
6
data _null_;
point=0;
set yourTable point=point nobs=n;
call symput('nobs',n);
stop;
run;

考虑一下空表情形:

1
2
3
4
5
6
%let nobs = 0;
data _null_;
set yourTable (drop=_all_) nobs=n;
call symput('nobs',n);
stop;
run;

如果空表,n是取不到值的,所以宏obs要赋初值。
(drop=_all_)当表列数很多又是通过远程提交时,这个选项会省时间。
point=0不用也可,因为stop会限制data step在一条记录以后就停。

这个方法也是直接从表头meta区取观测数,与表大小无关。

4 others???
你已经知道这么多方法,除了第二条没有什么理由去用,第一,三方法都是既简洁准确又不用一秒钟就精确搞定,余下的时间,不如去想其它有意思的专题了。

IMOEN:

1
2
3
4
5
PROC SQL NOPRINT;
SELECT NOBS INTO: NUM_OF_ROW
FROM SASHELP.VTABLE
WHERE LIBNAME="SASHELP" AND MEMNAME="CLASS";
QUIT;%PUT &amp;NUM_OF_ROW;

这种方法也是直读metadata,优点是不仅仅可以用来读出纪录数,还可读出大量有用的表信息。打开sashelp.vtable看一看就知道了。

SAS_DREAM:

imoen 写到:
PROC SQL NOPRINT;

1
2
3
4
SELECT NOBS INTO: NUM_OF_ROW
FROM SASHELP.VTABLE
WHERE LIBNAME="SASHELP" AND MEMNAME="CLASS";
QUIT;%PUT &amp;NUM_OF_ROW;

这种方法也是直读metadata,优点是不仅仅可以用来读出纪录数,还可读出大量有用的表信息。打开sashelp.vtable看一看就知道了。

透过vtable这样一个集中元数据表来访问表信息,对于一次性访问多个表的多种信息,是很方便的。

但是就每次零星地考察单表记录数而言,有些地方似乎值得进一步考量:
sashelp.vtable是sashelp库里的一个sql view, 应该是有后台管理进程辅助更新。对于通过vtable访问某个表的记录数,可能有几个担心:
1,在vtable中搜寻一个单表的信息需要进行View的retrieve遍历,还要进行逐条比对(WHERE LIBNAME=”SASHELP” AND MEMNAME=”CLASS”);尚不清楚vtable view的生成代码是什么(感觉像是Proc contents类似的操作生成的),但是该view的建立应该很难高效地利用针对“libname”和”memname”的索引,这样,搜寻访问效率可能随着view retrieve遍历的表记录增加而降低。
当系统中总的表数增大到一个较大的数量级,这在大型数据仓库系统,尤其是系统安装的模块较多,或者存在大量OLAP, EM projects时很有可能发生,vtable中包含的内容就会很多,因为它的内容涵盖系统中所有的表,甚至临时表;此时如果又用不上索引,vtable的访问速度会相当慢,事实上,用VT sashelp.vtable时已会有所感觉。
2,对于服务器端的应用,如果多个连接程序需要同时访问vtable取信息时,可能会进一步面临效率问题
3,当服务器端系统中一方面不断生成和删除大量表,同时有很多远程连接需要访问vtable取信息时,vtable所透视的内容可能面临更新和取用的双重压力

不过对于一般应用情形,都是OK了。 上面的担心只是瞎猜而已… …

IMOEN:

View只在被访问时才生成数据。同时,SAS的串行查询,令到冲突几乎不可能发生。

由于都是访问metadata,即使系统中有数千个表,也是秒级的操作。在大数据量ETL优化时,这类秒级操作基本上是忽略的。程序的可读性和通用性更加重要。

在主流DBMS中,SQL优化器在遇到qkaiwei第二种方法时,都会直接查询meta,而不是遍历原数据。如果数据引擎用的是其他DBMS,最快的方法是用SQL Pass-through。例如:

1
2
3
4
5
6
Proc SQL;
CONNECT TO ODBC AS DW ();SELECT nob INTO :NO_OF_ROW
FROM CONNECTION TO DW
(SELECT count(*) AS nob
FROM a_table);DISCONNECT FROM DW;
QUIT;

如果在SAS里面Count(*),立刻就会把所有的数据,通过你的接口引擎,在SAS中遍历。两种方法的速度有本质的差别。

SAS_DREAM:

imoen 写到:
在主流DBMS中,SQL优化器在遇到qkaiwei第二种方法时,都会直接查询meta,而不是遍历原数据。如果数据引擎用的是其他DBMS,最快的方法是用SQL Pass-through。例如:

1
2
3
4
5
6
Proc SQL;
CONNECT TO ODBC AS DW ();SELECT nob INTO :NO_OF_ROW
FROM CONNECTION TO DW
(SELECT count(*) AS nob
FROM a_table);DISCONNECT FROM DW;
QUIT;


如果在SAS里面Count(*),立刻就会把所有的数据,通过你的接口引擎,在SAS中遍历。两种方法的速度有本质的差别。

说得有道理!

感谢Q版,大清早信手拈来就给大家一个好玩的话题!
今天和vtable玩了个游戏,发现一些现象很有趣:

1,先造10000个表在系统里

1
2
3
4
5
6
7
8
9
10
11
12
13
%let tableNum = 10000;

%macro tables();
data
%do i=1 %to &amp;tableNum;
test&amp;i
%end;
;
set sashelp.class;
run;
%mend tables;

%tables;

2,再去找其中TEST100的观测数
set nobs跑了1秒

1
2
3
4
5
6
7
%let nobs=0;

data _null_;
set TEST100(drop=_all_) nobs=rows;
call symput('nobs',rows);
stop;
run;

%put &nobs;
NOTE: There were 1 observations read from the data set WORK.TEST100.
NOTE: DATA statement used:
real time 1.00 seconds
cpu time 0.00 seconds

FROM SASHELP.VTABLE跑了29秒

1
2
3
4
5
PROC SQL;
SELECT NOBS
FROM SASHELP.VTABLE
WHERE LIBNAME="WORK" AND MEMNAME="TEST100";
QUIT;

NOTE: PROCEDURE SQL used:
real time 29.10 seconds
cpu time 0.17 seconds

如果再运行一遍,由于cache的机理,两个方法都是不到1秒了。

猜想:
vtable被访问时的应该是动态收集系统中所有的表meta,搜集的耗时与表的数目有关,而且不论是访问那个表的信息(where 。。。),都是先去收集系统中所有的表meta;否则,利用WHERE LIBNAME=”WORK” AND MEMNAME=”TEST100″取一条记录,不会在系统表数目增长时产生显著的响应下降(系统表1000个时,就是2,3秒而已)
而set nobs的方法与系统表数目无关。
因此,极端情况下vtable还是有响应效率问题的,不过算不了什么。

结论:
1万个系统表的状况还是比较罕见的,而且vtable的响应也是可以接受(机器快了更是影响比较小),如Imoen所说,使用vtable方法比较通用和易读,也不致在效率上产生实质问题。
只取观测数的话,set nobs方法仍是比较高效的。
如果愿意,编个macro把set nobs和vtable的优势结合,也是很好玩的。

提示:
这个测试情形必须保证消除cache效应,方法可以是在每种方法测试之前,运行一个大进大出的summary,搞到out of memory更好,会把之前测试的cache清得一干二净,各种方法的真实效率才会显现,在其它类似测试中也如是。

QKAIWEI:
我其实一直想努力使sasor向这个方向走:画一个圈,挖一口井。

我一般使用data步的做法,比较省时,而且sas语言上data步是最稳定的,如果用别的,说不定哪里就来个bug。

如果是在scl中,我一般用
dsid=open(talbename);
nobs=attrn(dsid,’nobs’);

接下来,我可能要前移游标,做变量取值操作。但是scl语法不能操作sas server上的表,使用%sysfunc一般也是折衷之计。

在远程sas server如果想得到scl的同样效果,我的做法是

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
rsubmit;
%let rc=%sysfunc(open(tablename));
%let nobs=%sysfunc(attrn(&amp;rc,nobs));
%let rc=%sysfunc(close(&amp;rc));

data _null_;
nobs=&amp;nobs;
do p=1 to nobs
set my table point=p;
call symput('a',a);
call symput('b',b);
call execute('%macro1(&amp;a)');
call execute('%macro2(&amp;b)');
end;
stop;run;
endrsubmit;

proc sql在使用v系列的视图确实相比其他方法比较耗费时间,当然这个差别要考虑机器性能。
我以前的电脑在使用这一方法的时候总要顿一下。
当然imoen的说的也是,就是在读取其他数据库的时候,如oracle上的表时,这是scl 以及data步不能实现的。

在sas server上分组取观测数时,proc sql也很有意思,有的时候也用一下,比较方便。

1
2
3
4
5
6
rsubmit;
proc sql noprint;
select count(1) into:group_obs separated by ',' from tblname group by var;
quit;
&amp;sysrput group_obs=&amp;group_obs /*caution the length of macro variable*/
endrsubmit;

接下来有个%superq的用法,然后循环,用%scan分别取值。
我去做饭,不能多写了,希望多开一些这样的题目,做深入的讨论。
两位周末快乐!

原创文章: ”【论坛沉钩_001】何种方式最有效地得到数据表的观测数?“,转载请注明: 转自SAS资源资讯列表

本文链接地址: http://saslist.net/archives/448


7月 012017
 

As a technical consultant for SAS, I have the privilege of meeting with SAS customers, learning more about how they use our software, and then helping them solve their problems. Recently, a client of mine was having trouble finding a way to implement the use of multiple application servers in SAS Visual Analytics 7.3 based on what business unit a user is a part of. Guessing that some of our SAS Visual Analytics Administrators may have the same question, I thought I’d share how we solved his problem.

The outline below is the order of operations of how a server context is selected. The example provided in the screenshots is how you would control access so that user requests will only be sent through the appropriate business unit server.

Preliminary Requirements

  • The Server is registered with the job execution service.
  • The Server is visible to the requesting user.

Assuming that all servers are registered with the job execution service the following steps are how an application server would be selected:

Step 1: The server associated with the target LASR library will be used. If the server is not visible to the user, proceed to step 2.

Figure 1: In this example the server context associated with the target library is SASApp which the user is denied access to in the server context permissions

Step 2: The suite-level default server defined in the SAS Visual Analytics configuration properties at “va.defaultWorkspaceServer” will be used. If the server is not visible to the user, proceed to step 3.

Figure 2: In this example the configuration property is also set to SASApp which the user is denied access to in the server context permissions

Step 3: Lastly, use any server that is registered with the job execution service and visible to the requesting user.

Figure 3: In this example the BU server context would end up being chosen because it is the only context that the user has permission to access

Some additional Info to Keep in Mind

  • The preferences in the administrator and data builder tabs allow the forced used of a specific server by opting out of automatic selection. These preferences can be separately set and one does not affect the other.
  • In SAS Visual Analytics Explorer, the pooled workspace server is used to populate the available data to import field, and the workspace server is used to perform the actual importing.

Figure 4: The pooled workspace server is denied access so the import data area will not be populated

I hope you found this blog helpful. Please feel free to leave a question or comment below.

Using multiple server contexts in SAS Visual Analytics was published on SAS Users.

7月 012017
 

We live in exciting times. Our relationships with machines, objects and things are quickly changing. Since mankind lived in caves, we have pushed our will into passive tools with our hands and our voices. Our mice and our keyboards do exactly as we tell them to, and devices like the [...]

Artificial intelligence: Separating the reality from the hype was published on SAS Voices by Oliver Schabenberger

6月 302017
 

In this blog post I’d like to explore how to create a custom group in SAS Viya to restrict access to functionality. To illustrate my points, we will create a report developers custom group and ensure that only users of that group can create reports and analysis in SAS Visual Analytics.

What a user or group can do (and see) is controlled by rules. A rule is a composite of authorization elements including:

  • Principal: user or group.
  • Target: a resource for example a service, folder or report.
  • Permissions: type of access for example read or write.
  • Setting: indication of whether access is provided, for example grant or prohibit.

The target of a rule is identified using a uniform resource identifier (uri). The uri can represent a folder, content such as a report or data plan, or functionality and features such as being able to import data. Here are some examples of uri’s in SAS Viya:

  • Data Plan: /dataPreparationPlans/plans/810e2c6b-4733-4d53-94fd-dfeb4df0de9e
  • Folder: /folders/folders/e28e35af-2673-4fc7-81fa-1a074f4c0de9
  • Functionality: /SASVisualAnalytics/**

In our example, we will look at restricting access SAS Visual Analytics for a subset of users. In SAS 9.4 this would have been accomplished using roles and capabilities. In SAS Viya, we will:

  • Create a custom group.
  • Govern that groups access to functionality using rules.

Create a New Custom Group

In SAS Environment Manager, as an administrator (only administrators can manage users and groups) select Users > Custom Groups > New.

In the new custom group screen give the group a name, a unique id and a description. We will call our group Report Developers.

After the new group is created, click the edit button to add new members to the group. You can add users or other groups as members of the new group.

Change the Rules so that Only Report Developers can Access the SAS Visual Analytics Application

Now that we have a new group called Report Developers, the next step is to create or update the rule that determines who can access this functionality. First, we will look at what rules currently apply to SAS Visual Analytics.

In SAS Environment Manager select the Security menu item and select the Rules view.

Select Filter by: ObjectURI and enter SASVisualA in the search box.

The second rule listed is the one we are interested in. Notice that URI ends with /**.  URI’s can end with /* or  / **.  An objectUri that includes the /** suffix affects access to all descendant functionality. For example, the /SASVisualAnalytics/** means all functionality in the SAS Visual Analytics application.

Select /SASVisualAnalytics/** and click the Edit icon.  The attributes show that this rule determines who can use SAS Visual Analytics. Currently you’ll see:

  • Grants Read access
    • to /SASVisualAnalytics/** and all its descendent functionality
      • to all authenticated users.

The rule works because the general authorization system implicitly disallows any access that is not granted. The current rule overrides the implicit deny to allow authenticated users to access SAS Visual Analytics. We will edit the rule and change the principal from Authenticated Users to ReportDevelopers.

In the edit rule screen under Principal, select ReportDevelopers.

The impact of the change is that now only users who are members of the Report Developers group can access the Visual Analytics application to create reports.

To test this, you can logon as a user who is not a member of the group. Those users will be able to navigate to reports and open then using the report viewer, but they will not be able to access SAS Visual Analytics to create new reports.

That is a quick look at using custom groups and rules to dictate what users can do in SAS Viya. There is much more detail on these topics in the SAS® Viya 3.2 Administration Guide:

New rules for authorization in SAS Viya was published on SAS Users.

6月 302017
 

Choosing great colors for a graph is sometimes the most difficult part. And here is yet another thing you need to worry about ... sometimes colors represent different things in different cultures! In this blog post, I improve a graphic to help you get a grasp on those color-to-culture relationships. [...]

The post Colors represent different things, in different cultures appeared first on SAS Learning Post.