April 3, 2017
 

One of the advantages of the new mixed-type tables in SAS/IML 14.2 is the greatly enhanced printing functionality. You can control which rows and columns are printed, specify formats for individual columns, and even use templates to completely customize how tables are printed. The following example shows some basic options. A DATA step creates a data set, the TableCreateFromDataSet function reads the data set into a table, and the TablePrint subroutine prints a customized portion of the table:

data class;
set sashelp.class;
Birthday = '03APR2017'd - age*365 - floor(365*uniform(1)); /* create birthday */
format Birthday DATE9.;
run;
 
proc iml;
tClass = TableCreateFromDataSet("Class");    /* read data into table */
run TablePrint(tClass) firstobs=3 numobs=5 
                       var={"Age" "Birthday"} 
                       ID="Name"
                       label="Subset of Class";
Basic printing of SAS/IML Tables

Notice that the table contains both numeric and character columns. Furthermore, the numeric columns have different formats. The TablePrint subroutine has some distinct advantages over the traditional PRINT statement in SAS/IML:

  • The TablePrint subroutine supports an easy way to display a range of observations. When you use the PRINT statement for multiple vectors, you have to use row subscripts in each vector, such as PRINT (X[3:8,]) (Y[3:8,]);
  • The TablePrint subroutine supports printing any columns in any order. When you use the PRINT statement on a matrix, you have to use column subscripts to change the order of the matrix columns: PRINT (X[, {2 3 1}]);
  • The PRINT statement supports the ROWNAME= option (for specifying row headers), the COLNAME= option (for specifying column headers), and the LABEL= option. Those options are easy to work with when you print a single matrix. However, you can't store mixed-type data in a matrix and those options are less convenient when you print a set of vectors.

Advanced printing of SAS/IML tables

Traffic lighting: Color cells in SAS/IML tables by cell contents

The SAS/IML documentation has several sections devoted to advanced printing of tables.

For statistical programmers, the ability to use ODS templates means that output from PROC IML can look the same as output from any other SAS procedure. For example, suppose that you have a table that contains parameter estimates for a linear regression. The following example prints that table by using the same ODS template that PROC REG uses, which is the Stat.Reg.ParameterEstimates template:

proc iml;
vars = {"Intercept", X, Z};
stats = {32.19   5.08   21.42   42.97,
          0.138  0.0348  0.0644  0.2117, 
          1.227  0.5302  0.1027  2.3506 }; 
tbl = TableCreate("Variable", vars);
call TableAddVar(tbl, {"Estimate" "StdErr" "LowerCL" "UpperCL"},  stats);
 
Confidence=95;
call TablePrint(tbl) template="Stat.Reg.ParameterEstimates"
                     dynamic={Confidence};
Print SAS/IML tables by using existing ODS templates

This example works because the column names in the SAS/IML table match the names that are expected by the Stat.REG.ParameterEstimates template. The DYNAMIC= option specifies a dynamic variable (Confidence) that the template requires. See the documentation for further details.

Summary

In summary, the TablePrint subroutine in SAS/IML gives programmers control over many options for printing tables of data and statistics. For complex layouts, you can use an existing ODS template or create your own template to customize virtually every aspect of your tabular displays.

The post Print tables in SAS/IML appeared first on The DO Loop.

April 2, 2017
 

python ngram


# -*- coding: utf-8 -*-
# @DATE    : 2017/4/1 10:39
# @Author  : 
# @File    : ngram.py
from collections import defaultdict


def gen_n_gram(input, sep=" ", n=2):
    input = input.split(sep)
    output = {}
    for i in range(len(input) - n + 1):  # range works in Python 2 and 3
        gram = "".join(input[i: i + n])
        output.setdefault(gram, 0)
        output[gram] += 1
    return output


def dict_sum(*dict):
    ret = defaultdict(int)
    for d in dict:
        for k, v in d.items():
            ret[k] += v
    return ret


def sum_n_gram(inputs, sep=" ", n=2):
    output_sum = defaultdict(int)
    for input in inputs:
        # pass sep and n through so the function arguments are not ignored
        output_sum = dict_sum(output_sum, gen_n_gram(input, sep=sep, n=n))
    output_sum = sorted(output_sum.items(), key=lambda x: x[1], reverse=True)
    return output_sum


if __name__ == "__main__":
    inputs = ["a a a j 9 3 h d e", "a j 9 3 h", "g g h 9 3"]
    print(gen_n_gram("a a a j 9 3 h d e"))
    output = sum_n_gram(inputs)
    print(output)
    output_file = "dict.txt"
    cnt = len(output)
    with open(output_file, "w") as out:
        for i, value in enumerate(output):
            if i + 1 < cnt:
                out.write("{}:{}\n".format(value[0], value[1]))
            else:
                out.write("{}:{}".format(value[0], value[1]))

Run log


{'aa': 2, 'de': 1, 'j9': 1, 'aj': 1, '3h': 1, '93': 1, 'hd': 1}
[('93', 3), ('aa', 2), ('aj', 2), ('j9', 2), ('3h', 2), ('de', 1), ('gg', 1), ('h9', 1), ('hd', 1), ('gh', 1)]

Process finished with exit code 0
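
The same counting can also be sketched with `collections.Counter` from the standard library, which handles both the per-string tally and the summing across strings. This is an alternative sketch, not the script above; the names `count_ngrams` and `count_ngrams_all` are made up for illustration. Tokens are joined without a separator so the keys match the ones in the run log.

```python
from collections import Counter


def count_ngrams(text, sep=" ", n=2):
    """Count the n-grams in one string of sep-delimited tokens."""
    tokens = text.split(sep)
    # Join each window of n consecutive tokens without a separator,
    # so "a a" becomes the key "aa" as in the run log above.
    return Counter("".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def count_ngrams_all(texts, sep=" ", n=2):
    """Sum the n-gram counts over several strings, most frequent first."""
    total = Counter()
    for text in texts:
        total.update(count_ngrams(text, sep=sep, n=n))
    # most_common() returns (ngram, count) pairs sorted by descending count
    return total.most_common()


if __name__ == "__main__":
    samples = ["a a a j 9 3 h d e", "a j 9 3 h", "g g h 9 3"]
    print(count_ngrams(samples[0]))
    print(count_ngrams_all(samples))
```

The order of n-grams that share the same count may differ from the run log, because ties are broken by the order in which keys were first seen.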

 
 Posted at 12:32 PM
April 1, 2017
 

Security analytics has gotten a lot of attention in the industry the last few years. That’s not surprising. After all, security analytics can help organizations: Transition from reactive threat firefighting to proactive security risk management. Exploit all available security data to develop better insights and priorities. Maximize the effectiveness of [...]

Security analytics skeptics have nothing to fear was published on SAS Voices by Liz Goldberg

April 1, 2017
 

Solar farm on SAS campus

The full text of Fermat's statement, written in Latin, reads "Cubum autem in duos cubos, aut quadrato-quadratum in duos quadrato-quadratos, et generaliter nullam in infinitum ultra quadratum potestatem in duos eiusdem nominis fas est dividere cuius rei demonstrationem mirabilem sane detexi. Hanc marginis exiguitas non caperet."

The English translation is: "It is impossible for a cube to be the sum of two cubes, a fourth power to be the sum of two fourth powers, or in general for any number that is a power greater than the second to be the sum of two like powers. I have discovered a truly marvelous demonstration of this proposition that this margin is too narrow to contain."

Here at SAS, we don’t take challenges lightly. After a short but intensive brainstorming, we came up with a creative and powerful SAS code that effectively proves this long-standing theorem. And it is so simple and short that not only can it be written on the margins of this blog, it can be tweeted!

Drum roll, please!

Here is the SAS code:

data _null_; 
	do n=3 by 1; 
		do a=1 by 1; 
			do b=1 by 1; 
				do c=1 by 1; 
					e = a**n + b**n = c**n;
					if e then stop; 
				end;
			end; 
		end;
	end;
run;

Or written compactly, without unnecessary spaces:

data _null_;do n=3 by 1;do a=1 by 1;do b=1 by 1;do c=1 by 1;e=a**n+b**n=c**n;if e then stop;end;end;end;end;run;

which is exactly 112 characters long, well below the Twitter 140-character threshold.
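
For anyone who would rather not count by hand, here is a small Python check of that length. It assumes the one-line program is copied exactly as printed above, with no trailing newline; the variable name `compact` is only for illustration.

```python
# The one-line SAS program quoted above, copied verbatim as a Python string.
compact = (
    "data _null_;do n=3 by 1;do a=1 by 1;do b=1 by 1;do c=1 by 1;"
    "e=a**n+b**n=c**n;if e then stop;end;end;end;end;run;"
)

print(len(compact))          # 112
print(len(compact) <= 140)   # True
```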

Don’t be fooled by the utter simplicity and seeming unfeasibility of this code.  For the naysayers, let me clarify that we run this code in a distributed multithreaded environment where each do-loop runs as a separate thread.

We also use some creative coding techniques:

1.     A do-loop with a start value and a by= increment, but without the to= option (e.g. do c=1 by 1;). This is valid syntax in SAS and serves the purpose of creating infinite loops when they are necessary (as in this case). You can easily test it by running the following SAS code snippet:

data _null_;
	start = datetime();
	do i=1 by 1;
		if intck('sec',start,datetime()) ge 20 then leave;
	end;
run;

The if-statement here is added solely for the purpose of specifying a wait time (e.g. 20 seconds) sufficient to persuade you of the loop’s infiniteness. Skeptics may increase this number to their comfort level or even remove (or comment out) the if-statement and enjoy the unconstrained eternity.
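
For readers without a SAS session at hand, the same idea can be sketched in Python. This is only a rough analogue: `itertools.count` plays the role of the do-loop with no upper bound, the early `return` plays the role of LEAVE, and the function name `spin` and the half-second wait are choices made for this illustration.

```python
import itertools
import time


def spin(seconds):
    """Count upward with no upper bound; leave once `seconds` have elapsed."""
    start = time.monotonic()
    for i in itertools.count(1):          # like do i=1 by 1: no stop value
        if time.monotonic() - start >= seconds:
            return i                      # like LEAVE: the only way out


if __name__ == "__main__":
    # Number of iterations completed before the time limit; varies by machine.
    print(spin(0.5))
```

Remove the time check and the loop never returns, just like its SAS counterpart.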

2.     An expression with two “=” signs in it (e.g. e = a**n + b**n = c**n;). Again, this is a perfectly valid expression in SAS: the first “=” is the assignment, and the second “=” is a logical comparison whose result (0 or 1) is assigned to the variable. This expression can be rewritten as

e = a**n + b**n eq c**n;

or even more explicitly as

e = (a**n + b**n eq c**n);

As long as the code runs, the theorem is considered proven. If it stops, then the theorem is false.
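
If you would like to see an answer in finite time, here is a bounded sketch of the same search in Python. It is not the SAS program above: every loop has an upper limit, the test for c is a set lookup rather than a fourth loop, and Python integers are exact, so no floating-point rounding is involved. The function name `find_counterexample` and its default bounds are assumptions made for this illustration.

```python
def find_counterexample(min_n=3, max_n=6, max_base=30):
    """Search a bounded box for a**n + b**n == c**n.

    Returns (n, a, b, c) for the first hit, or None if the box holds none.
    """
    for n in range(min_n, max_n + 1):
        # Precompute n-th powers so the test for c is a dictionary lookup.
        # c < 2*max_base is enough because a**n + b**n <= 2 * max_base**n.
        powers = {c ** n: c for c in range(1, 2 * max_base + 1)}
        for a in range(1, max_base + 1):
            for b in range(a, max_base + 1):      # b >= a: the sum is symmetric
                e = (a ** n + b ** n) in powers   # True/False, like 1/0 in SAS
                if e:
                    return n, a, b, powers[a ** n + b ** n]
    return None


if __name__ == "__main__":
    print(find_counterexample())                  # None
    print(find_counterexample(min_n=2, max_n=2))  # (2, 3, 4, 5)
```

The second call uses n=2 only to show that the search does report a hit when one exists (3² + 4² = 5²). A result of None for n ≥ 3 says nothing beyond the box that was searched.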

You can try running this code on your hardware, at your own risk, of course.

We have a dedicated 128-processor UNIX server powered by an on-campus solar farm that has been autonomously running the above code for 40 years now, and there has not been a single instance when it stopped running, except for pauses for scheduled maintenance and equipment replacements.

During the course of this historic experience, we have accumulated an unprecedented amount of big data (all in-memory), converted it into event stream processing, and become a leader in data mining and business analytics.

This leads us to the following scientific conclusion: whether you are a pure mathematician or an empiricist, you can rest assured that Fermat's Last Theorem has been proven with a probability asymptotic to 1 beyond a reasonable doubt.

Have a happy 91-st day of the year 2017!

 

SAS code to prove Fermat's Last Theorem was published on SAS Users.

April 1, 2017
 

There's an old song that starts out, "You Can Get Anything You Want at Alice's Restaurant."  Well, maybe you are too young to know that song, but if you’re a SAS user, you’ll be glad to know that you can capture anything produced by any SAS procedure (even if the [...]

The post Capturing output from any procedure with an ODS OUTPUT statement appeared first on SAS Learning Post.

April 1, 2017
 

March Madness is in full swing. And the success of the Dance Card formula powered by SAS -- along with stories about teams like the New York Mets, the Boston Bruins, the Orlando Magic and more, all using analytics -- demonstrates how sports and analytics are becoming more and more [...]

How to make sense of the Madness in March was published on SAS Voices by Hwa Truong

March 31, 2017
 

If you're into data visualization, here's something that might interest you - a free eBook showing several ways to use SAS to visually analyze your data. (Did I mention it's FREE?!?!) We've picked juicy chapters from several books and upcoming books (and a few other sources), to show you what [...]

The post How about a free eBook on data visualization using SAS! appeared first on SAS Learning Post.

March 31, 2017
 

Until the robotic overlords take over, you need people — not just technology and data — to drive growth and innovation in your analytics programs. But how can you plan for the talent you need today and the talent you'll need in the future as your goals and your use [...]

5 mistakes to avoid when devising (or revising) a talent management strategy was published on SAS Voices by Analise Polsky

March 31, 2017
 

The U.S. Marshals Service is the federal agency known for bringing wanted fugitives to justice. Often, the Marshals Service gets attention for these arrests, but once the publicity has died down they face a basic challenge --- where to put the individuals in their custody. The agency uses data to [...]

U.S. Marshals Service use analytics to save more than $200 million was published on SAS Voices by Steve Bennett