May 19, 2010

1. Data import: input/format/label/keep+array
2. GTL can definitely be reused;

data top12;
  /* The & modifier lets a name contain single blanks, so company and country
     must each be followed by at least two blanks in the data lines. */
  input rank company & $20. country & $15. revenue : comma12. netincome : comma12. employee;
  /* Assumed reconstruction: net income is in $M, so scale it to dollars per employee. */
  personearn = netincome * 1e6 / employee;
  length statname $40;
  format revenue dollar12.2 netincome dollar12.2 personearn dollar12.;
  label personearn="Net income per employee" revenue="Revenue($M)" rank="Global rank" netincome="Net income($M)" employee="Employee number";
  keep rank company personearn statname stat;
  array sname{4} rank revenue netincome employee;
  /* Write one row per statistic for the table under the bars. */
  do i=1 to dim(sname);
    statname = vlabel(sname{i});   /* assumed: the variable label names the table row */
    stat = sname{i};
    if i>1 then personearn=.;      /* keep the bar value on the first row only */
    output;
  end;
datalines;
1  Johnson & Johnson  United States  63,747.00  12,949.00  118700
2  Pfizer  United States  48,296.00  8,104.00  81800
3  GlaxoSmithKline  United Kingdom  44,654.00  8,438.60  99003
4  Roche  Switzerland  44,267.50  8,288.10  80080
5  Sanofi-Aventis  France  42,179.00  5,636.70  98213
6  Novartis  Switzerland  41,459.00  8,195.00  96717
7  AstraZeneca  United Kingdom  31,601.00  6,101.00  65000
8  Abbott Laboratories  United States  29,527.60  4,880.70  68838
9  Merck  United States  23,850.30  7,808.40  55200
10  Wyeth  United States  22,833.90  4,417.80  47426
11  Bristol-Myers Squibb  United States  21,366.00  5,247.00  35000
12  Eli Lilly  United States  20,378.00  -2,071.90  40500
;
run;

proc template;
  define statgraph bartable;
    begingraph / designwidth=600px designheight=400px;
      entrytitle "Top 12 pharms as of July 2009";
      entryfootnote halign=right "Sources: http://en.wikipedia.org/wiki/List_of_pharmaceutical_companies";
      layout lattice / rows=2 rowgutter=0 rowweights=(.75 .25);
        /* Upper cell: bar chart of net income per employee */
        layout overlay / xaxisopts=(display=(tickvalues))
                         yaxisopts=(griddisplay=on linearopts=(tickvalueformat=(extractscale=true)));
          barchart x=company y=personearn / barlabel=true barlabelformat=dollar12. fillattrs=graphdata1
                   skin=satin outlineattrs=(color=black);
        endlayout;
        /* Lower cell: table of statistics aligned with the bars */
        layout overlay / xaxisopts=(type=discrete display=none) walldisplay=(fill);
          blockplot x=company block=stat / class=statName display=(outline values label) valuehalign=right repeatedvalues=true labelattrs=(size=7pt);
        endlayout;
      endlayout;
    endgraph;
  end;
run;

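proc template;
  define Style bartablestyle;
    parent = styles.blockprint;
    /* Font family tokens restored on the assumption that the angle-bracketed
       names were stripped during extraction. */
    style GraphFonts from GraphFonts
      "Fonts used in graph styles" /
      'GraphTitleFont' = ("<sans-serif>, <MTsans-serif>",10pt,bold)
      'GraphLabelFont' = ("<sans-serif>, <MTsans-serif>",8pt)
      'GraphValueFont' = ("<sans-serif>, <MTsans-serif>",7pt)
      'GraphDataFont' = ("<sans-serif>, <MTsans-serif>",7pt);
  end;
run;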

options nodate nonumber;

ods listing close;
ods html file="bartable.html" style=bartablestyle image_dpi=300;

ods graphics /reset imagename='BarTable' imagefmt=gif;
proc sgrender data=top12 template=bartable;
run;

ods html close;
ods listing;

May 19, 2010

Here is how to direct the SAS log, the SAS output, or both to a separate file.

Approach 1: Using Display Manager statements

filename log 'C:\temp\logfile.log';
filename out 'C:\temp\output.lst';

*Select only male students younger than 16;
proc sql;
create table males as
select age, height, weight
from sashelp.class
where sex='M' and age lt 16
order by age;
quit;

*Get the descriptive statistics for the height variable by age;
proc means data=males;
by age;
var height;
output out=htstats mean=mean n=n std=sd median=med min=min max=max;
run;

*Save the Output and Log windows to the files above, replacing existing copies;
DM 'OUT;FILE OUT REP;';
DM 'LOG;FILE LOG REP;';

Information about Display Manager commands:
DEXPORT and DIMPORT: Display Manager commands used to import and export tab-delimited (Excel and .CSV) files
SAS Display Manager Commands

Approach 2: Using the PROC PRINTTO procedure

Refer: How to save the log file or what is PROC PRINTTO procedure

Posted at 12:57 AM
May 19, 2010

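/* INFILE cannot open a web address directly, so read it through a URL fileref.
   Assumes the index is served as plain text; a binary PDF cannot be parsed
   with column input and would first need converting to text. */
filename hcpcs url 'https://www.cms.gov/HCPCSReleaseCodeSets/Downloads/INDEX2009.pdf';

data two;
infile hcpcs truncover;
input @1 code $5.
      @7 description $200.;
/* Drop page headers and lines without a code. */
if code='Page ' then delete;
if code=' ' then delete;
run;

proc sort data=two; by code; run;

/* One row per code, with its description lines in col1, col2, ... */
proc transpose data=two out=four;
by code;
var description;
run;

data five (keep=description code);
length description $1004;
set four;
/* Join the first five description lines, separated by single blanks. */
description=catx(' ', of col1-col5);
run;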

May 17, 2010
Larry LaRusso is the Editor of SAS Training Report. His editorial letter from the May issue was so good, I thought I would share it with you. Enjoy!

If you had to choose the Simon Cowell of the mid-1900s, you'd be hard-pressed to pick someone more qualified than James Rae Denny. From 1946 until 1957, Denny was the primary talent evaluator and general manager for the Grand Ole Opry, turning the popular music house from a "dance barn" into a showcase for future musical superstars. Shortly after his death in 1963, Denny was inducted into the Country Music Hall of Fame for his uncanny ability to "discover legends" and his many contributions to the music industry.

You have to imagine, then, that Denny was pretty confident when he shared this brutally honest assessment with a teenager who had just come off the Opry stage to polite applause on Oct. 2, 1954: "You ain't going nowhere, son," Denny told him. "You ought to go back to driving a truck."

Fortunately for the music world, Denny's advice didn't discourage that performer. Vowing never to set foot on the Opry stage again, a young Elvis Presley did return to his home in Memphis - not to resume his job delivering office supplies, but to launch one of the most successful music careers in modern history.

So, how did Denny miss Elvis's rising star? Most seasoned forecasters would argue it was simply proof of a saying often attributed to baseball-player-turned-"philosopher" Yogi Berra: "It's tough to make predictions, especially about the future."

Tough? Yes. But not impossible. To help organizations improve their forecasting, SAS is hosting its fifth annual business forecasting conference, F2010. This year's gathering, to be held June 7-8 at SAS world headquarters in Cary, NC, will shine a light on getting the right information, and using the best methods, to streamline your forecasts. The conference will feature four keynote addresses and a dozen session talks covering practical applications, methods, tools and successful case studies in business forecasting. Pre-conference (June 6) and post-conference (June 9-11) training extends the opportunities for learning.

If you're doing forecasting of any kind for your organization, you won't want to miss this excellent learning opportunity.

I hope you enjoy this month’s issue of SAS Training Report. And, as the King liked to say, for reading, I…thank you, thank you very much!

Larry LaRusso
Editor, SAS Training Report
May 15, 2010
So here we are trying to put our best feet forward with this group blog and be serious about the topics on the minds of marketers today. As part of that, we thought we should all have pictures for our profile that are somewhat uniform and vaguely professional. We scheduled with one of our photographers and voilà! We got new profile pictures for Deb, Justin and myself while we were together at SAS Global Forum in Seattle. Our other contributors already had suitable photos, so the three of us needed to get cracking.

It’s not clear whether it was the thousands of terrific SAS users and executives in one place at one time, the cool Twitter wall, the great coffee, the views of Puget Sound, the lush greenery, or just the general je-ne-sais-coolness of Seattle. But it brought out lots of positive emotions, including our silly side, which was caught in a few candid shots. So we thought we’d share them for your enjoyment.

We got a good chuckle imagining using these as our profile pictures, or occasionally swapping them out. Along the way, we have gotten some feedback that we should not be afraid to show our fun side. So take a look and let us know what you think – should we be all serious all the time or should we mix it up?

Enterprise Search Summit

May 15, 2010
My colleagues Kathy Lange and Cailyn Clark were both at Enterprise Search Summit earlier this week in New York City to attend the conference and to host our customer, The Tribune Company. The Summit is an intense, in-depth, two-day conference that covers how to develop, implement and enhance cutting-edge internal search capabilities, so Kathy and Cailyn were also there to support SAS' recent news on Text Analytics.

This year, the emphasis of Enterprise Search Summit 2010 is how enterprise search enables information access: search can no longer be viewed as a stand-alone application. That theme is timely, because Kathy and Cailyn sponsored the SAS booth at the conference to highlight SAS Enterprise Content Categorization, which applies natural language processing and advanced linguistic techniques to automatically categorize multilingual content. It parses and analyzes content for entities, facts and events to create metadata, develop taxonomies, and generate category rules and concept definitions. These can then be applied to large volumes of documents to trigger business processes.

We were fortunate to have Keith DeWeese, Taxonomy and Automated Indexing Project Manager at the Chicago Tribune, join us as well. Keith manages the Tribune Company's controlled vocabularies used for automated news categorization and indexing. Prior to working at the Tribune, he was employed by Dow-Jones, the Federal Reserve, Encyclopedia Britannica, and Columbia College, Chicago. At the conference he spoke on ontology and "controlled vocabularies." Keith also shared some headlines, proving you need to "know" your content. For example, "Obama Earmarks Funding for Pet Causes" is not an article about pets! If you are interested in learning more about the conference, you can follow the Tweets from the Summit at #ESS10. You can also review some of the presentations from the Summit at The Noisy Channel.
May 15, 2010
Raise your hand if you have an iPad or iPad envy. All of you with your hands up know what I mean when I say "but hey, we want to develop using the cool new stuff."

We try very hard to use standards for the SAS Customer Support site that will make the site experience the best possible for all of our users. Those standards include, but aren't limited to, the browser functionality that we support. Browser support is getting harder and harder to manage. Microsoft is releasing new browsers at a faster rate than usual. Google is using their vast user-base to increase the use of Chrome. Firefox and Mozilla have always been favorites of the techie crowd. And then we have Safari, a favorite of the Apple loyalists.

Like you and me, our developers and designers have their favorite browsers. We all want to leverage the cool features that are available in the most recent browser release. But it isn't all about the shiny new features; we also keep an eye on the W3C standards and recommendations as well as 508 compliance and accessibility guidelines. These recommendations and guidelines don't always keep up with browser releases, but they do seem to outpace the public adoption of browsers.

How do we balance new features, new standards and browser incompatibilities with user adoption? As we look to update support.sas.com and the features that it offers, we are continually asking ourselves this question.

One thing we do is watch the browser usage numbers from our Web logs. According to our logs, Internet Explorer is still the browser most used to visit support.sas.com. What is surprising is that of the three versions (IE 6, IE 7, and IE 8), IE 6 is still the most used, although IE 7 and IE 8 are becoming more popular. (See the numbers at the end of this post.)

As you know, numbers and data can tell us a lot. They can't tell us everything. What these numbers don't tell us is:

  • Do you control the browser you use or is it controlled by your IT department?
  • If you control the browser, do you keep a browser because you don't think to update it or because you choose to stay with the one you have?
  • Would you suffer through spacing and display issues on support.sas.com to keep your older browser or would you be encouraged to upgrade?
  • Are you annoyed when a Website displays a message indicating that it has detected your use of an older browser and informs you that the page you are viewing would perform better if you upgraded your browser?

Below are a few browser usage numbers and trends if you want to know more about what I see when I watch browser usage.

As we head into our next design phase, I'd love to hear what you think about browser upgrades and support.
May 14, 2010
Today I am happy to present a great guest blog post from my friend, Michele Reister. Michele is a field Marketing Specialist in SAS' Education and Training group.

Last week Michele attended the Event Marketing Summit in Chicago. And by 'attended', I mean attended! She sat through one workshop, one social media masterclass, five keynotes and nine session presentations. According to Michele, "Many were good, some were great." And, although the speakers were from different industries (some B2B, some B2C), there were a few common themes throughout the three days. Here’s her recap of what she learned.

1. Be Authentic.
Almost every session I attended spoke about the need for companies to “be authentic.” As an example of this principle, Paul Kalbfleisch, VP-Brand Marketing, Research in Motion (makers of Blackberry), spoke about Blackberry’s sponsorship of the Black Eyed Peas concert tour. Yes, they were trying to reach that demographic of music fans, but it was more than that. Through their relationship with Will.i.am, they created an authentic experience where Blackberry BBM was incorporated into the concert. This wasn’t the traditional signage and demo stations sponsorship; Blackberry was part of the concert experience. Will.i.am freestyled lyrics made up of BBMs from concert attendees. It felt real. Will.i.am was into it. The concert attendees loved it. It was authentic.

Along this same line of authenticity, it’s important to remember to keep your marketing purposeful. Don’t use technology just for the sake of using technology. For example, the Event Marketing Summit used the BeLinker technology for attendees to send and receive information from other attendees, vendors and speakers. Despite their efforts to incorporate technology into networking, I still saw plenty of people trading old-fashioned, paper business cards. BeLinker sounded okay in theory, but in my opinion, it just added a layer of complexity to the natural networking that happens at conferences.

2. Don’t Bother Trying to Keep Up with the Joneses.
I was happy to hear Erik Qualman, author of Socialnomics say “there’s too much happening in the world of digital marketing. You can’t keep up with it, so focus on what you CAN do and do it well.” We’re all in the same boat – limited staff, smaller budget, never enough time. If maintaining a Twitter account is all you have time for, then just stop there. But, make it a really good Twitter account. Share lots of helpful content. Interact with your customers. Follow the right people and really listen.

3. Stop trying to attribute sales to a single marketing tactic; it’s about the mix.
Measurement and ROI are always hot topics, but especially in social media these days. A lot of management teams are hesitant to dedicate resources to social media when we can’t prove that Facebook fans or Twitter followers equal more sales. The bottom line, though, is that your brand is being talked about whether you like it or not. Decide if you want to be a part of the conversation and what role you will play. Maybe you can’t specifically prove that your LinkedIn group is driving revenue, but you can monitor trends in metrics such as Web traffic, sales and new customers, and compare them to your baseline.

4. Marketing is less about pushing messages at customers and more about co-creating everyday life experiences with them.
Don’t just tell your customers about your product; invite them to experience it. Not just on a tradeshow floor, but at your facility. Not just through your sales people, but with your R&D folks. Alicia Dietsch, VP-Marketing at AT&T, described how they invite customers to attend their quarterly emergency drills. AT&T performs these drills regardless; why not invite some customers to take part in the action?

Another example I liked comes from Cisco, who changed their conference session format to game shows and talk shows to engage their audience after receiving feedback that the sessions were too static (read: boring).

Gone are the days of corporate speak. Business today is about people talking to people. This is difficult for some old-school marketers to embrace, but right now it’s all about merging the personal with the professional. Just try it. I promise, it’s easier and more fun than writing another boring product fact sheet.

5. Fish where the fish are.
Stop trying to send everyone back to your web site. Meet them where they are. If your users are on Facebook, create a fantastic Facebook fan page where customers can download your content, enter your contests and purchase your product without ever having to leave Facebook. Dan Hanover, Editor & Publisher of Event Marketer Magazine pointed out that “marketers are in the conversion business. We need to add more transactional elements to all of our marketing activities.”

6. And, I’ll throw in one extra:
My sixth takeaway from this conference: if you want attendees to visit your booth, give away an iPad!

Part 3: Understanding Topic Discovery from an “historical” perspective

Jim Cox
May 13, 2010
I mentioned last time that the technique we use to determine topics is a variant of something that has been around for fifty years. In this part I will talk about the intriguing history of this technique, and in the process, I hope to illuminate what we are doing and why.

We start in the mid-1800s with a half-cousin of Charles Darwin: Sir Francis Galton. Stimulated by his cousin’s work, Galton investigated how traits are passed down from generation to generation. He was the first person to attempt to apply the scientific method to psychological phenomena. In the process of his investigations, he created the concept of “correlation.”

Galton’s protégé and statistical “heir” was Karl Pearson. Pearson was the first person to establish “mathematical statistics.” Before him, statistics was primarily concerned with creating tabulations and summarizing counts. He established a good many of the basic statistical techniques still in use today, including the correlation coefficient (r), Chi-square tests, the notion of p values, the use of the normal distribution for modeling data, the method of moments, and together with Galton, linear regression.

Galton and Pearson introduced the use of questionnaires for gathering information on human capabilities, particularly mental capabilities. They were both active in the Eugenics movement… they believed that if one could identify the most intelligent people, those people could be encouraged to reproduce and those less intelligent discouraged (their work was unfortunately seized upon as a raison d'être by the founders of the Nazi party in Germany, but I digress…).

Pearson struggled with the notion of how to use multiple items on a questionnaire to get at some general underlying factor of intelligence. He discovered that if you created a matrix of correlations between all the items on a questionnaire containing mental tasks that you could then “project” the results down to a single line, which would represent a significant amount of the total variance among all the answers in the questionnaire: the position that a given questionnaire is projected to on that line represents the intelligence of the person filling out that questionnaire. Pearson further determined that one could “project” any set of items down to any lower dimensionality in a way that “factored” the matrix. One real-life example of subspace “projecting” is how a camera projects the representation of our three-dimensional world into a two-dimensional photograph. Information is lost in that projection, but hopefully the most critical information is retained. He called the projection technique he developed “Principal Components Analysis” which he published in “Philosophical Magazine” in 1901 (this paper is online at http://stat.smmu.edu.cn/history/pearson1901.pdf ).
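To make the “projection down to a single line” concrete, here is a minimal sketch in Python. It is not Pearson's original derivation: it finds the first principal component by power iteration on the covariance matrix, which is one simple way to get the same line. The questionnaire answers, the function name and the iteration count are all hypothetical choices made for illustration.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, scores, explained) for the first principal component."""
    n, p = len(rows), len(rows[0])
    # Center each column (questionnaire item) on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix between the items.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration: multiply by the covariance matrix and renormalize
    # until the vector settles on the direction of greatest variance.
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Each respondent's position on the line is the projection onto v.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    # Share of the total variance that the single line accounts for.
    along = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    explained = along / sum(cov[i][i] for i in range(p))
    return v, scores, explained

# Hypothetical answers: five respondents (rows) by three mental tasks (columns).
answers = [
    [2.0, 3.0, 2.5],
    [4.0, 5.0, 4.5],
    [6.0, 7.0, 6.0],
    [8.0, 8.5, 8.0],
    [10.0, 11.0, 10.5],
]
direction, scores, explained = first_principal_component(answers)
print(direction, scores, explained)
```

Because the three made-up items rise and fall together, a single line captures nearly all of the variance, and each respondent's score is simply where they land along it.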

It did not take long for this concept to catch on. Charles Spearman refined this technique into what he called “factor analysis” in 1904, and the first widely used intelligence test, the Binet-Simon scale, appeared in 1905 and was revised in 1908. Its American adaptation, the Stanford-Binet (1916), has been updated over the years and is still in common use today. Over time, however, the notion of one, unitary concept of intelligence fell out of favor: perhaps there are multiple intelligences. For example, a person might have good mathematical skills but be poor verbally, or vice versa. But how do you get to the notion of what these separate intelligences are?

Let’s go back to the photograph example. Nothing is retained in the photograph about the orientation of the camera when the picture is taken. If the camera is tilted in some direction, then the photograph will not necessarily represent the same view that a person standing up would have of the same scene. But if I or somebody else views that photograph, since I have a general up-down left-right orientation in the world, I am usually able to rotate the photograph to restore that familiar orientation. We can say that these “up-down” and “left-right” dimensions are “latent” or implied dimensions or factors in the photograph.

Similarly, when trying to identify multiple intelligences from a principal components projection of say, a set of responses to an IQ test, we are once again trying to extract the “latent” dimensions ---- those representing the different kinds of intelligence, which may not be known directly. The natural solution is to rotate the projection, just as we would rotate the photograph. Over the years, a variety of these “rotations” were developed to try to understand what these separate intelligences may be. If we assume that each separate question on our test primarily measures only one of these “types” of intelligence, then it turns out that the optimal rotation is known as the “Varimax” rotation and was first proposed by Henry Kaiser in 1958. In the case of the IQ test, the Varimax rotation would rotate the axes to line up as closely as possible with the individual test questions.
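The rotation idea can be sketched for the simplest case of two factors. The code below is an illustration only, not Kaiser's iterative algorithm: it brute-forces the rotation angle that maximizes the raw varimax criterion (the variance of the squared loadings within each factor). The loadings, the 30-degree tilt and the grid size are hypothetical.

```python
import math

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings (raw varimax)."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        squares = [row[j] ** 2 for row in loadings]
        mean = sum(squares) / p
        total += sum((s - mean) ** 2 for s in squares) / p
    return total

def rotate(loadings, angle):
    """Apply an orthogonal rotation by `angle` radians to two-factor loadings."""
    c, s = math.cos(angle), math.sin(angle)
    return [[x * c + y * s, -x * s + y * c] for x, y in loadings]

def varimax_2d(loadings, steps=9000):
    """Grid-search the angle in [0, 90 degrees) that maximizes the criterion."""
    best_angle = max(
        (k * (math.pi / 2) / steps for k in range(steps)),
        key=lambda a: varimax_criterion(rotate(loadings, a)),
    )
    return best_angle, rotate(loadings, best_angle)

# Hypothetical loadings: six questions, each measuring one of two abilities,
# seen through axes tilted by 30 degrees (the tilted camera).
simple = [[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
          [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]]
tilted = rotate(simple, -math.radians(30))
angle, recovered = varimax_2d(tilted)
print(math.degrees(angle))
for row in recovered:
    print([round(x, 3) for x in row])
```

After the rotation, every question loads on one axis and is essentially zero on the other, which is the sense in which the axes line up with the individual questions. With more than two factors, implementations typically rotate pairs of axes repeatedly until the criterion stops improving.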

Our technique for creating topics from text is very similar to what is described above for discovering types of intelligence using an IQ test. We project the raw matrix that includes all the terms in the collection down to a much smaller dimensionality using a Singular Value Decomposition, which is equivalent to doing a Principal Components Analysis of the covariances of the individual terms, and then do a Varimax rotation to determine the “latent” topics represented in the documents. The topics are then represented by the axes of our rotated space, since they are lined up as closely as possible with specific terms in the document collection. Presto!

Stay tuned for part four, where I discuss how we tested this approach against all these modern, resource-hogging techniques that also attempt to determine “latent” topics, and why I think this antiquated technique put them to shame.