January 12, 2017
 

The holiday season is over, and you survived. You’ve made a lot of personal resolutions for 2017: go to the gym, eat less sugar, save more money, visit Grandma more often. These are all great, but what about your analytics resolutions? If you are having trouble with them, let us help you out. The recent release of SAS 9.4 M4 will help you make 2017 your best analytics year yet.

Resolution 1: Build more accurate models faster!

Now you can leverage the power of the two most advanced analytics platforms on the market, SAS 9 and SAS Viya, from one interface. Using SAS/CONNECT, users can call powerful SAS Viya analytics from within a process flow in SAS Enterprise Miner. Would you prefer to use the super-fast, autotuned gradient boosting in SAS Viya? No problem! Call SAS Viya analytics directly from Enterprise Miner using the SAS Viya Code node. Then, from the same process flow, you can also call open source models, all from one interface: SAS Enterprise Miner. Do you prefer to use SAS Studio on SAS 9? You can call SAS Viya analytics from SAS Studio as well. With SAS 9.4 M4, SAS gives you the ability to use both of its powerful platforms from one interface.

Resolution 2: Score your unstructured models in Hadoop without moving your data!

Got Hadoop? Got a lot of unstructured data? Now SAS Contextual Analysis allows you to score models in Hadoop using the SAS Code Accelerator add-on. Identify new insights with your unstructured text without ever having to move your data. Score it all in Hadoop. Uncover new trends and topics buried in documents, emails, social media and other unstructured text that is stored in Hadoop. You will be able to do it faster because you won’t have to move that data outside of Hadoop. SAS just keeps getting better in 2017.

Resolution 3: Make better forecasts using the weather!

Through SAS/ETS, econometricians and others wanting to incorporate weather data into their models can now do so directly through two new interface engines. SASERAIN enables SAS users to retrieve weather data from the World Weather Online website. And SASENOAA provides access to severe weather data from the National Oceanic and Atmospheric Administration (NOAA) Severe Weather Data Inventory (SWDI) web service. So now you’ll know why there was that big sales spike for rock salt and snow shovels in July! Who says there is no climate change in 2017?

Resolution 4: Estimate causal effects more efficiently!

The new CAUSALTRT procedure in SAS/STAT estimates the average causal effect of a binary treatment variable T on a continuous or discrete outcome Y. Depending on the application, the variable T can represent an intervention (such as smoking cessation, which is a great 2017 resolution, versus control), an exposure to a condition (such as attending private versus public schools), or an existing characteristic of subjects (such as high versus low socioeconomic status). The CAUSALTRT procedure estimates two types of causal effects: the average treatment effect and the average treatment effect for the treated. And best of all, the causal inference methods that the CAUSALTRT procedure implements are designed primarily for use with data from nonrandomized trials or observational studies, where you observe T and Y without assigning subjects randomly to the treatment conditions.
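
As a rough illustration of the kind of analysis described above, here is a minimal sketch that estimates an average treatment effect by inverse probability weighting. The data set work.quit_study, its variables, and every simulated value are invented for this example; only the general shape of the PROC CAUSALTRT call is the point.

```sas
/* Simulated observational data (all names and numbers are hypothetical). */
/* Treatment assignment depends on the covariates, so the groups are not  */
/* comparable without adjustment.                                         */
data work.quit_study;
   call streaminit(2017);
   do id = 1 to 500;
      age      = 25 + floor(40 * rand('uniform'));
      exercise = rand('bernoulli', 0.5);
      p_quit   = 1 / (1 + exp(-(-1 + 0.02*(age - 45) + 0.5*exercise)));
      quit     = rand('bernoulli', p_quit);           /* binary treatment T */
      weight_change = 2 + 3*quit - 0.05*(age - 45)
                      + rand('normal', 0, 4);         /* outcome Y */
      output;
   end;
   drop p_quit;
run;

/* Inverse probability weighting with ratio adjustment (METHOD=IPWR). */
/* PSMODEL describes how treatment depends on the covariates;         */
/* MODEL names the outcome.                                           */
proc causaltrt data=work.quit_study method=ipwr;
   class exercise;
   psmodel quit(ref='0') = age exercise;
   model weight_change;
run;
```

By default this requests the average treatment effect. To the best of my knowledge, adding the ATT option to the PROC CAUSALTRT statement requests the average treatment effect for the treated instead; check the SAS/STAT documentation for your release before relying on it.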

Resolution 5: Design better factory floors!

A factory floor can be a complicated place, with raw materials coming in one side and finished products going out the other. Options are virtually unlimited for the placement of materials and equipment, and a poorly designed layout can dramatically reduce production capability. Yet experimenting with different layouts would be extremely costly and time-consuming. Thankfully, SAS Simulation Studio (a component of SAS/OR) provides a rich, animated environment for testing alternatives and coming up with the most appropriate design. And it can handle any kind of discrete-event simulation, integrating with JMP for experimental design and input analysis, and with JMP and SAS for source data and analysis of simulation results. How will your factory floor simulation impact your productivity in 2017?

tags: analytics, SAS 9.4, SAS Viya

Five great analytics resolutions for 2017 was published on SAS Users.

January 6, 2017
 

Regardless of how long they’ve used the software, there’s no better event for SAS professionals than SAS Global Forum. The event will attract thousands of users from across the globe and is an excellent place to network with and learn from users of all skill levels. To help relatively new SAS users experience the conference for the first time, the conference offers the Junior Professional Award program.

The program is designed exclusively for full-time SAS professionals who have used SAS on the job for three years or less, have never attended SAS Global Forum, and whose circumstances would otherwise keep them from attending. But, don’t let the word “junior” confuse you. All “new” SAS professionals regardless of age are eligible.

The Junior Professional Award provides users with a waived conference registration fee, including conference meals, a free pre-conference tutorial, and great opportunities to learn from and network in a large community of SAS users. The program does not cover other costs associated with attending the event (for example, travel and lodging are not included).

To apply, users need to fill out the online application form. Award applications must be received by January 16, 2017. Questions can be directed to the Junior Professional Program Coordinator, whose contact information can be found on the website.

To learn more about the award and its benefits, I recently sat down with one of the 2015 winners, Shavonne Standifer.


Shavonne Standifer, 2015 SAS Global Forum Junior Professional Award winner

Larry LaRusso: Hello Shavonne. First of all, let me congratulate you on winning a past award. That’s a great accomplishment, for sure. So tell me, how did you first learn about the program?
Shavonne Standifer: Interestingly, I wasn’t looking specifically for the award and didn’t even really know it existed. I was searching for a SAS proceeding paper and somehow stumbled across the application. I just applied, and got it!

LL:  That’s awesome. What made you want to attend SAS Global Forum?
SS: I knew a little bit about the event and really wanted to attend so that I could take advantage of the hands-on learning opportunities. I also thought it would be super cool if I could attend the lectures of my favorite SAS authors, and I knew many of them planned to present.

LL: What were your first impressions of the event?
SS: I was amazed by how many people were there. I was also amazed by how nice and helpful everyone was. I met so many new friends.

LL: What was the best part of your Global Forum experience?
SS: The best part of my experience by far was when I met John Amrhein. We met during a networking event in the Quad. After subjecting him to a 2-minute rant about how much I loved SAS software, and all of the reasons why, he finally had a minute to introduce himself and mentioned that he was the SAS Global Forum 2017 conference chair. I was completely shocked! To my complete surprise, he encouraged me to be a part of his team, to which I later applied and was accepted.

LL: What are you doing now? Are you using SAS?
SS: I currently use SAS software to provide data and statistical analysis that support the strategic business objectives of my organization. I am also a member of the conference planning team where I assist with the selection and delivery of Global Forum papers and volunteer coordination. Having the opportunity to be a part of this team has helped to increase my knowledge of SAS technologies and business trends. It’s been an incredible experience.

LL: Have you been able to apply the knowledge you gained from the experience to what you’re doing now?
SS: Most definitely. I’ve used the learning from a tutorial Art Carpenter presented on Innovative SAS Techniques to help me utilize SAS more efficiently for data cleaning, scrubbing, and reshaping big datasets. The knowledge I gained has really helped improve project turnaround and provide more meaningful insights.

LL: Are you planning to attend SAS Global Forum again?
SS: Absolutely! In fact, I have returned every year since winning that award and plan to for many years to come. It’s just a great place to learn from and network with fellow SAS users.

LL: Any other comments you’d like to share about the award?
SS: I would encourage anyone who is eligible to consider applying for the award. I remember sitting in front of my laptop, hopeful, but thinking that I had a 1 in a million chance of being selected for the award. I decided to give it a try and it has changed my life! So much awesomeness has occurred in both my professional and personal life as a direct result of receiving the award. Professionally, the advice and mentorship from expert SAS users has helped me mature my SAS programming talents. Personally, the fellow JPP awardees that I’ve met along the way have provided an extended community of users whom I can call or email to ask advice. We keep in contact and support one another as needed; these relationships are invaluable. If you are eligible, apply! It’s a great opportunity!

LL: Thanks Shavonne. Sounds like it was an awesome experience and I really enjoyed our time together.

tags: Junior Professional Program, SAS Global Forum

Junior Professional Program helps new users attend SAS Global Forum 2017 was published on SAS Users.

January 4, 2017
 

In my last blog, I showed you how to generate a word cloud of PDF collections. Word clouds show you which terms are mentioned in your documents and how frequently they occur. However, word clouds cannot lay out words from a semantic or linguistic perspective. In this blog I’d like to show you how we can overcome this constraint with new methods.

Word embedding has been widely used in Natural Language Processing, where words or phrases from the vocabulary are mapped to vectors of real numbers. There are several open source tools that can be used to build word embedding models. Two of the most popular tools are word2vec and GloVe, and in my experiment I used GloVe. GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.

Suppose you have obtained the term frequencies from documents with SAS Text Miner and downloaded the word embedding model from http://nlp.stanford.edu/projects/glove/. Next you can extract vectors of terms using PROC SQL.

libname outlib 'D:\temp';

* Rank terms according to frequencies;
proc sort data=outlib.abstract_stem_freq;
   by descending freq;
run;

data ranking;
   set outlib.abstract_stem_freq;
   ranking=_n_;
run;

* Read the tab-delimited GloVe vectors, skipping the first line;
data glove;
   infile "D:\temp\glove_100d_tab.txt" dlm="09"x firstobs=2;
   input term :$100. vector1-vector100;
run;

* Keep only the vectors of terms that occur in the documents;
proc sql;
   create table outlib.abstract_stem_vector as
   select glove.*, ranking, freq
   from glove, ranking
   where glove.term = ranking.word;
quit;

Now you have term vectors, and there are two ways to project 100-dimension vector data into two dimensions. One is SAS PROC MDS (multidimensional scaling) and the other is t-SNE (t-Distributed Stochastic Neighbor Embedding). t-SNE is a machine learning algorithm for dimensionality reduction developed by Geoffrey Hinton and Laurens van der Maaten. It is a nonlinear dimensionality reduction technique that is particularly well-suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot.

Let’s try SAS PROC MDS first. Before running PROC MDS, you need to run PROC DISTANCE to calculate the distance or similarity of each pair of words. According to the GloVe website, the Euclidean distance (or cosine similarity) between two word vectors provides an effective method for measuring the linguistic or semantic similarity of the corresponding words. Sometimes, the nearest neighbors according to this metric reveal rare but relevant words that lie outside an average human's vocabulary. I used Euclidean distance in my experiment and showed the top 10 words on the scatter plot, as seen in Figure 1.
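
To make the difference between the two measures concrete, here is a small stand-alone sketch that computes both for a pair of toy vectors. The three-dimensional vectors are invented for illustration and have nothing to do with the GloVe model; real word vectors would have 100 components.

```sas
/* Toy vectors: b points in the same direction as a but is twice as long. */
data _null_;
   array a[3] _temporary_ (1 2 2);
   array b[3] _temporary_ (2 4 4);
   dot = 0; ss_a = 0; ss_b = 0; ss_diff = 0;
   do i = 1 to dim(a);
      dot     = dot     + a[i] * b[i];         /* inner product            */
      ss_a    = ss_a    + a[i]**2;             /* squared length of a      */
      ss_b    = ss_b    + b[i]**2;             /* squared length of b      */
      ss_diff = ss_diff + (a[i] - b[i])**2;    /* squared distance a to b  */
   end;
   euclid = sqrt(ss_diff);
   cosine = dot / (sqrt(ss_a) * sqrt(ss_b));
   put euclid= cosine=;   /* writes euclid=3 cosine=1 to the log */
run;
```

The two vectors are 3 units apart by Euclidean distance, yet their cosine similarity is 1 because they point in the same direction. Cosine similarity ignores vector length, while Euclidean distance does not, so the two measures can rank neighbors differently.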

Figure 1: word scatter plot with SAS

* Compute pairwise Euclidean distances between the term vectors;
proc distance data=outlib.abstract_stem_vector method=EUCLID out=distances;
   var interval(vector1-vector100);
   id term;
run;

* Project the distance matrix into two dimensions (dim1 and dim2);
ods graphics off;
proc mds data=distances level=absolute out=outdim;
   id term;
run;

* Label only the 10 most frequent terms;
data top10;
   set outlib.abstract_stem_vector;
   if ranking le 10 then label=term;
   else label='';
   drop ranking freq;
run;

proc sql;
   create table mds_plot as
   select outdim.*, label
   from outdim
   left join top10
   on outdim.term = top10.term;
quit;

* Plot every term as a plus sign and print the labels in red;
ods graphics on;
proc sgplot data = mds_plot;
   scatter x=dim1 y=dim2
       / datalabel = label
         markerattrs=(symbol=plus size=5)
         datalabelattrs = (family="Arial" size=10pt color=red);
run;

SAS does not have a t-SNE implementation, so I used PROC IML to call the Rtsne package in R. Two R packages can be used for t-SNE plots: tsne and Rtsne. Rtsne is reported to be faster than tsne. Just as I did with the SAS MDS plot, I showed the top 10 words only, but their font sizes vary according to their frequencies in the documents.

Figure 2: word scatter plot with SAS

data top10;
   length label $ 20;
   length color 3;
   length font_size 8;
   set outlib.abstract_stem_vector;
   if ranking le 10 then label=term;
   else label='+';
   if ranking le 10 then color=1;
   else color=2;
   font_size = max(freq/25, 0.5);
   drop ranking freq;
run;

proc iml;
   call ExportDataSetToR("top10", "vectors");

   submit /R;
      library(Rtsne)

      set.seed(1) # for reproducibility
      vector_matirx

If you compare the two figures, you may feel that the MDS plot is more symmetric than the t-SNE plot, but the t-SNE plot seems more reasonable from a linguistic/semantic perspective. On his blog, Colah explores and compares various dimensionality reduction techniques using the well-known computer vision dataset MNIST. I totally agree with his opinion, quoted below.

It’s easy to slip into a mindset of thinking one of these techniques is better than others, but I think they’re all complementary. There’s no way to map high-dimensional data into low dimensions and preserve all the structure. So, an approach must make trade-offs, sacrificing one property to preserve another. PCA tries to preserve linear structure, MDS tries to preserve global geometry, and t-SNE tries to preserve topology (neighborhood structure).

To learn more I would encourage you to read the following articles.

http://colah.github.io/posts/2014-10-Visualizing-MNIST/
http://colah.github.io/posts/2015-01-Visualizing-Representations
https://www.codeproject.com/Tips/788739/Visualization-of-High-Dimensional-Data-using-t-SNE

Word scatter plot with SAS was published on SAS Users.

December 27, 2016
 

We have seen in a previous post of this series how to configure SAS Studio to better manage user preferences in SAS Grid environments. There are additional settings that an administrator can leverage to properly configure a multi-user environment; as you may imagine, these options deserve special consideration when SAS Studio is deployed in SAS Grid environments.

SAS Studio R&D and product management often collect customer feedback and suggestions, especially during events such as SAS Global Forum. We received several requests for SAS Studio to provide administrators with the ability to globally set various options. The goal is to eliminate the need to have all users define them in their user preferences or elsewhere in the application. To support these requests, SAS Studio 3.5 introduced a new configuration option, webdms.globalSettings. This setting specifies the location of a directory containing XML files used to define these global options.

Tip #1

How can I manage this option?

The procedure is the same as we have already seen for the webdms.studioDataParentDirectory property. They are both specified in the config.properties file in the configuration directory for SAS Studio. Refer to the previous blog for additional details, including considerations for environments with clustered mid-tiers.

Tip #2

How do I configure this option?
By default, this option points to the directory path !SASROOT/GlobalStudioSettings. SASROOT translates to the directory where SAS Foundation binaries are installed, such as /opt/sas/sashome/SASFoundation/9.4 on Unix or C:/Program Files/SASHome/SASFoundation/9.4/ on Windows. It is possible to change the webdms.globalSettings property to point to any chosen directory.

The SAS Studio 3.6 documentation provides an additional key detail: in a multi-machine environment, the GlobalStudioSettings directory must be on the machine that hosts the workspace servers used by SAS Studio. In grid environments, this means that the location should be on shared storage accessible by every node.

Tip #3

Configuring Global Folder Shortcuts

In SAS Studio, end users can create folder shortcuts from the Files and Folders section in the navigation pane. An administrator might want to create global shortcuts for all the users, so that each user does not have to create these shortcuts manually. This is achieved by creating a file called shortcuts.xml in the location specified by webdms.globalSettings.

SAS Studio repositories are an easy way to share tasks and snippets between users. An administrator may want to configure one or multiple centralized repositories and make them available to everyone. SAS Studio users could add these repositories through their Preferences window, but it’s easier to create global repositories that are automatically available from the Tasks and Utilities and Snippets sections. Again, this is achieved by creating a file called repositories.xml in the location specified by webdms.globalSettings.

tags: SAS Administrators, SAS Grid Manager, SAS Professional Services, sas studio

More SAS Studio Tips for SAS Grid Manager Administrators: Global Settings was published on SAS Users.

December 22, 2016
 

Editor's note: The following post is from Melissa Marshall, Principal at Melissa Marshall Consulting LLC. Melissa is a featured speaker at SAS Global Forum 2017 and is on a mission to transform how scientists and technical professionals present their work.

Learn more about Melissa.


Think back to the last technical talk you were an audience member for. What did you think about that talk? Was it engaging and interesting? Boring and overwhelming?  Perhaps it was a topic that was important to you, but it was presented in a way that made it difficult to engage with the content. As an expert in scientific presentations, I often observe a significant “disconnect” between the way a speaker crafts a presentation and the needs of the audience. It is my belief that the way to bridge this gap is for you, as a technical presenter, to become an audience centered speaker vs. a speaker centered speaker.

Here I will provide some quick tips on how to transform your content and slides using your new audience centered speaking approach!

Audience Centered vs. Speaker Centered

The default setting for most presenters is that they are speaker centered, meaning that they make choices in their presentation primarily because those choices work for themselves as speakers. Examples include: spending a lot of time on the area of the topic that gave you the most difficulty or that took you the most time; using terms that are familiar to you but are jargon to the audience; putting most of the words you want to say on your slides, so that your slides are basically your speaker notes; and standing behind a podium, physically disconnecting yourself from your audience. These choices are common in presentations, but they do not set you up for success. They are a key reason why many presentations of technical information fail.

A critical insight is to realize that your success as a speaker depends entirely upon your ability to make your audience successful. You don’t get to decide that you gave a great talk (even if no one understood it)! That’s because presentations, by their very nature, are always made for an audience. You need something from your audience; that is why you are giving a talk! So, it is time to get serious about making your audience successful (so you can be too!). I might define “audience success” as: your audience understands and views your subject in the way you wanted them to. Strategically, if you desire to be a successful speaker, then the best thing you can do is go “all in” on making your audience successful!

Audience Centered Content

To make your content more audience centered, you can ask yourself 4 critical questions ahead of time about your audience:

  • Who are they?
  • What do they know?
  • Why are they here?
  • What biases do they have?

The answers to these questions will guide how you begin to focus your content. Additionally, as a presenter of technical information, one of the most important questions you need to answer along the way, at many stages in your presentation, is “So what?”.  Too often presenters share complex technical information or findings, but they do not make the direct connection to the audience of how that information is relevant or important to the big picture or overall message.  Remind yourself each time you share a technical finding to also follow up that information with the answer to the question “So what?”.  This will make your content immediately more engaging and relevant to your audience.

Audience Centered Slide Design

Think about the last several presentations that you sat through as an audience member. How would you describe the slides? Text heavy? Cluttered? No clear message? Full of bulleted lists? Audiences consistently complain of “Death by PowerPoint”, which refers to the endless march of speakers through text filled slide after text filled slide. The reason this is so detrimental to audiences is that our brains have a limited “bandwidth” for verbal information. When we reach that limit, it’s called cognitive overload and our brains stop processing the information as effectively and efficiently. When you have a speaker talking (the speaker’s words are verbal information) and then you have slides to read with lots of words on them (more verbal information), you are at a high risk of cognitive overload for the audience. This is why many audiences “tune out” during presentations or report feeling exhausted after a day of listening to presentations.

A more effective way to approach slides for your audience is to think about making your slides do something for you that your words cannot. You are giving a talk, so the words part is mostly covered by what you are saying…it is much more powerful to make your slides primarily visual so that they convey information in a more memorable, engaging, and understandable way. This is known in the field of cognitive research as the Picture Superiority Effect. John Medina’s excellent book Brain Rules states that “Based on research into the Picture Superiority Effect, when we read text alone, we are likely to remember only 10 percent of the information 3 days later. If that information is presented to us as text combined with a relevant image, we are likely to remember 65 percent of the information 3 days later.”

A great slide design strategy that I advocate for is called the assertion-evidence design. This slide design strategy is based in research (including Medina’s mentioned above) and works beautifully for presentations of technical information. The assertion-evidence slide design is characterized by a concise, complete sentence headline (no longer than 2 lines) that states the main assertion (i.e. what you want the audience to know) of the slide. The body of the slide then consists of visual evidence for that take away message (charts, graphs, images, equations, etc.). Here is an example of a traditional slide transformed to an assertion-evidence slide:

Figure: a traditional slide and its assertion-evidence redesign

Having trouble banishing bullet lists? One of my favorite quick (and free!) tools for getting yourself past bulleted lists is Nancy Duarte’s Diagrammer tool.  I like this tool because it asks you what is the relationship between the information that you are trying to show and creates a graphic to show that relationship.  Remember: the best presentations use a variety of visual evidence!  Charts, graphs, pictures, videos, diagrams, etc.  Give your audience lots of visual ways to connect with your content!

Final Thoughts

Next time you present, I encourage you to let every decision you make along the way be guided first by the needs of your audience.  Remember, the success of your audience in understanding your work is how your success as a speaker is measured! For more tips on technical talks, check out my TED Talk entitled “Talk Nerdy To Me.” For questions, comments, or to book a technical presentations workshop at your company or institution, please contact me at melissa@presentyourscience.com.

About Melissa Marshall

Melissa Marshall is on a mission: to transform how scientists and technical professionals present their work. That’s because she believes that even the best science is destined to remain undiscovered unless it’s presented in a clear and compelling way that sparks innovation and drives adoption.

For almost a decade, she’s traveled around the world to work with Fortune 100 corporations, institutions and universities, teaching the proven strategies she’s mastered through her consulting work and during her 10 years as a faculty member at Penn State University.

When you work with Melissa, you will get the practical skills and the natural confidence you need to immediately shift your “information dump”-style presentations into ones that are meaningful, engaging, and inspire people to take action. And the benefits go far beyond any single presentation; working with Melissa, your entire organization will develop a culture of successful communication, one that will help you launch products and ideas more effectively than ever before.

Melissa is also a dynamic speaker who has lectured at Harvard Medical School, the New York Academy of Sciences, and the Centers for Disease Control and Prevention (CDC). For a sneak peek, check out her TED talk, “Talk Nerdy to Me.” It’s been watched by over 1.5 million people (and counting).

Visit Melissa and learn more at www.PresentYourScience.com.

Melissa can be reached at melissa@presentyourscience.com.

tags: papers & presentations, SAS Global Forum

Transform your technical talks with an audience centered approach was published on SAS Users.

December 21, 2016
 

The report-ready SAS Environment Manager Data Mart has been an invaluable addition to SAS 9.4 for SAS administrators. The data mart tables are created and maintained by the SAS Environment Manager Service Architecture Framework and provide a source of data for out-of-the-box reports as well as custom reports that any SAS administrator can easily create. As you can imagine, the tables in the data mart can grow quite large over time, so balancing the desired time span of reporting against the size of the tables on disk requires some thought. The good news: SAS 9.4 M4 has made that job even easier.

The Environment Manager Data Mart (EVDM) has always provided a configuration setting to determine how many days of resource records to keep in the data mart tables. You can see below that in a fresh SAS 9.4 M4 installation, the default setting for “Number of Days of Resource Records in Data Mart” is set to 60 days. This means that EVDM data records older than 60 days are deleted from tables whenever the data mart ETL process executes.

EV Data Mart Tables in 9.4M4

The space required to house the Environment Manager Data Mart is split across three primary areas.

  • The ACM library tables contain system level information
  • The APM library tables contain audit and performance data culled from SAS logs
  • The KITS library tables contain miscellaneous tables created by data mart kits that collect specialty information about HTTP access, SAS data set access, and such.

Prior to SAS 9.4M4, the ACM and APM libraries duly archived data according to the “Number of Days of Resource Records in Data Mart” setting, but the KITS library did not. For most of the KITS tables this is not such a big deal, but for some deployments the HTTPACCESS table in the KITS library can grow quite large. For administrators who have enabled the VA feed for the Service Architecture Framework, the size of the HTTPACCESS table directly impacts the time it takes to autoload the results of each refresh of the data mart, as well as the amount of memory consumed by the LASR Server used for the Environment Manager Data Mart LASR library.

So what is the big change for SAS 9.4 M4?

The KITS library now respects the “Number of Days of Resource Records in Data Mart” setting and removes data older than the threshold. If you are a SAS administrator, you no longer have to manage the KITS library separately, which should simplify space management.

SAS administrators may need to adjust the “Number of Days of Resource Records in Data Mart” setting to strike a balance between the date range requirements for reporting and the amount of disk space they have available for storing the EVDM tables.  With SAS 9.4 M4, however, administrators can rest assured that all EVDM tables will self-manage according to their wishes.

More on the Service Architecture Framework.

tags: SAS 9.4, SAS Administrators, SAS Environment Manager, SAS Professional Services

Easier Space Management for EV Data Mart Tables in 9.4M4 was published on SAS Users.

December 20, 2016
 

Joining tables with PROC FORMAT

The title of this post borrows from Stanley Kubrick’s 1964 comedy “Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb.” It stars the great Peter Sellers as the titular character as well as George C. Scott and Slim Pickens. The black and white film is strange and brilliant just like Kubrick was. Similarly, as I was experiencing the issue I outline below and was told of this solution, I thought two things. The first was “what a strange but brilliant solution” and the second one was “I’ll try anything as long as it works.”   Thus, a title was born. But enough about that. Why are we here?

Problem

You want to add a couple of columns of information to your already large dataset, but each time you try to join the tables you run out of memory!  For example, you want to append latitude and longitude values from Table B to an existing list of customer phone numbers in Table A.

You’ve tried this and got nowhere fast:

proc sort data = demo.tablea;
by npa nxx;
proc sort data = demo.tableb;
by npa nxx;
run;
 
data demo.aunionb;
merge demo.tablea (in=a) demo.tableb (in=b);
by npa nxx;
if a;
run;

And then you tried this and also got nowhere (albeit a little slower):

proc sql;
   create table demo.aunionb as
   select *
   from demo.tablea a
   left join demo.tableb b on (a.npa = b.npa) and (a.nxx = b.nxx);
quit;

Solution - Joining tables with PROC FORMAT

Use PROC FORMAT!

Here’s how:

First, take Table B and create character equivalents of the fields required in your join (assuming they aren’t characters already). In this example, NPA and NXX are the two fields that you are joining on. They will be your key once you concatenate them.  Next, create character equivalents of the fields that you want appended.

data work.tableb (keep = npa_nxx lat_c long_c);
set demo.tableb; 
 
npa_c = compress(put(npa, best10.));
nxx_c = compress(put(nxx, best10.));
 
npa_nxx = catx('_',npa_c, nxx_c);
 
lat_c = compress(put(latitude, best14.3)); 
long_c = compress(put(longitude, best14.3)); 
run;

Next, make sure that you have only unique values of your key. Use PROC SORT with the NODUPRECS option.
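NODUPRECS only drops observations that are identical in every variable, so a key that appears with two different latitude or longitude values would survive the sort. A quick way to check for that, sketched here under the assumption that work.tableb has the npa_nxx key built in the previous step, is to list any key that occurs more than once:

```sas
/* Sketch: list any join keys that occur more than once in work.tableb. */
/* An empty result means every npa_nxx value is unique.                 */
proc sql;
   select npa_nxx, count(*) as n_rows
      from work.tableb
      group by npa_nxx
      having count(*) > 1;
quit;
```

If this query returns rows, the NODUPKEY option of PROC SORT keeps a single observation per key instead, at the cost of silently discarding the others.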

Now, create a table that will be used as the input into PROC FORMAT. In this example, you are creating a table that will contain the formats for the latitude column.

proc sort data = work.tableb noduprecs;
by npa_nxx;
 
data demo.tableb_lat_fmt(keep=fmtname type start label); 
retain fmtname 'lat_f' type 'C'; 
set work.tableb; 
 
if npa_nxx = '._.' then start = 'Other  ';
else start = npa_nxx; 
label = lat_c; 
run;
proc sort data = demo.tableb_lat_fmt;
by start;
run;

This step creates a table that includes the format name (lat_f), the format type (C), the key field (start) and its corresponding latitude value (label).  Sort this table by the ‘start’ column and then repeat this step for every column you wish to append, with each column getting its own unique format and table.
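For the longitude column, the repeated step might look like the sketch below. It mirrors the latitude code and assumes that work.tableb still carries the long_c variable; the names long_f and demo.tableb_long_fmt are chosen to match the PROC FORMAT and DATA steps that follow.

```sas
/* Sketch: CNTLIN table for the longitude format, mirroring the latitude step. */
data demo.tableb_long_fmt (keep = fmtname type start label);
   retain fmtname 'long_f' type 'C';   /* format name and type (character) */
   set work.tableb;

   if npa_nxx = '._.' then start = 'Other  ';
   else start = npa_nxx;               /* key value to look up             */
   label = long_c;                     /* value returned by the format     */
run;

proc sort data = demo.tableb_long_fmt;
   by start;
run;
```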

Now run PROC FORMAT using the CNTLIN option pointing to the tables that you just created in order to create your format.

proc format cntlin=demo.tableb_lat_fmt; 
run; 
proc format cntlin=demo.tableb_long_fmt; 
run;

Now all you have to do is run your data step to create the resultant dataset with the appended values.

data demo.aunionb (drop = npa_nxx); 
set demo.tablea; 
 
npa_nxx = catx('_',compress(put(npa,best10.)),compress(put(nxx, best10.)));
 
latitude = input(put(npa_nxx, $lat_f.), BEST.); 
longitude = input(put(npa_nxx, $long_f.), BEST.);
 
run;

This step adds two columns to Table A: latitude and longitude. Npa_nxx, the key built from the NPA and NXX values, is used only for the lookup and is dropped from the output. The PUT function returns the formatted value of npa_nxx, which in this case is the character equivalent of the original latitude or longitude, and the INPUT function converts that value back into a numeric field.
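One thing to watch: when a key from Table A has no entry in the format, PUT returns the key itself rather than a coordinate, and INPUT cannot read it as a number, so the result is a missing value plus an invalid-data note in the log. The sketch below is one way to count those misses. The data set name work.aunionb_check is hypothetical, and the ?? modifier is used only to keep the log quiet.

```sas
/* Sketch: same lookup as above, written to a scratch table for checking. */
data work.aunionb_check (drop = npa_nxx);
   set demo.tablea;

   npa_nxx = catx('_', compress(put(npa, best10.)), compress(put(nxx, best10.)));

   /* ?? suppresses the invalid-data notes for keys with no match */
   latitude  = input(put(npa_nxx, $lat_f.),  ?? best14.);
   longitude = input(put(npa_nxx, $long_f.), ?? best14.);
run;

/* N = rows that found a match, NMISS = rows that did not */
proc means data = work.aunionb_check n nmiss;
   var latitude longitude;
run;
```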

The result is a clever way to add columns to a dataset, much like a VLOOKUP function works in Microsoft Excel, without the hassle of running out of memory space.

Notes:

  1. The author realizes there are other, more boring ways of tackling this issue like indexing and using WHERE statements, but where’s the fun in that?
  2. This solution may not be right for you. See your doctor if you experience any of the following symptoms:  memory loss, headache, frustration, Cartesian rage, cranial-keyboard embedment or memory loss.
tags: Global Technology Practice, Joining tables, PROC FORMAT, SAS Programmers, tips & techniques

Dr. Strangeformat or: How I Learned to Stop Joining Tables and Love the PROC was published on SAS Users.

December 17, 2016
 

A multilabel format enables you to assign multiple labels to a value or a range of values. The capability to add multilabel formats was added to the FORMAT procedure in SAS® 8.2.  You assign multiple labels by specifying the MULTILABEL option in the VALUE statement of PROC FORMAT. For example, specifying the MULTILABEL option in the following VALUE statement enables the Agef format to have overlapping ranges.

value agef (multilabel)
11='11'
12='12'
13='13'
11-13='11-13';

Multilabel formats are available for use only in the MEANS, SUMMARY, TABULATE, and REPORT procedures. The code examples that follow show the creation of a simple multilabel format (using PROC FORMAT) and its use in each of these procedures.

First, a PROC FORMAT step creates a multilabel format, Agef, for the Age variable in the Sashelp.Class data set, along with a character format for the Sex variable. The NOTSORTED option is specified to indicate the preferred order of the ranges in the results.

proc format library=work;
value agef (multilabel notsorted)
11='11'
12='12'
13='13'
11-13='11-13'
14='14'
15='15'
16='16'
14-16='14-16';
value $sexf
'F'='Female'
'M'='Male';
run;

Now, the multilabel format is used in the other SAS procedures that are mentioned earlier. In PROC MEANS and PROC TABULATE, the MLF option must be specified in the CLASS statement for the Age variable. In PROC REPORT, the MLF option is specified in the DEFINE statement for Age. The PRELOADFMT and ORDER=DATA options are also specified to preserve the order as defined in the format. The PRELOADFMT option applies only to group and across variables in PROC REPORT.

proc tabulate data=sashelp.class format=8.1;
class age / mlf preloadfmt order=data;
class sex;
var height;
table age, sex*height*mean;
format age agef. sex $sexf.;
title 'PROC TABULATE';
run;
proc means data=sashelp.class nway Mean nonobs maxdec=1 completetypes;
class age / mlf preloadfmt order=data;
class sex;
var height;
format age agef. sex $sexf.;
title 'PROC MEANS';
run;
 
proc report data=sashelp.class NOWD headline completerows;
col age sex height;
define age / group mlf preloadfmt order=data format=agef.;
define sex / group format=$sexf.;
define height / mean format=8.1;
break after age / skip;
title 'PROC REPORT';
run;

The output from each of these procedures is shown below.

[Images: output from PROC TABULATE, PROC MEANS, and PROC REPORT]

You can use a multilabel format to facilitate the calculation of moving averages, as illustrated in the next example. This example creates a multilabel format using the CNTLIN= option in PROC FORMAT. Then, that format is used to calculate a three-month moving average in PROC SUMMARY.

data sample;  /*  Create the sample data set. */
do sasdate='01JAN2015'D to '31DEC2016'D;
x=ranuni(sasdate)*1234;
if day(sasdate)=1 then output;
end;
run;
 
proc print data=sample;
format sasdate date9.;
title 'Sample data set';
run;
 
data crfmt;  /* Create a CNTLIN data set for a multilabel format. */
keep fmtname start end label HLO;
begin='01JAN2015'D;
final='31DEC2016'D;
fmtname='my3month';
periods=intck('month',begin,final) -2;
do i=0 to periods;
end=intnx('month',final,-i,'E');
start=intnx('month',end,-2);
label=catx('-',put(start,date9.),put(end,date9.));
HLO='M';  /* M indicates "multilabel."  */
output;
end;
run;
 
proc print data=crfmt;
var fmtname start end label HLO;
format start end date9.;
title 'CNTLIN data set';
run;
 
/*  Use the CNTLIN= option to create the format.  */
proc format library=work cntlin=crfmt fmtlib;
select my3month;
title 'FMTLIB results for my3month format';
run;
 
proc summary data=sample NWAY order=data;
class sasdate / MLF;   /*  Use the MLF option.  */
var x;
output out=final (drop=_: ) mean= / autoname;
format sasdate my3month.;
run;
 
proc print data=final noobs;
title 'Three-Month Moving Averages';
run;

The example code above generates the following output:

[Image: output showing the three-month moving averages]

For additional information about the PROC FORMAT options that are used in the code examples above, as well as the options that are used in the other procedures, see the Base SAS X.X Procedures Guide for your specific SAS release. The procedures guide is available on the Base SAS Documentation web page (support.sas.com/documentation/onlinedoc/base/index.html).

Creating and Using Multilabel Formats was published on SAS Users.

December 16, 2016
 

Imagine making $50K a day out of thin air. Did you know that NASDAQ routinely processes around 10,000,000 trades a day? What if instead of rounding cents for each transaction, market makers truncated fractions of cents in the amount they owe you? Under the assumption that each transaction, on average, has half a cent that is usually rounded away, this would produce 10,000,000 x $0.005 = $50,000 and nobody would even notice it. I am not saying it's legal, but this example is just an illustration of the power of ordinary truncation.

However, sometimes it is necessary to truncate displayed numeric values to a specified number of decimal places without rounding. For example, if we need to truncate 3.1415926 to 4 decimal places without rounding, the displayed number would be 3.1415 (as compared to the rounded number, 3.1416).

If you think you can truncate numeric values by applying the SAS w.d format, think again.

Try running this SAS code:

data _null_;
   x=3.1415926;
   put x= 6.4;
run;

If you expect to get x=3.1415, you will be disappointed. Surprisingly, you will get x=3.1416, which means that the SAS format does not truncate the number, but rounds it. The same is true for the DOLLARw.d and COMMAw.d formats.
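The same behavior can be seen with the other two formats. This is a minimal sketch; the value 1234.5678 is made up for illustration:

```sas
data _null_;
   x = 1234.5678;
   put x= dollar10.2;   /* displays $1,234.57 (rounded), not $1,234.56 */
   put x= comma10.2;    /* displays 1,234.57 (rounded), not 1,234.56   */
run;
```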

After running into this problem, I thought to instead use a SAS function to truncate numbers. The TRUNC function comes to mind. Indeed, if you look up the SAS TRUNC function, you will find that it does truncate numeric values, but (surprise!) not to a specified number of decimal places; rather it truncates to a specified number of bytes, which is not the same thing for numerics. This may be useful for evaluating the precision of numeric values, but has no direct bearing on our problem of truncating numeric values to a set number of decimal places.
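A short experiment makes the difference visible. In this sketch the second argument of TRUNC is a length in bytes (4 here, chosen only for illustration), not a number of decimal places, so y should come out as a slightly less precise copy of x rather than 3.1415:

```sas
data _null_;
   x = 3.1415926;
   y = trunc(x, 4);       /* keep only the first 4 bytes of the 8-byte value */
   diff = x - y;          /* precision lost by storing fewer bytes           */
   put x= best32. / y= best32. / diff= e12.;
run;
```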

Interestingly, the Excel TRUNC function does exactly what we need. It truncates values to a set number of decimal places (removes decimals) without rounding:

[Image: Truncating numbers in Excel with the TRUNC function]

In general, the technique of number truncation should be limited to reporting purposes when displayed numbers are required to appear truncated. Be careful not to apply truncation before using the numbers in calculations, as you might get some seriously inaccurate results, even worse than when you round numbers before calculations. Unless, of course, your goal is to get inaccurate results, which is quite an honorable goal in fraud detection, simulation and investigation.

I can see two possible solutions to number truncations:

Solution 1: Numeric truncation

Let’s say we need to truncate a number of the form X.XXXXXXX, keeping only X.XXXX (that is, getting rid of all decimal digits after the 4th decimal place).

We can do it in 3 steps:

  1. Multiply our number by 10^4, effectively making the decimals part of a whole number (shifting the decimal point 4 positions to the right).
  2. Apply the INT() function, which truncates the decimal portion, keeping only the whole portion from the previous step.
  3. Divide the result of step 2 by 10^4, effectively restoring the order disturbed by step 1 (shifting the decimal point 4 positions to the left).

Here is SAS code implementing this algorithm:

%let d = 4; /* d must be a whole number: 0, 1, 2... */
 
data _null_;
   x = 3.1415926;
   p = 10**&d;
   y = int(x*p)/p;
   put x= / y=;
run;

If we run this code, the SAS log will show the following (expected and desired) results:

x=3.1415926
y=3.1415

WARNING: While in the SAS code example above the int() function might be substituted with the floor() function, for negative numbers the floor() function would produce incorrect results. For negative numbers, the ceil() function is the correct choice. However, the int() function does exactly what we need for both positive and negative numbers.
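To see the difference, here is a small sketch that runs all three functions on a negative value (the variable names y_int, y_floor and y_ceil are just for illustration):

```sas
%let d = 4; /* d must be a whole number: 0, 1, 2... */

data _null_;
   x = -3.1415926;
   p = 10**&d;
   y_int   = int(x*p)/p;     /* -3.1415: extra digits are simply dropped */
   y_floor = floor(x*p)/p;   /* -3.1416: moves toward minus infinity     */
   y_ceil  = ceil(x*p)/p;    /* -3.1415: matches int() for negative x    */
   put x= / y_int= / y_floor= / y_ceil=;
run;
```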

Solution 2: Character truncation

Since we use truncated numbers for output only, we can solve our truncation problem by converting the numeric value into a character value, and then using character functions to get rid of the extra digits. Let’s solve the same problem: truncating a number of the form X.XXXXXXX, keeping only X.XXXX.

Using this character approach we can also do it in 3 steps:

  1. Convert the numeric value into a character value and assign it to a new character variable.
  2. Determine position of a decimal point in the character string.
  3. Sub-string our initial character string to keep only 4 characters after decimal point.

Here is SAS code implementing this algorithm:

%let d = 4; /* d must be a whole number: 0, 1, 2... */
 
data _null_;
   x = 3.1415926;
   y = put(x,best.);
   y = substr(y,1,index(y,'.')+&d);
   put x= / y=;
run;

If we run this code, the SAS log will show the following results:

x=3.1415926
y=3.1415

As you can see, these results are correct and identical to the results of numeric truncation.

Both numeric and character truncation methods work for positive and negative numbers.
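One caveat with the character method: if the value has no decimal part, the converted string contains no decimal point, index() returns 0, and substr() then keeps only the first few characters of the string instead of the number. The sketch below is a guarded variant, not the original code. It left-aligns the string and truncates only when a decimal point is found; the variable pos is introduced just for this example.

```sas
%let d = 4; /* d must be a whole number: 0, 1, 2... */

data _null_;
   length y $32;
   do x = 3.1415926, -3.1415926, 5;
      y = left(put(x, best.));               /* left-align the digits       */
      pos = index(y, '.');                   /* 0 when there is no decimal  */
      if pos > 0 then y = substr(y, 1, pos + &d);
      put x= y=;
   end;
run;
```

With this guard, the whole number 5 should come back unchanged, while the two decimal values are truncated to 3.1415 and -3.1415 as before.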

User-defined functions

We can also implement the above two methods as user-defined functions, say truncn() and truncc(), using PROC FCMP:

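proc fcmp outlib = sasuser.functions.truncations;
   /* Numeric truncation: returns a numeric value */
   function truncn (x,d);
      p = 10**d;
      y = int(x*p)/p;
      return(y);
   endsub;
 
   /* Character truncation: the trailing $ declares a character return value */
   function truncc (x,d) $ 32;
      length y $ 32;
      y = put(x,best.);
      y = substr(y,1,index(y,'.')+d);
      return(y);
   endsub;
run;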

Then we can use those user-defined functions truncn() and truncc() as illustrated in the following SAS code sample:

options cmplib=sasuser.functions;
 
data A;
   length x n 8 c $9;
   input x;
   n = truncn(x,4);
   c = truncc(x,4);
   datalines;
3.1415926
-3.1415926
run;

This code will produce the following A dataset:

[Image: data set A with variables x, n, and c]

Notice that variables x and n are of numeric type while variable c is of character type.

Questions

  1. Which of these two methods of decimal number truncation do you like more? Why?
  2. Does it make sense to use these methods as user-defined functions? Why?

 

tags: SAS Programmers

Truncating decimal numbers in SAS without rounding was published on SAS Users.