When developing SAS® data sets, program code and/or applications, efficiency is not always given the attention it deserves, particularly in the early phases of development. Since data sizes and system performance can affect a program and/or an application’s behavior, SAS users may want to access information about a data set’s [...]
Ok, so you know how to create multiple sheets in Excel, but can anyone tell me how to control the name of the sheets when they are all created at once? In the ODS destination for Excel, the suboption SHEET_INTERVAL is set to TABLE by default. So what does that [...]
The post How to control the name of Excel sheets when they are all created at once appeared first on SAS Learning Post.
SAS® users have an easy and convenient way to quickly obtain useful information (referred to as metadata) about their SAS session with a number of read-only SAS DICTIONARY tables or SASHELP views. At any time during a SAS session, information about currently defined system options, libnames, tables, columns and their [...]
The post Exploring the content of the DICTIONARIES table and VSVIEW SASHELP view appeared first on SAS Learning Post.
How many of you have been given a SAS data set with variables such as Age, Height, and Weight and some or all of them were stored as character values instead of numeric? Probably EVERYONE! Yes, we all know how to do the old "swap and drop" (rename and convert), but […]
This SAS Jedi is very excited about the SAS 9.4 M4 release, which brought many wonderful gifts just in time for Christmas. So in the interest of extending the Christmas spirit, I'm going to blog about some of my favorites! I've long loved the SAS DO statement variant which allows […]
The post SAS Jedi Christmas - SAS 9.4 M4 DS2 Do Loop Upgrade appeared first on SAS Learning Post.
JMP supports many date/time formats, but some less conventional (or downright esoteric) formats still crop up from time to time. To many users, converting an oddly formatted date/time from string to numeric form is a frustrating endeavor, requiring custom formulas and an assortment of seldom-used string and numeric operations. With the Custom Date Formula Writer, you can simply point-and-click your date/time troubles away, generating the necessary formula without writing code.
To begin, install the Data Table Tools Add-in and navigate to the formula writer:
Now the new date column is just four steps away:
- Choose the table and column containing the character date/time data.
- Point and click to delimit the "words" in the data.
- Specify the meaning of each word, and various options, using drop-down menus and radio buttons.
- Press the "Build formula column" button.
Here's what the process looks like:
Step 1: Choose the table and date/time column.
Step 2: Point and click the text to delimit the data, then press the "Apply delimiting and choose words" button.
Step 3: Complete the dialog using the radio buttons and drop-down menus to select options and word roles.
Step 4: Click the "Build Formula Column" button to write the new formula column to the data table.
I'll be blogging on more table tools in the future, so stay tuned!
Note: This blog post is first in a series exploring the various features of the Data Table Tools add-in.
The post Data table tools part 1: Custom Date Formula Writer appeared first on JMP Blog.
My river walk last week turned into a spectacular fall show. But if it rains this week in San Antonio, like the weatherman predicts, what will I do? In the coming days, I’ll be presenting at two user groups, one in eastern Canada in Halifax, and the other all the […]
The post The difference between the Subsetting IF and the IF—THEN—ELSE—IF statement appeared first on SAS Learning Post.
As data analysts, we all try to do the right thing. When there is a choice of statistical distributions to be used for a given application, it’s a natural inclination to try to find the “best” one.
Fishing for the best distribution can lead you into a trap. Just because one option appears to be best doesn’t mean that it’s correct! For example, consider this data set:
What is the best distribution we can use to describe this data? JMP can help us answer this question. From the Distribution platform, we can choose to fit a number of common distributions to the data: Normal, Weibull, Gamma, Exponential, and others. To fit all possible continuous distributions to this data in JMP, go to the red triangle hotspot for this variable in the Distribution report, and choose “Continuous Fit > All”. Here is the result:
JMP has compared 11 potential distributions for this data, and ranked them from best (Gamma) to worst (Exponential). The metric used to perform the ranking is the corrected Akaike Information Criterion (AICc). Lower values of AICc indicate better fit, and so the Gamma distribution is the winner here.
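For reference, the usual textbook definition of AICc is shown below. The symbols are assumptions for illustration rather than JMP's own notation: \hat{L} is the maximized likelihood, k is the number of estimated parameters, and n is the sample size. JMP's documentation is the authority on the exact form it reports.

```latex
\mathrm{AICc} = -2\ln\hat{L} + 2k + \frac{2k(k+1)}{n-k-1}
```

The last term is the small-sample correction. It shrinks toward zero as n grows, so AICc approaches the ordinary AIC for large samples.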
Here’s the catch
This data set was generated by drawing a random sample of size 50 from a population that is normally distributed with a mean of 50 and a standard deviation of 10. The Normal distribution is the correct answer by definition, but our fishing expedition gave us a misleading result.
How often is there a mismatch like this? One way we can approach this question is through simulation. I wrote a small JMP script to draw samples of various sizes from a normally distributed population. I investigated sample sizes of 5, 10, 20, 30, 50, 75, 100, 250, and 500 observations; for each of these, I drew 1,000 independent samples and had JMP compute the fit for all possible continuous distributions. Finally, for each sample I recorded the name of the best-fitting distribution, as measured by AICc. (The JSL script is available in the JMP File Exchange.)
The results were quite surprising!
- Remember, the correct answer in each case is “Normal”. If our fishing expedition were yielding good results across the board, the line for the Normal distribution would be high and flat, hovering near 100%.
- Instead, the wrong distribution was chosen with disturbing frequency. For sample sizes under 50, the Normal distribution was not even the most commonly chosen. That honor belongs to the Weibull distribution.
- For a sample size of 5 observations from a Normal distribution, the correct identification was not made a single time out of 1,000 samples.
- If you want to have at least a 50% chance of correctly identifying normally distributed data by this method, you’ll need more than 100 observations!
- Even at a sample size of 500 observations, the likelihood of the Normal distribution being correctly called the best is only about 80%.
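The simulation procedure described above can be approximated outside JMP. The following is a minimal Python sketch, not the author's JSL script. It assumes only three candidate distributions (Normal, Lognormal, Exponential), chosen because their maximum-likelihood fits have closed forms, so its win rates will not match the figures above, which come from JMP's 11 candidates. All function names here are hypothetical.

```python
import math
import random
from collections import Counter


def aicc(log_lik, k, n):
    """Corrected AIC: -2 lnL + 2k + 2k(k+1)/(n-k-1). Requires n > k + 1."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def normal_loglik(xs):
    """Maximized log-likelihood of a Normal fit (variance MLE divides by n)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def lognormal_loglik(xs):
    """Maximized log-likelihood of a Lognormal fit (needs all x > 0)."""
    logs = [math.log(x) for x in xs]
    # Lognormal density is the Normal density of log(x), divided by x.
    return normal_loglik(logs) - sum(logs)


def exponential_loglik(xs):
    """Maximized log-likelihood of an Exponential fit (needs all x > 0)."""
    n = len(xs)
    mean = sum(xs) / n
    return -n * math.log(mean) - n


def best_fit(xs):
    """Name of the candidate distribution with the smallest AICc."""
    n = len(xs)
    scores = {"Normal": aicc(normal_loglik(xs), 2, n)}
    if min(xs) > 0:  # positive-support candidates only apply to positive data
        scores["Lognormal"] = aicc(lognormal_loglik(xs), 2, n)
        scores["Exponential"] = aicc(exponential_loglik(xs), 1, n)
    return min(scores, key=scores.get)


def simulate(sample_sizes, reps, mean=50.0, sd=10.0, seed=0):
    """For each sample size, the share of samples won by each distribution."""
    rng = random.Random(seed)
    results = {}
    for n in sample_sizes:
        wins = Counter(
            best_fit([rng.gauss(mean, sd) for _ in range(n)])
            for _ in range(reps)
        )
        results[n] = {name: count / reps for name, count in wins.items()}
    return results


if __name__ == "__main__":
    sizes = [5, 10, 20, 30, 50, 75, 100, 250, 500]
    for n, shares in simulate(sizes, reps=1000).items():
        print(n, shares)
```

Each printed row gives, for one sample size, the fraction of the 1,000 samples in which each candidate had the smallest AICc. Adding Weibull or Gamma would require a numerical optimizer for the likelihood, which is left out here to keep the sketch dependency-free.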
The moral of the story
When comparing the fit of different distributions to a data set, don’t assume that the distribution with the smallest AICc is the correct one. What counts is the relative magnitude of the AICc statistics. A rule of thumb (used elsewhere in JMP) is that models whose AICc values are within 10 units of the “best” one are roughly equivalent.* In our first example above, the Gamma distribution is nominally the best, but its AICc is only 0.2 units lower than that of the Normal distribution. That is not good statistical evidence for choosing the Gamma over the Normal.
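The rule of thumb can be applied mechanically. This short Python sketch lists every candidate whose AICc falls within a chosen threshold of the smallest value. The function name and the AICc numbers are hypothetical; the numbers are picked only so that Gamma beats Normal by 0.2 units, as in the example above.

```python
def near_best(aicc_by_model, threshold=10.0):
    """Models whose AICc is within `threshold` units of the smallest, best first."""
    best = min(aicc_by_model.values())
    return sorted(
        (name for name, value in aicc_by_model.items() if value - best <= threshold),
        key=aicc_by_model.get,
    )


# Hypothetical AICc values, not taken from the JMP report.
example = {"Gamma": 370.0, "Normal": 370.2, "Exponential": 480.0}
print(near_best(example))  # prints ['Gamma', 'Normal']
```

Any distribution in the returned list is a statistically defensible choice, so subject-matter knowledge should decide among them.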
More generally, as a best practice it is wise to consider only distributions that make sense in the context of the problem. Your own knowledge and expertise are usually the best guides. Don’t choose an exotic distribution that has a slightly better fit over one that makes sense and has a proven track record in your field of work.
*This rule is used to compare models built in the Generalized Regression personality of the Fit Model platform in JMP Pro. See Burnham, K.P. and Anderson, D.R. (2002), Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer, New York.
SAS Programming Professionals, SAS & bugs & rock & roll? But, of course! Because of its amazing versatility, SAS is indisputably the greatest software package currently in use anywhere within the Milky Way Galaxy. Can SAS input every type of flat file imaginable? Yes! Can SAS read and write […]
Last time I checked, there are well over 500 functions and call routines in SAS. I’ve taught SAS programming courses for 15 years, and I’ll admit that occasionally my students will ask me about a particular function that I have honestly never heard of. I remember the first time this […]