@SASBooks

Jan 08, 2020
 

Did the mysterious title trick you into finding out what this blog is about? I am going to talk about how to use the FIND function to search text values.

The FIND function searches for substrings in character values. For example, you might want to extract all email addresses ending in .edu from a list of email addresses. If you are a slightly older SAS programmer like me, you may be more familiar with the INDEX function. When you use only two arguments in the FIND function (the first is the string you are searching, and the second is the substring you are looking for), FIND is identical to INDEX. Both functions search the string (first argument) for the substring (second argument) and return the position where the substring starts. If the substring is not found, the function returns a zero.
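Here is a minimal sketch of the two-argument case. The address value is made up for illustration. With two arguments, FIND and INDEX report the same starting position, and the search is case-sensitive, so a lowercase '.edu' does not match an uppercase '.EDU'.

```sas
/* Two-argument FIND behaves like INDEX; the address below is a made-up value */
data _null_;
   length addr $ 40;
   addr = 'jsmith@State.EDU';
   pos_find  = find(addr, '.EDU');   /* position where '.EDU' starts */
   pos_index = index(addr, '.EDU');  /* same search with the older function */
   pos_miss  = find(addr, '.edu');   /* case-sensitive, so no match: 0 */
   put pos_find= pos_index= pos_miss=;
   /* log: pos_find=13 pos_index=13 pos_miss=0 */
run;
```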

The newer FIND function has several advantages over the older INDEX function. These advantages come from FIND's optional third and fourth arguments, which let you specify a starting position for the search and modifiers, such as one that ignores case. You can use either of these two arguments, or both, and the order doesn't matter! How is this possible? The starting position is always a numeric value and the modifier is always a character value, so SAS can always tell which is which.
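A short sketch of the optional arguments, again with a made-up address. The 'i' modifier ignores case, and the numeric starting position can come before or after the modifier. The returned value is still the position within the whole string, not an offset from the starting position.

```sas
/* Modifier and starting position in either order; the address is a made-up value */
data _null_;
   length addr $ 40;
   addr = 'edu.help@campus.EDU';
   p1 = find(addr, 'edu', 'i');     /* modifier only: first match at position 1 */
   p2 = find(addr, 'edu', 'i', 2);  /* modifier, then start: skips the first match */
   p3 = find(addr, 'edu', 2, 'i');  /* start, then modifier: same result as p2 */
   put p1= p2= p3=;
   /* log: p1=1 p2=17 p3=17 */
run;
```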

Let's look at an example

Suppose you have a SAS data set called Emails, and each observation in the data set contains a name and an email address.

Here is a listing of the SAS data set Emails:

You want to select all observations where the variable Email_Address contains .edu (ignoring case).

The program below does just that:

*Searching for .edu;
data Education;
   set Emails;
   /* FIND returns the position of '.edu' (0 if not found); 'i' ignores case */
   if find(Email_Address,'.edu','i') then output;
run;
title "Listing of Data Set Education";
proc print data=Education noobs;
run;

The 'i' modifier is an instruction to ignore case. In the listing of Education below, notice that all the .edu addresses are listed, regardless of case.

Not only is the FIND function more flexible than the older INDEX function, but its ignore-case modifier is also really handy.
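To see why the modifier is handy, compare it with the INDEX approach, which has no ignore-case option and so needs UPCASE on the string first. This sketch is self-contained: the data set Emails_Demo and its names and addresses are hypothetical stand-ins, not the Emails data set listed above.

```sas
/* Hypothetical sample data, invented for illustration */
data Emails_Demo;
   length Name $ 20 Email_Address $ 40;
   input Name $ Email_Address $;
datalines;
Ann ann@college.edu
Bob bob@example.com
Cara cara@UNIV.EDU
;
run;

/* Older approach: INDEX is case-sensitive, so upcase the string first */
data Education_Index;
   set Emails_Demo;
   if index(upcase(Email_Address), '.EDU') then output;
run;

/* FIND approach: the 'i' modifier ignores case directly */
data Education_Find;
   set Emails_Demo;
   if find(Email_Address, '.edu', 'i') then output;
run;

title "Both approaches select Ann and Cara";
proc print data=Education_Index noobs;
run;
proc print data=Education_Find noobs;
run;
```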

For more tips on writing code and getting started in SAS Studio, check out my book, Learning SAS by Example: A Programmer’s Guide, Second Edition. You can also download a free book excerpt. To learn more about SAS Press, check out the upcoming titles, and to receive exclusive discounts, subscribe to the SAS Books newsletter.

Adventures of a SAS detective and the fantastic FIND function was published on SAS Users.

Dec 17, 2019
 

The next time you pick up a book, you might want to pause and think about the work that has gone into producing it – and not just from the authors!

The authors of the SAS classic, The Little SAS Book, Sixth Edition, did just that. The Acknowledgments section at the front of a book is usually short, just a few lines thanking family and significant others. In their sixth edition, authors Lora D. Delwiche and Susan J. Slaughter took it to the next level and produced The Little SAS Book Family Tree. The authors explained:

“Over the years, many people have helped to make this book a reality. We are grateful to everyone who has contributed both to this edition, and to editions in the past. It takes a family to produce a book including reviewers, copyeditors, designers, publishing specialists, marketing specialists, and of course our editors.”

So what happens after you sign a book contract?

First you will be assigned a development editor (DE) who will answer questions and be with you every step of the way – from writing your sample chapter to publication. Your DE will discuss schedules and milestones as well as give you an authoring template, style guidelines, and any software you need to write your book.

Once you have all the resources you need, you'll write your sample chapter. This will help your DE evaluate your writing style, quality of graphics and output, structure, and any potential production issues.

The next step is submitting a draft manuscript for technical review. You'll get feedback from internal and external subject-matter experts, and then you can revise the manuscript based on this feedback. When you and your editor are satisfied with the technical content, your DE will perform a substantive edit on your manuscript, taking particular care with the structure and flow of your writing.

Once in production, your manuscript will be copyedited, and a production specialist will lay out the pages and make sure everything looks good. A graphic designer will work with you to create a cover that reflects both our branding and your suggestions. Your book will be published in print and e-book versions and made ready for sale.

Finally, after the book is published, a marketing specialist will start promoting your book through our social media channels and other campaigns.

So the next time you pick up a book, spare a thought for the many people who have worked to make it a reality!

For more information about publishing with SAS or for our full catalogue, visit our online bookstore.

How many people does it take to publish a book? was published on SAS Users.

Mar 08, 2019
 

In a move to combat "stataphobia" and foster excellence in statistics in developing countries, SAS Press last month donated 70 SAS Press titles to the Serageldin Research Library at the Library of Alexandria in Egypt. The library’s mission is to achieve statistical equity so that a student in Chad has [...]

Breaking down walls for science: SAS Press donates books to the world’s largest research methods library was published on SAS Voices by Sian Roberts

Oct 10, 2018
 

Deep learning (DL) is a subset of neural networks, which have been around since the 1960s. Limited computing resources and the need for large amounts of training data were long the limiting factors for neural networks. But with the growing availability of computing resources such as multi-core machines, graphics processing unit (GPU) accelerators, and specialized hardware, DL is becoming much more practical for business problems.

Financial institutions perform a large number of computations to evaluate portfolios and to price securities and financial derivatives. For example, every cell in a spreadsheet can implement a different formula. Time is also usually of the essence, so having the fastest possible technology to perform financial calculations with acceptable accuracy is paramount.

In this blog, we talk to Henry Bequet, Director of High-Performance Computing and Machine Learning in the Finance Risk division of SAS, about how he uses DL as a technology to maximize performance.

Henry discusses how the performance of numerical applications can be greatly improved by using DL. Once a DL network has been trained to compute the analytics, evaluating that network is drastically faster than classic methods such as Monte Carlo simulation.

We asked him to explain deep learning for numerical analysis (DL4NA) and to answer the questions he is asked most often.

Can you describe the deep learning methodology proposed in DL4NA?

Yes. It starts with writing your analytics in a transparent and scalable way. All content that the SAS financial risk division releases as a solution uses the "many task computing" (MTC) paradigm. Simply put, when you write your analytics using the MTC paradigm, you organize your code into SAS programs, or tasks, that define their inputs and outputs. A job flow is a set of tasks, some of which can run in parallel, and the job flow also handles synchronization.

Fig 1.1 A Sequential Job Flow

The job flow in Figure 1.1 visually gives you a hint that the two tasks can be executed in parallel. The addition of the task into the job flow is what defines the potential parallelism, not the task itself. The task designer or implementer doesn’t need to know that the task is being executed at the same time as other tasks. It is not uncommon to have hundreds of tasks in a job flow.

Fig 1.2 A Complex Job Flow

Using that information, the SAS platform and the Infrastructure for Risk Management (IRM) can automatically infer the parallelism in your analytics. This allows your analytics to run on tens or hundreds of cores. (Most SAS customers run out of cores before they run out of tasks to run in parallel.) By running SAS code in parallel, on a single machine or on a grid, you gain orders of magnitude in performance.
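To make the idea of independent tasks concrete, here is a minimal sketch that runs two tasks at the same time on one machine with SAS/CONNECT MP CONNECT (RSUBMIT with WAIT=NO, then WAITFOR). This is a swapped-in technique for illustration only, not how IRM schedules job flows, and it assumes a SAS/CONNECT license. The folder /tmp/jobflow, the data set names, and the simulated values are hypothetical.

```sas
/* Sketch only: SAS/CONNECT MP CONNECT, not IRM. Spawn local sessions on demand. */
options autosignon sascmd="!sascmd";

libname shared "/tmp/jobflow";   /* hypothetical folder; must already exist */

/* Task A: declares one output, shared.paths_a */
rsubmit taskA wait=no;
   libname shared "/tmp/jobflow";
   data shared.paths_a;
      call streaminit(1);
      do i = 1 to 100000;
         loss = rand('normal', 0, 1);
         output;
      end;
   run;
endrsubmit;

/* Task B: independent of task A, so it can run at the same time */
rsubmit taskB wait=no;
   libname shared "/tmp/jobflow";
   data shared.paths_b;
      call streaminit(2);
      do i = 1 to 100000;
         loss = rand('normal', 0, 1);
         output;
      end;
   run;
endrsubmit;

/* Synchronization point: wait for both tasks, then end their sessions */
waitfor _all_ taskA taskB;
signoff _all_;

/* Downstream task: needs both outputs, so it must run after both tasks */
data shared.all_paths;
   set shared.paths_a shared.paths_b;
run;

proc means data=shared.all_paths n mean std;
   var loss;
run;
```

Here the WAITFOR statement plays the role that the job flow's synchronization plays in the text: the dependency of the final DATA step on both outputs is what forces it to run last.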

This methodology also has the benefit of expressing your analytics in the form Y = f(x), which is precisely what you feed a deep neural network (DNN) to learn. That organization of your analytics allows you to train a DNN to reproduce the results of your analytics originally written in SAS. Once you have the trained DNN, you can use it to score tremendously faster than the original SAS code. You can also use your DNN to push your analytics to the edge. I believe that this is a powerful methodology that offers a wide spectrum of applicability. It is also a good example of deep learning helping data scientists build better and faster models.
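One common way to write this down is sketched below. It is an assumption on our part, because the post does not specify a training objective or loss. Run the original SAS analytics f on N sample inputs to produce training pairs, fit the DNN weights θ by minimizing mean squared error, and then score new inputs with the trained network instead of f.

```latex
\begin{aligned}
y_i &= f(x_i), \qquad i = 1, \dots, N \\
\hat{\theta} &= \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \bigl\lVert f_{\theta}(x_i) - y_i \bigr\rVert^2 \\
\hat{y} &= f_{\hat{\theta}}(x) \approx f(x)
\end{aligned}
```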

Fig 1.3 Example of a DNN with four layers: two visible layers and two hidden layers.

The number of neurons in the input layer is driven by the number of features. The number of neurons in the output layer is driven by the number of classes that we want to recognize, in this case, three. The number of neurons in the hidden layers and the number of hidden layers are up to us: those two parameters are model hyperparameters.
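As a worked example of how these choices set the size of the model, assume fully connected layers with one bias per neuron, plus hypothetical sizes of 4 input features and 5 neurons in each of the two hidden layers. The 3 output classes come from the text. With layer sizes n_0, ..., n_L, the number of trainable parameters P is:

```latex
P = \sum_{l=1}^{L} (n_{l-1} + 1)\, n_l
  = (4+1)\cdot 5 + (5+1)\cdot 5 + (5+1)\cdot 3
  = 25 + 30 + 18 = 73
```

Only the hidden-layer terms change when you tune those hyperparameters. The input and output sizes stay fixed by the features and the classes.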

How do I run my SAS program faster using deep learning?

In the financial risk division, I work with banks and insurance companies all over the world that face increasing regulatory requirements such as CCAR and IFRS 17. Those problems are particularly challenging because they involve big data and big compute.

The good news is that new hardware architectures are emerging with the rise of hybrid computing. Computers are increasingly built as a combination of traditional CPUs and innovative devices such as GPUs, TPUs, FPGAs, and ASICs. These hybrid machines can run significantly faster than legacy computers.

The bad news is that hybrid computers are hard to program, and each device is specific: code you write for a GPU won’t run on an FPGA, and it won’t even run on different generations of the same device. Consequently, software developers and software vendors are reluctant to jump into the fray, and data scientists and statisticians are left out of the performance gains. So there is a gap, a big gap in fact.

To fill that gap is the raison d’être of my new book, Deep Learning for Numerical Applications with SAS. Check it out and visit the SAS Risk Management Community to share your thoughts and concerns on this cross-industry topic.

Deep learning for numerical analysis explained was published on SAS Users.

Oct 05, 2012
 
This weekend, lots of SAS authors are going to Las Vegas. The draw is the Analytics 2012 conference. There, several of our authors will lead discussions, including keynote speaker Tim Rey (coauthor of the new SAS Press book Applied Data Mining for Forecasting Using SAS) and Gerhard Svolba (author of [...]