SAS 9.4

Jan 31, 2023
 

SAS has released SAS 9.4 Maintenance 8, a major update to SAS 9.4.

Security is the primary focus of the Maintenance 8 update. This release contains updates for many of the third-party technologies that are used by the platform, including the Java runtime environment (JRE) and many of the third-party JAR files. This release also adds support for major releases of supported operating systems, while limiting support for operating systems that are no longer supported by their respective suppliers. My colleague Margaret Crevar has summarized these changes in this SAS Communities post.

As with all SAS maintenance releases, this release "rolls up" the hotfixes and enhancements delivered since the last major update (SAS 9.4 Maintenance 7). Most SAS platform products and solutions have also been updated to remain compatible with this release and take advantage of enhancements. However, there are some products and solutions that will not be available immediately, or that will not deliver support for SAS 9.4 Maintenance 8.

Because SAS 9.4 Maintenance 8 is a major software release for SAS 9.4, it is covered by the SAS Support policy for the "Standard Support" timeframe according to its general availability date: Jan 31, 2023. (Current policy offers Standard Support for 5 years from the GA date.)

While this maintenance release doesn't contain new features, it does demonstrate the commitment of SAS to support users of the SAS 9.4 platform for many years to come. (See "Your platform, your way" in "Your analytics, your way" from Shadi Shahin.) New data and analytics capabilities are delivered in the SAS Viya platform, which offers monthly cadence releases via its continuous delivery model.

For an overview of all product changes and updates in SAS 9.4 Maintenance 8, see the What's New topic in the SAS documentation.

SAS 9.4 Maintenance 8 is available was published on SAS Users.

Jul 28, 2022
 

SAS Grid Manager for Platform, SAS Grid Manager, and SAS Viya Workload Management all manage and balance job loads, right? So, are there three SAS products providing the same functionality? Let’s explore; I am curious to hear your answer after you read this article.

Each of the three applications has a Graphical User Interface (GUI), which provides basic functionality like monitoring and managing jobs, queues, and hosts. The GUIs differ slightly between applications, but let’s save that detailed conversation for another post. For now, let’s take a look at the applications.

Interfaces

SAS Grid Manager for Platform

SAS Grid Manager for Platform is developed by IBM and provided as an OEM license through SAS. This product was first made available in SAS 9.4 M2. The GUI has matured over time and Figure 1 represents the latest iteration.

Figure 1: Visual of the user interface for SAS Grid Manager for Platform

SAS Grid Manager for Platform takes advantage of the Load Sharing Facility (LSF), Process Manager (PM), and Platform Web Services (PWS) features. Together these features provide services for job execution, job scheduling, and host information. You launch the GUI for SAS Grid Manager for Platform from a separate URL, which is also listed in the Instructions.html file generated during the install process. Most SAS customers use Platform Suite for SAS. It is our legacy product.

SAS Grid Manager

In SAS 9.4 M6, SAS released SAS Grid Manager which balances workloads, supports high availability, and has multiprocessing capabilities. See Figure 2 for a detailed view.

Figure 2: Visual of the user interface for SAS Grid Manager, which is called SAS Workload Orchestrator

The Balance Workload feature spreads work across machines so that they are not overloaded with running jobs. High Availability keeps resources available in case of machine failure and ensures that jobs still run. Multiprocessing capabilities break jobs down so that they can run across machines.

The GUI for SAS Grid Manager goes by the name SAS Workload Orchestrator (SWO). You launch the SWO from a separate URL, which is also listed in the Instructions.html file generated during the install of SAS. Since it is built with the SAS threaded kernel, SAS Grid Manager takes full advantage of SAS features and capabilities.

SAS Viya Workload Management

SAS developed SAS Viya Workload Management for the newest version of SAS Viya. The GUI for SAS Viya Workload Management goes by the name Workload Orchestrator (hmmm, sound familiar? 😉). The plug-in appears as a menu item in the left-hand navigation pane of SAS Environment Manager, as seen in Figure 3 below.

Figure 3: Visual of the user interface for SAS Viya Workload Management

SAS Viya is now cloud native and relies on Kubernetes and its capabilities for balancing job workloads. The SAS Viya Workload Orchestrator is designed to enhance Kubernetes’ workload capabilities by adding job prioritization, queue management, and more.

Functionality

Now that we've seen some of the differences in the user interfaces, let’s take a look at functionality.

SAS Grid Manager for Platform

SAS Grid Manager for Platform has a Grid Controller and Nodes. As seen in Diagram 1, Process Manager is installed on the SAS Grid Controller machine.

Diagram 1: SAS Grid Manager for Platform

The Process Manager schedules the job and gives the job to the Load Sharing Facility (LSF). LSF executes the job and then submits the results back to Process Manager.

SAS Grid Manager

The SAS Workload Orchestrator (SWO) is the “brains” that acts as a coordination engine for SAS Grid Manager. As depicted in Diagram 2, the SWO keeps track of all of the entry points for jobs, including jobs that come in through the Metadata Server (e.g., Enterprise Guide sessions), Mid-Tier (e.g., BI Server sessions), and through batch submission (e.g., usage of gsub utility).

Diagram 2: SAS Grid Manager

These entry points then utilize compute resources; those resources may be managed through the Workspace Server, Stored Process server, or may be spawned directly through batch submission.

SAS Viya Workload Management

SAS Viya Workload Management extends the Kubernetes capabilities. SAS Compute talks to the Launcher Service, followed by the Workload Orchestrator Manager, which talks to the Workload Orchestrator Server. This flow is depicted in Diagram 3 below.

Diagram 3: SAS Viya Workload Management

The Kubernetes API Server talks to the Workload Orchestrator Manager and Server. It determines which nodes can start a pod for jobs. SAS Viya Workload Management adds queue prioritization and other features as well.

Side-by-side comparison

The following table summarizes the features of each product we’ve discussed in this article.

Finally

So, what do you think? Are SAS Grid Manager, SAS Grid Manager for Platform, and SAS Viya Workload Management alike or different? Can we agree on both? In my opinion, there are distinct differences, but there are some similarities, too. :-)

To summarize, all three products manage jobs and processes. SAS Grid Manager and SAS Grid Manager for Platform are similar: both run on physical hardware in the older version of SAS, with multiple applications to manage. SAS Viya Workload Management, by contrast, runs on the newest release of SAS in a cloud-based environment, taking advantage of Kubernetes technology within the one-stop shop of SAS Environment Manager.

Manage and Balance Workloads in SAS was published on SAS Users.

Feb 25, 2021
 

The people, the energy, the quality of the content, the demos, the networking opportunities…whew, all of these things combine to make SAS Global Forum great every year. And this year is no exception.

Preparations are in full swing for an unforgettable conference. I hope you’ve seen the notifications that we set the date, actually multiple dates around the world so that you can enjoy the content in your region and in your time zone. No one needs to set their alarm for 1:00am to attend the conference!

Go ahead and save the date(s)…you don’t want to miss this event!

Content, content, content

We are working hard to replicate the energy and excitement of a live conference in the virtual world. But we know content is king, so we have some amazing speakers and content lined up to make the conference relevant for you. There will be more than 150 breakout sessions for business leaders and SAS users, plus the demos will allow you to see firsthand the innovative solutions from SAS, and the people who make them. I, for one, am looking forward to attending live sessions that will allow attendees the opportunity to ask presenters questions and have them respond in real time.

Our keynote speakers, while still under wraps for now, will have you on the edge of your seats (or couches…no judgement here!).

Networking and entertainment

You read that correctly. We will have live entertainment that'll have you glued to the screen. And you’ll be able to network with SAS experts and peers alike. But you don’t have to wait until the conference begins to network: the SAS Global Forum virtual community is up and running. Join the group to start engaging with other attendees, and maybe take a guess or two at who the live entertainment might be.

A big thank you

We are working hard to bring you the best conference possible, but this isn’t a one-woman show. It takes a team, so I would like to introduce and thank the conference teams for 2021. The Content Advisory Team ensures the Users Program sessions meet the needs of our diverse global audience. The Content Delivery Team ensures that conference presenters and authors have the tools and resources needed to provide high-quality presentations and papers. And, finally, the SAS Advisers help us in a multitude of ways. Thank you all for your time and effort so far!

Registration opens in April, so stay tuned for that announcement. I look forward to “seeing” you all in May.

What makes SAS Global Forum great? was published on SAS Users.

Jul 20, 2020
 

I think we can all agree that lifelong learning is the future, for all of us. We know that we need to learn and develop all the time, simply to stay abreast. The world is changing fast, and we must change with it. Investing in analytics talent is an investment in your future.

Create a corporate learning culture

Organizations may embrace the concept of data in their work cultures, yet they fail to commit to developing the analytics skills their teams need to harness the data. Organizations often don’t know what skills their team has, the skills they’re lacking, or even where they need their employees to go. Corporate training brings new avenues for career development and professional training, and that can have a huge impact on retention.

Companies that understand the importance of continued skill development for their employees will set out a vision and develop a corporate learning culture. Leaders are needed who can drive a cultural change towards continued learning in the organization. Hiring curious people and rewarding that curiosity develops the right mindset. Revealing the knowledge gap that employees have is a good trigger to get the learning started.

When an organization wants to close the talent gap for Analytics and AI, SAS can offer support on the whole journey.

Company objectives and goals will determine how the skills development program is shaped. Once the roles and desired skill levels are determined, SAS will provide a Learning Needs Assessment to uncover the skills gaps.

Based on the findings from the assessment, a program for learning and development will be proposed for the different target audiences and to work towards company goals.

With readily available digital assets, the first steps to a learning platform can be realized quickly. By dynamically developing more assets and creating domain and company specific materials, continuous learning is encouraged and facilitated.

Power of online learning

Today most of us are forced to work from home due to COVID-19. We know the future of work will look different; we are getting used to the digital channel for communicating and interacting. Though the digitization of learning has been going on for many years, learning content is now moving to the cloud, becoming accessible across multiple devices and teaching environments and often being generated, shared, and continually updated.

Millennials feel most comfortable with this digitization, but the wider workforce will ultimately benefit from access to digital learning assets. Integrated cloud-based platforms enable more than just new computer programs or smartphone apps. Smart organizations are now expanding their use of cloud-based learning to run personalized online courses, small tailored live web sessions, instructional videos, e-coaching, communities, virtual classrooms, and simulation games. SAS is responding to this need by offering several free options for learning, including e-Learning, online tutorials, lab time, SAS Academy for Data Science, and a SAS Learning Subscription that offers more than 100 e-Learning courses. There is something for everyone!

Finding analytics talent within your workforce can be challenging. It can be hard to assess the skills and identify the talent you need for specific solutions. With an unparalleled depth of understanding in the industry, SAS can help identify, cultivate and grow analytics talent in your organization. In addition, we’ll help you to build a talent pipeline in partnership with local academic institutions, if needed.

Do you want to start today with learning essential skills? Take a look at how SAS can help your organization to get ahead, through our learning solutions customized to help you train your team. Or transform it.

Shaping the future of lifelong learning was published on SAS Users.

Apr 7, 2020
 

As many of us are learning to navigate the changing world we are living in amid the COVID-19 outbreak, and as we care for our loved ones, friends, and our community, many of us now find ourselves working and studying from home much more than we did before. As an employee at SAS and an instructor at NC State University, I have found myself splitting my time between setting up my temporary home office and an at-home routine while also trying to help my students feel safe and comfortable as we move to a virtual classroom. But, as my commute and previous social time become FaceTime calls and text messages, I’ve found myself with more downtime than I previously had: time I want to dedicate to the training I’ve been wanting to do for the past year.

At SAS, we are striving to care for our users during this time—in that spirit, I wanted to share with you some free SAS offerings, as well as coping techniques I am doing from home.

Take care of yourself and your family

First and foremost, make sure you and your family are taking time for self-care, whether it be meditation, using a mobile app or YouTube video, or getting some exercise together. I am finding that my daily walk is something I need to help relax my mind and get back to focusing on my tasks.

Retrain on skills you haven’t touched in a while

Sometimes we all need that reboot on tools or methods we use every day. I love SAS’ Statistics 1 course and SAS Programming 1, both of which are free. They are great refreshers for those who haven’t taken a math course in a few years, or for those just wanting to start out with data science using SAS. Understanding the fundamentals and getting some refreshers on SAS language tricks is something I constantly try to build into my work. I am also preparing to take the SAS Certification Exam at the end of the year, so the practice exam is something I also plan to use often during my study sessions.

Learn a new skill

It is also a great time to learn something you have always wanted to do. I started by taking some free online photography classes. I also have on my list enrolling in the SAS Academy for Data Science, which is free until the end of May 2020. The advanced analytics professional courses are something I have wanted to complete for a long time, so I am excited to get started on learning more about data modeling. The SAS e-books collection is now also free until April 30, 2020, so I’ve downloaded some great additional materials using code FREEBOOKS at checkout. 

Be kind to others

Being stuck inside can sometimes make you feel like you are spending more time with family than you are used to, or maybe you are spending more time alone. Using this time to connect with those I haven’t made time to talk to has been something I am really thankful for. I call my sister who is a nurse in Florida and check on her. As this outbreak affects us all differently, this is a great time to come together to connect. It is also a great time to think of others affected by the outbreak and who don’t have the ability to continue working. There are some great ways you can help others by getting involved in volunteer work or donating to a helpful cause.

Do fun activities

Though we are stuck at home, this time has been great for enjoying things I often don’t get to during my normal busy schedule. Besides taking free training, I’ve been playing some new video games (#ACNH) and some games I’ve neglected for far too long. I’ve also used this time to find what brings me joy from my home. Making time for reading is bringing great joy to my life right now.

As we all move through this turbulent time, make sure to take care of yourself and others. I hope some of these free tools and training will come in handy as you work towards your personal goals while remaining safe and healthy.

Working remotely? A list of 5 ways to spend your down time was published on SAS Users.

Dec 4, 2019
 

If you’re like me, you struggle to buy gifts. Most folks in my inner circle already have everything they need and most of what they want. Most folks, that is, except the tech-lovers. That’s because there’s always something new on the horizon. There’s always a new gadget or program. Or a new way to learn those things. If you know folks who fall into this category, and who love SAS, boy do I have some gift ideas for you:

Try SAS for free

If you know someone who is interested in SAS but doesn’t work with it on a daily basis, or someone looking to experiment in a different area of SAS, let them know about our free software trials. Who doesn’t love free? And you’ll look like a rock star when you point out that they can explore areas like SAS Data Preparation, SAS Visual Forecasting, SAS Visual Statistics and lots more, for free.

SAS University Edition

Let’s keep things in the vein of free, shall we? SAS University Edition includes SAS Studio, Base SAS, SAS/STAT, SAS/IML, SAS/ACCESS, and several time series forecasting procedures from SAS/ETS. It’s the same software used by sites around the world; that means you get the most up-to-date statistical and quantitative methods. And it’s free for academic, noncommercial use.

Registration to SAS Global Forum 2020

Oh my gosh, you would be your loved one’s favorite person! I’ve been to many SAS Global Forums and I love seeing the excitement on someone’s face as they explore the booths, sit in on sessions, and network to their heart’s content. SAS Global Forum is the place to be for SAS professionals, thought leaders, decision makers, partners, students, and academics. There truly is something for everyone at this event for analytics enthusiasts. And who wouldn’t want to be in Washington, D.C. in the Spring? It’s cherry blossom time! Be sure to register before Jan. 29, 2020 to get early bird pricing and save $600 off the on-site registration fee.

SAS books

I worked for many years in SAS Press and saw, firsthand, how excited folks get over a new book. I mean, who doesn’t want the 6th edition of the Little SAS Book or the latest from Ron Cody? Browse more than 100 titles and find the perfect gift. Don’t forget, SAS Press is offering 25% off everything in the bookstore for the month of December! Use promo code HOLIDAYS25 at checkout when placing your order. Offer expires at midnight Tuesday, December 31, 2019 (US).

A SAS Learning subscription

Give the gift of unlimited learning with access to our selection of e-learning and video tutorials. You can give a monthly or yearly subscription. Help the special people in your life get a head start on a new career. The gift of education is one of the best gifts you can give.

SAS OnDemand for Academics

For those who don't want to install anything and would rather run SAS in the cloud, we offer SAS OnDemand for Academics. Users get free access to powerful SAS software for statistical analysis, data mining and forecasting. Point-and-click functionality means there’s no need to program. But if you like to program, you can do that, too! It’s really the best of both worlds.

SAS blogs

I began this post with something free, and I’ll end it with something free: learning from our SAS blogs. Pick from areas such as customer intelligence, operations research, data science and more. You can also read content from specific regions such as Korea, Latin America, Japan and others. To get you started, here are a few posts from three of our most popular blog authors:

Chris Hemedinger, author of The SAS Dummy:

Rick Wicklin, author of The DO Loop:

Robert Allison, major contributor to Graphically Speaking:

Happy holidays and happy learning. Now go help someone geek out. And I won’t tell if you purchase a few of these things for yourself!

Gifts to give the SAS fan in your life was published on SAS Users.

Nov 6, 2019
 

I have been programming SAS for a LONG time and have never seen much in the way of programming standards. There are some common conventions, though. For example, most SAS programmers indent the statements within DATA and PROC steps (I like three spaces). Most programmers do not like to see more than one statement on a line, and most agree that there should be blank lines between program boundaries (DATA and PROC steps).

I thought I would share some of my thoughts on programming standards, with the hope that others will chime in with their ideas.

    • I like to indent all the statements in a DO group or DO loop. If there are nested groups, each one gets indented as well.
    • I prefer variable names in proper case.
    • I am not a fan of camel-case. For example, I prefer Weight_Kg to WeightKg. The reason that some programmers like camel-case is that SAS will automatically split a variable name at a capital letter in some headings.
    • I like my TITLE statements in open code, not inside a PROC. To me, that makes sense because TITLE statements are global.
    • There should be no conversion messages (character to numeric or numeric to character) in the SAS log. For example, use Num = INPUT(Char_Num,12.); instead of Num = 1*Char_Num;. The latter statement forces an automatic character-to-numeric conversion and places a message in the log.
    • I always use the statement ODS NOPROCTITLE;. This eliminates the default SAS procedure name at the top of the output.
    • Although fewer and fewer people are reading raw text data, I like my @ signs to all line up in my INPUT statement.
    • I like to use the /* and */ comments to define all macro variables. For example:
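A minimal sketch of this commenting style follows. The macro name, its parameters, and the choice of SASHELP.CLASS as input are illustrative assumptions, not code from this post.

```sas
/* Hypothetical macro: each keyword parameter is described */
/* with a block comment on the same line.                   */
%macro Summarize(
   Dsn=,          /* Name of the input data set            */
   Var=,          /* Numeric analysis variable             */
   Class=         /* Optional classification variable      */
   );
   title "Summary of &Var in &Dsn";
   proc means data=&Dsn n mean std;
      %if %length(&Class) %then %do;
         class &Class;
      %end;
      var &Var;
   run;
%mend Summarize;

/* Parameters are passed by name, so their order does not matter */
%Summarize(Dsn=sashelp.class, Var=Weight, Class=Sex)
```

Because every parameter is passed by name, the call documents itself and an optional parameter such as Class= can simply be left out.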

Notice that I prefer named parameters in my macros, instead of positional parameters.
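Several of the other preferences listed above can be sketched in one short program. The data set, variable names, and data values here are made up for illustration.

```sas
ods noproctitle;   /* Suppress the default procedure name in the output */

data Vitals;
   /* Column pointers (@) lined up for readability */
   input @1  Subject   $3.
         @5  Char_Wt   $5.
         @11 Height_Cm 3.;

   /* Explicit conversion, so no conversion note appears in the log */
   Weight_Kg = input(Char_Wt, 12.);

   /* Statements inside each DO group are indented */
   if Weight_Kg > 90 then do;
      Group = 'Heavy';
   end;
   else do;
      Group = 'Other';
   end;
datalines;
001 72.5  180
002 95.0  175
003 60.2  165
;

/* TITLE is a global statement, so it sits in open code */
title "Listing of Vitals";
proc print data=Vitals noobs;
run;
```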

If this seems like too much work, SAS Studio has an automatic formatting tool that can help standardize your programs. For example, look at the code below:

Really ugly, right? Here is how you can use the automatic formatting tool in SAS Studio.

When you click this icon, the program now looks like this:

That’s pretty much the way I would write it. By the way, if you don't like how Studio formatted your code, enter a control-z to undo it.

For more tips on writing code and how to get started in SAS Studio, check out my book, Learning SAS by Example: A Programmer’s Guide, Second Edition. You can also download a free book excerpt. To learn more about SAS Press, check out the up-and-coming titles, and receive exclusive discounts, make sure to subscribe to the SAS Books newsletter.

Making your SAS code more readable was published on SAS Users.

Oct 29, 2019
 

Thank you to Lora Delwiche and Susan Slaughter for providing the following information:

Six editions is a lot! If you had told us back when we wrote the first edition of The Little SAS Book that someday we would write a sixth, we would have wondered how we could possibly find that much to say. After all, it is supposed to be The Little SAS Book, isn’t it? But the developers at SAS are constantly hard at work inventing new and better ways of analyzing and visualizing data. And some of those ways turn out to be so fundamental that they belong even in a little book about SAS.

Interface independence

One of the biggest changes to SAS software in recent years is the proliferation of interfaces. SAS programmers have more choices than ever before. Previous editions contained some sections specific to the SAS windowing environment (also called Display Manager). We wrote this edition for all SAS programmers whether you use SAS Studio, SAS Enterprise Guide, the SAS windowing environment, or run in batch. That sounds easy, but it wasn’t. There are differences in how SAS behaves with different interfaces, and these differences can be very fundamental. In particular, the system option that sets the rules for names of variables varies depending on how you run SAS. So old sections had to be rewritten, and we added a whole new section showing how to use variable names containing blanks and special characters.
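As a minimal sketch of what such names look like: the option involved is VALIDVARNAME=, and a name containing blanks or special characters is written as a name literal (a quoted string followed by the letter n). The data set and variable names below are invented for illustration, and the default value of the option in your interface may differ, so the sketch sets it explicitly.

```sas
/* Allow variable names with blanks and special characters */
options validvarname=any;

data Sales;
   'Unit Price'n = 19.99;
   'Qty Sold'n   = 3;
   'Total ($)'n  = 'Unit Price'n * 'Qty Sold'n;
run;

proc print data=Sales;
   var 'Unit Price'n 'Qty Sold'n 'Total ($)'n;
run;
```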

New ways to read and write Microsoft Excel files

Previous editions already covered how to read and write Microsoft Excel files, but SAS developers have created new ways that are even better. This edition contains new sections about the XLSX LIBNAME engine and the ODS EXCEL destination.
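A rough sketch of both approaches is shown here. The file paths and the sheet name Sheet1 are hypothetical, and the XLSX engine assumes that SAS/ACCESS Interface to PC Files is licensed at your site.

```sas
/* Read: treat a workbook as a library; each sheet is a data set */
libname xl xlsx '/home/myuser/scores.xlsx';   /* hypothetical path */

data work.Scores;
   set xl.Sheet1;                             /* hypothetical sheet name */
run;

libname xl clear;

/* Write: send procedure output to a new workbook */
ods excel file='/home/myuser/report.xlsx'     /* hypothetical path */
          options(sheet_name='Class');
proc print data=sashelp.class noobs;
run;
ods excel close;
```

The LIBNAME engine moves data, while the ODS destination captures formatted procedure output, so the two are complementary rather than interchangeable.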

More PROC SQL

From the very first edition, The Little SAS Book has always covered PROC SQL. But it was in an appendix, and over time we noticed that most people ignore appendices. So for this edition, we removed the appendix and added new sections on using PROC SQL to:

• Subset your data
• Join data sets
• Add summary statistics to a data set
• Create macro variables with the INTO clause
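The four tasks above can be sketched against SASHELP.CLASS. This is an illustration only, not an example from the book, and the table and macro variable names are invented.

```sas
proc sql;
   /* Subset your data */
   create table Teens as
      select Name, Sex, Age, Height
      from sashelp.class
      where Age >= 13;

   /* Add a summary statistic to every row; SAS remerges */
   /* the group mean back onto the detail rows           */
   create table With_Mean as
      select Name, Sex, Height,
             mean(Height) as Mean_Height
      from sashelp.class
      group by Sex;

   /* Join data sets */
   create table Joined as
      select t.Name, t.Age, m.Mean_Height
      from Teens as t
           inner join With_Mean as m
           on t.Name = m.Name;

   /* Create a macro variable with the INTO clause */
   select count(*) into :N_Teens trimmed
      from Teens;
quit;

%put NOTE: Teens contains &N_Teens rows.;
```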

For people who are new to SQL, these sections provide a good introduction; for people who already know SQL, they provide a model of how to leverage SQL in your SAS programs.

Updates and additions throughout the book

Almost every section in this edition has been changed in some way. We added new options, made sure everything is up-to-date, and ran every example in every SAS interface noting any differences. For example, PROC SGPLOT has some new options, the default ODS style for PDF has changed, and the LISTING destination behaves differently in different interfaces. Here’s a short list, in no particular order, of new or expanded topics in the sixth edition:

• More examples with permanent SAS data sets, CSV files, or tab-delimited files
• More log notes throughout the book showing what to look for
• LIKE or sounds-like (=*) operators in WHERE statements
• CROSSLIST, NOCUM, and NOPRINT options in PROC FREQ
• Grouping data with a user-defined format and the PUT function
• Iterative DO groups
• DO WHILE and DO UNTIL statements
• %DO statements
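A few of these topics are sketched below using SASHELP.CLASS. The examples are illustrative assumptions rather than listings from the book, and the format and data set names are invented.

```sas
/* LIKE and sounds-like (=*) operators in a WHERE statement */
proc print data=sashelp.class;
   where Name like 'J%' or Name =* 'Jon';
run;

/* NOCUM, CROSSLIST, and NOPRINT options in PROC FREQ */
proc freq data=sashelp.class;
   tables Sex / nocum;
   tables Sex*Age / crosslist;
   tables Age / noprint out=Age_Counts;
run;

/* Grouping data with a user-defined format and the PUT function */
proc format;
   value Agegrp
      low-12  = 'Preteen'
      13-high = 'Teen';
run;

data Class_Groups;
   set sashelp.class;
   Age_Group = put(Age, Agegrp.);
run;

/* Iterative DO group: five years of 5% growth, one row per year */
data Growth;
   Balance = 1000;
   do Year = 1 to 5;
      Balance = Balance * 1.05;
      output;
   end;
run;
```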

Even though we have added a lot to this edition, it is still a little book. In fact, this edition is shorter than the last by 12 pages! We think this is the best edition yet. For a sneak preview, check out the free book excerpt. To learn more about SAS Press, check out the up-and-coming titles, and receive exclusive discounts, make sure to subscribe to the newsletter.

The Little SAS Book 6.0: The best-selling SAS book gets even better was published on SAS Users.

Aug 28, 2019
 

This article is not a tutorial on Hadoop, Spark, or big data. At the same time, no prerequisite knowledge of these technologies is required for understanding. We’ll give you enough background prior to diving into the details. In simplest terms, the Hadoop framework maintains the data and Spark controls and directs data processing. As an analogy, think of Hadoop as a train, big data as the payload, and Spark as the crew driving the train and organizing and distributing the goods.

Big data

I recently read that data volumes are doubling each year. Not that long ago we talked in terms of gigabytes. This quickly turned into terabytes and we’re now in the age of petabytes. The type of data is also changing. Data used to fit neatly into rows and columns. Now, nearly eighty percent of data is unstructured. All these trends and facts have led us to deal with massive amounts of data, aka big data. Maintaining and processing big data required creating technical frameworks. Next, we’ll investigate a couple of these tools.

Hadoop

Hadoop is a technology stack utilizing parallel processing on a distributed filesystem. Hadoop is useful to companies when data sets become so large or complex that their current solutions cannot effectively process the information in a reasonable amount of time. As the data science field has matured over the past few years, so has the need for a different approach to processing data.

Apache Spark

Apache Spark is a cluster-computing framework that supports both iterative algorithms and interactive/exploratory data analysis. The goal of Spark is to keep the benefits of Hadoop’s scalable, distributed, fault-tolerant processing framework, while making it more efficient and easier to use. Using in-memory distributed computing, Spark provides capabilities over and above the batch model of Hadoop MapReduce. As a result, Spark brings to the big data world new applications of data science that were previously too expensive or too slow on massive data sets.

Now let’s explore how SAS integrates with these technologies to maximize capturing, managing, and analyzing big data.

SAS capabilities to leverage Spark

SAS provides Hadoop data processing and data scoring capabilities using SAS/ACCESS Interface to Hadoop and SAS In-Database Technologies for Hadoop, with MapReduce or Spark as the processing framework. This covers traditional batch data management, such as high-volume extract, transform, load (ETL) jobs, as well as faster, interactive, in-memory processing with Spark for quicker response.

In SAS Viya, SAS/ACCESS Interface to Hadoop includes SAS Data Connector to Hadoop. All users with SAS/ACCESS Interface to Hadoop can use the serial data connector. Likewise, SAS Data Connect Accelerator for Hadoop can load or save data in parallel between Hadoop and SAS using SAS Embedded Process, as a Hive/MapReduce or Spark job.

Connecting to Spark in a Hadoop Cluster

There are two ways to connect to a Hadoop cluster using SAS/ACCESS Interface to Hadoop, based on the SAS environment: LIBNAME and CASLIB statements.

LIBNAME statement to connect to Spark from MVA SAS

options set=SAS_HADOOP_JAR_PATH="/third_party/Hadoop/jars/lib:/third_party/Hadoop/jars/lib/spark"; 
options set=SAS_HADOOP_CONFIG_PATH="/third_party/Hadoop/conf"; 
 
libname hdplib hadoop server="hadoop.server.com" port=10000 user="hive"
schema='default' properties="hive.execution.engine=SPARK";

Parameters

SAS_HADOOP_JAR_PATH: Directory path for the Hadoop and Spark JAR files
SAS_HADOOP_CONFIG_PATH: Directory path for the Hadoop cluster configuration files
Libref: The hdplib libref specifies the location where SAS will find the data
SAS/ACCESS engine name: HADOOP selects the SAS/ACCESS engine used to connect to Hadoop
SERVER: Hadoop Hive server to connect to
PORT: Port on which the Hive server listens. 10000 is the default, so the option is not required; it is included here for completeness
USER and PASSWORD: Not always required
SCHEMA: Hive schema to access. It is optional; by default, the connection uses the “default” schema
PROPERTIES: Hadoop properties. Choosing SPARK for the property hive.execution.engine enables SAS Viya to use Spark as the execution platform
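Once the libref is assigned, a quick way to confirm the connection is to list its members and run a small query. The following is a minimal sketch, assuming the hdplib libref above was assigned successfully and that a Hive table named gas (the table used later in this article) exists in the default schema. The SASTRACE options are standard SAS/ACCESS tracing options that write the SQL sent to Hive to the SAS log.

```sas
/* Write the SQL that SAS/ACCESS passes to Hive to the SAS log */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* List the Hive tables visible through the libref */
proc datasets lib=hdplib;
quit;

/* Simple row count; gas is an assumed table name */
proc sql;
   select count(*) as n_rows
   from hdplib.gas;
quit;
```

Whether the query actually runs on Spark depends on the Hive configuration of the cluster, so check the Hive or YARN logs rather than relying on the SAS log alone.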

 
CASLIB statement to connect from CAS

caslib splib sessref=mysession datasource=(srctype="hadoop", dataTransferMode="auto",username="hive", server="hadoop.server.com", 
hadoopjarpath="/opt/sas/viya/config/data/hadoop/lib:/opt/sas/viya/config/data/hadoop/lib/spark", 
hadoopconfigdir="/opt/sas/viya/config/data/hadoop/conf", schema="default"
platform="spark"
dfdebug="EPALL" 
properties="hive.execution.engine=SPARK");

Parameters

CASLIB: Placeholder for the specified data access. The splib CAS library specifies the Hadoop data source
SESSREF: Holds the CAS library in a specific CAS session. mysession is the current active CAS session
SRCTYPE: Type of data source
DATATRANSFERMODE: Type of data movement between CAS and Hadoop. Accepts one of three values: serial, parallel, or auto. When AUTO is specified, CAS chooses the type of data transfer based on the licenses available in the system. If SAS Data Connect Accelerator for Hadoop has been licensed, parallel data transfer is used; otherwise, serial transfer is used
HADOOPJARPATH: Location of the Hadoop and Spark JAR files on the CAS cluster
HADOOPCONFIGDIR: Location of the Hadoop configuration files on the CAS cluster
PLATFORM: Type of Hadoop platform used to execute the job or transfer data using SAS Embedded Process. The default value is “mapred” for Hive MapReduce. When “spark” is specified, data transfer and job execution run as a Spark job
DFDEBUG: Used to receive additional information in the SAS log about the data transfers performed by SAS Embedded Process
PROPERTIES: Hadoop properties. Choosing SPARK for the property hive.execution.engine enables SAS Viya to use Spark as the execution platform
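The CASLIB statement above refers to a session named mysession, so that session has to exist first. The following is a minimal sketch of the surrounding steps, assuming the CAS host and port are already configured for your SAS session (for example, through the CASHOST= and CASPORT= system options) and that the splib caslib from the statement above has been added.

```sas
/* Start a CAS session named mysession */
cas mysession;

/* (run the CASLIB statement shown above at this point) */

/* Show the definitions of the caslibs visible to the session */
caslib _all_ list;

/* List the Hadoop tables that the splib caslib can reach */
proc casutil;
   list files incaslib="splib";
quit;
```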

 

Data Access using Spark

SAS Data Connect Accelerator for Hadoop with the Spark platform option uses Hive as the query engine to access Spark data. Data moves between Spark and CAS through SAS-generated Scala code. This approach is useful when data already exists in Spark and either needs to be used for SAS analytics processing or needs to be moved to CAS for massively parallel data and analytics processing.

Loading Data from Hadoop to CAS using Spark

Processing data in CAS offers advanced data preparation, visualization, modeling and model pipelines, and finally model deployment. Model deployment can be performed using available CAS modules or pushed back to Spark if the data is already in Hadoop.

Load data from Hadoop to CAS using Spark

proc casutil 
      incaslib=splib
      outcaslib=casuser;
      load casdata="gas"
      casout="gas"
      replace;
run;

Parameters

PROC CASUTIL: Runs CAS action routines to process data
INCASLIB: Input CAS library to read data from
OUTCASLIB: Output CAS library to write data to
CASDATA: Table to load to the CAS in-memory server
CASOUT: Output CAS table name
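After the load finishes, you can confirm from the SAS side that the table is in memory. This is a minimal sketch, assuming the load above wrote the gas table to the casuser caslib as shown.

```sas
proc casutil;
   /* List the in-memory tables in the casuser caslib */
   list tables incaslib="casuser";

   /* Show column and row information for the loaded table */
   contents casdata="gas" incaslib="casuser";
quit;
```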

 

We can look at the status of the data load job using Hadoop's resource management and job scheduling application, YARN. YARN is responsible for allocating system resources to the various applications running in a Hadoop cluster and for scheduling tasks to be executed on different cluster nodes.

Loading Data from Hadoop to Viya CAS using Spark

In the figure above, the YARN application executed the data load as a Spark job. This was possible because the CASLIB statement specified the platform="spark" option. The Spark job name reflects the direction of data movement: in this case, Hadoop to CAS, the job is named “SAS CAS/DC Input,” where “Input” means data loaded into CAS.

Saving Data from CAS to Hadoop using Spark

You can save data back to Hadoop from CAS at many stages of the analytic life cycle. For example, use data in CAS to prepare, blend, visualize, and model. Once the data meets the business use case, data can be saved in parallel to Hadoop using Spark jobs to share with other parts of the organization.

Using the SAVE CAS action to move data to Hadoop using Spark

proc cas;
session mysession; 
      table.save /
      caslib="splib"
      table={caslib="casuser", name="gas"},
      name="gas.sashdat"
      replace=True;
quit;

Parameters

PROC CAS: Used to execute CAS action sets and actions to process data. “table” is the action set and “save” is the action
TABLE: Location and name of the source table
NAME: Name of the target table saved to the Hadoop library using Spark
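If you prefer procedure syntax over CASL, PROC CASUTIL also has a SAVE statement. The following is a sketch of what is assumed to be a comparable save, using the same casuser source table and splib target caslib. The target name gas is chosen for illustration and differs from the name used in the PROC CAS example.

```sas
proc casutil;
   /* Save the in-memory table casuser.gas to the Hadoop caslib splib */
   save casdata="gas" incaslib="casuser"
        casout="gas"  outcaslib="splib"
        replace;
quit;
```

As with the table.save action, the save is expected to run as a Spark job only when the target caslib was defined with the Spark platform option.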

 

We can verify the status of the save from CAS to Hadoop using the YARN application. Data from CAS is saved as a Hadoop table, using Spark as the execution platform. Furthermore, because SAS Data Connect Accelerator for Hadoop transfers data in parallel, the Spark executors on each Spark executor node handle the data for that specific Hadoop cluster node.

Saving Data from Viya CAS to Hadoop using Spark

Finally, the save operation executed as a Spark job. As we can see from YARN, the Spark job named “SAS CAS/DC Output” indicates that the data moves from CAS to Hadoop.

Where we are; where we're going

So far, we have traveled across the Spark pond to set up SAS libraries for Spark and to load and save data from and to Hadoop using Spark. In the next section, we’ll look at ways to score data and execute SAS code inside Hadoop using Spark.

Data and Analytics Innovation using SAS & Spark - part 1 was published on SAS Users.