This is the third and final installment of a series of posts discussing promising use cases in retail and the benefits of adopting IoT technologies in 2019. What will be the ground-breaking new application of IoT and analytics that drives an epiphany and spurs widespread adoption? In previous posts, I discussed [...]
Multi-tenancy is one of the exciting new capabilities of SAS Viya. Because it is so new, there is quite a lot of misinformation going around about it. I would like to offer you five key things to know about multi-tenancy before implementing a project using this new paradigm.
All tenants share one SAS Viya deployment
Just as apartment units exist within a larger, common building, all tenants, including the provider, exist within one, single SAS Viya deployment. Tenants share some SAS Viya resources such as the physical machines, most microservices, and possibly the SAS Infrastructure Data Server. Other SAS Viya resources are duplicated per tenant such as the CAS server and compute launcher. Regardless, the key point here is that because there is one SAS Viya deployment, there is one, and only one, SAS license that applies to all tenants. Adding a new tenant to a multi-tenant deployment could have licensing ramifications depending upon how the CAS server resources are allocated.
Decision to use multi-tenancy must be made at deployment time
Many people, myself included, are not very comfortable with commitment. Making a decision that cannot be changed is something we avoid. Deciding whether your SAS Viya deployment supports multi-tenancy cannot be put off for later.
This decision must be made at the time the software is deployed. There is currently no way to convert a multi-tenant deployment to a single-tenant deployment or vice versa short of redeployment, so choose wisely. As with marriage, the decision to go single-tenant or multi-tenant should not be taken lightly and there are benefits to each configuration that should be considered.
Each tenant is accessed by separate login
Let’s return to our apartment analogy. Just as each apartment tenant has a separate key that opens only the unit they lease, SAS Viya requires users to log on (authenticate) to a specific tenant space before allowing them access.
SAS Viya facilitates this by accessing each tenant by way of a separate sub-domain address. As shown in the diagram below, a user wishing to use the Acme tenant must access the deployment with a URL of acme.viya.sas.com while a GELCorp user would use a URL of gelcorp.viya.sas.com.
This helps create total separation of tenant access and allows administrators to define and restrict user access for each tenant. It does, however, mean that each tenant space is authenticated individually and there is no notion of single sign-on between tenants.
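To make the sub-domain convention concrete, here is a minimal Python sketch of how a tenant name could be read from the host part of a URL. The helper `tenant_from_url` and its `base_domain` default are illustrative assumptions and are not part of SAS Viya, which handles this routing itself; only the two example addresses come from the text above.

```python
from urllib.parse import urlparse


def tenant_from_url(url, base_domain="viya.sas.com"):
    """Return the tenant sub-domain of url, or None if there is none.

    Hypothetical helper for illustration only.
    """
    # urlparse fills in hostname only when a scheme (or leading //) is present
    if "//" not in url:
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    suffix = "." + base_domain
    if host.endswith(suffix):
        tenant = host[: -len(suffix)]
        # Treat a tenant as a single label directly under the base domain
        if tenant and "." not in tenant:
            return tenant
    return None


print(tenant_from_url("https://acme.viya.sas.com"))
print(tenant_from_url("gelcorp.viya.sas.com"))
```

Because the tenant is part of the address, a session opened against one sub-domain says nothing about another, which is consistent with the lack of single sign-on between tenants.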
No content is visible between tenants
You will notice in both images above that there are brick walls between each of the tenants. This is to illustrate how tenants are completely separated from one another. One tenant cannot see any other tenant’s content, data, users, groups or even that other tenants exist in the system.
One common scenario for multi-tenancy is to keep business units within a single corporation separated. For example, we could set up Sales as a tenant, Finance as a tenant, and Human Resources as a tenant. This works very well if we want to truly segregate the departments' work. But what happens when Sales wants to share a report with Finance or Finance wants to publish a report for the entire company to view?
There are two options for this situation:
• We could export content from one tenant and import it into the other tenant(s). For example, we would export a report from the Sales tenant and import it into the Finance tenant, assuming that data the report needs is available to both. But now we have the report (and data) in two places and if Sales updates the report we must repeat the export/import process.
• We could set up a separate tenant at the company level for shared content. Because identities are not shared between tenants, this would require users to log off the departmental tenant and log on to the corporate tenant to see shared reports.
There are pros and cons to using multi-tenancy for departmental separation and the user experience must be considered.
Higher administrative burden
Managing and maintaining a multi-tenant deployment is more complex than taking care of a single-tenant deployment. Multi-tenancy requires additional CAS servers, additional microservices, possibly additional machines, and multiple administrative personas. The additional resources can complicate backup strategies, authorization models, operating system security, and resource management of shared resources.
There are also more levels of administration, which require an administrator persona for the provider of the environment and separate administrator personas for each tenant. Each of these administration personas has a different scope that determines which aspects of the deployment it can interact with. For example, the provider administrator can see all system resources, all system activity, logs and tenants, but cannot see any tenant content.
Tenant administrators can only see and interact with dedicated tenant resources such as their CAS server and can also manage all tenant content. They cannot, however, see system resources, other tenants, or logs.
Therefore, coordinating management of a complete multi-tenant deployment will require multiple administration personas, careful design of operating system group membership to protect and maintain sufficient access to files and processes, and possibly multiple logins to accomplish administrative tasks.
I have pointed out a handful of key concepts that differ between the usual single-tenant deployments and what you can expect with a multi-tenant deployment of SAS Viya. I am obviously just scratching the surface on these topics. Here are a couple of other resources to check out if you want to dig in further.
Documentation: Multi-tenancy: Concepts
Article: Get ready! SAS Viya 3.4 highlights for the Technical Architect
During each minute you spend reading this article, 18 people will die of cancer. With each tick of the clock, your odds of becoming one of them increase: age is one of the primary risk factors for cancer. Take Nancy. She is a normal, active, healthy woman. Inside her body [...]
As one of SAS' newest systems engineers, recently joining the Americas Artificial Intelligence Team, I’m incredibly excited to gain expertise in artificial intelligence and machine learning. I also look forward to applying my knowledge to enable others to leverage the advanced technologies that SAS offers.
However, as a recent graduate with no prior experience coding in SAS, I expected a steep learning curve and a slow introduction into the world of AI. When I first arrived on campus right after ringing in 2019, I had no idea how fast I would employ SAS technology to create a tangible AI application.
During my second week on the job, I learned about the development and deployment of effective computer vision models. Not only did I create a demo to distinguish between a valid company logo and a counterfeit version, but I was also amazed by the ease and speed at which it could be done.
Read on for a look at the model and how it works.
For this model, I wanted to showcase the potential of using computer vision to protect corporate identity — in this case, a company’s logo. I used the familiar SAS logo as an example of a valid company logo, and for demonstration purposes, I employed the logo used by the Scandinavian Airlines System to represent what a counterfeit could look like. Although this airline is a legitimate business that isn’t knocking off SAS (they are actually one of our customers), the two logos showcase the technology’s ability to distinguish between similar branding. Thus, the model could easily be adapted to detect actual counterfeits.
I first assembled a collection of sample images of both companies’ logos to serve as training data. For each image, I drew a bounding box around the logo to label it as either a “SASlogo” or a “SAS_counterfeit.”
This data was then used to train the model to identify the two logos through machine learning. The model was written with the SAS Deep Learning Python Interface, DLPy, using a YOLO (You Only Look Once) algorithm.
To test the model’s effectiveness in object detection, additional validation images were supplied to verify its ability to identify both valid and counterfeit versions of the logo. In the following image of SAS headquarters in Cary, the model correctly identified the displayed logo as the valid version with a confidence level of 0.61.
One of the key advantages of using the DLPy interface is the ability to easily deploy the model to various SAS engines. I simply created an ASTORE file as the model output, which can be deployed with SAS Event Stream Processing (ESP), Cloud Analytic Services (CAS), or Micro Analytic Service (MAS).
However, the model can also be deployed when SAS technology is not available by creating an ONNX file as the output. This type of file can be used to integrate the model into an iOS application, for example.
As I conclude my first weeks at SAS, I am thrilled by the opportunity to continue to build expertise in SAS technologies and programming. For the next several months, I will be attending the Customer Advisory Academy in Cary, and I look forward to applying the knowledge and skills I gain to the creation of other AI applications in the future!
Want to learn more about object detection in computer vision? Read this blog post!
Expect to lose time if you don't include a data steward in your project until you're reviewing the data model.
The post Data stewards – Why they’re the unsung heroes of the business appeared first on The Data Roundtable.
Have you ever run a regression model in SAS but later realized that you forgot to specify an important option or run some statistical test? Or maybe you intended to generate a graph that visualizes the model, but you forgot? Years ago, your only option was to modify your program and rerun it. Current versions of SAS support a less painful alternative: you can use the STORE statement in many SAS/STAT procedures to save the model to an item store. You can then use the PLM procedure to perform many post-modeling analyses, including performing hypothesis tests, showing additional statistics, visualizing the model, and scoring the model on new data. This article shows four ways to use PROC PLM to obtain results from your regression model.
What is PROC PLM?
PROC PLM enables you to analyze a generalized linear model (or a generalized linear mixed model) long after you quit the SAS/STAT procedure that fits the model. PROC PLM was released with SAS 9.22 in 2010. This article emphasizes four features of PROC PLM:
- You can use the SCORE statement to score the model on new data.
- You can use the EFFECTPLOT statement to visualize the model.
- You can use the ESTIMATE, LSMEANS, SLICE, and TEST statements to estimate parameters and perform hypothesis tests.
- You can use the SHOW statement to display statistical tables such as parameter estimates and fit statistics.
For an introduction to PROC PLM, see "Introducing PROC PLM and Postfitting Analysis for Very General Linear Models" (Tobias and Cai, 2010). The documentation for the PLM procedure includes more information and examples.
To use PROC PLM you must first use the STORE statement in a regression procedure to create an item store that summarizes the model. The following procedures support the STORE statement: GEE, GENMOD, GLIMMIX, GLM, GLMSELECT, LIFEREG, LOGISTIC, MIXED, ORTHOREG, PHREG, PROBIT, SURVEYLOGISTIC, SURVEYPHREG, and SURVEYREG.
The example in this article uses PROC LOGISTIC to analyze data about pain management in elderly patients who have neuralgia. In the PROC LOGISTIC documentation, PROC LOGISTIC fits the model and performs all the post-fitting analyses and visualization. In the following program, PROC LOGISTIC fits the model and stores it to an item store named PainModel. In practice, you might want to store the model to a permanent libref (rather than WORK) so that you can access the model days or weeks later.
Data Neuralgia;
   input Treatment $ Sex $ Age Duration Pain $ @@;
   datalines;
P F 68 1 No B M 74 16 No P F 67 30 No
P M 66 26 Yes B F 67 28 No B F 77 16 No
A F 71 12 No B F 72 50 No B F 76 9 Yes
A M 71 17 Yes A F 63 27 No A F 69 18 Yes
B F 66 12 No A M 62 42 No P F 64 1 Yes
A F 64 17 No P M 74 4 No A F 72 25 No
P M 70 1 Yes B M 66 19 No B M 59 29 No
A F 64 30 No A M 70 28 No A M 69 1 No
B F 78 1 No P M 83 1 Yes B F 69 42 No
B M 75 30 Yes P M 77 29 Yes P F 79 20 Yes
A M 70 12 No A F 69 12 No B F 65 14 No
B M 70 1 No B M 67 23 No A M 76 25 Yes
P M 78 12 Yes B M 77 1 Yes B F 69 24 No
P M 66 4 Yes P F 65 29 No P M 60 26 Yes
A M 78 15 Yes B M 75 21 Yes A F 67 11 No
P F 72 27 No P F 70 13 Yes A M 75 6 Yes
B F 65 7 No P F 68 27 Yes P M 68 11 Yes
P M 67 17 Yes B M 70 22 No A M 65 15 No
P F 67 1 Yes A M 67 10 No P F 72 11 Yes
A F 74 1 No B M 80 21 Yes A F 69 3 No
;

title 'Logistic Model on Neuralgia';
proc logistic data=Neuralgia;
   class Sex Treatment;
   model Pain(Event='Yes')= Sex Age Duration Treatment;
   store PainModel / label='Neuralgia Study';  /* or use mylib.PainModel for permanent storage */
run;
The LOGISTIC procedure models the presence of pain based on a patient's medication (Drug A, Drug B, or placebo), gender, age, and duration of pain. After you fit the model and store it, you can use PROC PLM to perform all sorts of additional analyses, as shown in the subsequent sections.
Use PROC PLM to score new data
An important application of regression models is to predict the response variable for new data. The following DATA step defines three new patients. The first two are females who are taking Drug B. The third is a male who is taking Drug A:
/* 1. Use PLM to score future obs */
data NewPatients;
   input Treatment $ Sex $ Age Duration;
   datalines;
B F 63 5
B F 79 16
A M 74 12
;

proc plm restore=PainModel;
   score data=NewPatients out=NewScore predicted LCLM UCLM / ilink; /* ILINK gives probabilities */
run;

proc print data=NewScore;
run;
The output shows the predicted pain level for the three patients. The younger woman is predicted to have a low probability (0.01) of pain. The model predicts a moderate probability of pain (0.38) for the older woman. The model predicts a 64% chance that the man will experience pain.
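The ILINK option matters because a logistic model is linear on the log-odds scale, and the inverse link converts that linear predictor into a probability. The following Python sketch illustrates the arithmetic only; it does not reproduce the fitted model, and the function names are my own. It takes the three probabilities quoted above and shows the linear-predictor values they correspond to.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    """Logit link: map a probability to the log-odds scale."""
    return math.log(p / (1.0 - p))


# Probabilities quoted in the text; the linear predictors are derived from
# them here, not read from the PROC PLM output.
for p in (0.01, 0.38, 0.64):
    eta = logit(p)
    print(f"probability {p:.2f} <-> linear predictor {eta:+.3f}")
```

Without the inverse link, the predicted values would be on the log-odds scale, which is harder to interpret directly.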
Notice that the PROC PLM statement does not use the original data. In fact, the procedure does not support a DATA= option but instead uses the RESTORE= option to read the item store. The PLM procedure cannot create plots or perform calculations that require the data because the data are not part of the item store.
Use PROC PLM to visualize the model
I've previously written about how to use the EFFECTPLOT statement to visualize regression models. The EFFECTPLOT statement has many options. However, because PROC PLM does not have access to the original data, the EFFECTPLOT statement in PROC PLM cannot add observations to the graphs.
Although the EFFECTPLOT statement is supported natively in the LOGISTIC and GENMOD procedures, it is not directly supported in other procedures such as GLM, MIXED, GLIMMIX, PHREG, or the SURVEY procedures. Nevertheless, because these procedures support the STORE statement, you can use the EFFECTPLOT statement in PROC PLM to visualize the models for these procedures. The following statement uses the EFFECTPLOT statement to visualize the probability of pain for female and male patients that are taking each drug treatment:
/* 2. Use PROC PLM to create an effect plot */
proc plm restore=PainModel;
   effectplot slicefit(x=Age sliceby=Treatment plotby=Sex);
run;
The graphs summarize the model. For both men and women, the probability of pain increases with age. At a given age, the probability of pain is lower for the non-placebo treatments, and the probability is slightly lower for the patients who use Drug B as compared to Drug A. These plots are shown at the mean value of the Duration variable.
Use PROC PLM to compute contrasts and other estimates
One of the main purposes of PROC PLM is to perform postfit estimates and hypothesis tests. The simplest is a pairwise comparison that estimates the difference between two levels of a classification variable. For example, in the previous graph the probability curves for the Drug A and Drug B patients are close to each other. Is there a significant difference between the two effects? The following ESTIMATE statement estimates the (B vs A) effect. The EXP option exponentiates the estimate so that you can interpret the 'Exponentiated' column as the odds ratio between the drug treatments. The CL option adds confidence limits for the estimate of the odds ratio. The confidence interval for the odds ratio contains 1, so you cannot conclude that Drug B is significantly more effective than Drug A at reducing pain.
/* 3. Use PROC PLM to create contrasts and estimates */
proc plm restore=PainModel;
   /* 'Exponentiated' column is odds ratio between treatments */
   estimate 'Pairwise B vs A' Treatment 1 -1 / exp CL;
run;
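The reasoning behind the EXP and CL options can be sketched in a few lines of Python: exponentiate the log-odds estimate and its confidence limits, then check whether the resulting interval contains 1. The numbers below are hypothetical placeholders chosen for illustration, not the values that PROC PLM reports for the neuralgia model, and `odds_ratio_summary` is a made-up helper name.

```python
import math


def odds_ratio_summary(estimate, lower, upper):
    """Exponentiate a log-odds estimate and its confidence limits."""
    or_low, or_high = math.exp(lower), math.exp(upper)
    return {
        "odds_ratio": math.exp(estimate),
        "lower": or_low,
        "upper": or_high,
        # An interval that contains 1 is consistent with "no difference"
        "contains_one": or_low <= 1.0 <= or_high,
    }


# Hypothetical log-odds estimate and limits, NOT output from the PLM step
summary = odds_ratio_summary(estimate=-0.5, lower=-2.0, upper=1.0)
print(summary)
```

An interval on the log-odds scale that contains 0 always maps to an odds-ratio interval that contains 1, which is why the two scales lead to the same conclusion.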
Use PROC PLM to display statistics from the analysis
One of the more useful features of PROC PLM is that you can use the SHOW statement to display tables of statistics from the original analysis. If you want to see the ParameterEstimates table again, you can do that (SHOW PARAMETERS). You can even display statistics that you did not compute originally, such as an estimate of the covariance of the parameters (SHOW COVB). Lastly, if you have the item store but have forgotten what program you used to generate the model, you can display the program (SHOW PROGRAM). The following statements demonstrate the SHOW statement. The results are not shown.
/* 4. Use PROC PLM to show statistics or the original program */
proc plm restore=PainModel;
   show Parameters COVB Program;
run;
In summary, the STORE statement in many SAS/STAT procedures enables you to store various regression models into an item store. You can use PROC PLM to perform additional postfit analyses on the model, including scoring new data, visualizing the model, hypothesis testing, and (re)displaying additional statistics. This technique is especially useful for long-running models, but it is also useful for confidential data because the data are not needed for the postfit analyses.
The post 4 reasons to use PROC PLM for linear regression models in SAS appeared first on The DO Loop.
Since we added the new "Recommended by SAS" widget in the SAS Support Communities, I often find myself diverted to topics that I probably would not have found otherwise. This is how I landed on this question and solution from 2011 -- "How to convert 5ft. 4in. (Char) into inches (Num)". While the question was deftly answered by my friend (and SAS partner) Jan K. from the Netherlands, the topic inspired me to take it a step further here.
Jan began his response by throwing a little shade on the USA:
Short of moving to a country that has a decent metric system in place, I suggest using a regular expression.
On behalf of my nation I just want to say that, for the record, we tried. But we did not get very far with our metrication, so we are stuck with the imperial system for most non-scientific endeavors.
Matching patterns with a regular expression
Regular expressions are a powerful method for finding specific patterns in text. The syntax of regular expressions can be a challenge for those getting started, but once you've solved a few pattern-recognition problems by using them, you'll never go back to your old methods.
Beginning with the solution offered by Jan, I extended this program to read in a "ft. in." measurement, convert to the component number values, express the total value in inches, and then convert the measurement to centimeters. I know that even with my changes, we can think of patterns that might not be matched. But allow me to describe the updates:
- Long-time users of
- The PRXPOSN function returns the nth "capture buffer" from the pattern match. Capture buffers are identified in the pattern by parentheses -- you count each open parenthesis to arrive at the expected buffer. So the first buffer matches on the first sequence of digits: (\d*). The second buffer is the optional whitespace between ft. and in.: (\s*). Buffer 3 is the entire pattern for "n in.": ((\d?\.?\d?)in.). Finally, buffer 4 is an "inner" capture group of buffer 3, containing just the sequence of digits with optional decimal for inches: (\d?\.?\d?)
- The PRXPOSN function returns the text value of the match, so we have to use the INPUT function to convert that to a SAS numeric value.
- Finally, I added the calculations to convert to total inches, and then centimeters.
Here's my program, followed by the result:
data measure;
  length original $ 25
         feet 8 inches 8
         total_inches 8 total_cm 8;
  /* constant regex is parsed just once */
  re = prxparse('/(\d*)ft.(\s*)((\d?\.?\d?)in.)?/');
  input;
  original = _infile_;
  if prxmatch(re, original) then
    do;
      feet = input ( prxposn(re, 1, original), best12.);
      inches = input ( prxposn(re, 4, original), best12.);
      if missing(inches) and not missing(feet) then
        inches=0;
    end;
  else
    original = "[NO MATCH] " || original;
  total_inches = (feet*12) + inches;
  total_cm = total_inches * 2.54;
  drop re;
cards;
5ft. 4in.
4ft 0in.
6ft. 10in.
3ft.2in.
4ft.
6ft. 1.5in.
20ft. 11in.
25ft. 6.5in.
Soooooo big
;
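For readers who want to experiment with the capture groups outside of SAS, the same pattern can be tried in Python's `re` module, where `group(n)` plays the role of PRXPOSN. This is a rough sketch rather than a port of the DATA step: the function names are my own, and it returns `None` for input without feet digits instead of keeping a row with missing values.

```python
import re

# Same pattern as the PRXPARSE call, without the surrounding slashes
PATTERN = re.compile(r"(\d*)ft.(\s*)((\d?\.?\d?)in.)?")


def _to_number(text):
    """Convert captured text to float; None for empty or unparsable text."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def parse_measure(text):
    """Return (feet, inches, total_inches, total_cm), or None if no match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    feet = _to_number(m.group(1))  # group 1: digits before "ft."
    if feet is None:
        return None
    # group 4: digits (with optional decimal) before "in."; default to 0
    inches = _to_number(m.group(4)) or 0.0
    total_inches = feet * 12 + inches
    return feet, inches, total_inches, total_inches * 2.54


for sample in ["5ft. 4in.", "4ft.", "6ft. 1.5in.", "Soooooo big"]:
    print(repr(sample), "->", parse_measure(sample))
```

Note that group 4 is an inner group of group 3, so it is empty whenever the optional "in." part is absent, which is why the inches value needs a default.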
Other tools to help with regular expressions
The Internet offers a plethora of online tools to help developers build and test regular expression syntax. Here's a screenshot from RegExr.com, which I used to test some aspects of my program.
Tools like these provide wonderful insight into the capture groups and regex directives that will influence the pattern matching. They are part tutorial, part workbench for regex crafters at all levels.
Many of these tools also include community contributions for matching common patterns. For example, many of us often need to match/parse data with e-mail addresses, phone numbers, and other tokens as data fields. Sites like RegExr.com include the syntax for many of these common patterns that you can copy and use in your SAS programs.
- Using a regular expression to validate a SAS variable name
- In his book High-Performance SAS Coding, Christian Graffeuille (ChrisNZ on the communities) devotes a chapter to using regular expressions in SAS. (He also answers a TON of regex questions on the community.)
- Ron Cody also teaches about regular expressions for data cleansing -- see his blog post and even more in his book, Cody's Data Cleansing Techniques using SAS.
The post Convert a text-based measurement to a number in SAS appeared first on The SAS Dummy.
Creating a map with SAS Visual Analytics begins with the geographic variable. The geographic variable is a special type of data variable where each item has a latitude and longitude value. For maximum flexibility, VA supports three types of geography variables:
- Predefined
- Custom coordinates
- Custom polygons
This is the first in a series of posts that will discuss each type of geography variable and their creation. The predefined geography variable is the easiest and quickest way to begin and will be the focus of this post.
SAS Visual Analytics comes with nine (9) predefined geographic lookup types. This lookup method requires that your data contains a variable matching one of these nine data types:
- Country or Region Names – Full proper name of a country or region (ISO 3166-1)
- Country or Region ISO 2-Letter Codes – Alpha-2 country code (ISO 3166-1)
- Country or Region ISO Numeric Codes – Numeric-3 country code (ISO 3166-1)
- Country or Region SAS Map ID Values – SAS ID values from MAPSGFK continent data sets
- Subdivision (State, Province) Names – Full proper name for level 2 admin regions (ISO 3166-2)
- Subdivision (State, Province) SAS Map ID Values – SAS ID values from MAPSGFK continent data sets (Level 1)
- US State Names – Full proper name for US State
- US State Abbreviations – Two letter US State abbreviation
- US Zip Codes – A 5-digit US zip code (no regions)
Once you have identified a variable in your dataset matching one of these types, you are ready to begin. For our example map, the dataset 'Crime' and variable 'State name' will be used. Let’s get started.
- Begin by opening VA and navigating to the Data panel on the left of the application.
- Select the desired dataset and locate a variable that matches one of the predefined lookup types discussed above. Click the down arrow to the right of the variable and select ‘Geography’ from the Classification dropdown menu.
- The ‘Edit Geography Item’ window will open. Depending upon the type of geography variable selected, some of the options on this dialog will vary. The 'Name' textbox is common for all types and will contain the variable selected from your dataset. Edit this label as needed to make it more user friendly for your intended audience.
- The ‘Geography data type’ drop down list is where you select the desired type of geography variable. In this example, we are using the default predefined option.
- Locate the 'Name or code context' dropdown list. Select the type of predefined variable that matches the data type of the variable chosen from your data. Once selected, VA scans your data and does an internal lookup on each data item. This process identifies latitude and longitude values for each item of your dataset. Lookup results are shown on the right of the window as a percentage and a thumbnail size map. The thumbnail map displays the first 100 matches.
- If there are any unmatched data items, the first 5 will be displayed. This may provide a better understanding of your data. In this example, it is clear from the variable name which type should be selected (US State Names). However, in most cases that choice will not be this obvious. The lesson here: know your data!
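If you want to know in advance how well a column is likely to match, you can approximate the lookup step yourself. The Python sketch below reports the percentage of distinct values found in a reference set and the first five that are not. Everything in it is an assumption made for illustration: the `match_report` helper, the tiny state list, and the case-insensitive comparison are mine, and VA's internal lookup may behave differently.

```python
# Tiny illustrative subset; a real check would use the full list of names
US_STATES = {"north carolina", "virginia", "texas", "ohio"}


def match_report(values, lookup, max_unmatched=5):
    """Return (percent matched, first few unmatched) over distinct values."""
    distinct = list(dict.fromkeys(values))  # keep order, drop duplicates
    unmatched = [v for v in distinct if v.strip().lower() not in lookup]
    matched = len(distinct) - len(unmatched)
    pct = 100.0 * matched / len(distinct) if distinct else 0.0
    return pct, unmatched[:max_unmatched]


pct, unmatched = match_report(
    ["Texas", "Ohio", "N. Carolina", "Virginia", "Texas"], US_STATES
)
print(f"{pct:.0f}% matched; unmatched sample: {unmatched}")
```

A low match percentage is usually a sign that the wrong lookup type was chosen or that the values need cleaning, such as abbreviations mixed in with full names.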
Once you are satisfied with the matched results, click the OK button to continue. You should see a new section in the Data panel labeled ‘Geography’. The name of the variable will be displayed beside a globe icon. This icon represents the geography variable and provides confirmation it was created successfully.
Now that the geography variable has been created, we are ready to create a map. To do this, simply drag it from the Data panel and drop it on the VA report canvas. The auto-map feature of VA will recognize the geography variable and create a bubble map with an OpenStreetMap background. Congratulations! You have just created your first map in VA.
The concept of a geography variable was introduced in this post as the foundation for creating all maps in VA. Using the predefined geography variable is the quickest way to get started with Geo maps. In situations when the predefined type is not possible, using one of VA's custom geography types becomes necessary. These scenarios will be discussed in future blog posts.
Each day, more than 130 Americans die from opioid overdoses. Combating the opioid epidemic begins with understanding it, and that begins with data. SAS recently partnered with graduate students from Carnegie Mellon University (CMU)'s Heinz College of Information Systems and Public Policy to understand how data mining and machine [...]