11月 082017
 

A SAS programmer wanted to display a table in which the rows have different formats. An example is shown below. The programmer wanted columns that represent statistics and rows that represent variables. She wanted to display formats (such as DOLLAR) for some variables—but only for certain statistics. For example, the number of nonmissing observations (N) should never use the format of the variable, whereas the minimum, maximum, and mean values should.

Format rows of a table

In SAS you can easily apply a format to a column, but not to a row. However, SAS provides two techniques that enable you to control the appearance of cells in tables:

Both PROC TEMPLATE and PROC REPORT have a learning curve. If you are in a hurry, an alternative solution is to use the DATA step to create a factoid. Recall that a factoid is an ODS table that contains character values and (formatted) numbers in the same column. A factoid can display mixed-type data by converting every value to a string. This conversion process gives you complete control over the format of each cell in the table. Consequently, you can use context to format some numbers differently from other numbers.

Creating a custom factoid

Suppose that you want to generate the table shown at the top of this article. You can obtain the statistics from PROC MEANS and then transpose the data. The following statements produce an output data set that contains descriptive statistics for three variables in the Sashelp.Cars data:

proc means data=sashelp.cars noprint;
var Cylinders EngineSize MSRP;
output out=stats(drop=_TYPE_ _FREQ_
                 rename=(_STAT_=Label1)
                 where=(Label1 ^= "STD") );
run;
 
proc print data=stats noobs;
run;
Descriptive statistics with formats

I have highlighted the first row of the output data to indicate why this output is not suitable for a report. The output variables retain the format of the original variables. However, the 'N' statistic is the sample size, and a sample of size '$428' does not make sense! Even '428.000' is a strange value to see for a sample size. It would be great to format the first row with an intrger format such as 8.0.

The next DATA step creates three character variables (CVALUE1-CVALUE3) that contain formatted values of the three numerical variables in the data. The SELECT-WHEN statement does the following:

  • For each observation and for each of the three numerical variables:
    • If the LABEL1 variable is 'N', then apply the 8.0 format to the value of the numerical variable.
    • Otherwise use the VVALUE function to get the formatted value for the numerical variable.
data Factoid;
set stats;
array nValue[3] Cylinders EngineSize MSRP;      /* numerical variables */
array cValue[3] $12.;                           /* cValue[i] is formatted version of nValue[i] */
/* with a little effort you could assign labels dynamically */
label cValue1="Cylinders" cValue2="EngineSize" cValue3="MSRP";
 
do i = 1 to dim(nValue);
   select (Label1);
      when ('N')    cValue[i] = put(nvalue[i], 8.0);
      otherwise     cValue[i] = vvalue(nvalue[i]);
   end;
end;
run;
 
proc print data=Factoid label noobs;
   var Label1 cValue:;
run;

The output displays the character columns, which contain formatted numbers. With additional logic, you could display fewer decimal values for the MEAN row.

Transpose the factoid

The previous table contains the information that the programmer wants. However, the programmer asked to display the transpose of the previous table. For completeness, here are SAS statements that transpose the table:

proc transpose data=Factoid out=TFactoid;
   var cValue:;
   ID Label1;
   IDLABEL Label1;
run;
 
proc print data=TFactoid(drop=_NAME_) noobs label;
   label _LABEL_ = "Variable";
run;

The final table is displayed at the top of the article.

Summary

In summary, this article shows one way to display a column of data in which each cell potentially has a different format. The trick is to create a factoid, which means that you create a character copy of numeric data. As you create the copy, you can use the PUT function to apply a custom format, or you can use the VVALUE function to get the existing formatted value.

In general, you should try to avoid creating copies of data, so use PROC TEMPLATE and PROC REPORT if you know how. However, if you don't know how to use those tools, the factoid provides an alternative way to control the formatted values that appear in a table. Although not shown in this article, a factoid also enables you to display character and numeric values in the same column.

The post How to format rows of a table in SAS appeared first on The DO Loop.

11月 072017
 

SAS Tips and TricksThere is certainly no shortage of terrific tips and tricks in various SAS blogs from some of our most distinguished SAS in-house experts. But, there's another group of equally qualified experts who don't often get to share their expertise on this channel: our customers. So, I went on a quest to get the inside scoop from various SAS users, polling Friends of SAS members to get their feedback on their favorite SAS tips.

We asked a few of these Friends of SAS members who are regular SAS users to share with us their top SAS tips and tricks for improving performance or something they wished they had known earlier in their SAS career. Based on that, we got a wide range of tips and tricks from a number of different SAS users – ranging from novice to expert and across various industries and product users. Check out some of them below:

FUNCTIONS

Functions are either built into SAS itself or you can write your own customized code that act in the same manner, all of which help in analyzing and processing data. There are a variety of function categories that include mathematical, date and time, character, truncation, and miscellaneous. Using functions makes us more efficient, and we don’t have to re-invent the wheel every time we want to figure something out. With this being said, some of our regular SAS users have a thing or two to say about dealing with functions that may help you out:

“Before you program any complex code, look for a SAS function that will do the task for you.”
     - John Ladds, Past President, OASUS

“Insert a line break in a concatenated string, such as: manylines = catx('0a'x,a,b,c);”
     - Aroop Ghosh, Principal Consultant, Webtalk Communications

“Use the lag function to create time related variables, for example, in time punch data”
     - Yolanda, Analyst, TD

“A good trick that I have recently learnt [sic]which can make the code less wordier is using the functions IFN and IFC as an alternative to IF THEN ELSE statements in conditional processing.”
     - Sunny Giroti, Master of Business Analytics Candidate, Schulich School of Business

“IFN can be used in place of IF THEN ELSE to shorten code”
     - Neil Menezes, Senior Business Anlyst, CTFS

“Ron Cody’s link from SAS.COM. It has many SAS function examples.”
     - John Lam, CIBC

SYNTAX/SHORTCUTS/EFFICIENCIES

You know what they say: time is money. So for a SAS programmer, finding shortcuts and ways to work more efficiently and faster are important to get a job done quicker. Here are a few ways SAS users think can make your life easy while working with SAS:

“Use missover to ensure no records are skipped when reading in a file”
     - Scott Bellefeuille, IT Solutions Developer (Merchant Services), TD Bank

“Pressing keys 'Ctrl'+'/' to comment out a line of code.”
     - Bunce Leung, Execution Manager, RBC

“Variable Lists - being able to refer to variables using double dashes to indicate all variables between first and last in a dataset is super useful for many procs. The later versions of being able to use the prefix and colon to indicate all datasets with a prefix is a great shortcut as well.”
     - Fareeza Khurshed, Manager (Statistical Services), Alberta Treasury Board and Finance

“I like to use PERL in SAS for finding stuff in character variables.”
     - Peter Timusk, Statistics Officer, Statistics Canada

“Title "SAS can give you an Inheritance". Have an ODBC driver on your local PC but not on a remote server? No problem. Use rsubmit with the inheritlib option. Your remote server will now inherit the ODBC driver and be able to access a database you thought you could only reach with your PC.”
     - Horst Wolter, Manager, TD Bank

“If you want to speed the processing of your program. Run your join statements on the "work" library. It is must faster.”
     - Estela Tavares, Economist, Statistics Canada

“When dealing with probability, can logistic be used in all cases? Trick Q - as A is N0. What about the times, probability is 0 and 1. What if the data is heavily distributed on 1s and 0s.”
     - Mukul Pandey, Student Business Analytics, Schulich School of Business

“Proc tabulate can perform descriptive statistics better than proc freq and proc means.”
     - Taha Azizi, Senior Business Insight Analyst, TD

Your turn

Were any of these tips and tricks useful? Do you use them already? What are some of your top SAS tips and tricks? Please be sure to share in the comments below!

Looking for more tips and tricks? Check out this video featuring six Canadian SAS programmers, including a few Friends of SAS members, who share some of their favourite SAS programming tips.

About Friends of SAS

If you’re not familiar with Friends of SAS, it is an exclusive online community available only to our Canadian SAS customers and partners to recognize and show our appreciation for their affinity to SAS. Members complete activities called 'challenges' and earn points that can be redeemed for rewards. There are opportunities to build powerful connections, gain privileged access to SAS resources and events, and boost your learning and development of SAS all in a fun environment.

Interested in learning more about Friends of SAS? Feel free to email myself at Natasha.Ulanowski@sas.com or Martha.Casanova@sas.com with any questions or more details.

SAS tips and tricks: Users-tell-all edition was published on SAS Users.

11月 062017
 

When she was a little girl, SAS Senior User Experience Designer Khaliah Cothran’s parents gave her a creativity building set. She fell in love with building models of all kinds, from cars and trains to merry-go-rounds. And just like that, an engineer was born. A few years later, Cothran happened [...]

The post Building the STEM pipeline one story at a time appeared first on SAS Analytics U Blog.

11月 062017
 
A factoid in SAS is a table that displays numeric and chanracter values in a single column

Have you ever seen the "Fit Summary" table from PROC LOESS, as shown to the right? Or maybe you've seen the "Model Information" table that is displayed by some SAS analytical procedures? These tables provide brief interesting facts about a statistical procedure, hence they are called factoids.

In SAS, a "factoid" has a technical meaning:

  • A factoid is an ODS table that can display numerical and character values in the same column.
  • A factoid displays row headers in a column. In most ODS styles, the row headers are displayed in the first column by using a special style, such as a light-blue background color in the HTMLBlue style.
  • A factoid does not display column headers because the columns display disparate and potentially unrelated information.

I want to emphasize the first item in the list. Since variables in a SAS data set must be either character or numeric, you might wonder how to access the data that underlies a factoid. You can use the ODS OUTPUT statement to look at the data object behind any SAS table, as shown below:

proc loess data=sashelp.cars plots=none;
   model mpg_city = weight;
   ods output FitSummary=Fit(drop=SmoothingParameter);
run;
 
proc print data=Fit noobs;
run;
The data object that underlies a factoid in SAS

The PROC PRINT output shows how the factoid manages to display characters and numbers in the same column. The underlying data object contains three columns. The LABEL1 column contains the row headers, which identify each row. The CVALUE1 column is the column that is displayed in the factoid. It is a character column that contains character strings and formatted copies of the numbers in the NVALUE1 column. The NVALUE1 column contains the raw numeric value of every number in the table. Missing values represent rows for which the table displays a character value.

All factoids use the same naming scheme and display the LABEL1 and CVALUE1 columns. The form of the data is important when you want to use the numbers from a factoid in a SAS program. Do not use the CVALUE1 character variable to get numbers! Those values are formatted and possibly truncated, as you can see by looking at the "Smoothing Parameter" row. Instead, read the numbers from the NVALUE1 variable, which stores the full double-precision number.

For example, if you want to use the AICC statistic (the last row) in a program, read it from the NVALUE1 column, as follows:

data _NULL_;
   set Fit(where=( Label1="AICC" ));  /* get row with AICC value */
   call symputx("aicc", NValue1);     /* read value of NValue1 variable into a macro */
run;
%put &=aicc;                          /* display value in log */
AICC=3.196483775

Some procedures produce factoids that display multiple columns. For example, PROC CONTENTS creates the "Attributes" table, which is a factoid that displays four columns. The "Attributes table displays two columns of labels and two columns of values. When you use the ODS OUTPUT statement to create a data set, the variables for the first two columns are LABEL1, CVALUE1, and NVALUE1. The variables for the second two columns are LABEL2, CVALUE2, and NVALUE2.

Be aware that the values in the LABEL1 (and LABEL2) columns depend on the LOCALE= option for your SAS session. This means that some values in the LABEL1 column might be translated into French, German, Korean, and so forth. When you use a WHERE clause to extract a value, be aware that the WHERE clause might be invalid in other locales. If you suspect that your program will be run under multiple locales, use the _N_ automatic variable, such as if _N_=14 then call symputx("aicc", NValue1);. Compared with the WHERE clause, using the _N_ variable is less readable but more portable.

Now that you know what a factoid is, you will undoubtedly notice them everywhere in your SAS output. Remember that if you need to obtain numerical values from a factoid, use the ODS OUTPUT statement to create a data set. The NVALUE1 variable contains the full double-precision numbers in the factoid. The CVALUE1 variable contains character values and formatted versions of the numbers.

The post What is a factoid in SAS? appeared first on The DO Loop.

11月 042017
 

Internet of Things for dementiaDementia describes different brain disorders that trigger a loss of brain function. These conditions are all usually progressive and eventually severe. Alzheimer's disease is the most common type of dementia, affecting 62 percent of those diagnosed. Other types of dementia include; vascular dementia affecting 17 percent of those diagnosed, mixed dementia affecting 10 percent of those diagnosed.

Dementia Statistics

There are 850,000 people with dementia in the UK, with numbers set to rise to over 1 million by 2025. This will soar to 2 million by 2051. 225,000 will develop dementia this year, that’s one every three minutes. 1 in 6 people over the age of 80 have dementia. 70 percent of people in care homes have dementia or severe memory problems. There are over 40,000 people under 65 with dementia in the UK. More than 25,000 people from black, Asian and minority ethnic groups in the UK are affected.

Cost of treating dementia

Two-thirds of the cost of dementia is paid by people with dementia and their families. Unpaid careers supporting someone with dementia save the economy £11 billion a year. Dementia is one of the main causes of disability later in life, ahead of cancer, cardiovascular disease and stroke (Statistic can be obtained from here - Alzheimer’s society webpage). To tackle dementia requires a lot of resources and support from the UK government which is battling to find funding to support NHS (National Health Service). During 2010–11, the NHS will need to contribute £2.3bn ($3.8bn) of the £5bn of public sector efficiency savings, where the highest savings are expected primarily from PCTs (Primary care trusts). In anticipation for tough times ahead, it is in the interest of PCTs to obtain evidence-based knowledge of the use of their services (e.g. accident & emergency, inpatients, outpatients, etc.) based on regions and patient groups in order to reduce the inequalities in health outcomes, improve matching of supply and demand, and most importantly reduce costs generated by its various services.

Currently in the UK, general practice (GP) doctors deliver primary care services by providing treatment and drug prescriptions and where necessary patients are referred to specialists, such as for outpatient care, which is provided by local hospitals (or specialised clinics). However, general practitioners (GPs) are limited in terms of size, resources, and the availability of the complete spectrum of care within the local community.

Solution in sight for dementia patients

There is the need to prevent or avoid delay for costly long-term care of dementia patients in nursing homes. Using wearables, monitors, sensors and other devices, NHS based in Surrey is collaborating with research centres to generate ‘Internet of Things’ data to monitor the health of dementia patients at the comfort of staying at home. The information from these devices will help people take more control over their own health and wellbeing, with the insights and alerts enabling health and social care staff to deliver more responsive and effective services. (More project details can be obtained here.)

Particle filtering

One method that could be used to analyse the IoT generated data is particle filtering methods. IoT dataset naturally fails within Bayesian framework. This method is very robust and account for the combination of historical and real-time data in order to make better decision.

In Bayesian statistics, we often have a prior knowledge or information of the phenomenon/application been modelled. This allows us to formulate a Bayesian model, that is, prior distribution, for the unknown quantities and likelihood functions relating these quantities to the observations (Doucet et al., 2008). As new evidence becomes available, we are often interested in updating our current knowledge or posterior distribution. Using State Space Model (SSM), we are able to apply Bayesian methods to time series dataset. The strength of these methods however lies in the ability to properly define our SSM parameters appropriately; otherwise our model will perform poorly. Many SSMs suffers from non-linearity and non-Gaussian assumptions making the maximum likelihood difficult to obtain when using standard methods. The classical inference methods for nonlinear dynamic systems are the extended Kalman filter (EKF) which is based on linearization of a state and trajectories (e.g. Johnson et al., 2008 ). The EKF have been successfully applied to many non-linear filtering problems. However, the EKF is known to fail if the system exhibits substantial nonlinearity and/or if the state and the measurement noise are significantly non-Gaussian.

An alternative method which gives a good approximation even when the posterior distribution is non-Gaussian is a simulation based method called Monte Carlo. This method is based upon drawing observations from the distribution of the variable of interest and simply calculating the empirical estimate of the expectation.

To apply these methods to time series data where observation arrives in sequential order, performing inference on-line becomes imperative hence the general term sequential Monte Carlo (SMC). A SMC method encompasses range of algorithms which are used for approximate filtering and smoothing. Among this method is particle filtering. In most literature, it has become a general tradition to present particle filtering as SMC, however, it is very important to note this distinction. Particle filtering is simply a simulation based algorithm used to approximate complicated posterior distributions. It combines sequential importance sampling (SIS) with an addition resampling step. SMC methods are very flexible, easy to implement, parallelizable and applicable in very general settings. The advent of cheap and formidable computational power in conjunction with some recent developments in applied statistics, engineering and probability, have stimulated many advancements in this field (Cappe et al. 2007). Computational simplicity in the form of not having to store all the data is also an additional advantage of SMC over MCMC (Markov Chain Monte Carlo).

References

[1] National Health Service England, http://www.nhs.uk/NHSEngland/aboutnhs/Pages/Authoritiesandtrusts.aspx (accessed 18 August 2009).

[2] Pincus SM. Approximate entropy as a measure of system complexity. Proc Natl Acad Sci USA 88: 2297–2301, 1991

[3] Pincus SM and Goldberger AL. Physiological time-series analysis: what does regularity quantify?  Am J Physiol Heart Circ Physiol 266: H1643–H1656, 1994

[4] Cappe, O.,S. Godsill, E. Moulines(2007).An overview of existing methods and recent advances in sequential Monte Carlo. Proceedings of the IEEE. Volume 95, No 5, pp 899-924.

[5] Doucet, A., A. M. Johansen(2008). A Tutorial on Particle Filtering and Smoothing: Fifteen years Later.

[6] Johansen, A. M. (2009).SMCTC: Sequential Monte Carlo in C++. Journal of Statistical Software. Volume 30, issue 6.

[7] Rasmussen and Z.Ghahramani (2003). Bayesian Monte Carlo. In S. Becker and K. Obermayer, editors, Advances in Neural Information Processing Systems, volume 15.

[8] Osborne A. M.,Duvenaud D., GarnettR.,Rasmussen, C.E., Roberts, C.E.,Ghahramani, Z. Active Learning of Model Evidence Using Bayesian Quadrature.

Scaling Internet of Things for dementia using Particle filters was published on SAS Users.

11月 042017
 

Internet of Things for dementiaDementia describes different brain disorders that trigger a loss of brain function. These conditions are all usually progressive and eventually severe. Alzheimer's disease is the most common type of dementia, affecting 62 percent of those diagnosed. Other types of dementia include; vascular dementia affecting 17 percent of those diagnosed, mixed dementia affecting 10 percent of those diagnosed.

Dementia Statistics

There are 850,000 people with dementia in the UK, with numbers set to rise to over 1 million by 2025. This will soar to 2 million by 2051. 225,000 will develop dementia this year, that’s one every three minutes. 1 in 6 people over the age of 80 have dementia. 70 percent of people in care homes have dementia or severe memory problems. There are over 40,000 people under 65 with dementia in the UK. More than 25,000 people from black, Asian and minority ethnic groups in the UK are affected.

Cost of treating dementia

Two-thirds of the cost of dementia is paid by people with dementia and their families. Unpaid careers supporting someone with dementia save the economy £11 billion a year. Dementia is one of the main causes of disability later in life, ahead of cancer, cardiovascular disease and stroke (Statistic can be obtained from here - Alzheimer’s society webpage). To tackle dementia requires a lot of resources and support from the UK government which is battling to find funding to support NHS (National Health Service). During 2010–11, the NHS will need to contribute £2.3bn ($3.8bn) of the £5bn of public sector efficiency savings, where the highest savings are expected primarily from PCTs (Primary care trusts). In anticipation for tough times ahead, it is in the interest of PCTs to obtain evidence-based knowledge of the use of their services (e.g. accident & emergency, inpatients, outpatients, etc.) based on regions and patient groups in order to reduce the inequalities in health outcomes, improve matching of supply and demand, and most importantly reduce costs generated by its various services.

Currently in the UK, general practice (GP) doctors deliver primary care services by providing treatment and drug prescriptions and where necessary patients are referred to specialists, such as for outpatient care, which is provided by local hospitals (or specialised clinics). However, general practitioners (GPs) are limited in terms of size, resources, and the availability of the complete spectrum of care within the local community.

Solution in sight for dementia patients

There is the need to prevent or avoid delay for costly long-term care of dementia patients in nursing homes. Using wearables, monitors, sensors and other devices, NHS based in Surrey is collaborating with research centres to generate ‘Internet of Things’ data to monitor the health of dementia patients at the comfort of staying at home. The information from these devices will help people take more control over their own health and wellbeing, with the insights and alerts enabling health and social care staff to deliver more responsive and effective services. (More project details can be obtained here.)

Particle filtering

One method that could be used to analyse the IoT generated data is particle filtering methods. IoT dataset naturally fails within Bayesian framework. This method is very robust and account for the combination of historical and real-time data in order to make better decision.

In Bayesian statistics, we often have a prior knowledge or information of the phenomenon/application been modelled. This allows us to formulate a Bayesian model, that is, prior distribution, for the unknown quantities and likelihood functions relating these quantities to the observations (Doucet et al., 2008). As new evidence becomes available, we are often interested in updating our current knowledge or posterior distribution. Using State Space Model (SSM), we are able to apply Bayesian methods to time series dataset. The strength of these methods however lies in the ability to properly define our SSM parameters appropriately; otherwise our model will perform poorly. Many SSMs suffers from non-linearity and non-Gaussian assumptions making the maximum likelihood difficult to obtain when using standard methods. The classical inference methods for nonlinear dynamic systems are the extended Kalman filter (EKF) which is based on linearization of a state and trajectories (e.g. Johnson et al., 2008 ). The EKF have been successfully applied to many non-linear filtering problems. However, the EKF is known to fail if the system exhibits substantial nonlinearity and/or if the state and the measurement noise are significantly non-Gaussian.

An alternative method which gives a good approximation even when the posterior distribution is non-Gaussian is a simulation based method called Monte Carlo. This method is based upon drawing observations from the distribution of the variable of interest and simply calculating the empirical estimate of the expectation.

To apply these methods to time series data where observation arrives in sequential order, performing inference on-line becomes imperative hence the general term sequential Monte Carlo (SMC). A SMC method encompasses range of algorithms which are used for approximate filtering and smoothing. Among this method is particle filtering. In most literature, it has become a general tradition to present particle filtering as SMC, however, it is very important to note this distinction. Particle filtering is simply a simulation based algorithm used to approximate complicated posterior distributions. It combines sequential importance sampling (SIS) with an addition resampling step. SMC methods are very flexible, easy to implement, parallelizable and applicable in very general settings. The advent of cheap and formidable computational power in conjunction with some recent developments in applied statistics, engineering and probability, have stimulated many advancements in this field (Cappe et al. 2007). Computational simplicity in the form of not having to store all the data is also an additional advantage of SMC over MCMC (Markov Chain Monte Carlo).

References

[1] National Health Service England, http://www.nhs.uk/NHSEngland/aboutnhs/Pages/Authoritiesandtrusts.aspx (accessed 18 August 2009).

[2] Pincus SM. Approximate entropy as a measure of system complexity. Proc Natl Acad Sci USA 88: 2297–2301, 1991

[3] Pincus SM and Goldberger AL. Physiological time-series analysis: what does regularity quantify?  Am J Physiol Heart Circ Physiol 266: H1643–H1656, 1994

[4] Cappe, O.,S. Godsill, E. Moulines(2007).An overview of existing methods and recent advances in sequential Monte Carlo. Proceedings of the IEEE. Volume 95, No 5, pp 899-924.

[5] Doucet, A., A. M. Johansen(2008). A Tutorial on Particle Filtering and Smoothing: Fifteen years Later.

[6] Johansen, A. M. (2009).SMCTC: Sequential Monte Carlo in C++. Journal of Statistical Software. Volume 30, issue 6.

[7] Rasmussen and Z.Ghahramani (2003). Bayesian Monte Carlo. In S. Becker and K. Obermayer, editors, Advances in Neural Information Processing Systems, volume 15.

[8] Osborne A. M.,Duvenaud D., GarnettR.,Rasmussen, C.E., Roberts, C.E.,Ghahramani, Z. Active Learning of Model Evidence Using Bayesian Quadrature.

Scaling Internet of Things for dementia using Particle filters was published on SAS Users.

11月 032017
 

When disasters strike, we’re all left wondering what we can do and how we can help. Many donate money and items. Others donate their time volunteering on the ground. Me? I dive into data. I have a passion for using data for good, and have been fortunate to be involved [...]

Analytics helps Puerto Rico communications after disaster was published on SAS Voices by Mary Osborne

11月 022017
 

In my first four posts in the education analytics blog series, we learned how education customers are using SAS, the positive impact for their users and institution, and some of their best practices. In talking to customers for this series, one of the many things we've learned is that they [...]

Education analytics: Use of SAS evolving was published on SAS Voices by Georgia Mariani

11月 022017
 

SAS Visual Investigatorcustom icons and map pin icons in SAS® Visual Investigator has an attractive user interface. One of its most engaging features is the network diagram, which represents related ‘entities’ and the connections between them, allowing an investigator to see and explore relationships in their source data.

For maximum impact, each entity in the network diagram should have an icon – a symbol – which clearly represents the entity. Network diagrams are significantly more readable if the icons used relate to the entities they represent, particularly when a larger numbers of entity types are in the diagram.

SAS Visual Investigator 10.2.1 comes with 56 icons in various colors and 33 white map pins. But if none of these adequately represent the entities you are working with, you can add your own. The catch is that the icons have to be vector images – defined in terms of straight and curved lines – in a file format called SVG, rather than raster images defined in terms of pixels, as GIF, JPEG, PNG files or similar. The vector format greatly improves how good the icons look at different scales. But requiring a vector icon file means you can’t just use any image as a network diagram symbol in Visual Investigator.

So where can you find new icons to better represent your entities? In this post, we’ll see how you can make (or more accurately, find and set the colors of) icons to suit your needs.

For this example, I’m going to suppose that we have an entity called ‘Family,’ representing a family group who are associated with one or more ‘Addresses’ and with one or more individual ‘Persons.’

There are good icons for an ‘Address’ in the default set of 56 provided: I’d choose one of the building icons, probably the house. There are also good icons for a ‘Person’: any of several icons representing people would be fine. The ‘house’ and ‘people’ icons come in a nice variety of colors, and there are white-on-some-background-color versions of each icon for use in Map Pins.

But, there is no icon in the default set which obviously represents the idea of a ‘family.’ So, let’s make one.

We will use the free* library of icons at www.flaticon.com as an example of how to find a source icon image.

*NOTE: While these are free there are some license terms and conditions you must adhere to if you use them. Please read, but the main points to note are that you must give attribution to the designer and site in a specific and short format for the icons you use, and you’re not allowed to redistribute them.

Begin by browsing to https://www.flaticon.com/, and either just browse the icons, or search for a key word which has something to do with the entity for which you need an icon. I’m going to search for ‘Family’:

Notice that many icons have a smaller symbol beneath them, indicating whether they are a ‘premium’ or ‘selection’ icon:

Premium:

Selection:

Premium icons can only be downloaded if you subscribe to the site. Selection icons can be downloaded and used without any subscription so long as you give proper attribution. If you do subscribe, however, you are allowed to use them without attribution. Icons with no symbol beneath them are also free, but always have to be properly attributed, regardless of whether you subscribe or not.

If you are planning on using icons from this site in an image which may be shared with customers, or on a customer project please be sure to comply with company policies and guidelines on using third party content offered under a Creative Commons license. Please do your own research and adhere by their rules.

The icons here are shown at a larger size than they will be in SAS Visual Investigator. For best results, pick an icon with few small details, so that it will still be clear at less than half the size. For consistency with the other icons in your network diagrams and maps, pick a simple black-and-white icon. Several icons in this screenshot are suitable, but I will choose this one named ‘Family silhouette,’ which I think will be clear and recognizable when shrunk:

Icons made by Freepik from www.flaticon.com is licensed by CC 3.0 BY

When you hover over the icon, you’ll see a graphic overlaid on top of it, like this:

If you are working with a number of icons at once, it can be convenient to click the top half of that graphic, to add the icon to a collection. When you have assembled the icons you require, you can then edit their colors together, and download them together. Since I am just working with one icon, I clicked the bottom half of the overlay graphic, to simply view the icon’s details. This is what you see next:

Click the green SVG button to choose a color for your icon – a box with preset colors swings down below the buttons. This offers 7 default colors, which are all okay (white is perfect for map pin icons). But you may prefer to match the colors of existing icons in SAS Visual Investigator – to set your own color, click the multicolor button (highlighted in a red box below):

These are the hex color codes for icons and map pin backgrounds in SAS Visual Investigator 10.2.1. Enter one of these codes in the website’s color picker to match the new icon’s color to one already used in VI:

I chose the green in the center of this table, with hex color code #4d9954:

Then, click Download, and in the popup, copy the link below the button which will allow you to credit the author, then click ‘Free download’ (you must credit the author):

Your SVG icon file is downloaded in your browser. Paste the attribution text/HTML which credits the icon’s author in a text editor.

You must ensure that the files are named so that they contain only the characters A-Z, a-z, 0-9 and ‘_’ (underscore). SAS Visual Investigator doesn’t like SVG filenames containing other characters. In this example, we MUST rename the files to replace the ‘-‘ (dash or hyphen) with something else, e.g. an underscore.

You may also want to rename the SVG file to reflect the color you chose. So, making both these changes, I renamed my file from ‘family-silhouette.svg’ to ‘family_silhouette_green.svg’.

Adding the color to the file name is a good idea when you create multiple copies of the same icon, in different colors. You should also consider creating a white version of each icon for use as a map pin. Doing that, I saved another copy of the same icon in white, and renamed the downloaded file to ‘family_silhouette_white.svg’.

Then, if necessary, copy your SVG file(s) from your local PC to a machine from which you can access your copy of SAS Visual Investigator. To import the icons, open the SAS Visual Investigator: Administration app, and log in as a VI administrator.

When creating or editing an Entity, switch to the Views tab, and click on the Manage icons and map pins button:

In the Manage Icons and Map Pins popup, click Upload…:

Select all of the SVG files you want to upload:

The new icons are uploaded, and if you scroll through the list of icons and map pins you should be able to find them. Change the Type for any white icons you uploaded to Map Pin, and click OK to save your changes to the set of icons and map pins:

You can now use your new icons and map pins for an Entity:

Don’t forget to include a reference to the source for the icon in any materials you produce which accompany the image, if you share it with anyone else. Here’s mine:

Icons made by Freepik from http://www.freepik.com/ is licensed by CC 3.0 BY

See you next time!

Creating and uploading custom icons and map pin icons in SAS® Visual Investigator was published on SAS Users.