Jim Harris discusses how the lines between data management and analytics are fading.

The post Streaming analytics blurs the lines between data management and analytics appeared first on The Data Roundtable.

April 12, 2017

In recent versions of SAS/Graph, we have been shipping new/updated maps of each country, with 2 levels of detail (such as state & county, or province & division). But what if you only want a map showing the higher level of detail? In this blog post I share my third [...]

The post Your mapping toolkit tip #3 - removing internal borders appeared first on SAS Learning Post.

April 12, 2017

At SAS Global Forum last week, I saw a poster that used SAS/IML to optimize a quadratic objective function that arises in financial portfolio management (Xia, Eberhardt, and Kastin, 2017). The authors used the Newton-Raphson optimizer (NLPNRA routine) in SAS/IML to optimize a hypothetical portfolio of assets.

The Newton-Raphson algorithm is one of my favorite optimizers. However, for a quadratic objective function there is a simpler choice. You can use the NLPQUA function in SAS/IML to optimize a quadratic objective function.

Let's see how this works by specifying a quadratic polynomial in two variables. The function is

    f(x,y) = (9*x##2 + x#y + 4*y##2) - 12*x - 4*y + 6
           = 0.5 * x` H x + L x + 6

where, in the second line, x denotes the column vector (x, y)`, H = {18 1, 1 8}, and L = {-12 -4}.

To use the NLPQUA subroutine, you need to write down the Hessian matrix, **H**, which is the symmetric matrix of second derivatives. Of course, for a quadratic function the second derivatives are constants. In the following SAS/IML program, the matrix **H** contains the second derivatives and the vector **lin** contains the coefficients for the linear and constant terms of *f*:

    /* objective function is
       f(x,y) = (3*x-2)##2 + (2*y-1)##2 + x#y + 1
              = (9*x##2 + x#y + 4*y##2) + -12*x - 4*y + 6
       grad(f) = g(x,y) = [ 18*x + y - 12, x + 8*y - 4 ]
       hess(f) = [ dg1/dx  dg2/dx ] = [ 18  1 ]
                 [ dg1/dy  dg2/dy ]   [  1  8 ]
    */
    proc iml;
    H = {18 1,              /* matrix of second derivatives */
          1 8};
    lin = { -12  -4  6 };   /* vector of linear terms and the constant term */

The NLPQUA subroutine enables you to specify the Hessian matrix and the linear coefficients. From those values, the routine uses an efficient solver to find the global minimum of the quadratic polynomial, which occurs at x=92/143, y=60/143:

    x0 = j(1, 2, 1);       /* initial guess */
    opt = {0 1};           /* minimize, print final parameters */
    call nlpqua(rc, xOpt, H, x0, opt) lin=lin;
    print xOpt;
    print xOpt[format=Fract.];
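As an independent cross-check outside SAS, the minimum of this quadratic can also be found by hand: the gradient H x + L` vanishes at the minimum, so it is enough to solve the linear system H x = -L`. The following Python sketch is not part of the original program, and the helper name `solve_2x2` is hypothetical. It uses exact rational arithmetic and Cramer's rule to reproduce the values quoted above.

```python
from fractions import Fraction

def solve_2x2(H, rhs):
    """Solve the 2x2 linear system H x = rhs exactly by Cramer's rule."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    x = Fraction(rhs[0] * d - b * rhs[1], det)
    y = Fraction(a * rhs[1] - rhs[0] * c, det)
    return x, y

H = [[18, 1], [1, 8]]   # Hessian of f
L = [-12, -4]           # linear coefficients of f

# The gradient H x + L` is zero at the minimum, so solve H x = -L`
x_min, y_min = solve_2x2(H, [-v for v in L])
print(x_min, y_min)     # 92/143 60/143
```

Because H is positive definite (18 > 0 and det(H) = 143 > 0), this stationary point is the global minimum.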

You can also use the NLPQUA subroutine to solve a constrained problem. Suppose that you are only interested in values of (x,y) in the unit square and you require that the solution satisfies the linear constraint x + y = 1. In that case you can define a matrix that specifies the linear constraints as follows:

    blc = {0 0 . .,    /* lower bound: 0 <= x_i */
           1 1 . .,    /* upper bound: x_i <= 1 */
           1 1 0 1};   /* linear constraint x + y = 1 */
    call nlpqua(rc, conOpt, H, x0, opt, blc) lin=lin;
    print conOpt[format=Fract.];
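The constrained answer can be checked by hand as well. On the line x + y = 1 the objective reduces to a quadratic in one variable, so its minimizer is the vertex of a parabola, clipped to the unit interval. The following Python sketch is an independent check under that substitution, not the algorithm that NLPQUA uses. The function names `f` and `g` are chosen here for illustration.

```python
from fractions import Fraction

def f(x, y):
    """Objective function from the article."""
    return 9*x**2 + x*y + 4*y**2 - 12*x - 4*y + 6

def g(t):
    """Objective restricted to the constraint line x + y = 1."""
    return f(t, 1 - t)

# g is a quadratic A*t^2 + B*t + C; recover A and B from three exact values
g0, g1, g2 = g(Fraction(0)), g(Fraction(1)), g(Fraction(2))
A = (g2 - 2*g1 + g0) / 2
B = g1 - g0 - A

t = -B / (2*A)                              # vertex of the parabola (A > 0)
t = min(max(t, Fraction(0)), Fraction(1))   # clip to 0 <= x <= 1, which also keeps 0 <= y <= 1
print(t, 1 - t)                             # 5/8 3/8
```

The vertex already lies inside the unit square, so the bounds are inactive and the constrained minimum works out to x = 5/8, y = 3/8.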

Notice how the linear constraints are specified for this (and every other) NLP function. Let *p* be the number of parameters in the problem. The first *p* elements of the first row specify the lower bounds for the parameters; use a missing value for variables that are not bounded below. The last two columns of the first row are not used, so you should always set those elements to missing values.
Similarly, the first *p* elements of the second row specify the upper bound for the variables. The last two columns of the second row are not used, so set those elements to missing values.

Additional rows specify coefficients for the linear constraint. An equality constraint of the form
a_{1}*x_{1} + a_{2}*x_{2} + ... + a_{n} *x_{n} = c
is encoded as a row of coefficients
{a_{1} a_{2} ... a_{n} 0 c},
where the 0 in the penultimate location indicates equality.
A "less than" inequality of the form
a_{1}*x_{1} + a_{2}*x_{2} + ... + a_{n} *x_{n} ≤ c
is encoded as the row
{a_{1} a_{2} ... a_{n} -1 c},
where the -1 indicates the "less than or equal" symbol. Similarly, a value of +1 in the penultimate location indicates the "greater than or equal" symbol (≥). See the documentation for the NLP routines for additional details about specifying constraints.
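To make the layout of the constraint matrix concrete, here is a small Python sketch that assembles the same rows as the `blc` matrix above. It only mimics the encoding described in the text. The helper names `bounds_rows` and `constraint_row` are hypothetical, and `None` stands in for the SAS missing value (`.`).

```python
def bounds_rows(lower, upper):
    """First two rows: lower and upper bounds, padded with two unused (missing) cells."""
    return [list(lower) + [None, None],
            list(upper) + [None, None]]

def constraint_row(coeffs, relation, rhs):
    """One linear constraint: coefficients, then a relation code, then the right-hand side."""
    codes = {"=": 0, "<=": -1, ">=": 1}   # codes described in the text
    return list(coeffs) + [codes[relation], rhs]

# Unit-square bounds plus the equality constraint x + y = 1
blc = bounds_rows([0, 0], [1, 1]) + [constraint_row([1, 1], "=", 1)]
for row in blc:
    print(row)
# [0, 0, None, None]
# [1, 1, None, None]
# [1, 1, 0, 1]
```

Each row has p + 2 entries, where p is the number of parameters (here p = 2).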

If you have access to SAS/OR software, PROC OPTMODEL provides a simple and natural language for solving both simple and complex optimization problems. For this problem, PROC OPTMODEL detects that the objective function is quadratic and calls the QP solver:

    /* variable names x and y */
    proc optmodel;
    /* unconstrained quadratic optimization */
    var x, y;
    min f = 9*x^2 + x*y + 4*y^2 - 12*x - 4*y + 6;
    solve;                       /* call QP solver */
    print x Fract. y Fract.;

    /* constrained quadratic optimization */
    x.lb = 0;  x.ub = 1;         /* bounds on variables */
    y.lb = 0;  y.ub = 1;
    con SumToOne:                /* linear constraint */
        x + y = 1;
    solve;                       /* call QP solver */
    print x Fract. y Fract.;
    quit;

In summary, if you want to optimize a quadratic objective function (with or without constraints), use the NLPQUA subroutine in SAS/IML, which is a fast and efficient way to maximize or minimize a quadratic function of many variables. If you have access to SAS/OR software, PROC OPTMODEL provides an even simpler syntax.

The post Quadratic optimization in SAS appeared first on The DO Loop.

April 11, 2017

Family-owned and operated for more than 35 years, Twiddy & Company prides itself on exceptional real estate services to homeowners and vacationers in northeast North Carolina and the Outer Banks. Whether a customer is thinking of putting their house up for rent or planning their next vacation, Twiddy makes the [...]

How to be a data-driven SMB: Part one of Twiddy’s tale was published on SAS Voices by Analise Polsky

April 11, 2017

Have you ever seen a map that just didn't look right to you? Perhaps the map area seemed squished or stretched? Perhaps this was because they used a different map projection than you were accustomed to. Or maybe the map coordinates weren't projected at all. In this blog post I [...]

The post Your mapping toolkit Tip #2 - projecting your map appeared first on SAS Learning Post.

April 11, 2017

You may be wondering if you need something special to gain access to the Schedule Chart object. Don’t worry, you don’t. You just need to unhide this visualization if it isn’t already shown. You can do this from the Objects drop-down menu. There are several other objects available to you if you’d like to show those as well. Simply check the ones you want to include in the list and then click OK.

The VA 7.3 Schedule Chart is similar to the traditional __Gantt__ chart in that it serves to illustrate the start and finish duration of a category data item. You must provide a category data item and two date or datetime data items representing the start and end dates. You can also add a group by category, lattice by columns and/or rows. Here is a simple example that visualizes my local school district’s 2016-2017 Traditional School Calendar.

Schedule charts can be used to visualize a variety of data such as:

- Calendar Events
- Project Tracking
- Campaign/Promotional Runs
- Floor Service Coverage

Essentially, any category which can be associated with a start and end date can use this visualization.

Here are some examples of using the Schedule Chart to look at project tracking data. Our team uses a similar visualization; however, I have changed the real names and given the data a Star Wars theme for a bit of fun.

In this example, the Project name is assigned as the main category. Here are a few takeaways:

- The schedule chart gives a great bird’s eye view of a lot of data. This particular data has over 350 projects spanning a team of 19 individual members.
- The schedule chart automatically includes the least and greatest date value. You can override the X Axis in the Properties tab by assigning a fixed minimum and maximum.

In this next screenshot, I have selected the manager Obi-Wan Kenobi to filter the Schedule Chart. By adding section filters to this report, you can easily spot the coverage of Projects and Project Types. And, with some Mock Combat planned later in the year, the Jedis might want to up their training.

This example uses the same Schedule Chart role assignments as before, but different section prompt filters. Here, this report shows how an individual team member can use the Schedule Chart to visualize several things:

- A list of his/her assigned projects.
- The planned duration of each project.
- How the projects are spread throughout the year.

If you chose to look at a particular Project Type, then this visualization would help list the Project names and how they span across the year.

The next example moves away from the traditional use of putting the Project as the main category data item and instead places the Team Lead on the Y Axis. This now allows us to see how busy each Team member is and with which Project Type. By the way, did you notice the neat way the Team names are sorted? I used a custom sort!

Here are just a few more things you can do to enhance this visualization: you can adjust the transparency so you can see overlapping projects and you can easily add reference lines to the X Axis. In this example, I’ve added the reference lines for Q1 through Q4. I’ve also selected a manager from the report section filters and added an additional data item for the Label.

In this example, I wanted to demonstrate the use of the Lattice rows, and I also applied a Display Rule for the Project Type. This is a good approach if you want to view overlapping information and the transparency property doesn’t make the overlap distinct enough.

If you want a bar to appear on the Schedule Chart, then the data **must** have a start and end date for every row. If either is missing, the category name will appear on the Y Axis but no bar will be displayed. Also, you cannot create an interaction from or to the Schedule Chart object. That means you cannot create a filter or brush with another object in the report area. The Schedule Chart will be filtered by either report or section prompts. I hope you can include the Schedule Chart in your reports; it is one of my favorite visuals.
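Because a row with a missing start or end date produces a label but no bar, it can help to screen the source table before loading it. The following Python sketch is not a Visual Analytics feature. It is a pre-check on made-up rows, and the project names, dates, and the helper `has_bar` are all hypothetical.

```python
from datetime import date

# Hypothetical project rows: (category, start date, end date)
rows = [
    ("Lightsaber Training", date(2017, 1, 9), date(2017, 3, 31)),
    ("Mock Combat",         date(2017, 9, 4), None),               # end date missing
    ("Droid Maintenance",   None,             date(2017, 6, 30)),  # start date missing
]

def has_bar(row):
    """A bar can be drawn only when both the start and end dates are present."""
    _, start, end = row
    return start is not None and end is not None

# Categories that would show on the Y Axis without a bar
no_bar = [row[0] for row in rows if not has_bar(row)]
print(no_bar)   # ['Mock Combat', 'Droid Maintenance']
```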

SAS Visual Analytics Designer 7.3 Schedule Chart was published on SAS Users.

April 11, 2017

With the advent of things like car GPS & Google Maps, and a steady supply of nice maps from certain news sources (such as the New York Times), people have finally embraced the idea that mapping data can be very useful. And if you are into data visualization, you have [...]

The post Your mapping toolkit Tip #1 - reducing border complexity appeared first on SAS Learning Post.

April 10, 2017

You can expand on the functionality of SAS Visual Data Builder in SAS Visual Analytics by editing the query code, adding code for pre- and post-processing, or even writing your own query. You can process single tables or join multiple tables, writing the output to a LASR library, a SAS library, or a DBMS library. But you can also easily schedule your queries, right from the Visual Data Builder interface.

Here’s how.

When a query is open in the workspace of Visual Data Builder, you can schedule the query from the application by clicking the Schedule (clock) icon.

The scheduling server used is determined by the SAS Visual Data Builder Scheduling preferences setting, shown below.

By default, the Visual Analytics deployment includes the Operating System Services scheduling server, so it appears automatically as the default.

The Server Manager plug-in to SAS Management Console identifies the scheduling servers that are included in your deployment. You can specify a different scheduling server, such as Platform Suite for SAS server, if your deployment includes it.

**Note**: The Distributed In-process scheduling server is **not** supported.

Any scheduling preferences that you change are used the next time you create and schedule a query. If you need to change the settings for a query that is already scheduled, you can use SAS Management Console Schedule Manager to redeploy the deployed job for the query.

When you schedule a query, the SAS statements are saved in a file in the default deployment directory path: **SAS-config-dir/Lev1/SASApp/SASEnvironment/SASCode/Jobs**.

In the examples in this blog, the SAS-config-dir is **/opt/sasinside/vaconfig**.

The metadata name of the directory is **Batch Jobs**.

The default SAS Application Server name associated with the directory is **SASApp**.

If you are working in a VA environment where multiple application servers are defined, you should be aware of the following SAS Notes at the links below, relating to the application’s choice of application servers for scheduling.

*SAS Note 58186*: SAS® Visual Data Builder might use the wrong application server for scheduling

*SAS Note 52977*: SAS® Visual Data Builder requires the default SAS® Application Server and the default scheduling servers to be located on the same physical machine

To schedule a query, open the query and select the Schedule (clock) icon. (The clock is grayed out if you have not saved the query.)

You can schedule the query to run immediately (Run now) or at a specified time event. To define a time event, select the **Select one or more triggers for this query** button and click **New Time Event**. Grouping events are not supported for the default server, but may be supported for other scheduling servers, such as Platform Suite.

You can schedule for **One time** only, or **More than once**, running Hourly, Daily, Weekly, Monthly, or Yearly. The appearance of the interface and scheduling parameters change with your specification.

In this example, a **One time only** event is specified.

The time event specification is recorded in the **Trigger** list on the Schedule page, and is selected in the Used column.

After you click OK in the Schedule window, you will get the confirmation below.

After the time event has passed, you can verify that the table has been loaded on the LASR Tables tab of the Visual Analytics Administrator.

When you schedule, the Visual Data Builder:

- creates a job that executes the query.
- creates a deployed job from the job.
- places the job into a new deployed flow.
- schedules the flow on a scheduling server.

The files are named according to *vdb_query_id_timestamp*.

In this example the files are named

When the query executes at the scheduled time, the SAS code that is written to the

If you right-click on Server Manager in SAS Management Console and view Deployment Directories, you will see that this is the Deployment directory (Batch Jobs) for SASApp.

In the **/opt/sasinside/vaconfig/Lev1/SASApp/BatchServer/Logs** directory, you can view the SAS Log.

The scheduling server script and log are in **/opt/sasinside/vaconfig/Lev1/SchedulingServer/Ahmed/vdb_CustomerInfoData_14900112883364**

Observe that the script was written to this location at the time the job was scheduled, rather than at execution time.

If you edit a data query that is already scheduled, you must click the schedule icon again so that the SAS statements for the data query are regenerated and saved.

If you edit the query again and specify additional time events, each event appears in the trigger list, and you can check which time event is to be used for scheduling.

If scheduling a query according to time events, you should also be aware of this Usage note:

**Usage Note 55880: Scheduled SAS® Visual Data Builder queries are executed based on the time zone of the scheduling server**

And to add to the fun, also keep in mind that if your deployment includes SAS Data Integration Studio, you can also export a query as a Job and then perform the deployment steps using DI Studio.

Just right-click on the query in the SAS folder panel in Visual Data Builder and select **Export as a Job**!

Easy Scheduling in Visual Data Builder - SAS Visual Analytics 7.3 was published on SAS Users.

April 10, 2017

David Loshin explains why MDM is such a valuable tool in helping to detect fraud.

The post Master data management: Why the unique identifier is critical for fighting fraud appeared first on The Data Roundtable.

April 10, 2017

Many intervals in statistics have the form *p* ± δ, where *p* is a point estimate and δ is the radius (or half-width) of the interval. (For example, many two-sided confidence intervals have this form, where δ is proportional to the standard error.) Many years ago I wrote an article that mentioned that you can construct these intervals in the SAS/IML language by using a concatenation operator (|| or //). The concatenation creates a two-element vector, like this:

    proc iml;
    mu = 50;
    delta = 1.5;
    CI = mu - delta || mu + delta;   /* horizontal concatenation ==> 1x2 vector */

Last week it occurred to me that there is a simple trick that is even easier: use the fact that SAS/IML is a matrix-vector language to encode the "±" sign as a vector {-1, 1}. When SAS/IML sees a scalar multiplied by a vector, the result will be a vector:

    CI = mu + {-1 1}*delta;   /* vector operation ==> 1x2 vector */
    print CI;

You can extend this example to compute many intervals by using a single statement. For example, in elementary statistics we learn the "68-95-99.7 rule" for the normal distribution. The rule says that in a random sample drawn from a normal population, about 68% of the observations will be within 1 standard deviation of the mean, about 95% will be within 2 standard deviations, and about 99.7% will be within 3 standard deviations of the mean. You can construct those intervals by using a "multiplier matrix" whose first row is {-1, +1}, whose second row is {-2, +2}, and whose third row is {-3, +3}. The following SAS/IML statements construct the three intervals for the 68-95-99.7 rule for a normal population with mean 50 and standard deviation 8:

    mu = 50;
    sigma = 8;
    m = {-1 1,
         -2 2,
         -3 3};
    Intervals = mu + m*sigma;
    ApproxPct = {"68%", "95%", "99.7%"};
    print Intervals[rowname=ApproxPct];
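For readers who want to check the arithmetic without running SAS, the same "multiplier matrix" idea can be written with plain Python lists. This is only a sketch of the computation mu + m*sigma and does not reproduce the IML output format.

```python
mu, sigma = 50, 8
m = [[-1, 1],
     [-2, 2],
     [-3, 3]]   # multiplier matrix: one row per interval

# Element-wise mu + m*sigma, one [lower, upper] pair per row
intervals = [[mu + k * sigma for k in row] for row in m]
print(intervals)   # [[42, 58], [34, 66], [26, 74]]
```

So the three intervals are [42, 58], [34, 66], and [26, 74].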

Just for fun, let's simulate a large sample from the normal population and empirically confirm the 68-95-99.7 rule. You can use the RANDFUN function to generate a random sample and use the BIN function to detect which observations are in each interval:

    call randseed(12345);
    n = 10000;                                 /* sample size */
    x = randfun(n, "Normal", mu, sigma);       /* simulate normal sample */
    ObservedPct = j(3,1,0);
    do i = 1 to 3;
       b = bin(x, Intervals[i,]);              /* b[i]=1 if x[i] in interval */
       ObservedPct[i] = sum(b) / n;            /* percentage of x in interval */
    end;
    results = Intervals || {0.68, 0.95, 0.997} || ObservedPct;
    print results[colname={"Lower" "Upper" "Predicted" "Observed"}
                  label="Probability of Normal Variate in Intervals: X ~ N(50, 8)"];
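The same experiment is easy to repeat outside SAS. The following Python sketch uses the standard `random` module. Its random number stream differs from the one behind RANDSEED and RANDFUN, so the observed proportions will be close to, but not identical to, the SAS results.

```python
import random

random.seed(12345)
mu, sigma, n = 50, 8, 10000
x = [random.gauss(mu, sigma) for _ in range(n)]   # simulate normal sample

observed = []
for k in (1, 2, 3):
    lower, upper = mu - k * sigma, mu + k * sigma
    inside = sum(1 for v in x if lower <= v <= upper)   # count observations in interval
    observed.append(inside / n)                         # proportion in interval

for k, pct in zip((1, 2, 3), observed):
    print(f"within {k} sd: {pct:.4f}")
```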

The simulation confirms the 68-95-99.7 rule. Remember that the rule is a mnemonic device. You can compute the exact probabilities by using the CDF function. In SAS/IML, the exact computation is
`p = cdf("Normal", m[,2]) - cdf("Normal", m[,1]);`
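For comparison, the exact probabilities can be computed in Python from the error function, using the identity Φ(k) − Φ(−k) = erf(k/√2) for the standard normal CDF Φ. This is an equivalent calculation written as a sketch, not a translation of the SAS CDF call.

```python
import math

# P(-k < Z < k) for a standard normal Z equals erf(k / sqrt(2))
exact = [math.erf(k / math.sqrt(2)) for k in (1, 2, 3)]
for k, p in zip((1, 2, 3), exact):
    print(k, round(p, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The exact values show how close the rounded 68-95-99.7 figures are.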

In summary, the SAS/IML language provides an easy syntax to construct intervals that are symmetric about a central value. You can use a vector such as {-1, 1} to construct an interval of the form *p* ± δ, or you can use a *k* x 2 matrix to construct *k* symmetric intervals.

The post A simple trick to construct symmetric intervals appeared first on The DO Loop.