
March 1, 2017
 

In 2011, Loughran and McDonald applied a general sentiment word list to accounting and finance topics, and this led to a high rate of misclassification. They found that about three-fourths of the negative words in the Harvard IV TagNeg dictionary of negative words are typically not negative in a financial context. For example, words like “mine”, “cancer”, “tire” or “capital” are often used to refer to a specific industry segment. These words are not predictive of the tone of documents or of financial news and simply add noise to the measurement of sentiment and attenuate its predictive value. So, it is not recommended to use any general sentiment dictionary as is.

Extracting domain-specific sentiment lexicons in the traditional way is time-consuming and often requires domain expertise. Today, I will show you how to extract a domain-specific sentiment lexicon from movie reviews with a machine learning method, and how to construct SAS sentiment rules from the extracted lexicon to improve sentiment classification performance. I did the experiment with help from my colleagues Meilan Ji and Teresa Jade. On the Stanford Large Movie Review Dataset, the extracted lexicon increased overall accuracy by around 8%. Our experiment also showed that lexicon coverage and accuracy could be improved considerably with more training data.

SAS Sentiment Analysis Studio has released domain-independent Taxonomy Rules for 12 languages and domain-specific Taxonomy Rules for a few languages. For English, SAS has covered 12 domains, including Automotive, Banking, Health and Life Sciences, Hospitalities, Insurance, Telecommunications, Public Policy Countries and others. If the domain of your corpus is not covered by these industry rules, your first choice is to use the general rules, which sometimes lead to poor classification performance, as Loughran and McDonald found. Researchers have studied how to build domain-specific sentiment lexicons and have proposed three methods. The first is to have linguistic experts or domain experts create a domain-specific word list, which may be expensive or time-consuming. The second is to derive non-English lexicons from English lexicons and other linguistic resources such as WordNet. The third is to use machine learning to learn lexicons from a domain-specific corpus. This article demonstrates the third method.

Because of the emergence of social media, researchers can now get sentiment data from the internet relatively easily for their experiments. Dr. Saif Mohammad, a researcher in Computational Linguistics at the National Research Council Canada, proposed a method to automatically extract sentiment lexicons from tweets. His method achieved the best results in SemEval-2013 by leveraging emoticons in a large collection of tweets and using the PMI (pointwise mutual information) between words and tweet sentiment to define the sentiment attributes of words. It is a simple method, but quite powerful. A paper at the ACL 2016 conference described how to use neural networks to learn sentiment scores, and in that paper I found the following simplified formula for calculating a sentiment score.

Given a set of tweets with their labels, the sentiment score (SS) for a word w was computed as:
SS(w) = PMI(w, pos) − PMI(w, neg), (1)

where pos represents the positive label and neg represents the negative label. PMI stands for pointwise mutual information, which is
PMI(w, pos) = log2((freq(w, pos) * N) / (freq(w) * freq(pos))), (2)

Here freq(w, pos) is the number of times the word w occurs in positive tweets, freq(w) is the total frequency of word w in the corpus, freq(pos) is the total number of words in positive tweets, and N is the total number of words in the corpus. PMI(w, neg) is calculated in a similar way. Thus, Equation 1 is equal to:
SS(w) = log2((freq(w, pos) * freq(neg)) / (freq(w, neg) * freq(pos))), (3)
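As a rough illustration of Equations 1 to 3, the following Python sketch computes sentiment scores from labelled, tokenized documents. It is not the SAS program used in the experiment. The function names, the toy documents and the add-one smoothing of word counts (which keeps a word seen in only one class from producing a division by zero or log2(0)) are all assumptions made for this example.

```python
import math
from collections import Counter


def pmi(freq_w_label, freq_w, freq_label, n):
    """Equation 2: pointwise mutual information between a word and a label."""
    return math.log2((freq_w_label * n) / (freq_w * freq_label))


def sentiment_scores(docs, smoothing=1.0):
    """Equation 3 for every word in the corpus.

    docs: iterable of (tokens, label) pairs, label being "pos" or "neg".
    smoothing: added to each word count (an assumption, not in the article).
    """
    counts = {"pos": Counter(), "neg": Counter()}
    for tokens, label in docs:
        counts[label].update(tokens)

    # freq(pos) and freq(neg): total number of words in each class.
    totals = {label: sum(c.values()) for label, c in counts.items()}

    scores = {}
    for word in set(counts["pos"]) | set(counts["neg"]):
        f_pos = counts["pos"][word] + smoothing
        f_neg = counts["neg"][word] + smoothing
        scores[word] = math.log2((f_pos * totals["neg"]) / (f_neg * totals["pos"]))
    return scores


# Hypothetical toy corpus, only to exercise the function.
toy_docs = [
    (["great", "movie"], "pos"),
    (["great", "fun"], "pos"),
    (["bad", "movie"], "neg"),
    (["bad", "boring"], "neg"),
]
toy_scores = sentiment_scores(toy_docs)

# Equation 1 and Equation 3 agree on made-up counts (no smoothing here).
eq1 = pmi(30, 40, 1000, 2200) - pmi(10, 40, 1200, 2200)
eq3 = math.log2((30 * 1200) / (10 * 1000))
```

On the toy corpus, "movie" occurs equally often in both classes, so its score is 0, while "great" gets a positive score and "bad" a negative one.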

The movie review data I used was downloaded from Stanford; it is a collection of 50,000 reviews from IMDB. I used 25,000 reviews each for the training and test datasets. Each dataset contains equal numbers of positive and negative reviews. I used SAS Text Mining to parse the reviews into tokens and wrote a SAS program to calculate sentiment scores.

In my experiment, I used the training dataset to extract sentiment lexicons and the test dataset to evaluate sentiment classification performance at each sentiment score cutoff value from 0 to 2 in increments of 0.25. Data-driven learning methods frequently suffer from overfitting, so I used the test data to filter out all weakly predictive words, that is, words whose sentiment scores have an absolute value of less than 0.75. In Figure-1, there is an obvious drop in the accuracy line plot for the test data when the cutoff value is less than 0.75.
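The cutoff sweep described above can be sketched in Python as follows. How a document is classified from the lexicon is an assumption here: the sketch sums the scores of the words that pass the cutoff and uses the sign of the total, with ties going to negative. The lexicon and documents are made-up toy values, not results from the experiment.

```python
def classify(tokens, lexicon, cutoff):
    """Label a document by the sign of its summed word scores.

    Only words with |score| >= cutoff contribute (assumed decision rule).
    """
    total = sum(
        lexicon[w] for w in tokens
        if w in lexicon and abs(lexicon[w]) >= cutoff
    )
    return "pos" if total > 0 else "neg"


def accuracy_by_cutoff(docs, lexicon, cutoffs):
    """Return {cutoff: accuracy} over labelled (tokens, label) documents."""
    results = {}
    for cutoff in cutoffs:
        correct = sum(
            classify(tokens, lexicon, cutoff) == label
            for tokens, label in docs
        )
        results[cutoff] = correct / len(docs)
    return results


# Cutoff values from 0 to 2 in increments of 0.25, as in the article.
cutoffs = [i * 0.25 for i in range(9)]

# Hypothetical lexicon and test documents.
toy_lexicon = {"great": 1.6, "bad": -1.6, "dull": -0.9, "plot": 0.3, "film": -0.1}
toy_test_docs = [
    (["great", "film"], "pos"),
    (["dull", "plot", "plot", "plot", "plot"], "neg"),
    (["bad", "plot"], "neg"),
]
accuracy = accuracy_by_cutoff(toy_test_docs, toy_lexicon, cutoffs)
```

In this toy setup the weakly scored word "plot" outvotes "dull" at low cutoffs, so the second document is misclassified until the cutoff removes the weak words.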


Figure-1 Sentiment Classification Accuracy by Sentiment Score Cutoff

Finally, I got a huge list of 14,397 affective words from the movie reviews: 7,850 positive words and 6,547 negative words. Figure-2 shows the top 50 lexical items from each sentiment category.

Figure-2 Sentiment Score of Top 50 Lexical Items

Now I have automatically derived the sentiment lexicon, but how accurate is it, and how can that accuracy be evaluated? I googled movie vocabulary and got two lists, Useful Adjectives for Describing Movies and Words for Movies & TV, with 329 adjectives categorized as positive or negative. Of these, 279 adjectives have vectors in the GloVe word embedding model (downloaded from http://nlp.stanford.edu/projects/glove/), and Figure-3 shows their T-SNE plot. GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus. T-SNE is a machine learning algorithm for dimensionality reduction developed by Geoffrey Hinton and Laurens van der Maaten.[1] It is a nonlinear dimensionality reduction technique that is particularly well-suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot. Two words therefore appear close together in the scatter plot if their meanings are close or if they frequently occur in the same contexts. Besides semantic closeness, I also show sentiment polarity with color: red stands for negative and blue stands for positive.

Figure-3 T-SNE Plot of Movie Vocabularies

From Figure-3, I find the positive vocabulary and the negative vocabulary are clearly separated into two big clusters, with very little overlap in the plot.

Now, let me check the sentiment scores of the terms. My result includes 175 of them, and Figure-4 displays the top 50 terms of each category. When I compared the sentiment polarity in my result with the list, 168 of the 175 terms were correctly labelled as negative or positive, for an overall accuracy of 96%.
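The polarity comparison can be expressed in a few lines of Python. This is only a sketch: the function name and the three-word toy inputs are hypothetical, and the sign of the score is assumed to decide the predicted polarity.

```python
def polarity_accuracy(scores, reference):
    """Compare predicted polarity (sign of score) with a labelled word list.

    scores: {word: sentiment score}; reference: {word: "pos" or "neg"}.
    Returns (number of shared words, number correct, accuracy).
    """
    shared = [w for w in reference if w in scores]
    correct = sum(
        ("pos" if scores[w] > 0 else "neg") == reference[w]
        for w in shared
    )
    return len(shared), correct, correct / len(shared)


# Hypothetical toy inputs; "unseen" has no score and is skipped.
toy_word_scores = {"brilliant": 1.2, "tedious": -1.5, "coherent": -0.8}
toy_reference = {"brilliant": "pos", "tedious": "neg", "coherent": "pos", "unseen": "pos"}
covered, correct, acc = polarity_accuracy(toy_word_scores, toy_reference)

# The article's own figures: 168 correct out of 175 covered terms.
reported_accuracy = 168 / 175
```

Words from the reference list that have no score are left out of the denominator, which mirrors how the article counts 175 covered terms out of the full list.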

Figure-4 Sentiment Score of Top 50 Movie Terms

There are 7 polarity differences between my predictions and the list, as Table-1 shows.

Table-1 Sentiment Polarity Difference between Predictions and Actual Labels

One obvious prediction mistake is “coherent”. I checked the raw movie reviews that contain “coherent”, and only 25 of 103 reviews are positive. This is why its sentiment score is negative rather than positive. I went through these reviews and found that most of them had a sentiment polarity reversal, such as “The Plot - Even for a Seagal film, the plot is just stupid. I mean it’s not just bad, it’s barely coherent. …” A possible way to make the sentiment scores more accurate is to use more data or to add special handling for polarity reversals. I tried the first method, and it did improve the accuracy significantly.

So far, I have evaluated the accuracy of the sentiment scores against public linguistic resources. Next, I will test their predictive effect with SAS Sentiment Analysis Studio. I ran sentiment analysis against the test data with the domain-independent sentiment rules developed by SAS and with the domain-specific sentiment rules constructed by machine learning, and compared the performance of the two methods. The results showed an 8% increase in overall accuracy. Table-2 and Table-3 show the detailed information.

Test data (25,000 docs)

Table-2 Performance Comparison with Test Data

Table-3 Overall Performance Comparison with Test Data

After you get domain-specific sentiment lexicons from your corpora, only a few steps are required in SAS Sentiment Analysis Studio to construct the domain-specific sentiment rules. So, next time you are processing domain-specific text for sentiment, you may want to try this method to get a list of terms with positive or negative polarity to augment your SAS domain-independent model.

The detailed steps to construct domain-specific sentiment rules are as follows.

Step 1. Create a new Sentiment Analysis project.
Step 2. Create intermediate entities named “Positive” and “Negative”, then put the learned positive and negative lexicons into the two entities respectively.

Step 3. Besides the learned lexicons, you may add an entity named “Negation” to handle the negated expressions. You can list some negations you are familiar with, such as “not, don’t, can’t” etc.

Step 4. Create positive and negative rules in the Tonal Keyword. Add the rule “CONCEPT: _def{Positive}” to the Positive tab, and the rules “CONCEPT: _def{Negative}” and “CONCEPT: _def{Negation} _def{Positive}” to the Negative tab.

Step 5. Build the rule-based model. You can now use this model to predict the sentiment of documents.
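To make the logic of the three rules in Step 4 more concrete, here is a small Python imitation. It does not use or reproduce SAS Sentiment Analysis Studio. The word lists are placeholders for the learned lexicon, and two behaviors are assumptions of this sketch: a positive word directly preceded by a negation counts only as a negative hit, and the document label is a simple majority of hits.

```python
import re

# Placeholder entities; in practice these hold the learned lexicon.
POSITIVE = {"great", "brilliant", "good"}
NEGATIVE = {"boring", "stupid"}
NEGATION = {"not", "don't", "can't"}


def tokenize(text):
    """Lowercase and split into simple word tokens (keeps apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def rule_hits(text):
    """Count (positive, negative) rule matches in a text."""
    tokens = tokenize(text)
    pos_hits = neg_hits = 0
    for i, tok in enumerate(tokens):
        negated = i > 0 and tokens[i - 1] in NEGATION
        if tok in POSITIVE:
            if negated:
                neg_hits += 1   # mimics: _def{Negation} _def{Positive}
            else:
                pos_hits += 1   # mimics: _def{Positive}
        elif tok in NEGATIVE:
            neg_hits += 1       # mimics: _def{Negative}
    return pos_hits, neg_hits


def predict(text):
    """Majority vote over rule hits (assumed scoring, not SAS's)."""
    pos_hits, neg_hits = rule_hits(text)
    if pos_hits > neg_hits:
        return "positive"
    if neg_hits > pos_hits:
        return "negative"
    return "neutral"
```

For example, `predict("The plot is not good, just boring.")` yields two negative hits, one from the negated "good" and one from "boring".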

How to extract domain-specific sentiment lexicons was published on SAS Users.

February 25, 2017
 

Let’s have some fun, shall we? Share your video or photo!

The SAS User Community, albeit spread around the world, is a tight-knit group. We may sit alone in our offices pounding out code, developing applications, tweaking system performance or creating reports, but the truth is that other SAS users (our colleagues at work, in online communities, and at local user group meetings) are always there to assist us and to socialize with from time to time. We rely on our fellow SAS Users for support and companionship, and as a resource for new ideas and techniques. Then, once each year, we join users on a global scale by gathering for a few days at SAS Global Forum.

The opportunity to strengthen and extend our bonds with other SAS Users makes SASGF a much sought-after event. We will go to great lengths to attend: demonstrating value to our employer to secure permission, presenting content to receive a registration discount, applying for an award or scholarship, volunteering as a presenter or room coordinator, joining the Conference Team, or even becoming Conference Chair!

What might these efforts look like if we were to record metaphors for them? What I mean is, how would you represent your effort?

For example, here is a photo of two determined SAS Users negotiating a portage on Lady Evelyn River (Ontario, Canada) on their way to SAS Global Forum.

These two must really understand the value of attending!

So...

What are you willing to do to get to SAS Global Forum?!

Share your videos and photos that represent your efforts to get to SASGF in Orlando. We’ll have some fun seeing how our fellow SAS Users spend their non-SAS-coding time. I’m looking forward to seeing new faces and new places.

Simply follow @SASsoftware on Twitter and Instagram, then post your video, photo or GIF. Make sure you tag your post with #GetToSASGF and @SASsoftware.

Share more than one, and encourage your fellow SAS Users to play along. And check back often to see what your peers have shared.

Who knows, you may even see your picture or video on the Big Screen at SASGF 2017!

 

What are you willing to do to get to SAS Global Forum? was published on SAS Users.

February 25, 2017
 

As a practitioner of visual analytics, I read the featured blog post ‘Visualizations: Comparing Tableau, SPSS, R, Excel, Matlab, JS, Python, SAS’ last year with great interest. In the post, the blogger Tim Matteson asked readers to guess which software was used to create each of his 18 graphs. My buddy, Emily Gao, suggested that I see how well SAS VA does at recreating these visualizations. I agreed.

SAS Visual Analytics (VA) is better known for its interactive visual analysis, but it can also create nice visualizations. Users can easily create professional charts and visualizations without SAS coding. So what I am trying to do in this post is load the corresponding data into the SAS VA environment and use VA Explorer and Designer to mimic Matteson’s visualizations.

I want to especially thank Robert Allison for his valuable advice while I was writing this post. Robert Allison is a SAS graph expert, and I have learned a lot from his posts. I read his blog post on creating 18 amazing graphs using pure SAS code, and I copied most of the data from it when making these visualizations, which saved me a lot of time preparing data.

So, here’s my attempt at recreating Matteson’s 18 visualizations using SAS Visual Analytics.

Chart 1

This visualization is created from two customized bar charts in VA, placed together using the precision layout so that they look like one chart. The bar charts are customized in the ‘Custom Graph Builder’ in SAS VA: set the reverse order for the X axis, set the axes direction to horizontal, hide the axis labels for the X axis and Y axis, uncheck ‘show tick marks’, and so on. Compared with Matteson’s visualization, my version displays the tick values on the X axis as non-negative numbers, since people generally expect positive values for a frequency.

Another thing is that I used a custom sort on the category to define the order of the items in the bar chart. This can be done by right-clicking the category and selecting ‘Edit Custom Sort…’ to get the desired order. You may also have noticed that the legend is a bit strange for the Neutral response: it is split into Neutral_1stHalf and Neutral_2ndHalf, which I need in order to show the data symmetrically in the visualization in VA.

Chart 2

VA can easily create a grouped bar chart with the desired sort order for the countries and the questions. However, we can only put the question text horizontally atop each bar group in VA. VA uses a vertical section bar instead, with a tooltip that shows the whole question text when the mouse hovers over it. We can also see the value of each section of a bar interactively in VA by hovering the mouse over it.

Chart 3

Matteson’s chart looks a bit scattered to me, while Robert’s chart does a great job with the label text and markers for the scatterplot matrix. Here I used VA Explorer to create the scatterplot matrix for the data; it omits the diagonal cells and the symmetric half across the diagonal to make the data easier to analyze. The matrix can then be exported to a report, where the color of the data points can be changed.

Chart 4

I used the ‘Numeric Series Plot’ to draw this chart of job losses in recession. It was straightforward. I just adjusted some settings, such as checking ‘Show markers’ in the Properties tab, unchecking ‘Show label’ for the X axis and unchecking ‘Use filled markers’. To give the X axis label different fonts, I needed to use the ‘Precision’ layout instead of the default ‘Tile’ layout, and then drag in a ‘Text’ object to hold the desired X axis label.

Chart 5

VA can easily draw grouped bar charts automatically. Disable the X axis label, and set the ‘Header background’ color to grey. What we need to do here is add some display rules that map values to colors. For the formatted text at the bottom, use the ‘Text’ object. (Note: VA puts the Age_range values at the bottom of the chart.)

Chart 6

SAS VA does not support 3D charts, so I could not make a chart similar to the one Robert made with SAS code. Instead, for this visualization I created a network diagram using the Karate Club dataset. The detected communities (0, 1, 2, 3) are shown in different colors. The diagram can be exported as an image in VAE.

I used the following code to generate the necessary data for the visualization. The dataset comes from the SAS documentation:

http://support.sas.com/documentation/cdl/en/procgralg/68145/HTML/default/viewer.htm#procgralg_optgraph_examples07.htm

/* Dataset of Zachary’s Karate Club data is from: http://support.sas.com/documentation/cdl/en/procgralg/68145/HTML/default/viewer.htm#procgralg_optgraph_examples07.htm  
This dataset describes social network friendships in karate club at a U.S. university.
*/
data LinkSetIn;
   input from to weight @@;
   datalines;
 0  9  1  0 10  1  0 14  1  0 15  1  0 16  1  0 19  1  0 20  1  0 21  1
 0 23  1  0 24  1  0 27  1  0 28  1  0 29  1  0 30  1  0 31  1  0 32  1
 0 33  1  2  1  1  3  1  1  3  2  1  4  1  1  4  2  1  4  3  1  5  1  1
 6  1  1  7  1  1  7  5  1  7  6  1  8  1  1  8  2  1  8  3  1  8  4  1
 9  1  1  9  3  1 10  3  1 11  1  1 11  5  1 11  6  1 12  1  1 13  1  1
13  4  1 14  1  1 14  2  1 14  3  1 14  4  1 17  6  1 17  7  1 18  1  1
18  2  1 20  1  1 20  2  1 22  1  1 22  2  1 26 24  1 26 25  1 28  3  1
28 24  1 28 25  1 29  3  1 30 24  1 30 27  1 31  2  1 31  9  1 32  1  1
32 25  1 32 26  1 32 29  1 33  3  1 33  9  1 33 15  1 33 16  1 33 19  1
33 21  1 33 23  1 33 24  1 33 30  1 33 31  1 33 32  1
;
run;
/* Perform the community detection using resolution levels (1, 0.5) on the Karate Club data. */
proc optgraph
   data_links            = LinkSetIn
   out_nodes             = NodeSetOut
   graph_internal_format = thin;
   community
      resolution_list    = 1.0 0.5
      out_level          = CommLevelOut
      out_community      = CommOut
      out_overlap        = CommOverlapOut
      out_comm_links     = CommLinksOut;
run;
 
/* Create the dataset of detected community (0, 1, 2, 3) for resolution level equals 1.0 */ 
proc sql;
   create table mylib.newlink as
   select a.from, a.to, b.community_1, c.nodes
   from LinkSetIn a, NodeSetOut b, CommOut c
   where a.from=b.node and b.community_1=c.community and c.resolution=1;
quit;

Chart 7

I created this map using the ‘Geo Coordinate Map’ in VA. I needed to create a geography variable by right-clicking ‘World-cities’ and selecting Geography->Custom…, then setting the Latitude to the ‘Unprojected degrees latitude’ and the Longitude to the ‘Unprojected degrees longitude.’ To get the black continents in the map, go to the VA preferences and check ‘Invert application colors’ under Theme. Remember to set the ‘Marker size’ to 1, and change the first marker color to black so that it shows as white when the application colors are inverted.

Chart 8

This is a very simple scatter chart in VA. I only set the transparency in order to show the overlapping values. The blue text in the upper-left corner uses a text object.

Chart 9

To get this black background graph, set the ‘Wall background’ color to black. Then change the ‘Line/Marker’ color in the data colors section accordingly. I also checked the ‘Show markers’ option and increased the marker size to 6.

Chart 10

There is nothing special about creating this scatter plot in VA. I simply created several reference lines, and unchecked ‘Use filled markers’ with a smaller marker size. The transparency of the markers is set to 30%.

Chart 11

In VA’s current release, if we use a category variable for color, the marker will automatically change to different markers for different colors. So I create a customized scatterplot using VA Custom Graph Builder, to define the marker as always round. Nothing else, just set the transparency to clearly show the overlapping values. As always, we can add an image object in VA with precision layout.

Chart 12

I used the GEO Bubble Map to create this visualization. I needed to create a custom Geography variable from the trap variable using ‘lat_deg’ and ‘lon_deg’ as latitude and longitude respectively. Then I renamed the NumMosquitos measure to ‘Total Mosquitos’ and used it for the bubble size. To show the presence of West Nile virus, I used a display rule in VA. I also created an image to show the meaning of the colored icons for the display rule. The precision layout is enabled so that text and images can be added to this visualization.

Chart 13

This visualization is also created with the GEO bubble map in VA. First I did some data manipulation, squaring the magnitude purely to improve the bubble size resolution so that the sizes show more contrast. Then I created some display rules to show the significance of the earthquakes with different colors, and set the transparency of the bubbles to 30% for clarity. I also created an image to show the meaning of the colored icons.

Be aware that some data manipulation is needed for the original longitude data. Because geographic coordinates use the prime meridian as the reference, to show the data for the Americas on the right side of the map we need to add 360 to each negative longitude value.
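The two data preparation steps for this chart are simple enough to show as a Python sketch. The field names (`longitude`, `latitude`, `magnitude`, `bubble_size`) and the sample records are hypothetical. The original preparation was presumably done in SAS, which this does not reproduce.

```python
def prepare_quake(record):
    """Return a copy of one earthquake record, ready for the bubble map.

    - Negative longitudes get 360 added so the Americas plot on the right.
    - The magnitude is squared to give the bubble sizes more contrast.
    """
    lon = record["longitude"]
    shifted = lon + 360 if lon < 0 else lon
    return {
        **record,
        "longitude": shifted,
        "bubble_size": record["magnitude"] ** 2,
    }


# Hypothetical sample records: one west and one east of the prime meridian.
west = prepare_quake({"longitude": -122.5, "latitude": 37.8, "magnitude": 6.0})
east = prepare_quake({"longitude": 139.7, "latitude": 35.7, "magnitude": 5.0})
```

After the shift, longitudes run from 0 to 360 instead of -180 to 180, so the map is effectively centered on the Pacific.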

Chart 14

My understanding is that one of the key points of Matteson’s visualization is to show the control/interaction feature. The great thing is that VA has various control objects for interactive analysis. For the upper part of this visualization, I simply used a list table object. The trick is how to use a display rule to mimic the style. Before assigning any data to the list table in VA, I created a display rule with an Expression; at that point we can specify ‘Any measure value’ as the column in the expression. (Otherwise, you need to define a display rule for each column with some expressions.) Just define a rule where ‘Any measure value’ is missing or greater than a value, with the proper fill color for the cell. (VA doesn’t support filling a cell with a pattern, as Robert did for missing values. Therefore, I used grey for missing values to differentiate them from 0, which is shown in a light color.)

For the lower part, I create a new dataset for interventions to hold the intervention items, and put it in the list control and a list table. The right horizontal bar chart is a target bar chart with the expected duration as the targeted value. The label on each bar shows the actual duration.

Chart 15

VA does not have solid-modeling animation like the one in Matteson’s original chart, but it does support animation for bubble plots in an interactive mode. So I made this visualization using Robert’s animation dataset, imitating the famous animation by the late Hans Rosling as a memorial. I set the dates for the animation by creating a date variable with the first day of each year (just for simplicity). One customization here: I used the Custom Graph Builder to add a new role so that the bubble plot can display data labels, and set the country name as the bubble label in VA Designer. Of course, we can always filter to the countries of interest in VA for further analysis.

VA can’t show only some of the bubble labels, as Robert did using SAS code. So, to show the labels of the countries of interest clearly, I ranked the top 20 countries by average population and set a filter to show data from 1950 to 2011. I used a screen capture tool to save the animation as a .gif file. Be sure to click the chart to see the animation.
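The data preparation for the animation (first-of-year dates, the 1950 to 2011 filter and the top 20 ranking by average population) might look like this in Python. The column names and toy rows are hypothetical. Whether the ranking was computed before or after the year filter is not stated in the post, so this sketch assumes the average is taken over the filtered years.

```python
from collections import defaultdict
from datetime import date


def prepare_animation_rows(rows, top_n=20, first_year=1950, last_year=2011):
    """Filter rows to a year range, keep the top_n countries by average
    population, and add a date set to January 1 of each year."""
    in_range = [r for r in rows if first_year <= r["year"] <= last_year]

    populations = defaultdict(list)
    for r in in_range:
        populations[r["country"]].append(r["population"])
    average = {c: sum(p) / len(p) for c, p in populations.items()}

    keep = set(sorted(average, key=average.get, reverse=True)[:top_n])
    return [
        {**r, "date": date(r["year"], 1, 1)}
        for r in in_range
        if r["country"] in keep
    ]


# Hypothetical toy data; top_n=2 stands in for the article's top 20.
toy_rows = [
    {"country": "A", "year": 1949, "population": 100},
    {"country": "A", "year": 1950, "population": 100},
    {"country": "A", "year": 1951, "population": 110},
    {"country": "B", "year": 1950, "population": 50},
    {"country": "B", "year": 1951, "population": 60},
    {"country": "C", "year": 1950, "population": 10},
    {"country": "C", "year": 2012, "population": 999},
]
toy_result = prepare_animation_rows(toy_rows, top_n=2)
```

The 1949 and 2012 rows fall outside the filter, and country C is dropped because its average population over the filtered years is the lowest.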

Chart 16

I think the point of Matteson’s original chart is to show the overview axis in the line chart, since I don’t see anything else special about the line chart. So I drew this time series plot with the overview axis enabled in VA using the SASHELP.STOCK dataset. It shows the date on the X axis with tick marks split into months, which can be zoomed in to the day level interactively in VA. The overview axis can zoom in and out, as well as move the focused period.

Chart 17

For this visualization, I used a customized bubble plot (in Custom Graph Builder, add a Data Label Role for the Bubble Plot) so that the bubble labels are displayed. I used one reference line labelled Gross Avg. and two reference lines for the X and Y axes, which visually create four quadrants. As usual, I added four text objects in the precision layout to hold the labels in each corner.

Chart 18

I think Matteson made an impressive 3D chart, and Robert recreated a very beautiful 3D chart with pure SAS code. But VA does not have any 3D charts. So for this visualization, I simply loaded the data into VA and dragged the data items in to create a visualization in VAE. Then I chose the best fit from the fit line list and exported the visualization to a report, where I added display rules according to the value of Yield. Since VA shows the display rules in the information panel, I created an image of the colored markers to serve as a legend and put it in the precision layout.

There you have it. Matteson’s 18 visualizations recreated in VA.

How did I do?

18 Visualizations created by SAS Visual Analytics was published on SAS Users.

February 24, 2017
 

Editor's note: Amanda Farnsworth is Head of Visual Journalism at BBC News and a featured speaker at SAS Global Forum 2017, April 2-5, 2017 in Orlando.

There was a best-selling book some years ago called “Men Are from Mars, Women Are from Venus.” It’s a phrase I thought about a lot when I first started my current job, not so much in the gender sense, but because it can be really challenging to bring together teams with very different experiences, skillsets and, above all, cultures.

Different strokes

In 2013, I was asked to form a new department – Visual Journalism – bringing together Online Designers, TV Designers and Online Journalists with an aptitude for graphics and visuals as well as Developers who worked with me but not for me. These included people staffing the many Language Services of the BBC World Service. The different teams produced content for the BBC News website, TV 24hr News Channels and Bulletins.

And boy, were they all different!

The digital folk were very creative but in a controlled way. They worked with a set visual design language and liked to do this as a structured process which involved a lot of user testing, focusing on what the audience would understand and how they would behave when faced with some of our visual content.

Meanwhile, those with a TV background liked to work in a much more fluid way – creative workshops and experimentation - with less audience focus, as it was so much harder to get proper audience feedback on the visual elements of a TV report that viewers would get a single chance to see and take in.

And, as everyone who has gone through change knows, it can be a scary and difficult time for many of those involved. I have always found The Transition Curve one of the most useful things I ever learnt in a management course, helping me to identify how different parts of the team might be handling change. I stuck this diagram on a filing cabinet next to my desk when I created the Visual Journalism team.

And there was one other thing: I had a predominantly TV background and I was now being asked to lead a team that was packed full of digital experts. I’d always prided myself on my technical as well as editorial ability, and now I was less technically skilled than most of the people working for me. How was I going to cope with that?

New beginnings

The first thing I did was to offer 30-minute one-on-ones with my staff.  About 60% agreed. I sent them a questionnaire to fill in in advance and asked them these  three questions:

1.    What Single Thing could we do very quickly that would change things for the better?

2.    How can the new Visual Journalism team work better together?

3.    What new tools do you need to do your job?

It proved to be a treasure trove of information with some interesting thoughts and great suggestions – here are a few examples of Q1 answers:

“A single management structure for the whole team. Sometimes different disciplines within the team clash as they are pulled in different directions by the priorities of their respective managers. This wastes time and creates unnecessary tensions.”

“Unfortunately we have a rather corrosive habit of 'rumour control' which is usually of a negative nature, particularly in this time of change and uncertainty. I think 'rumour control' can easily be reduced by providing as much information as possible ( good and bad ) so none is left to be made up!”

“I would like to see a re-evaluation of the planning area. A map that just happens to be going on air tomorrow, Should that be taking up a slot in planning? Maybe planning should be more focussed on projects that are moving us and our journalism forward.”

And from question 2:

“One word: flexibility. The teams need to absorb the concept that we have one goal, the individual outputs need to grasp this to. A respect for the established disciplines is all well and good, but tribalism needs to be left behind.”

So I had a lot of face time with a lot of staff.  They all appreciated the dedicated time, but it also gave me a chance to meet them individually. The questionnaires gave me a written record of all their top concerns which I could refer to in the coming months and use as a justification or guide for change. And I could say after six months that I had done a lot of the things they had asked for along with other things that I felt needed to be done.

In addition, I wanted the teams to meet each other.  So, we held a Speed Dating session. We made two long rows of chairs facing each other and sat TV people on one side and online people on the other.  They had one minute to say what they did and one minute to listen to the person opposite them share the same before I sounded a horn and everyone moved down one chair. It was a bit chaotic and a little hysterical to watch, but proved to be a great way of breaking the ice between the teams.

After a month in which I also immersed myself in the work of the various teams with a series of show and tells and shadowing days, and asked external stakeholders what they wanted from the new department, I drew up my vision.

Its main message was that we were now a cross-platform team who needed to share ideas, information, skills and assets to create great, innovative content across TV and Digital.

The build phase

Even as we began the process of real change, the outside world suddenly started to move quickly. More and more of our news website traffic started to come from mobile, not desktop devices, and the distinction between what was TV and what was Digital began to blur, with the use of more video and motion graphics online. Social media platforms proliferated and became a key way of reaching an audience that didn’t usually access BBC TV or Digital content. We found ourselves on the cutting edge of where TV meets the web. And we had to make the most of it.

I began a series of internal attachments where online and TV designers learnt each other’s skills. I supplemented that with training so they could learn new software tools and design techniques. The lines between journalists and designers also began to fade, with many editorial people learning motion graphics skills for use on the increasingly important social media platforms.

I also encouraged, and stood the cost of, people spending a month outside the department learning how other parts of the BBC News machine worked and helping to spread the word that the new, dynamic Visual Journalism department wanted to partner up and do big, high-impact, cross-platform projects.

I revamped our Twitter feed, offered myself and other colleagues for public speaking at conferences and made sure we entered our best work for awards.

Quite quickly this all began to pay dividends. We won a big data journalism prize, and we formed some big external partnerships with universities doing interesting research and with official bodies like the Office for National Statistics. We received a big investment for more data journalism from our Global division and from BBC Sport, who wanted to do some big data-led projects around the World Cup and Olympics.

Social glue was also important. We instituted a now legendary annual pot-luck Christmas lunch where the tables groaned with the amazing food people brought in to share. The Christmas jumpers are always impressive and we hold a raffle and quiz too.

There was, and still is, a major job to do just listening and looking after the staff. I make a point of praising and rewarding great work. We don’t have a great deal of flexibility on pay at the BBC, but rewards like attending international conferences, getting training opportunities and receiving some retail vouchers from the scheme the BBC runs all help.  I also always facilitate flexible working as much as is humanly possible, not just for women returning to work after maternity leave, but for caregivers, people who want to work part-time and most recently for two new dads who are going to take advantage of the paternal leave scheme and be the sole parent at home for six months while their wives return to work.

I also write an end-of-year review and look ahead to the next 12 months that I send to all staff.  It outlines achievements and great content we have made but also the aims, objectives and challenges for the year ahead.

Not all plain sailing

Of course there were and still are some issues. As the Transition Curve shows, not everyone is going to follow you and embrace the change you bring. Team members who have been expert in their fields and are happy doing what they do suddenly find they have to learn new things and can feel de-skilled.  By definition, they cannot be an immediate expert at something new that they are asked to do and that can be difficult.

As roles and responsibilities blurred, we found we had to redefine the production process for online content as people became unsure of their roles.

Meanwhile such was the external reputation of the team, we suffered a brain drain to Apple, Amazon and Adidas.

And for me, as the department grew to over 160 people when I took on responsibility for the Picture Editors who edit the video for news and current affairs reports, I had to accept I was going to be more of an enabler and provider of editorial oversight than a practitioner.  Technology was moving so fast, while I had to know and understand it, actually being able to create content myself was going to be a rare occurrence.

Conclusion

Writing this post has helped me see just how far we’ve come as a department in a few short years. It’s certainly not perfect and the challenges we face are ever-changing. But we have now won over 25 awards across all platforms and the cross-platform vision is embedded in the teams who really enjoy learning from each other and working on projects together.

And, I have a secret weapon. I enjoy singing pop songs at my desk every day and of course Carols at Christmas.

Trying not to encourage me to sing is something literally everyone can unite behind.

Bringing teams together was published on SAS Users.

February 21, 2017
 

The SAS® Output Delivery System provides the ability to generate output in various destination formats (for example, HTML, PDF, and Excel). One of the more recent destinations, ODS Excel, became production in the third maintenance release for SAS 9.4 (TS1M3). This destination enables you to generate native Microsoft Excel formatted files, and it provides the capability to generate worksheets that include graphics, tables, and text. If you generate spreadsheets, then the ODS Excel destination (also known as just the Excel destination) might be just the tool you're looking for to help enhance your worksheets.

This post begins by discussing the following Excel destination options that are useful for enhancing the appearance of worksheets:

  • START_AT=
  • FROZEN_HEADERS= and FROZEN_ROWHEADERS=
  • AUTOFILTER=
  • SHEET_NAME=
  • ROW_REPEAT=
  • EMBEDDED_TITLES=

The discussion also covers techniques for adding images to a worksheet as well as a tip for successfully navigating worksheets. Finally, the discussion offers tips for moving to the use of the Excel destination if you currently use one of the older ODS destinations (for example, the ExcelXP destination) and information about suggested hot fixes.

Using Excel Destination Options to Enhance the Appearance of Your Microsoft Excel Worksheet

There are certain ODS Excel destination options that you could conceivably add to any program that would make it easier for your users to navigate your worksheets.

These options include the following:

  • START_AT= option
  • FROZEN_HEADERS= and FROZEN_ROWHEADERS= options
  • AUTOFILTER= option
  • EMBEDDED_TITLES= option

The following example uses all of the options described above. In this example, filters are added only to the character fields.

ods excel file="c:\temp.xlsx" options(start_at="3,3"
   frozen_headers="5"
   frozen_rowheaders="3"
   autofilter="1-5"
   sheet_name="Sales Report"
   row_repeat="2"
   embedded_titles="yes");

proc print data=sashelp.orsales;
   title "Sales Report for the Year 1999";
run;

ods excel close;

In This Example

  • The START_AT= option enables you to select the placement of the initial table or graph on the worksheet. In Microsoft Excel, by default, a table or graph begins in position A1. However, pinning the table or graph in that position does not always provide the best visual presentation.
  • The FROZEN_HEADERS= option locks the table header in your table while the FROZEN_ROWHEADERS= option locks row headers. Both of these options lock your headers and row headers so that they remain visible as you scroll through the values in the table rows and columns.
  • The AUTOFILTER= option enables you to add filters to table headers so that you can filter based on the value of a particular column.
  • The SHEET_NAME= option enables you to add more meaningful text to the worksheet tab.
  • The ROW_REPEAT= option makes your report more readable by repeating the rows that you specify in the option. If this report is ever printed, the specified rows (in this case, the column headers) are repeated on each printed page, which makes the data easier to follow.
  • The EMBEDDED_TITLES= option specifies that the title that is specified in the TITLE statement should be displayed in the worksheet.

Output

Using the Excel Destination to Add and Update Images

Microsoft Excel is widely known and used for its ability to manipulate numbers. But if you want to go beyond just numbers, you can make your worksheets stand out by adding visual elements such as images and logos.

Graphic images that you generate with ODS Graphics and SAS/GRAPH® software (in SAS 9.4 TS1M3) are easy to add to a worksheet by using the Excel destination. However, the addition and placement of some images (for example, a logo) can take a bit more work. The only fully supported method for adding images other than graphics is to add an image as a background image.

The next sections discuss how you can add various types of images to your worksheet.

Adding Background Images

You can add images to the background of a worksheet by using either the TEMPLATE procedure or cascading style sheets. With PROC TEMPLATE, you add background images by using the BACKGROUNDIMAGE= attribute within the BODY style element. You also must specify the BACKGROUND=_UNDEF_ attribute to remove the background color. With a cascading style sheet, you use the BACKGROUND-IMAGE style property.

The following example illustrates how to add a background image using PROC TEMPLATE:

proc template;
   define style styles.background;
      parent=styles.excel;
      class body / background=_undef_
                   backgroundimage="c:\background.jpg";
   end;
run;

ods excel file="c:\temp.xlsx"
   options(embedded_titles="yes" start_at="5,5"
           sheet_name="Sheet1") style=styles.background;
 
proc report data=sashelp.prdsale spanrows;
title "Expense Report Year 2016";
column country product actual predict; 
define country / group;
define product / group;
rbreak after / summarize;
run;
 
ods excel close;

In This Example

  • PROC TEMPLATE uses the BACKGROUNDIMAGE= attribute within the BODY style element of the CLASS statement to include the image.
  • The BACKGROUND=_UNDEF_ attribute removes the background color.

As you can see in the following output, Excel repeats (or tiles) images that are used as a background. Excel repeats the image across the width of the worksheet.

Output

But this method of tiling might not be what you want. For example, you might want your image to cover the entire worksheet. To prevent the background image from being tiled, you can open the image in an image editor (for example, Microsoft Paint) and enlarge the background image so that it covers the full page. You can also create a canvas (that is, a page) in the image editor and then add your background image to the canvas and save it. The Excel destination does not support transparency, a property in which the background image is visible through an image. However, you can use PROC TEMPLATE to simulate transparency by removing the background colors of the various cells. When you use any of the methods described above, your output includes an image that covers the full page.

The following example uses the PROC TEMPLATE method to create the background image and remove the background colors of the cells:

proc template;
define style styles.background;
parent=styles.excel;
class body / background=_undef_ backgroundimage="C:\background_large.jpg";
class header, rowheader, data / color=white 
borderwidth=5pt
bordercolor=white
borderstyle=solid
background=_undef_;
end;
run;
 
ods excel file="c:\temp.xlsx" options(embedded_titles="yes"
start_at="5,5"
sheet_name="Sheet1") 
style=styles.background;
 
proc report data=sashelp.prdsale spanrows;
title "Expense Report Year 2016";
column country product actual predict;
define country / group;
define product / group;
rbreak after / summarize;
run;
 
ods excel close;

In This Example

  • First, the image was opened in Microsoft Paint and enlarged.
  • Then, PROC TEMPLATE uses the BACKGROUNDIMAGE= attribute within the BODY style element of the CLASS statement to include the enlarged image.

Output

Adding External Images to the Worksheet

Currently, the Excel destination does not support adding external images on a per-cell basis.

However, you can add external images (for example, a company logo) in any of the following ways:

  • manually add an image using an image editor
  • use the GSLIDE procedure with the GOPTIONS statement
  • use the %EXCEL_ENHANCE macro

Adding an Image with an Image Editor

Using an image editor such as Microsoft Paint, you can place an image (for example, a logo) wherever you want it on the worksheet. In the following display, the image is brought into the Paint application and moved to the top left of a canvas.

After you save this image, you can include it in an Excel worksheet as a background image using the BACKGROUNDIMAGE= attribute, which displays the logo without repeating it.

proc template;
   /* The style name must match the STYLE= value in the ODS EXCEL statement. */
   define style styles.background_logo;
      parent=styles.excel;
      class body / background=_undef_
                   backgroundimage="c:tempbackground_logo";
   end;
run;

ods excel file="c:\temp.xlsx" style=styles.background_logo;
proc print data=sashelp.class;
run;
ods excel close;

Output

Adding an Image Using the GOPTIONS Statement with the GSLIDE Procedure

You can also use the GOPTIONS statement and the GSLIDE procedure with the Excel destination to add a logo on the worksheet. This method requires SAS/GRAPH software to be licensed and installed.

To add a background image to the graph display area of PROC GSLIDE output, specify the IBACK= option in the GOPTIONS statement, as shown in the following example:

ods excel file="c:\temp.xlsx" options(sheet_interval="none");

goptions iback="c:\sas.png" imagestyle=fit vsize=1in hsize=2in;
 
proc gslide;
run;
 
proc report data=sashelp.class;
run;
 
ods excel close;

 

In This Example

  • The GOPTIONS statement with the IBACK= option adds a background image to the graph display area.
  • The IMAGESTYLE=FIT option keeps the image from repeating (tiling).
  • The VSIZE= and HSIZE= options modify the size of the image.
  • The Excel destination suboption SHEET_INTERVAL="NONE" specifies that the image and report output are to be added to the same worksheet.

Output

Adding an Image Using the %EXCEL_ENHANCE Macro

The %EXCEL_ENHANCE macro is a downloadable macro that enables you to place images on a worksheet in an exact location by using a macro parameter. The macro creates VBScript code that inserts your image in a specified location on the worksheet during post-processing.

The following example uses the %EXCEL_ENHANCE macro to add an image to a workbook.

Note: This method is limited to Microsoft Windows operating environments.

%include "C:\excel_macro.sas";
%excel_enhance(open_workbook=c:\temp.xlsx,
   insert_image=%str(c:\SAS.png#sheet1!a1,
                     c:\canada.jpg#sheet1!b5,
                     c:\germany.jpg#sheet1!b10,
                     c:\usa.jpg#sheet1!b15),
   create_workbook=c:\temp_update.xlsx,
   file_format=xlsx);

In This Example

  • The %INCLUDE statement includes the %EXCEL_ENHANCE macro into your program.
  • The %EXCEL_ENHANCE macro uses the INSERT_IMAGE= parameter to insert an image into the worksheet at a specified location. You can also specify multiple images, but they must be separated by commas. The INSERT_IMAGE= parameter uses the following syntax in the %STR macro function to pass the image: image-location#sheet-name!sheet-position
  • The OPEN_WORKBOOK= parameter specifies the location of the workbook in which you want to add an image.
  • The CREATE_WORKBOOK= parameter creates a new workbook that includes your changes.
  • The FILE_FORMAT= parameter enables you to specify the extension or format for the files that are created.

Output

Navigating a Microsoft Excel Workbook

When you generate an Excel workbook with the Excel destination, the best way to navigate the workbook is by creating a table of contents. You can create a table of contents by using the Excel destination's CONTENTS= option. You can also use the ODS PROCLABEL statement to modify the table-of-contents text that is generated by the procedure. In the following example, that text (The Print Procedure) is generated by the PRINT procedure.

ods excel file="c:\temp.xlsx" options(embedded_titles="yes"
   contents="yes");

ods proclabel="Detail Report of Males";

proc print data=sashelp.class(where=(sex="M"));
   title link="#'The Table of Contents'!a1" "Return to TOC";
run;

ods proclabel="Detail Report of Females";

proc print data=sashelp.class(where=(sex="F"));
   /* The closing single quotation mark belongs after the sheet name, before !a1. */
   title link="#'The Table of Contents'!a1" "Return to TOC";
run;

ods excel close;

 

In This Example

  • The CONTENTS= option is included in the ODS EXCEL statement to create a table of contents. You can also use the INDEX= suboption (not shown in the example code) to generate an abbreviated output. These options create the first worksheet within a workbook and include an entry for each element that is generated in the workbook.
  • The ODS PROCLABEL statement is used to modify the table-of-contents text that is generated by the procedure name. In this example, the text The Print Procedure (generated by the two PROC PRINT steps) is modified to Detail Report of Males and Detail Report of Females, respectively.
  • You can also modify the secondary link by using the CONTENTS= option in the procedure statements for the PRINT, REPORT, and TABULATE procedures.
  • The LINK= option in the TITLE statement adds a link that returns you to the Table of Contents navigation page. You can also use this option in a FOOTNOTE statement. The argument that you specify for the LINK= option is the sheet name for the Table of Contents page. You can also add a link by using the Microsoft Excel HYPERLINK function in the ODS TEXT= statement.
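The secondary-link and footnote variations mentioned in the list above can be sketched as follows. This is a minimal sketch, not code from the original post: the output path and the link text "Male students" are hypothetical, and it assumes that the EMBEDDED_FOOTNOTES= suboption is needed for the footnote to appear in the worksheet.

```sas
/* Hypothetical output path; adjust for your environment. */
ods excel file="c:\temp\toc_sketch.xlsx"
   options(embedded_titles="yes"
           embedded_footnotes="yes"   /* assumed: shows the footnote in the sheet */
           contents="yes");

/* First-level table-of-contents text */
ods proclabel="Detail Report of Males";

/* CONTENTS= in the PROC PRINT statement sets the secondary link text. */
proc print data=sashelp.class(where=(sex="M")) contents="Male students";
   title "Detail Report of Males";
   /* LINK= in a FOOTNOTE statement points back to the contents sheet. */
   footnote link="#'The Table of Contents'!a1" "Return to TOC";
run;

ods excel close;
title;
footnote;
```

Clearing the TITLE and FOOTNOTE statements at the end keeps the link text from carrying over into later steps in the same session.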

Output

The output below shows the Table of Contents navigation page.

The next output shows a page in the report that contains the Return to TOC link.

Using the ODS Excel Destination instead of Older Destinations

Currently, you might be using older destinations (for example, the MSOffice2K or the ExcelXP tagsets).  If you decide to move to the ODS Excel destination, you'll notice differences related to style, options, and wrapping between it and other destinations.

  • One difference is that the Excel destination uses the Excel style, which looks similar to the HTMLBlue style. Other destinations use different styles (for example, the ExcelXP tagset uses styles.default).
  • Certain options that are valid with the ExcelXP tagset are not valid in the Excel destination.
  • Another difference that you'll notice right away is how the text is wrapped by the Excel destination. By default, the Excel destination uses an algorithm to try to wrap columns in order to prevent overly wide columns. When text wraps, a hard return is added automatically to the cell (similar to when you either press Alt+Enter from the keyboard under Windows or you submit a carriage-return line feed [CRLF] character). You can prevent the Excel destination from adding this hard return in wrapping by specifying a width value that is large enough so that text does not wrap. You can also use the Excel destination's new FLOW= suboption, which is available in the fourth maintenance release for SAS 9.4 (TS1M4). This option accepts the parameters TABLES, HEADERS, ROWHEADERS, DATA, TEXT, and a range (for example, A1:E20). When you specify the parameter TABLES, that automatically includes the HEADERS, ROWHEADERS, and DATA parameters.

The following example demonstrates how to prevent the Excel destination from automatically generating a hard return for wrapped text in SAS 9.4 TS1M4.

data one;
var1="Product A in Sports";
var2="Product B in Casual";
label var1="Product Group for All Brands in Region 1";
label var2="Product Group for All Brands in Region 2";
run;
 
ods excel file="c:\temp.xlsx" options(flow="tables");
 
proc print data=one label;
run;
 
ods excel close;

In This Example

  • The first table shown in the output below is created by default. As a result, the header wraps in the formula bar where the CRLF character is added.
  • The second table in the output is generated with the FLOW="TABLES" suboption (in the ODS EXCEL statement) to prevent the destination from adding the CRLF character to table headers, row headers, and data cells. When you add this option, Microsoft Excel text wrapping is turned on, by default.

Output

Table that is created by default:

Table that is created by including the FLOW="TABLES" suboption in the ODS EXCEL statement:

Hot Fixes for the Excel Destination

If you run SAS 9.4 TS1M3 and you use the ODS Excel destination, see the following SAS Notes for pertinent hot fixes that you should apply:

  • SAS Note 56878, "The ODS destination for Excel generates grouped worksheets when multiple sheets are produced"
  • SAS Note 57088, "The error 'applied buffer too small for requested data' might be generated when you use the ODS destination for Excel"

Resources

Bessler, Roy. 2015. "The New SAS ODS Excel Destination: A User Review and Demonstration."
Proceedings of the Wisconsin, Illinois SAS Users Group. Milwaukee, WI.

Huff, Gina. 2016. "An 'Excel'lent Journey: Exploring the New ODS EXCEL Statement."
Proceedings of the SAS Global Forum 2016 Conference. Cary, NC: SAS Institute Inc.

Parker, Chevell. 2016. "A Ringside Seat: The ODS Excel Destination versus the ODS ExcelXP Tagset."
Proceedings of the SAS Global Forum 2016 Conference. Cary, NC: SAS Institute Inc.

Tips for using the ODS Excel Destination was published on SAS Users.

February 14, 2017
 

In this post I wanted to shed some light on a visualization you may not be using enough: the Word Cloud. Word association exercises can often be a fun way to pass the time with friends, or they can trigger immediate action – just think of your email inbox and seeing an email from a particular person: your boss, wife, husband or child. The same can be true of information for your organization. A single word can quickly, efficiently and effectively communicate the performance of a company’s metric, hence the value of using a word cloud visualization in your report.

Let’s look at some examples. Here I am using the Insight Toy data and looking at the performance of Products based on customer orders.

As the word cloud in SAS Visual Analytics 7.3 Designer has a maximum row return of 100, I have used the Rank feature to look at the top 25 Products and the bottom 25 Products. I also created a filtered interaction between the word clouds and their respective list tables below to show a bit more detail around the next level in the hierarchy after Product Make.

Notice how impactful these Product names are compared to when using their corresponding SKUs. Be sure to pick a meaningful category to represent your data in the word cloud.

This type of visualization could lead to a great comparison report, comparing what the top and bottom Products were for the same month in the previous year.

What if your data doesn’t have the appropriate column to display on a word cloud? No problem. In this next example, I took the value of Sales Rep Rating and created a new Calculated Data Item that labels values less than or equal to 25% as Poor, values from 26% through 50% (inclusive) as Average, and everything else as Above Average.

Using a word cloud for this new category data item allows you to quickly move through the different states and compare the Sales Rep Performance frequency. You could also use this new category to compare each performance group’s Order Totals.
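Outside the Designer, the same binning logic can be sketched in a DATA step. This is only an illustration under stated assumptions: the data set and variable names (work.rep_ratings, sales_rep, rating) and the sample values are hypothetical, and the rating is assumed to be stored as a proportion (0.25 means 25%). Values that fall between 25% and 26% are assigned to Average here, which is also an assumption.

```sas
/* Hypothetical sample data: rating stored as a proportion */
data work.rep_ratings;
   input sales_rep $ rating;
   datalines;
A 0.10
B 0.25
C 0.40
D 0.75
;
run;

/* Derive a category that a word cloud (or any frequency view) can use */
data work.rep_performance;
   set work.rep_ratings;
   length performance $ 13;               /* long enough for "Above Average" */
   if missing(rating) then performance = " ";
   else if rating <= 0.25 then performance = "Poor";
   else if rating <= 0.50 then performance = "Average";
   else performance = "Above Average";
run;

/* Frequency of each performance group */
proc freq data=work.rep_performance;
   tables performance;
run;
```

With the sample rows above, reps A and B fall into Poor, C into Average, and D into Above Average.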

Here is California’s Sales Rep Performance:

And here is Maryland’s Sales Rep Performance:


These are two ideas for you to think about how you might include the word cloud visualization into your reports to help quickly and effectively represent the status of a company’s metric beyond the standard text analytics usage.

tags: SAS Professional Services, SAS Visual Analytics

Visualization Spotlight: Visual Analytics Designer 7.3 Word Cloud was published on SAS Users.

February 14, 2017
 

Editor's note: The following post is from Shara Evans, CEO of Market Clarity Pty Ltd. Shara is a featured speaker at SAS Global Forum 2017, a globally acknowledged Keynote Speaker and widely regarded as one of the world’s Top Female Futurists.

Learn more about Shara.


In the movie Minority Report lead character John Anderton, played by Tom Cruise, has an eye transplant in order to avoid being recognized by ubiquitous iris scanning identification systems.

Such surgical procedures still face some fairly significant challenges, in particular connecting the optic nerve of the transplanted eye to that of the recipient. However, the concept of pervasive individual identification systems is now very close to reality, and although the surgical solution is already available, it’s seriously drastic!

We’re talking face recognition here.

Many facial recognition systems are built on the concept of “cooperative systems,” where you look directly at the camera from a pre-determined distance and you are well lit, and your photo is compared against a verified image stored in a database. This type of system is used extensively for border control and physical security systems.

Facial recognition

Face in the Crowd Recognition (Crowd walking towards camera in corridor) Source: Imagus

Where it gets really interesting is with “non-cooperative systems,” which aim to recognize faces in a crowd: in non-optimal lighting situations and from a variety of angles. These systems aim to recognize people who could be wearing spectacles, scarves or hats, and who might be on the move. An Australian company, Imagus Technology has designed a system that is capable of doing just that — recognizing faces in a crowd.

To do this, the facial recognition system compiles a statistical model of a face by looking at low-frequency textures such as bone structure. Some systems use very high-frequency features such as moles on the skin, eyelashes, wrinkles, or crow’s feet at the edges of the eyes, but this requires a very high-quality image. With people walking past, there is motion blur, non-optimal camera angles and so on, so in this case using low-frequency information gets very good matches.

Biometrics are also gaining rapid acceptance for both convenience and fraud prevention in payment systems. The two most popular biometric markers are fingerprints and facial recognition, and they are generally deployed as part of a two-factor authentication system. For example, MasterCard’s “Selfie Pay” app was launched in Europe in late 2016, and is now being rolled out to other global locations. This application was designed to speed up and secure online purchases.

Facial recognition is particularly interesting, because while not every mobile phone in the world will be equipped with a fingerprint reader, virtually every device has a camera on it. We’re all suffering from password overload, and biometrics - if properly secured, and rolled out as part of a multi-factor authentication process - can provide a solution to coming up with, and remembering, complex passwords for the many apps and websites that we frequent.

It’s not just about recognizing individuals

Facial recognition systems are also being used for marketing and demographics. In a store, for example, you might want to count the number of people looking at your billboard or your display. You'd like to see a breakdown of how many males and females there are, age demographics, time spent in front of the ad, and other relevant parameters.

Can you imagine a digital advertising sign equipped with facial recognition? In Australia, Digital Out-of-Home (DOOH) devices are already being used to choose the right time to display a client’s advertising. To minimize wastage in ad spend, ads are displayed only to a relevant audience demographic; for instance, playing an ad for a family pie only when it sees a mum approaching.

What if you could go beyond recognizing demographics to analyzing people’s emotions? Advances in artificial intelligence are turning this science fiction concept into reality. Robots such as “Pepper” are equipped with specialized emotion recognition software that allows them to adapt to human emotions. Again, in an advertising context, this could prove to be marketing gold.

Privacy Considerations

Of course, new technology is always a double-edged sword, and biometrics and advanced emotion detection certainly fall into this category.

For example, customers typically register for a biometric payment system in order to realize a benefit such as faster or more secure e-commerce checkouts or being fast-tracked through security checks at airports. However, the enterprise collecting and using this data must in turn satisfy the customer that their biometric reference data will be kept and managed securely, and used only for the stated purpose.

The advent of advanced facial recognition technologies provides new mechanisms for retailers and enterprises to identify customers, for example from CCTV cameras as they enter shops or as they view public advertising displays. It is when these activities are performed without the individual’s knowledge or consent that concerns arise.

Perhaps most worrisome is that emotion recognition technology would be impossible to control. For example, anyone would be able to take footage of world leaders fronting the press in apparent agreement after the outcome of major negotiations and perhaps reveal their real emotions!

From a truth perspective, maybe this would be a good thing.

But imagine that you’re involved in intense business negotiations. In the not-too-distant future, advanced augmented reality glasses or contacts could be used to record and analyze the emotions of everyone in the room in real time. Or maybe you’re having a heart-to-heart talk with a family member or friend. Is there such a thing as too much information?

Most of the technology for widespread exploitation of face recognition is already in place: pervasive security cameras connected over broadband networks to vast resources of cloud computing power. The only piece missing is the software. Once that becomes reliable and readily available, hiding in plain sight will no longer be an option.

Find out more at the SAS Global User Forum

This is a preview of some of the concepts that Shara will explore in her speech on “Emerging Technologies: New Data Sets to Interpret and Monetize” at the SAS Global User Forum:

  • Emerging technologies such as advanced wearables, augmented and virtual reality, and biometrics — all of which will generate massive amounts of data.
  • Smart Cities — Bringing infrastructure to life with sensors, IoT connections and robots
  • Self Driving Cars + Cars of the Future — Exploring the latest in automotive technologies, robot vision, vehicle sensors, V2V comms + more
  • The Drone Revolution — looking at both the incredible benefits and challenges we face as drones take to the skies with high definition cameras and sensors.
  • The Next Wave of Big Data — How AI will transform information silos, perform advanced voice recognition, facial recognition and emotion detection
  • A Look Into the Future — How the convergence of biotech, ICT, nanotechnologies and augmentation of our bodies may change what it means to be human.

Join Shara for a ride into the future where humans are increasingly integrated with the ‘net!

About Shara Evans

Technology Futurist Shara Evans is a globally acknowledged Keynote Speaker and widely regarded as one of the world’s Top Female Futurists. Highly sought after and in demand by conference producers and media, Shara provides the latest insights and thought provoking ideas on a broad spectrum of issues. Shara can be reached via her website: www.sharaevans.com

(Note: My new website will be launching in a few weeks. In the meantime, the URL automatically redirects to my company website – www.marketclarity.com.au )

tags: analytics, SAS Global Forum

Facial recognition: Monetizing faces in the crowd was published on SAS Users.

February 10, 2017
 

Small matters matter. Imagine saving (or spending wisely) just 1 second of your time every hour. One measly second! During your lifespan you would save or spend wisely (1 second-an-hour * 24 hours-a-day * 365 days-a-year * 100 years) / (3600 seconds-an-hour * 24 hours-a-day) ≈ 10 days, a whole two-week vacation!

While truncation vs rounding may seem to be insignificant in a given instance, the cumulative effect of either could be truly enormous, whether it’s truncation vs rounding of decimal numbers or of the SAS time values presented below.

From my prior post Truncating decimal numbers in SAS without rounding, we know that SAS formats such as w.d, DOLLARw.d, and COMMAw.d do not truncate decimal numbers, but rather round them.

However, SAS time value formats are somewhat different. Let’s take a look.

Suppose we have a SAS time value of '09:35:57't. As a reminder, a SAS time value is a value representing the number of seconds since midnight of the current day. SAS time values are between 0 and 86400.
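To make the "seconds since midnight" definition concrete, here is a small sketch that compares the time literal with the same value computed by hand. The variable name secs is introduced only for illustration.

```sas
data _null_;
	t = '09:35:57't;              /* SAS time literal */
	secs = 9*3600 + 35*60 + 57;   /* hours, minutes and seconds converted to seconds */
	put t= secs=;                 /* unformatted: both are plain numbers */
	put t= time8.;                /* the same value displayed as a time */
run;
```

Both unformatted values should print as 34557, because 9 hours is 32400 seconds, 35 minutes is 2100 seconds, and 57 seconds remain.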

TIMEw.d Format

Let’s apply the TIMEw.d format to our time value and see what it does.

If you run the following SAS code:

data _null_;
	t = '09:35:57't;
	put t= time5.;
	put t= time2.;
run;

you will get in the SAS log:

t=9:35
t=9

which means that this format does truncate both seconds and minutes. Conversely, if rounding were taking place we would have gotten:

t=9:36
t=10

HHMMw.d Format

Let’s run the same SAS code with HHMMw.d format:

data _null_;
	t = '09:35:57't;
	put t= hhmm5.;
	put t= hhmm2.;
run;

SAS log will show:

t=9:36
t=9

What does that mean? It means that HHMMw.d format rounds seconds (in case of truncating I would expect to get t=9:35), but truncates minutes (in case of rounding I would expect to get t=10, as 35 minutes are closer to 10 than to 9). A bit inconsistent, at least for our purposes.

Truncating SAS time values

This little experiment shows that, of the two formats, TIMEw.d and HHMMw.d, TIMEw.d is the one to use for truncating SAS time values, for both minutes and seconds.

Regardless of the format used, you can also truncate your time value computationally, before applying a format, by subtracting from that value the remainder of its division by 60 (for seconds truncation) or by 3600 (for minutes truncation). For example, the following code:

data _null_;
	t = '09:35:57't;
	t_m = t - mod(t,60);
	t_h = t - mod(t,3600);
	put t= hhmm5.;
	put t_m= hhmm5.;
	put t_h= hhmm5.;
run;

produces the following SAS log:

t=9:36
t_m=9:35
t_h=9:00
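The same subtraction works for any unit expressed in seconds, not just 60 or 3600. As a sketch, the code below truncates to a quarter of an hour; the 15-minute unit and the names unit and t_q are assumptions chosen for illustration.

```sas
data _null_;
	t = '09:35:57't;
	unit = 15 * 60;            /* hypothetical unit: 15 minutes, expressed in seconds */
	t_q = t - mod(t, unit);    /* drop whatever exceeds a whole multiple of the unit */
	put t_q= time8.;
run;
```

Here 34557 seconds minus the remainder 357 gives 34200 seconds, so the log should show t_q=9:30:00.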

Rounding SAS time values

Now that we’ve learned both the computational method and the TIMEw.d format method of truncation, how do we go about rounding? Because the TIMEw.d format truncates consistently, we can turn its truncation into rounding. To do that, we just need to increase the original time value by half of the rounding unit: by 30 (seconds) for seconds rounding, and by 1800 (seconds) for minutes rounding. Truncating the new value is then equivalent to rounding the original value. (Adding a full 60 or 3600 seconds would always round up, even for a value such as '09:35:10't.)

Let’s run the following SAS code:

data _null_;
	t = '09:35:57't;
	t_m = t + 30;
	t_h = t + 1800;
	put t_m= time5.;
	put t_h= time2.;
run;

SAS log will show:

t_m=9:36
t_h=10

which means that our original time value '09:35:57't was rounded in both cases – seconds rounding and minutes rounding.
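An alternative is the ROUND function, which rounds its first argument to the nearest multiple of the second. The sketch below applies it to two time values so that both the round-up and the round-down case are visible; the second value, '09:35:10't, is an assumed example that does not appear above.

```sas
data _null_;
	/* loop over two sample values: one rounds up to the minute, one rounds down */
	do t = '09:35:57't, '09:35:10't;
		t_m = round(t, 60);      /* nearest whole minute */
		t_h = round(t, 3600);    /* nearest whole hour */
		put t= time8. t_m= time8. t_h= time8.;
	end;
run;
```

For '09:35:57't the log should show t_m=9:36:00 and t_h=10:00:00; for '09:35:10't it should show t_m=9:35:00 and t_h=10:00:00. Unlike the format-based approach, the rounded value is stored in the variable, so it can be used in later calculations.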

Now you know how to truncate and how to round SAS time values. And don’t forget about your lifetime 2-week vacation opportunity by saving a second every hour; or make it 2 seconds per hour and enjoy the full month off.

tags: SAS Professional Services, SAS Programmers, tips & techniques

Truncating vs rounding SAS time values was published on SAS Users.

February 10, 2017
 

Since the SAS 9.4 M2 release in December 2014, there have been several refinements and updates to the middle tier that are of interest to installers and administrators. In this blog, I’m going to summarize them for you. What I’m describing here is available in the newest SAS release (9.4 M4). I’ll describe them at a high level, and refer you to the documentation for details and how to implement some of these changes.

Security enhancements

Preserve your TLS Customizations:
For security purposes, many of you will manually add TLS configurations, either to the SAS Web Server, the SAS Web Application Server, or both. In addition, you may prefer to use your own reverse proxy server (such as IIS), either instead of, or in addition to, the SAS Web Server. Before the 9.4 M4 release, when upgrading or applying maintenance, you had to undo these custom configurations, perform the upgrade, and then apply the custom configurations again. Now, the upgrade will preserve them, making the process much easier. See Middle-Tier Security in the Middle Tier Administration Guide, Fourth Edition for full details.

Newer versions of OpenSSL are now provided (see doc for specific version numbers):
A Java upgrade enables enforcement of TLSv1.2. TLS is now considered the security standard for HTTPS connections (SSL is obsolete), and this can be enforced through configuration of the SAS Web Server and the SAS Web Application Server. The newer version of Java that SAS is using (version 1.7+) allows for this. One important thing to be aware of is that certificates are independent of the protocol you are using, so any certificates you have been using with SSL should work equally well with the newer TLS protocols.

Management of the trusted CA (Certificate Authority) bundle:
SAS now has a trusted CA bundle that can be managed by the SAS Deployment Manager, in a new location: SASHome/SASSecurityCertificateFramework/1.1/cacerts/. The CA certificates can be root certificates, intermediate certificates, or both. Here’s what the menu item looks like:

[Image: SAS Deployment Manager menu item for managing the trusted CA bundle]

Previously it was necessary to manually add your root/intermediate certificates to the Java truststore “cacerts,” located inside the JRE; now it’s done through the new interface. If you are on Windows, you must also add trusted CAs to the Windows store (as before), which will make them available to any browsers running there. This is documented at http://www.sqlservermart.com/HowTo/Windows_Import_Certificate.aspx and elsewhere online.

Security Support for SAS Web Applications – white list external sites, and HTTP request methods:
For added security, web sites hosting SAS web applications can now maintain a white list of external URLs that are allowed to connect in. This provides protection against cross-site request forgery (CSRF) and other vulnerabilities. This is what the prompt looks like in the SAS Deployment Wizard (SDW):

[Image: SAS Deployment Wizard prompt for the white list of external sites]

HTTP request methods (for example GET, POST, or PUT) can also be specified as allowed or not allowed. The list of URLs can be specified during installation in the SDW (shown above), or using the SAS Management Console. You can disable white list checking entirely, and you can add a “blacklist” of specific sites to always block. See the Middle Tier Administration Guide for details.

Forward Proxy Configuration:
You can now set up SAS web applications to forward external URL requests through a proxy, here called a forward proxy server. Many organizations do this behind their firewalls. See the administration guide for details on how to set this up.

Other miscellaneous changes:
As an administrator, you can now force users to log off using the SAS Web Administration Console. You can also send emails to one or more users from the same window. This is what the menu looks like:

[Image: SAS Web Administration Console menu for logging off and emailing users]

Faster start-up time for the SAS Web Application Server

JMS Broker (ActiveMQ) now uses Version 5.12.2 (fixed bugs).

SAS Web Server now uses version 5.5.2 and includes an updated mod_proxy_connect module for TLS tunneling.

References

SAS 9.4 Intelligence Platform: Middle Tier Administration Guide, Fourth Edition

Encryption in SAS 9.4, Sixth Edition

 

tags: SAS 9.4, SAS Administrators, SAS Professional Services, security

Middle Tier Changes and Upgrades in SAS 9.4 M4 was published on SAS Users.

February 3, 2017
 

I will begin with a short story.

Like many employers, McDougall Scientific, my employer, requires its employees to review, with their co-workers and managers, what they learned at a conference or course. They are also asked to suggest applications of their learnings so that McDougall might realize value from the expense, both in time and money, of sending them to continuing education events.

Fei Wang, my co-worker, and I attended SAS Global Forum last year in Vegas. During her presentation to co-workers upon our return, Fei not only provided a comprehensive overview of the conference format, sessions, and learning opportunities, but she also chose one presentation to highlight that will fundamentally improve one of our business processes.

Although Fei attended many sessions and learned much, session 8480-2016, with thanks to Steven Black, will save McDougall enough time and money to dwarf the expenditure of sending Fei to SAS Global Forum.

“But John,” you might ask, “why not simply search the proceedings after the conference?” Well, because we would never think to search for CRF annotation automation. Innovation of this sort is more easily found by attending the conference. Discovering valuable nuggets like Steven’s idea is a common occurrence at SAS Global Forum.

The value that employers realize from SAS Global Forum is the reason “content is king,” a cliché first introduced by the magazine publishing industry in the mid-1970s.

Our speakers represent every region of the world!

Though there are a number of really great benefits from attending the conference, great content continues to reign supreme at SAS Global Forum. This year’s conference is no different. The 2017 Content Advisory Team has assembled a stellar lineup of well over 600 sessions: invited speakers, contributed papers, hands-on workshops, tutorials and posters. And I am very proud to report that 25 countries are contributing speakers this year, with every region of the world represented: North, Central, and South Africa, Europe, Australia, the Middle East, Asia and the Americas. This sort of global diversity brings new ideas and new ways of looking at and solving problems that really grow your knowledge and help move your organization forward.

In addition to all of this great technical content, we have made a special effort to organize sessions that help SAS users better present their work. As Melissa Marshall famously claims, “Science not communicated is science not done.” Therefore, in keeping with the SAS Global Users Group’s mission to champion the needs of SAS users around the globe, here is a sampling of sessions that will help you better communicate.

The list starts with Melissa herself!

Present Your Science: Transforming Technical Talks
Session T108, Melissa Marshall, Principal, Melissa Marshall Consulting LLC

This versatile half-day workshop covers the full gamut: content strategy, slide design, and presentation delivery. With a dynamic combination of lecture, discussion, video analysis, and exercises, this workshop will truly transform how technical professionals present their work and will help foster a culture of improved communications throughout the SAS community.

How the British Broadcasting Corporation Uses Data to Tell Stories in a Visually Compelling Way
Session 0824, Amanda J Farnsworth, Head of Visual Journalism, BBC News

… data is often seen as a dry, detached, unemotional thing that's hard to understand and for many, easy to ignore. At the BBC, employees have been thinking hard about how to use data to tell stories in a visually compelling way that connects with audiences and makes them more curious about the world that we live in. And, there is an ever-increasing amount of data with which to tell those stories. Governments are publishing more big data sets about health, education, crime, and social makeup. Academics are generating huge amounts of data as a consequence of research. Businesses and other organizations conduct their own research and polling. The BBC’s aim is to take that data and make it relevant at a personal level, answering the audiences' number one question: what does this mean for me?

Convince Me: Constructing Persuasive Presentations
Session 0862, Frank Carillo, CEO and Anne Coffey, Senior Director, E.C.G. Inc.

Data outputs do not a persuasive argument make. Effective persuasion requires a combination of logic and emotion supported by facts. Statisticians dedicate their lives to analyzing data such that it is appropriate supporting evidence. While the appropriate evidence is essential to convince your listeners, you first have to be able to gain and maintain their attention and trust. Persuasive presentations fight for hearts and minds, and are not a dry, unbiased recitation of facts or analyses. This session is designed to provide suggestions for how to utilize successful structures and create emotional connections.

Data Visualization Best Practices: Practical Storytelling Using SAS®
Session T117, Greg S Nelson, CEO, Thotwave Technologies LLC.

Data means little without our ability to visually convey it. Whether building a business case to open a new office, acquiring customers, presenting research findings, forecasting or comparing the relative effectiveness of a program, we are crafting a story that is defined by the graphics that we use to tell it. Using practical, real-world examples, students will learn how to critically think about visualizations.

Presentations as Listeners Like Them: How to Tailor for an Audience
Session 0408, Frank Carillo, CEO and Anne Coffey, Senior Director, E.C.G. Inc.

Data doesn't speak for itself. We speak for it, and how we do that influences how people view and interpret that data. One of the most overlooked aspects of presenting data is analyzing the audience. At no point in history have speakers had to face such heterogeneous audiences as they do today: there might be as many as five different generations in the room, cross-functional teams have broad areas of expertise, and international companies integrate different cultures and customs. This session is designed to teach attendees how to analyze not the data, but the listeners. Who is your audience? What is important to them? What is your message …?

tags: papers & presentations, SAS Global Forum

At SAS Global Forum, Content is King was published on SAS Users.