Git

January 18, 2019
 

It seems that everyone knows about GitHub -- the service that hosts many popular open source code projects. The underpinnings of GitHub are based on Git, which is itself an open-source implementation of a source management system. Git was originally built to help developers collaborate on Linux (yet another famous open source project) -- but now we all use it for all types of projects.

There are other free and for-pay services that use Git, like Bitbucket and GitLab. And there are countless products that embed Git for its versioning and collaboration features. In 2014, SAS developers added built-in Git support for SAS Enterprise Guide.

Since then, Git (and GitHub) have grown to play an even larger role in data science operations and DevOps in general. Automation is a key component for production work -- including check-in, check-out, commit, and rollback. In response, SAS has added Git integration to more SAS products, including:

  • the Base SAS programming language, via a collection of SAS functions
  • SAS Data Integration Studio, via a new source control plugin
  • SAS Studio (experimental in v3.8)

You can use this Git integration with any service that supports Git (GitHub, GitLab, etc.), or with your own private Git servers and even just local Git repositories.

SAS functions for Git

Git infrastructure and functions were added to SAS 9.4 Maintenance 6. The new SAS functions all have the helpful prefix of "GITFN_" (signifying "Git fun!", I assume). Here's a partial list:

GITFN_CLONE       Clones a Git repository (for example, from GitHub) into a directory on the SAS server.
GITFN_COMMIT      Commits staged files to the local repository.
GITFN_DIFF        Returns the number of diffs between two commits in the local repository and creates a diff record object for the local repository.
GITFN_PUSH        Pushes the committed files in the local repository to the remote repository.
GITFN_NEW_BRANCH  Creates a Git branch.

 

The function names make sense if you're familiar with Git lingo. If you're new to Git, you'll need to learn the terms that go with the commands: clone, repo, commit, stage, blame, and more. This handbook provided by GitHub is friendly and easy to read. (Or you can start with this xkcd comic.)
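If the vocabulary is new, it can help to see the terms in action. Here is a minimal sketch that uses the plain git command line (not the SAS functions) and assumes git is installed. The folder names, the file hello.sas, and the user identity are all made up for illustration.

```shell
#!/bin/sh
set -e

# Work in a throwaway folder so nothing real is touched (hypothetical paths).
work=$(mktemp -d)
origin="$work/origin"
mkdir "$origin"
cd "$origin"

# repo: create an empty repository and tell git who is committing.
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# stage: mark a new file for inclusion in the next commit.
printf 'data _null_;\n  put "hello";\nrun;\n' > hello.sas
git add hello.sas

# commit: record the staged snapshot in the repository history.
git commit -q -m "Add hello.sas"

# clone: copy the whole repository, history included, to another folder.
git clone -q "$origin" "$work/copy"
cd "$work/copy"

# blame: show which commit and author last touched each line.
git --no-pager blame hello.sas

# Count the commits that arrived with the clone.
commit_count=$(git rev-list --count HEAD)
echo "commits in clone: $commit_count"
```

The clone here comes from a local folder only to keep the sketch self-contained; with a hosting service you would pass the repository URL instead.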

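You can check which version of Git is available to SAS and clone a repository with just a few lines of code: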

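data _null_;
 /* Write the version of the Git library available to SAS to the log */
 version = gitfn_version();
 put version=;

 /* Clone the remote repository into a local folder; check rc in the log */
 rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/",
   "c:\Projects\sas-dummy-blog");
 put rc=;
run;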

In one line, this function fetches an entire collection of code files from your source control system. Here's a more concrete example that fetches the code to a work space, then runs a program from that repository. (This is safe for you to try -- here's the code that will be pulled/run. It even works from SAS University Edition.)

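/* DLCREATEDIR lets a LIBNAME statement create the folder if it does not exist */
options dlcreatedir;
%let repoPath = %sysfunc(getoption(WORK))/sas-dummy-blog;
/* Assign and then clear a libref just to create the target folder */
libname repo "&repoPath.";
libname repo clear;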
 
/* Fetch latest code from GitHub */
data _null_;
 rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/",
   "&repoPath.");
 put rc=;
run;
 
/* run the code in this session */
%include "&repoPath./rng_example_thanos.sas";

You could use the other GITFN functions to stage and commit the output from your SAS jobs, including log files, data sets, ODS results -- whatever you need to keep and version.

Using Git in SAS Data Integration Studio

SAS Data Integration Studio has supported source control integration for many years, but only for CVS and Subversion (still in wide use, but they aren't media darlings like GitHub). By popular request, the latest version of SAS Data Integration Studio adds support for a Git plug-in.

Example of Git in SAS DI Studio

See the documentation for details:

Read more about setup and use here, as part of our "Custom Tasks Tuesday" series.

Using Git in SAS Enterprise Guide

This isn't new, but I'll include it for completeness. SAS Enterprise Guide has built-in Git repository support for SAS programs that are stored in your project file. You can use this feature without having to set up any external Git servers or repositories. Also, SAS Enterprise Guide can recognize when you reference programs that are managed in an external Git repository. This integration enables features like program history, compare differences, commit, and more. Read more and see a demo of this in action here.

program history

If you use SAS Enterprise Guide to edit and run SAS programs that are managed in an external Git repository, here's an important tip. Change your project file properties to "Use paths relative to the project for programs and importable files." You'll find this checkbox in File->Project Properties.

With this enabled, you can store the project file (EGP) and any SAS programs together in Git, organized into subfolders if you want. As long as these are cloned into a similar structure on any system you use, the file paths will resolve automatically.

The post Using built-in Git operations in SAS appeared first on The SAS Dummy.

August 19, 2017
 

SAS programmers have high expectations for their coding environment, and why shouldn't they? Companies have a huge investment in their SAS code base, and it's important to have tools that help you understand that code and track changes over time. Few things are more satisfying than a SAS program that works as designed and delivers perfect results. (Oh, hyperbole you say? I don't think so.) But when your program isn't working the way it should, there are two features that can help you get back on track: a code debugger, and program revision history. Both of these capabilities are built into SAS Enterprise Guide. Program history was added in v7.1, and the debugger was added in v7.13.

I've written about the DATA step debugger before -- both as a teaching tool and as a productivity tool. In this article, I'm sharing a demo of the debugger's features, led by SAS developer Joe Flynn. Before joining the SAS Enterprise Guide development team, Joe worked in SAS Technical Support. He's very familiar with "bugs," and reported his share of them to SAS R&D. Now -- like every programmer -- Joe makes the bugs. But of course, he fixes most of them before they ever see the light of day. How does he do that? Debugging.

This video is only about 8 minutes long, but it's packed with good information. In the debugger demo, you'll learn how you can use standard debugging methods, such as breakpoints, step over and step through, watch variables, jump to, evaluate expression, and more. There is no better way to understand exactly what is causing your DATA step to misbehave.

Joe's debugger

In the program history demo (the second part of the video), you'll learn how team members can collaborate using standard source management tools (such as Git). If you establish a good practice of storing code in a central place with solid source management techniques, SAS Enterprise Guide can help you see who changed what, and when. SAS Enterprise Guide also offers a built-in code version comparison tool, which enhances your ability to find the breaking changes. You can also use the code comparison technique on its own, outside of the program history feature.
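For readers who want to see what "who changed what, and when" looks like underneath the menus, here is a rough command-line sketch. It assumes git is installed and is not how SAS Enterprise Guide itself is implemented. The file report.sas, its contents, and the user identity are invented for the example.

```shell
#!/bin/sh
set -e

# Throwaway repository with two versions of one program (hypothetical file).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# First version of the program.
printf 'data a;\n  x = 1;\nrun;\n' > report.sas
git add report.sas
git commit -q -m "First version"

# Second version: one line changes.
printf 'data a;\n  x = 2;\nrun;\n' > report.sas
git commit -q -am "Change x"

# Who changed what, and when: one line per commit that touched the file.
git --no-pager log --date=short --format='%h %an %ad %s' -- report.sas

# Compare the two versions to find the change.
git --no-pager diff HEAD~1 HEAD -- report.sas

# Number of commits, and number of lines added by the latest one.
history=$(git rev-list --count HEAD)
changed=$(git diff --numstat HEAD~1 HEAD -- report.sas | cut -f1)
echo "commits: $history, lines added in latest commit: $changed"
```

The log answers the "who" and "when" questions, while the diff plays the role of the version comparison tool.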

program history

Take a few minutes to watch the video, and then try out the features yourself. You don't need a Git installation to play with program history at the project level, though it helps when you want to extend that feature to support team collaboration.


The post Code debugging and program history in SAS Enterprise Guide appeared first on The SAS Dummy.

January 18, 2016
 
I love GitHub for version control and collaboration, though I'm no master of it. And the tools for integrating git and GitHub with RStudio are just amazing boons to productivity.

Unfortunately, my University-supplied computer does not play well with GitHub. Various directories are locked down, and I can't push or pull to GitHub directly from RStudio. I can't even use install_github() from the devtools package, which is needed for loading Shiny applications up to Shinyapps.io. I lived with this for a bit, using git from the desktop and rsconnect from a home computer. But what a PIA.

Then I remembered I know how to put RStudio in the cloud-- why not install R there, and make that be my GitHub solution?

It works great. The steps are below. In setting it up, I discovered that Digital Ocean has changed their set-up a little bit, so I updated the earlier post as well.

1. Go to Digital Ocean and sign up for an account. By using this link, you will get a $10 credit. (Full disclosure: I will also get a $25 credit once you spend $25 real dollars there.) The reason to use this provider is that they have a system ready to run with Docker already built in, which makes it easy. In addition, their prices are quite reasonable. You will need to use a credit card or PayPal to activate your account, but you can play for a long time with your $10 credit-- the cheapest machine is $.007 per hour, up to a $5 per month maximum.

2. On your Digital Ocean page, click "Create droplet". Click on "One-click Apps" and select "Docker (1.9.1 on 14.04)". (The numbers in the parentheses are the Docker and Ubuntu versions, and might change over time.) Then choose a size (meaning cost/power) of machine and the region closest to you. You can ignore the other settings. Give your new computer an arbitrary name. Then click "Create Droplet" at the bottom of the page.

3. It takes a few seconds for the droplet to spin up. Then you should see your droplet dashboard. If not, click "Droplets" from the top bar. Under "More", click "Access Console". This brings up a virtual terminal to your cloud computer. Log in (your username is root) using the password that Digital Ocean sent you when the droplet spun up.

4. Start your RStudio container by typing: docker run -d -p 8787:8787 -e ROOT=TRUE rocker/hadleyverse

You can replace hadleyverse with rstudio if you like, for a quicker first-time installation, but many R users will want enough of Hadley Wickham's packages that it makes sense to install this version. The -e ROOT=TRUE is crucial for our approach to installing git into the container, but see the comment below from Petr Simicek for another way to do the same thing.

5. Log in to your Cloud-based RStudio. Find the IP address of your cloud computer on the droplet dashboard, and append :8787 to it, and just put it into your browser. For example: http://135.104.92.185:8787. Log in as user rstudio with password rstudio.

6. Install git, inside the Docker container. Inside RStudio, click Tools -> Shell.... Note: you have to use this shell, it's not the same as using the droplet terminal. Type: sudo apt-get update and then sudo apt-get install git-core to install git.

git likes to know who you are. To set git up, from the same shell prompt, type git config --global user.name "Your Handle" and git config --global user.email "an.email@somewhere.edu"
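To check that the settings took, you can read them back. The sketch below is an illustration only: it points HOME at a temporary folder (my assumption, so that it cannot overwrite a real configuration), and in the container you would simply run the git config lines as they are.

```shell
#!/bin/sh
set -e

# Assumption for this sketch only: use a throwaway HOME so that the
# --global settings land in a scratch ~/.gitconfig, not your real one.
HOME=$(mktemp -d)
export HOME
unset XDG_CONFIG_HOME

# Tell git who you are (same commands as in the step above).
git config --global user.name "Your Handle"
git config --global user.email "an.email@somewhere.edu"

# Read the values back to confirm they were stored.
configured_name=$(git config --global --get user.name)
configured_email=$(git config --global --get user.email)
echo "name:  $configured_name"
echo "email: $configured_email"

# List everything in the global configuration file.
git config --global --list
```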

7. Close the shell, and in RStudio, set things up to work with GitHub: Go to Tools -> Global Options -> Git/SVN. Click on create RSA key. You don't need a name for it. Create it, close the window, then view it and copy it.

8. Open GitHub, go to your Profile, click "Edit Profile", "SSH keys". Click "Add key", and just paste in the stuff you copied from RStudio in the previous step.

You're done! To clone an existing repo from GitHub to your cloud machine, open a new project in RStudio, select Version Control, then Git, and paste in the URL that GitHub provides. Then work away!

An unrelated note about aggregators: We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, other than as mentioned above, the aggregator is violating the terms by which we publish our work.
October 13, 2014
 

SAS Enterprise Guide 7.1 began shipping last week. Of the many new features, some are "biggies" while others are more subtle. My favorite new features are those for SAS programmers, including several items that I've heard customers ask for specifically. I'll describe them briefly here; the SAS Enterprise Guide online help contains more details.

Track program history

This is one of the biggies. If you have SAS programs in your SAS Enterprise Guide project, you can now track your changes in those programs using standard source control management methods.

You can "commit", view history, revert changes, compare versions, and even see an annotated "blame" view that shows exactly when you introduced a change that broke your program.

The program history feature relies on a "hyperlocal" Git repository within your EGP file, so you can't use this to track changes to SAS programs that you store outside of the project. But SAS Enterprise Guide 7.1 also supports integration with a file-system-based Git repository if you set one up using other tools. The SAS integrated menus/tools will still help you to see your program's heritage.

Why Git? Functionally, it fits the purpose. And the SAS team was able to embed the necessary pieces within the application, so you don't need to install additional tools before getting started. And besides, all of the cool kids use Git these days. If you need to work with Subversion or another tool, you can still use this file-system technique.

Smart highlighting in the program editor

Double-click on a word in the editor to highlight it, and instantly see all other occurrences of that word highlighted in your program view. That's what puts the "smart" in "smart highlighting".

It's amazingly useful for finding all occurrences of a variable name, data set name, or an embarrassing misspelling.


Project log summary window

The Project Log shows you a complete aggregated view of your SAS logs within your project; it's been part of SAS Enterprise Guide for several years. The log content is comprehensive, but often difficult to navigate because it holds so much. Now the popular Log Summary view (introduced in release 6.1) has made it to the Project Log, simplifying your journey through the log content.


Project-level Search

The search for a "search" feature led me to create the EGP Search tool, which has been very popular among SAS Enterprise Guide "power users". Now there is a built-in search feature that allows you to search the current project for text in any project element, including tasks, programs, and results.


(The built-in search feature doesn't search multiple project files, so my EGP Search tool isn't obsolete just yet.)

SAS Macro Variable viewer and SAS System Options viewer

Many SAS programmers have downloaded these two custom tasks from this blog. The macro variable viewer shows all of the current SAS macro variables and their values, plus allows a quick method to evaluate macro expressions. The system options viewer shows all of the SAS options, with their values and meanings. Thanks to the popularity of the custom tasks, the R&D team agreed to include them in the main application. These tasks are now "first-class citizens" on the Tools menu.


But wait, there's more

I look forward to discussing more new features, including: integration with SAS Studio tasks, with SAS Visual Analytics and the LASR server, some new UI niceties for finding tasks and organizing Favorites, and much more. It's a big release with plenty of treasures to find -- and even I'm still discovering them!

tags: Git, SAS Enterprise Guide, SAS programming, source control
February 21, 2013
 

PhUSE-FDA Working Group 5 (Development of Standard Scripts for Analysis and Programming) just adopted Google Code as its collaborative programming platform. Google Code is one of the most popular and respected open source software hosting sites in the world, and it is definitely a good choice for PhUSE-FDA WG5.

But after viewing one of WG5’s working reports, Sharing Standard Statistical Scripts, and learning why they finally chose Google Code (rather than GitHub, which was also tested by WG5 members), I think it’s necessary to clear up some misunderstandings about GitHub, of which I’m also an occasional user.

Slide 11 of that report lists these objections to GitHub:

  • Too complicated an interface
  • Too much overhead for simple development
  • Too much training and education needed
  • Designed for classic programming languages like C and Java (not for things like R and SAS)

On the first point, the interface: it seems that only the Git command line was tested, and that may well be too complicated for “classic statistical programming users”. In fact, GitHub offers a great GUI tool, GitHub for Windows, which lets users visually clone repositories, commit changes, and perform other management tasks without typing Git commands.


It’s also worth mentioning that with GitHub for Windows, users don’t need to install any separate version control software like Git, CVS or SVN. GitHub for Windows already includes a fully functional version of msysGit, which makes users’ lives much simpler. To use Google Code, by contrast, you must install and configure something like TortoiseSVN.

Second, is GitHub suitable for “things like R and SAS”? It’s true that all hosts, including GitHub, are dominated by “classic programming languages like C and Java”. SAS programmers as a whole are just not active in social coding, but R is actually one of the most used languages on GitHub.

Google Code is good, and the “Google Code vs GitHub” question is mostly subjective. Still, it seems to me that WG5’s choice of Google Code over GitHub was based on incomplete information. I personally prefer GitHub, and there are some good reasons for that:

  • The GUI tool, GitHub for Windows, keeps the Git/SVN/CVS setup to a minimum.
  • GitHub supplies much richer statistics reports, including charts.
  • GitHub is more socially oriented, which makes it cool in this Web 2.0 world.