source control

11月 102020
 

The code and data that drive analytics projects are important assets to the organizations that sponsor them. As such, there is a growing trend to manage these items in the source management systems of record. For most companies these days, that means Git. The specific system might be GitHub Enterprise, GitLab, or Bitbucket -- all platforms that are based on Git.

Many SAS products support direct integration with Git. This includes SAS Studio, SAS Enterprise Guide, and the SAS programming language. (That last one checks a lot of boxes for ways to use Git and SAS together.) While we have good documentation and videos to help you learn about Git and SAS, we often get questions around "best practices" -- what is the best/correct way to organize your SAS projects in Git?

In this article I'll dodge that question, but I'll still try to provide some helpful advice in the process.

Ask the Expert resource: Using SAS® With Git: Bring a DevOps Mindset to Your SAS® Code

Guidelines for managing SAS projects in Git

It’s difficult for us to prescribe exactly how to organize project repositories in source control. Your best approach will depend so much on the type of work, the company organization, and the culture of collaboration. But I can provide some guidance -- mainly things to do and things to avoid -- based on experience.

Do not create one huge repository

DO NOT build one huge repository that contains everything you currently maintain. Your work only grows over time and you'll come to regret/revisit the internal organization of a huge project. Once established, it can be tricky to change the folder structure and organization. If you later try to break a large project into smaller pieces, it can be difficult or impossible to maintain the integrity of source management benefits like file histories and differences.

Design with collaboration in mind

DO NOT organize projects based only on the teams that maintain them. And of course, don't organize projects based on individual team members.

  • Good repo names: risk-adjustment-model, engagement-campaigns
  • Bad repo names: joes-code, claims-dept

All teams reorganize over time, and you don't want to have to reorganize all of your code each time that happens. And code projects change hands, so keep the structure personnel-agnostic if you can. Major refactoring of code can introduce errors, and you don't want to risk that just because you got a new VP or someone changed departments.

Instead, DO organize projects based on function/work that the code accomplishes. Think modular...but don't make projects too granular (or you'll have a million projects). I personally maintain several SAS code projects. The one thing they have in common is that I'm the main contributor -- but I organize them into functional repos that theoretically (oh please oh please) someone else could step in to take over.

The Git view of my YouTube API project in SAS Enterprise Guide

Up with reuse, down with ownership

This might seem a bit communist, but collaboration works best when we don't regard code that we write as "our turf." DO NOT cling to notions of code "ownership." It makes sense for teams/subject-matter experts to have primary responsibility for a project, but systems like Git are designed to help with transparency and collaboration. Be open to another team member suggesting and merging (with review and approval) a change that improves things. GitHub, GitLab, and Bitbucket all support mechanisms for issue tracking and merge requests. These allow changes to be suggested, submitted, revised, and approved in an efficient, transparent way.

DO use source control to enable code reuse. Many teams have foundational "shared code" for standard operations, coded in SAS macros or shared statements. Consider placing these into their own project that other projects and teams can import. You can even use Git functions within SAS to fetch and include this code directly from your Git repository:

/* create a temp folder to hold the shared code */
options dlcreatedir;
%let repoPath = %sysfunc(getoption(WORK))/shared-code;
libname repo "&repoPath.";
libname repo clear;
 
/* Fetch latest code from Git */
data _null_;
 rc = git_clone( 
   "https://gitlab.mycompany.com/sas-projects/shared-code/",
   "&repoPath.");
run;
 
options source2;
/* run the code in this session */
%include "&repoPath./bootstrap-macros.sas";

If you rely on a repository for shared code and components, make sure that tests are in place so changes can be validated and will not break downstream systems. You can even automate tests with continuous integration tools like Jenkins.

DO document how projects relate to each other, dependencies, and prepare guidance for new team members to get started quickly. For most of us, we feel more accountable when we know that our code will be placed in central repositories visible to our peers. It may inspire cleaner code, more complete documentation, and a robust on-boarding process for new team members. Use the Markdown files (README.md and others) in a repository to keep your documentation close to the code.

My SAS code to check Pagespeed Insights, with documentation

Work with Git features (and not against them)

Once your project files are in a Git repository, you might need to change your way of working so that you aren't going against the grain of Git benefits.

DO NOT work on code changes in a shared directory with multiple team members –- you'll step on each other. The advantage of Git is that it's a distributed workflow and each developer can work with their own copy of the repository, and merge/accept changes from others at their own pace.

DO use Git branching to organize and isolate changes until you are ready to merge them with the main branch. It takes a little bit of learning and practice, but when you adopt a branching approach you'll find it much easier to manage -- it beats keeping multiple copies of your code with slightly different file and folder names to mark "works in progress."

DO consider learning and using Git tools such as Git Bash (command line), Git GUI, and a code IDE like VS Code. These don't replace the SAS-provided coding tools with their Git integration, but they can supplement your workflow and make it easier to manage content among several projects.

Learning more

When you're ready to learn more about working with Git and SAS, we have many webinars, videos, and documentation resources:

The post How to organize your SAS projects in Git appeared first on The SAS Dummy.

10月 132014
 

SAS Enterprise Guide 7.1 began shipping last week. Of the many new features, some are "biggies" while others are more subtle. My favorite new features are those for SAS programmers, including several items that I've heard customers ask for specifically. I'll describe them briefly here; the SAS Enterprise Guide online help contains more details.

Track program history

This is one of the biggies. If you have SAS programs in your SAS Enterprise Guide project, you can now track your changes in those programs using standard source control management methods.

eg71_commit
You can "commit", view history, revert changes, compare versions, and even see an annotated "blame" view that shows exactly when you introduced a change that broke your program.

eg71_blame
The program history feature relies on a "hyperlocal" Git repository within your EGP file, so you can't use this to track changes to SAS programs that you store outside of the project. But SAS Enterprise Guide 7.1 also supports integration with a file-system-based Git repository if you set one up using other tools. The SAS integrated menus/tools will still help you to see your program's heritage.

eg71_newscm
Why Git? Functionally, it fits the purpose. And the SAS team was able to embed the necessary pieces within the application, so you don't need to install additional tools before getting started. And besides, all of the cool kids use Git these days. If you need to work with Subversion or another tool, you can still use this file-system technique.

Smart highlighting in the program editor

Double-click on a word in the editor to highlight it, and instantly see all other occurrences of that word highlighted in your program view. That's what puts the "smart" in "smart highlighting".

It's amazingly useful for finding all occurrences of a variable name, data set name, or an embarrassing misspelling.

eg71_smarthighlight

Project log summary window

The Project Log shows you a complete aggregated view of your SAS logs within your project; it's been part of SAS Enterprise Guide for several years. The log content is comprehensive, but often difficult to navigate because it holds so much. Now the popular Log Summary view (introduced in release 6.1) has made it to the Project Log, simplifying your journey through the log content.

eg71_projlogsummary

Project-level Search

The search for a "search" feature led me to create the EGP Search tool, which has been very popular among SAS Enterprise Guide "power users". Now there is a built-in search feature that allows you to search the current project for text in any project element, including tasks, programs, and results.

eg71find

(The built-in search feature doesn't search multiple project files, so my EGP Search tool isn't obsolete just yet.)

SAS Macro Variable viewer and SAS System Options viewer

Many SAS programmers have downloaded these two custom tasks from this blog. The macro variable viewer shows all of the current SAS macro variables and their values, plus allows a quick method to evaluate macro expressions. The system options viewer shows all of the SAS options, with their values and meanings. Thanks to the popularity of the custom tasks, the R&D team agreed to include them in the main application. These tasks are now "first-class citizens" on the Tools menu.

eg71_newtools

But wait, there's more

I look forward to discussing more new features, including: integration with SAS Studio tasks, with SAS Visual Analytics and the LASR server, some new UI niceties for finding tasks and organizing Favorites, and much more. It's a big release with plenty of treasures to find -- and even I'm still discovering them!

tags: Git, SAS Enterprise Guide, SAS programming, source control
10月 302012
 

I work on a variety of projects at SAS, most of which require some level of team collaboration in source management systems. Due to the many technologies that we work with, SAS developers use different source management tools for different purposes. I've got projects in CVS, Subversion, and Git.

When it comes to writing and maintaining SAS programs (are you ready for this shocker?), I use SAS Enterprise Guide almost exclusively. I know how to use all of the other tools, but this is where I'm most productive.

So, here's the question that I hear all of the time: Does SAS Enterprise Guide integrate with a source control system? The answer is No. But, it does integrate with the Windows file system. And the Windows file system integrates with source control.

Frankly, this approach is how I prefer to work anyway. Probably due to bad past experiences with source control integration in other IDEs, I prefer to deal with source control on the file system. When my content is ready to commit, I use the file system integration (usually in the form of Windows shell extensions) to review the commit, check differences and history, manage file merges (if needed), and actually commit the files.

For example, here is a picture of one of my project directories where we're using Subversion. I'm using TortoiseSVN to access the repository. You can see the icon overlays that indicate the files that are up-to-date (green checks) and those that are modified locally (red exclamation).

How to enable relative file references

SAS Enterprise Guide allows you to link in SAS programs and external data files (such as Excel or CSV files), so you don't have to lock up all of your content in the project (EGP) file. When working with source control, you need to enable one additional trick: tell SAS Enterprise Guide to treat these file references as relative paths. (There's nothing like an absolute file path -- specific to your machine -- for messing up your collaboration effort.)

This setting is maintained per project. To set it:

  1. Select File->Project Properties. The Properties window appears.
  2. Select the File References tab.
  3. Check the box: "Use paths relative to the project for programs and importable files"

An example on github

To prove that this actually works, I "refactored" one of my own projects to share with you. It's the project that analyzes my Netflix rental/streaming history. I placed the Excel files in a "./data" subfolder, and all of the SAS programs in a "./programs" subfolder. These assets, combined with the process flow "recipes" within the project, should allow anybody to re-run my project and to replicate my results. (Note: these projects do assume a Local SAS session. If you don't have a Local SAS, you must use Tools->Project Maintenance to replace the server references for your environment.)

The projects (one for v4.3, one for v5.1) are in this github repository. Git is fast becoming the most popular source control repository for all types of developers, so it doesn't get much more modern than this. If you're brave and you have some time to kill, give it a try. And please, don't be too critical of my movie rental history.

tags: CVS, github, SAS Enterprise Guide, scm, source control, Subversion