7月 302020
 

With COVID-19 spreading worldwide, accurate data have become more important than ever. In this blog, I share some of my favorite sources:

The Economist Reported deaths often underestimate actual deaths. One way to get at the real numbers is to compare total deaths from all causes versus the typical death rate. This “mortality tracker” plots excess deaths which is a more reliable measure than reported deaths.

Johns Hopkins University This interactive dashboard by the Coronavirus Resource Center at the Bloomberg School of Public Health shows detailed data about the pandemic worldwide.

National Public Radio These interactive graphics by NPR focus on the pandemic in the US.

SAS Institute This interactive dashboard gives a different look at global COVID-19 data.

Avi Schiffmann This webpage may be the most impressive effort by an individual person, and shows that tabular data can be profoundly thought-provoking too.

These articles are also highly recommended:

The Risks–Know Them–Avoid Them This article explains in plain language how COVID-19 spreads and how to keep yourself safe. Share this with your family.

COVID-19 Superspreader Events in 28 Countries: Critical Patterns and Lessons This fascinating article compiles data about superspreader events (SSEs) and reveals a lot about how this virus is spread.

Temporary reduction in daily global CO2 emissions during the COVID-19 forced confinement Finally, something positive: an article about the reduction in CO2 emissions due to the pandemic.

Knowledge is power. Working together we can all stay healthy.

4月 052020
 

最近新冠肆虐,在高度全球化的地球村,没有哪一个国家和地区的人能躲避这个病毒, 它是全人类的敌人。在油管上看到一个有趣的动图展示(Trajectory of COVID-19 confirmed cases,请自行搜索)。其中有一个图形截图如下:

图1, Trajectory of covid 19 cases

这个图横纵坐标都用的是对数坐标系,有意思的是,横坐标是总确诊例数,纵坐标是新增例数。对于这种全球性的大型数据SAS公司常用的是鼓泡泡图,漂亮美观大气,特别在大屏幕上拿着长竹竿指指点点,非常的霸气。虽然能够看到各国的变化趋势和比较,但是没法判断关键节点。而这种弹道图却能很好地恰如其分的看到拐点。毕竟天天隔离在家,不能出门是非常的无聊无助,看到拐点就看到了解封的希望。

这种图的就是曲线图,随时间增加而变化的动态图,比较能反映其变化。画这种几个要点:

1,数据源,全球数据长时间的数据,这次的数据很符合。在网上找到一个WHO的数据源

2,坐标系采用对数坐标系。病毒这种微生物,其生长趋势分为潜伏期,对数增长期,平稳期和消亡期。课本上讲的清清楚楚,相关理论及现象也是多如牛毛,这里就不罗嗦了。 人类对线性的理解比较深刻,对什么指数式增长的趋势有种失控感。采用线性变化趋势能够很好的保护人类的脆弱心理,有利于做出理性的判断。

3,横纵坐标的变量的选取,这是个难点。常规下横坐标是时间,纵坐标是数量的对数。这个通常用来预测限制条件下的生长变化趋势。但其实很多时候,实际影响因素太多而无法做出判断,通常都是马后炮。不过如果能从历史曲线中学习到新知识,也能受益匪浅。除了时间、总数以外,每日新增数,时间拐点这两个参数也是非常受关注。 怎么把这两个参数在图表上表示出来呢? 问题很好,答案就在本文图1中。总数,每日新增,拐点,就缺个时间戳。从图1可知,中国和韩国已经上岸,其他国家还挂在线上。

4,需要吸引人。 SAS画图的毛病我在群里吐槽了很多,这里就不多表。不过SAS画图工工整整,严谨,直接挂Nature上都行。如果能增加一点活泼感就更好了,这里,我在横纵坐标轴上做了一点改进,让坐标轴从1000起步,随时间增加,新增数和总数隔一段时间会增加一个量级,效果不错。如下图。

5,还有一个处理,由于上报的时间存在滞后,并且检测技术,诊断标准各个国家都在变,所以上报的数据在某些阶段过于集中,导致有假拐点出现。因此,这里做了一个平滑,使用移动平价,也即是今天的新增数据用将过去7天(包括今天)平均数来代替,避免假拐点迷惑人类的眼睛。

上面这个是全球的,齐齐整整的,全都上线了。中韩提前下线,但是要看到后面新增例数存在一个反复的过程,防疫任务艰巨,大家还是不能掉以轻心。下面重点看下几个“模范生”,不管是自称的,还是公认的。曲线上写的清清楚楚。看下面图,日本,新加坡,这分明还在线上挣扎嘛。最多是把曲线拉平了一些(”flatten the curve”,请自行搜索。),拐点还不明朗,还需少吹牛,多努力,别被忽悠瘸了。

原创文章: ”COVDI-19全球各国病例数弹道追踪图示“,转载请注明: 转自SAS资源资讯列表

本文链接地址: http://saslist.net/archives/460

3月 272020
 

Six editions is a lot! If you had told us, back when we wrote the first edition of The Little SAS Book, that someday we would write a sixth; we would have wondered how we could possibly find that much to say. After all, it is supposed to be The Little SAS Book, isn’t it? But the developers at SAS Institute are constantly hard at work inventing new and better ways of analyzing and visualizing data. And some of those ways turn out to be so fundamental that they belong even in a little book about SAS.

Interface independence

One of the biggest changes to SAS software in recent years is the proliferation of interfaces. SAS programmers have more choices than ever before. Previous editions contained some sections specific to the SAS windowing environment (also called Display Manager). We wrote this edition for all SAS programmers whether you use SAS Studio, SAS Enterprise Guide, the SAS windowing environment, or run in batch. That sounds easy, but it wasn’t. There are differences in how SAS behaves with different interfaces, and these differences can be very fundamental. In particular, the system option that sets the rules for names of variables varies depending on how you run SAS. So old sections had to be rewritten, and we added a whole new section showing how to use variable names containing blanks and special characters.

New ways to read and write Microsoft Excel files

Previous editions already covered how to read and write Microsoft Excel files, but SAS developers have created new ways that are even better. This edition contains new sections about the XLSX LIBNAME engine and the ODS EXCEL destination.

More PROC SQL

From the very first edition, The Little SAS Book always covered PROC SQL. But it was in an appendix and over time we noticed that most people ignore appendices. So for this edition, we removed the appendix and added new sections on using PROC SQL to

  • Subset your data
  • Join data sets
  • Add summary statistics to a data set
  • Create macro variables with the INTO clause

For people who are new to SQL, these sections provide a good introduction; for people who already know SQL, they provide a model of how to leverage SQL in your SAS programs.

Updates and additions throughout the book

Almost every section in this edition has been changed in some way. We added new options, made sure everything is up-to-date, and ran every example in every SAS interface noting any differences. For example, PROC SGPLOT has some new options, the default ODS style for PDF has changed, and the LISTING destination behaves differently in different interfaces. Here’s a short list, in no particular order, of new or expanded topics in the sixth edition:

  • More examples with permanent SAS data sets, CSV files, or tab-delimited files
  • More log notes throughout the book showing what to look for
  • LIKE or sounds-like (=*) operators in WHERE statements
  • CROSSLIST, NOCUM, and NOPRINT options in PROC FREQ
  • Grouping data with a user-defined format and the PUT function
  • Iterative DO groups
  • DO WHILE and DO UNTIL statements
  • %DO statements

Even though we have added a lot to this edition, it is still a little book.  In fact, this edition is shorter than the last—by twelve pages! We think this is the best edition yet.

3月 132020
 

If you have been using SAS for long, you have probably noticed that there is generally more than one way to do anything. The Little SAS Book has long covered reading and writing Microsoft Excel files with the IMPORT and EXPORT procedures, but for the Sixth Edition we decided it was time to add two more ways: The ODS EXCEL destination makes it easy to convert procedure results into Excel files, while the XLSX LIBNAME engine allows you to access Excel files as if they were SAS data sets.

With the XLSX LIBNAME engine, you can convert an Excel file to a SAS data set (or vice versa) if you want to, but you can also access an Excel file directly without the need for a SAS data set. This engine works for files created using any version of Microsoft Excel 2007 or later in the Windows or UNIX operating environments. You must have SAS 9.4M2 or higher and SAS/ACCESS Interface to PC Files software. A nice thing about this engine is that it works with any combination of 32 bit and 64 bit systems.

The XLSX LIBNAME engine uses the first line in your file for the variable names, scans each full column to determine the variable type (character or numeric), assigns lengths to character variables, and recognizes dates, and numeric values containing commas or dollar signs. While the XLSX LIBNAME engine does not offer many options, because you are using an Excel file like a SAS data set, you can use some standard data set options. For example, you can use the RENAME= data set option to change the names of variables, and FIRSTOBS= and OBS= to select a subset of rows.

Reading an Excel file as is 

Suppose you have the following Excel file containing data about magnolia trees:


With the XLSX LIBNAME engine, SAS can read the file, without first converting it to a SAS data set. Here is a PROC PRINT that prints the data directly from the Excel file.

* Read Excel spreadsheet XLSX LIBNAME;
LIBNAME exfiles XLSX ‘c:\MyExcel\Trees.xlsx’;
PROC PRINT DATA = exfiles.sheet1;
   TITLE ‘PROC PRINT of Excel File’;
RUN;

Here are the results of the PROC PRINT. Notice that the variable names were taken from the first row in the file.

Converting an Excel file to a SAS data set 

If you want to convert an Excel file to a SAS data set, you can do that too. Here is a DATA step that reads the Excel file. The RENAME= data set option changes the variable name MaxHeight to MaxHeightFeet. Then a new variable is computed which is equal to the height in meters.

* Import Excel into a SAS data set;
DATA magnolia;
SET exfiles.sheet1 (RENAME = (MaxHeight = MaxHeightFeet));
MaxHeightMeters = ROUND(MaxHeightFeet * 0.3048);
RUN;

Here is the SAS data set with the renamed and new variables:

Writing to an Excel file 

It is just as easy to write to an Excel file as it is to read from it.

* Write a new sheet to the Excel file;
DATA exfiles.trees;
   SET magnolia;
RUN;
LIBNAME exfiles CLEAR;

Here is what the Excel file looks like with the new sheet. Notice that the new tab is labeled with the name of the SAS data set TREES.

Another nice thing about the XLSX LIBNAME is that it only locks a spreadsheet while SAS is accessing it. So generally speaking, it’s not necessary to issue a second LIBNAME statement to clear the libref. However, I did find, when I ran this in SAS Enterprise Guide, that I could not open the Excel spreadsheet unless I cleared the libref. So you can probably skip the LIBNAME CLEAR statement if you are using Display Manager or SAS Studio.

The XLSX LIBNAME engine is so flexible and easy to use that we think it’s a great addition to any SAS programmer’s skill set.

For more about the XLSX LIBNAME engine, I recommend this blog by Chris Hemedinger.

12月 102019
 
The DATA step has been around for many years and regardless of how many new SAS® products and solutions are released, the DATA step remains a popular way to create and manipulate SAS data sets. The SAS language contains a wide variety of approaches that provide an endless number of ways to accomplish a goal. Whether you are reshaping a data set entirely or simply assigning values to a new variable, there are numerous tips and tricks that you can use to save time and keystrokes. Here are a few that Technical Support offers to customers on a regular basis.

Writing file contents to the SAS® log

Perhaps you are reading a file and seeing “invalid data” messages in the log or you are not sure which delimiter is used between variables. You can use the null INPUT statement to read a small sample of the data. Specify the amount of data that you want to read by adjusting options in the INFILE statement. This example reads one record with 500 bytes:

   data _null_;
      infile 'path' recfm=f lrecl=500 obs=1;
      input;
      list;
   run;

The LIST statement writes the current record to the log. In addition, it includes a ruled line above the record so that it is easier to see the data for each column of the file. If the data contains at least one unprintable character, you will see three lines of output for each record written.

For example:

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+
 
1   CHAR  this is a test.123.dog.. 24
    ZONE  766726726276770333066600
    NUMR  48930930104534912394F7DA

The line beginning with CHAR describes the character that is represented by the two hexadecimal characters underneath. When you see a period in that line, it might actually be a period, but it often means that the hexadecimal characters do not map to a specific key on the keyboard. Also, column 15 for this data is a tab ('09'x). By looking at this record in the log, you can see that the first variable contains spaces and that the file is tab delimited.

Writing hexadecimal output for all input

Since SAS® 9.4M6 (TS1M6), the HEXLISTALL option in the LIST statement enables all lines of input data to be written in hexadecimal format to the SAS log, regardless of whether there are unprintable characters in the data:

   data mylib.new / hexlistall;

Removing “invalid data” messages from the log

When SAS reads data into a data set, it is common to encounter values that are invalid based on the variable type or the informat that is specified for a variable. You should examine the SAS log to assess whether the informat is incorrect or if there are some records that are actually invalid. After you ensure that the data is being read correctly, you can dismiss any “invalid data” messages by using double question mark (??) modifiers after the variable name in question:

   input date ?? mmddyy10.;

You can also use the question mark modifiers when you convert a character variable to numeric with the INPUT function:

   newdate=input(olddate,?? mmddyy10.);

The above syntax reads all values with the MMDDYY10. informat and then dismisses the notes to the log when some values for the OLDDATE variable are invalid.

Sharing files between UNIX and Microsoft Windows operating systems

The end-of-record markers in text files are different for UNIX and Windows operating systems. If you needed to share a file between these systems, the file used to need preprocessing in order to change the markers to the desired type or perhaps specify a delimiter. Now, the TERMSTR= option in the INFILE statement enables you to specify the end-of-record marker in the incoming file.

If you are working in a UNIX environment and you need to read a file that was created in a Windows environment, use the TERMSTR=CRLF option:

   infile 'file-specification'  termstr=crlf ;

If you are in a Windows environment and you need to read a file that was created in a UNIX environment, use this syntax:

   infile 'file-specification'  termstr=lf ;

Adapting array values from an ARRAY statement

The VNAME function makes it very convenient to use the variable names from an ARRAY statement. You most often use an ARRAY statement to obtain the values from numerous variables. In this example, the SQL procedure makes it easy to store the unique values of the variable Product into the macro variable &VARLIST and that number of values into the macro variable &CT (another easy tip). Within the DO loop, you obtain the names of the variables from the array, and those values from the array then become variable names.

   proc sql noprint;
      select distinct(product) into :varlist separated by ' '
      from one;
      select count(distinct product) into :ct
      from one;
   quit;
 
   …more DATA step statements…
   array myvar(&ct) $ &varlist;
      do i=1 to &ct;
         if product=vname(myvar(i)) then do;
         myvar(i)=left(put(contract,8.));
   end;
   …more DATA step statements…

Using a mathematical equation in a DO loop

A typical DO loop has a beginning and an end, both represented by integers. Did you know you can use an equation to process data more easily? Suppose that you want to process every four observations as one unit. You can run code similar to the following:

   …more DATA step statements…
   J+1;
   do i=((j*4)-3) to (j*4);
      set data-set-name point=I;
      …more DATA step statements…
   end;

Using an equation to point to an array element

With small data sets, you might want to put all values for all observations into a single observation. Suppose that you have a data set with four variables and six observations. You can create an array to hold the existing variables and also create an array for the new variables. Here is partial code to illustrate how the equation dynamically points to the correct new variable name in the array:

   array old(4) a b c d;
   array test (24) var1-var24;
   retain var1-var24;
   do i=1 to nobs;
      set w nobs=nobs point=i;
      do j=1 to 4;
      test(((i-1)*4)+j)=old(j);
      end;
   end;

Creating a hexadecimal reference chart

How often have you wanted to know the hexadecimal equivalent for a character or vice versa? Sure, you can look up a reference chart online, but you can also create one with a short program. The BYTE function returns values in your computer’s character set. The PUT statement writes the decimal value, hexadecimal value, and equivalent character to the SAS log:

   data a;
      do i=0 to 255;
      x=byte(i);
      put i=z3. i=hex2. x=;
   end;
   run;

When the resulting character is unprintable, such as a carriage return and line feed (CRLF) character, the X value is blank.

Conclusion

Hopefully, these new tips will be useful in your future programming. If you know some additional tips, please comment them below so that readers of this blog can add even more DATA step tips to their arsenal! If you would like to learn more about DATA step, check out these other blogs:

Old reliable: DATA step tips and tricks was published on SAS Users.

12月 102019
 
The DATA step has been around for many years and regardless of how many new SAS® products and solutions are released, the DATA step remains a popular way to create and manipulate SAS data sets. The SAS language contains a wide variety of approaches that provide an endless number of ways to accomplish a goal. Whether you are reshaping a data set entirely or simply assigning values to a new variable, there are numerous tips and tricks that you can use to save time and keystrokes. Here are a few that Technical Support offers to customers on a regular basis.

Writing file contents to the SAS® log

Perhaps you are reading a file and seeing “invalid data” messages in the log or you are not sure which delimiter is used between variables. You can use the null INPUT statement to read a small sample of the data. Specify the amount of data that you want to read by adjusting options in the INFILE statement. This example reads one record with 500 bytes:

   data _null_;
      infile 'path' recfm=f lrecl=500 obs=1;
      input;
      list;
   run;

The LIST statement writes the current record to the log. In addition, it includes a ruled line above the record so that it is easier to see the data for each column of the file. If the data contains at least one unprintable character, you will see three lines of output for each record written.

For example:

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+
 
1   CHAR  this is a test.123.dog.. 24
    ZONE  766726726276770333066600
    NUMR  48930930104534912394F7DA

The line beginning with CHAR describes the character that is represented by the two hexadecimal characters underneath. When you see a period in that line, it might actually be a period, but it often means that the hexadecimal characters do not map to a specific key on the keyboard. Also, column 15 for this data is a tab ('09'x). By looking at this record in the log, you can see that the first variable contains spaces and that the file is tab delimited.

Writing hexadecimal output for all input

Since SAS® 9.4M6 (TS1M6), the HEXLISTALL option in the LIST statement enables all lines of input data to be written in hexadecimal format to the SAS log, regardless of whether there are unprintable characters in the data:

   data mylib.new / hexlistall;

Removing “invalid data” messages from the log

When SAS reads data into a data set, it is common to encounter values that are invalid based on the variable type or the informat that is specified for a variable. You should examine the SAS log to assess whether the informat is incorrect or if there are some records that are actually invalid. After you ensure that the data is being read correctly, you can dismiss any “invalid data” messages by using double question mark (??) modifiers after the variable name in question:

   input date ?? mmddyy10.;

You can also use the question mark modifiers when you convert a character variable to numeric with the INPUT function:

   newdate=input(olddate,?? mmddyy10.);

The above syntax reads all values with the MMDDYY10. informat and then dismisses the notes to the log when some values for the OLDDATE variable are invalid.

Sharing files between UNIX and Microsoft Windows operating systems

The end-of-record markers in text files are different for UNIX and Windows operating systems. If you needed to share a file between these systems, the file used to need preprocessing in order to change the markers to the desired type or perhaps specify a delimiter. Now, the TERMSTR= option in the INFILE statement enables you to specify the end-of-record marker in the incoming file.

If you are working in a UNIX environment and you need to read a file that was created in a Windows environment, use the TERMSTR=CRLF option:

   infile 'file-specification'  termstr=crlf ;

If you are in a Windows environment and you need to read a file that was created in a UNIX environment, use this syntax:

   infile 'file-specification'  termstr=lf ;

Adapting array values from an ARRAY statement

The VNAME function makes it very convenient to use the variable names from an ARRAY statement. You most often use an ARRAY statement to obtain the values from numerous variables. In this example, the SQL procedure makes it easy to store the unique values of the variable Product into the macro variable &VARLIST and that number of values into the macro variable &CT (another easy tip). Within the DO loop, you obtain the names of the variables from the array, and those values from the array then become variable names.

   proc sql noprint;
      select distinct(product) into :varlist separated by ' '
      from one;
      select count(distinct product) into :ct
      from one;
   quit;
 
   …more DATA step statements…
   array myvar(&ct) $ &varlist;
      do i=1 to &ct;
         if product=vname(myvar(i)) then do;
         myvar(i)=left(put(contract,8.));
   end;
   …more DATA step statements…

Using a mathematical equation in a DO loop

A typical DO loop has a beginning and an end, both represented by integers. Did you know you can use an equation to process data more easily? Suppose that you want to process every four observations as one unit. You can run code similar to the following:

   …more DATA step statements…
   J+1;
   do i=((j*4)-3) to (j*4);
      set data-set-name point=I;
      …more DATA step statements…
   end;

Using an equation to point to an array element

With small data sets, you might want to put all values for all observations into a single observation. Suppose that you have a data set with four variables and six observations. You can create an array to hold the existing variables and also create an array for the new variables. Here is partial code to illustrate how the equation dynamically points to the correct new variable name in the array:

   array old(4) a b c d;
   array test (24) var1-var24;
   retain var1-var24;
   do i=1 to nobs;
      set w nobs=nobs point=i;
      do j=1 to 4;
      test(((i-1)*4)+j)=old(j);
      end;
   end;

Creating a hexadecimal reference chart

How often have you wanted to know the hexadecimal equivalent for a character or vice versa? Sure, you can look up a reference chart online, but you can also create one with a short program. The BYTE function returns values in your computer’s character set. The PUT statement writes the decimal value, hexadecimal value, and equivalent character to the SAS log:

   data a;
      do i=0 to 255;
      x=byte(i);
      put i=z3. i=hex2. x=;
   end;
   run;

When the resulting character is unprintable, such as a carriage return and line feed (CRLF) character, the X value is blank.

Conclusion

Hopefully, these new tips will be useful in your future programming. If you know some additional tips, please comment them below so that readers of this blog can add even more DATA step tips to their arsenal! If you would like to learn more about DATA step, check out these other blogs:

Old reliable: DATA step tips and tricks was published on SAS Users.

11月 222019
 

sxlion

大家有一个普通的印象:SAS的更新很慢,很老很落后。可能跟它的版本命名有关,SAS9.0是2004年出来的,到现在都快20年了,版本号 还停留在9字头,并且还没有继续更新的迹象。当然这个与SAS公司的明面

原创文章: ”难道是SAS10?云分析服务时代的到来“,转载请注明: 转自SAS资源资讯列表

本文链接地址: http://saslist.net/archives/454

11月 222019
 

sxlion

大家有一个普通的印象:SAS的更新很慢,很老很落后。可能跟它的版本命名有关,SAS9.0是2004年出来的,到现在都快20年了,版本号 还停留在9字头,并且还没有继续更新的迹象。当然这个与SAS公司的明面

原创文章: ”难道是SAS10?云分析服务时代的到来“,转载请注明: 转自SAS资源资讯列表

本文链接地址: http://saslist.net/archives/454