data preparation

10月 132015

For anyone doing data analysis, it’s common to devote a large proportion of time to data cleaning. Real data is dirty data. Even if you are using automated collection tools, errors, anomalies and inconsistencies will still make their way into your data. Over the next few blog posts, I’d like […]

The post It's a dirty job, but somebody has to do it: Data cleaning with JMP appeared first on JMP Blog.

7月 062015

I recently used a JMP add-in I wrote to import my complete set of BodyMedia FIT food log data files, including data from Dec. 21, 2010, through the last day I logged my meals in that software on March 29, 2015. My final data table contained 39,942 rows of food items names. […]

The post Creating a JMP 12 Recode script from a Lookup Table containing original and new values appeared first on JMP Blog.

7月 012015

The data you want to import into JMP often requires some manipulation before it’s ready to be analyzed in JMP. Sometimes data is arranged so that a row contains information for multiple observations. To prepare your data for analysis, you must restructure it so that each row of the JMP […]

The post How to stack data for a Oneway analysis appeared first on JMP Blog.

5月 042015

In April, the free trial of SAS Data Loader for Hadoop became available globally. Now, you can take a test drive of our new technology designed to increase the speed and ease of managing data within Hadoop. The downloads might take a while (after all, this is big data), but I think you’ll […]

The post Self-service big data preparation in the age of Hadoop appeared first on The Data Roundtable.

4月 152011
To rename the variables of a dataset in SAS is a daily routine. SAS or the programmer s would give an arbitrary name for any variable at the initial stage of data integration. Those names have to be modified afterward. Wensui [Ref.1] developed a macro to add prefixes to the variables . Vincent et al. [Ref. 2] extended his idea and added some parameters into the macros. However, giving a name to a variable in SAS has many restrictions regarding the length and the format. For better understanding and recognition, labeling variables instead of renaming them would be useful. In the example below, first comes with an integration of complicated text data. Proc Transpose generates a number of variables with the same prefix.  Then by invoking the label() macro, the dataset would be correctly labeled as desired.

1. Wensui Liu. ‘How to rename many variables in SAS’.
2. Vincent Weng. Ying Feng. ‘Renaming in Batches’. SAS Global 2009.

****************(1) MODULE-BUILDING STEP******************;
%macro label(dsin = , dsout = , dslabel = );
   *  MACRO:      label()
   *  GOAL:       use a label dataset to label the variables 
   *              of the target dataset
   *  PARAMETERS: dsin = input dataset
   *              dsout = output dataset
   *              dslabel = label dataset
      data _tmp;
            set &dslabel ;
            num = _n_;

      ods listing close;
      ods output variables = _varlist;
      proc contents data = &dsin;

      proc sql;
            select cats(a.variable, '="', b.labelname, '"') 
                   into: labellist separated by ' '
            from _varlist as a, _tmp as b 
            where a.num = b.num

      data &dsout;
            set &dsin;
            label &labellist;

      proc datasets;
            delete _:;
      ods listing;

****************(2) TESTING STEP******************;
******(2.1) INTEGRATE COMPLICATED DATA*************;
data have;
    infile datalines dlm = ',';
        retain _row;
        input _tmpvar $ @@ ; 
        if prxmatch("/10\d/", _tmpvar) ne 0 then _row + 1; 
        if missing(_tmpvar) then delete;
    100, Tom, 3,1,5,2,6
    101, Marlene, 1,2,4
    102, Jerry, 9,10,4,
    5, 6                    
    103, Jim,2 ,1, 2, 2,4

proc transpose data=have out=want(drop = _:)
    prefix = var;
    by _row;
    var _tmpvar;

******(2.2) INPUT LABELS FOR USE*************;
data label;
      input labelname $30.;
      Patient ID
      Patient last name
      The 1st treatment
      The 2nd treatment
      The 3rd treatment
      The 4th treatment
      The 5th treatment

******(2.3) INVOKE MACRO TO LABEL*************;
%label(dsin = want, dsout = want_labeled, dslabel = label);

****************END OF ALL CODING***************************************;