# Handy uses of double OUTPUT and double SET in the DATA step

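ML.ped has twice as many genotype columns as ML.bim has entries: each marker in ML.bim corresponds to two allele columns in ML.ped.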

modi.txt contains the part of the data that I modified. It was produced by a program like this, with one hard-coded statement per column:
data ML1;
set ML;
array geno{86} _all_;
if geno{1}="A" then var7=1; else var7=2;
if geno{2}="A" then var8=1; else var8=2;
if geno{3}="T" then var9=1; else var9=2;
if geno{4}="T" then var10=1; else var10=2;
.
.
.
run;

```
data mlped;
input var1 :$5. var2 :$5. (var3-var92) (~$1.);
cards;
H1000 H1000 0 0 1 2 A G T C T C G A T C C T T A C T T C T C G A A G T C C T C G A C T C C A A … G
H1003 H1003 0 0 1 2 A G T T T C G G 0 0 0 0 0 0 0 0 C C T T 0 0 G G 0 0 C C G G A A T T C C T…  C
… /* paste the rest of the ML.ped data here */
H1004 H1004 0 0 1 2 A G T C T C G A T C C T T A C T T C T C G A A G T C C T C G A C T C C A  C … C G
;
run;
```

data mlbim;
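input a :$4. @@; output; output;  /* double OUTPUT: write each allele twice to build the lookup data set */
if _N_ = 43 then stop;            /* stop after 43 markers, i.e. 86 lookup rows */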
cards;
A T T G T C T C T T G A T C C A T C A A A G G G G T C C T A G C C C G C C G A T G T C
G C C A C T A T C C A G C T G C C A T G G A A A A C T T C G A T G G A T T A G C A G G
;
run;

data modi;
set mlped;
array geno[86] var7-var92;
array newgeno[86] nvar7-nvar92;
do i= 1 to 86 ;
set mlbim point=i;    /* double SET: POINT= fetches row i of the lookup data set mlbim */
if geno[i]=a then newgeno[i]=1 ;else newgeno[i]=2;
end;
output;
run;

proc print data=modi;
var nvar7-nvar92;
run;

NOTE: There were 2020 observations read from the data set WORK.MLPED.
NOTE: The data set WORK.MODI has 2020 observations and 179 variables.
NOTE: DATA statement used (Total process time):

cpu time 0.03 seconds
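The same pattern can be tried on a toy data set before running it on the full file. The sketch below is only an illustration: the data sets `ref`, `ped` and `coded`, the variables `g1-g4` and `n1-n4`, and the two-marker data are all made up for this example and do not come from ML.ped or ML.bim.

```sas
/* Lookup data set: 2 hypothetical markers, each reference allele written twice */
data ref;
   input a :$1. @@;
   output; output;            /* double OUTPUT: one row per allele column */
   cards;
A T
;
run;

/* Hypothetical genotypes: 2 samples, 4 allele columns */
data ped;
   input id :$5. (g1-g4) (:$1.);
   cards;
S1 A G T T
S2 G G T C
;
run;

data coded;
   set ped;                   /* first SET: reads the samples sequentially */
   array g[4] g1-g4;
   array n[4] n1-n4;
   do i = 1 to 4;
      set ref point=i;        /* second SET: direct access to row i of ref */
      if g[i] = a then n[i] = 1; else n[i] = 2;
   end;
   drop a;
run;

proc print data=coded;
run;
```

With these inputs, `n1-n4` should come out as 1 2 1 1 for S1 and 2 2 1 2 for S2. The step ends when the sequential `set ped` runs out of rows, so no explicit STOP is needed even though POINT= is used, and the POINT= variable `i` is not written to the output data set.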

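# Linear programming: the alloy blending problem

Metalco wants to produce a new alloy containing 40% tin, 35% zinc, and 25% lead. The composition of the available raw alloys is given in the table below: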

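| Raw alloy no. | Tin / % | Zinc / % | Lead / % | Price (USD/lb) |
|---|---|---|---|---|
| 1 | 60 | 10 | 30 | 77 |
| 2 | 25 | 15 | 60 | 70 |
| 3 | 45 | 45 | 10 | 88 |
| 4 | 20 | 50 | 30 | 84 |
| 5 | 50 | 40 | 10 | 94 |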
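One way to write this as a linear program is sketched below. The excerpt does not state the objective, so minimizing the cost per pound of the new alloy is an assumption here, as is the notation $x_i$ for the fraction of raw alloy $i$ in the blend.

$$
\begin{aligned}
\min\quad & 77x_1 + 70x_2 + 88x_3 + 84x_4 + 94x_5 \\
\text{s.t.}\quad & 60x_1 + 25x_2 + 45x_3 + 20x_4 + 50x_5 = 40 \quad \text{(tin)} \\
& 10x_1 + 15x_2 + 45x_3 + 50x_4 + 40x_5 = 35 \quad \text{(zinc)} \\
& 30x_1 + 60x_2 + 10x_3 + 30x_4 + 10x_5 = 25 \quad \text{(lead)} \\
& x_1 + x_2 + x_3 + x_4 + x_5 = 1 \\
& x_i \ge 0, \quad i = 1, \dots, 5
\end{aligned}
$$

Because every raw alloy's percentages sum to 100 and the targets also sum to 100, adding the three composition constraints gives $100\sum_i x_i = 100$, so the last equality is implied and is listed only for readability.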

# Some discriminant methods and SAS code

NOTICE: The following text is part of my published paper. Do not distribute it! All rights reserved.

1. Linear discriminant analysis (LDA)
Because of its simplicity and robustness, LDA has been one of the most frequently used classification techniques since 1936.
LDA:
```
proc discrim data=ex1 testdata=ex2;
class g;
var x1-x10;
run;
```

2. Combination of PCA and LDA (PCA+LDA), or PLS and LDA (PLS+LDA)
Principal component analysis (PCA) is a fundamental method in chemometrics and is based on vector algebra. Its main purpose is to reduce the dimensionality of a data set with a large number of intercorrelated variables while retaining as much of the information in the original data as possible. A new set of orthogonal variables, the principal components (PCs), describes the variance in the data, and the first few of them usually capture most of the systematic variation in the original variables. A small subset of PCs is therefore commonly used to explore trends among samples with different treatments. Furthermore, using these PCs as input variables for linear discriminant analysis (LDA) greatly reduces the multicollinearity present among the original variables. The combination of PCA and LDA (PCA+LDA) was therefore used here for classification.

Principal component regression (PCR) is a multiple linear regression method that relates two sets of variables (PCs and response variables) for predictive purposes. PLS is an extension of PCR that also relates two sets of variables through a regression model, but in PLS the components are more strongly correlated with the response variables, which results in more effective prediction of the response. In the same way, the components of PLS can be combined with LDA (PLS+LDA) to tackle classification problems.
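The PCA+LDA combination can be sketched in SAS as below. This is not the code from the paper: it reuses the data set and variable names of the LDA example (`ex1`, `ex2`, `g`, `x1-x10`), while the choice of three components and the names `pc_train`, `pc_stat`, `pc_test` and `pred` are assumptions made for illustration.

```sas
/* Step 1: PCA on the training set; keep the first 3 PCs (3 is an arbitrary choice) */
proc princomp data=ex1 out=pc_train outstat=pc_stat n=3 noprint;
   var x1-x10;
run;

/* Step 2: project the test set onto the components estimated from the training set */
proc score data=ex2 score=pc_stat out=pc_test;
   var x1-x10;
run;

/* Step 3: LDA on the PC scores instead of the original variables */
proc discrim data=pc_train testdata=pc_test testout=pred;
   class g;
   var prin1-prin3;
run;
```

Scoring the test set with the training-set statistics, rather than running a second PCA on `ex2`, keeps both sets in the same component space. In practice the number of components would be chosen from the eigenvalues or by cross-validation.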

# Blame it all on the decimal point

data dup;
input  id  date  field  value ;
cards;
1  2  0.0001  10
1  2  0.0001  10
1  2  0.00001  10
1  2  0.00001000001  10
1  3  0.00001  10
1  3  0.00001  10
1  3  0.00003  10
1  3  0.00003  10
1  3  0.00003  10
;
run;
*method 1;
proc sql;
create table NoDup1 as
select unique id, date, field, avg(value) as value from Dup group by id, date, field;
quit;
*method 2;
proc means data = Dup nway ;
class  id date field;
var value;
output out = NoDup2(drop = _type_ _freq_) mean = value;
run;
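The two methods do not necessarily form the same groups. PROC SQL groups on the stored numeric values, whereas PROC MEANS builds CLASS levels from formatted values (BEST12. when a numeric variable has no format), so keys that differ only far to the right of the decimal point, such as 0.00001 and 0.00001000001, deserve a check. A minimal sketch of one way to put both methods on the same footing is to round the key first. The tolerance of 1e-8 and the names `Dup_r`, `NoDup1r` and `NoDup2r` are assumptions for illustration, not part of the original post.

```sas
/* Round the grouping key to an assumed tolerance before de-duplicating */
data Dup_r;
   set Dup;
   field = round(field, 1e-8);
run;

/* method 1 on the rounded key */
proc sql;
   create table NoDup1r as
   select id, date, field, avg(value) as value
   from Dup_r
   group by id, date, field;
quit;

/* method 2 on the rounded key */
proc means data=Dup_r nway noprint;
   class id date field;
   var value;
   output out=NoDup2r(drop=_type_ _freq_) mean=value;
run;

/* Check that both methods now return the same rows */
proc compare base=NoDup1r compare=NoDup2r;
run;
```

Whether rounding is appropriate depends on whether 0.00001 and 0.00001000001 are meant to be the same key. If they are genuinely different values, the GROUPINTERNAL option on the CLASS statement makes PROC MEANS group on the stored values instead.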