Tuesday, July 27, 2010

Effective use of Retain in SAS

1. Holding values of variables across iterations: The RETAIN statement holds the values of variables across iterations of the DATA step. Normally, variables created by INPUT or assignment statements are reset to missing at the start of each iteration of the DATA step. The statement “retain x y;” retains the values of the variables x and y across iterations. A “retain;” statement with no variable list retains the values of all variables created by INPUT or assignment statements.


For example, suppose we want to compute y(n)=2*y(n-1) with y(1)=1 for n=1 to 100. A DATA step with no input source executes only once, so both programs below read a 100-observation driver data set named “seed” (an illustrative name) to force 100 iterations.

data seed;
do i=1 to 100; output; end;
run;

data one;
set seed;
if _n_=1 then y=1;
else y=2*y1;
y1=y;
run;

Since y1 is set to missing at the start of each data step iteration, the data set “one” will contain the value 1 in the first observation, and the other 99 values of y will be missing.

The following program produces the desired data set.

data two;
set seed;
retain y1;
if _n_=1 then y=1;
else y=2*y1;
y1=y;
run;

Reference: http://javeeh.net/sasintro/intro84.html

2. Creating time intervals using the RETAIN statement: The RETAIN statement causes a variable to retain its value from one iteration of the DATA step to the next. RETAIN is useful for calculating the time intervals between visits because each visit is stored in a separate record of the same data set. For more details, please check the following link:

http://www2.sas.com/proceedings/sugi25/25/cc/25p100.pdf
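As a rough illustration of the idea, here is a minimal sketch that computes the number of days between consecutive visits of the same patient. The data set “visits” and the variables patient_id, visit_dt, prev_dt and days_since_prev are made up for this example and do not come from the paper linked above.

```sas
/* Hypothetical input: one record per patient visit */
data visits;
  input patient_id visit_dt :date9.;
  format visit_dt date9.;
  datalines;
1 05JAN2010
1 19JAN2010
1 02MAR2010
2 10FEB2010
2 24FEB2010
;
run;

/* BY-group processing needs the records in patient/date order */
proc sort data=visits;
  by patient_id visit_dt;
run;

data intervals;
  set visits;
  by patient_id;
  retain prev_dt;                      /* carries the previous visit date forward */
  if first.patient_id then days_since_prev = .;   /* no earlier visit to compare with */
  else days_since_prev = visit_dt - prev_dt;      /* SAS dates are day counts */
  prev_dt = visit_dt;                  /* remember this visit for the next record */
  drop prev_dt;
run;
```

Without the RETAIN statement, prev_dt would be reset to missing at the start of every iteration and every interval would come out missing.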

3. To create count/order variables: A counter variable can be created to number each customer's visits sequentially. Here is an example:

proc sort data=bbb out=xxx;
by cust_id purch_dt purchase_id;
run;

data xxx1;
set xxx;
by cust_id purch_dt purchase_id;
retain cnt 0;
/* LAG is called on every record so that its queue stays in step */
Lpurch = lag(purchase_id);
Ldt = lag(purch_dt);
if first.cust_id then do;
Lpurch = .;
Ldt = .;
cnt = 1;
end;
/* same date and same purchase as the previous record: keep the count */
else if not (purch_dt=Ldt and purchase_id=Lpurch) then cnt=cnt+1;
run;

For more details, please visit the following link:

http://www.wuss.org/proceedings07/Posters/POS_Worden_DatumToRemember.pdf
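For comparison, here is a smaller self-contained sketch of the same kind of counter. It is not the method from the poster above: instead of LAG it uses the FIRST. flags of BY-group processing, and it counts distinct purchase dates per customer. The data set “purchases” and the variable visit_no are hypothetical names chosen for this example.

```sas
/* Hypothetical input: customer 101 has two records on the same day */
data purchases;
  input cust_id purch_dt :date9.;
  format purch_dt date9.;
  datalines;
101 03MAR2010
101 03MAR2010
101 17APR2010
202 09MAY2010
;
run;

proc sort data=purchases;
  by cust_id purch_dt;
run;

data purchases_numbered;
  set purchases;
  by cust_id purch_dt;
  retain visit_no;                                  /* keep the counter between records */
  if first.cust_id then visit_no = 0;               /* restart for each customer */
  if first.purch_dt then visit_no = visit_no + 1;   /* new date means new visit */
run;
```

Records that share a customer and a date get the same visit_no, so the two purchases made by customer 101 on 03MAR2010 are both numbered 1.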

4. To put the variables of a data set in a particular order (re-ordering variables): Any statement that lists the variables in the desired order, placed before any other statement that references them (in practice, before the SET statement), will reorder the variables in the newly created data set. The most common are the RETAIN, LENGTH, ATTRIB, LABEL, and FORMAT statements. The RETAIN statement is considered the safest to use, because all variables coming from an input data set are automatically retained. As such, using a RETAIN statement to reorder the variables of a data set has no unintended side effects. All the other statements require the programmer to specify some attribute of each variable. Here is an example:

data high_perf_model_score_1;
retain TITLE_CODE DATA_TYPE_NAME Customer_ID MODEL_NAME MODEL_RUN_DATE
       MODEL_LEVEL_CODE SCORE RANK;
set high_perf_model_score;
run;

If the file is used as input to an automated process, the order of the variables can be very important. The RETAIN statement is very helpful in this situation.

For more details, please check the link:

http://www.sascommunity.org/wiki/Re-ordering_variables
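One way to check the result is to list the variables in their stored order with the VARNUM option of PROC CONTENTS. The sketch below assumes a small made-up data set called “scores”; the data set and variable names are for illustration only.

```sas
/* Hypothetical input: variables are created in the order score, rank, customer_id */
data scores;
  input score rank customer_id;
  datalines;
0.91 1 1001
0.85 2 1002
;
run;

data scores_reordered;
  retain customer_id rank score;   /* listed before SET, so this order wins */
  set scores;
run;

/* VARNUM lists variables by position in the data set instead of alphabetically */
proc contents data=scores_reordered varnum;
run;
```

The listing should show customer_id first, then rank, then score, while the values themselves are unchanged.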
