Thursday, April 3, 2008

PROC Functions

SAS functions
perform arithmetic operations
compute sample statistics (for example: sum, mean, and standard deviation)
manipulate SAS dates and process character values
perform many other tasks.

Sample statistics functions ignore missing values.

data onboard;
set work.pilotdata; (copy from dataset)
Total=sum(FirstClass,Economy);
run;

avgX = mean(of x1-x10);
stdX = std(of x1-x10);
minX = min(of x1-x10);
maxX = max(of x1-x10);
rangeX = range(of x1-x10);


Other useful sample statistic functions are:

MAX(argument,...) returns the largest value

MIN(argument,...) returns the smallest value

MEAN(argument,...) returns the arithmetic mean (average)

N(argument,....) returns the number of nonmissing arguments

NMISS(argument,...) returns the number of missing values

STD(argument,...) returns the standard deviation

STDERR(argument,...) returns the standard error of the mean

VAR(argument,...) returns the variance

Special WHERE

Use special operators in the WHERE statement to subset data.

LIKE selects observations by comparing character values to specified patterns.
A percent sign (%) replaces any number of characters.
An underscore (_) replaces one character.

where Code like 'E_U%';

Selects observations where the value of Code begins with an E, followed by a single character, followed by a U, followed by any number of characters.

The sounds like (=*) operator selects observations that contain spelling variations of the word or words specified.

where Name=*'SMITH';

Selects names like SMYTHE and SMITT.

IS NULL or IS MISSING selects observations in which the value of the variable is missing.

where Flight is missing;
where Flight is null;

Identifying Observations - ID (suppress OBS) v.s. NOOBS + BY-ID Group

Use the ID statement to identify observations.
Combine the BY and ID statements to produce special formatting.

proc print data=ia.empdata;
id JobCode;
var EmpID Salary;
run;

When the ID and BY statements specify the same variable,
the Obs column is suppressed
the BY line is suppressed
the ID/BY variable prints in the leftmost column
each ID/BY value only prints at the start of each BY group (and on the subtotal line, if a SUM statement is used).

Specify JobCode in the BY and ID statements to change the report format.

proc sort data=ia.empdata out=work.empdata;
by JobCode;
run;
proc print data=work.empdata;
by JobCode;
id JobCode;
sum Salary;
run;

Sorting SAS DataSet - PROC PRINT - SUM function + PageBreak

proc print data=ia.empdata noobs;
var JobCode EmpID Salary;
sum Salary;
run;

To request subgroup totals in PROC PRINT, the observations in the data set must be grouped.

The SORT procedure
rearranges the observations in a SAS data set
can create a new SAS data set containing the rearranged observations
can sort on multiple variables
can sort in ascending (default) or descending order
does not generate printed output
treats missing values as the smallest possible value.

Printing Subtotals and Grand Totals

Print the data set grouped by JobCode with a subtotal for the Salary column for each JobCode.

proc sort data=ia.empdata out=work.empdata;
by JobCode;
run;
proc print data=work.empdata;
by JobCode;
sum Salary;
run;

Page Breaks
Use the PAGEBY statement to put each subgroup on a separate page.

proc print data=work.empdata;
by JobCode;
pageby JobCode;
sum Salary;
run;

FILENAME vs LIBNAME

Using LIBNAME statement to reference a SAS data library

Libname libref 'C:\data'
libref.filename;


Usinga FILENAME statement to reference an external file

FIlename fileref 'C:\data\hw1.dat';
infile fileref;

PROC FORMAT

DATA HW1;
INFILE "D:\temp\files\hw1.txt"
DELIMITER=' ' /* or DLM=' '*/
MISSOVER
DSD
/* DSD means Delimiter-Sensitive Data, which does three things for you
First, it ignores delimiters in data values enclosed in quotation marks
Second, it does not read quotation marks as part of the data value
Third, it treats two delimiters in a row as a missing value.
DSD option assumes that delimiter is a comma. If your delimiter is not a comma then
you can use DELIMITER= option with the DSD option to specify the delimiter
*/
;
INPUT
SampleID
CaseControl $
Gender $
Race $
Age
NoCig
Multivitamins
Calcium
TumorICD9 $
FamilyHistory $
HRT $
Height
Weight;
BMI = (Weight*0.45)/((Height*0.0254)**2);
Proc format;
value Mul 1='never' 2='former' 3='current';
value Cal 1='never' 2='former' 3='current';
RUN;

proc print data = hw1 LABEL;
TITLE1 "CS133/BIOT133 - Introduction to SAS Programming";
TITLE2 "Homework #1";
LABEL
SampleID = "Sample ID"
CaseControl = "Case/Control"
Gender = "Gender"
Race = "Race"
Age = "Age"
NoCig = "Number of cigarettes smoked/day"
Multivitamins = "Multivitamins taken"
Calcium = "Calcium supplements taken"
TumorICD9 = "Tumor site by ICD9 code"
FamilyHistory = "Family History"
HRT = "HRT use"
Height = "Height in inches"
Weight = "Weight in pounds"
;
FORMAT Multivitamins Mul. Calcium Cal. BMI 2. ;
FOOTNOTE1 "Student - Yuan-Kai Huang";
FOOTNOTE2 "April 2, 2008";
run;