What SAS stands for
SAS (“Statistical Analysis System”) is a software suite developed by SAS Institute for advanced analytics, multivariate analyses, business intelligence, data management, and predictive analytics (Source: Wikipedia).
It is mostly used for data mining, report writing, statistical analysis, business modeling, application development, and data warehousing. SAS is widely regarded as one of the leading analytics tools today.
SAS was named a Leader in the 2016 Gartner Magic Quadrant for Data Integration Tools.
Before we start, please install SAS from the SAS site (University Edition or the cloud edition).
Let’s start playing with SAS
The first step in SAS is to create a library, which is where we want to keep all our data.
The format for a library is
Libname <Libref> '<Path>';
Libname Aug_2017 'F:\Base SAS';
Run;
Shortcut keys in SAS
CTRL+/ to comment
CTRL+SHIFT+/ to uncomment
Press F3 or F8 to run the program
First program in SAS
Format -
Data <dataset name>;
Input <variable list>;
Cards;
<data lines>
;
Run;
Data Stu_Enq;
Input Stu_ID $ Stu_Name $ Age Academics $ Occupation $ DOE Course $;
Informat DOE MMDDYY10.;
Format DOE MMDDYY10.;
Cards;
S1 Rachit 25 Btech IT 01-01-2018 SAS_Analytics
S2 Manish 30 Btech IT 02-01-2018 SAS_Analytics
S3 Sunny 24 Btech IT 03-01-2018 SAS_Analytics
S4 Varun 21 Btech IT 04-01-2018 SAS_Analytics
S5 Chandru 29 MCA IT 05-01-2018 SAS_Analytics
S6 Ashu 25 MCA IT 06-01-2018 SAS_Analytics
S7 Shasank 26 MCA IT 07-01-2018 SAS_Analytics
S8 Namrata 25 MBA IT 08-01-2018 SAS_Analytics
S9 Surya 27 MBA IT 09-01-2018 SAS_Analytics
S10 Lavina 27 MBA IT 10-01-2018 SAS_Analytics
;
Run;
When we run the program, the dataset is created in the WORK library.
/*Note: if we don't define a library and reference it in the dataset name, the dataset goes into the temporary WORK library by default*/
Let's discuss the program
The above program uses several statements, and every statement ends with a semicolon:
1. Data - creates a dataset
2. Input - defines the columns (variables) to read; a $ after a name marks a character variable
3. Informat - tells SAS how to read a raw value, here converting the date text into a SAS date value
4. Format - tells SAS how to display a stored value
5. Cards - marks the start of the data lines typed inside the program
6. Run - ends the step and executes it
Uses of Informat and Format -
Data Stu_Enq;
Input Stu_ID $ Stu_Name $ Age Academics $ Occupation $ DOE Course $;
Informat DOE MMDDYY10.;
Format DOE MMDDYY10.;
Cards;
S1 Rachit 25 Btech IT 01-01-1960 SAS_Analytics
S2 Manish 30 Btech IT 01-10-1960 SAS_Analytics
S3 Sunny 24 Btech IT 01-01-1959 SAS_Analytics
S4 Varun 21 Btech IT 01-01-1961 SAS_Analytics
S5 Chandru 29 MCA IT 05-01-2018 SAS_Analytics
S6 Ashu 25 MCA IT 06-01-2018 SAS_Analytics
S7 Shasank 26 MCA IT 07-01-2018 SAS_Analytics
S8 Namrata 25 MBA IT 08-01-2018 SAS_Analytics
S9 Surya 27 MBA IT 09-01-2018 SAS_Analytics
S10 Lavina 27 MBA IT 10-01-2018 SAS_Analytics
;
Run;
Proc print data=Stu_Enq;
Run;
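In the above dataset, the DOE values include dates around 01-01-1960. SAS stores a date as the number of days since 01-01-1960, so the informat reads 01-01-1960 as the value 0, earlier dates as negative numbers, and later dates as positive numbers. To display that stored number as a date, we use the SAS Format statement.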
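To see the stored numbers behind the dates, here is a minimal sketch. The dataset name date_demo and the variable DOE_raw are made up for illustration; DOE_raw is simply an unformatted copy of DOE.

```sas
/* Sketch: show the number SAS stores for each date.          */
/* date_demo and DOE_raw are illustrative names, not from the */
/* program above.                                             */
data date_demo;
   input DOE : mmddyy10.;   /* read the text as a SAS date    */
   DOE_raw = DOE;           /* same value, left unformatted   */
   format DOE mmddyy10.;    /* display DOE as a date          */
   datalines;
01-01-1960
01-10-1960
01-01-1959
01-01-1961
;
run;

/* DOE_raw prints as 0, 9, -365 and 366 (1960 is a leap year) */
proc print data=date_demo;
run;
```

The informat decides how the text is read, and the format only changes how the stored number is displayed.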
In SAS on a PC, we have three windows:
- Editor window - where we write the syntax/programs
- Log window - where we check errors, warnings, and the messages confirming that the program ran successfully
- Output window - where the results are printed
SAS programming is based on three things:
- statements
- options
- functions
/*Going forward we will have more examples of statements, options, and functions*/
/*Let's talk about SAS programming steps*/
There are two kinds of steps in SAS programming:
1. Data step - creates or modifies a dataset; it starts with a Data statement and ends with a Run statement
2. Proc step - processes a dataset (for example, prints or analyzes it); it starts with a Proc statement and ends with a Run statement
/*Data step*/
Data Stu_Enq;
Input Stu_ID $ Stu_Name $ Age Academics $ Occupation $ DOE Course $;
Informat DOE MMDDYY10.;
Format DOE Date9.;
Cards;
S1 Rachit 25 Btech IT 01-01-1960 SAS_Analytics
S2 Manish 30 Btech IT 01-10-1960 SAS_Analytics
S3 Sunny 24 Btech IT 01-01-1959 SAS_Analytics
S4 Varun 21 Btech IT 01-01-1961 SAS_Analytics
S5 Chandru 29 MCA IT 05-01-2018 SAS_Analytics
S6 Ashu 25 MCA IT 06-01-2018 SAS_Analytics
S7 Shasank 26 MCA IT 07-01-2018 SAS_Analytics
S8 Namrata 25 MBA IT 08-01-2018 SAS_Analytics
S9 Surya 27 MBA IT 09-01-2018 SAS_Analytics
S10 Lavina 27 MBA IT 10-01-2018 SAS_Analytics
;
Run;
/*Proc step*/
Proc print data=Stu_Enq;
Run;
Let's create more tables using Cards and Datalines
/*Using Cards*/
Data School;
Input Roll_No Name $ Age Class $ Year;
Cards;
1 Rachit 25 Std1 2010
2 Manish 30 Std1 2010
3 Sunny 26 Std2 2010
;
Run;
Proc print data=School;
Run;
/*Using Datalines, which works the same way as Cards*/
Data School;
Input Roll_No Name $ Age Class $ Year;
Datalines;
1 Rachit 25 Std1 2010
2 Manish 30 Std1 2010
3 Sunny 26 Std2 2010
;
Run;
Proc print data=School;
Run;
So far we have learned that if we don't define a library, the dataset goes to the WORK library.
How to create our own library?
Libname <Libref> '<Path>';
Libname Aug_2017 'F:\Base SAS';
Run;
How to create a dataset in your own library?
Data Aug_2017.Stu_Enq;
Input Stu_ID $ Stu_Name $ Age Academics $ Occupation $ DOE Course $;
Informat DOE MMDDYY10.;
Format DOE Date9.;
Cards;
S1 Rachit 25 Btech IT 01-01-1960 SAS_Analytics
S2 Manish 30 Btech IT 01-10-1960 SAS_Analytics
S3 Sunny 24 Btech IT 01-01-1959 SAS_Analytics
S4 Varun 21 Btech IT 01-01-1961 SAS_Analytics
S5 Chandru 29 MCA IT 05-01-2018 SAS_Analytics
S6 Ashu 25 MCA IT 06-01-2018 SAS_Analytics
S7 Shasank 26 MCA IT 07-01-2018 SAS_Analytics
S8 Namrata 25 MBA IT 08-01-2018 SAS_Analytics
S9 Surya 27 MBA IT 09-01-2018 SAS_Analytics
S10 Lavina 27 MBA IT 10-01-2018 SAS_Analytics
;
Run;
proc print data=Aug_2017.Stu_Enq ;
run;
While creating a library, the library name must follow some rules:
1. It should not be more than 8 characters long
2. It should not start with a number or with any special character other than an underscore
3. It must start with a letter or an underscore, followed only by letters, numbers, or underscores
Ex:
ABCD_123456778 - No (more than 8 characters)
ABCD_123 - Yes
_ABC123 - Yes
ABC@1234 - No (contains a special character)
Aug_2017 - Yes
Rules for dataset names and variable names:
1. A dataset or variable name should not be more than 32 characters long
2. It should not start with a number or with any special character other than an underscore
3. It must start with a letter or an underscore, followed only by letters, numbers, or underscores
Ex:
Jan_2019_Cuttack_Sales - Yes
First Name - No (contains a space)
First_Name - Yes
Last_Name - Yes
Academics - Yes
Jan sales_2010 - No (contains a space)
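These rules can also be checked inside SAS. The sketch below uses the NVALID function, which tests whether a string is a valid variable name; it assumes the 'v7' argument matches the rules listed above, and the variable names name and valid are made up for illustration.

```sas
/* Sketch: test candidate names against the variable-name rules. */
/* nvalid() returns 1 for a valid name and 0 otherwise.          */
data _null_;
   length name $ 40;
   do name = 'Jan_2019_Cuttack_Sales', 'First Name',
             'First_Name', 'Jan sales_2010';
      valid = nvalid(strip(name), 'v7');
      put name= valid=;   /* results are written to the log */
   end;
run;
```

The results appear in the Log window, one line per name.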
How many types of variables are there?
There are two types of variables:
Numeric and Character
Character variable -
1. The default character variable length is 8 characters
2. The maximum character variable length is 32,767
Numeric variable -
1. The default length is 8 bytes
2. A numeric value is stored as an 8-byte floating-point number, which can hold roughly 15 to 16 significant digits
Example -
Data Stu_Enq;
Input Stu_ID $ Stu_Name $ Age Academics $ Occupation $ DOE Course $;
Informat DOE MMDDYY10. ;
Format DOE Date9. ;
Cards;
S1 Rachit 25 Btech IT 01-01-1960 SAS_Analytics
S2 Manish 30 Btech IT 01-10-1960 SAS_Analytics
S3 Sunny 24 Btech IT 01-01-1959 SAS_Analytics
S4 Varun 21 Btech IT 01-01-1961 SAS_Analytics
S5 Chandru 29 MCA IT 05-01-2018 SAS_Analytics
S6 Ashu 25 MCA IT 06-01-2018 SAS_Analytics
S7 Shasank 26 MCA IT 07-01-2018 SAS_Analytics
S8 Namrata 25 MBA IT 08-01-2018 SAS_Analytics
S9 Surya 27 MBA IT 09-01-2018 SAS_Analytics
S10 Lavina 27 MBA IT 10-01-2018 SAS_Analytics
;
Run;
proc print data=Stu_Enq;
Run;
In the above program, Course is read as a character variable with the default length of 8, so the output shows only “SAS_Anal”.
/*To read the full value, we need to define the length of the character variable*/
Data Stu_Enq;
Input Stu_ID $ Stu_Name $ Age Academics $ Occupation : $10. DOE Course : $20.;
Informat DOE MMDDYY10.;
Format DOE Date9.;
Cards;
S1 Rachit 25 Btech IT 01-01-1960 SAS_Analytics
S2 Manish 30 Btech IT 01-10-1960 SAS_Analytics
S3 Sunny 24 Btech IT 01-01-1959 SAS_Analytics
S4 Varun 21 Btech IT 01-01-1961 SAS_Analytics
S5 Chandru 29 MCA IT 05-01-2018 SAS_Analytics
S6 Ashu 25 MCA IT 06-01-2018 SAS_Analytics
S7 Shasank 26 MCA IT 07-01-2018 SAS_Analytics
S8 Namrata 25 MBA IT 08-01-2018 SAS_Analytics
S9 Surya 27 MBA IT 09-01-2018 SAS_Analytics
S10 Lavina 27 MBA IT 10-01-2018 SAS_Analytics
;
Run;
proc print data=Stu_Enq;
Run;
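Another common way to set the length is a Length statement placed before the Input statement. The following is a minimal sketch, not the only approach; the dataset name Stu_Enq_Len is made up for illustration, and only two data lines are used to keep it short.

```sas
/* Sketch: define character lengths with a LENGTH statement.   */
/* Stu_Enq_Len is an illustrative dataset name.                */
data Stu_Enq_Len;
   /* LENGTH must come before INPUT, because a variable's     */
   /* length is fixed the first time the variable appears     */
   length Occupation $ 10 Course $ 20;
   input Stu_ID $ Stu_Name $ Age Academics $ Occupation $
         DOE : mmddyy10. Course $;
   format DOE date9.;
   datalines;
S1 Rachit 25 Btech IT 01-01-1960 SAS_Analytics
S2 Manish 30 Btech IT 01-10-1960 SAS_Analytics
;
run;

proc print data=Stu_Enq_Len;
run;
```

One side effect of this design: because Occupation and Course are now the first variables mentioned in the step, they become the first two columns of the dataset.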
Every SAS dataset has two portions:
1. Descriptor portion - describes the dataset (its name, its variables, and their attributes)
2. Data portion - holds the observations (the data values)
/*Descriptor Portion of the dataset*/
Proc contents data=Stu_Enq;
run;
Proc contents data=Stu_Enq varnum;
run;
Proc contents data=Stu_Enq varnum short;
run;
Varnum - an option of Proc contents that lists the variables in the order in which they appear in the dataset (by default they are listed alphabetically)
Short - an option of Proc contents that prints only the list of variable names instead of the full descriptor
/*Descriptor Portion of the library*/
Proc contents data=Aug_2017._all_ ;
run;
Data Aug_2017.School;
Input Roll_No Name $ Age Class $ Year;
Cards;
1 Rachit 25 Std1 2010
2 Manish 30 Std1 2010
3 Sunny 26 Std2 2010
;
Run;
Proc contents data=Aug_2017._all_ ;
run;
Proc contents data=Aug_2017._all_ NODS;
run;
/*NODS suppresses the descriptor portion of the individual datasets, so only the library directory is listed; it is used together with _all_*/
/*Data Portion*/
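/*Print all observations*/
Proc print data=Stu_Enq;
Run;
/*Print the first 5 observations*/
Proc print data=Stu_Enq (obs=5);
Run;
/*Print observations 5 through 10; obs= is the number of the last observation, not a count*/
Proc print data=Stu_Enq (firstobs=5 obs=10);
Run;
/*Print only observation 10*/
Proc print data=Stu_Enq (firstobs=10 obs=10);
Run;
/*Print only the selected variables*/
Proc print data=Stu_Enq ;
var Stu_ID Age Academics DOE Course;
Run;
/*Print the selected variables for the rows where Academics is 'Btech'*/
Proc print data=Stu_Enq ;
var Stu_ID Age Academics DOE Course;
Where Academics='Btech';
Run;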
Data manipulation in SAS
/*Importing MED_New_2016.csv file*/
Proc Import Out=Aug_2017.MED
datafile='F:\Aug_Batch_2017\a5. SAS Base and Advanced\Base SAS\Data\MED_New_2016.csv'
dbms=csv replace;
Run;
/*In the log window we get the following data step:*/
data AUG_2017.MED_Infile ;
infile 'F:\Aug_Batch_2017\a5. SAS Base and Advanced\Base SAS\Data\MED_New_2016.csv' delimiter=',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat CUSTOMER_ID best32. ;
informat Company $9. ;
informat CARD_REG_DATE anydtdtm40. ;
informat CARD_ACTIVE $1. ;
informat FIRST_USE_DTE anydtdtm40. ;
informat firstSTOR best32. ;
informat TITLE $4. ;
informat GENDER $6. ;
informat max_spent best32. ;
informat DOB ddmmyy10. ;
informat FTD anydtdtm40. ;
informat Age best32. ;
informat STATE_CODE $3. ;
informat POST_CODE best32. ;
informat CUST_STAT $6. ;
informat Avgsize_spent best32. ;
informat CARD_STAT $10. ;
informat RGSTN_TYPE_IND $6. ;
informat NO_OF_TRIPS best32. ;
informat TOWN $14. ;
informat EMAIL_IND $1. ;
informat CONTACT_PREF $5. ;
informat Average_Qty_PER_ACC best32. ;
informat Spent_amount best32. ;
format CUSTOMER_ID best12. ;
format Company $9. ;
format CARD_REG_DATE datetime. ;
format CARD_ACTIVE $1. ;
format FIRST_USE_DTE datetime. ;
format firstSTOR best12. ;
format TITLE $4. ;
format GENDER $6. ;
format max_spent best12. ;
format DOB ddmmyy10. ;
format FTD datetime. ;
format Age best12. ;
format STATE_CODE $3. ;
format POST_CODE best12. ;
format CUST_STAT $6. ;
format Avgsize_spent best12. ;
format CARD_STAT $10. ;
format RGSTN_TYPE_IND $6. ;
format NO_OF_TRIPS best12. ;
format TOWN $14. ;
format EMAIL_IND $1. ;
format CONTACT_PREF $5. ;
format Average_Qty_PER_ACC best12. ;
format Spent_amount best12. ;
input
CUSTOMER_ID
Company $
CARD_REG_DATE
CARD_ACTIVE $
FIRST_USE_DTE
firstSTOR
TITLE $
GENDER $
max_spent
DOB
FTD
Age
STATE_CODE $
POST_CODE
CUST_STAT $
Avgsize_spent
CARD_STAT $
RGSTN_TYPE_IND $
NO_OF_TRIPS
TOWN $
EMAIL_IND $
CONTACT_PREF $
Average_Qty_PER_ACC
Spent_amount
;
run;