6 questions by using SAS
Part I-Reading SAS data sets
1. Reading a Space-Delimited Raw Data File
a. Write a DATA step to create a new data set named work.qtrdonation. Read the space-delimited raw data file, which can be named as follows:
Windows
“&pathdonation.dat”
UNIX
“&path/donation.dat”
z/OS (OS/390)
“&path..rawdata(donation)”
Partial Raw Data File
120265 . . . 25
120267 15 15 15 15
120269 20 20 20 20
120270 20 10 5 .
120271 20 20 20 20
b. Read the following fields from the raw data file:
Name
Type
Length
IDNum
Character
6
Qtr1
Numeric
8
Qtr2
Numeric
8
Qtr3
Numeric
8
Qtr4
Numeric
8
c. Write a PROC PRINT step to create the report below. The results contain 124 observations.
Partial PROC PRINT Output
ObsIDNum Qtr1 Qtr2 Qtr3 Qtr4
1 120265 . . . 25
2 120267 15 15 15 15
3 120269 20 20 20 20
4 120270 20 10 5 .
5 120271 20 20 20 20
2. Reading a Delimited Raw Data File with Nonstandard Data Values
a. Write a DATA step to create a temporary data set, prices. Read the delimited raw data file named as follows:
Windows
“&pathpricing.dat”
UNIX
“&path/ pricing.dat”
z/OS (OS/390)
“&path..rawdata(pricing)”
All data fields are numeric.
Partial Raw Data File
210200100009*09JUN2011*31DEC9999*$15.50*$34.70
210200100017*24JAN2011*31DEC9999*$17.80*22.80
210200200023*04JUL2011*31DEC9999*$8.25*$19.80
210200600067*27OCT2011*31DEC9999*$28.90*47.00
210200600085*28AUG2011*31DEC9999*$17.85*$39.40
b. Generate the report below. The results should contain 16 observations.
Partial PROC PRINT Output
2011 Pricing
Sales
Obs ProductID StartDate EndDate Cost Price
1 210200100009 06/09/2011 12/31/9999 15.50 34.70
2 210200100017 01/24/2011 12/31/9999 17.80 22.80
3 210200200023 07/04/2011 12/31/9999 8.25 19.80
4 210200600067 10/27/2011 12/31/9999 28.90 47.00
5 210200600085 08/28/2011 12/31/9999 17.85 39.40
3. Reading a Delimited File with Missing Values
a. Write a DATA step to create a temporary data set, prices. Use the asterisk-delimited raw data file, which can be named as follows:
Windows
“&pathprices.dat”
UNIX
“&path/prices.dat”
z/OS (OS/390)
“&path..rawdata(prices)”
Partial Raw Data File
210200100009*09JUN2007*31DEC9999*$15.50*$34.70
210200100017*24JAN2007*31DEC9999*$17.80
210200200023*04JUL2007*31DEC9999*$8.25*$19.80
210200600067*27OCT2007*31DEC9999*$28.90
210200600085*28AUG2007*31DEC9999*$17.85*$39.40
There might be missing data at the end of some records. Read the following fields from the raw data file:
Name
Type
Length
ProductID
Numeric
8
StartDate
Numeric
8
EndDate
Numeric
8
UnitCostPrice
Numeric
8
UnitSalesPrice
Numeric
8
b. Define labels and formats in the DATA step to create a data set that generates the following output when they are used in the PROC PRINT step. The results should contain 259 observations.
Partial PROC PRINT Output
2007 Prices
Sales
Start of End of Cost Price Price per
Obs Product ID Date Range Date Range per Unit Unit
1 210200100009 06/09/2007 12/31/9999 15.50 34.70
2 210200100017 01/24/2007 12/31/9999 17.80 .
3 210200200023 07/04/2007 12/31/9999 8.25 19.80
4 210200600067 10/27/2007 12/31/9999 28.90 .
5 210200600085 08/28/2007 12/31/9999 17.85 39.40
Part II-Manipulating data
4. Creating New Variables
a. Write a DATA step that reads orion.customer to create work.birthday.
b. In the DATA step, create three new variables: Bday2012, BdayDOW2012, and Age2012.
• Bday2012 is the combination of the month of Birth_Date, the day of Birth_Date, and the constant of 2012 in the MDY function.
• BdayDOW2012 is the day of the week of Bday2012.
• Age2012 is the age of the customer in 2012. Subtract Birth_Date from Bday2012 and divide the result by 365.25.
c. Include only the following variables in the new data set: Customer_Name, Birth_Date, Bday2012, BdayDOW2012, and Age2012.
d. Format Bday2012 to appear in the form 01Jan2012. Age2012 should be formatted to appear
with no decimal places.
e. Write a PROC PRINT step to create the report below. The results should contain 77 observations.
Partial PROC PRINT Output
Birth_ Bday
ObsCustomer_Name Date Bday2012 DOW2012 Age2012
1 James Kvarniq 27JUN1978 27JUN2012 4 34
2 Sandrina Stephano 09JUL1983 09JUL2012 2 29
3 Cornelia Krahl 27FEB1978 27FEB2012 2 34
4 Karen Ballinger 18OCT1988 18OCT2012 5 24
5 Elke Wallstab16AUG1978 16AUG2012 5 34
5. Creating Multiple Variables in Conditional Processing
a. Write a DATA step that reads orion.customer_dim to create work.season.
b. Create two new variables: Promo and Promo2.
The value of Promo is based on the quarter in which the customer was born.
• If the customer was born in the first quarter, then Promo is equal to Winter.
• If the customer was born in the second quarter, then Promo is equal to Spring.
• If the customer was born in the third quarter, then Promo is equal to Summer.
• If the customer was born in the fourth quarter, then Promo is equal to Fall.
The value of Promo2 is based on the customer’s age.
• For young adults, whose age is between 18 and 25, set Promo2 equal to YA.
• For seniors, aged 65 or older, set Promo2 equal to Senior.
Promo2 should have a missing value for all other customers.
c. The new data set should include only Customer_FirstName, Customer_LastName,Customer_BirthDate, Customer_Age, Promo, and Promo2.
d. Create the report below. The results should include 77 observations.
Partial PROC PRINT Output
Customer_ Customer_ Customer_ Customer_
Obs FirstName LastNameBirthDate Promo Age Promo2
1 James Kvarniq 27JUN1978 Spring 33
2 Sandrina Stephano 09JUL1983 Summer 28
3 Cornelia Krahl 27FEB1978 Winter 33
4 Karen Ballinger 18OCT1988 Fall 23 YA
5 Elke Wallstab 16AUG1978 Summer 33
6 David Black 12APR1973 Spring 38
7 Markus Sepke 21JUL1992 Summer 19 YA
8 Ulrich Heyde 16JAN1943 Winter 68 Senior
6. Creating Variables Unconditionally and Conditionally
a. Write a DATA step that reads orion.orders to create work.ordertype.
b. Create a new variable, DayOfWeek, that is equal to the weekday of Order_Date.
c. Create the new variable Type, which is equal to the following:
• Retail Sale if Order_Type is equal to 1
• Catalog Sale if Order_Type is equal to 2
• Internet Sale if Order_Type is equal to 3.
d. Create the new variable SaleAds, which is equal to the following:
• Mail if Order_Type is equal to 2
• Email if Order_Type is equal to 3.
e. Do not include Order_Type, Employee_ID, and Customer_ID in the new data set.
f. Create the report below. The results should contain 490 observations.
Partial PROC PRINT Output
Day
Order_ Delivery_ Sale Of
ObsOrder_ID Date Date Type Ads Week
1 1230058123 11JAN2007 11JAN2007 Retail Sale 5
2 1230080101 15JAN2007 19JAN2007 Catalog Sale Mail 2
3 1230106883 20JAN2007 22JAN2007 Catalog Sale Mail 7
4 1230147441 28JAN2007 28JAN2007 Retail Sale 1
5 1230315085 27FEB2007 27FEB2007 Retail Sale 3