Coalesce in SAS
Coalescing refers to the process of checking the value of each input parameter in the order in which they are listed and returns the first non-missing value. In SAS, the behavior of each of the coalesce functions depends on the processing context, either in a data step or in a PROC SQL
statement. These differences are important to acknowledge, especially when the code is migrated from PROC SQL
to a data step.
Suppose, for example, that the following piece of code needs to be modified into a data step:
The intuitive approach would be to simply rewrite the code as follows:
Although this approach seems to be correct at first glance, it may cause a potential issue: the coalesce
function in the context of a data step treats parameters as numeric values. More specifically, if column1
and column2
are characters, the automatic SAS conversion kicks in by trying to convert these values to numerics. If such a conversion fails, a null value will be passed to the function. Simply switching to the character-compatible function, coalescec
, may still have some side-effects as we will demonstrate in the following sections.
This article is therefore intended to review the behavior of the various coealesce
functions in SAS, and to outline the hurdles when using these functions. All the examples will be driven from the following base table:
The base table will allow us to study the behaviors of the various coalesce
functions with variables of different lengths and types as summarized as follows:
Variable | Type | Length |
---|---|---|
c_var_long | Char | 300 |
c_var_short | Char | 14 |
n_var_long | Num | 8 |
n_var_short | Num | 3 |
Coalesce in a data step
In order to coalesce data in a data step, SAS offers two options:
coalesce(argument-1<..., argument-n>)
. All the arguments need to be numeric.coalescec( argument-1<..., argument-n>)
. It performs a similar functionality for character arguments.
coalescec
in a data-step
It is important to note that if the coalescec
function returns a value to a variable that has not previously been assigned a length, then that variable is given a length of 200 bytes. As a consequence, if one of the arguments has a length above 200 characters, the result may be truncated. To avoid any data truncation, the length of the output variable needs to be set ahead of the function call.
The resulting dataset has the following structure:
Variable | Type | Length |
---|---|---|
c_var_long | Char | 300 |
c_var_short | Char | 14 |
coalescec_empty_with_c_long | Char | 200 |
coalescec_empty_with_c_long2 | Char | 300 |
coalescec_empty_with_c_short | Char | 200 |
Using the coalescec
function with a parameter of size greater than 200 (here, c_var_long
) will return by default a result with a length of 200. Setting explicitly the size of the output variable through the length
statement is the only way to prevent a possible truncation.
coalesce
in a data-step
The coalesce
function accepts only numerics and returns by default a numeric of length 8, even if the underlying parameters have a shorter length. Unlike its sibling character-based function, there is no risk of truncation as the maximum length for a numeric in SAS is **8</em>. Specifying the length of the return value in advance allows however the user to have full control of the length of the result.
</p>
Note that the third variable of the output table (coalesce_empty_with_n_long2
) is not equal anymore to n_var_long
as the numeric has been truncated. Truncating non-integers is not recommended due to the lost of precision.
Name | Value | Type | Length |
---|---|---|---|
coalesce_empty_with_n_short | 10 | Num | 8 |
coalesce_empty_with_n_long | 10.236455 | Num | 8 |
coalesce_empty_with_n_long2 | 10.236328 | Num | 3 |
Coalescing in a SQL-Step
coalescec
in SQL
The coalescec
function is still available in the context of a PROC SQL
statement. Similarly to the data step, setting explicitly the length through the column attribute may avoid an undesired truncation.
The output table is structured as follows:
Variable | Type | Length |
---|---|---|
coalescec_empty_with_c_long | Char | 200 |
coalescec_empty_with_c_long2 | Char | 300 |
coalescec_empty_with_c_short | Char | 200 |
coalesce
in SQL
It is recommended to use the coalesce
SQL function for both numeric and character variables as SAS determines then the type of the output column based on the types of the arguments. While you need to use two functions, one for each data type, to coalesce variables in a data step, only one function is required in SQL.
Choosing the coalesce
function over the coalescec
function shifts the responsibility of defining the result type to the SQL processor. This behavior is in line with the SQL standard.
Note that the length of the ouput column is equal to the maximum length of the input parameters.
Variable | Type | Length |
---|---|---|
coalesce_empty_with_c_long | Char | 300 |
coalesce_empty_with_c_short | Char | 14 |
coalesce_empty_with_c_short2 | Char | 5 |
coalesce_empty_with_n_long | Num | 8 |
coalesce_empty_with_n_long2 | Num | 3 |
coalesce_empty_with_n_short | Num | 8 |
coalesce_empty_with_n_short2 | Num | 3 |
Type Conversion
We have covered so far the use of the coalesce functions when all the parameters are of the same type. Let’s review the behaviours when there is a type mismatch.
Mixing Types in a data-step
In a data-step, the arguments are converted if required. More specifically, if an numeric argument is used with the coalescec
function, SAS will convert it to a character variable. Such a conversion is straightforward and is unlikely to fail. SAS will issue however a note in the log. With the coalesce
function, SAS will try to convert a character variable into a numeric variable. If such a conversion fails, the argument will be set to NULL.
Running the previous piece of code will produce the following entry in the log:
Mixing types in SQL
While SAS will attempt to convert the parameters to the appropriate types, the coalesce
function generates an error if there is a mismatch.
The previous piece of code will issue three consecutive errors of the same type:
Summary
The following table summarizes the differences between coalesce
and coalescec
depending on the context of the processing step. These differences are critical to keep in mind especially when code is being re-factored to use a different processing context.
Context | DATA Step | SQL Step | ||
---|---|---|---|---|
Function | coalesce |
coalescec |
coalesce |
coalescec |
Type of the Parameters | Numeric | Character | Character or Numeric. All the parameters need to be of the same type. | Character |
Result Length | 8 by default unless specified otherwise | 200 by default unless specified otherwise | The length is adjusted based on the length of the parameters | 200 by default unless specified otherwise |
Risk if length is not specified | No loss of precision | Possible truncation | No loss of precision | Possible truncation |
Type Conversion | Conversion from character to numeric may fail. Generates a note in the log. | Automatic conversion. Generates a note in the log. | Generates an error if there is an issue with conversion. | Automatic conversion. Generates a note in the log. |