Blog

SQL 2012 Developer: Aggregates

Excerpt by Don Kiely | September 11, 2013

Aggregates

One of the more exciting types of SQLCLR code modules is custom aggregates. Earlier versions of SQL Server didn't offer an easy way to extend the rather short list of T-SQL aggregates, such as SUM, AVG, MAX, MIN, and COUNT. You still can't use T-SQL to create custom aggregates, but you can write your own in SQLCLR. It's not something you'll need to do often, but it can greatly simplify application coding in situations where you need it.  Unlike other SQLCLR code modules, you can define only a single custom aggregate within a class. The class must be public and requires four non-static methods, which SQL Server calls as it processes the data to aggregate:

  • Init initializes storage values. SQL Server calls this once at the beginning of the query that uses the aggregate.
  • Accumulate does the work of combining values. SQL Server calls it once for each row that is aggregated, and it must support multiple instances of the aggregation code that is executing so that SQL Server can utilize multiple threads to make operations more efficient.
  • Merge is called at the end of the operation to merge the aggregations of multiple instances into one final value.
  • Terminate is the last method called and returns the final aggregated value.

You also have to deal with serialization issues because instances of the aggregation class have to be moved between threads. This complicates the code only a little, since the .NET Framework has rich support for serialization. Most commonly, you'll implement the IBinarySerialize interface to let .NET do all the serialization work, but there are other options if you have a special need.

Consider a sample aggregation. A common requirement in applications is to create a concatenation of strings within a table, delimited by a single character such as a semicolon. Some applications require information in this format for processing. You'll create a Concatenate aggregation that you can use to concatenate all string values in a field. For example, once you create the aggregation, you could run the following T-SQL code to produce a single string with all product names from the Northwind Products table that have a category ID of 4, which is Dairy Products.

SELECT dbo.Concatenate([ProductName]) FROM Products WHERE CategoryID = 4;

This code produces the following string (with line breaks added to fit on the printed page):

Queso Cabrales;Queso Manchego La Pastora;Gorgonzola Telino;Mascarpone Fabioli;Geitost;Raclette Courdavault; Camembert Pierrot;Gudbrandsdalsost;Flotemysost;Mozzarella di Giovann

The first part of the SQLCLR code defines the aggregation class and decorates it with Serializable and SqlUserDefinedAggregate attributes. The SqlUserDefinedAggregate attribute needs the following parameters, most of which the query processor uses to perform the aggregation properly.

  • Format specifies the serialization format, either Format.Native or Format.UserDefined. You'll nearly always need to use UserDefined, although SQL Server has enhanced support for the Native format.
  • IsInvariantToDuplicates indicates whether the aggregation result is the same if any of the items are duplicates. The Concatenate aggregation returns a different result if duplicates exist (they'll appear multiple times in the resulting string), so set this value to false.
  • IsInvariantToNulls indicates whether the aggregation result is the same if any of the items are nulls. Concatenate ignores null values, so set this value to true because the same string will result whether there are nulls or not.
  • IsNullIfEmpty indicates whether the aggregation result is null if no items were aggregated, such as when the WHERE clause returns norecords to aggregate. If no strings exist, Concatenate returns a null, so set this to true.
  • MaxByteSize is required by the UserDefined serialization format and indicates the largest aggregation value that can be returned. This is set to 8000 bytes in Concatenate.

Concatenate is implemented in the following code as a structure and inherits from IBinarySerialize so that SQL Server can properly use multiple instances of the aggregate, using the listed parameters.

[Serializable] [Microsoft.SqlServer.Server.SqlUserDefinedAggregate( Format.UserDefined, IsInvariantToDuplicates = false, IsInvariantToNulls = true, IsNullIfEmpty = true, MaxByteSize = 8000)] public struct Concatenate : IBinarySerialize {

SQL Server calls the Init() method before aggregation begins. This is the point where you should initialize any variables used in the aggregation. Concatenate uses a StringBuilder object in an sb variable to hold the aggregated values, so that variable is instantiated here.

private StringBuilder sb; public void Init() { sb = new StringBuilder(); }

SQL Server calls the Accumulate method when it has a new value to add to the aggregation. Concatenate works only on string type data, so the method receives a value of type SqlString. If the value is null, it returns immediately without doing any aggregation task. Otherwise, it calls the Append method of the StringBuilder variable to concatenate the string.

public void Accumulate(SqlString Value) { if (Value.IsNull) return; sb.Append(Value.ToString() + ";"); }

The Merge method is used to aggregate multiple instances of the Concatenate object. In this case it simply concatenates the value passed to it by the SQL Server query processor to the current instance's aggregation value. The method is passed an object of type Concatenate and reads the object's StringBuilder's ToString property to get the current aggregation value. Often you'll need to do more work here; for example, when the resulting aggregation needs to be in alphabetical order. In that case, you may want to use a sortable .NET list object to store the strings internally while processing the aggregate. But here a simple append is sufficient, since the order of the items in the final string is not important.

public void Merge(Concatenate Group) { sb.Append(Group.sb.ToString() + ";"); }

The last of the four required aggregation methods is Terminate. SQL Server calls this after it processes the last value that it aggregates. This method needs to return the final aggregated value, so it first checks whether the current string is of zero length. If it is, it returns a null. Otherwise it strips off the final semicolon and returns the value. It also clears the sb object in case this instance of Concatenate is used for another aggregation.

public SqlString Terminate() { if (sb.ToString().Length == 0) return SqlString.Null; else { String result = sb.ToString().Substring(0, sb.Length - 2); sb.Length = 0; return result; } }

The aggregation object also has two other methods, required because the object implements the IBinarySerialize interface. The Read method is called when the object is deserialized and restores the value of the sb variable. Write puts the current string value of sb into the serialization object w.

void IBinarySerialize.Read(System.IO.BinaryReader r) { sb = new StringBuilder(r.ReadString()); } void IBinarySerialize.Write(System.IO.BinaryWriter w) { w.Write(sb.ToString()); }

This simple but useful example shows what is necessary to create a SQLCLR aggregate. Other custom aggregates may require more complex code to implement all the required features, and you may have to use multiple class-level variables to keep track of multiple values during the course of calculation, but it really isn't all that hard to do once you understand the basic principles. You'll need to have a good handle on how .NET does serialization. But these techniques really become valuable when you implement your own user-defined types in SQLCLR, which open up whole worlds of possibilities in SQL Server.

ldn-expertdkielyThis post is an excerpt from the online courseware for our SQL Server 2012: Advanced SQL CLR course written by expert Don Kiely. 



Don Kiely

Don Kiely is a featured instructor on many of our SQL Server and Visual Studio courses. He is a nationally recognized author, instructor, and consultant specializing in Microsoft technologies. Don has many years of teaching experience, is the author or co-author of several programming books, and has spoken at many industry conferences and user groups. In addition, Don is a consultant for a variety of companies that develop distributed applications for public and private organizations.


This course excerpt was originally posted September 11, 2013 from the online courseware SQL 2012 Developer, Part 06 of 13: Advanced SQL CLR by Don Kiely