Raju is the data quality service offering lead at CapTech. His area of expertise include, many different areas within the IT management consulting arena. He thinks everything else can be purchased, but quality must be built-in.
Using the SQL RANK function to solve an unusual sorting problem in obtaining data for performance tests
Feb 15 2010
Problem:
Our team needed to create a data set listing retail stores belonging to different territories for performance tests that involved store selection. The data set was to be used by virtual users signing-in as different territory owners. The challenge was that the territory and store numbers in the data set should be as random as possible. For example, if there are 20 territories and each territory has 10 stores, the data set should contain the 20 territories with their first stores, followed by the same 20 territories with their second stores and so on.
Solution:
Billy Tilson of CapTech solved this sorting issue using the RANK function of Teradata. In fact, this Online Analytical Processing (OLAP) function is part of the ANSI SQL-99 standard and so this approach can be used in most databases.
The SQL we used was something like,
SELECT
USERID,
TERRITORY,
STORE_NO,
RANK() OVER (PARTITION BY TERRITORY ORDER BY STORE_NO) AS RANK_ID
FROM STORE_DIMENSION
<<More Joins or Filters as Needed>
ORDER BY 4, 2
The output will be,
|
USERID |
TERRITORY |
STORE_NO |
RANK_ID |
|
user1 |
123 |
342134 |
1 |
|
user2 |
234 |
234567 |
1 |
|
user3 |
345 |
456789 |
1 |
|
… |
… |
… |
… |
|
user20 |
789 |
764590 |
1 |
|
user1 |
123 |
890021 |
2 |
|
user2 |
234 |
992435 |
2 |
|
user3 |
345 |
896543 |
2 |
|
… |
… |
… |
… |
|
user20 |
789 |
667843 |
3 |
|
user1 |
123 |
436577 |
3 |
|
user2 |
234 |
298766 |
3 |
|
user3 |
345 |
903345 |
3 |
|
… |
… |
… |
… |
|
user20 |
789 |
217789 |
3 |
The PARTITION BY clause determines the group (in this case the TERRITORY) on which the RANK function executes. If this is not used, the entire result set delivered by the FROM clause will be treated as a single group.
If what's needed is a simple sort by territory and store numbers, order by can be used. However, the problem here is the need to shuffle territories and stores to the maximum extent possible. The way the performance testing tool uses the data file is that multiple virtual users pick chunks of data out of it to generate the transactions. The objective of this sort is to make sure that when multiple virtual users throw out SQLs concurrently, the database cache does not end up fetching the same territory's or store's information from its cache. In this approach for sorting, the stores with rank 1 are the first stores from all the territories and those of rank 2 are the second stores from all the territories, and so on.
Typically, RANK function is used to get the top few rows based ascending or the descending orders, but in this situation, the OLAP function is used to solve a unique sorting need.
Conclusion:
OLAP functions such as RANK can be very powerful and be used for many challenging situations. This article detailed one creative way to solve an unusual sorting problem with the RANK function.
© 2010 CapTech Ventures, Inc. All Rights Reserved. Legal Notices.