Metaprogramming and a Generative Approach

Dec 07 2010

When it comes to metaprogramming, even the most technical folks can find their eyes rolling back into their heads.  I’ll admit, with all of the fun and new technologies out there, a discussion about metaprogramming is, in the name of brutal honesty, just outright boring…  but, so is being addicted to brute force.

During a recent ETL implementation using T-SQL, changes in architecture and scope began to threaten the project’s delivery dates. I desperately wanted to begin development but feared throw away work resulting from requirement changes coming out of ongoing design and analysis discussions. How do you develop code without a finalized set of requirements and prevent re-writing each piece of code when requirements change?

By writing a code generator, I was able to rapidly respond to model, requirement and architectural changes by simply adjusting the metadata or code template and regenerating the ETL scripts. Additionally, it saved me countless hours of tedious error prone activities such as defining table variables and creating select statements from the source tables that respect the ordinal positioning of their counterparts on the target.

Advantages

1. Quickly update entire codebase by editing code template or metadata and regenerating the scripts

2. Enables development prior to having a finalized set of requirements, model or architecture

3. Standardized names and format ensure consistency throughout the scripts

4. Metadata driven test cases enable rapid testing.

Disadvantages

1. Writing dynamic SQL that writes dynamic SQL can be tricky (think about passing literal values with single ticks (‘) or concatenating a literal that is actually a parameter in the generated script.

2. Can be difficult to debug.

Below are some examples that illustrate the concepts I’ve discussed. The actual code being implemented was more complex than these examples.

--declares a variable to hold the string to create the temporary table used in the data movement.  The process will iterate over a table of metadata that contains information about the source, target, datatypes and the data lengths etc

Declare @buildTempTableSQL nvarchar(max) = 
‘DECLARE @temp_’ + @sourceTable + ‘ AS TABLE (‘
...
Fetch next into @source_column, @target_column, @datatype, @length, @null
Set @buildTempTableSQL += ‘,[’ + @target_column + ‘ ] [‘ + @datatype + ‘] (‘ + @length + ‘)’ + @null  +@newline
...

The code snippet above creates table variables for each table like the example output below:

DECLARE @temp_SomeTargetTable AS TABLE 
( 
[ClientAccountId] [int] NULL,
[ClientAccountName] [varchar](30) NULL,
[ContactFirstName] [varchar](75) NULL,
[ContactLastName] [varchar](75) NULL ,
[ContactEmailAddrs] [varchar](120) NULL,
[ContactPhoneNumber] [varchar](50) NULL,
[ContactFaxNumber] [varchar](20) NULL,
[LastUpdateUserId] [varchar](30) NULL,
[LastUpdateDate] [datetime] NULL
)

Similarly, the select statement to populate the table variable will implement the mapping and respect the ordinal positioning of the target.

Declare @buildSelectStatementSQL nvarchar(max) = ‘INSERT INTO @temp_’ + @sourceTable + ‘SELECT ‘ + @newline
…
Fetch next into @source_column, @target_column, @datatype, @length, @null
…
Set @buildSelectStatementSQL += ‘,’ + @source_column + ‘ AS [‘ + @target_column + ‘], + @newline

The previous snippet renders as this next statement in the stored procedure we generate:

INSERT INTO @temp_SomeTargetTable

SELECT  
ClientAccountID as [ClientAccountId],
ClientAccountName as [ClientAccountName],
ContactFirstName as [ContactFirstName],
ContactLastName as [ContactLastName],
ContactEmail as [ContactEmailAddrs],
ContactPhone as [ContactPhoneNumber],
ContactFax as [ContactFaxNumber],
LastChangedBy as [LastUpdateUserId],
LastChangedDate as [LastUpdateDate],
From [SRC_SERVER\DEV].SourceDatabase.dbo.SomeSouceTable

Once the requirements, model and architectural decisions were finalized, I quickly completed the development by updating the code template, deleting a few source to target mappings in the metadata table to reflect a model change, and regenerating the scripts.

With the additional time that freed up I built on this approach by creating a related testing table that contained test cases for each source to target pairing. Each test case record contained a “preRunSQL” and a “postRunSQL”. The “preRunSQL” contained SQL designed to alter data to support the evaluation of that particular test case while the “postRunSQL” contained the evaluating statement and updated the “pass” field to reflect the outcome of the test. Executing and evaluating a few hundred test cases for unit testing ran in a short amount of time. New test cases can be added as they are needed. Additionally, it creates a ready to go regression suite should changes occur in the model or code template.

Using metadata to generate code can provide a way apply updates across an entire code base, rapid response to model or requirement changes, standardize names and formatting, and can offer an overall reduction in development and testing time.

About the Author

 

Disclaimer

The words and opinions expressed here are those of each article's respective author, and do not necessarily represent the views of CapTech Ventures.