Using Groovy to Parse Application Logs for Performance Data

Jul 08 2009

Groovy came in handy for me once again last week when I was tasked with aggregating performance data from application logs.  The logs were concatenated together into one file that totalled over 2 million lines.  Interspersed throughout the logs were performance timer entries.  Each entry included a timestamp, a string for tagging the line as a performance timer entry, the name of the object and method called, and the number of milliseconds elapsed for the method's execution.

Performance tests were run over a two-day period under various loads.  For analysis purposes we wanted to see each method's average run time for each set of conditions, so we needed a way to look at aggregated data for specified time periods.  The results would be output to a CSV format for easy integration with an Excel worksheet.  Finally, we didn't want to spend too much time on creating a script since it wouldn't really be reuseable outside the context of this project, due to the specific formatting of the performance timing entries.

I chose Groovy for the following reasons:

  • I'm comfortable with Java, and Groovy is a superset of Java.
  • It's a scripting language, so I can have code that executes outside of an object model.
  • Supports closures as well as some handy convenience methods.
  • Allows me to use objects and classes as necessary.

I created a Groovy script that took four arguments:

  • Start time
  • End time
  • Input file (the log)
  • Output file

The script reads through every line in the log file.  Working with file reader input streams is always a fairly straightforward exercise, but it's even quicker and easier with Groovy:


File file = new File(inFile)  // inFile is a string file name
file.eachLine(parse) // parse is the name of a closure


Inside the closure, the it keyword is used to refer to the line (String) that was passed in:

nTimer = it.indexOf("PERF-TIMER]") + 12  // length of "PERF-TIMER"

The parse closure is where all the action is.  The algorithm is specific to the unique performance timer entries for this project, checking each line for a "PERF-TIMER" string, then for the timestamp.  It then determines if the timestamp falls between the times specified by the input arguments, and proceeds from there.

Outside the closure, I defined a class to store method data:

class MethodData {
    long nCount = 0
    long msTotal = 0 // total milliseconds
    String methodName
  
    void add(String ms) {
        nCount++
        int i = Integer.parseInt(ms.trim())
        msTotal += i
    }
  
    double avg() {
        return msTotal / nCount
    }
  
    String toString() {
        return methodName + "," + nCount + "," + msTotal + "," + avg() + "\n"
    }
}


Back inside the closure, the method name is extracted from the performance timer entry and used as a key for a Map object.  If data has already been captured for a given method, the milliseconds data is just added to the total.  Otherwise, a new object is created and put into the Map:

if (map.containsKey(strMethodName)) {
    (MethodData)(map.get(strMethodName)).add(strMilliseconds)
  
} else {
    data = new MethodData()
    data.methodName = strMethodName
    data.add(strMilliseconds)
    map.put(strMethodName, data)
  
}


All this working with performance data got me thinking:  how would the Groovy script itself perform?

Running the script from an Eclipse console on a MacBook Pro (4 GB memory), it went through a 260 MB log file with 2.003 million lines in just over 64 seconds.  That's about 4 MB per second, or .03 milliseconds per record.  Not bad, certainly acceptable for my purposes.

About the Author

Disclaimer

The words and opinions expressed here are those of each article's respective author, and do not necessarily represent the views of CapTech Ventures.