Google Search Appliance (GSA) Sorting in Portal

Oct 20 2009

At several of our clients, we have integrated the Google Search Appliance (GSA) into a portal.  There are two ways to accomplish this integration:

1.     Utilize GSA’s built-in ability to format the results for presentation via an XSLT stylesheet.

2.     Utilize GSA’s ability to return straight XML.

Both approaches work well and can suit the needs of a portal.  Option 1, though, will not work if you need to sort the entire result set before displaying it to users.  The reasons are as follows:

1.     GSA does not provide the ability to retrieve more than 100 results at a time.

2.     GSA’s built-in sorting only sorts the first 100 results.

3.     Sorting on anything other than date or relevance (e.g. metadata) requires some XSLT work, and it is still bound by the limitation of sorting only 100 records at a time.

Option 2 is still limited to fetching 100 records at a time, but because you receive the raw XML, you can sort it on the client side (that is, in the code that calls GSA) as requirements dictate.  Our approach to accomplishing this typically involves the following:

1.     Create client-side code that fetches the entire result set from GSA in blocks of 100 results at a time, up to the maximum available.

2.     Store the resulting composite XML in a cached region for a predetermined amount of time.  The caching algorithm for the key and time should be configurable so that it can be adjusted as needed.

3.     After fetching and storing the results, sort them based upon the client input.
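The fetch and sort steps above can be sketched roughly as follows.  This is a minimal illustration, not the code we deployed: the Hit and GsaPager classes, the fetchPage callback (which would issue the HTTP request and parse the returned XML) and the maxResults cap are all hypothetical names introduced here.  The query parameters in pageQuery (q, output=xml_no_dtd, start, num) follow the GSA search protocol as we understand it, so verify them against your appliance's documentation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntFunction;

/** One search hit, reduced to the fields this sketch needs (hypothetical type). */
final class Hit {
    final String url;
    final String sortValue; // e.g. a metadata value pulled from the result XML; assumed non-null

    Hit(String url, String sortValue) {
        this.url = url;
        this.sortValue = sortValue;
    }
}

final class GsaPager {
    static final int PAGE_SIZE = 100; // the per-request ceiling described above

    /** Query string for one block; the caller must URL-encode the query first. */
    static String pageQuery(String encodedQuery, int start) {
        return "/search?q=" + encodedQuery
                + "&output=xml_no_dtd&start=" + start + "&num=" + PAGE_SIZE;
    }

    /**
     * Calls fetchPage with start = 0, 100, 200, ... and concatenates the pages.
     * Stops at the first short page or once maxResults hits have been collected.
     */
    static List<Hit> fetchAll(IntFunction<List<Hit>> fetchPage, int maxResults) {
        List<Hit> all = new ArrayList<>();
        for (int start = 0; start < maxResults; start += PAGE_SIZE) {
            List<Hit> page = fetchPage.apply(start);
            all.addAll(page);
            if (page.size() < PAGE_SIZE) {
                break; // a short page means there is nothing left to fetch
            }
        }
        // Trim in case the last block overshot the cap.
        return all.size() > maxResults ? new ArrayList<>(all.subList(0, maxResults)) : all;
    }

    /** Returns a sorted copy, leaving the (possibly cached) input list untouched. */
    static List<Hit> sortBy(List<Hit> hits, boolean descending) {
        Comparator<Hit> order =
                Comparator.comparing((Hit h) -> h.sortValue, String.CASE_INSENSITIVE_ORDER);
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(descending ? order.reversed() : order);
        return copy;
    }
}
```

Sorting a copy rather than the fetched list means the same cached result set can serve requests that ask for different sort orders.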

Conclusion

Overall, Option 2 has worked very well for us when the sorting requirements exceed what GSA’s built-in mechanisms provide.  The two challenges to keep in mind are the memory required for caching and the time required to fetch the results in chunks.  In practice, we found that the memory requirements rarely had an adverse impact on our portal, and that the fetch time was incurred only by the first requestor and was rarely noticeable.
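For illustration, a time-limited cache in which only the first requestor pays the fetch cost might look like the sketch below.  The TtlCache class, its injected clock and the loader callback are assumptions made for this example and not part of GSA or of any particular portal product; in a real portal you would more likely use the caching facility your platform already provides.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Minimal time-to-live cache keyed by a string such as the search query (hypothetical). */
final class TtlCache<V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;

        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;     // configurable lifetime of each entry
    private final LongSupplier clock; // injected so the expiry logic can be tested

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /**
     * Returns the cached value for key, invoking loader only when the entry
     * is missing or expired. compute() runs atomically per key, so concurrent
     * requests for the same key wait for the first load instead of repeating
     * it. The loader must not touch this cache, and a slow loader will hold
     * up other callers of the same key while it runs.
     */
    V get(String key, Supplier<V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old != null && old.expiresAt > now)
                        ? old
                        : new Entry<>(loader.get(), now + ttlMillis));
        return entry.value;
    }
}
```

A portal would typically construct this with System::currentTimeMillis as the clock and pass the block-fetching routine as the loader.  Note that the sketch never evicts expired entries for keys that are not requested again, so a production version would need a size bound or a periodic cleanup to keep memory in check.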

About the Author

Mr. DiFrango has over 15 years of experience specializing in the architecture, design, and construction of distributed, integrated systems in enterprise and web environments. This experience includes expertise in Enterprise Systems Integration, Application Development, Service Oriented Architecture, Content Management Integration and Portal Solutions. Mr. DiFrango specializes in the JBoss Portal, Alfresco, Weblogic and Tibco product suites. His product expertise pertains primarily to Application Servers, Service Enablement Products, Content Management Integration, and Portal frameworks. Mr. DiFrango is experienced in architecting, designing and developing J2EE standardized Applications. This experience includes full lifecycle development, from creating OO designs through to product testing. Mr. DiFrango also has experience leading teams of analysts, developers, and testers, including troubleshooting complex development issues, mentoring staff, managing development schedules and work assignments, and providing architectural guidance. He is experienced in software development methodologies including both waterfall and Agile. He also has extensive experience modeling systems with UML during analysis, design, and construction.

 

Disclaimer

The words and opinions expressed here are those of each article's respective author, and do not necessarily represent the views of CapTech Ventures.