Tuesday, May 17, 2011

Kowaa -- Explain your database to Apache Solr.

CSC Lounge (http://csclounge.com) is currently being reimplemented.

The major feedback we have received is that the UI and UX need to be more modern, colourful and intuitive, and we are working to meet those needs. But it's not only the UI that we are working on: we are re-implementing the whole system with the Scala-based Lift web framework. The major reason we chose Scala is that we find it more cost-effective and flexible. Besides, it's what the cool chaps use for their systems nowadays (ask Twitter and Foursquare). One of the modules we have worked on (and which is now ready) is the search module.

The search was built on Apache Solr (http://lucene.apache.org/solr), an open-source search server written in Java on top of Apache Lucene (http://lucene.apache.org/java/docs/index.html). Solr is an amazing product, and I recommend it to anyone who is building a web system from scratch (or already has one running) and wants to add search functionality. It works by letting developers import data, which is indexed and persisted to disk as documents containing searchable fields. Besides plain searching, the results can be returned highlighted or transformed in whatever way the developer wants. Results can also be returned in several formats, including XML (the default), JSON, PHP, Ruby and Python. Most of these features are actually base features of Lucene (which also powers Twitter's search and is used by full-text search add-ons for some NoSQL databases, such as CouchDB).

Solr allows users to import data from different sources: XML, RSS, Wikipedia or an already existing database. Ours was the case of an already existing database. The problem with the process is that you have to manually create an XML file which tells Solr how your database is defined. This is a stressful process, as you have to switch between the XML editor and your database and, in between, decide what you want to index. God help you if you have a complex database which requires nested Solr fields.
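To make the result formats and highlighting mentioned above a bit more concrete, here is a minimal Java sketch of how a search request URL could be put together. The parameter names (q, wt, hl, hl.fl) are standard Solr request parameters, but the class name, the base URL, the /select path and the "title" field are my assumptions for illustration; your own Solr setup may differ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Solr or Kowaa: builds a Solr search URL.
class SolrQueryUrl {

    // solrBase is assumed to look like "http://localhost:8983/solr".
    static String build(String solrBase, String query, String format, String highlightField) {
        try {
            return solrBase + "/select"
                + "?q=" + URLEncoder.encode(query, "UTF-8")             // the search terms
                + "&wt=" + URLEncoder.encode(format, "UTF-8")           // response writer: xml, json, ...
                + "&hl=true"                                            // switch highlighting on
                + "&hl.fl=" + URLEncoder.encode(highlightField, "UTF-8"); // field to highlight
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "title" is an assumed field name in the Solr schema.
        System.out.println(build("http://localhost:8983/solr", "scala lift", "json", "title"));
    }
}
```

Opening the printed URL in a browser (with Solr running) should return the matching documents in the requested format, with a separate highlighting section for the chosen field.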

This is where Kowaa comes in.

I created Kowaa to automate the process. It's a simple Java-based GUI tool that connects to your database and provides interfaces for you to set the properties of the Solr document and its fields. It lets you select which database fields you want to index, and it also allows you to create nested fields. It then spits out the XML you need into a directory you specify. You can then customise the generated file by adding transformers (like HTML strippers) and other processors. I searched the internet for a tool that does this and didn't find any, so I created one.
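For readers who have not seen this kind of XML before, here is a rough Java sketch of the idea: describe a table and a nested child table, then print a database import configuration from that description. This is not Kowaa's actual code or its exact output. The element names follow the format of Solr's DataImportHandler (dataConfig, dataSource, document, entity, field), which I am assuming is the file meant here, and the class name, table names, column names and JDBC URL are all made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: generates a DataImportHandler-style config as a string.
class DataConfigSketch {

    // One <entity>: a SQL query, its column-to-Solr-field mappings, and nested entities.
    static class Entity {
        final String name;
        final String query;
        final Map<String, String> fields = new LinkedHashMap<String, String>();
        final List<Entity> children = new ArrayList<Entity>();

        Entity(String name, String query) { this.name = name; this.query = query; }

        Entity field(String column, String solrField) { fields.put(column, solrField); return this; }

        Entity child(Entity nested) { children.add(nested); return this; }
    }

    // Escapes characters that are not allowed inside a double-quoted XML attribute.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Writes an entity, its fields, then its nested entities one indent level deeper.
    static void render(Entity e, String indent, StringBuilder out) {
        out.append(indent).append("<entity name=\"").append(escape(e.name))
           .append("\" query=\"").append(escape(e.query)).append("\">\n");
        for (Map.Entry<String, String> f : e.fields.entrySet()) {
            out.append(indent).append("  <field column=\"").append(escape(f.getKey()))
               .append("\" name=\"").append(escape(f.getValue())).append("\"/>\n");
        }
        for (Entity nested : e.children) {
            render(nested, indent + "  ", out);
        }
        out.append(indent).append("</entity>\n");
    }

    static String dataConfig(String driver, String jdbcUrl, Entity root) {
        StringBuilder out = new StringBuilder();
        out.append("<dataConfig>\n");
        out.append("  <dataSource driver=\"").append(escape(driver))
           .append("\" url=\"").append(escape(jdbcUrl)).append("\"/>\n");
        out.append("  <document>\n");
        render(root, "    ", out);
        out.append("  </document>\n");
        out.append("</dataConfig>\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical schema: a "post" table with a related "comment" table.
        Entity post = new Entity("post", "SELECT id, title FROM post")
            .field("id", "id")
            .field("title", "title")
            .child(new Entity("comment",
                    "SELECT body FROM comment WHERE post_id = '${post.id}'")
                .field("body", "comment"));
        System.out.print(dataConfig("com.mysql.jdbc.Driver", "jdbc:mysql://localhost/forum", post));
    }
}
```

The nested entity is where hand-writing gets painful: every child query has to refer back to a column of its parent (here via ${post.id}), and that is exactly the kind of bookkeeping a tool can do for you.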

I used it with Microsoft SQL Server, but I included options and JDBC driver libraries for indexing MySQL, PostgreSQL and JavaDB (network and embedded). To download it, just follow the URL http://csclounge.com/kowaa.rar. The archive contains both the source code and the build, so you can customise the code as you want. It was built for internal use, but we figured others might need it too, so we are giving it all away for free! So if you are looking to set up Solr on your site on top of an already existing database, you might want to take advantage of this.

After downloading, extract to any directory of your choice and run by invoking the following command on the command line:

$ java -jar path_to_extract/dist/Kowaa.jar

What does Kowaa mean?

Kowaa is the Igbo word for "explain". Ifetayo also reminded me that when you break it into two, it means "teach us" in Yoruba (Ko waa). So it's more like explaining (or teaching) Apache Solr what your database is.
