12.15.2011

Flush Jaspersoft OLAP Cache from Talend/JasperETL

JaspersoftETL is a Java code generation tool designed to load your operational data into a data warehouse and get it ready for reporting and analysis. Jaspersoft OLAP takes advantage of aggregate tables and an in-memory cache to speed up queries. If you're running a nightly job to load the DW, it's a good idea to clear the OLAP cache at the same time. Otherwise queries keep being answered from the cache and your users won't see the new data.

There are several ways to clear the OLAP cache:
  • Log in with superuser privileges, display the OLAP Settings page, and click Flush OLAP Cache in the OLAP Settings panel.
  • Using a utility such as cURL or wget, send an HTTP request to this URL: http://server:8080/olap/flush.html?j_username=superuser&j_password=
  • Restart the application server instance.
  • Call the Java API.

I've created a small component that clears the cache via the URL method from a JasperETL/Talend job. It prompts you for a username, a password, and a server URL (note that by default only superuser can clear the cache). The URL should point to something like http://server:8080/jasperserver-pro/olap/flush.html
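
If you'd like to try the URL method outside of a job first, here is a minimal Python sketch of the same request. It is not the component's code; the function names are my own, and it assumes your server accepts the j_username and j_password parameters on the flush page exactly as in the URL shown in the list above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_flush_url(flush_page, username, password):
    """Append the login parameters used by the URL method to the flush page.

    flush_page is the full address of flush.html on your server (hypothetical
    helper; the parameter names come from the URL in the list above).
    """
    query = urlencode({"j_username": username, "j_password": password})
    return f"{flush_page}?{query}"


def flush_olap_cache(flush_page, username, password, timeout=30):
    """Request the flush page and return the HTTP status code.

    urlopen raises an HTTPError if the server answers with a 4xx/5xx status.
    """
    url = build_flush_url(flush_page, username, password)
    with urlopen(url, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Replace the placeholders with your own server and superuser credentials.
    print(build_flush_url(
        "http://server:8080/jasperserver-pro/olap/flush.html",
        "superuser",
        "your-password",
    ))
```

Keep in mind that this puts the password in the query string, so it can end up in server logs.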

You can find the tJasperOLAPCache component on talendforge.org.

Another useful step is to re-populate the cache afterward using the tJasperServerExec plugin.

12.01.2011

Using tGoogleAnalyticsInput in JasperETL/Talend

This is a short article on getting Google Analytics data into the JaspersoftETL/Talend ETL tools. Google Analytics has a nice interface and many great ways of analyzing web traffic, and it'd be hard to beat the work Google did there. The purpose of this connector is instead to bring the data into your own data warehouse and join it with other intelligence you might have. A great way to display the aggregated data is with a BI tool like Jaspersoft.

I'm writing this because the help for the tGoogleAnalyticsInput component is in French and incomplete.

So let's get started. You'll need JasperETL 4.1.2 or older (commercial editions here, open source here).

Installing the plugin:
Create a new Job Design and drag the tGoogleAnalyticsInput component from the Palette (under Internet):

Configure the component. Start by finding some fields you'd like to use from the Google API: http://code.google.com/apis/analytics/docs/gdata/dimsmets/dimsmets.html. Note that you need to include some metrics: the schema can't be all dimensions, or vice versa.

Here's an example schema (you'll have to put in the fields yourself):
Now configure the component. To get your Profile ID (note that it doesn't start with UA), read Google's rather cryptic documentation about it. I got mine by examining the URL while logged into GA and looking at the &id= parameter.
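
If you'd rather not pick the ID out of the address bar by eye, a few lines of Python can do it. This is only a sketch: the function name is mine, and the URL in the usage example is a made-up placeholder, so paste in whatever your browser shows while you're logged into GA.

```python
from urllib.parse import parse_qs, urlparse


def profile_id_from_url(url):
    """Return the value of the id= query parameter from a GA page URL."""
    values = parse_qs(urlparse(url).query).get("id")
    if not values:
        raise ValueError("no id= parameter found in the URL")
    return values[0]


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    print(profile_id_from_url(
        "https://www.google.com/analytics/reporting/?reset=1&id=12345678"
    ))
```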

The "value" column comes directly from the Google API and will also have to be filled in. I've also found that I HAVE to filter on period. You can use a range such as 1980-2020 if you want to include everything.
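
To make the pieces concrete, here is a rough Python sketch of the query that these settings describe. I'm assuming the component ends up sending the parameter names that Google's data feed documents (ids, dimensions, metrics, start-date, end-date); I haven't checked the component's source, and the function name, the profile ID, and the chosen fields are just examples.

```python
def build_ga_query(profile_id, dimensions, metrics, start_date, end_date):
    """Collect the settings into the parameters of a GA data feed query.

    Only the metric requirement is checked here; dates are YYYY-MM-DD strings.
    """
    if not metrics:
        raise ValueError("at least one metric is required")
    params = {
        "ids": f"ga:{profile_id}",      # Profile ID, not the UA-... number
        "metrics": ",".join(metrics),
        "start-date": start_date,        # the period filter
        "end-date": end_date,
    }
    if dimensions:
        params["dimensions"] = ",".join(dimensions)
    return params


if __name__ == "__main__":
    # Example values only: visitors per day and country for one month.
    query = build_ga_query(
        "12345678",
        ["ga:date", "ga:country"],
        ["ga:visitors"],
        "2011-11-01",
        "2011-11-30",
    )
    for name, value in query.items():
        print(name, "=", value)
```

The dimension and metric names are the same strings you type into the "value" column of the schema.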

Now you can run the job and see your visitors:

Funnily enough, if you're reading this right now, you will eventually flow through the GA database and the very job I'm displaying above ;)