Manage your business metrics with Haystack, Elasticsearch and Kibana

Jan 02

A good practice in any project is to bring in business metrics, to make sure that the actions being carried out are giving the expected results. In this article I want to share my experience using open source tools that make this job easier.

Haystack is a search library for Django. It works by defining a SearchIndex for a model, which determines which of the model's fields are indexed and which can be used for filtering. It supports multiple backends, such as Solr, Whoosh, Xapian and Elasticsearch. We chose Elasticsearch because it integrates with Kibana, the tool we use for business intelligence.
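For reference, a minimal Haystack configuration for the Elasticsearch backend might look like the settings fragment below. The host, port and index name are assumptions, so adjust them to your deployment. The BusinessMetric wrapper later in this article reads the URL key (and the optional KWARGS key) from this same dictionary.

```python
# settings.py (fragment): point Haystack at an Elasticsearch node.
# The URL and INDEX_NAME values below are example assumptions.
HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
        "URL": "http://127.0.0.1:9200/",
        "INDEX_NAME": "haystack",
    },
}
```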

A SearchIndex may look like this:


import datetime
from haystack import indexes
from myapp.models import Note

class NoteIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    author = indexes.CharField(model_attr='user')
    pub_date = indexes.DateTimeField(model_attr='pub_date')

    def get_model(self):
        return Note

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())

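In this example we can filter by author or pub_date, or run full-text searches against the text field. The text field is mandatory, because every SearchIndex needs exactly one field with document=True. Since it is declared with use_template=True, its content comes from a template file. For example, you can place it at search/indexes/myapp/note_text.txt inside one of your template directories, with the following content: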


{{ object.title }}
{{ object.user.get_full_name }}
{{ object.body }}

 

For business metrics this field is not particularly helpful. What we really want from Haystack is a way to index our Django models into Elasticsearch. That gives us a copy of the data in an environment built to withstand constant queries, without affecting the performance of our production database.
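To keep that copy reasonably current, Haystack can reindex an object whenever it is saved or deleted. Below is a settings sketch. It assumes the extra indexing work on every save is acceptable for your traffic.

```python
# settings.py (fragment): update the search index on every model
# save/delete signal instead of only when reindexing manually.
HAYSTACK_SIGNAL_PROCESSOR = "haystack.signals.RealtimeSignalProcessor"
```

With heavy write volumes, the usual alternative is to keep the default processor. You then run python manage.py update_index periodically, for example from cron.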

 

At this point we can already extract useful information. For example, we can count the new records per day with a Kibana histogram panel. To do so, use the query django_ct:auth.user and group the results by day.
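You can also request the same daily count from Elasticsearch directly, using a date_histogram aggregation. The sketch below only builds the request body. It makes two assumptions:

- The indexed users expose a date field called date_joined. This field is hypothetical, and your SearchIndex would have to define it.
- The cluster accepts the interval parameter, as Elasticsearch 1.x to 6.x do. Newer versions expect calendar_interval instead.

```python
def daily_count_body(django_ct="auth.user", date_field="date_joined"):
    """Build a search body that counts one model's documents per day.

    `date_field` is a hypothetical indexed date field; use whichever
    date your SearchIndex actually exposes.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the documents
        "query": {"term": {"django_ct": django_ct}},
        "aggs": {
            "per_day": {
                "date_histogram": {"field": date_field, "interval": "day"}
            }
        },
    }
```

Pass the resulting body to the search method of an Elasticsearch client. The per-day counts are then available in response['aggregations']['per_day']['buckets'].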

 

Measuring data that is not in our Django models

If we need to measure information that is not stored in our database, we can use Elasticsearch directly. For example, to measure how many times users press a button, we index a document describing each action. Below is the wrapper I use to send this information to Elasticsearch. It reuses the connection settings already configured for Haystack.


from datetime import datetime

from django.conf import settings
from django.core.exceptions import ImproperlyConfigured

from elasticsearch import Elasticsearch


class BusinessMetric(object):
    """Send custom metric documents to the Elasticsearch server used by Haystack."""

    def __init__(self, doc_type):
        self.doc_type = doc_type
        hc = getattr(settings, 'HAYSTACK_CONNECTIONS', None)
        if not hc:
            raise ImproperlyConfigured("You must specify a 'HAYSTACK_CONNECTIONS' in your settings.")

        # TODO: manage multiple Haystack connections. For now use 'default'
        # (which Haystack requires) or, failing that, the first one defined.
        connection_alias = 'default' if 'default' in hc else next(iter(hc))
        connection_options = hc[connection_alias]

        if 'URL' not in connection_options:
            raise ImproperlyConfigured(
                "You must specify a 'URL' in your settings for connection '%s'." % connection_alias)

        # Build the client from the same URL and KWARGS options Haystack uses.
        self.conn = Elasticsearch(connection_options['URL'], **connection_options.get('KWARGS', {}))

    def push(self, body):
        # Copy the document so the caller's dict is not modified, then timestamp it.
        doc = dict(body, created=datetime.now())
        self.conn.index(index="business_metric", doc_type=self.doc_type, body=doc)

    def search(self, body):
        return self.conn.search(index="business_metric", body=body)

So, to record this metric, we would add the following to the view that runs when the user presses the button:


# Inside a method of the class-based view (hence self.request).
metric = BusinessMetric('appname_metric')
metric.push({"user": str(self.request.user), "action": "press_a_button"})

Bonus: Daily metrics

For quick access to useful information during day-to-day operations, it can be helpful to build a dashboard. Here are a couple of examples of how to integrate these metrics with django-dashing.

Using Haystack

In this case we use Haystack's SearchQuerySet, whose queries are very similar to those of Django's ORM:


from dashing.widgets import NumberWidget
from haystack.query import SearchQuerySet

# day_boundaries() is a helper (not shown in this article) that returns
# the start and end datetimes of the current day.


class DailyErrandsInfoWidget(NumberWidget):
    title = 'Errands'

    def todays_errands(self):
        # Errands indexed by Haystack that were created today.
        today_start, today_end = day_boundaries()
        return SearchQuerySet().filter(
            django_ct='errands.errand',
            created__lte=today_end,
            created__gte=today_start)

    def get_value(self):
        return self.todays_errands().count()

    def get_change_rate(self):
        # Total number of destinations across today's errands.
        total = sum(len(errand.destinations or []) for errand in self.todays_errands())
        return '{} destinations'.format(total)

    def get_more_info(self):
        taxis = self.todays_errands().filter(type='taxi').count()
        return '{} taxis'.format(taxis)

    def get_updated_at(self):
        latest = SearchQuerySet().filter(
            django_ct='errands.errand').order_by('-modified')[:1]
        if not latest:
            return u'No errands yet'
        # %M is minutes; %m would print the month.
        return u'Last update {}'.format(latest[0].modified.strftime(u'%I:%M %p'))
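The widget above relies on a day_boundaries() helper that the article does not show. Here is a minimal, hypothetical sketch. It assumes the helper should return the first and last instants of the current day. In a Django project with time zones enabled, you would probably pass in django.utils.timezone.localtime() rather than rely on the naive datetime.now() default.

```python
from datetime import datetime, time, timedelta


def day_boundaries(now=None):
    """Return (start, end) datetimes for the day containing `now`.

    Hypothetical helper: `start` is midnight and `end` is the last
    microsecond of the same day, so both work with __gte / __lte filters.
    The tzinfo of `now` (if any) is kept on both values.
    """
    now = now or datetime.now()
    start = datetime.combine(now.date(), time.min, tzinfo=now.tzinfo)
    end = start + timedelta(days=1) - timedelta(microseconds=1)
    return start, end
```

Using inclusive bounds on both ends matches the created__gte / created__lte pair in the widget. An exclusive upper bound at the next midnight would be an equally valid choice.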

 

Making queries in Elasticsearch

In this case we have to write the queries in Elasticsearch's Query DSL, which can be a bit trickier if you are not familiar with it. The django-dashing widget may look like this:


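from dashing.widgets import NumberWidget

# BusinessMetric is the wrapper defined earlier in this article.


class ButtonPressesWidget(NumberWidget):
    title = 'Button Presses'

    def get_value(self):
        q = BusinessMetric('appname_metric').search({
            "size": 0,  # only the count is needed, not the documents
            "query": {
                "bool": {
                    "must": [
                        {"match": {"action": "press_a_button"}},
                        # Presses recorded in the last 24 hours.
                        {"range": {"created": {"gte": "now-1d"}}},
                    ]
                }
            }
        })

        # hits.total is already a number (Elasticsearch versions before 7.0).
        return q['hits']['total']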
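When reading that response, note the type of hits.total. It is a plain integer up to Elasticsearch 6.x. From 7.0 onwards it is an object such as {"value": 42, "relation": "eq"}. If the widget may run against a newer cluster, a small helper can accept both shapes. The helper below is hypothetical and not part of any library.

```python
def hits_total(response):
    """Return the hit count from an Elasticsearch search response.

    Elasticsearch < 7 returns an int; 7.0+ returns a dict like
    {"value": n, "relation": "eq"}. A relation of "gte" means the value
    is only a lower bound.
    """
    total = response["hits"]["total"]
    if isinstance(total, dict):
        return total["value"]
    return total
```

With this helper, the widget's last line would become return hits_total(q).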

Conclusion

Technologies like Elasticsearch have great potential, and this article only briefly covers some of their possibilities. To learn more, you can read the documentation and the articles on the blog. Kibana in particular can be complicated when you're getting started, and it may seem hard to take advantage of it. However, once you get used to Lucene queries and to how each type of panel works, it becomes really useful.


Posted on Jan. 2, 2015, 3:37 p.m.