ELK intro and Elasticsearch lessons from production

ELK stands for Elasticsearch, Logstash and Kibana. I first became acquainted with the stack through Susheel Zaveri’s excellent talk at MongoDB Day – Bangalore on 19 May 2014. So I was overjoyed when the first meetup of Elasticsearch Meetup Bangalore coincided with my trip on 27 Sep 2014. Elasticsearch has an open, RESTful API that makes it easy to build applications on top of it. It can process both structured and unstructured data, so you can derive insights from log files, Tweets and plain old CSV files, all in near real-time. Best of all, you can ingest data from all these disparate sources easily with Logstash, then search and analyze across all of these types of data with Elasticsearch, and visualize the results using Kibana. The stack makes these insights available to anyone in an organization through Kibana’s dashboards, which are shareable and don’t require programming know-how to use effectively.

These features – plus many more – make the ELK stack so flexible that it meets the big data challenges of a wide variety of verticals. A major financial company uses the ELK stack to do anomaly detection and root out credit card fraud. Another performs analytics and sentiment analysis across social media data. Yet another detects hacking on its networks, and yet another uses it for full-text search across e-commerce sites with billions of entries.

Photo: Suyog Rao starts the talk while Drew sits.

The meetup was held at SpringPeople Software Pvt Ltd, Sector 7, HSR Layout, Bengaluru, Karnataka. It had 2 speakers: Suyog Rao and Vedang Manerikar. It was free of cost, but required registration through a Google Form. Suyog Rao (@suyograo) started with an introduction to ELK. He began by describing Elasticsearch as a schema-free document store built around REST and JSON. The salient points of his talk were:






  • The popularity of Elasticsearch can be gauged from its total number of downloads, which stood at 10M over the last 2 years.
  • An Elasticsearch cluster can contain multiple Indices (databases), which in turn contain multiple Types (tables). These types hold multiple Documents (rows), and each document has Properties (columns). [Terms in brackets are the relational counterparts]
  • It uses replication for high availability and performance. For horizontal scalability, it uses sharding.
  • It supports:
    • Unstructured as well as Faceted, structured search
    • Enrichment and sorting
    • Pagination and Aggregation
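
To make the index/type/document analogy concrete, here is a minimal sketch in Python. It only builds the REST path and JSON body for a document and does not talk to a cluster. The index name, type name and log fields are hypothetical, and the `/index/type/id` path reflects Elasticsearch as it was at the time of the talk.

```python
import json

# Relational analogy from the talk ("type" is Elasticsearch terminology of that era).
ES_TO_RELATIONAL = {
    "index": "database",
    "type": "table",
    "document": "row",
    "property": "column",
}


def document_path(index, doc_type, doc_id):
    """Build the REST path that addresses one document: /index/type/id."""
    return "/{}/{}/{}".format(index, doc_type, doc_id)


# Hypothetical log event; the field names are illustrative only.
log_event = {
    "host": "web-01",
    "level": "ERROR",
    "message": "upstream timeout",
    "took_ms": 5300,
}

path = document_path("logs", "event", "1")
body = json.dumps(log_event, sort_keys=True)

# A client would send this body with an HTTP PUT to the path above.
print("PUT", path)
print(body)
```

Because the store is schema-free, a second document in the same type could carry extra properties without any prior table alteration.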


He covered Logstash and Kibana next.

  • Logstash is a Ruby app, which runs on the JVM.
  • It allows one to collect, parse, enrich and store logs and events.
  • Kibana allows one to build beautiful visualizations on top of an Elasticsearch index with zero code.
  • The new version makes use of the D3 library.

He showed a quick demo and actually covered a lot of material in a short time.




Vedang Manerikar (@vedang) works with Helpshift, a mobile CRM company based out of Pune and San Francisco. [It is a company with unique hiring practices. Refer to my earlier blogpost on Building Silicon Valley culture in India]

The customer-facing side of the Helpshift product is a simple chat feature within the app, built using the Helpshift mobile SDK. The business-facing side is a complex agent dashboard that helps the agent process as many issues as possible, as quickly as possible. This business-facing side is built on top of Elasticsearch. He shared the following nuggets of wisdom with us:

  • Elasticsearch does not have a book on it yet, although that will soon be solved. There are good references and videos, but nothing as structured as a book.
  • Don’t use Elasticsearch as a primary database. The data should first go into MySQL, MongoDB or another transactional datastore.
  • Though ES allows a mixed-mode node that holds both cluster metadata (master role) and data, it is best to separate master and data nodes.
  • For a multi-tenant setup like Helpshift’s use case, an index per customer is not a good idea; it is better to split indices based on index size.
  • He described helpful steps for bulk loading, like controlling the replica count, but I did not catch them fully.
  • A rolling upgrade of ES is fraught with risks, so it is better to spin up a new cluster and decommission the old one. [This was contested by Suyog and Drew]
  • Benchmarking is hugely important and should be done in the development and staging phases to prevent aches in production. He mentioned a tool called Tsung, which helped them benchmark percolators. Percolators allowed live notifications of new issues.
  • At runtime, a lot of debugging can be done using the cat APIs, so make use of them.
  • Tune JVM parameters, for example by allocating more memory to the young generation.
  • ES uses Lucene under the hood, so some troubleshooting might require understanding how Lucene works as well.
  • RTFM – basically, read the manual carefully. Pay special attention to units, such as whether a particular number refers to ms or seconds.
  • Advanced ES users make use of filters to build complex views.
  • There were many others, but I guess we have to wait for the presentation to arrive.
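
Since I did not catch the bulk-loading steps fully, the following is only a sketch of one common practice, not a record of what Vedang described. It builds the newline-delimited JSON body that the Elasticsearch `_bulk` endpoint expects, plus the settings body typically used to lower and later restore the replica count around a large load. The index name and documents are hypothetical, and nothing here contacts a cluster.

```python
import json


def bulk_body(index, docs):
    """Return an NDJSON payload for the _bulk endpoint.

    Each document contributes two lines: an action line naming the target
    index and id, then the document source. The payload must end with a newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


def replica_settings(count):
    """Settings body for an index _settings update that changes the replica count."""
    return {"index": {"number_of_replicas": count}}


# Hypothetical support issues to load.
issues = [
    ("1", {"title": "App crashes on login", "status": "open"}),
    ("2", {"title": "Refund not received", "status": "open"}),
]

before_load = replica_settings(0)  # assumption: drop replicas while loading
payload = bulk_body("issues", issues)
after_load = replica_settings(1)   # assumption: restore one replica afterwards

print(payload, end="")
```

The idea behind the replica step is that each replica repeats the indexing work, so loading with zero replicas and adding them afterwards can be cheaper. Whether it helps in a given cluster is exactly the kind of thing the benchmarking advice above is meant to settle.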

Writeup on MongoDB Meetup at Jabong

Mongo Dilli (meetup url) held its meetup at Jabong on Aug 22, 2014. We started around 6:15 after an initial introduction of participants. At least a quarter of them were using MongoDB in production, while a few had just started looking at it. About half of them had not tried MongoDB yet, but were extremely interested in it.

The first talk was by the hosts at Jabong, Supreet Sethi and Apoorva Moghey. They had audaciously run MongoDB on a Raspberry Pi, which runs an ARM processor. Since MongoDB runs only on little-endian machines until MongoDB Inc fixes SERVER-1625, they had to download a fork (github url) of MongoDB and compile it.

While they were presenting, I was frantically trying to finish up my presentation. I tried rigging the raffle bucket, but it did not work, as I did not win it at the end. Just kidding :)!

After this, I started my talk on Product Catalog: Retail Reference Architecture with MongoDB. After all, I was at Jabong, India’s leading e-tailer! On a serious note, schema design in MongoDB differs from relational ER modeling because of its document structure, so I chose a sample domain to illustrate general points. I did spend quite a few minutes answering general and introductory questions on MongoDB and NoSQL, because 50% of the audience was new to MongoDB and a few were entirely fresh to NoSQL.

After this, Anuvrat Prashar from the product review portal Zopper presented his journey with Python and MongoDB. It was really a pleasure to listen to the nerdy talk. Interestingly, he had ssh’ed to his box from his colleague’s machine over the internet, as his own machine did not have a connector for the projector. His presentation was in HTML5 and transitions lagged a little after each action, but things worked fine. We learnt a great deal about the Python MongoDB driver and a few wrappers on it. Crawling produces semi-structured data, which is easily digestible by MongoDB. It would be a nightmare to do the same on a relational database.

The most important part was the raffle draw to announce 3 winners. The prizes were sponsored by Jabong. We had nice snacks and a great time networking with enthusiasts and users of MongoDB afterwards.

Bangalore top Indian city for professionals to move into

LinkedIn had announced in June 2014 that it had a base of 26 million professionals in India, its 2nd largest after the USA (Hindu news article). LinkedIn analyzed (original post on LinkedIn blog) the movement of technology professionals between Nov 2012 and Nov 2013. If we compare just the Indian cities that appear in the top 10 cities globally, Bangalore (or Bengaluru) comes out on top in both absolute and percentage terms.

Linkedin Moving professionals

You may view the chart directly on Tableau public site.

MongoDB World Recap

Check out MongoDB World videos and presentations. Feel free to check out the keynote sessions, as well as most popular customer and internal sessions at MongoDB World.

Watch Most Popular Sessions

Hear from Customers at MongoDB World

Watch the Keynote Videos:

April SAIF ignition meeting on mobile marketing

SAIF held a meeting for entrepreneurs on its office terrace on the theme of mobile marketing. There were a lot of folks from SAIF investee companies like PayTM, PropTiger etc. After initial snacks and networking, Deepak Abbot (@deepakabbot), Product Marketing Head at PayTM, provided valuable insights on mobile marketing.

PayTM has been marketing for 1.5 years. It grew 4x. It was a desktop web company to start with. Now, 60% of orders come on mobile. They have seen 6m app installs till March 2014. Interestingly, Windows is the 2nd biggest. Have to spend money, not upgraded. 4% revenue.
They started with a target of 10m in 2 yrs.
Acquire, retain, loyalty, monetize and analytics are key to mobile marketing.
The first 3 months are your best chance. Use the following 5 methods for it.

  1. App Store optimization – keywords in title, description. New Google Play policy on April 1. App icon. Category – secondary. Non-competitive categories like Education. More no. of installs on iOS. Active users, uninstalls, inbound links. Less than 1000 a day can rank. Ranking doesn’t change too fast on Play. If the uninstall rate is 40%, it is taken negatively.
  2. Reviews and ratings – Ask for reviews. Android allows review replies – use that. Windows app Store has this feature now as well.
  3. App Store submission – In addition to submitting to standard stores like Apple AppStore and Google Play Store, list your application at 50 other stores, like Amazon and GetJar, as well.
  4. PR, Social Media – 75k IAS centric app
  5. Referral – Uber gave free money to use for rides

Engagement and retention

Active install. Gaming 10 times a month. Paytm 4 times a month. DTH recharge is done monthly. Don’t send irrelevant notification. Use it selectively.

  1. Audience segmentation – tool like Urban Airship
  2. Targeted offer (Commerce App)
  3. Virtual gratification – Quizup. Titles. Crown around photograph.
  4. T+X, T+Y strategy: if a user has not come back by the 7th day, send an offer on the 10th day
  5. Social plugins – Google plus. Over the air install. Tinder
  6. Cross promotions: don’t monetize from day 1
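
The T+X, T+Y idea in the list above can be sketched as a small scheduling rule. This is my own reading of it, with T taken as the install date, X = 7 and Y = 10 as in the example from the talk. The function name and the exact rule for "has not come back" are assumptions.

```python
from datetime import date, timedelta


def offer_date(install_day, last_active_day, x_days=7, y_days=10):
    """Return the day to send a win-back offer, or None if no offer is needed.

    Assumed rule: a user whose last activity is before T+X counts as lapsed,
    and the offer goes out on day T+Y.
    """
    lapse_cutoff = install_day + timedelta(days=x_days)
    if last_active_day >= lapse_cutoff:
        return None  # user was still active at or after T+X
    return install_day + timedelta(days=y_days)


installed = date(2014, 4, 1)

# Lapsed user: last seen the day after install.
print(offer_date(installed, date(2014, 4, 2)))

# Engaged user: still active on day 8.
print(offer_date(installed, date(2014, 4, 9)))
```

Keeping X and Y as parameters makes it easy to test different gaps per audience segment, which fits the advice above to send notifications selectively.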


  1. Games and freemium instead of paid apps. Be clear about Biz Model
  2. Don’t create pricing barrier
  3. Advertising – Go native if possible
  4. In-App purchases – make it fun, 1% is minimum benchmark

Appslar, Flurry, Google Analytics, Localytics for mobile. How many organic, reference. Cohorts, like how many came today as a segment. Transacting vs how often, how much money. ARPU. LTV. How many are opening on day 1. PayTM benchmark is 40%, but keep it at 10%.

  1. App usage
  2. ARPU
  3. Retention
  4. LTV
  5. User feedback
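
For readers new to these metrics, here is a minimal sketch of how ARPU and retention could be computed from raw counts. The talk did not define the formulas, so these are common simplified definitions, and the LTV line in particular is a rough assumption (ARPU per month times expected months of usage). All numbers are made up.

```python
def arpu(total_revenue, active_users):
    """Average revenue per user over some period."""
    if active_users == 0:
        return 0.0
    return total_revenue / active_users


def retention_rate(cohort, returned):
    """Share of an install cohort that came back (for example on day 1)."""
    cohort = set(cohort)
    if not cohort:
        return 0.0
    return len(cohort & set(returned)) / len(cohort)


def simple_ltv(monthly_arpu, expected_months):
    """Very rough lifetime value: monthly ARPU times expected lifetime in months."""
    return monthly_arpu * expected_months


# Hypothetical cohort of users who installed today, and those seen again on day 1.
installed_today = {"u1", "u2", "u3", "u4", "u5"}
opened_on_day_1 = {"u1", "u2", "u9"}  # u9 belongs to another cohort and is ignored

monthly_arpu = arpu(1000.0, 250)
day_1_retention = retention_rate(installed_today, opened_on_day_1)
ltv = simple_ltv(monthly_arpu, 6)

print(monthly_arpu, day_1_retention, ltv)
```

Intersecting with the cohort set is what keeps users from other install days out of the retention figure.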

Configure certain standard stuff on day 1. Schema.org (Semantic web for lesser mortals) should be enabled on day 1.
Demo’ed apps at developer event. Getting the app to bloggers a month before is a good trick to gain some traction. The audience was engaged and resonated with

Mumbai MongoDB Meetup’s first session

Mumbai MongoDB Meetup group started late last year and had its first meetup on Feb 8th, Saturday. The speaker, Anand George, a MongoDB and Node.JS professional, gave an excellent introduction. A MEAN (MongoDB, Express.JS, AngularJS, Node.JS) user for the past 2 years, he showed a presentation and then went on to demonstrate CRUD in the Mongo Shell in front of the audience. The audience had prior experience in relational databases like postgres, NoSQL databases like neo4j, as well as big data technologies like Hadoop. There were folks from Ugam Solutions (analytics), IBM (SI and product), Wipro (SI), Open Solutions (now a part of Fiserv), Exa India etc. Later we were joined by Gaurav, VP, Engineering, ScaleArc (the sponsor of the event), who asked generic questions on NoSQL. We even touched upon git.






At the end, we had an informal meeting and snacks. We discussed that we could meet every 2 months. We collected topics that would be useful for different people.


MongoDB Afternoons in Delhi and Bangalore

MongoDB held its 1st set of events, 10 months after opening its offices in the heart of Cybercity, Gurgaon.

An Afternoon in New Delhi

The event started with a welcome note by Rajnish Verma, Director Sales, MongoDB India.

I went next with a talk on Schema Design in Document NoSQL World, using a Blog System as the example.

Before the tea break, Anil N from Techgene covered Pelica Migrator, and Ashish Mittal of Daffodil Software showed an ERP system, namely Applane. The latter went on to win the MongoDB innovation award that evening!

Matias covered new features of MongoDB 2.6, released on April 8.

Nikhil Nayab of Cignex showed scaling through sharding with effective shard key selection, emphasizing benchmarking to collect empirical evidence over any other method.

Matias came back to demonstrate MongoDB Management Service (MMS).

Abhishek Tajpaul from Intelligroup described his experiences while building social media analytics.

Next up was Jabong’s use case of MongoDB, before the innovation awards were announced.

An Afternoon in Bangalore

Next stop was Bangalore, where other Matias, Abhishek, Anil N., Uday Kumar (different speaker from Cignex) and I repeated our talks (Well, audience was different ;-) ). Susheel Zaveri, [24/7] talked with a lot of love for MongoDB about storing user behavior logs in MongoDB and its integration with ElasticSearch for beautiful charts for insights!

Rediff News Publishing’s use of MongoDB was described by Subbu.

After this, Livingtree won the innovation award! We had a gala night afterwards with the audience.

MongoDB introduction talk at Dr Dobbs Conference Bangalore, Pune in April 2014

Dr Dobbs Conference held its maiden conference in India in Bangalore on 11 and 12 April. Organized by UBM, it saw good presentations on Hadoop, MongoDB, source code analysis etc. Yours truly presented an introduction to the document database MongoDB. After giving a brief introduction to different NoSQL databases, I went on to describe various aspects of MongoDB. I took a simple blog application as the example. I first designed an ERD (entity-relationship diagram) as one would for an RDBMS. I then described the schema design in MongoDB’s document data model, covering articles, tags, categories, users, comments and web metrics. Then, I illustrated it with Python code (inspired liberally from my awesome MongoDB colleagues). Please find the presentation below.
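
To give a flavour of the document model for such a blog, here is a minimal sketch using plain Python dictionaries. It is not the code from the presentation. The field names, and the choice to embed comments and tags inside the article while referencing the author by id, are assumptions made for illustration.

```python
from datetime import datetime, timezone


def new_article(author_id, title, body, tags, category):
    """Build one article document; tags and comments are embedded, the author is referenced."""
    return {
        "author_id": author_id,      # reference to a document in a separate users collection
        "title": title,
        "body": body,
        "tags": list(tags),          # embedded array instead of a join table
        "category": category,
        "comments": [],              # embedded sub-documents
        "metrics": {"views": 0},     # simple web metrics kept with the article
        "created_at": datetime.now(timezone.utc),
    }


def add_comment(article, user_id, text):
    """Append a comment sub-document to the article and return the article."""
    article["comments"].append({
        "user_id": user_id,
        "text": text,
        "posted_at": datetime.now(timezone.utc),
    })
    return article


article = new_article(
    author_id="u1",
    title="Schema design in a document database",
    body="...",
    tags=["mongodb", "schema"],
    category="databases",
)
add_comment(article, "u2", "Nice write-up!")

print(article["title"], len(article["comments"]), article["tags"])
```

In a relational design, tags and comments would usually live in their own tables and be joined at read time. Here a single read of the article returns them all together, which is the main trade-off the document model offers. A driver such as PyMongo accepts plain dictionaries like this one as documents.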

Do let me know your comments, feedback on it by leaving a comment below or emailing me.

Cloudera, an Apache Hadoop implementor raises an insane 900m

Cloudera just completed a $900M funding round. This incorporates the $160M they announced a week or so ago.

This is obviously great for Cloudera. For just 18% of the company they got $740M from Intel (plus sold away a few other equity points to other investors). It also signifies something more: The complete shape of data infrastructure is changing.

MongoDB has had a Hadoop connector for a while now. It has been used by several customers, like Foursquare. To cement this relationship further, Mike Olson, Cloudera co-founder and Chief Strategy Officer, will keynote at MongoDB World this June in New York City.

[Disclaimer: I work as consulting engineer with MongoDB India. Shameless plug: If you want 10% discount for MongoDB World, please contact me prasoon DOT kumar AT mongodb DOT com]

Techfest 2014 at IIT Bombay

I thoroughly enjoyed various lectures and exhibits at Techfest at IIT Bombay, Powai. Among them, I found the following 3 noteworthy.

Asa Dotzler on Mozilla’s mobile OS

Asa started by describing what Mozilla has in common with Techfest: both are volunteer-driven and have been running and growing for 17 years, which is hard for volunteer-run efforts! Mozilla is a not-for-profit organization that promotes the original values of the internet and acts as a balancer between commercial interests on one hand and other facets like civil and political activism and free information on the other.

Team members of Firefox 1 version

The internet had interoperability at its heart from the start, thanks to forefathers like Tim Berners-Lee in the 1980s. It led to the development of online communities, political activism and educational curricula for all. In the 2000s, when millions of users began using a proprietary browser like Internet Explorer over Netscape Navigator, it was leading to monopolization of the consumer’s browser experience. While IE 6 was reasonably solid, it had one problem: it was proprietary. So Mozilla’s small but driven team (pictured above) started building Firefox. It soon became better than IE, in 2003, but then came the challenge of marketing it. They did not have the advantage of bundling it with the OS bought by the majority of users. They turned to users and asked for suggestions and volunteers for spreading Firefox. Spread Firefox was like open-source marketing of the product: the first few power users would convince other, slightly less technical users, and so on. Volunteers pooled in money, design and their heart to place a print ad in none other than the New York Times. Volunteers kept on surprising: they made a huge Firefox symbol in a farm field in Oregon. Firefox kept gaining market share and acted as a catalyst for extensions and even for better browsers. Chrome and Safari came up, and there was no longer monopolistic control of the consumer’s browser experience.
Now the same kind of control over the consumer’s mobile experience is being exercised by a few big companies, so Mozilla decided to build a mobile OS. Their idea is for a smartphone to be available for $80, because that is the price point from which the next 2B users will come. They are not targeting high-end smartphones, because that market is well served by the likes of Apple and Samsung. Vishwanathan Krishnamoorthy joined Asa on the stage for Q&A. They explained how to become a volunteer by going to the Mozilla site, maybe joining Bugzilla or whatever suits your skill and interest. They said that if any of us wanted to make a contribution that would affect 1/2 billion users, we should volunteer for them.

4G launch by Reliance Jio

RJIL’s President for Strategy and Products, Mathew Oommen, introduced the 4G services of Reliance Jio. I saw a demo of Jio TV with recording, video on demand and other features, which was on par with modern satellite TV services. There was also a healthcare demo, which I could not catch. They did tease us with many more services:

  1. Education – remote education, virtual classroom, online assessment
  2. Entertainment – Shows and TV content, Pause and play content
  3. Healthcare – remote services

Social Robot BINA 48

Here’s BINA, the world’s most advanced social robot to talk to, sitting at IIT Bombay.


BINA 48 Social Robot








I am embedding a YouTube video of the same robot from a US channel, as I could not get a recording of my own.

All in all, it was a great fest on technology, science and innovation, a must-see for college students and even inquisitive adults. Last year, I had attended a lecture by Facebook Product Manager on Newsfeed and learnt immensely from it.

