...

Most often, our use cases determine which specific set of data we need, and we write different kinds of queries accordingly. Some examples of use cases, and the queries needed to get the data for them, are as follows:

 1. Operational Queries (Random Access): In an e-commerce application, we want to display all the orders placed by a particular user (see Figure 1).

...

In this case there may be millions of rows, but we are mainly interested in the data for the currently logged-in user, that is, the specific set of rows that meet the criterion (user_id == ). Queries of this kind, which touch only certain rows selected by a particular filter, are Operational Queries. In the big data world, HBase is one of the technologies used for these kinds of queries.
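As a minimal sketch of this access pattern, the snippet below uses Python's built-in sqlite3 module as a small stand-in for a store such as HBase (it is not an HBase client). The table layout, column names, and sample rows are assumptions made for illustration only.

```python
import sqlite3


def build_orders_db():
    """Create an in-memory table of hypothetical orders (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
    )
    # An index on user_id lets the engine locate one user's rows
    # without scanning the whole table.
    conn.execute("CREATE INDEX idx_orders_user ON orders (user_id)")
    conn.executemany(
        "INSERT INTO orders (order_id, user_id, amount) VALUES (?, ?, ?)",
        [(1, 101, 25.0), (2, 102, 40.0), (3, 101, 15.5), (4, 103, 9.99)],
    )
    return conn


def orders_for_user(conn, user_id):
    """Return only the rows that belong to a single user."""
    cur = conn.execute(
        "SELECT order_id, amount FROM orders "
        "WHERE user_id = ? ORDER BY order_id",
        (user_id,),
    )
    return cur.fetchall()


conn = build_orders_db()
print(orders_for_user(conn, 101))  # [(1, 25.0), (3, 15.5)]
```

However large the table grows, the query only needs the handful of rows matching the filter, which is what makes it a random-access workload.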

 2. Batch Queries (Sequential Access): In the same e-commerce application, we want to generate a report that contains the number of orders per user, the amount of money spent by each user, the number of returns, the number of coupons used, and monthly orders (see Figure 2). For this use case we have to access all the rows and then aggregate or reduce the data to generate the report.

Most Business Intelligence (BI) tools execute these queries. In the relational world, data warehouse technology was developed specifically for these needs. In Batch Queries we focus on data from all or a majority of the rows, although the number of columns involved might be small. Apache Hive is quite popular for these kinds of queries.
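The report described above can be sketched as a pair of GROUP BY queries. This is only an illustration using Python's built-in sqlite3 module in place of a batch engine such as Hive; the schema, the returned and coupon_used flags, the order_month column, and the sample rows are all assumed for the example.

```python
import sqlite3


def build_report_db():
    """Create an in-memory orders table with hypothetical report columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL, "
        "returned INTEGER, coupon_used INTEGER, order_month TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 101, 25.0, 0, 1, "2024-01"),
            (2, 102, 40.0, 1, 0, "2024-01"),
            (3, 101, 15.0, 0, 0, "2024-02"),
            (4, 101, 10.0, 1, 1, "2024-02"),
        ],
    )
    return conn


def per_user_report(conn):
    """Read every row and reduce it to one summary row per user:
    (user_id, order count, total spent, returns, coupons used)."""
    cur = conn.execute(
        "SELECT user_id, COUNT(*), SUM(amount), "
        "SUM(returned), SUM(coupon_used) "
        "FROM orders GROUP BY user_id ORDER BY user_id"
    )
    return cur.fetchall()


def monthly_orders(conn):
    """Count orders per user and month."""
    cur = conn.execute(
        "SELECT user_id, order_month, COUNT(*) "
        "FROM orders GROUP BY user_id, order_month "
        "ORDER BY user_id, order_month"
    )
    return cur.fetchall()


conn = build_report_db()
print(per_user_report(conn))  # [(101, 3, 50.0, 1, 2), (102, 1, 40.0, 1, 0)]
print(monthly_orders(conn))
```

Unlike the operational case, there is no selective filter here: every row contributes to the result, but only a few columns are read.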

 3. Interactive Analysis Queries (OLAP Queries): Taking the same application as an example, we need to provide analysis regarding users, items, sales, warehouse inventory for the items, available sizes, vendor information, and so on (see Figure 3). In this use case we need to query multiple tables, filter on multiple columns, and use aggregate functions or nested queries. Such queries are often generated by interactive analysis tools like Tableau and Pentaho. Even with graph databases, which provide fast retrieval for OLTP queries, we have to use a big data tool for OLAP. Many tools are available for these kinds of queries, such as Apache Impala and Apache Kylin.
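To make the shape of such a query concrete, here is a small sketch that joins tables, filters on a column, uses a nested query, and aggregates. It runs on Python's built-in sqlite3 module rather than on Impala or Kylin, and the items, sales, and inventory tables, their columns, and the sample data are hypothetical.

```python
import sqlite3


def build_olap_db():
    """Create three small in-memory tables (illustrative schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (item_id INTEGER PRIMARY KEY, name TEXT,
                            vendor TEXT, size TEXT);
        CREATE TABLE inventory (item_id INTEGER, warehouse TEXT, stock INTEGER);
        CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, item_id INTEGER,
                            quantity INTEGER, unit_price REAL);
        """
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?, ?)",
        [(1, "T-shirt", "Acme", "M"), (2, "Jeans", "Acme", "L"),
         (3, "Hoodie", "Bolt", "M"), (4, "Cap", "Bolt", "M")],
    )
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?)",
        [(1, "east", 5), (1, "west", 0), (2, "east", 3),
         (3, "east", 0), (4, "west", 7)],
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [(1, 1, 2, 10.0), (2, 1, 1, 10.0), (3, 3, 1, 30.0),
         (4, 4, 3, 5.0), (5, 2, 1, 40.0)],
    )
    return conn


def revenue_by_vendor(conn, size):
    """Revenue per vendor for items of one size that are still in stock.

    Joins sales to items, filters on size, uses a nested query over
    inventory to keep only in-stock items, then aggregates per vendor.
    """
    cur = conn.execute(
        """
        SELECT i.vendor, SUM(s.quantity * s.unit_price) AS revenue
        FROM sales AS s
        JOIN items AS i ON i.item_id = s.item_id
        WHERE i.size = ?
          AND i.item_id IN (
              SELECT item_id FROM inventory
              GROUP BY item_id HAVING SUM(stock) > 0
          )
        GROUP BY i.vendor
        ORDER BY revenue DESC
        """,
        (size,),
    )
    return cur.fetchall()


conn = build_olap_db()
print(revenue_by_vendor(conn, "M"))  # [('Acme', 30.0), ('Bolt', 15.0)]
```

An interactive analysis tool would typically generate many variations of such a query as the user changes filters and groupings, which is why response time matters more here than in a scheduled batch report.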

...