Project: Apache Gora

Issue: GORA-527

Student: Xavier Sumba

GitHub: cuent

Email: c.uent@hotmail.com

Documentation: Redis backend documentation

Reports: GORA-527: Implement a data store for REDIS reports

Code: GORA-527

Pull request


About

I am Xavier Sumba, a graduate student at Concordia University and McGill University. I currently work at the intersection of Bayesian methods and machine learning. My research focuses on improving inference for probabilistic graphical models, specifically expectation propagation and variational inference. Through this work, I have learned how much it matters to choose the right database for a given problem. This is what attracts me to Apache Gora: it makes implementing new datastores straightforward.


In addition, I am passionate about open source communities and have contributed to them since my undergraduate studies. Every contribution teaches me something valuable, and contributing is the best way I know to give something back for the knowledge I have acquired. I have used both relational and NoSQL databases, and I already have some experience with Redis.


Finally, I find GSoC the best way to establish connections and make lifelong friends who are as passionate as I am. I have participated in GSoC before, and it was one of the best experiences I have had; I thoroughly enjoyed the program. In fact, I never left the project I participated in: I still stick around and try to solve issues. After a while, I became an Apache committer and have mentored in other editions of GSoC. I would really appreciate the chance to stay, learn more about Apache Gora, and get to know its great community.


Project Summary

Redis [1] is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. Apache Gora would benefit from supporting Redis as a backend, so this proposal is dedicated to providing Redis compatibility for Gora. This requires implementing a new datastore, and Apache Gora offers good facilities for doing so.


Project

In this section, I discuss the goal of the project and how I plan to carry it out. Additionally, I will describe the commitment I plan to follow accompanied by a potential schedule for deliverables and tasks.

First of all, Apache Gora is an open source framework that provides an in-memory data model and persistence for big data. Gora supports persisting to column stores, key-value stores, document stores, distributed in-memory key/value stores, in-memory data grids, in-memory caches, distributed multi-model stores, and hybrid in-memory architectures [2]. Additionally, Gora makes it easier to analyze data, since it supports Apache Hadoop and Apache Spark.

Redis, on the other hand, supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams. It has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.

In this project, I am going to implement a new backend, also known as a datastore, within Apache Gora. This new backend will give Apache Gora support for Redis DB. The Redis backend will use one of the Java clients listed by Redis [3] to connect Apache Gora to the Redis server.

Approach


There are three main steps that a database has to go through in order to be considered a datastore in Apache Gora. First, we need to analyze and select the best connector for performing insert, delete, and update operations. Redis DB supports a variety of connectors, listed as follows:

  • aredis
  • JDBC-Redis
  • Jedipus
  • JRedis
  • lettuce
  • redis-protocol
  • RedisClient
  • Redisson
  • RJC
  • vertx-redis-client


We have selected Jedis [5] for its flexibility and ease of use in communicating with the Redis server, although this choice may change during the development of the project. Jedis seems to be the most performant connector that fits the structure of the Apache Gora project. Insert, delete, and update operations, as well as queries against Redis DB, will therefore go through Apache Gora using the Jedis client.

As the list above shows, there are multiple alternatives to Jedis, but the Redis documentation currently recommends only two of them: lettuce and Redisson. These two clients have some features that Jedis lacks, such as thread safety, transparent reconnection handling, and an asynchronous API. However, Jedis is small and considerably faster than the other two. Besides, it is the client library of choice of the Spring Framework developers, and it has the biggest community of the three. So, Jedis seems to be the best option for the Apache Gora datastore.

The new Redis datastore will be implemented on top of the DataStoreBase class. This class provides the basic operations and structure of a datastore, and all the operations needed to create, update, and delete a table will be done through the Jedis library. Jedis also supports querying information from Redis DB: it gives a base abstraction to access data types, and from those results we can build objects or more complex beans. Queries will be implemented on top of the base class QueryBase, which provides the querying operations in Apache Gora.
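The insert, update, and delete path described above can be illustrated with a minimal sketch. It is not the Gora or Jedis API: HashCommands is a hypothetical interface named after the Redis commands HSET, HGETALL, and DEL, InMemoryHashCommands is an in-memory stand-in so the example runs without a Redis server, and the "one hash per object, keyed as prefix:key" layout is an assumption made only for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the few Redis hash commands the sketch needs.
// A real backend would delegate these calls to a client such as Jedis.
interface HashCommands {
    void hset(String key, Map<String, String> fields);
    Map<String, String> hgetAll(String key);
    long del(String key);
}

// In-memory implementation so the sketch runs without a Redis server.
class InMemoryHashCommands implements HashCommands {
    private final Map<String, Map<String, String>> data = new HashMap<>();

    @Override
    public void hset(String key, Map<String, String> fields) {
        // Like HSET: merge the fields into the hash, creating it if absent.
        data.computeIfAbsent(key, k -> new LinkedHashMap<>()).putAll(fields);
    }

    @Override
    public Map<String, String> hgetAll(String key) {
        // Like HGETALL: a missing key yields an empty map.
        Map<String, String> hash = data.get(key);
        return hash == null ? new LinkedHashMap<>() : new LinkedHashMap<>(hash);
    }

    @Override
    public long del(String key) {
        // Like DEL: report how many keys were removed.
        return data.remove(key) == null ? 0L : 1L;
    }
}

// Hypothetical store: one hash per persisted object, named "<prefix>:<key>".
class RedisStoreSketch {
    private final HashCommands redis;
    private final String prefix;

    RedisStoreSketch(HashCommands redis, String prefix) {
        this.redis = redis;
        this.prefix = prefix;
    }

    private String redisKey(String key) {
        return prefix + ":" + key;
    }

    // Insert and update are the same operation: write fields into the hash.
    void put(String key, Map<String, String> fields) {
        redis.hset(redisKey(key), fields);
    }

    // Returns null when nothing is stored under the key.
    Map<String, String> get(String key) {
        Map<String, String> fields = redis.hgetAll(redisKey(key));
        return fields.isEmpty() ? null : fields;
    }

    boolean delete(String key) {
        return redis.del(redisKey(key)) > 0;
    }

    public static void main(String[] args) {
        RedisStoreSketch store = new RedisStoreSketch(new InMemoryHashCommands(), "employee");
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "Ada");
        fields.put("role", "engineer");
        store.put("42", fields);                  // insert

        Map<String, String> update = new LinkedHashMap<>();
        update.put("role", "manager");
        store.put("42", update);                  // update a single field

        System.out.println(store.get("42"));      // {name=Ada, role=manager}
        System.out.println(store.delete("42"));   // true
        System.out.println(store.get("42"));      // null
    }
}
```

Keeping the Redis calls behind a small interface is a design choice of this sketch: it would allow the client library to be swapped later, which matches the note that the choice of Jedis may change during development.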

The design of the communication between Apache Gora and Redis DB is shown in the following image. Apache Gora communicates with Redis DB through the Jedis client library and performs the common CRUD operations. Once information is retrieved, the datastore maps it to the corresponding object type. The data types returned by Redis DB are described in [6]. For the mappings, there will be a new OTD (object-to-datastore) file, following the current design of Apache Gora; the mappings will be defined after analyzing the Redis data types [6].
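To make the mapping step more concrete, the sketch below shows one possible way to decide which Redis data type [6] holds each field of an object. The rule used here (lists go to a Redis list, maps to a Redis hash, everything else to a Redis string) is an assumption for illustration, not the mapping the project commits to, and FieldMappingSketch, RedisType, redisTypeFor, and mappingFor are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldMappingSketch {
    // A subset of the Redis data types described in [6].
    enum RedisType { STRING, LIST, HASH }

    // Hypothetical rule: lists and maps keep their shape,
    // any other value is stored as a Redis string.
    static RedisType redisTypeFor(Object value) {
        if (value instanceof List) {
            return RedisType.LIST;
        }
        if (value instanceof Map) {
            return RedisType.HASH;
        }
        return RedisType.STRING;
    }

    // Builds a field-name -> Redis-type plan for one object.
    static Map<String, RedisType> mappingFor(Map<String, Object> fields) {
        Map<String, RedisType> plan = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : fields.entrySet()) {
            plan.put(field.getKey(), redisTypeFor(field.getValue()));
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Object> employee = new LinkedHashMap<>();
        employee.put("name", "Ada");
        employee.put("salary", 1000);
        employee.put("projects", Arrays.asList("gora", "redis"));
        employee.put("attributes", new LinkedHashMap<String, String>());
        System.out.println(mappingFor(employee));
        // {name=STRING, salary=STRING, projects=LIST, attributes=HASH}
    }
}
```

In the actual datastore this decision would come from the mapping file rather than from runtime type checks; the sketch only shows the kind of field-to-type table such a file would express.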


Deliverables


Redis compatibility for Gora, where the deliverables will comprise:

  1. Redis Module (new datastore)
    1. Store
    2. Query
    3. Filter
  2. Examples
  3. Unit and integration tests for the Redis datastore
  4. Documentation and description of the datastore
  5. Weekly reports of progress and sharing results through blog posts.


Note: The reports can be delivered by email on the mailing list or by video conference with an assigned mentor. Additionally, the aim of the blog posts is to engage more people with Apache Gora, so that they can follow tutorials, learn about the project, and see its advantages.



Period

Activities

Progress

Community Bonding - May 6 - May 27

- Get a broad understanding of the implementation of datastores in Apache Gora.

- Interact with mentor and Apache Gora community

- Work on small issues while reading the code base.

- Create a template module for new datastore.

Done!

Coding period - May 27 - June 23

- Install Redis DB for development and testing (e.g., embed a Redis DB instance for unit testing).

- Learn the Jedis library for connecting Apache Gora to Redis DB.

- Implement schema for operations and analyze datatypes supported in Redis DB

- Develop unit tests

Done!

Evaluation - June 24 - June 28, 2019

- Improve code or deliverables based on suggestions from the mentor.

Done!

Coding period - June 29 - July 21

- Implement the serialization and deserialization of objects.

- Implement base operations such as insert, delete, update.

- Generate unit tests.

Done!

Evaluation - July 22 - July 26

- Improve code or deliverables based on suggestions from the mentor.

Done!

Coding period - July 27 - August 18

- Implement support for queries.

- Create documentation and tutorials of the backend.

  1. Generate documentation for end users, with an explanation of how to use the new datastore.
  2. Document the development process and the problems encountered during the project.

Done!

Final evaluation - August 19 - August 26

- Improve code or deliverables based on suggestions from the mentor.

Done!

Commitment

I plan to dedicate most of my time to this project, working around 30-40 hours per week during the summer in order to complete it successfully. I have contributed to several open source projects and, honestly, I think this is the best way not only to learn but also to interact with like-minded people who are passionate about technology. I really like the project and want to stick around after GSoC, so hopefully one of these days I will become a committer for Apache Gora.

Finally, I am really excited to work on this project. To get to know the source code of Apache Gora, I have worked on an issue that is currently being discussed on GitHub before it is integrated. I have also read about the project and learned how to implement a new datastore, as described in [4] and in other tutorials on the Apache Gora website. Additionally, I have analyzed potential connectors from Redis DB to Apache Gora.

Open Source Contributions

  • Apache Marmotta: Apache committer; fixing issues and providing support on the mailing list
  • Free Code Camp: Translations and small bug fixes
  • NgSolr: Upgrade to a recent version of AngularJS
  • Microsoft Academic: Added documentation for the graph API
  • TensorFlow Datasets: Added a dataset

Links

[1] https://redis.io/

[2] http://gora.apache.org/

[3] https://redis.io/clients#java

[4] https://cwiki.apache.org/confluence/display/GORA/Writing+a+new+DataStore+for+Gora+HOW_TO

[5] https://code.google.com/archive/p/jredis/

[6] https://redis.io/topics/data-types

[7] https://github.com/apache/gora/blob/master/gora-solr/src/test/conf/gora-solr-mapping.xml
