...

  • The owner of an object (a row, a column, a subfield of a column, or the return value of a UDF) is the code that creates it, and the lifetime of the object ends when the corresponding object for the next row is created. This has several consequences:
    • We should not directly cache any object. In both group-by and join, we copy the object and then put the copy into a hashmap.
    • SerDe, UDF, etc. can reuse the same object for the same column across different rows. This eliminates most object creations in the data pipeline, which is a huge performance boost.
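
The ownership rule above can be illustrated with a minimal, self-contained sketch. It does not use any real Hive classes: `MutableText`, `ReusingReader`, `countByKey`, and `cacheDirectly` are hypothetical names invented for this example, standing in for a reused column object, a SerDe-like producer, a group-by that copies before caching, and a consumer that wrongly caches the reused object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ObjectReuseSketch {

    /** Hypothetical mutable holder standing in for a column value. */
    static final class MutableText {
        private String value = "";

        void set(String v) { value = v; }
        String get() { return value; }

        /** Returns an independent copy that is safe to cache. */
        MutableText copy() {
            MutableText c = new MutableText();
            c.set(value);
            return c;
        }

        @Override public boolean equals(Object o) {
            return o instanceof MutableText && ((MutableText) o).value.equals(value);
        }

        @Override public int hashCode() { return value.hashCode(); }
    }

    /** Hypothetical producer that returns the SAME object for every row. */
    static final class ReusingReader {
        private final String[] rows;
        private final MutableText reused = new MutableText();
        private int next = 0;

        ReusingReader(String... rows) { this.rows = rows; }

        /** Overwrites and returns the reused object, or null at end of input. */
        MutableText next() {
            if (next >= rows.length) {
                return null;
            }
            reused.set(rows[next++]);
            return reused;
        }
    }

    /** Group-by style counting: copy the key before putting it into the map. */
    static Map<MutableText, int[]> countByKey(ReusingReader reader) {
        Map<MutableText, int[]> counts = new HashMap<>();
        MutableText key;
        while ((key = reader.next()) != null) {
            int[] counter = counts.get(key);      // looking up with the reused object is fine
            if (counter == null) {
                counter = new int[1];
                counts.put(key.copy(), counter);  // cache a copy, never the reused object
            }
            counter[0]++;
        }
        return counts;
    }

    /** Incorrect pattern: caching the reused object directly. */
    static List<String> cacheDirectly(ReusingReader reader) {
        List<MutableText> cached = new ArrayList<>();
        MutableText v;
        while ((v = reader.next()) != null) {
            cached.add(v);                        // every entry is the same object
        }
        List<String> seen = new ArrayList<>();
        for (MutableText t : cached) {
            seen.add(t.get());
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<MutableText, int[]> counts = countByKey(new ReusingReader("a", "b", "a"));
        for (Map.Entry<MutableText, int[]> e : counts.entrySet()) {
            System.out.println(e.getKey().get() + " -> " + e.getValue()[0]);
        }
        System.out.println(cacheDirectly(new ReusingReader("a", "b", "c")));
    }
}
```

In this sketch the reader allocates a single `MutableText` for the whole input, which is the source of the savings described above. The cost is that `cacheDirectly` ends up holding several references to one object whose value is whatever the last row wrote, which is why `countByKey` copies the key before storing it.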

...