Introduction

Attribute values may be big. JpegPhoto is the perfect example of such a big value, which can be several megabytes in size. The current handling of such attributes is not satisfactory: the value is simply loaded fully into memory before being streamed down to the disk. If you search for some other attributes in entries, and such a big attribute exists in the found entries, it will be loaded into memory for nothing.

We should find another solution to correctly handle such attributes.

Proposal

The idea would be to stream attribute values bigger than a defined size (1 KB? 4 KB?), and read them only if needed. The data would be stored in a specific file, like blobs in a database.

This will imply modifying the Attribute internal structure. An Attribute stores a String or a byte[], so adding a third kind of data is not really complicated: we simply replace this Object type by a hierarchy:

  • two interfaces to describe value handling: AttributeValue, and StreamedValue extending AttributeValue
  • two interfaces to describe value type: StringAttributeValue and BytesAttributeValue
  • four sub-classes: StringValue implementing StringAttributeValue, BytesValue implementing BytesAttributeValue, StringStreamedValue implementing StringAttributeValue, and BytesStreamedValue implementing BytesAttributeValue

    Schema to be drawn (I don't have a Mac at home).

The StreamedValue will contain all the information needed to get the real value. It is important to note that a StreamedValue may contain either bytes or a String, hence the two streamed sub-classes.
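As a sketch of what this hierarchy could look like (the methods below are pure assumptions added only to make the idea concrete, and the two streamed classes are assumed to also implement StreamedValue):

    // Sketch only: the real interfaces will carry more methods (normalization, schema, ...)
    interface AttributeValue
    {
        /** Approximate size of the value, in bytes */
        long size();
    }

    interface StreamedValue extends AttributeValue
    {
        /** Identifier of the on-disk data (see the Storage section below) */
        long getStreamId();
    }

    interface StringAttributeValue extends AttributeValue
    {
        /** Gives access to the value as characters */
        java.io.Reader getReader() throws java.io.IOException;
    }

    interface BytesAttributeValue extends AttributeValue
    {
        /** Gives access to the value as bytes */
        java.io.InputStream getInputStream() throws java.io.IOException;
    }

    // The four concrete classes would then be:
    //   StringValue          implements StringAttributeValue                 (in-memory String)
    //   BytesValue           implements BytesAttributeValue                  (in-memory byte[])
    //   StringStreamedValue  implements StringAttributeValue, StreamedValue  (on disk)
    //   BytesStreamedValue   implements BytesAttributeValue, StreamedValue   (on disk)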

Impact

In order to minimize memory consumption, we will have to handle those streamed values differently. There are many places where we will have to modify the code.

Decoding a streamed value

When a user adds a big piece of data, we receive it as a byte array. We don't know whether it is a String or binary data until we analyze the associated attributeType. The current process is to store it in memory until the whole PDU has been decoded. If we get a 1 MB jpegPhoto, we will store it all in memory. This must be changed. Suppose we set a limit of 1 KB before switching to a streamed object: as soon as we reach this limit, we will have to push the data to the disk. The main problem is that if the attributeType describes a textual type, which has a syntax, we must check the syntax. This is done through syntax checkers, which are not written to handle streamed values.
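A possible way to handle this while decoding (just a sketch: the class name and the 1 KB limit are assumptions) is to buffer the incoming bytes in memory, and spill them to a temporary file as soon as the limit is crossed:

    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;

    /** Sketch: buffers a decoded value in memory, and spills it to disk past a given limit */
    class SpillingValueBuffer
    {
        private static final int LIMIT = 1024; // 1 KB, to be tuned

        private ByteArrayOutputStream memory = new ByteArrayOutputStream();
        private OutputStream disk;
        private File tempFile;

        /** Called by the decoder for each chunk of the value read from the PDU */
        void write( byte[] chunk, int offset, int length ) throws IOException
        {
            if ( ( disk == null ) && ( memory.size() + length > LIMIT ) )
            {
                // Switch to disk: flush what we already have, and keep writing to a temp file
                tempFile = File.createTempFile( "apacheds-value-", ".tmp" );
                disk = new FileOutputStream( tempFile );
                memory.writeTo( disk );
            }

            if ( disk != null )
            {
                disk.write( chunk, offset, length );
            }
            else
            {
                memory.write( chunk, offset, length );
            }
        }

        boolean isStreamed()
        {
            return disk != null;
        }

        /** To be called once the value has been fully decoded */
        void close() throws IOException
        {
            if ( disk != null )
            {
                disk.close();
            }
        }
    }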

Checking a String streamed value

This is the second place where we will have to modify the code to correctly handle streamed values. In this case, we should analyze the String by chunks. The main problem is that if the data is stored as a byte[], the byte[] blocks won't have the same size as the char[] blocks, because a char can be represented by more than one byte. The second problem is that we will have to check the String piece by piece, not as a whole.

For instance, the ACIItemSyntaxChecker handles the values using this code (slightly modified, to avoid the Object to String transformation):

    public boolean isValidSyntax( String value )
    {
        if ( value.length() == 0 )
        {
            return false;
        }

        try
        {
            synchronized ( checker )
            {
                checker.parse( value );
            }

            return true;
        }
        catch ( ParseException pe )
        {
            return false;
        }
    }

Here, the checker is an instance of the ACIItemChecker class, which embeds an Antlr parser for ACIs. The good point is that the lexer uses a Reader, so we can transform the code to absorb a char[] instead of a String, and build the chars out of the bytes, char by char.
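As a sketch of that conversion (assuming the streamed value is exposed as an InputStream), an InputStreamReader can do the byte-to-char decoding for us: it keeps partial multi-byte UTF-8 sequences between two byte blocks, so the char chunks don't have to line up with the byte blocks:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.Reader;

    /** Sketch: feeds a streamed byte value to a checker, chunk by chunk */
    class ChunkedStringCheck
    {
        public void check( InputStream bytes ) throws IOException
        {
            Reader reader = new InputStreamReader( bytes, "UTF-8" );

            char[] chunk = new char[1024];
            int read;

            while ( ( read = reader.read( chunk ) ) != -1 )
            {
                // feed chunk[0..read) to the syntax checker, piece by piece
            }
        }
    }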

Anyway, we have more than 60 syntax checkers, some of them dealing with binary data (like the JpegSyntaxChecker), and we will need to modify the class hierarchy to create two new classes:

  • BinarySyntaxChecker for syntax checkers handling byte[]
  • StringSyntaxChecker for syntax checkers handling String

Both classes will provide a reader, which will deliver bytes or chars, depending on the inheritance scheme.

Each SyntaxChecker sub-class will implement the isValid( Reader data ) method, and should be able to deal with the Reader passed as an argument.
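For instance, the Reader based version of the ACIItem check could look like this (a sketch: it assumes the ACIItemChecker is adapted to parse from a Reader, as suggested above):

    import java.io.IOException;
    import java.io.Reader;
    import java.text.ParseException;

    /** Sketch of a Reader based syntax check for ACI items (assumed signatures) */
    class StringSyntaxCheckerSketch
    {
        private final ACIItemChecker checker = new ACIItemChecker();

        /** The Antlr lexer already works on a Reader, so we can feed it directly */
        public boolean isValid( Reader value )
        {
            try
            {
                synchronized ( checker )
                {
                    checker.parse( value );
                }

                return true;
            }
            catch ( ParseException pe )
            {
                return false;
            }
            catch ( IOException ioe )
            {
                return false;
            }
        }
    }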

Interceptor internal usage

Now that the data has been checked, we may have to process it through the interceptor chain. Some of this data will be used internally (like the ACIItems), so the mechanism by which it is used should be able to cope with its storage nature (streamed or in memory).

Storage

In the end, we will have to store the streamed data. The backend will simply store a pointer (a long) to each value; this identifier will be a hash key to the real data. A specific file will store all the data along with these identifiers. A JDBM htree can be used for this purpose, but we should define an API to manipulate the data, to allow us to change the physical storage implementation later (tight coupling is not an option).
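The API could be as small as the following sketch (the names are assumptions); the JDBM htree, or the filesystem solution discussed below, would live behind it:

    import java.io.IOException;
    import java.io.InputStream;

    /** Sketch of a minimal API hiding the physical storage of streamed values */
    interface StreamedValueStore
    {
        /** Stores the data and returns the identifier kept by the backend */
        long store( InputStream data ) throws IOException;

        /** Gives back a stream on the data associated with the identifier */
        InputStream fetch( long id ) throws IOException;

        /** Removes the data associated with the identifier */
        void delete( long id ) throws IOException;
    }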

Recovering the data

This is the opposite operation. We won't read the real data before it is needed; we just restore the identifier. Nothing special here, beyond using the API defined in the previous point.

Comparing values

In a search operation, we may have to compare streamed values against a filter. This will obviously drive us to amend the comparators to make them handle streamed data. The modification will be done on the same basis as for the SyntaxCheckers.
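For instance, an equality check on two streamed binary values could compare them block by block instead of loading them entirely (a sketch, with assumed names):

    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    /** Sketch: compares two (possibly streamed) values without loading them in memory */
    class StreamedValueComparator
    {
        static boolean equalStreams( InputStream a, InputStream b ) throws IOException
        {
            // The buffered streams read 4 KB blocks underneath
            InputStream bufA = new BufferedInputStream( a, 4096 );
            InputStream bufB = new BufferedInputStream( b, 4096 );

            int byteA;
            int byteB;

            do
            {
                byteA = bufA.read();
                byteB = bufB.read();

                if ( byteA != byteB )
                {
                    return false;
                }
            }
            while ( byteA != -1 );

            return true;
        }
    }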

Sending data back to the user

This is another complex part. Data should be written chunk by chunk, but we must first compute the encoded PDU. Even if the encoding operation itself is not really complex, it is done using in-memory elements. We should transform this process to handle streamed data. The second modification is to make sure that the data is sent on the fly, not after the full encoded PDU has been stored in memory.

We must modify the way data is transferred to MINA. The current encoder implements an encode method which returns a ByteBuffer as a result. This ByteBuffer contains all the bytes to be sent. We should instead return a ByteReader which will return blocks of ByteBuffers (4 KB blocks, for instance). The ProtocolEncoder implementation we use is able to deal with such blocks of data if we send them as an array of blocks, but then all the blocks must have been loaded in memory first. We must slightly modify the ProtocolEncoder to be able to use a callback to ask for the next block of data.
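A minimal sketch of the callback idea (the name and signature below are assumptions, not MINA's actual API):

    import java.io.IOException;
    import java.nio.ByteBuffer;

    /** Sketch: delivers an encoded PDU as successive blocks instead of one big buffer */
    interface PduBlockReader
    {
        /**
         * Returns the next block of at most 4 KB of encoded data,
         * or null when the whole PDU has been delivered.
         */
        ByteBuffer nextBlock() throws IOException;
    }

The encoder would then loop on nextBlock() and write each buffer as soon as it is produced, instead of concatenating the whole PDU in memory first.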

Implementation 

Storage 

We have to store all the data on the filesystem, in a way which is performant, simple and reliable. First, let's consider that we store data as a chained list of blocks: if we have a 25 KB value to store, and each block is 1 KB big, we will store it as a chain of 25 blocks. Second, we want to store the blocks somewhere where we can easily recover them, entirely. Third, the ordering *must* be kept.
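As a sketch of that chained-block layout (field names are an assumption):

    /** Sketch: one block in the chain representing a streamed value */
    class ValueBlock
    {
        long blockId;      // key of this block in the store
        long nextBlockId;  // key of the next block in the chain, or -1 for the last one
        byte[] data;       // at most 1 KB of the value, kept in order
    }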

FileSystem 

We can simply use the underlying filesystem, storing each value as a single file. This is fine as long as the number of files is not enormous. If we are to store millions of files, this could become a problem (well, this is an assumption which is not really backed by anything more than a feeling, but in practice the Linux FS might cope with such a load much better than any other solution. For the W$ FS, I have no idea).

Let's try to design a system based on the FileSystem to store files.

pros:

  • Easy to write: we simply write files to the disk
  • Fast: the filesystem ultimately handles all the write operations, and is really close to the disk
  • Reliable: at least as long as the file system itself is reliable
  • Ubiquitous: the file system can be replicated through existing mechanisms like rsync
  • Not limited: using a SAN for huge databases is an option

cons:

  • Low level: we first have to access the directory, and then create a new file, before writing the data
  • Messy: we will potentially have thousands of files or more on the disk

The main problem is that we will have to ensure that duplicate files can't be created. What we can do is replicate the entry's DN as a directory structure. For instance, if we want to save a jpegPhoto for the entry:

cn=jsmith,ou=example,ou=com

we will store it in the file:

<ldap repository>/data/b3U9Y29t/b3U9ZXhhbXBsZQ==/Y249anNtaXRo/anBlZ3Bob3Rv.1

where the RDNs are base64 encoded, to avoid problems with special characters:

Real value      base 64 encoded
ou=com          b3U9Y29t
ou=example      b3U9ZXhhbXBsZQ==
cn=jsmith       Y249anNtaXRo
jpegphoto       anBlZ3Bob3Rv

(The file name gets a numeric suffix because the attribute may contain more than one value. Here we only have one, so we add a '.1' at the end.)
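A sketch of the path computation (the method and parameter names are hypothetical, and java.util.Base64 is used here only for the sake of the sketch):

    import java.io.File;
    import java.io.UnsupportedEncodingException;
    import java.util.Base64;

    /** Sketch: computes the file used to store one value of an attribute */
    class ValueFileLocator
    {
        static File valueFile( File repository, String[] rdnsFromSuffix, String attributeId, int valueIndex )
            throws UnsupportedEncodingException
        {
            File dir = new File( repository, "data" );

            // One directory level per RDN, from the suffix down to the entry
            for ( String rdn : rdnsFromSuffix )
            {
                dir = new File( dir, Base64.getEncoder().encodeToString( rdn.getBytes( "UTF-8" ) ) );
            }

            // The file itself: base64 of the attribute id, plus the value index as a suffix
            String name = Base64.getEncoder().encodeToString( attributeId.getBytes( "UTF-8" ) )
                + "." + valueIndex;

            return new File( dir, name );
        }
    }

Called with the RDNs { "ou=com", "ou=example", "cn=jsmith" }, the attribute "jpegphoto" and the value index 1, it produces the path shown above.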

The slight problem with this approach is that when we are decoding the value from the LDAP PDU, we have no way to know what the associated DN is... To get it, as we have the guarantee that this information has already been decoded before the value itself, we must use a piece of information from the associated TLV: the TLV id. This id is guaranteed to be unique, and will be used to temporarily store the data in a file. Once the value has been fully decoded, we will create the real file by moving this temporary file to its final destination and renaming it.
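A sketch of that two-step process (the class, method and directory names are assumptions):

    import java.io.File;
    import java.io.IOException;

    /** Sketch: the value is first written under the TLV id, then moved once the DN is known */
    class TempValueMover
    {
        static File moveToFinalPlace( File repository, int tlvId, File finalFile ) throws IOException
        {
            // The temporary file written during decoding, named after the unique TLV id
            File temp = new File( repository, "tmp/" + tlvId + ".tmp" );

            // Make sure the target directory chain (one level per RDN) exists
            finalFile.getParentFile().mkdirs();

            if ( !temp.renameTo( finalFile ) )
            {
                throw new IOException( "Could not move " + temp + " to " + finalFile );
            }

            return finalFile;
        }
    }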

Conclusion

This is not a matter of hours, nor even days, but it can be done in a few weeks. This modification is absolutely mandatory if we want to deal with real-life usage of an LDAP server. Otherwise, the server can quickly die with an OutOfMemoryError...
