1.Should Be Exception Safe
We are moving to MODERN C++ now, and Exception Safe Code will reduce the usage of LOG(FATAL) and BE will be more stable.
1.1 Use Smart Pointer
For example, in this code, there is a raw pointer block_ptr(at line 442), and this block_ptr will pass to send_block method at line 450. It is ok in normal cases. But there is a RETURN_IF_ERROR at line 446, so it is memory leak.
If we do not use smart pointer, we could not sure that there are some unexpected return code or throw unexpected exceptions in the future. And could not prevent memory leak.
Another case, we write many code like this.
Create a new block and put it to a vector or reset it to a unique ptr or shared ptr, and the code will call delete during free_blocks is cleared. This is unsafe, lets look at Block's constructor:
At line 63, it create many columns, if it create 10 column and 11th failed. Then exception is thrown. It will not be add to vector and not released. So that it will be memory leak.
Another case, at line 475, we push some raw pointers to a vector, but at line 474 the code may return when meet errors, so that the vector at line 471 will not release the already created ColumnPtrWrappers.
So that not save a raw pointer in vector or other containers.
1.2 Using factory creators instead of new
In order to force the smart ptr usage, the developer could add macro ENABLE_FACTORY_CREATOR to class
If you add this macro to the class, then we could make sure the smart ptr is used for our class.
Block* block = new Block(xxxx); // compile error during build
Block block(xxx); // it is ok
std::shared_ptr<Block> block = Block::create_shared(xxxx); // it is ok
std::unique_ptr<Block> block = Block::create_unique(xxxx); // it is ok
1.3 Lock and Unlock
There are many lock and unlock code. Could not make sure that the lock is correctly released. Should use unique lock or lock guard. For example, in following example, the lock is conditionally locked by checking acquire_lock
sign. But if somebody add some code at line 76 or call return before line 86 the lock will not be correctly unlocked.
Could write code like this. Using defer lock to add lock to mutex and will be released during recvr_lock deconstructor.
2. Not Use Dynamic Arrays for Buffer
At line 62, the code create a dynamic array uint8_t meta_buff[metadata_size] as a read buffer. Dynamic array is allocated on stack and we could not make sure the size of the buffer. Sometimes it is more than 10MB. There will stack overflow.
You could use a vector or new uint8_t[metadata_size] to allocate the buffer on heap.
3. Clear EOF or EOS definitions
Currently, there are many forms that return blocks, like next_block, get_block, next_batch. And there are many ways to check if it is the last block or end of file or end of stream:
- Depend on Status. For example, there is a END_OF_FILE status, and our storage layer code depend on this to check if it is end of file.
- Depend on Block row num == 0;
- Depend on nullptr, like get_block(block** res) and check if *res == nullptr;
- Depend on an obvious eos sign, like next_block(block, bool eos)
We should unify these ways and it will make code more clear and simple. we should use this way: next_block(block, bool eos)
But currently, if eos == true, there maybe data in block. You should check if block.rows == 0.
4. Do Not Do Heavy Work in BRPC thread
bRPC is working like Coroutine, if do some heavy work or use pthread lock, it will block the network IO. We also discourage the usage of bhtread::mutex in our code because it is maybe a pthread that using bthread::mutex. It will
also blocking the underline IO thread in bRPC. So that we should use the following way. PUT the request into a pthread thread pool and do the work in pthread like line 225.
5. Do Not Do Heavy Work in Deconstructor and Not Do Deconstructor Work in Close method
For example, if we do some close work in deconstructor, the dependend parent raw ptr may be released. it will core.
6. Never use std::thread to create thread too frequently
If create or destroy too frequently, jemalloc may not behavior very well. Sometimes crash. like this:
Should use thread pool or std future or promise to do this.
7. Never add enum definition in the middle of the existing thrift definition.
During upgrade, doris's policy is to upgrade BE and then upgrade FE to do rolling upgrade. If we add a enum definition in the middle of the Enum Type, then the id order
will be different between FE and BE. I will cause exception or core.
8. Print status not error code
If error status is returned when call some method failed. Do not only print error code or error message, you could just print status object. because status will print callstack, we could know
which line failed. For example, in compaction, we print the result status, like this code:
then we could get error stack, like this:
9. Do not use admin set config in regression test
Do not use admin set frontend config any more because it will modify FE's config. In github's workflow, the test is running in a single thread.
So that it is not a problem. But in daily test, the regression test is running in multiple threads. If we modify fe's config, it will affect other regression
test. The test case will fail.
10 Add log(fatal) before builtin unreachable()
builtin unreachable will trigger sigtrap signal, it is not very clear when in debug mode, should add log fatal before it and its core stack will be more clear.