CloudStack has not had a strong tradition of enforcing exception and logging behaviour. However, do as we say and not as we do. Just because we weren't good at it doesn't mean you shouldn't be. And we are working very hard to be good at it.
CloudStack uses log4j. Yes, we could have used any number of logging facades out there. Yes, log4j is somewhat of an oldie, but it is a goodie. Besides, what's really important is not the tool but the content (a recurring theme you'll find in CloudStack). CloudStack should be deployed with logging at INFO level or above, and all logs should be timestamped in GMT. CloudStack DOES NOT require a restart to change logging levels, however. The following is a list of our logging levels and their suggested usage.
Level | Use When |
---|---|
FATAL | This ship's sunk. Or the JVM has to die due to this. |
ERROR | The system has hit a problem that it cannot recover from. This error does not affect the general health of CloudStack, but it does fail the particular request that triggered it. |
WARN | The system has hit a problem that it thinks it can recover from, but the admin should be aware of it so they can take a look. |
INFO | The admin is interested in knowing this information (like the pilot announcing "Grand Canyon is to your right" on the flight). |
DEBUG | Information that may be helpful to the admin in debugging a problem. The deciding factor here is often this: if an admin can reliably reproduce a FATAL, ERROR, or WARN condition, turning on DEBUG logging should provide sufficient information about how the system got to the error. |
TRACE | Repetitive and annoying logs that really shouldn't be needed in normal debugging but may be useful as a last resort. Generally, the deciding factor on whether TRACE level is used is how fast this log can fill up the disk space if it is turned on. |
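The two operational points above (deploy at INFO, change levels without a restart) can be sketched in a few lines. This is only an illustration: it uses `java.util.logging` from the JDK as a stand-in for log4j so that it runs without extra dependencies, with `FINE` playing the role of DEBUG. The class name `LogLevelSketch`, the collecting handler, and the log messages are all hypothetical and not part of CloudStack.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical demo class. java.util.logging is a stand-in for log4j here,
// with FINE playing the role of DEBUG.
class LogLevelSketch {

    // Handler that keeps every record the logger lets through.
    static final class CollectingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            messages.add(record.getLevel() + ": " + record.getMessage());
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    static List<String> run() {
        Logger logger = Logger.getLogger("LogLevelSketch");
        logger.setUseParentHandlers(false); // keep the demo output off the console
        CollectingHandler handler = new CollectingHandler();
        logger.addHandler(handler);

        logger.setLevel(Level.INFO);               // deployed default: INFO and above
        logger.info("Host added to cluster");      // recorded
        logger.fine("Allocator skipped host 12");  // dropped, below INFO

        logger.setLevel(Level.FINE);               // raise verbosity in the running JVM
        logger.fine("Allocator skipped host 12");  // recorded now

        logger.removeHandler(handler);
        return handler.messages;
    }
}
```

Calling `LogLevelSketch.run()` returns two entries: the INFO message and the second debug-level message, which was only recorded after the level was lowered at runtime.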
There is plenty of wisdom out on the internet regarding exceptions and how to handle them. Here are some general anti-patterns and, at the bottom of that page, there are links to other guidelines. There are a few that I'd like to single out as important.
Code Block |
---|
try { code...; } catch (Exception specific to your code) { Specific exception handling and logging... } catch (Exception e) { s_logger.warn("Caught unexpected exception", e); exception handling code... } |
Code Block |
---|
try { code...; } catch (XenAPIException e) { // Do either this: s_logger.warn("Caught a xen api exception", e); // or throw new CloudRuntimeException("Caught a xen api exception", e); // Don't ever do JUST this. throw new CloudRuntimeException("Got a xen api exception"); } |
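A runnable sketch of the "throw with the cause attached" option from the block above. The nested `XenAPIException` and `CloudRuntimeException` classes are hypothetical stand-ins defined here so the example compiles on its own; the real CloudStack and Xen classes may have different constructors.

```java
// Hypothetical demo class with stand-in exception types.
class ChainingSketch {

    static class XenAPIException extends Exception {
        XenAPIException(String message) { super(message); }
    }

    static class CloudRuntimeException extends RuntimeException {
        CloudRuntimeException(String message, Throwable cause) { super(message, cause); }
    }

    // Pretend hypervisor call that always fails.
    static void callHypervisor() throws XenAPIException {
        throw new XenAPIException("session expired");
    }

    static void startVm() {
        try {
            callHypervisor();
        } catch (XenAPIException e) {
            // Passing e as the cause keeps the original stack trace, so
            // whoever finally logs this exception sees where it started.
            throw new CloudRuntimeException("Unable to start VM", e);
        }
    }
}
```

A caller that catches the exception from `startVm()` can still reach the original failure through `getCause()`, which is exactly what is lost when the wrapper is constructed from a message alone.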
Code Block |
---|
public void irresponsibleMethod() throws Exception; public void responsibleMethod() throws XenAPIException; public void runtimeExceptionMethod(); // throws CloudRuntimeException that's not supposed to be logged until entry point. public void innocentCaller() { try { irresponsibleMethod(); responsibleMethod(); runtimeExceptionMethod(); } catch(Exception e) { s_logger.warn("Unable to execute", e); throw new CloudRuntimeException("Unable to execute", e); // What's wrong here? // 1. If the error was thrown from responsibleMethod, the caller has now forgotten to do special handling for XenAPIException. // 2. If the error was thrown from runtimeExceptionMethod, the caller now logs it once here, and it will be logged again at the entry point. } } |
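One way the caller could avoid both problems is sketched below. It is an assumption about what a fix might look like, not the project's prescribed pattern: the exception classes are hypothetical stand-ins, the `fail` flags exist only to drive the demo, and the `handled` list stands in for whatever special handling a `XenAPIException` really needs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class with stand-in exception types.
class CallerSketch {

    static class XenAPIException extends Exception {
        XenAPIException(String message) { super(message); }
    }

    static class CloudRuntimeException extends RuntimeException {
        CloudRuntimeException(String message, Throwable cause) { super(message, cause); }
    }

    // Records the special handling performed, so the demo can be checked.
    static final List<String> handled = new ArrayList<>();

    static void irresponsibleMethod() throws Exception { }

    static void responsibleMethod(boolean fail) throws XenAPIException {
        if (fail) throw new XenAPIException("host unreachable");
    }

    static void runtimeExceptionMethod(boolean fail) {
        if (fail) throw new CloudRuntimeException("already carries context", null);
    }

    static void carefulCaller(boolean xenFails, boolean runtimeFails) {
        try {
            irresponsibleMethod();
            responsibleMethod(xenFails);
            runtimeExceptionMethod(runtimeFails);
        } catch (XenAPIException e) {
            // 1. The specific exception gets its own handling before being wrapped.
            handled.add("xen-specific cleanup");
            throw new CloudRuntimeException("Unable to execute", e);
        } catch (CloudRuntimeException e) {
            // 2. Pass through untouched and unlogged; the entry point logs it once.
            throw e;
        } catch (Exception e) {
            // Only what irresponsibleMethod forces on us ends up here.
            throw new CloudRuntimeException("Unable to execute", e);
        }
    }
}
```

The order of the catch clauses matters: the specific types come first, so the broad `catch (Exception e)` is reduced to a last resort for the method that declares `throws Exception`.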
Code Block |
---|
try { some code; } catch(XenAPIException e) { // catch generic error here. s_logger.debug("There's an exception. Rolling back code: " + e.getMessage()); ...rollback some code; throw e; // note there's no "new" here. } |
Code Block |
---|
for (Task task : taskList) { try { process task; } catch (Exception e) { ...handle exception and continue } } |
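The loop above can be made concrete as follows. This is a minimal sketch in which `Runnable` stands in for the `Task` type and a list of strings stands in for the logger; both are assumptions made so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: one failing task must not stop the rest.
class TaskLoopSketch {

    // Runs every task and returns a description of each failure.
    static List<String> processAll(List<Runnable> tasks) {
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < tasks.size(); i++) {
            try {
                tasks.get(i).run();
            } catch (Exception e) {
                // The try/catch sits inside the loop, so the failure is
                // recorded and the next task still runs.
                failures.add("task " + i + " failed: " + e.getMessage());
            }
        }
        return failures;
    }
}
```

This is one of the few places where catching plain `Exception` is reasonable: the loop is effectively an entry point for each task, and the alternative is that one bad task silently cancels all the tasks queued behind it.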
CloudStack does have a list of well-known exceptions, and a few of them are important enough to describe here.
Exception | Thrown By | Purpose | Usage |
---|---|---|---|
CloudRuntimeException | everyone | An error has been hit that cannot be handled. | When using this exception, it is best to pack as much debugging information into the message as possible. |
ResourceUnavailableException | components that deal with resource allocation. | To serve as a parent class for when a physical resource is unusable when CloudStack wants to use it. | This exception must be thrown with the scope set in the exception. The scope tells the caller above whether this exception affects a host, storage pool, cluster, pod, or zone. The caller can then decide if it can retry. |
InsufficientCapacityException | components that deal with resource allocation. | To serve as a parent class for when a physical resource is out of capacity when CloudStack wants to use it. | This exception must be thrown with the scope set in the exception. The scope tells the caller above whether this exception affects a host, storage pool, cluster, pod, or zone. The caller can then decide if it can retry. |
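To illustrate how a scope lets the caller decide whether a retry makes sense, here is a minimal sketch. Everything in it is hypothetical: the `Scope` enum, the constructor and fields of the stand-in `InsufficientCapacityException`, and the toy allocator do not reflect the real CloudStack signatures, which are not shown on this page.

```java
// Hypothetical demo class; the real CloudStack classes may look different.
class ScopeSketch {

    enum Scope { HOST, STORAGE_POOL, CLUSTER, POD, ZONE }

    static class InsufficientCapacityException extends Exception {
        final Scope scope;
        final long scopeId;

        InsufficientCapacityException(String message, Scope scope, long scopeId) {
            super(message);
            this.scope = scope;
            this.scopeId = scopeId;
        }
    }

    // Toy allocator: hosts 1 and 2 are full, any other host has room.
    static long allocateOn(long hostId) throws InsufficientCapacityException {
        if (hostId < 3) {
            throw new InsufficientCapacityException("Host " + hostId + " is full", Scope.HOST, hostId);
        }
        return hostId;
    }

    // Tries each candidate host, retrying only while the failure is host-scoped.
    static long allocateWithRetry(long[] candidateHosts) throws InsufficientCapacityException {
        InsufficientCapacityException last = null;
        for (long hostId : candidateHosts) {
            try {
                return allocateOn(hostId);
            } catch (InsufficientCapacityException e) {
                if (e.scope != Scope.HOST) {
                    throw e; // a wider scope is exhausted, so another host will not help
                }
                last = e;
            }
        }
        if (last == null) {
            throw new IllegalArgumentException("no candidate hosts given");
        }
        throw last; // every candidate was full; report the most recent failure
    }
}
```

With candidates `{1, 2, 3}` the sketch skips the two full hosts and returns 3. With `{1, 2}` it rethrows the last host-scoped exception, so the caller above can in turn decide whether to try another cluster or pod.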
There is also a good reference to CloudStack exceptions and error codes here.