Netflix Curator CHANGES.txt

1.3.3 - March 6, 2013
=====================
* Issue 250: Restore support for passing null to usingWatcher().

* Issue 251: Allow a custom Executor Service to be used for PathChildrenCache.

* DistributedDoubleBarrier wasn't handling wait expiration correctly and was sending negative
numbers to wait().

* Issue 254: Check that executorService isn't null before closing.

* Pull 258: Fix poorly performing use of Guava's transform.
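
The DistributedDoubleBarrier entry in this release concerns negative values reaching wait(). As a rough illustration only (this is not Curator's code; the class and method names are hypothetical), a timed wait can recompute the remaining time on each pass and treat a non-positive remainder as expiration. Object.wait() rejects negative timeouts, and a timeout of 0 means "wait forever", so neither value should be passed through.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a timed wait that never passes 0 or a negative value to wait().
final class TimedWaitSketch {
    private final Object lock = new Object();
    private boolean released = false;

    void release() {
        synchronized (lock) {
            released = true;
            lock.notifyAll();
        }
    }

    /** Returns true if released before maxWaitMs elapsed, false on expiration or interrupt. */
    boolean await(long maxWaitMs) {
        long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
        synchronized (lock) {
            while (!released) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadlineNanos - System.nanoTime());
                if (remainingMs <= 0) {
                    // Expired. Sub-millisecond remainders round down to 0 and also land here.
                    return false;
                }
                try {
                    lock.wait(remainingMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return false;
                }
            }
            return true;
        }
    }
}
```

The loop guards against spurious wakeups: after each wakeup the remaining time is recalculated from the fixed deadline instead of reusing the original timeout.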

1.3.2 - February 6, 2013
========================
* MAJOR BUG FIX - Issue 232: ZooKeeper guarantees that "A watch object, or function/context pair, will only
be triggered once for a given notification." Curator was breaking this guarantee by internally creating a
new Watcher object each time one was needed. This is now fixed and ZooKeeper's guarantee is restored. Big
thanks to user barkbay for his persistence and help on this.

* Issue 247: POST_INITIALIZED_EVENT wasn't correctly handling an initially empty node.

* Issue 245: Auth info specified in the CuratorFrameworkFactory.Builder was not being re-set in cases
where the internal ZooKeeper handle was recreated. That is, if the cluster had issues, the auth info would be lost.

* The default watcher in the ZooKeeper handle is now cleared before the ZooKeeper handle is closed. This avoids
an edge case where events meant for the old ZooKeeper handle get processed.
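
To illustrate the Issue 232 entry in this release: if a client keeps registered watchers in a set (an assumption made here for illustration), then wrapping the same user callback in a fresh object on every call defeats de-duplication, while reusing one cached wrapper per callback restores it. The sketch below is not Curator's implementation; Listener, Wrapper and the method names are all hypothetical stand-ins.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one wrapper object per user callback.
final class WatcherCacheSketch {
    /** Stand-in for a user-supplied callback (not ZooKeeper's Watcher interface). */
    interface Listener {
        void onEvent(String path);
    }

    /** Internal adapter; relies on default identity-based equals/hashCode. */
    static final class Wrapper {
        final Listener delegate;

        Wrapper(Listener delegate) {
            this.delegate = delegate;
        }
    }

    private final Map<Listener, Wrapper> wrappers = new ConcurrentHashMap<>();

    /** The problematic pattern: a new wrapper on every call. */
    Wrapper wrapFresh(Listener listener) {
        return new Wrapper(listener);
    }

    /** The fixed pattern: the same wrapper is returned for the same listener. */
    Wrapper wrapCached(Listener listener) {
        return wrappers.computeIfAbsent(listener, Wrapper::new);
    }

    /** Simulates a set-based watcher registry and reports how many entries it holds. */
    static int registrations(Wrapper first, Wrapper second) {
        Set<Wrapper> registry = new HashSet<>();
        registry.add(first);
        registry.add(second);
        return registry.size();
    }
}
```

With wrapFresh() the same listener ends up registered twice and would be notified twice; with wrapCached() the registry holds a single entry.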

1.3.1 - January 28, 2013
========================
* Tightened up a possible race deep inside the connection management.

* PathChildrenCache.rebuild() and PathChildrenCache.rebuildNode() were not handling deleted nodes.

* Issue 237: New feature. PathChildrenCache now optionally posts an event when the initial cache is
populated. To accommodate this behavior there is a new version of start() that takes an enum. See the
Javadoc for each value. For this new behavior, use StartMode.POST_INITIALIZED_EVENT. Once the cache
is initialized a PathChildrenCacheEvent.Type.INITIALIZED will be posted. Huge thanks to user philflesh
for the idea and co-implementation.

1.3.0 - January 10, 2013
========================
* MAJOR CHANGE (thus a version bump): I'd always thought that if the client is disconnected from the server
long enough then an Expired event would be generated. Testing, however, shows this not to be the case. I believe
it's related to ZOOKEEPER-1159. The behavior associated with this is that if the clients lost connection to the
cluster for longer than the session expiration they would _never_ be able to reconnect. The connection would
be permanently lost. Many users were seeing this as endless log messages indicating "Connection timed out
for connection...". As a workaround, in 1.3.0+ when the Curator state changes to LOST, a flag will be set
so that the next time Curator needs to get the ZooKeeper instance, the current instance will be closed and a new
ZooKeeper instance will be allocated (as if the session had expired).

* Added checks for illegal namespaces.

* Issue 232: NodeCache wasn't handling server connection issues well. It would repeatedly execute checkExists()
with a watcher causing the heap to fill with watcher objects.

* Issue 233: An internal idiom being used to create an EnsurePath instance with the parent of a passed in path
wasn't correct. Due to an unfortunate implementation of ZKPaths.PathAndNode (mea culpa) the root path is specified
differently than non-root paths. To work around this, I added a method to EnsurePath - excludingLast() - that
can be used instead of the idiom.

* Issue 230: Added a filter to control which IP addresses are returned by ServiceInstanceBuilder.getAllLocalIPs().
Set the filter via ServiceInstanceBuilder.setLocalIpFilter().

1.2.6 - January 1, 2013
=======================
* Issue 214: Added rebuildNode method to PathChildrenCache.

* Added a NodeCache to complement the PathChildrenCache. The doc is here:
https://github.com/Netflix/curator/wiki/Node-Cache

* Creating nodes in background wasn't handling createParentsIfNeeded.

* Issue 216: Rewrote LeaderLatch to better handle connection/server instability. At the same time, made
most of the calls async which will help concurrency and performance.

* Issue 217: DistributedAtomicLong (et al) should use ensurePath internally to be consistent with
other recipes.

* Issue 220: When creating a ServiceCacheImpl, a PathChildrenCache is created. The cache loads all existing services,
but because preloading does not create events, ServiceCacheImpl never notices this. ServiceCacheImpl.getInstances()
will return an empty list.

* Issue 221: client.getACL().forPath("/") throws a NullPointerException, because the ZooKeeper
API expects a Stat, but GetACLBuilderImpl initializes responseStat to null.

* Issue 222: Counter and log messages reversed in RetryLoop.takeException().

* New feature: CuratorTempFramework. Temporary CuratorFramework instances are meant for single requests to
ZooKeeper ensembles over a failure prone network such as a WAN. The APIs available from CuratorTempFramework
are limited. Further, the connection will be closed after a period of inactivity. Based on an idea mentioned in a
post by Camille Fournier: http://whilefalse.blogspot.com/2012/12/building-global-highly-available.html - details
here: https://github.com/Netflix/curator/wiki/Temporary-Framework

* Issue 224: ExponentialBackoffRetry was not protected against edge-cases where a too big maxRetries argument
was used. It now also incorporates a maxSleep value.
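
The Issue 224 entry mentions two safeguards: protection against an overly large maxRetries and a maxSleep cap. A minimal sketch of both ideas follows. It is deterministic for clarity (real backoff policies usually add randomization), it assumes non-negative inputs, and the class name, the limit of 29 and the method signatures are assumptions for illustration, not Curator's actual code.

```java
// Hypothetical sketch of exponential backoff with a retry limit and a sleep cap.
final class BackoffSketch {
    /** Assumed upper bound on retries so that the doubling factor stays well within a long. */
    static final int MAX_RETRIES_LIMIT = 29;

    /** Clamps a caller-supplied retry count into the range [0, MAX_RETRIES_LIMIT]. */
    static int clampRetries(int maxRetries) {
        return Math.min(Math.max(maxRetries, 0), MAX_RETRIES_LIMIT);
    }

    /** Returns baseSleepMs * 2^retryCount, never more than maxSleepMs. */
    static long sleepMs(long baseSleepMs, int retryCount, long maxSleepMs) {
        int shift = clampRetries(retryCount);
        long factor = 1L << shift;
        // Compare by division so the multiplication below can never overflow.
        if (baseSleepMs > maxSleepMs / factor) {
            return maxSleepMs;
        }
        return baseSleepMs * factor;
    }
}
```

For example, with a base of 100 ms and a cap of 5000 ms the sleep doubles from 100 ms up to 3200 ms at the fifth retry and stays at 5000 ms from the sixth retry onward.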

1.2.5 - November 27, 2012
=========================
* Depend on ZooKeeper 3.4.5

* Issue 177: PathChildrenCache wasn't shutting down the executor when closed. Also, reworked the event
queue to avoid potential herding of messages in unstable conditions. The herding could result in runaway
memory allocation as reported in the issue. NOTE: due to this change, the PathChildrenCache node
refresh code and the PathChildrenCacheListener notification threads have been merged. Do not block
for very long inside of your PathChildrenCacheListener or you will prevent the cache from getting
updated.

* Issue 200: Post-creation services registered in ServiceDiscovery via registerService() were
not being treated the same as the service passed in the constructor. Consequently they wouldn't get
re-registered if there were connection problems.

* Creating nodes withProtection() is now supported in the background. e.g.
client.create().withProtection().inBackground()...

* Added methods to InterProcessSemaphoreV2: setNodeData() and getParticipantNodes() and, to the Lease
interface, getData().

* Issue 205 - already started error message was misleading.

* Pull 209 - Fixed inconsistent API for get() in DiscoveryResource.java - thanks to user dougnukem

* Issue 211 - Added getState() method to CuratorFramework.

* Issue 212 - There wasn't a good way to update the data for a Service. I've added a new method to
ServiceDiscovery: updateService(). NOTE: this method requires all ServiceDiscovery instances to be using
version 1.2.5 of Curator. Internally, ServiceCache now uses PathChildrenCache.

* Pull 210 - For convenience, added a version of DiscoveryContext that uses any generic type as the
payload. Thanks to user dougnukem.

1.2.4 - November 2, 2012
========================
* Depend on ZooKeeper 3.4.4

* Added a new Examples sub project - better late than never.

* Guaranteed deletes were not working correctly if CuratorFramework.usingNamespace() was used.

* I can't believe this has been like this for so long. The executor passed to listeners was never used.
Doh!!! Major bug.

* Issue 188: Display a meaningful message if the value node is corrupted

* Issue 194: Initial sync() operation should occur immediately - like the change in 1.2.3 for all "background"
operations.

* Added support for ZK 3.4's read only operation as described here:
http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode - CuratorFrameworkFactory.Builder has a new
method to set canBeReadOnly(). There is a new ConnectionState: READ_ONLY. Note: Your servers need the
system property "readonlymode.enabled" set to true. This isn't documented anywhere that I can see.

* Pull Request: 196 - Fix some issues with NamespaceFacade stemming from inconsistent state. Thanks to
Answashe.

* Issue 197 - Possible NullPointerException from ConnectionStateManager line 133 that is caused by a race
condition. In CuratorFrameworkImpl, connectionStateManager.start() is called after client.start().

1.2.3 - October 6, 2012
=======================
* Previously, all background operations (i.e. when the inBackground() method is used)
were put into a queue and processed in an internal thread. Curator does this to handle retries in
background operations. This can be improved, however. The first time the background operation is
executed the ZooKeeper method can be called directly - without queuing. This will get the operation
into ZooKeeper immediately and will help prevent Curator's internal queue from backing up.

* Issue 173: The DistributedQueue (and, thus, all the other queue recipes) was unnecessarily
calling getChildren (with a watch) after each group of children was processed. It can just as easily
wait for the internal cache to get its watch notified. This change creates an edge case, though,
for ErrorMode.REQUEUE. Consequently, when in mode ErrorMode.REQUEUE the DistributedQueue now
deletes the bad message and re-creates it. This required the use of ZooKeeper 3.4.x's transactions.
So, if you use ErrorMode.REQUEUE you MUST be running ZooKeeper 3.4+.

1.2.2 - September 30, 2012
==========================
* NOTE: The 1.0.x branch is not being released and has been deprecated. It was advised many versions
ago that this was coming. So, here it is.

* For ZKClient Bridge: 1. Previous method of sending initial connect event to ZKClient was
unreliable; 2. Added an option to not close the curator instance

* The default connection timeout has increased to 15 seconds. The default session timeout has
increased to 60 seconds. These both can now be overridden via system properties:
"curator-default-connection-timeout" and "curator-default-session-timeout".

* Thanks to Ben Bangert: the InterProcessSemaphore waiting semantics weren't ideal. The nth
waiting node has to wait for all nodes in front of it. I've improved this a bit. However, the algorithm
used still suffers from potential out of order acquisition as well as potential starvation if a given client
does not release a lease. Therefore, I'm deprecating InterProcessSemaphore in favor of the new
InterProcessSemaphoreV2 which is based on Ben's algorithm.

* Issue 164: The PathChildrenCache no longer clears its internal state when there is a connection
issue. Consequently, the PathChildrenCacheEvent.Type values have changed. Instead of a RESET event
there are events that match the ConnectionState events.
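
The timeout entry in this release names two system properties that override the new defaults. One way such a lookup could be written is sketched below. The property names and the 15/60 second defaults come from the entry itself; the assumption that the values are expressed in milliseconds, and the class and method names, are mine.

```java
// Hypothetical sketch: read timeout defaults from system properties with fallbacks.
final class TimeoutDefaultsSketch {
    static final int DEFAULT_CONNECTION_TIMEOUT_MS = 15_000; // 15 seconds
    static final int DEFAULT_SESSION_TIMEOUT_MS = 60_000;    // 60 seconds

    /** Uses the system property if it parses as an integer, else the default. */
    static int connectionTimeoutMs() {
        return Integer.getInteger("curator-default-connection-timeout", DEFAULT_CONNECTION_TIMEOUT_MS);
    }

    /** Uses the system property if it parses as an integer, else the default. */
    static int sessionTimeoutMs() {
        return Integer.getInteger("curator-default-session-timeout", DEFAULT_SESSION_TIMEOUT_MS);
    }
}
```

Under this sketch, starting the JVM with -Dcurator-default-connection-timeout=5000 would yield a 5 second connection timeout, and an unparsable value would fall back to the default.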

1.1.18/1.0.20 - September 4, 2012
=================================
* New extension project: "ZKClient Bridge". A bridge between Curator and ZKClient. Useful for
projects that would like to use Curator but don't want to risk porting working code that uses
ZKClient.

1.1.17/1.0.18 - August 30, 2012
===============================
* Issue 132: If namespace is set in builder, getNamespace() does not return it

* Issue 131: If connection is lost to the server, the ServiceInstance needs to re-register once
there is a re-connection.

* PathChildrenCache was not sending UPDATE messages when a node's data changed in the case
that false was passed in the constructor for cacheData.

* Merge 136 from wt: Add eclipse support to gradle.

* Merge 137 from pbh101: ConnectionState declares IOException, never throws it

1.1.16/1.0.17 - August 2, 2012
==============================
* Merge 114 from amuraru: Make sure internal executor services are not started until startup.

* Merge 116 from samuelgmartinez: Fix for Issue 115: Wrong behaviour in LeaderLatch when a candidate
loses connection

* Issue 118: Ignore nonode exceptions when deleting lock path

* Added a non-reentrant mutex: InterProcessSemaphoreMutex. This mutex doesn't have the threading
restrictions that InterProcessMutex has. This should help with issues 75 and 117.

* Merge 122 from ithaka that addresses Issue #98 - JsonInstanceSerializer does not deserialize
custom payload types. IMPORTANT! This change introduces a breaking incompatibility with List payloads
that will show up in environments that mix the old code and the new code. The new code will throw a
JsonMappingException when it attempts to read a List payload written by the old code. The old code,
when it reads a List payload generated by the new code, will generate a list with two elements,
the first of which is the class name of the List type, and the second of which is the original list.

* Issue 121: Apply bytecode rewriting to turn off JMX registrations to TestingServer as well as
TestingCluster.

* Issue 125: Use ScheduledThreadPoolExecutor instead of blocking threads that have periodic work.

* Issue 126: Added getNamespace() method.

* Issue 120: Additional check for connection loss in DistributedDoubleBarrier.

1.1.15/1.0.16 - July 14, 2012
=============================
* Added ChildReaper. This builds on the newly added Reaper. This utility monitors a single node
and reaps empty children of that node.

* Issue 107: The namespace wrapper was corrupting the path if the EnsurePath handler had an error.
The best thing to do is let the code continue.

* Issue 109: Make duplicate close() calls in CuratorFrameworkImpl a NOP instead of an error.

* A more complete solution for background build-ups. The previous implementation did the retry sleep
in the background process which ends up blocking ZooKeeper. During connection problems, this would
cause ZooKeeper packets/watchers to back up. The new implementation uses a DelayQueue to simulate a
sleep in the background. NOTE: this caused a change to the RetryPolicy APIs.
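
The last entry above describes replacing an in-thread retry sleep with a DelayQueue. The general idea can be sketched as follows: an operation that needs to be retried is re-queued with a delay, and the processing thread only ever takes operations whose delay has elapsed, so one sleeping retry never blocks the others. This is an illustration under my own naming (Operation, retryLater, pollReady), not Curator's internal code.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: simulate a retry sleep with a DelayQueue instead of Thread.sleep().
final class DelayedRetrySketch {
    /** A queued operation that becomes available once its delay has elapsed. */
    static final class Operation implements Delayed {
        final String name;
        private final long readyAtNanos;

        Operation(String name, long delayMs) {
            this.name = name;
            this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    private final DelayQueue<Operation> queue = new DelayQueue<>();

    /** First attempt: available immediately. */
    void submit(String name) {
        queue.add(new Operation(name, 0));
    }

    /** Retry: available only after sleepMs, without blocking any thread in the meantime. */
    void retryLater(String name, long sleepMs) {
        queue.add(new Operation(name, sleepMs));
    }

    /** Returns the name of the next ready operation, or null if none is ready yet. */
    String pollReady() {
        Operation op = queue.poll();
        return op == null ? null : op.name;
    }
}
```

A real processing loop would more likely call the blocking take() on the queue; the non-blocking poll() is used here only to keep the sketch easy to exercise.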

1.1.14/1.0.15 - July 6, 2012
============================
* Merge #100 from bbeck: Added BoundedExponentialBackoffRetry.

* Merge #102 from artemip: Added REAP_UNTIL_GONE mode to Reaper; Remove items from activePaths once
they are deleted; Tests

* Issue 99: The Double Barrier should allow more than the max to enter the barrier. I don't see any
harm in this.

* Issue 103: Important change/fix for ExhibitorEnsembleProvider: the previous implementation wasn't
handling outages very well. The connectionString could get stuck to an old value if the list of
Exhibitors all went down and couldn't be contacted. Now, a backup provider is required and the backup
is used to update the list of Exhibitors should there be connection problems.

* IMPORTANT NOTE: The 1.0.x branch of Curator is now end of life. There will be a few more releases
but please migrate to the 1.1.x branch.

1.1.13/1.0.14 - June 25, 2012
=============================
* New queue features: a) bounded queues: use setMaxItems() in the builder to set an (approx) upper
bound on the queue size; b) the builder now has an option to turn off background puts; c) queues now
default to flushing remaining puts when closed - this can be controlled in the builder via
finalFlushTime().

* Issue 82: Generalized (and deprecated) nonNamespaceView() by adding the usingNamespace() method
to allow getting a facade of the client that uses a specified namespace.

* createParentsIfNeeded() should now perform better. Instead of "pre" checking, it now only does the
check if KeeperException.NoNodeException is thrown. LockInternals now uses this method and, so, should
perform a bit better.

* Added a new utility: Reaper. This can be used to clean up parent lock nodes so that they don't
stay around as garbage. See the Utilities wiki for details: https://github.com/Netflix/curator/wiki/Utilities

* Unit tests should be a lot less noisy. A system property now turns off most internal error logging.

* Issue 88: Children processor should wait for all nodes to be processed before fetching more items

1.1.12/1.0.13 - June 5, 2012
============================
* Pull Request 81: Avoid invalid ipv6 localhost addresses

* Another big bug: guaranteed deletions were not working with namespaces.

1.1.11/1.0.12 - June 1, 2012
============================
* MAJOR BUG FIX!!!! Many of the Curator recipes rely on the internal class LockInternals. It has
a bug that exhibits when the ZooKeeper cluster is unstable. There is an edge case that causes
LockInternals to leak a node in the lock path that it is managing. This results in a deadlock. The
leak occurs when retries are exhausted. NOTE: TestLockCleanlinessWithFaults now tests for this
condition.

* Added some missing combinations in the backgrounding API

* Added QueueSharder utility. Due to limitations in ZooKeeper's transport layer, a single queue
will break if it has more than 10K-ish items in it. This class provides a facade over multiple
distributed queues. It monitors the queues and if any one of them goes over a threshold, a new
queue is added. Puts are distributed amongst the queues.

* Issue 80: Check for null data before decompressing data in getData().

* Merge from user bbeck - enhanced the testing in-memory ZK server to handle some edge cases. A nice
benefit is that it starts up faster. Thanks Brandon!

1.1.10/1.0.11 - May 17, 2012
============================
* Generalized the ProtectedEphemeralSequential so that it works with any create mode.
withProtectedEphemeralSequential() is deprecated in favor of the new method withProtection().

* Update all uses of Preconditions to make sure they print a reasonable diagnostic message.

* Added a new wrapped Watcher type that can throw exceptions as a convenience. The various
usingWatcher() methods now can take CuratorWatcher instances.

* InterProcessSemaphore and LeaderSelector weren't respecting the default bytes feature.

* Make the default data for nodes be the local IP address. This helps in debugging and enables
the deadlock analysis in Exhibitor.

* New recipe added: DistributedDelayQueue

1.1.9/1.0.10 - May 10, 2012
===========================
* Based on suggestion in Issue 67: Added new concept of UriSpec to the ServiceInstance in the
Service Discovery Curator extension.

* User "Pierre-Luc Bertrand" pointed out a potential race condition that would cause a SysConnected
to get sent before an Expired. So, now I push the event to the parent watcher before resetting
the connection in ConnectionState.process(WatchedEvent)

* New Feature: SessionFailRetryLoop. Huge thanks to Pierre-Luc Bertrand for his work on this.
SessionFailRetryLoop is a special type of retry loop that causes all Curator methods in a thread to
fail when a session failure is detected. This enables sets of Curator operations that must be tied
to a single ZooKeeper session. See Tech Note 3 for details: https://github.com/Netflix/curator/wiki/Tech-Note-3

* Several users have expressed dissatisfaction with the LeaderSelector implementation - requiring a
thread, etc. So, LeaderLatch has been added which behaves a lot like a CountDownLatch but for leader
selection.

1.1.8/1.0.9 - April 17, 2012
============================
* Added methods to compress data via create() and setData() and to decompress data via getData(). The
compression is GZIP by default. You can change this via the CuratorFrameworkFactory by specifying
a CompressionProvider.

* Added ZookeeperFactory to the client as a testing aid.

* Added ACLProvider to make it easier to use ACLs and recipes. It can be set via the
CuratorFrameworkFactory builder.

* Several of the recipes were creating new watcher objects each time they were needed when the watcher(s)
can be created once in the constructor.

* Issue 62: DistributedQueue wasn't handling getting interrupted very well. It was logging an error.

* Issue 64: wasn't handling SASL events. Any non-SysConnected event was being treated as a disconnection.

* Issue 65: Accepted a pull request that fixes a bug in RetryUntilElapsed.

* Issue 66: Bad log string - needed String.format()

1.1.7/1.0.8 - April 6, 2012
===========================
* Accepted a change so that testng is testCompile in Gradle

* Rewrote TestingServer and TestingCluster based on work by Jeremie BORDIER (ahfeel)

* Rewrote the log4j property files

* Moved to ZK 3.4.3

* More work on the Exhibitor integration

1.1.5/1.0.6 - March 23, 2012
============================
* Moved to Gradle as the build system.

* Added SimpleDistributedQueue, a drop-in replacement for the DistributedQueue that comes with the
ZK distribution.

* IMPORTANT CHANGE TO LeaderSelector. Previous versions of Curator overloaded the start() method
to allow re-queueing. THIS IS NO LONGER SUPPORTED. Instead, there is a new method, requeue(), that
does this. Calling start() more than once will now throw an exception.

* LeaderSelector now supports auto re-queueing. In previous versions, it wasn't trivial to requeue
the instance. Now, make a call to autoRequeue() to put the instance in a mode where it will requeue
itself when the leader selector listener returns.

* The mechanism that calls any kind of Curator listener wasn't protected against exceptions. Thus,
an exception in a listener could break the listener event thread.

* deleteDirectoryContents() no longer checks for sym links. This was a major issue in the Guava
version and possibly one of the reasons they removed the method altogether.

1.1.4/1.0.5 - March 12, 2012
============================
* Introduced a parent interface for Queues so that they can have some common methods

* Added new Recipe: DistributedIdQueue - a version of DistributedQueue that allows IDs to be
associated with queue items. Items can then be removed from the queue if needed.

* Curator can now be configured to poll a cluster of Exhibitor (https://github.com/Netflix/exhibitor)
instances to get the connection string to use with the ZooKeeper client. Should the connection
string change, any new connections will use the new connection string.

1.1.3/1.0.4 - March 7, 2012
===========================
* Issue 27: This bug exposed major problems with the PathChildrenCache. I ended up completely
rewriting it. The original version was very inefficient and prone to herding. This new version
is as efficient as possible and adds some nice new features. The major new feature is that when
calling start(), you can have the cache load an initial working set of data.

* Issue 31: It turns out an instance of InterProcessMutex could not be shared in multiple threads. My
assumption was that users would create a new InterProcessMutex per thread. But, this restriction is
arbitrary. For comparison, the JDK Lock doesn't have this requirement. I've fixed this; however, it
was a significant change internally. I'm counting on my tests to prove correctness.

* EnsurePath wasn't doing its work in a RetryLoop.

* Added a new class to the Test module, Timing, that is used to better coordinate timings in tests

* LockInternals had a retry loop for all failures when it was only needed if the session expired
and the lock node was lost. So, I refined the code to handle this specific case.

* Issue 34: PathChildrenCache should ensure the path

* Moved to Guava 11.x

* Lots of work on the Gradle build. NOTE: Gradle will soon become the build system for Curator

1.1.2/1.0.3 - Feb. 8, 2012
==========================
* Added listener to Queues to listen for put completion

* Issue 24: If InterProcessMutex.release() failed to delete the node (due to connection errors, etc.)
the instance was left in an inconsistent state that would cause a future call to acquire() to
succeed without actually creating the lock. A new feature (see next bullet) was added to solve this
problem: guaranteed deletes. The various lock-based recipes now use this feature.

* New feature: guaranteed deletes. The delete builder now has a method that will record failed node
deletions and attempt to delete them in the background until successful. NOTE: you will still get
an exception when the deletion fails. But, you can be assured that as long as the CuratorFramework
instance is open attempts will be made to delete the node:
    client.delete().guaranteed() ...
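
The behavior described for guaranteed deletes (the caller still sees the failure, but the path is remembered and retried in the background) can be sketched roughly as below. This is only an illustration of the pattern, not Curator's implementation: the Deleter interface, the exception type and the method names are hypothetical, and a real version would trigger retryFailed() from a background thread.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: record failed deletions and retry them until they succeed.
final class GuaranteedDeleteSketch {
    /** Stand-in for the real delete call; returns false when the deletion fails. */
    interface Deleter {
        boolean delete(String path);
    }

    private final Deleter deleter;
    private final Set<String> failed = ConcurrentHashMap.newKeySet();

    GuaranteedDeleteSketch(Deleter deleter) {
        this.deleter = deleter;
    }

    /** Attempts the delete; a failure is both recorded for retry and reported to the caller. */
    void delete(String path) {
        if (!deleter.delete(path)) {
            failed.add(path);
            throw new IllegalStateException("delete failed, will retry in background: " + path);
        }
    }

    /** One retry pass: paths whose deletion now succeeds are dropped from the pending set. */
    void retryFailed() {
        failed.removeIf(deleter::delete);
    }

    int pendingCount() {
        return failed.size();
    }
}
```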

1.1.1/1.0.2 - Jan. 21, 2012
===========================
* Issue 22: Make ServiceCache close itself down properly.

* Issue 21: Move TestNG to the top-level pom and define its scope as test

* Issue 17: ConnectionStateManager should use the builder's thread factory if present

1.1.0 - Jan. 5, 2012
=====================
* 1.1.x marks a separate branch of Curator:
    - 1.0.x will stay compatible with ZooKeeper 3.3.x
    - 1.1.x+ will require ZooKeeper 3.4.x+

* Added support for ZooKeeper 3.4's Transactions:
    - CuratorFramework has a new method: inTransaction() that starts a
      transaction builder
    - See TestTransactions for examples

1.0.1 - Jan. 4, 2012
=====================
* Updated and tested against ZooKeeper 3.4.2

1.0.0 - Dec. 31, 2011
=====================
* Added a REST server for Service Discovery
* Switched to slf4j for logging
* Moved to 1.0 version
* Curator is now feature complete

0.6.4 - Dec. 7, 2011
=====================
* Added Barrier

* Added Double Barrier

* Added Read/Write lock

* Added revocation to InterProcessMutex

* Fixed (hopefully) intermittent failures with testRetry()

* Updates/enhancements to Discovery based on suggestions from Eishay Smith

0.6.3 - Nov. 30, 2011
=====================
* Added Service Discovery

0.6.1 - Nov. 18, 2011
=====================
* Added new methods to LeaderSelector to identify/get all Participants

* Moved to ZooKeeper 3.3.3

* Made the TestingCluster not throw an assertion error due to internal JMX registrations
in ZK. This is done with Javassist ugliness.

* Refactored listeners in Curator to a common methodology

* Major changes to error handling. Adding a ConnectionStateManager that allows users to
listen for connection changes. Connection loss is first treated as a recoverable Suspension.
If the connection is not re-established, the state changes to connection loss. Any recipes
that are affected by this have been updated.

* PathChildrenCache now handles connection state changes much better.

* All Curator created threads now have a meaningful name.

0.5.2 - Nov. 14, 2011
=====================
* Jérémie Bordier posted on the ZK mailing list about a split brain issue with the Leader Selector.
If the Leader is connected to a server that suffers a network partition, it needs to get notified
that it has lost leadership. Curator handled this somewhat but only if the client application
executed periodic ZooKeeper operations. I've enhanced the CuratorFramework implementation to check
for disconnection and execute a background sync (with retries). This will cause any listener's
unhandledError() method to get called when there is a network partition.

* New utility: TestingCluster. Allows for testing with an in-memory ZK ensemble.

* Reworked distributed atomic implementations. I was unhappy with the complexity of the previous
one. Now, there's a simpler master implementation DistributedAtomicValue that is the basis for the
others. Adding new versions should be simpler as well.

0.5.1 - Nov. 12, 2011
=====================
* Another pass at fixing the semaphore. Went back to the model of the count being merely a
convention. Added a new recipe for a SharedCount that can be used in place of the count convention
with the semaphore. This is the best of both worlds. The semaphore code is a lot simpler and will
perform better. Thanks to Monal Daxini for the idea.