Design-related questions

Question: In an enterprise context, it could be a requirement to send someone a link that points to a specific, potentially old, message in a certain Pool.

Yes. That's perfectly reasonable. That message is like a static file on disk: once it's written, it remains unchanged until it's deleted. This is an ideal application of a REST-style approach. That's why I've advocated
for a "message-based" approach first, and a REST/static approach where the message-based approach doesn't make sense. What I am opposed to is a "try to make everything fit the REST model" approach to API design.

...

Question: Would it be costly in your model to get message number X (plus the n older messages) in a user's timeline?
A message exists independently of any timeline. There is a cache of recently accessed messages. Occasionally a historic message will be referenced; it will then be materialized from the backing store and rendered. If it is historical and not accessed again, it will likely fall out of the cache.
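
The caching behavior described above can be sketched as a small least-recently-used cache. This is only an illustration under stated assumptions, not the project's actual implementation: the `MessageCache` class, the `load_from_store` callback, the `capacity` parameter, and the sample messages are all hypothetical names chosen for the example, and LRU is just one plausible eviction policy.

```python
from collections import OrderedDict
from typing import Callable


class MessageCache:
    """Hypothetical LRU cache of recently accessed messages, keyed by message id."""

    def __init__(self, load_from_store: Callable[[int], str], capacity: int = 2) -> None:
        self._load = load_from_store      # assumed accessor for the backing store
        self._capacity = capacity         # assumed cache size, tiny for the demo
        self._entries = OrderedDict()     # oldest access first, newest last
        self.store_reads = 0              # counts materializations from the store

    def __contains__(self, message_id: int) -> bool:
        return message_id in self._entries

    def get(self, message_id: int) -> str:
        if message_id in self._entries:
            # Cache hit: mark the message as most recently used.
            self._entries.move_to_end(message_id)
            return self._entries[message_id]
        # Cache miss: materialize the message from the backing store.
        body = self._load(message_id)
        self.store_reads += 1
        self._entries[message_id] = body
        if len(self._entries) > self._capacity:
            # Evict the least recently used message.
            self._entries.popitem(last=False)
        return body


# A dict stands in for the backing store in this sketch.
backing_store = {1: "old announcement", 2: "status update", 3: "lunch plans"}
cache = MessageCache(backing_store.__getitem__, capacity=2)

cache.get(1)  # historical message, materialized from the backing store
cache.get(2)
cache.get(3)  # capacity exceeded, so message 1 falls out of the cache
```

Fetching message 1 again would simply trigger another read from the backing store, so an old link stays valid at the cost of one materialization.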

...

Question: I don't get why it has to be in the session's state, because you could just as well use the information that a user is online as guidance, even if the state were stored somewhere outside the session. It wouldn't make a difference, I guess, and storing it in the session looks natural.

The state itself is not in the session. The session is the guide that the user is online. The session contains a listener that is attached to the User. The only real state that resides in the session is the state
necessary to batch up any messages that the User has forwarded to the listener in between the HTTP polling requests. If there is an HTML front end, state about that front end will reside in the session as well, but
that's a different issue.
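
A minimal sketch of that arrangement, assuming a simple callback-style listener: the `User` and `Session` classes and their method names (`add_listener`, `deliver`, `poll`, `close`) are hypothetical and chosen only for illustration. The point is that the session holds nothing but the batch of messages that arrived between two polls, and that an attached listener is what marks the user as online.

```python
from typing import Callable, List


class User:
    """Hypothetical long-lived user object that forwards messages to listeners."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._listeners: List[Callable[[str], None]] = []

    def add_listener(self, listener: Callable[[str], None]) -> None:
        self._listeners.append(listener)

    def remove_listener(self, listener: Callable[[str], None]) -> None:
        self._listeners.remove(listener)

    @property
    def is_online(self) -> bool:
        # An attached session listener is the signal that the user is online.
        return bool(self._listeners)

    def deliver(self, message: str) -> None:
        for listener in list(self._listeners):
            listener(message)


class Session:
    """Hypothetical HTTP session: holds only the batch awaiting the next poll."""

    def __init__(self, user: User) -> None:
        self._user = user
        self._pending: List[str] = []
        user.add_listener(self._on_message)

    def _on_message(self, message: str) -> None:
        # Batch up messages forwarded between HTTP polling requests.
        self._pending.append(message)

    def poll(self) -> List[str]:
        # Hand the batch to the polling request and start a new one.
        batch, self._pending = self._pending, []
        return batch

    def close(self) -> None:
        self._user.remove_listener(self._on_message)


user = User("alice")
session = Session(user)
user.deliver("first")
user.deliver("second")
batch = session.poll()  # both messages arrive in a single polling response
```

Closing the session detaches the listener, after which the user no longer counts as online and nothing further is batched.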

...

Question: I don't understand why we would need to store all entries in a cache, instead of keeping only the last n entries for each user based on some heuristic such as the last 3 days. I would expect the probability that a user wants to see a message to decrease exponentially with the message's age. For example, the probability that someone wants to see the 1,000th newest message in their timeline is probably almost zero.
Some people mine their timelines for information. I agree that some aging policy is necessary as 36B entries will consume a lot of storage in RAM or on disk, but the last 1,000 is likely too few based on what I have seen of actual user behavior.
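In terms of an aging policy in an RDBMS, the cost of aging out old entries is likely to be an index scan or something on that order: either a date-based delete (DELETE FROM mailbox WHERE date < xxx) or a user-by-user delete of everything beyond the newest 1,000 entries (DELETE FROM mailbox WHERE id IN (SELECT the ids of entries past the newest 1,000 in that user's mailbox)).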
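
The two aging policies can be made concrete with a small runnable sketch. Everything beyond the `mailbox` table name is an assumption made for illustration: the column names (`user_id`, `message_id`, `created_at`), the indexes, the helper function names, and the use of SQLite through Python's `sqlite3` module as a stand-in for whatever RDBMS would actually be used. The per-user variant is phrased as "delete everything not among the newest N", which is one possible way to express the pseudo-SQL above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE mailbox (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        message_id INTEGER NOT NULL,
        created_at INTEGER NOT NULL  -- assumed: a timestamp stored as an integer
    )
    """
)
# Indexes so that both aging policies can run as index scans.
conn.execute("CREATE INDEX mailbox_created ON mailbox (created_at)")
conn.execute("CREATE INDEX mailbox_user_created ON mailbox (user_id, created_at)")


def age_out_by_date(conn: sqlite3.Connection, cutoff: int) -> int:
    """Delete every mailbox entry older than `cutoff`; return rows removed."""
    cur = conn.execute("DELETE FROM mailbox WHERE created_at < ?", (cutoff,))
    return cur.rowcount


def age_out_by_count(conn: sqlite3.Connection, user_id: int, keep: int) -> int:
    """Keep only the newest `keep` entries for one user; return rows removed."""
    cur = conn.execute(
        """
        DELETE FROM mailbox
        WHERE user_id = ?
          AND id NOT IN (
              SELECT id FROM mailbox
              WHERE user_id = ?
              ORDER BY created_at DESC, id DESC
              LIMIT ?
          )
        """,
        (user_id, user_id, keep),
    )
    return cur.rowcount


# Sample data: user 1 has five entries (timestamps 1..5), user 2 has two.
rows = [(1, 100 + t, t) for t in range(1, 6)] + [(2, 200 + t, t) for t in (1, 2)]
conn.executemany(
    "INSERT INTO mailbox (user_id, message_id, created_at) VALUES (?, ?, ?)", rows
)

removed_by_count = age_out_by_count(conn, user_id=1, keep=3)  # drops timestamps 1 and 2
removed_by_date = age_out_by_date(conn, cutoff=2)             # drops user 2's timestamp 1
```

Only the mailbox entries are deleted here. Since a message exists outside of any timeline, the messages themselves would stay in the backing store in this sketch.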

Important Links

Statefulness and algorithms for social networks
