
Improved Referrer Logs

I have reworked the individual page referrer logs to show the title of the referring page where possible, rather than the URL. Here’s a good example.

This page lookup is done asynchronously behind the scenes, so in most cases the referring page title won’t show up on the initial view, but it will typically be there within a few seconds.
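To make the idea concrete, here is a minimal sketch of that kind of background lookup. It is not the code behind this site. The `ReferrerTitles` class, the `fetch_html` callable, and the example URL are all assumptions made for illustration. The first request for a referrer returns the raw URL and queues a lookup; later requests return the title once it has arrived.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title with whitespace collapsed, or None."""
    parser = TitleParser()
    parser.feed(html)
    title = " ".join("".join(parser.parts).split())
    return title or None


class ReferrerTitles:
    """Hypothetical helper: shows the URL now, the title once it is known."""

    def __init__(self, fetch_html, workers=2):
        self._fetch_html = fetch_html  # assumed callable: url -> HTML string
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._titles = {}
        self._pending = {}

    def label_for(self, url):
        if url in self._titles:
            return self._titles[url]
        if url not in self._pending:
            # Kick off the lookup in the background; do not block the page view.
            self._pending[url] = self._pool.submit(self._lookup, url)
        return url

    def _lookup(self, url):
        try:
            title = extract_title(self._fetch_html(url))
        except Exception:
            title = None
        # Fall back to the URL if the page has no usable title.
        self._titles[url] = title or url

    def wait(self):
        """Block until all queued lookups finish (handy in tests)."""
        for future in list(self._pending.values()):
            future.result()
```

In a real deployment `fetch_html` would make an HTTP request; here it is left as a parameter so the sketch can be run without a network.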

I’d love to hear your feedback…and a big thanks to Les at 0xDECAFBAD for giving me the idea.

More on Caching

Justin Rudd did some research on a caching product called Coherence. Interesting…I wasn’t aware of any such product.

The part of his post that got me thinking was his discussion of two different types of caches – “Replicated” and “Distributed” (I actually find the term “distributed” for the second type to be somewhat misleading). I started wondering what kinds of data you could safely store in each type of cache.

Justin defines the replicated cache:

A replicated cache is where every web server (or app server) has a cache. Then when an entry is added or removed, a multicast packet is sent out and all the other caches update themselves.

He notes that it suffers from the race conditions and other problems previously mentioned. But what kind of data could we store in this kind of cache? Read-only data, of course…but what about read/write data?

For this type of cache, any data for which a stale copy is acceptable would be a candidate. My referrer log data for posts, for example. It’s not going to kill me if this data is slightly out of date. But in many cases, with complex inter-related data, I don’t think it’s always trivial to determine if stale data really is acceptable. In some cases you might have to cache related data as well, so you always have a consistent view of the world (albeit outdated).
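A toy simulation shows where the staleness comes from. This is a sketch under assumptions, not how Coherence or any real product works: the `Bus` class stands in for the multicast traffic by queueing messages, and `ReplicatedNode` is a hypothetical per-server cache. Until the queue is drained, the other node simply does not see the update.

```python
from collections import deque


class Bus:
    """Stand-in for multicast: updates are queued and delivered later."""

    def __init__(self):
        self.nodes = []
        self.queue = deque()

    def publish(self, sender, key, value):
        self.queue.append((sender, key, value))

    def deliver_all(self):
        while self.queue:
            sender, key, value = self.queue.popleft()
            for node in self.nodes:
                if node is not sender:
                    node.local[key] = value


class ReplicatedNode:
    """Hypothetical per-server cache that announces its writes on the bus."""

    def __init__(self, name, bus):
        self.name = name
        self.local = {}
        self.bus = bus
        bus.nodes.append(self)

    def put(self, key, value):
        self.local[key] = value
        self.bus.publish(self, key, value)

    def get(self, key):
        return self.local.get(key)


bus = Bus()
server_a = ReplicatedNode("a", bus)
server_b = ReplicatedNode("b", bus)

server_a.put("referrers:/post/1", 10)
before = server_b.get("referrers:/post/1")  # update not delivered yet
bus.deliver_all()
after = server_b.get("referrers:/post/1")   # now server_b has caught up
```

For a referrer count, reading the old value during that window is harmless. For data that other decisions depend on, the same window is where the trouble starts.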

The second type of cache, the “distributed” cache, is where there is a central cache server (or cluster of them) between the code and the database. Justin talks here about caching data from multiple tables, or multiple systems, even with data that requires transactional integrity. This cache type seems workable for some scenarios, but the transactional issue is more interesting.

You could only use such a cache for transactional data in the case where the underlying cached data is never updated from another source. If your cache has a stale view of data, then transactional integrity goes right out the window. This rules out many legacy systems, as these systems are typically being updated from legacy sources as well as the new code. And even for the single-data-source, multiple-table case, this implies that all N tables involved are ONLY updated via the cache. This is a pretty tough constraint to put on an architecture.
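Here is a small sketch of how that goes wrong. Everything in it is made up for illustration: a plain dict plays the database, `CentralCache` is a hypothetical read-through/write-through cache, and `debit` is a pretend business rule. A legacy process updates the database directly, and the cache never hears about it.

```python
class CentralCache:
    """Hypothetical cache sitting between the code and the database."""

    def __init__(self, db):
        self.db = db          # a dict standing in for a database table
        self.entries = {}

    def read(self, key):
        if key not in self.entries:
            self.entries[key] = self.db[key]
        return self.entries[key]

    def write(self, key, value):
        # Write-through: update the database and the cached copy together.
        self.db[key] = value
        self.entries[key] = value


def debit(cache, key, amount):
    """Refuse the debit if the (cached) balance is too low."""
    balance = cache.read(key)
    if balance < amount:
        return False
    cache.write(key, balance - amount)
    return True


db = {"balance:A": 100}
cache = CentralCache(db)
cache.read("balance:A")          # cache now holds 100

db["balance:A"] = 40             # legacy system writes straight to the table

approved = debit(cache, "balance:A", 80)
final_balance = db["balance:A"]
```

The debit of 80 is approved against the cached 100 even though the real balance was 40, and the write-through then overwrites the legacy update with 20. Nothing in the cache code is wrong on its own terms; the rule was broken the moment something else touched the table.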

For the sake of argument, let’s say we’re ok with this, and the cache will be the single source of updates. Then we get into the locking issues. I don’t think you can simply lock a single cache entry – it depends on the underlying data layout. For a single-table cache object, maybe…but the more general case gets complicated. Cases come to mind where locking one cache object might require locks on other cache objects, and figuring out which ones would not be trivial. To avoid having to determine which other objects (seemingly unrelated to the current operation) would need to be locked, one would have to design the objects to be very granular. For example, all updates to a customer would have to go through a single object instance.

So to ensure integrity of our data representation, all updates to customer A must go through cache object A. This is a pretty serious concurrency constraint. What I could end up with is multiple requests from multiple servers, all trying to lock a single object on a single server. This is pretty much the number one killer of scalability…
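A rough sketch of that constraint, using threads in one process to stand in for requests from several servers. The `LockingCache` class and the `customer:A` key are hypothetical. Each key gets its own lock, so updates to different customers can proceed in parallel, but every update to customer A waits its turn.

```python
import threading
from collections import defaultdict


class LockingCache:
    """Hypothetical cache where each entry is guarded by its own lock."""

    def __init__(self):
        self._values = defaultdict(int)
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, key):
        with self._guard:
            return self._locks[key]

    def update(self, key, fn):
        # All writers for the same key queue up here, one at a time.
        with self._lock_for(key):
            self._values[key] = fn(self._values[key])

    def get(self, key):
        return self._values[key]


cache = LockingCache()


def worker():
    for _ in range(1000):
        cache.update("customer:A", lambda orders: orders + 1)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = cache.get("customer:A")
```

The count comes out right, which is the point of the lock, but the eight workers never actually ran in parallel on that entry. Add network hops between servers and the queue in front of cache object A only gets longer.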

Most transactional systems I have come across or worked on tend to map objects to verbs, rather than nouns. And verb-objects don’t lend themselves to caching – caching is all about maintaining state, and verb-objects generally don’t maintain state, especially across transaction boundaries. Thoughts?

Collaboration and TrackBack/Pingback

After thinking some more about yesterday’s post, I have some more thoughts. Sam Gentile mentioned that weblogs seem to work better for people than more direct interactions; so why not embrace this, and see if we can get more out of weblogs? It might not get Sam all the way to where he wants to be, but it would certainly be better than where we are now.

Let’s take this discussion for example. Sam made his initial post yesterday sometime, and got a couple of replies (mine and Sam Ruby’s) that I knew about quickly. But what about others? Rahul also replied, but I only found it by searching Technorati. Similarly, unless Sam decides to link to these posts, many of his readers will never know about them. Sam does, because I know he religiously scans his referrer logs :-), but no one else will know.

So how to fix this? It seems to me TrackBack or Pingback is the way to go: automated notifications between weblogs, encouraging more widespread collaboration. As far as I know, Movable Type users already get this for free, if they enable it. For Radio folks, Google reveals a couple of people doing it, albeit with a bit more work.
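For anyone curious what the notification looks like on the wire, Pingback is an XML-RPC call named `pingback.ping` that takes two strings: the URI of the post doing the linking and the URI of the post being linked to. The sketch below only builds the request body with Python’s standard `xmlrpc.client` module; the helper name and both URLs are made up for illustration.

```python
import xmlrpc.client


def build_pingback_request(source_uri, target_uri):
    """Return the XML-RPC request body for a pingback.ping call."""
    return xmlrpc.client.dumps(
        (source_uri, target_uri), methodname="pingback.ping"
    )


# Hypothetical URLs: my reply, and the post it links to.
body = build_pingback_request(
    "http://example.org/my-reply",
    "http://example.com/original-post",
)
```

Actually sending it would mean finding the target weblog’s Pingback endpoint (advertised through an `X-Pingback` header or a `<link rel="pingback">` element) and posting this body to it, for example via `xmlrpc.client.ServerProxy`. TrackBack works differently (a plain HTTP POST rather than XML-RPC), so this sketch does not cover it.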

So, everyone, how about implementing TrackBack and/or Pingback on your weblogs – and let’s see where it takes us?

Instant Messaging during meetings

Cory Doctorow posted Chat as a side-channel for face-to-face meetings. This got me thinking – I’ve used MSN Messenger many times during conference calls, to discuss things real-time with certain attendees but off-line from the call. It’s proved especially useful when having a conference call with one of my clients, and one of their suppliers/customers. Kind of takes the place of the “lean over and whisper” thing, when you’re not all in the same room.

Collaboration

Sam Gentile says:

…it seems very difficult to get people to collaborate freely and consistently in a medium where they feel like they have an obligation. […] But I am still left with this need for collaboration that is more meaningful that web logs and people interact more directly. Thus far, in the absense of a project, it has eluded me.

I think the key there is “in the absense of a project.” I really have three types of online collaboration. First, there are personal interactions with my friends. Second, there is the collaboration on specific projects, where the collaboration is part of my job. This would include email with coworkers, and Groove-style collaboration. The third type is the informal collaboration on those topics I have an interest in. This includes the weblog community I read, newsgroups, mailing lists, etc. I participate as I have time to do so.

Sam is talking about collaboration in the absence of a project, and I think it’s the third type he’s referring to. I think the possible cause of his frustration is that we all have our favorite communities that we participate in, even without a specific business need, because we enjoy them. However, I think many folks are just out of time, and have to make choices. When I run out of time, I first drop off of the newsgroups; next to go are the .NET mailing lists, and finally the weblog posts dwindle. Adding more collaboration mediums is great when there’s extra time – but when the crunch comes, something has to go.

Perhaps what we need is a collaboration medium which more effectively aggregates multiple sources into a single “community”. Rather than having to divide time between newsgroups, mailing lists, workspaces, etc., maybe it could be organized by topic. One could participate in the “.NET community” without having to consciously make choices as to the medium. Hmm…