ActiveSpaces Browsers
Well, it has been a little while since my last post, and I think it is high time to post some new information on ActiveSpaces by introducing a concept that is specific to ActiveSpaces and, we believe, extremely useful: Space Browsers.
Space Browsers are a simple-to-use yet powerful feature of ActiveSpaces (AS) that lets programmers implement the following architectural paradigms in order to create truly elastic distributed applications and services:
Iterating over a subset (view) of the entries stored in a Space
Continuous Queries
Using a Space as a distributed Queue
Grid computing paradigms such as Map/Reduce
Space Browsers (and Listeners) complement the Space’s basic put/get/take/lock methods (which require the user to provide a specific value for the key fields, and therefore operate on a single Space entry at a time) with the ability to ‘iterate’ through sets of entries for which the exact key field values are not known in advance.
A Space Browser is an object that can be seen as a sort of continuously updated iterator over entries stored in a Space. I say ‘sort of’ because, like an iterator, it has a ‘next()’ method which returns the next entry to ‘work’ on, but that’s where the similarities stop.
Browsing over a Space view
Let’s look at the first use case: iterating over a ‘view’ of the entries stored in a Space. When a Browser is created over a Space, an optional filter string can be passed to restrict the set of entries the Browser considers.
In ActiveSpaces, filters (which can be applied to both Browsers and Listeners) follow the SQL-92 syntax and implement many (but not yet all) of its clauses. Put simply, a filter is the part of a query that follows the ‘where’ keyword in a ‘select * from Space where …’ query.
For example if a Space ‘customers’ contains Tuples that have an ‘Age’ field, browsing through all of the entries in the Space for people aged less than 30 years old is as simple as passing the string “Age < 30” as a filter when creating the Browser.
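To make the idea of a filtered view concrete, here is a minimal sketch in plain Java. It does not use the ActiveSpaces API at all: a list of maps stands in for the 'customers' Space, a Java predicate stands in for the filter string, and the names FilteredViewSketch, customer and view are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustration only: plain collections stand in for a Space and a filtered Browser.
final class FilteredViewSketch {

    // Hypothetical helper: builds a 'tuple' as a field-name -> value map.
    static Map<String, Object> customer(String name, int age) {
        Map<String, Object> tuple = new HashMap<>();
        tuple.put("Name", name);
        tuple.put("Age", age);
        return tuple;
    }

    // Returns the entries that match the predicate, i.e. the 'view' a filter selects.
    static List<Map<String, Object>> view(List<Map<String, Object>> space,
                                          Predicate<Map<String, Object>> filter) {
        List<Map<String, Object>> matching = new ArrayList<>();
        for (Map<String, Object> tuple : space) {
            if (filter.test(tuple)) {
                matching.add(tuple);
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> customers = new ArrayList<>();
        customers.add(customer("Ann", 25));
        customers.add(customer("Bob", 41));
        customers.add(customer("Cid", 29));

        // Java stand-in for the filter string "Age < 30"
        Predicate<Map<String, Object>> under30 = t -> ((Integer) t.get("Age")) < 30;

        for (Map<String, Object> tuple : view(customers, under30)) {
            System.out.println(tuple.get("Name") + " is " + tuple.get("Age"));
        }
    }
}
```

With the three sample customers above, only Ann and Cid are part of the view. In ActiveSpaces itself the filtering is done for you when the filter string is passed at Browser creation; the sketch only shows which entries such a filter would select.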
Continuous Queries
However, unlike a database query, which works only on a snapshot of the data at a specific point in time, Space Browsers are continuously updated according to the changes in the Space being browsed and can be open-ended, effectively providing a ‘Continuous Query’ capability over the data contained in the Space. To that end, and unlike a more ‘traditional’ iterator object, Browser objects do not have a ‘hasNext()’ method, but rather a timeout value. That value is the amount of time the programmer is willing to let a ‘next()’ call block while waiting for a new entry to return.
Because ActiveSpaces is designed for low latency, a blocked 'next()' call unblocks and returns the newly inserted or updated Tuple as soon as the change happens, no matter how many Listeners or Browsers are interested in that Tuple.
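The 'timeout instead of hasNext()' behaviour can be sketched with nothing but the JDK. In the example below, a BlockingQueue stands in for the stream of entries matching the query. The class BlockingNextSketch and its onPut method are assumptions made for this illustration and are not part of the ActiveSpaces API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustration only: a BlockingQueue stands in for the entries a continuous query delivers.
final class BlockingNextSketch {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final long timeoutMillis;

    BlockingNextSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Hypothetical hook: called when a matching entry is put into the simulated space.
    void onPut(String entry) {
        pending.add(entry);
    }

    // Blocks until an entry is available or the timeout expires; returns null on timeout.
    String next() {
        try {
            return pending.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingNextSketch browser = new BlockingNextSketch(500);
        browser.onPut("existing entry");

        // A writer adds an entry while the reader is already blocked in next()
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            browser.onPut("late entry");
        });
        writer.start();

        String entry;
        while ((entry = browser.next()) != null) {
            System.out.println("next() returned: " + entry);
        }
        System.out.println("next() timed out: nothing new arrived within 500 ms");
        writer.join();
    }
}
```

The reader never asks whether there is a next entry. It simply calls 'next()' and treats a null result as "nothing new for now", which is the same contract the example server code later in this post relies on.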
Using a Space as a distributed Queue
This continuous real-time updating of the Browser also means that changes to the data in the Space are automatically reflected, as they happen, in the list of entries about to be browsed: a Space Browser never gives the user outdated information!
For example, if an entry existed at the time the Browser was created, but this entry gets taken from the space before the Browser’s ‘next()’ method gets to it, then this entry will not be returned by the ‘next()’ method.
Also, unlike a traditional iterator that only allows its users to look at a series of data items, Space Browsers allow users not only to look at entries but also to operate on them in an iterative manner. To that end, each Browser has a ‘type’ that determines the operation applied to the entry when the Browser’s ‘next()’ method is invoked:
The GET Browser’s ‘next()’ method does a ‘get()’ on the next entry to browse (very much like a regular iterator).
The TAKE Browser’s ‘next()’ method, however, does a ‘take()’ on the next entry (that is, it atomically retrieves AND removes the next entry currently available to take from the Space).
The LOCK Browser’s ‘next()’ method does a ‘lock()’ on the next entry to browse (that is, it atomically retrieves AND locks the next entry currently available to lock in the Space).
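The difference between the three types can be sketched with a toy, single-threaded model. This is not the ActiveSpaces implementation: ToySpace, BrowserTypeSketch and the Type enum are hypothetical names, there is no timeout or continuous updating, and the only goal is to show what each kind of 'next()' does to the entry it returns.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustration only: a toy single-threaded model of GET, TAKE and LOCK browsing.
final class BrowserTypeSketch {
    enum Type { GET, TAKE, LOCK }

    // Toy 'space': the stored entries plus the set of currently locked keys.
    static final class ToySpace {
        final Map<String, String> entries = new LinkedHashMap<>();
        final Set<String> lockedKeys = new HashSet<>();
    }

    private final ToySpace space;
    private final Type type;
    private final Deque<String> candidates;

    BrowserTypeSketch(ToySpace space, Type type) {
        this.space = space;
        this.type = type;
        this.candidates = new ArrayDeque<>(space.entries.keySet());
    }

    String next() {
        while (!candidates.isEmpty()) {
            String key = candidates.poll();
            if (!space.entries.containsKey(key)) {
                continue; // entry was taken since this browser was created: skip it
            }
            if (type == Type.TAKE) {
                return space.entries.remove(key); // retrieve AND remove
            }
            if (type == Type.LOCK && !space.lockedKeys.add(key)) {
                continue; // another browser already holds the lock on this entry
            }
            return space.entries.get(key); // GET, or LOCK that just acquired the lock
        }
        return null; // nothing left to browse
    }

    public static void main(String[] args) {
        ToySpace space = new ToySpace();
        space.entries.put("k1", "v1");
        space.entries.put("k2", "v2");
        space.entries.put("k3", "v3");

        BrowserTypeSketch getBrowser = new BrowserTypeSketch(space, Type.GET);
        BrowserTypeSketch takeBrowser = new BrowserTypeSketch(space, Type.TAKE);

        System.out.println("TAKE: " + takeBrowser.next()); // removes k1 from the space
        System.out.println("GET:  " + getBrowser.next());  // k1 is gone, so this skips to k2
        System.out.println("Entries left: " + space.entries.keySet());
    }
}
```

Note how the GET browser never returns the entry that the TAKE browser removed first, even though that entry existed when the GET browser was created.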
Because Browsers are continuously updated in real time according to the changes in the underlying Space, TAKE and LOCK Browsers effectively (and very simply) allow 1-of-n ‘consumption’ of the entries stored in a Space. No matter how many TAKE Browsers are created over a Space (even if those Browsers are created by many separate processes deployed on many separate physical hosts), what is taken from the Space by one Browser’s ‘next()’ method will NEVER be taken by another Browser’s ‘next()’ method. This allows the programmer to use a Space the way one would use a store-and-forward messaging Queue (for temporal and location decoupling, analogous to the ‘master/worker’ pattern in Space-based architecture, for example), except that in our case the Tuple ‘Queue’ is a Space, so it can be distributed and scaled up as needed.
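The 1-of-n guarantee is easy to observe in a small in-process experiment. The sketch below is only an analogy built on the JDK, not ActiveSpaces code: a LinkedBlockingQueue plays the role of the 'request' Space, each worker thread plays the role of a process with its own TAKE Browser, and the class name OneOfNSketch and its consume method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: worker threads compete for requests the way TAKE browsers would.
final class OneOfNSketch {

    // Queues 'requests' items, lets 'workers' threads consume them,
    // and returns how many times each request was taken.
    static Map<Integer, AtomicInteger> consume(int requests, int workers) {
        BlockingQueue<Integer> requestSpace = new LinkedBlockingQueue<>();
        for (int i = 0; i < requests; i++) {
            requestSpace.add(i);
        }

        Map<Integer, AtomicInteger> takenCount = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread worker = new Thread(() -> {
                try {
                    Integer request;
                    // poll() stands in for a TAKE next(): retrieve AND remove, or null on timeout
                    while ((request = requestSpace.poll(200, TimeUnit.MILLISECONDS)) != null) {
                        takenCount.computeIfAbsent(request, k -> new AtomicInteger()).incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(worker);
            worker.start();
        }
        for (Thread worker : threads) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return takenCount;
    }

    public static void main(String[] args) {
        Map<Integer, AtomicInteger> counts = consume(1000, 4);
        boolean exactlyOnce = counts.size() == 1000;
        for (AtomicInteger n : counts.values()) {
            exactlyOnce &= (n.get() == 1);
        }
        System.out.println("Every request taken exactly once: " + exactlyOnce);
    }
}
```

The important difference is that the queue in this sketch lives in a single JVM, whereas a Space gives the same one-taker-per-entry behaviour across separate processes and hosts.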
Writing elastic applications and services made easy
Creating a distributed service is therefore extremely easy with ActiveSpaces. Clients simply put requests in a ‘request’ Space, and servers simply create a TAKE (or LOCK) Browser on that Space. Whenever a new request is put in that Space by a client, it is automatically taken by one (and only one) of the servers for processing, and the reply Tuple can be put in a ‘reply’ Space. All the implementer of that service has to do is invoke ‘next()’.
The code below is copied from the ActiveSpaces example server-side program and shows this feature. In this example, the program waits for a request to be put in the 'request' space (by the client-side example program) and 'processes' it by adding a new field to the request tuple and putting the updated tuple into the 'reply' space. If there are no new requests to process after 500 milliseconds, the program just prints the number of requests it has processed so far and goes back to waiting for more.
// Create a Take Browser on the request space with a timeout of 500 milliseconds
BrowserDef browserDef = BrowserDef.create().setTimeout(500);
Browser takeBrowser = null;
try {
    takeBrowser = metaspace.browse(space_req.getName(), BrowserType.TAKE, browserDef);
} catch (ASException e) {
    System.out.print("Problem creating the take browser: " + e);
    return;
}
// the TakeBrowser allows the use of the 'request' space as a queue
Tuple tuple = null;
SpaceEntry entry = null;
do {
    // Try to take a tuple from the space
    try {
        entry = takeBrowser.next();
        // if next() returned an Entry, we successfully took/consumed the entry from the Space
        if (entry != null) {
            tuple = entry.getTuple();
            // add the replyer field to the tuple and put it into the reply space
            tuple.put("replyer", selfmember);
            space_resp.put(tuple);
            served++;
            if (served % 100 == 0) {
                System.out.println("I have serviced " + served + " requests");
            }
        } else {
            // there is nothing to take: the next() timed out and returned null
            System.out.println("I have serviced " + served + " requests and I am waiting for more");
        }
    } catch (Exception e) {
        System.out.println("Exception in the take browser: " + e);
    }
} while (true);
New server instances can be added (or removed) on the fly, without service interruption, thereby allowing ActiveSpaces users to create ‘write-once, deploy as many as you need’ elastic applications.
Subscription with 'Initial Values'
But that’s not all you can do with a Space Browser: Browsers (and Listeners) have two associated scopes that can refine the set of entries being browsed further than a filter alone.
The ‘time scope’ can be used to narrow down the period of time of interest:
'SNAPSHOT' means that the browser starts with all the entries in the space at the time the browser is created (or initial values), but is not updated with new entries that are put into the space after that moment.
'NEW' means that the browser starts empty, and is updated only with entries (or associated events) put into the space after the moment of the browser’s creation.
'ALL' means that the browser starts with all the entries in the space, and is then seamlessly and continuously updated with new entries as they are added to the Space. Readers familiar with pub/sub messaging systems will note that creating a Listener with a time scope of ‘ALL’ allows them to receive ‘initial values’ before they start receiving future publications.
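The three time scopes can be summed up in a few lines of plain Java. The sketch below is a simplified model and not the ActiveSpaces API: the enum, the class name TimeScopeSketch and the visible method are made up for illustration, and the entries are split by hand into those present when the browser is created and those put afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: which entries a browser would see under each time scope.
final class TimeScopeSketch {
    enum TimeScope { SNAPSHOT, NEW, ALL }

    // 'initial' holds the entries present at browser creation,
    // 'later' holds the entries put into the space after that moment.
    static List<String> visible(TimeScope scope, List<String> initial, List<String> later) {
        List<String> seen = new ArrayList<>();
        if (scope == TimeScope.SNAPSHOT || scope == TimeScope.ALL) {
            seen.addAll(initial); // the 'initial values'
        }
        if (scope == TimeScope.NEW || scope == TimeScope.ALL) {
            seen.addAll(later); // the updates that arrive after creation
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> initial = Arrays.asList("a", "b");
        List<String> later = Arrays.asList("c");
        for (TimeScope scope : TimeScope.values()) {
            System.out.println(scope + " -> " + visible(scope, initial, later));
        }
    }
}
```

With two initial entries 'a' and 'b' and one later entry 'c', SNAPSHOT sees a and b, NEW sees only c, and ALL sees all three in that order.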
Grid computing programming made easy
The other scope of Browsers and Listeners is the ‘distribution scope’ and it can be used to narrow down the set of entries according to their distribution over the Space's seeders:
‘ALL’ means that all of the entries stored in the Space will be considered, regardless of where they are being seeded.
‘SEEDED’ means that only the entries assigned to the local metaspace member on which the Browser/Listener is created will be considered.
What may not be obvious right away is that the distribution scope of ‘SEEDED’ is the gateway to easy creation of grid-computing applications using (for example) a Map/Reduce pattern.
The Map/Reduce pattern can be described as follows: a very large data-set is divided into a set of Key/Value pairs (or, in the case of ActiveSpaces, a set of Tuples) that are evenly distributed over a set of nodes (i.e. a Space’s seeders): this is the ‘mapping’ phase. Each node then processes the subset of the data that was assigned to it and puts the result of its processing (a smaller or equal set of Tuples) in another Space for the client to retrieve (or for another set of nodes to process): this is the ‘reduce’ phase. Most grid computing applications can be architected as a series of ‘Map’ and ‘Reduce’ phases.
This type of grid computing application is very easy to create using ActiveSpaces. ‘Mappers’ are simply nodes that join a ‘mapping’ Space as seeders; each one creates a Browser with a distribution scope of ‘SEEDED’ on that Space and just needs to invoke ‘next()’ on that Browser. Every time a new subset of the data is mapped to a specific node, that node’s ‘next()’ method returns the new tuple to process. Alternatively, a ‘control’ Space could be used to coordinate which mapping or reduce phase of the overall process is currently being executed, if a more ‘batch’ rather than ‘event-driven’ type of processing is desired.
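The flow described above can be simulated in a single process to show its shape. Everything in the sketch below is an assumption made for illustration: the placement rule (key hash modulo the number of seeders) is not how ActiveSpaces actually distributes entries, the per-node work (counting characters) is arbitrary, and the names SeededMapReduceSketch, seederFor, seededView and processLocally are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: each simulated seeder processes just the entries placed on it.
final class SeededMapReduceSketch {

    // Hypothetical placement rule: hash of the entry modulo the number of seeders.
    static int seederFor(String key, int seeders) {
        return Math.floorMod(key.hashCode(), seeders);
    }

    // Stand-in for a 'SEEDED' distribution scope: only the entries placed on this seeder.
    static List<String> seededView(List<String> space, int seeder, int seeders) {
        List<String> local = new ArrayList<>();
        for (String entry : space) {
            if (seederFor(entry, seeders) == seeder) {
                local.add(entry);
            }
        }
        return local;
    }

    // Per-node work: here, the total number of characters in the local entries.
    static int processLocally(List<String> localEntries) {
        int total = 0;
        for (String entry : localEntries) {
            total += entry.length();
        }
        return total;
    }

    // Every seeder puts one partial result into a 'result space'; the client adds them up.
    static int run(List<String> space, int seeders) {
        List<Integer> resultSpace = new ArrayList<>();
        for (int seeder = 0; seeder < seeders; seeder++) {
            resultSpace.add(processLocally(seededView(space, seeder, seeders)));
        }
        int total = 0;
        for (int partial : resultSpace) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("alpha", "beta", "gamma");
        System.out.println("Total characters, computed over 3 seeders: " + run(data, 3));
    }
}
```

Because each entry lands on exactly one seeder, the combined result is the same no matter how many seeders take part, which is the property that lets nodes be added or removed without changing the application logic.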
Conclusion
In conclusion, while I haven’t described all of the features of Space Browsers and Listeners (the EventBrowsers, for example), I hope I have been able to show the power of the Browser concept, the various ways Browsers can be used, and how they greatly simplify the work of application programmers by letting them easily create elastic distributed/grid-enabled applications and services.
