TupleSpace

Last edit November 3, 2014

A language for expressing parallel processing based on an AssociativeMemory data structure that provides a repository (mathematically a bag; that is, a set that can contain multiple copies of the same item) for tuples (see TupleDefinition). The structure has six primitive operations: put, copy, take, try_copy, try_take, and eval. put places a tuple into the bag. copy finds and reads a tuple. take is like copy but also removes a tuple after reading it. Copy and find block if they cannot find a tuple matching the query; try_copy and try_find are the non-blocking versions. Eval forks a new process.

TupleSpace is an AssociativeMemory, which means that tuples are accessed not by their address but rather by their content and type.

The repository is "generative" in that (see: GenerativeCommunication) because a tuple generated by a process has an independent existence in TupleSpace. Any process may remove the tuple, and the tuple is bound to no process in particular. any process can reference any Tuple regardless of where that Tuple is stored. A TupleSpace is a logical concept and does not require an underlying physical shared memory

One strength of the model is its ability to describe parallel algorithms without reference to any specific computer architecture.

The concept of tuple space was pioneered by DavidGelernter at the YaleLindaGroup with the LindaTupleSpaces system and was initially offered in FortranLanguage and CeeLanguage languages. However, various other TupleSpace systems exist today such as LiPS or ActorSpaces for many languages including the SmalltalkLanguage, JavaLanguage, LispLanguage and RubyLanguage (DistributedRuby). For some reason, the JavaLanguage is experiencing an explosion of generative systems including JavaSpaces, TSpace, PageSpaces, OpenSpaces, and so on.

Systems based on the BlackboardMetaphor can be simply created in TupleSpace. Tuple spaces are more general than the BlackboardMetaphor (which has specialists grabbing information scoped to their specialty), for example, they can also be used for ComputeServer.

For criticisms of the TupleSpace model relative to ConcurrentLogicProgramming, MessagePassingConcurrency and CommunicatingSequentialProcesses, see https://www.cypherpunks.to/erights/history/clp/linda-letters.pdf (starting on page 4 of the PDF).

Putting mutable reference objects into a tuple may incur some problems that subvert some of the benefit for using tuple spaces.

'Introductory links'

http://software-carpentry.org/blog/2011/03/tuple-spaces-or-good-ideas-dont-always-win.html
Linda in Context, http://www.cs.uwaterloo.ca/~fmavadda/p444-carriero.pdf (haven't finished it yet, and I'd prefer a little less context ;-))
JavaSpacesPrinciplesPatternsAndPractice (Freeman, Hupfer, Arnold) is a very good introduction to tuple spaces in general, even though it is centered around JavaSpaces which is a bit different.
http://web.archive.org/web/20090925185219/http://phi.sinica.edu.tw/instruct/workshop/html/linda/linda.html
http://www-unix.mcs.anl.gov/dbpp/text/node44.html

What is a TupleSpace?; by comparing it to other similar things

vs message queue Messages go from one person to another (or to many others). TupleSpace is a bag with tuples in it. Each process can then find a tuple that matches their search criteria and remove it from the bag. If you choose to have as a field in your tuple a destination process, and if selection is done on that field, then tuples can sort of simulate message passing. By copying rather than removing tuples you can also simulate messages to groups, and broadcast messages.

However, one of the key differences between a TupleSpace and a message queue involves the temporal aspect. Generally with a message queue once the message has reached the front of the queue and been processed it is gone. With a TupleSpace that does not have to happen. A Tuple can be placed into the TupleSpace and it may never be taken. Or it may be taken in a month. Or it might already have been taken by the time you read this sentence.

In addition, MessageQueue has ordering of the messages, which is not maintained by a TupleSpace.

vs SQL DB

A TupleSpace is arguably a more primitive construct than a SQL DBMS. A SQL DBMS could conceivably be built using a TupleSpace implementation as its storage layer. It would, however, be an awkward AbstractionInversion to create a TupleSpace using an SQL DBMS, especially in cases where TupleSpace functionality is needed without requiring other aspects of SQL or DBMSes.

So it's almost like a database assembler language?

[I don't know. What's a database assembler language? There are various technologies that can underpin a relational database management system. TupleSpace is one possibility, key/value stores (like the Berkeley DB) is another.]

Implementations of TupleSpaces

LindaTupleSpaces
In JavaLanguage
- Sun's JavaSpaces
  - GigaSpaces, a commercial implementation - http://www.gigaspaces.com
  - OpenSpaces http://openspaces.exolab.org/ (BrokenLink)
- IBM's TSpaces - http://www.almaden.ibm.com/cs/TSpaces/
- ActiveSpace - http://activespace.codehaus.org/
- Jada - http://www.cs.unibo.it/~rossi/jada/
- LighTS http://lights.sourceforge.net/
- jxtaspaces for P2P http://jxtaspaces.jxta.org/
- GLOBE http://www.diku.dk/hjemmesider/studerende/eglarsen/GLOBE/
- Blitz http://www.dancres.org/blitz/
In C
- Scientific Computing Associates' TCP-Linda http://lindaspaces.com/about/index.html
- LinuxTuples http://linuxtuples.sourceforge.net/
- W.F. Wong's Simple C-Linda with Posix Threads http://www.comp.nus.edu.sg/~wongwf/linda.html
HTML Page Spaces -- http://flp.cs.tu-berlin.de/pagespc/
Masatoshi Seki's Rinda, part of DistributedRuby dRuby (a.k.a. DRb?) http://www2a.biglobe.ne.jp/~seki/ruby/druby.en.html (this is part of the standard ruby distribution now)
Ruple where the tuples are replaced by XML documents (well-formed, but not necessarily validated) - http://www.roguewave.com/developer/tac/ruple (Ceased development as of 20 May 2003)
p4-Linda - part of the p4 parallel programming system (no longer supported)
- http://www-fp.mcs.anl.gov/~lusk/p4/index.html
- ftp://info.mcs.anl.gov/pub/p4
PyBrenda in Python http://www.snurgle.org/~pybrenda/
PyLinda. Linda imp. in Python http://www-users.cs.york.ac.uk/~aw/pylinda/
in Python, R, Matlab: NetWorkSpaces, http://www.ddj.com/web-development/200001971?pgno=1
in ToolCommandLanguage Tcl: http://wiki.tcl.tk/3947
TripleSpace is where the tuples are RDF triples: http://www.deri.ie/fileadmin/documents/DERI-TR-2004-05-31.pdf

Scalability discussion

A question. On TupleSpaceScalability, I just made the assertion that a distributed, persistent, transactional, fault tolerant AssociativeMemory cannot scale linearly; that is, TupleSpace won't work on the Internet, precisely because of its underlying "distributed shared memory" idea. Some assumptions have to be relaxed slightly; I've indicated one possible direction there, but there must be others. What do you think? -- VladimirSlepnev

Discussion

I am a HUGE fan of tuple spaces. They are the most powerful execution environment there is, since the guillotine has gone out of favor. No contest. I believe it can be optimally configured to model any execution environment. Static binding, no problem, dynamic binding, no problem, distribution, no problem, transactions, no problem, persistence, no problem, and so on. These Yale guys are sitting on a gold mine. All we need is a fast O/S mapping and this would take over. No doubt. Word :). -- RichardHenderson

A gold mine? Does YaleUniversity (or anyone else for that matter) hold patents on TupleSpace?

I'm thinking it wouldn't be such a hideous AbstractionInversion to use an RDBMS to implement a TupleSpace.

Instead of having different kinds of tuples, you simply have records in different tables. Since most modern RDBMSs provide in-memory tables, you have the possibility creating/consuming large numbers of them without hitting the disk space (so long as you can keep your tables small enough to fit into memory). The transactional nature of such things would seem to be a good fit for Linda's atomic operations. Additionally, if something happened which caused a transaction to fail, the ability to roll-back to a known state and try again would be very helpful. In this respect, I see quite a bit of the development surrounding Linda as re-inventing the RDBMS wheel. Naturally, you'd want to do it in a more lightweight fashion, so much as possible. Eliminating the SQL engine and providing an API to access data/perform operations would be a good start.

Many of the Linda tasks you create are based on "consume a tuple of this type, do X calculations with it, emit a tuple of this other type, repeat as necessary." This would require something repeatedly polling the database, looking for an input tuple to consume. If you could use triggers, which spawn or execute that process every time a record is written to the appropriate table, you get an interrupt-driven process instead of a polling-driven process; from there, it becomes a question of the DB server's ability to manage large numbers of spanwed tasks. Add in the fact that, for many, modern RDBMSs you can have a cluster of machines, sharding the data from a specific set of tables across multiple physical machines, and you have the ability to scale across multiple physical machines for increased performance. Indeed, while you may want some of the records persisted for actual consumption outside of the process, if the vast majority of "intermediate" results, working toward those records, can be done entirely in memory, it may be worth your while to have several physical machines which only shard the in-memory tables and handle the bulk of the triggered processes.

While such a system would be a bit of overkill, I suspect there are more people who are well-versed in the care and feeding of RDBMSs than there are developing with Linda. This could be a good "jumping off point" for introducing more people, inside and outside of various enterprises, to the concepts in Linda.

Are there papers out there about people already trying this? Or have tried this? I have to think I'm not the only person thinking in this direction, but I'm coming up short on hits on Google. It's possible I'm just not searching on the right terms, yet. -- Meower68

It's not clear to me what RAM has to do with it. That's more or less of an implementation detail, not a database "user" interface issue. Often it's best to focus on the UI/language/interface aspects of a "technology" first, and then consider implementation issues. Of course it's not always seamless because machine/performance-issue trade-offs often dictate what's available or feasible.

To clarify, one of the performance advantages which a TupleSpace frequently has over an RDBMS is that the TupleSpace lives only in RAM. I'm merely suggesting that a modern RDBMS, with memory-backed tables, would bring the performance closer to parity.

[Any DBMS or TupleSpace developed in the last few decades makes extensive use of caching, to the point that disk speed generally isn't an issue except (possibly) at startup whilst the cache is being populated, or if the pattern of retrievals results in a significant quantity of cache misses. The most notable conceptual distinction between a TupleSpace and an RDBMS is that in an RDBMS, tuple storage must be organised into a finite quantity of predefined tables. TupleSpaces do not have this structural constraint. Depending on the application, this may be a help or a hindrance. If there are a wide variety of tuple types, the overhead of creating one table per tuple type (or of using some schema that supports arbitrary tuple types, such as EAV) may prohibit using an RDBMS.]

(See related discussion at MultiParadigmDatabaseDiscussion.)

Wow. I'm almost sorry I brought this up.

When designing an application which will run in a TupleSpace, you typically have some idea, up front, as to what kind of tuples you will need to create. In that case, it's pretty easy to just create tables and read/write records to them. To avoid getting hung up from the beginning on the whole MultiParadigmDatabase argument, and what is and is not implemented, I'm asking if anyone has done something like this, using SQL and whatever stored procedure language the server implements. I recognize this is not what RDBMSs were created for, but it would appear that one could play around with this, possibly resulting in something useful. Does anyone know if this has been tried?

If the answer is no, no one has tried this, fine; that is a valid answer. It's possible that what I'm asking is simply stretching/abusing such systems too much. If someone knows of someone who has tried this, please provide some references where I can learn more about it. -- Meower68

I'm not aware of anyone who has tried this. I'm not sure what you expect the outcome to be, other than (perhaps) a somewhat slower than usual TupleSpace implementation.

Getting most Fortune 500 companies to provision systems, install a TupleSpace and experiment with it is, in my experience, a bridge too far. All of them, however, have RDBMSs in place. Those are reasonably well-understood technologies. Many of them have clustered, sharded databases in place, both for production work and on test/development levels. Which is more likely to get a positive response from the typical F500 Pointy Haired Boss?

We need a higher-performance way to do X. I have some ideas for how to do it, utilizing existing infrastructure, albeit in an unconventional fashion. Can I get some time to experiment with it?
We need a higher-performance way to do X. Can I get some servers provisioned, on the development layer, and install a bunch of new software, as well as some time to experiment with it?

A dedicated TupleSpace would perform better; on that we agree. Additional infrastructure, hardware and software, especially software with which no one currently has experience, is a harder sell to management. If, however, you can get the RDBMS solution working and show significant performance improvement, you may be able to lead them toward a dedicated TupleSpace. -- Meower68

If your organisation does not have a culture of experimentation, it's no more likely to let you experiment on a DBMS than on provisioned servers or anything else. From the company's point of view, the difficult (and usually impossible) sell is "can I get some time to experiment with it" -- unless you were explicitly hired to conduct such experiments. And that's rare in organisations without an experiment-oriented culture. Usually, if "innovation" can't be bought from Oracle or Microsoft along with a full support contract, or implemented entirely within an Excel spreadsheet, it's not going to happen. On the other hand, if your organisation does support experimentation, it's as likely to support provisioning a server or two as it is to allow you your own experimental database. Indeed, with modern data centre infrastructure, provisioning servers may require no more expense and effort than launching a couple more virtual machine instances on the corporate private cloud.

Time for a Tuple-Ware party? :-)

Contributors include LukeGorrie, AndersBengtsson, AravindAjad, RobertDiFalco, BayleShanks.

See MultiCaster, AssociativeMemory, GenerativeCommunication, MultiParadigmDatabase

AugustThirteen