A language for expressing parallel processing based on an
AssociativeMemory data structure that provides a repository (mathematically a
bag; that is, a set that can contain multiple copies of the same item) for tuples (see
TupleDefinition). The structure has six primitive operations:
put,
copy,
take,
try_copy,
try_take, and
eval.
put places a tuple into the bag.
copy finds and reads a tuple.
take is like
copy but also removes a tuple after reading it.
Copy and find block if they cannot find a tuple matching the query;
try_copy and
try_find are the non-blocking versions.
Eval forks a new process.
TupleSpace is an
AssociativeMemory, which means that tuples are accessed not by their
address but rather by their
content and type.
The repository is "generative" in that (see:
GenerativeCommunication) because a tuple generated by a process has an
independent existence in
TupleSpace. Any process may remove the tuple, and the tuple is bound to
no process in particular. any process can reference any Tuple regardless of where that Tuple is stored. A
TupleSpace is a logical concept and does not require an underlying physical shared memory
One strength of the model is its ability to describe parallel algorithms without reference to any specific computer architecture.
The concept of tuple space was pioneered by
DavidGelernter at the
YaleLindaGroup with the
LindaTupleSpaces system and was initially offered in
FortranLanguage and
CeeLanguage languages. However, various other
TupleSpace systems exist today such as LiPS or Actor
Spaces for many languages including the
SmalltalkLanguage,
JavaLanguage,
LispLanguage and
RubyLanguage (
DistributedRuby). For some reason, the
JavaLanguage is experiencing an explosion of generative systems including
JavaSpaces, TSpace, Page
Spaces,
OpenSpaces, and so on.
Systems based on the
BlackboardMetaphor can be simply created in
TupleSpace. Tuple spaces are more general than the
BlackboardMetaphor (which has specialists grabbing information scoped to their specialty), for example, they can also be used for
ComputeServer.
For criticisms of the
TupleSpace model relative to
ConcurrentLogicProgramming,
MessagePassingConcurrency and
CommunicatingSequentialProcesses, see
https://www.cypherpunks.to/erights/history/clp/linda-letters.pdf (starting on page 4 of the PDF).
Putting mutable reference objects into a tuple may incur some problems that subvert some of the benefit for using tuple spaces.
'Introductory links'
What is a TupleSpace?; by comparing it to other similar things
vs message queue
Messages go from one person to another (or to many others).
TupleSpace is a bag with tuples in it. Each process can then find a tuple that matches their search criteria and remove it from the bag. If you choose to have as a field in your tuple a destination process, and if selection is done on that field, then tuples can sort of simulate message passing. By copying rather than removing tuples you can also simulate messages to groups, and broadcast messages.
However, one of the key differences between a
TupleSpace and a message queue involves the temporal aspect. Generally with a message queue once the message has reached the front of the queue and been processed it is gone. With a
TupleSpace that does not have to happen. A Tuple can be placed into the
TupleSpace and it may never be taken. Or it may be taken in a month. Or it might already have been taken by the time you read this sentence.
In addition,
MessageQueue has ordering of the messages, which is not maintained by a
TupleSpace.
vs SQL DB
A
TupleSpace is arguably a more primitive construct than a SQL DBMS. A SQL DBMS could conceivably be built using a
TupleSpace implementation as its storage layer. It would, however, be an awkward
AbstractionInversion to create a
TupleSpace using an SQL DBMS, especially in cases where
TupleSpace functionality is needed without requiring other aspects of SQL or DBMSes.
So it's almost like a database assembler language?
[I don't know. What's a database assembler language? There are various technologies that can underpin a relational database management system.
TupleSpace is one possibility, key/value stores (like the Berkeley DB) is another.]
Implementations of TupleSpaces
Scalability discussion
A question. On
TupleSpaceScalability, I just made the assertion that a distributed, persistent, transactional, fault tolerant
AssociativeMemory cannot scale linearly; that is,
TupleSpace won't work on the Internet, precisely because of its underlying "distributed shared memory" idea. Some assumptions have to be relaxed slightly; I've indicated one possible direction there, but there must be others. What do you think? --
VladimirSlepnev
Discussion
I am a HUGE fan of tuple spaces. They are the most powerful execution environment there is, since the guillotine has gone out of favor. No contest. I believe it can be optimally configured to model any execution environment. Static binding, no problem, dynamic binding, no problem, distribution, no problem, transactions, no problem, persistence, no problem, and so on. These Yale guys are sitting on a gold mine. All we need is a fast O/S mapping and this would take over. No doubt. Word :). --
RichardHenderson
A gold mine? Does YaleUniversity (or anyone else for that matter) hold patents on TupleSpace?
I'm thinking it wouldn't be such a hideous
AbstractionInversion to use an RDBMS to implement a
TupleSpace.
Instead of having different kinds of tuples, you simply have records in different tables. Since most modern RDBMSs provide in-memory tables, you have the possibility creating/consuming large numbers of them without hitting the disk space (so long as you can keep your tables small enough to fit into memory). The transactional nature of such things would seem to be a good fit for Linda's atomic operations. Additionally, if something happened which caused a transaction to fail, the ability to roll-back to a known state and try again would be very helpful. In this respect, I see quite a bit of the development surrounding Linda as re-inventing the RDBMS wheel. Naturally, you'd want to do it in a more lightweight fashion, so much as possible. Eliminating the SQL engine and providing an API to access data/perform operations would be a good start.
Many of the Linda tasks you create are based on "consume a tuple of this type, do X calculations with it, emit a tuple of this other type, repeat as necessary." This would require something repeatedly polling the database, looking for an input tuple to consume. If you could use triggers, which spawn or execute that process every time a record is written to the appropriate table, you get an interrupt-driven process instead of a polling-driven process; from there, it becomes a question of the DB server's ability to manage large numbers of spanwed tasks. Add in the fact that, for many, modern RDBMSs you can have a cluster of machines, sharding the data from a specific set of tables across multiple physical machines, and you have the ability to scale across multiple physical machines for increased performance. Indeed, while you may want some of the records persisted for actual consumption outside of the process, if the vast majority of "intermediate" results, working toward those records, can be done entirely in memory, it may be worth your while to have several physical machines which only shard the in-memory tables and handle the bulk of the triggered processes.
While such a system would be a bit of overkill, I suspect there are more people who are well-versed in the care and feeding of RDBMSs than there are developing with Linda. This could be a good "jumping off point" for introducing more people, inside and outside of various enterprises, to the concepts in Linda.
Are there papers out there about people already trying this? Or have tried this? I have to think I'm not the only person thinking in this direction, but I'm coming up short on hits on Google. It's possible I'm just not searching on the right terms, yet. -- Meower68
It's not clear to me what RAM has to do with it. That's more or less of an implementation detail, not a database "user" interface issue. Often it's best to focus on the UI/language/interface aspects of a "technology" first, and then
consider implementation issues. Of course it's not always seamless because machine/performance-issue trade-offs often dictate what's available or feasible.
To clarify, one of the performance advantages which a
TupleSpace frequently has over an RDBMS is that the
TupleSpace lives only in RAM. I'm merely suggesting that a modern RDBMS, with memory-backed tables, would bring the performance closer to parity.
[Any DBMS or
TupleSpace developed in the last few decades makes extensive use of caching, to the point that disk speed generally isn't an issue except (possibly) at startup whilst the cache is being populated, or if the pattern of retrievals results in a significant quantity of cache misses. The most notable conceptual distinction between a
TupleSpace and an RDBMS is that in an RDBMS, tuple storage must be organised into a finite quantity of predefined tables.
TupleSpaces do not have this structural constraint. Depending on the application, this may be a help or a hindrance. If there are a wide variety of tuple types, the overhead of creating one table per tuple type (or of using some schema that supports arbitrary tuple types, such as EAV) may prohibit using an RDBMS.]
(See related discussion at
MultiParadigmDatabaseDiscussion.)
Wow. I'm almost sorry I brought this up.
When designing an application which will run in a
TupleSpace, you typically have some idea, up front, as to what kind of tuples you will need to create. In that case, it's pretty easy to just create tables and read/write records to them. To avoid getting hung up from the beginning on the whole
MultiParadigmDatabase argument, and what is and is not implemented, I'm asking if anyone has done something like this, using SQL and whatever stored procedure language the server implements. I recognize this is not what RDBMSs were created for, but it would appear that one could play around with this, possibly resulting in something useful. Does anyone know if this has been tried?
If the answer is
no, no one has tried this, fine; that is a valid answer. It's possible that what I'm asking is simply stretching/abusing such systems too much. If someone knows of someone who
has tried this, please provide some references where I can learn more about it. -- Meower68
I'm not aware of anyone who has tried this. I'm not sure what you expect the outcome to be, other than (perhaps) a somewhat slower than usual TupleSpace implementation.
Getting most Fortune 500 companies to provision systems, install a
TupleSpace and experiment with it is, in my experience, a bridge too far. All of them, however, have RDBMSs in place. Those are reasonably well-understood technologies. Many of them have clustered, sharded databases in place, both for production work and on test/development levels. Which is more likely to get a positive response from the typical F500 Pointy Haired Boss?
- We need a higher-performance way to do X. I have some ideas for how to do it, utilizing existing infrastructure, albeit in an unconventional fashion. Can I get some time to experiment with it?
- We need a higher-performance way to do X. Can I get some servers provisioned, on the development layer, and install a bunch of new software, as well as some time to experiment with it?
A dedicated
TupleSpace would perform better; on that we agree. Additional infrastructure, hardware and software, especially software with which no one currently has experience, is a harder sell to management. If, however, you can get the RDBMS solution working and show significant performance improvement, you may be able to lead them toward a dedicated
TupleSpace. -- Meower68
If your organisation does not have a culture of experimentation, it's no more likely to let you experiment on a DBMS than on provisioned servers or anything else. From the company's point of view, the difficult (and usually impossible) sell is "can I get some time to experiment with it" -- unless you were explicitly hired to conduct such experiments. And that's rare in organisations without an experiment-oriented culture. Usually, if "innovation" can't be bought from Oracle or Microsoft along with a full support contract, or implemented entirely within an Excel spreadsheet, it's not going to happen. On the other hand, if your organisation does support experimentation, it's as likely to support provisioning a server or two as it is to allow you your own experimental database. Indeed, with modern data centre infrastructure, provisioning servers may require no more expense and effort than launching a couple more virtual machine instances on the corporate private cloud.
Time for a Tuple-Ware party? :-)
Contributors include
LukeGorrie,
AndersBengtsson,
AravindAjad,
RobertDiFalco,
BayleShanks.
See
MultiCaster,
AssociativeMemory,
GenerativeCommunication,
MultiParadigmDatabase
AugustThirteen