Talk:Thunk
This article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
Untitled
On the topic of LISP..
Not so. A thunk stored in a SASL list allows evaluation of the thunk to be delayed until the value is absolutely needed. A common example is a list of all prime numbers, where the list elements are represented by a thunk that is used to derive the primes; the list's contents are filled in with primes as the thunk provides them, so repeated accesses to the same list elements take constant time because they have already been evaluated. The thunk requires access to the list indices (the thunk is implemented as a lambda function).
Actually, a lambda function can be used to create a list of thunks, but the thunks are determined by the lambda function, so... It's been a while since I've coded in LISP, not since I learned the language; it's great for learning, but I wouldn't code in it. Ask Suzanne Sluizer, she knows.
--Rofthorax 09:56, 24 August 2005 (UTC)
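A rough sketch of the prime-list idea above, written in Python rather than SASL (LazyList, primes_from and nth are hypothetical names). Each unevaluated tail of the list is a zero-argument function; forcing a tail computes the next prime once and caches the resulting cell, so a later walk over the same cells follows stored links instead of recomputing primes.

```python
class LazyList:
    """A cons cell whose tail is a thunk until it is first forced (hypothetical helper)."""

    def __init__(self, head, tail_thunk):
        self.head = head
        self._tail_thunk = tail_thunk  # zero-argument function producing the next cell
        self._tail = None              # cached cell once the thunk has been forced

    def tail(self):
        # Force the thunk at most once; afterwards return the cached cell.
        if self._tail_thunk is not None:
            self._tail = self._tail_thunk()
            self._tail_thunk = None
        return self._tail


def primes_from(n, smaller_primes):
    """Lazy list of primes >= n, given every prime below n in smaller_primes."""
    while any(n % p == 0 for p in smaller_primes):
        n += 1
    known = smaller_primes + [n]
    # The tail is not computed here: it is frozen in a lambda (the thunk).
    return LazyList(n, lambda: primes_from(n + 1, known))


def nth(cells, i):
    # Walk i cells, forcing only the thunks that have not been forced yet.
    for _ in range(i):
        cells = cells.tail()
    return cells.head


PRIMES = primes_from(2, [])
print(nth(PRIMES, 9))   # 29
```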
No No No! Thunking is more general than that! Probably needs a reference to Algol implementations which used thunking... -- 81.79.64.46 11:14, 3 May 2004 (UTC)
For a good discussion, see:
http://compilers.iecc.com/comparch/article/98-03-043
The legend that I heard was that, generically, a "thunk" is a function (or procedure) that takes no arguments and returns no value, and that the term was coined by Donald Knuth in The Art of Computer Programming as an anagram of his surname and as a way to describe a minimal function. I don't have my copy handy to verify. But this seems like a general definition which would apply to invoking the continuation of closures, which usually take much longer than a little thunk should.
According to the Internet's famous Jargon File:
:thunk: /thuhnk/ n.
1. "A piece of coding which provides an address", according to P. Z. Ingerman, who invented thunks in 1961 as a way of binding actual parameters to their formal definitions in Algol-60 procedure calls. If a procedure is called with an expression in the place of a formal parameter, the compiler generates a {thunk} to compute the expression and leave the address of the result in some standard location.
2. Later generalized into: an expression, frozen together with its environment, for later evaluation if and when needed (similar to what in techspeak is called a `closure'). The process of unfreezing these thunks is called `forcing'.
3. A {stubroutine}, in an overlay programming environment, that loads and jumps to the correct overlay. Compare {trampoline}.
4. People and activities scheduled in a thunklike manner. "It occurred to me the other day that I am rather accurately modeled by a thunk --- I frequently need to be forced to completion." --- paraphrased from a {plan file}.
Historical note: There are a couple of onomatopoeic myths circulating about the origin of this term. The most common is that it is the sound made by data hitting the stack; another holds that the sound is that of the data hitting an accumulator. Yet another holds that it is the sound of the expression being unfrozen at argument-evaluation time. In fact, according to the inventors, it was coined after they realized (in the wee hours after hours of discussion) that the type of an argument in Algol-60 could be figured out in advance with a little compile-time thought, simplifying the evaluation machinery. In other words, it had `already been thought of'; thus it was christened a `thunk', which is "the past tense of `think' at two in the morning".
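To make sense 1 concrete, here is a toy model of an address-returning thunk for an actual parameter such as A[i]. Python has no raw addresses, so a "location" is assumed to be a (container, index) pair; Location, element_thunk and p are illustrative names, not anything from Ingerman's paper.

```python
class Location:
    """Stands in for an address: a container plus an index into it."""

    def __init__(self, container, index):
        self.container, self.index = container, index

    def load(self):
        return self.container[self.index]

    def store(self, value):
        self.container[self.index] = value


state = {"i": 0}     # plays the role of a global Algol variable i
A = [1, 10, 100]


def element_thunk():
    # What a compiler might generate for the actual parameter A[i]:
    # recompute the subscript from the current i and return the location.
    return Location(A, state["i"])


def p(x):
    # Every use of the formal parameter calls the thunk again.
    x().store(x().load() * 2)   # doubles A[0]
    state["i"] += 1             # changing i changes which element x denotes
    x().store(x().load() * 2)   # doubles A[1]


p(element_thunk)
print(A)   # [2, 20, 100]
```

Because the subscript is recomputed on each call, the procedure can both read and assign through the parameter, and the element it denotes follows the current value of i.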
- I'd read this before too, which is why the "no known root" at the top of the article surprised me. Could someone paraphrase the "past tense of think" explanation in the article? I'm too tired to do it well right now. PeteVerdon 21:34, 2 January 2006 (UTC)
- Is the Jargon File really reliable enough to be included in encyclopedic content? If there's an actual non-speculative source for the "past tense of think" explanation, that ought to be cited -- otherwise I don't think that the possible etymology of the word is really important enough to be included, since there are at least three distinct, unverified etymologies floating around. --bmills 20:25, 5 January 2006 (UTC)
- who would have thunk it? --ZhuLien 2:14, 26 March 2007 (UTC)
- The link to the Ingerman article is 404, but a friend wrote to me: “As an ACM member, I have easy access to the "thunk" article. It is attached. Ingerman does not give etymology, but he mentions the charming term "FUSBUDGET mechanism".” I've corrected the volume number (Vol. 4, not Vol. 1); see January 1961 Table of Contents. -- Thnidu (talk) 00:45, 20 May 2010 (UTC)
Splittify me!
As far as my mind can tell (which is quite far, I believe), two meanings of the word "thunk" are used on this page. I feel that each should have a separate page... I dunno why, though :-) --Ihope127 03:42, 22 August 2005 (UTC)
- I reckon all 3 meanings should be split. The non-computer term is, I'd guess, the most commonly used one, as the computer terms would only be used by programmers. --ZhuLien 13:08, 23 July 2006 (GMT+10)
- Definitely. The senses I can identify are:
- Delayed evaluation, explicitly or as part of call by name/need. (Thunk (functional programming)?)
- OS/2 / Windows address space conversion.
- Dynamic linking. (Sounds very much like trampolining.)
- I'm not sure I get the "to describe a specific type of adapter" section; it sounds very non-canonical. --Piet Delport 14:51, 13 April 2007 (UTC)
- I agree, as well. These seem like different things, each significant enough to merit an individual article. Thunk 04:03, 29 May 2007 (UTC)
- Don't split. These are not unrelated concepts; they are born of similar needs in loader design. For someone who is trying to design/change/improve a loader, it's nice to have one article that discusses all the various manifestations of the general idea of thunking. It's not like this article is too long, or anything. linas (talk) 20:42, 27 December 2007 (UTC)
- To be clear, the thunk in object-oriented programming has a flavor of run-time reference resolution (which is, in fact, the one place where OO programmers do get explicitly bitten by thunking mistakes), and it has a flavor of dynamic linking resolution. So, although at some superficial level the different implementations of thunks discussed in this article seem distinct, they are in fact different manifestations of the same core design requirement. linas (talk) 20:54, 27 December 2007 (UTC)
I say split. Thunks are a mechanism for delayed computation of values. The OOP-related thing and the Microsoft-specific "hack" stuff definitely have nothing to do with that; those sound more like "stubs", which is the term I would use for bits of code that stand in for calling other code. Of course I'm not saying we should get rid of that info, but it could be split into separate articles or perhaps merged into something more relevant. --cos —Preceding unsigned comment added by 129.67.53.183 (talk) 23:40, 22 October 2009 (UTC)
- I say split thunk(computing) from thunk(dialectics). The multiple algorithmic senses of Thunk should be grouped on one page. The sense of Thunk, that is more like a dialectical koan, should be on its own page. U664003803 (talk) 23:44, 25 February 2010 (UTC)
I agree, there is a need to split. The third point is definitely specific to Windows and should be on a different page from the first two generic notions. Should we first agree on that? (And on dialectics also, of course!) —Preceding unsigned comment added by 134.157.168.24 (talk) 16:47, 26 February 2010 (UTC)
- Where is thunk(dialectics)? Why has the first section been completely removed? —Preceding unsigned comment added by 83.202.168.142 (talk) 13:32, 7 March 2010 (UTC)
Flat thunk
I removed the following and replaced it with a shorter summary based on the first half:
A flat thunk consists of a pair of DLLs (one 32-bit and one 16-bit) that are used to translate calls from 32-bit code to 16-bit code. To allow the two DLLs to communicate, some intermediate code must be used to translate memory addresses (pointers) between platforms. If you have any past experience with 16-bit memory addressing, you may recall that a 16-bit pointer consists of a memory segment and an offset into that segment. This is different from a 32-bit pointer, which is an absolute address of the memory being accessed. So the problem, in a nutshell, is translating segment + offset pointers into absolute addresses. VB programmers don't usually need to worry about things like memory pointers, but ALL software is ultimately based on memory pointers. The IDE and the programming language hide these ugly details from us, but when you get right down to it, every variable, function, sub, etc. that you write (in any language) has an address in memory. Now imagine a 16-bit DLL being loaded into a 32-bit process where none of the memory addresses match up on either side of the function calls. It just plain can't be done without proper translation. By Gaurav Bhaskar, Microsoft Support
Cammy 19:25, 29 December 2006 (UTC)
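For reference, a sketch of the pointer translation the removed text describes, assuming a 16:16 far pointer (selector in the high word, offset in the low word). In real mode the segment base is simply segment * 16; in protected-mode 16-bit Windows it comes from a descriptor table, which is modelled here by a made-up dictionary. The selector values and bases are hypothetical.

```python
# Hypothetical descriptor table: selector -> linear base address of its segment.
DESCRIPTOR_BASES = {0x0117: 0x0004_0000, 0x011F: 0x0009_8000}


def real_mode_linear(segment: int, offset: int) -> int:
    # Real mode: the base is just the segment value times 16.
    return segment * 16 + offset


def far_to_linear(far_ptr: int) -> int:
    """16:16 far pointer (selector high word, offset low word) -> 32-bit linear address."""
    selector = (far_ptr >> 16) & 0xFFFF
    offset = far_ptr & 0xFFFF
    return DESCRIPTOR_BASES[selector] + offset


def linear_to_far(linear: int, selector: int) -> int:
    """Inverse direction, for an address that lies inside the given selector's segment."""
    offset = linear - DESCRIPTOR_BASES[selector]
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("address not reachable through this selector")
    return (selector << 16) | offset


print(hex(far_to_linear(0x0117_0010)))   # 0x40010
```

The reverse direction is the harder half for a thunk layer: a 32-bit address is only expressible as a 16:16 pointer if some selector's segment covers it.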
Jensen's device
- The first example I had given of thunking was in terms of the use of Jensen's device (call by name) to implement read/write or get/put macros (where a separate IO routine was called by reference) in IBM OS/360.
I deleted this fragment. I'm putting it here in case anybody wants to expand and properly cite it into an encyclopedic paragraph. --shadytrees 17:50, 16 July 2007 (UTC)
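For anyone expanding this, a minimal Python sketch of the textbook form of Jensen's device (summation, not the OS/360 I/O use described above). The by-name variable is modelled as a setter and the by-name expression as a zero-argument closure; jensen_sum and set_i are hypothetical names.

```python
def jensen_sum(set_i, lo, hi, term):
    """Sum term over i = lo..hi; set_i and term play the by-name parameters."""
    total = 0
    for k in range(lo, hi + 1):
        set_i(k)          # assign to the caller's variable through the by-name parameter
        total += term()   # re-evaluate the by-name expression, which reads that variable
    return total


env = {"i": 0}            # the caller's variable i


def set_i(value):
    env["i"] = value


xs = [3, 1, 4, 1, 5]
print(jensen_sum(set_i, 0, 4, lambda: xs[env["i"]]))            # 14
print(jensen_sum(set_i, 1, 10, lambda: env["i"] * env["i"]))    # 385
```

The same procedure sums array elements or squares depending only on the expression passed; that is what re-evaluating the thunk on each use buys.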
Common Lisp's constantly
It does not belong under a heading of "delayed computation", as constantly is a function and hence its argument is evaluated when the "thunk" is created, not when used.
70.111.106.99 (talk) 21:00, 3 January 2008 (UTC)
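The point can be illustrated with a Python analogue of CONSTANTLY (this is not the Common Lisp function itself): the argument runs once when the function is created, whereas a thunk wrapping the same call runs on every use.

```python
calls = []


def expensive():
    calls.append("called")
    return 42


def constantly(value):
    # Analogue of CONSTANTLY: an ordinary function, so `value` was already
    # computed by the caller before constantly ever runs.
    return lambda *args: value


eager = constantly(expensive())   # expensive() runs now, at creation time
print(len(calls))                 # 1


def delayed():                    # a real thunk: nothing runs yet
    return expensive()


print(len(calls))                 # still 1
delayed()
delayed()
print(len(calls))                 # 3: the thunk ran once per use
print(eager("any", "args"))       # 42, without calling expensive() again
```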
Untitled
The code examples in this article are too obscure to be informative. They don't so much demonstrate the use of thunks as show bits of Algol or some such. It would perhaps be more useful to say that thunks are in fact data structures that contain a reference to code as well as an environment in which to execute that code (the env might be a stack frame or dictionary pointer -- it's a place that somehow holds mappings between variables and values).
For instance:
variable a = 7;
def foo(a);        // NB: two vars called a
    print a + 2;
enddef;
foo(a + 10);
When foo is called in an eager language, the expression (a + 10) is evaluated before the function is called, and the value passed in is 17. foo will print the value of (17 + 2) = 19 (not 7 + 2), as it uses the latest definition of a, which is the one local to the function.
In a lazy language, instead of evaluating (a + 10) before passing it into foo, the code of the expression is wrapped up in a thunk along with the current variable mappings as [(a + 10), {a -> 7}], and that gets passed in. When foo tries to print the value of (a + 2), the thunk that is now in the local variable a is executed using the environment it carries with it, not foo's, so we get (17 + 2), which is printed as 19. The point to note is that, in principle, you shouldn't have the same bit of code executing in environments it doesn't belong to (this relates to variable capture in the lambda calculus: you can't substitute an expression into a body that rebinds the variables it uses without first renaming them).
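A small Python sketch of the thunk-as-data-structure view described above: a pair of code and environment, here with code represented as a function from an environment dict to a value. Thunk, force and foo are illustrative names, and foo returns the value rather than printing it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Env = Dict[str, Any]


@dataclass
class Thunk:
    code: Callable[[Env], Any]   # the frozen expression
    env: Env                     # variable mappings captured at the call site


def force(t: Thunk) -> Any:
    # Run the code in the environment the thunk carries, not the caller's.
    return t.code(t.env)


def foo(a: Thunk) -> int:
    # Inside foo, the name a is bound to the thunk; the thunk's own {a -> 7}
    # is what the expression a + 10 sees.
    return force(a) + 2


arg = Thunk(code=lambda env: env["a"] + 10, env={"a": 7})   # [(a + 10), {a -> 7}]
print(foo(arg))   # 19
```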
Thunks can also help avoid some infinite loops. Consider the following Haskell code (Haskell is a lazy language):
-- takes two args and returns the first
first x y = x

-- infinite recursion
infinity = let loop k = loop (k+1) in loop 0

main = print (first 5 infinity)   -- prints 5
The infinite recursion is never evaluated, as the program never uses it: first only ever uses its first argument, so the second stays an unevaluated thunk. An eager language would try to get a value for infinity before calling first, and would never terminate. At this point it may be worth looking into innermost vs outermost reduction.
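The same behaviour can be imitated in an eager language by passing explicit thunks. In this Python sketch (an approximation, not how GHC implements laziness), infinity is passed as an uncalled function, so the loop never runs.

```python
def first(x, y):
    # Both arguments arrive as thunks; only x is ever forced.
    return x()


def infinity():
    # Mirrors loop k = loop (k + 1): never returns if called.
    k = 0
    while True:
        k += 1


print(first(lambda: 5, infinity))   # 5; infinity is passed but never called
# Calling infinity() at the call site instead (eager evaluation) would never return.
```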
While thunks can be useful in this way, they have their disadvantages: creating and destroying thunks all the time can make programs slow. Thunks that contain side effects (e.g. printing a value to the screen) may not execute in the order the programmer intended, as their contents are only evaluated when the called function tries to use their value (if at all).
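A short Python illustration of the ordering issue, assuming the arguments are passed as explicit thunks (noisy and pick are hypothetical names): the callee decides which thunks to force and in what order, so side effects follow the callee's order rather than the order in the call.

```python
log = []


def noisy(label, value):
    log.append(label)    # the side effect
    return value


def pick(flag, a, b):
    # a and b are thunks; the callee chooses which to force, and when.
    if flag:
        return b() + a()
    return a()


r1 = pick(True, lambda: noisy("a", 1), lambda: noisy("b", 2))
r2 = pick(False, lambda: noisy("c", 3), lambda: noisy("d", 4))
print(log)   # ['b', 'a', 'c']: "b" ran before "a", and "d" never ran
```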
There are some interesting exceptions in common languages like C and Java where this sort of laziness is used (but not via thunks). For instance, if statements only execute one of their branches, which is a bit of forced lazy evaluation.
Just some points that could be used to spruce this article up, or that people can read up on if they're interested in this sort of thing.
--cos
—Preceding unsigned comment added by 129.67.53.183 (talk) 19:05, 22 October 2009 (UTC)
The Algol 60 thunk is not the same thing as lazy evaluation because the semantics are that it is re-evaluated for each use, not only on first use (the article "lazy evaluation" makes this distinction very early). That is why Jensen's device "works".
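The distinction can be shown with two hypothetical Python wrappers: ByName re-evaluates the expression on every use, as Algol 60 does, while ByNeed evaluates it on first use and caches the result, as lazy evaluation does. The counter makes the number of evaluations visible.

```python
class ByName:
    """Call by name (Algol 60): the expression is re-evaluated at every use."""

    def __init__(self, expr):
        self.expr = expr

    def get(self):
        return self.expr()


class ByNeed:
    """Call by need (lazy evaluation): evaluated at first use, then cached."""

    def __init__(self, expr):
        self.expr = expr
        self.done = False
        self.value = None

    def get(self):
        if not self.done:
            self.value = self.expr()
            self.done = True
            self.expr = None      # the code is no longer needed once forced
        return self.value


counter = {"n": 0}


def next_value():
    # An expression whose value changes each time it is evaluated.
    counter["n"] += 1
    return counter["n"]


def use_twice(param):
    return (param.get(), param.get())


by_name = use_twice(ByName(next_value))   # evaluated at each use
by_need = use_twice(ByNeed(next_value))   # evaluated once, value reused
print(by_name, by_need)                   # (1, 2) (3, 3)
```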
Nor is it a "delayed computation", because its value is the value at the point of use, not at some earlier time. It is not in general possible to evaluate it any earlier, so it is not delayed. Of course, in a given use, one or both of these things might be true, but that is a property of the program, not of the thunk itself.
BTW, if I remember correctly, and to nitpick, addresses (aka "names") are not part of the value domain in Algol 60, in which case these thunks do not return values.
The C thing that cos mentions I would not describe as lazy evaluation but as consecutional evaluation. The difference is that in the consecutional case the expression is never referenced (i.e. bound), whereas in the lazy case the value is never needed but the expression has been referenced. The difference may appear subtle, but it is conceivable that in an (imagined) language with precise exception semantics, the lazy case might throw an exception at the point of being bound while the consecutional case might not.
There should, perhaps, be an article for consecutional semantics.
Noticing the similarities between these concepts is useful, but we should also be careful to notice their differences. -- dlm —Preceding unsigned comment added by 192.55.54.38 (talk) 13:50, 17 May 2011 (UTC)
32-bit software in 64-bit Windows
Please can someone write a section explaining why running 32-bit programs on 64-bit Windows is not considered thunking. —Preceding unsigned comment added by 198.54.202.114 (talk) 15:43, 18 September 2010 (UTC)
Addition of example assembly compilation for example code (thunk_for_C_access_in_B)
Do people think (no semi-pun intended) that it would be useful to put an example of the assembly code a compiler would generate for the wrapper function in the example? It talks about how the function saves time on average (one expensive operation and one cheap operation in one case, and the addition of only a heap branch in another case), but it's kind of hard to see that without knowing exactly which instructions are generated. I don't understand it well enough to write such an example :) but I've seen this kind of code (Newton ROM, ARM Norcraft-compiled C++ [0.43/C4.68 (rel195/Newton Release 9); 1996]). I can say that it looks like a mess of jumps, and I think seeing a "simple" assembly compilation of this pattern would help understanding (to the extent that Wikipedia CS articles are a pseudo-textbook...).
Also, why doesn't the Talk page for Thunk_(object-oriented_programming) automatically redirect here to Talk:Thunk? — Preceding unsigned comment added by 98.223.232.121 (talk) 16:42, 21 December 2011 (UTC)
SQL and Ajax examples
The Thunk (compatibility mapping) page (which I'm merging back here) has these two examples.
- For example, when reading a foreign key from a database table an obvious requirement is to make the join to the related table. Thunk can be used as the term for making this explicit join within the constraints of a hand-created SQL stored procedure generator. Another example is in generic message interceptors (such as generic mouse-click interceptors within auto-enabling Ajax JavaScript libraries). The overall process of intercepting the generic mouse-click message, determining registered handlers, dispatching to those handlers, retrieving results, and applying results to the current page can be described as "thunking" the mouse-click event. In this sense, "to thunk" describes the overall process of detecting a condition that requires re-translation and/or re-packaging of the initial data along with the dispatching and/or handling computer code to support the required action.
I don't know what this paragraph is talking about, nor are there sources that might help me understand. The second one is maybe trying to say that "thunking" = "event-driven programming", but then it's wrong and by all appearances original research. Nonetheless I'm leaving the examples here in case I'm mistaken. 50.136.204.132 (talk) 10:10, 7 March 2014 (UTC)
Article re-merged
This article was split in Feb. 2011. This was its state at the time: disjointed, poorly formatted, jargon-heavy, filled with examples but very little explanation. So little explanation that editors were convinced that the article was talking about several unrelated concepts. It was not. All the meanings of "thunk" here (except the ones I pasted above) are practical evolutions of Ingerman's "computations as parameters" concept. They differ with regard to the reason that a computation is needed, and how it is generated (by the compiler, another compile-time tool, or a run-time service). There should be one article, and now there is. 50.136.204.132 (talk) 07:36, 9 March 2014 (UTC)
What about simplifying the "Object-oriented programming" section?
IMHO, the example in this section is needlessly complicated, since it uses a class A that merely appears in the example while being irrelevant to the issue discussed.
To achieve the improvement goal requested in the header of the "Thunk" page, class A could be removed and classes B and C could be renamed to A and B respectively, making the intrinsic issue of the case shown clearer.
This would yield something like:
Object-oriented programming
Thunks are useful in object-oriented programming platforms that allow a class to inherit multiple interfaces, leading to situations where the same method might be called via any of several interfaces. The following code illustrates such a situation in C++.
class A {
public:
    int value = 0;
    virtual int access() { return this->value; }
};

class B : public A {
public:
    int better_value = 0;
    virtual int access() { return this->better_value; }
};

int use(A *a) {
    return a->access();
}

int main() {
    A someA;
    use(&someA);
    B someB;
    use(&someB);
}
In this example, the code generated for each of the classes A and B will include a dispatch table that can be used to call access on an object of that type, via a reference that has the same type. Class B will have an additional dispatch table, used to call access on an object of type B via a reference of type A. The expression a->access() will use A's own dispatch table or the additional B table, depending on the type of object a refers to. If it refers to an object of type B, the compiler must ensure that B's access implementation receives an instance address for the entire B object, rather than just the inherited A part of that object.[1]
As a direct approach to this pointer adjustment problem, the compiler can include an integer offset in each dispatch table entry. This offset is the difference between the reference's address and the address required by the method implementation. The code generated for each call through these dispatch tables must then retrieve the offset and use it to adjust the instance address before calling the method.
The solution just described has problems similar to the naïve implementation of call-by-name described earlier: the compiler generates several copies of code to calculate an argument (the instance address), while also increasing the dispatch table sizes to hold the offsets. As an alternative, the compiler can generate an adjustor thunk along with B's implementation of access that adjusts the instance address by the required amount and then calls the method. The thunk can appear in B's dispatch table for A, thereby eliminating the need for callers to adjust the address themselves.[2]
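A toy model of the two strategies just described, with memory as a flat Python list and an address as an index into it. The layout is hypothetical: the base part that a base-typed reference points at starts one slot into the derived object, as happens for a second base class under multiple inheritance (in typical layouts, a base at offset zero needs no adjustment).

```python
memory = [0] * 16          # flat memory; an "address" is an index

# Hypothetical layout of a derived object placed at address 4:
#   4: field of the first base part
#   5: field of the second base part (where a base-typed reference points)
#   6: the derived class's own field, which access() reads
DERIVED_AT = 4
BASE_OFFSET = 1            # the second base part starts one slot into the object
memory[DERIVED_AT + 2] = 99


def derived_access(this):
    # The method implementation expects the address of the whole object.
    return memory[this + 2]


# Strategy 1: each table entry holds an offset; every call site applies it.
table_with_offsets = {"access": (derived_access, -BASE_OFFSET)}


def call_with_offset(table, name, ref):
    method, delta = table[name]
    return method(ref + delta)


# Strategy 2: an adjustor thunk fixes the address once; call sites stay plain.
def access_adjustor_thunk(this):
    return derived_access(this - BASE_OFFSET)


table_with_thunk = {"access": access_adjustor_thunk}


def call_plain(table, name, ref):
    return table[name](ref)


base_ref = DERIVED_AT + BASE_OFFSET          # reference to the base part
print(call_with_offset(table_with_offsets, "access", base_ref))   # 99
print(call_plain(table_with_thunk, "access", base_ref))           # 99
```

Both calls reach the same method with the same corrected address; the difference is only where the adjustment code lives.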
References
- ^ Stroustrup, Bjarne (Fall 1989). "Multiple Inheritance for C++" (PDF). Computing Systems. 1 (4). USENIX. Retrieved 4 August 2014.
- ^ Driesen, Karel; Hölzle, Urs (1996). "The Direct Cost of Virtual Function Calls in C++" (PDF). OOPSLA. Retrieved 24 February 2011.
Examples
One of the uses for thunks is a numeric computation which repeatedly invokes a subcomputation with different inputs, e.g., an integration routine. In ALGOL 60 such a routine would normally be written as a procedure with a call-by-name parameter for the subcomputation. In some other languages it would probably be coded with a procedure parameter, avoiding the need for a thunk. I believe that Thunk#Applications should have a subsection with examples of such uses, but am not sure what it should be called. Shmuel (Seymour J.) Metz Username:Chatul (talk) 17:45, 3 May 2016 (UTC)
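As a possible starting point for such a subsection, a Python sketch contrasting the two styles for a trapezoid-rule integrator: one with a by-name expression that mentions a shared variable x (Jensen-style), and one with a procedure parameter. The function names and the choice of the trapezoid rule are assumptions for illustration.

```python
def integrate_by_name(set_x, lo, hi, n, f_thunk):
    # Trapezoid rule; f_thunk re-evaluates an expression that reads x.
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        set_x(lo + k * h)                       # assign the by-name variable
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f_thunk()             # re-evaluate the expression
    return total * h


def integrate(f, lo, hi, n):
    # Same rule with a procedure parameter: f takes x explicitly, no thunk needed.
    h = (hi - lo) / n
    inner = sum(f(lo + k * h) for k in range(1, n))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


env = {"x": 0.0}


def set_x(value):
    env["x"] = value


a = integrate_by_name(set_x, 0.0, 1.0, 100, lambda: env["x"] ** 2)
b = integrate(lambda x: x ** 2, 0.0, 1.0, 100)
print(a, b)   # both close to 1/3
```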
Environment of thunk
To the text "The address and environment of this helper subroutine," Chatul added the annotation that the environment of the helper routine (the thunk) is not the environment in question, but that it is some other environment which is unclear from the wording of the note.
Possibly incorrect text should be corrected, not annotated with a "clarification" that contradicts it. In this case, the text previously said "the address of the helper subroutine." This ought to be enough. Whether the environment of a function travels with it is an implementation question. An environment should be included to deal with the funarg problem, but thunks have seen limited use in languages such as C++ that do not include one. When the funarg problem is solved, the environment passed for the thunk routine is — almost by definition — the thunk routine's environment. What other routine's environment could it be? 73.71.251.64 (talk) 19:02, 24 May 2020 (UTC)
- When you comment on what someone else wrote, please quote rather than paraphrasing. What you attributed to me is not what I wrote, which was "The environment passed is that of the call with the by-name parameter, not that of the called routine." That in no way contradicted the text that it was attached to.
- What I had in the footnote was not an implementation question; it affects the results of the call. How the environment is passed, and whether it is the responsibility of the called routine or the calling routines, are the implementation details. If the first parameter of foo is call by reference, the third parameter is call by name, foo has a local variable called baz and you call foo(bar,baz,bar + baz), you would get the wrong results if you didn't use the stack frame of the caller to evaluate bar+baz. Shmuel (Seymour J.) Metz Username:Chatul (talk) 19:46, 24 May 2020 (UTC)
- Thunks were invented for a language with call by name. Why would they see extensive use for a language without call by name? Shmuel (Seymour J.) Metz Username:Chatul (talk) 19:46, 24 May 2020 (UTC)
- I specifically said that they have seen limited use, but use nonetheless. C++ compilers use thunks. Programs written in C use thunks. Attaching an environment to a thunk parameter is a best practice because it can, as you say, affect the result. I'm well aware of that, as I brought up the funarg problem. Some languages automate this best practice. Others (including at least historical Lisp implementations) do not. 73.71.251.64 (talk) 20:22, 24 May 2020 (UTC)
- Thunks were invented half a decade after LISP, specifically as a means of implementing call by name in ALGOL 60. That required that the thunk be what in modern nomenclature is called a closure, and that it encapsulate both a body of code and a stack frame or equivalent. That association can be handled by code in the called routine, but it still has to be there. Shmuel (Seymour J.) Metz Username:Chatul (talk) 22:35, 24 May 2020 (UTC)
- If you accept my rewording of the note, then there is no purpose in debating the rest. I did not intentionally misrepresent you, and I can assume that you did not intentionally misread me. 73.71.251.64 (talk) 00:59, 26 May 2020 (UTC)
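Chatul's foo(bar, baz, bar + baz) example can be sketched in Python. The second parameter's mode is not specified above, so it is assumed here to be by value (and unused); the by-name parameter is a (code, frame) pair, and Ref is a hypothetical helper. Evaluating the code in the caller's frame gives the intended result, while letting foo's local baz shadow the caller's gives a different one.

```python
class Ref:
    """A by-reference parameter: reads and writes go to the caller's frame."""

    def __init__(self, frame, name):
        self.frame, self.name = frame, name

    def get(self):
        return self.frame[self.name]

    def set(self, value):
        self.frame[self.name] = value


def foo(p1, p2, p3):
    # p1: by reference, p2: by value (assumed), p3: by name as a (code, frame) pair.
    baz = 1000                            # foo's own local variable called baz
    p1.set(p1.get() + 1)                  # update the caller's bar through the reference
    code, frame = p3
    right = code(frame)                   # evaluated in the caller's frame
    wrong = code({**frame, "baz": baz})   # as if foo's local baz were visible
    return right, wrong


caller = {"bar": 1, "baz": 10}
thunk = (lambda env: env["bar"] + env["baz"], caller)   # the actual parameter bar + baz
right, wrong = foo(Ref(caller, "bar"), caller["baz"], thunk)
print(right, wrong)   # 12 1002
```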