Nikodemus Siivola: Curios, oddities, vagaries and anomalies
-- all to amuse the old and delight the young. Lisp hacks sold during
intermission.
http://random-state.net/
Efficient Doesn't Equal Performant
http://random-state.net/log/2013-03-16-efficient-doesnt-equal-performant.html
Sat, 16 Mar 2013 15:24:17 +0300
<p><i>Edit: Philipp Marek kindly pointed out that where I claimed to
have 1M entries, I only had 100k. Oops! This has been corrected below,
other numbers altered correspondingly, and everything rerun. My point
still stands.</i></p>
<p>This is a bit of a rant, I guess. Sorry about that.</p>
<p>A few years back I needed a binary heap, and I needed one that was
fast and thread safe. So I wrote <a
href="http://nikodemus.github.com/pileup/">Pileup</a>.</p>
<p>There are other heaps for Common Lisp, and some of them support
operations Pileup doesn't implement out of the box, and all of them
claim to be efficient.</p>
<p>...and I'm sure that algorithmically they are. However, <i>constant
factors matter</i> more often than you might think. I'll single out
<tt><a href="https://github.com/TheRiver/CL-HEAP">CL-HEAP</a></tt>
primarily because it has such an authoritative name. :)</p>
<p>A tiny benchmark:</p>
<pre>(defvar *100k-numbers* (loop repeat 100000
collect (random most-positive-fixnum)))
(defvar *400-numbers* (loop repeat 400
collect (random most-positive-fixnum)))
(defun insert-and-pop (isert pop heap things)
(declare (function insert pop))
(dolist (thing things)
(funcall insert heap thing))
(loop for thing = (funcall pop heap)
while thing))
(defun make-insert-and-pop (make insert pop things)
(declare (function make))
(insert-and-pop insert pop (funcall make) things))
(defun time-insert-and-pop (insert pop heap things)
;; Time 4 runs.
(time
(loop repeat 4 do (insert-and-pop insert pop heap things)))
t)
(defun time-make-insert-and-pop (make insert pop things)
;; Time 1000 runs.
(time
(loop repeat 1000 do (make-insert-and-pop make insert pop things)))
t)
(defun cl-heap-make ()
(make-instance 'cl-heap:priority-queue))
(defun cl-heap-insert (heap thing)
(cl-heap:enqueue heap thing thing))
(defun cl-heap-pop (heap)
(cl-heap:dequeue heap))
(defun pileup-make ()
(pileup:make-heap #'<))
(defun pileup-insert (heap thing)
(pileup:heap-insert thing heap))
(defun pileup-pop (heap)
(pileup:heap-pop heap))
;;; CL-HEAP: insert and pop 100k numbers x 4 into a single queue
(time-insert-and-pop #'cl-heap-insert
#'cl-heap-pop
(cl-heap-make)
*100k-numbers*)
;;; PILEUP: insert and pop 100k numbers x 4 into a single queue
(time-insert-and-pop #'pileup-insert
#'pileup-pop
(pileup-make)
*100k-numbers*)
;;; CL-HEAP: make 1k heaps, insert and pop 400 numbers into each
(time-make-insert-and-pop #'cl-heap-make
#'cl-heap-insert
#'cl-heap-pop
*400-numbers*)
;;; PILEUP: make 1k heaps, insert and pop 400 numbers into each
(time-make-insert-and-pop #'pileup-make
#'pileup-insert
#'pileup-pop
*400-numbers*)</pre>
<p><b>Results:</b> (warmup run, then median of three runs for each)</p>
<pre class="small">;;; CL-HEAP: insert and pop 100k numbers x 4 into a single queue
Evaluation took:
6.038 seconds of real time
5.999279 seconds of total run time (5.808912 user, 0.190367 system)
[ Run times consist of 0.355 seconds GC time, and 5.645 seconds non-GC time. ]
99.35% CPU
15,660,756,020 processor cycles
208,397,472 bytes consed
;;; PILEUP: insert and pop 100k numbers x 4 into a single queue
Evaluation took:
0.430 seconds of real time
0.431266 seconds of total run time (0.429962 user, 0.001304 system)
100.23% CPU
1,115,426,026 processor cycles
3,053,536 bytes consed
;;; CL-HEAP: make 1k heaps, insert and pop 4k numbers
Evaluation took:
2.830 seconds of real time
2.839067 seconds of total run time (2.829425 user, 0.009642 system)
[ Run times consist of 0.031 seconds GC time, and 2.809 seconds non-GC time. ]
100.32% CPU
7,338,649,539 processor cycles
182,811,168 bytes consed
;;; PILEUP: make 1k heaps, insert and pop 4k numbers
Evaluation took:
0.301 seconds of real time
0.301773 seconds of total run time (0.300767 user, 0.001006 system)
100.33% CPU
779,991,108 processor cycles
12,540,864 bytes consed</pre>
<p>(I was also going to compare parallel performance, but CL-HEAP
doesn't appear to be thread-safe, so...)</p>
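<p>For the curious, this is roughly the shape of parallel test I had
in mind -- a minimal sketch using <tt>SB-THREAD</tt> directly rather
than a real benchmark harness, relying only on the Pileup calls
already used above:</p>
<pre>(defun parallel-insert-and-drain (n-threads things)
  ;; Insert THINGS into one shared heap from N-THREADS threads
  ;; concurrently, then drain the heap in the current thread.
  ;; HEAP-POP returns NIL once the heap is empty, as above.
  (let ((heap (pileup:make-heap #'<)))
    (mapc #'sb-thread:join-thread
          (loop repeat n-threads
                collect (sb-thread:make-thread
                         (lambda ()
                           (dolist (thing things)
                             (pileup:heap-insert thing heap))))))
    (loop while (pileup:heap-pop heap))
    heap))</pre>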
<p>This is not to disparage <tt>CL-HEAP</tt>: it supports things which
Pileup doesn't, but it clearly wasn't written with constant factors in
mind, and it shows.</p>
<p><b><i>Constant factors matter.</i></b></p>
<p>(Admittedly, I tested this only on SBCL, and it might turn out that
CL-HEAP does a lot better -- and Pileup a lot worse -- on some other
implementation. This does not alter my main contention that you ignore
constant factors at your own peril.)</p>
Userspace Threads for SBCL -- a short discussion
http://random-state.net/log/2012-10-06-userspace-threads-for-sbcl----a-short-discussion.html
Sat, 6 Oct 2012 13:22:47 +0300
<p>(This is in response to a wish from an IndieGoGo funder.)</p>
<p>First off, a disclaimer: I'm not really into the green threads /
fibers distinction, so I'm just going to be rambling about userspace
threads in general. I'm also making the assumption that having the
option of userspace threads in addition to native threads would be a
good thing, and not spending time on the ramifications of that.</p>
<p>I'm also <em>not</em> working, or planning to work, on this area in
the near future. Consider this a vague roadmap for those wanting to
look into doing this.</p>
<p><b>Are Some Threads More Equal Than Others?</b></p>
<p>How are userspace threads distinct from native threads?</p>
<p>Does</p>
<pre>(subtypep 'userspace-thread 'thread)</pre>
<p>hold? Is there going to even be a distinct userspace thread type?</p>
<p>Because semantically userspace threads should mostly be
indistinguishable from native threads (ie. dynamic binding works the
same way, locks work the same way, etc), I think they should indeed be
just like threads except for the "who is responsible for scheduling"
bit.</p>
<p>So I'm thinking all lisp threads are really going to be userspace
threads, and it's just that some of them have a dedicated OS thread
from the start.</p>
<p>Let's say <tt>MAKE-THREAD</tt> grows an argument <tt>:RUN</tt>, which
defaults to true. If it's <tt>NIL</tt> you get an inert suspended
thread object that won't run until someone yields to it.</p>
<p>So from lisp land all threads are going to look identical -- but with
the new distinction that some lisp threads may be in a suspended state,
not currently being run by any OS thread.</p>
<p><b>Lies, Statistics, and Schedules</b></p>
<p>How does scheduling work?</p>
<p>Do users of userspace threads need explicit scheduling, or does the
system eg. automatically schedule them on blocking IO?</p>
<p>I think both have merits, but automatic scheduling on blocking IO is
really explicit scheduling under the hood, so let's consider that
only.</p>
<p>We already have <tt>#'THREAD-YIELD</tt>. Let's just add an optional
argument to it. If that argument is a thread that isn't currently
running, we yield to it -- otherwise we just consider it a hint
that our processor resources could be better spent elsewhere right now.
(Or maybe "yield execution context" and "yield processor time" should
not be mixed up? Hm.)</p>
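<p>To make that concrete, here is how the proposed API might look in
use. Nothing below exists today: the <tt>:RUN</tt> argument and the
thread argument to <tt>THREAD-YIELD</tt> are the extensions being
sketched, and <tt>COMPUTE-STUFF</tt> is a stand-in:</p>
<pre>;;; Hypothetical -- a sketch of the proposed API, not working code.
(let ((worker (sb-thread:make-thread #'compute-stuff
                                     :name "worker"
                                     :run nil))) ; proposed: start suspended
  ;; Hand our execution context over to WORKER: the current lisp
  ;; thread is suspended, and WORKER takes over this OS thread.
  (sb-thread:thread-yield worker))</pre>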
<p>It's possible that we may also want <tt>SUSPEND-THREAD</tt>, which
dissociates a lisp thread from the underlying OS thread, and
<tt>RUN-THREAD</tt> which starts a new OS thread in order to run a
suspended lisp thread, but I'll ignore them for now.</p>
<p>One thing that this does mean -- and the current system may have
conflicting assumptions about it -- is that the OS thread associated with
a single lisp thread may change over its lifetime. This needs to be
checked. (As does the assumption that a lisp thread has an OS thread
associated with it at all!)</p>
<p>We're also going to need critical sections during which the scheduling
status of the thread (ie, if it's currently running == associated with
an OS thread) cannot be changed. Not sure if
<tt>WITHOUT-INTERRUPTS</tt> should subsume this, or if it has to be
distinct from it.</p>
<p><b>Enough About Design, Let's Do This!</b></p>
<p>So, what does a thread need? A stack and a context, pretty much.</p>
<p>I'll make the wild assumption that we're on a platform with a fully
functional <tt><a
href="http://linux.die.net/man/3/swapcontext">swapcontext(3)</a></tt>. I
have sometimes heard it whispered that those APIs aren't all that great,
so it's possible that we may need to implement them in asm on our own
-- but I haven't ever used them personally, so I don't really claim to
know.</p>
<p>If that is how we're going to be switching from one userspace thread
to another, how do we make it play nice with the rest of SBCL?</p>
<p>Let's start by taking a look at <tt>MAKE-THREAD</tt>. At first
blush at least it looks to me like the only thing that really needs to
be different for userspace threads is the call to
<tt>%CREATE-THREAD</tt>, which currently ends up doing the
following:</p>
<ul>
<li><p>Creates the C-side thread struct, which contains the stack(s) and
thread-local bindings, and a bunch of other stuff. What it doesn't
currently have is space for everything <tt>swapcontext(3)</tt> needs, so we'll
need to add that.</p></li>
<li><p>Creates the OS thread, including all the signal handling setup,
etc.</p>
<p>Definitely prime reading ground for anyone looking to add
userspace threads to SBCL: the stuff that needs to happen when we
switch to a new thread is going to look a lot like
<tt>create_os_thread</tt> and <tt>new_thread_trampoline</tt>.</p>
<p>This gets factored into <tt>RUN-THREAD</tt> and
<tt>THREAD-YIELD</tt>, pretty much (or at least the C-code those
will end up calling). Not rocket science, but a lot of
details...</p></li>
</ul>
<p>(Unless you're just skimming this, go ahead and at least skim the
relevant parts of the code.)</p>
<p>The other end of a thread's lifetime is another obvious place to look
-- but mostly it comes down to undoing whatever was done when the
thread was created. This raises a hairy design question, though: does an
OS thread die when the lisp thread currently running on it dies? I don't
know. I suspect this points to a problem in my overall design, but
possibly it is a simple policy question.</p>
<p>The final place that needs attention is GC: it needs to be able to
find all C-side thread structs in order to scavenge their stacks, and
it needs to know how to scavenge the contexts of suspended threads
as well -- not rocket science, again, but details.</p>
<p>Is this all? <em>Probably not!</em></p>
<p>I'm pretty sure signal handling needs some very careful
consideration -- but if <tt>WITHOUT-INTERRUPTS</tt> also means
"without userspace thread state changes", then possibly current code
is a decent match.</p>
<p>I think the easiest way to find out what is missing, however, is to
start working towards an implementation.</p>
<p>The biggest issue with this sketch in my mind is the question of
thread death mentioned above. The easiest way to solve it (not
necessarily the best!) would be to say that each OS thread does indeed
die when the currently executing lisp thread dies. The second easiest
would be to have something like <tt>QUEUE-THREAD</tt>, which would
mean that when the next lisp thread dies, the queued one should
receive the OS thread instead of it going the way of the dodo.</p>
<p>...and now I'm out of time, and this still needs proofreading.
Hopefully this inspires someone to do something awesome. :)</p>
<p>Happy Hacking!</p>
<p>Addendum: locking really needs thinking about. Suspending a thread
that holds locks is not going to end well, and neither is yielding
while holding locks. Not sure if the locking API should be clever
about this, or if it can all be punted to the users.</p>
Is That A Rest-List In Your Pocket?
http://random-state.net/log/2012-09-23-is-that-a-rest-list-in-your-pocket.html
Sun, 23 Sep 2012 17:37:00 +0300
<b>Prelude</b>
<p>SBCL has for a while now been able to elide <tt>&REST</tt> list allocation when the
list is only used as an argument to <tt>APPLY</tt>, so</p>
<pre>(defun foo (&rest args) (apply #'bar args))</pre>
<p>is non-consing if <tt>BAR</tt> is. Note: I'm not saying it elides just
the heap-allocation, I'm saying it elides the list allocation completely:
instead of moving the arguments into a stack or heap allocated list
and then pulling them out later, the compiler generates code that
passes them directly on to <tt>BAR</tt>.</p>
<p>Compared to stack-allocation this makes no difference if all you look
at is heap-consing, but it does save a noticeable amount of work at
runtime, and it doesn't break tail-calls the way stack allocation does.</p>
<p>That's how far it went, however: if you did anything else with the
rest-list, the compiler gave up and allocated the full list -- on
stack if you asked for that using <tt>DYNAMIC-EXTENT</tt>.</p>
<b>First Act</b>
<p>Earlier this week Nathan Froyd (an SBCL hacker, and he of <a
href="http://method-combination.net/lisp/ironclad/">Ironclad</a> fame)
committed a change that fixed a rather embarrassing oversight: we were
<em>heap allocating</em> the rest-lists in full calls to the vararg entry
points of arithmetic functions like + and *.</p>
<p>This is less catastrophic for most code than you might imagine,
since SBCL works pretty hard to call more efficient entry points -- so
those calls are virtually never seen in performance sensitive
code.</p>
<p>Doesn't make it any less embarrassing, though. Years and years
it's been like that, until Nathan noticed.</p>
<p>Nathan fixed the heap consing by adding <tt>DYNAMIC-EXTENT</tt> declarations
to those functions involved, which not only reduced GC pressure a bit, but
provided a small performance boost.</p>
<b>Second Act</b>
<p>Adding those <tt>DYNAMIC-EXTENT</tt> declarations had another side effect
as well -- a couple of backtrace tests broke due to unexpected frames, the
tail-calls having been foiled by the stack allocation: several tests used
division by zero to trigger an error, so the arithmetic changes showed up
there.</p>
<p>That would have been a fair tradeoff, and the backtrace tests could just
have been adjusted to allow the extra frames, but we could do a bit better.</p>
<p>SBCL has an extra (internal, unsupported) lambda-list keyword: <tt>SB-INT:&MORE</tt>,
which is a fair deal hairier to use than <tt>&REST</tt>, but allows dealing with variable
arguments without any consing -- heap or stack. So those arithmetic functions got
changed to use <tt>SB-INT:&MORE</tt> instead, which fixed the backtrace tests and
gave another small performance boost.</p>
<b>Third Act</b>
<p>I was looking at the <tt>SB-INT:&MORE</tt> changes, and wondering if we should
expose it to users as well, since it obviously is useful occasionally -- and what
kind of interface cleanup that would entail.</p>
<p>Thinking about that I realized that I could just extend the
compiler smarts for dealing with <tt>&REST</tt> instead. Under the
hood, when SBCL optimizes an <tt>APPLY</tt> with a rest list as the
final argument, it actually changes into using <tt>&MORE</tt>.</p>
<p>So, I extended that part of the compiler to deal with (for starters) a
few list functions that would be sufficient for implementing the arithmetic
functions using rest lists and compiler magic.</p>
<p>The conversion path looks roughly like this:</p>
<pre>;;; Original source, using LENGTH as an example
(defun foo (&rest args)
  (length args))

;;; Compiler adds hidden &MORE arguments when it sees the &REST.
(lambda (&rest args &more #:context #:count)
  (length args))

;;; A source-level transformation notices LENGTH is applied to a &REST
;;; argument and transforms it into %REST-LENGTH.
(lambda (&rest args &more #:context #:count)
  (sb-c::%rest-length args #:context #:count))

;;; During optimization another transformation sees the %REST-LENGTH,
;;; and verifies that the rest list is never modified, or used in
;;; any place that would require actually allocating it -- this being
;;; the case, it proceeds.
(lambda (&rest args &more #:context #:count)
  #:count)

;;; Since the rest list isn't used anymore, it is deleted.
(lambda (&more #:context #:count)
  #:count)</pre>
<p>That's it, approximately. Currently this can be done for:
<tt>ELT</tt>, <tt>NTH</tt>, <tt>CAR</tt>, <tt>FIRST</tt>,
<tt>LENGTH</tt>, <tt>LIST-LENGTH</tt>, and <tt>VALUES-LIST</tt> -- and
additionally using a rest-list as the test-form in an <tt>IF</tt> is
equally efficient and doesn't force its allocation.</p>
<p><tt>LENGTH</tt>, <tt>ELT</tt>, and <tt>NTH</tt> on rest-lists
deserve a special mention: they're all O(1) when this optimization has
been applied.</p>
<p>Unfortunately we don't <em>yet</em> have any compiler notes about this, so
if you intend to take advantage of this optimization, you're best off verifying
the results from the assembly.</p>
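<p>Until then, the standard tools do the job -- a quick check using
plain <tt>DISASSEMBLE</tt> and <tt>TIME</tt>, nothing below is specific
to the new optimization:</p>
<pre>(defun rest-length (&rest args)
  (length args))

;; If the optimization fired, the disassembly contains no calls
;; to list-allocating routines...
(disassemble 'rest-length)

;; ...and TIME reports (near) zero bytes consed for this loop.
(time (dotimes (i 1000000) (rest-length 1 2 3)))</pre>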
<b>Coda</b>
<p>With that in place, I rewrote the vararg arithmetic functions using
<tt>&REST</tt>. Amusingly, they now look rather charmingly naive -- written
the way someone who doesn't understand the cost of list traversal would
write them:</p>
<pre>(defun - (number &rest more-numbers)
  (if more-numbers
      (let ((result number))
        (dotimes (i (length more-numbers) result)
          (setf result (- result (nth i more-numbers)))))
      (- number)))</pre>
<p>...but using bleeding edge SBCL, this compiles into rather nice code.</p>
<p>Finally, some pretty pictures. These are benchmark results for calling the vararg <tt>#'+</tt>
with 2, 4, or 8 arguments. F means fixnum, S a single float, and D a double float. The numbers are
benchmark iterations per second, so bigger is better. Topmost chart is for the current version
using SBCL's newly found rest-smarts, middle chart is for the version using <tt>DYNAMIC-EXTENT</tt>,
and bottom one is for the version before all this madness started.</p>
<p><a href="http://random-state.net/files/sbcl-vararg-plus-2012-09-lin.html">Benchmarks, linear scale.</a></p>
<p><a href="http://random-state.net/files/sbcl-vararg-plus-2012-09-log.html">Benchmarks, logarithmic scale.</a></p>
<p>If you look at the vararg+[ff], vararg+[ffff], and vararg+[ffffffff] benchmarks, you can see how the &REST
list allocation and access costs almost dominate them: even with stack allocation, going from 8 to 2 arguments
barely doubles the speed; with the latest version each halving of the argument count doubles the speed for both
the fixnums-only and the single-floats-only benchmarks.</p>
<p>This was run on x86-64, so both single-floats and fixnums are immediate objects. Doubles, however, need
heap allocation here -- so if you look at the double float numbers some of the allocation costs come from
the numbers and intermediate results.</p>
<p>...but before you get too excited about these numbers, remember the
reason why no-one noticed this for such a long time: in real-world
performance sensitive code these entry points don't really matter that
much.</p>
Neat TYPEP Trick
http://random-state.net/log/2012-05-15-neat-typep-trick.html
Tue, 15 May 2012 20:19:49 +0300
<p>How do you test if an object is a cons that has the desired symbol
in the car?</p>
<pre>(typep x '(cons (eql :foo)))</pre>
<p>Sure,</p>
<pre>(and (consp x) (eq :foo (car x)))</pre>
<p>is essentially just as short...</p>
<p>I still find cons types neat, even if they're a nightmare when it
comes to type derivation, but that's a different matter. Some
nightmares aren't <em>all</em> bad.</p>
Trolled
http://random-state.net/log/2012-05-15-trolled.html
Tue, 15 May 2012 08:36:46 +0300
<pre>(set-macro-character #\{ (lambda (s c) (read-delimited-list #\} s t)) nil)
(set-macro-character #\} (get-macro-character #\)) nil)</pre>
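<p>For the record, a quick check at the REPL after evaluating the two
forms above -- nesting works too:</p>
<pre>CL-USER> (read-from-string "{1 2 {3 4}}")
(1 2 (3 4))
11</pre>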
<p>EDIT: <em>I'm not saying <a
href="http://www.didierverna.com/sciblog/index.php?post/2012/05/14/Monday-Troll%3A-the-syntax-extension-myth">Didier</a>
is wrong. He isn't. While readmacros are extremely flexible, the
system has some rather obvious shortcomings.</em></p>
<p><em>I'm just pointing out that making <tt>{foo}</tt> read as list
is trivial in case someone walked away from his post with the
impression that it cannot be done.</em></p>
<p><em>If there's something I disagree with, it is his
characterization of readmacros as "shaky". Used to do things they can
do, in the manner they're designed to be used, they're perfectly
robust. Just not as nice or convenient as I would like them to
be.</em></p>
<p><em>The really interesting bit is his observation that
<tt>READ-DELIMITED-LIST</tt> doesn't support dotted lists, which I
hadn't realized before. It isn't specified to, but it would be nice if
it did...</em></p>
MADEIRA-PORT
http://random-state.net/log/2012-05-08-madeira-port.html
Tue, 8 May 2012 08:33:41 +0300
<p>This isn't Madeira proper yet, but something small and useful on
its own, I hope: <a href="https://github.com/nikodemus/madeira-port">MADEIRA-PORT</a>.</p>
<p>The main feature is the :MADEIRA-PORT ASDF component class:</p>
<pre>(defsystem :foo
  :defsystem-depends-on (:madeira-port)
  :serial t
  :components
  ((:file "package")
   (:module "ports"
    :components
    ((:madeira-port "sbcl" :when :sbcl)
     (:madeira-port "ccl" :when :ccl)
     (:madeira-port "ansi" :unless (:or :sbcl :ccl))))
   (:file "foo")))</pre>
<p>The :WHEN and :UNLESS options support an extended feature syntax,
which allows things such as:</p>
<pre>(:find-package :swank)
(:find-function #:exit :sb-ext)</pre>
<p>This extended feature syntax is also accessible by calling
EXTEND-FEATURE-SYNTAX at the toplevel, after which the regular #+ and #-
readmacros will also understand it -- but unfortunately they will also
lose any implementation-specific extensions in the process.</p>
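<p>A sketch of what that might look like, assuming
EXTEND-FEATURE-SYNTAX is exported from the MADEIRA-PORT package (check
the README for the authoritative spelling):</p>
<pre>(madeira-port:extend-feature-syntax)

;; After this the extended expressions work in plain #+ and #-:
#+(:find-package :swank)
(pushnew :have-swank *features*)</pre>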
<p>Happy Hacking!</p>
Please Don't Use SB-UNIX
http://random-state.net/log/2012-05-01-please-dont-use-sb-unix.html
Tue, 1 May 2012 15:39:35 +0300
<p><tt>SB-UNIX</tt> is an internal implementation package. If you use
functionality provided by it, sooner or later your code will break,
because SBCL's internals change: the package is subject to change
without notice. When we stop using a function that used to live there,
that function gets deleted. Things may also change names and
interfaces.</p>
<pre>CL-USER> (documentation (find-package :sb-unix) t)
"private: a wrapper layer for SBCL itself to use when talking
with an underlying Unix-y operating system.
This was a public package in CMU CL, but that was different.
CMU CL's UNIX package tried to provide a comprehensive,
stable Unix interface suitable for the end user.
This package only tries to implement what happens to be
needed by the current implementation of SBCL, and makes
no guarantees of interface stability."</pre>
<p>Instead, use either <tt>SB-POSIX</tt> (which is the supported
external API), or call the foreign functions directly. Alternatively,
if you're using something from <tt>SB-UNIX</tt> that doesn't have a
counterpart in <tt>SB-POSIX</tt> or elsewhere, put a feature request / bug
report on <a
href="https://bugs.launchpad.net/sbcl/+filebug">Launchpad</a>
explaining what you need. Just saying "wanted: a supported equivalent
of <tt>SB-UNIX:UNIX-FOO</tt>" is enough.</p>
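<p>For example, getting the process ID -- the internal spelling may or
may not exist in your SBCL, which is rather the point:</p>
<pre>;; Fragile: SB-UNIX is internal and may change without notice.
(sb-unix:unix-getpid)

;; Supported: SB-POSIX is the external API.
(sb-posix:getpid)</pre>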
<p>(The same holds more or less for all internal packages, of course,
but <tt>SB-UNIX</tt> is the most common offender.)</p>
<p>I realize this is an imperfect world, and sometimes using an
unsupported API is the best thing you can do, but <i>please</i> try to
avoid this especially in libraries used by other people as well.</p>
Updated Common Lisp FAQ
http://random-state.net/log/updated-common-lisp-faq.html
Fri, 6 Apr 2012 14:03:32 +0300
<p>I updated my <a
href="http://random-state.net/files/nikodemus-cl-faq.txt">Common Lisp
FAQ</a>. If you spot any glaring errors or omissions, please let me
know.</p>
<p>EDIT: There's a <a
href="http://random-state.net/files/nikodemus-cl-faq.html">HTML
version</a> as well -- finally found my converter script.</p>
Holiday Hack: Bit Position
http://random-state.net/log/holiday-hack-bit-position.html
Fri, 30 Dec 2011 12:35:55 +0300
<p>Logically speaking, <tt>POSITION</tt> with trivial <tt>:KEY</tt>
and <tt>:TEST</tt> arguments should be much faster on bit-vectors than
on simple vectors: the system should be able to pull one word's worth
of bits out of the vector in a single go, check if any are set (or
unset), and if so locate the one we're interested in -- otherwise going
on to grab the next word.</p>
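<p>The per-word step is classic bit-twiddling. A sketch in portable CL,
using an integer as a stand-in for one word of the vector's storage --
the real code in <tt>bit-bash.lisp</tt> works on raw words, this just
shows the idea:</p>
<pre>(defun first-set-bit (word)
  ;; (LOGAND WORD (- WORD)) isolates the lowest set bit, and
  ;; INTEGER-LENGTH gives its position.  NIL means "no bits set
  ;; here, go grab the next word".
  (unless (zerop word)
    (1- (integer-length (logand word (- word))))))</pre>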
<p>Practically speaking, no-one who needed fast <tt>POSITION</tt> on
bit-vectors seems to have cared enough to implement it, and so until
yesterday (1.0.54.101) SBCL painstakingly pulled things one bit at a
time from the vector, creating a lot of unnecessary memory traffic and
branches.</p>
<p>How much of a difference does this make? I think the technical term
is "quite a bit of a difference." See <a
href="http://random-state.net/files/sbcl-bit-position-report.html">here</a>
for the benchmark results. The first chart is from the new implementation,
the second from the old one. Other calls to <tt>POSITION</tt> are included
for comparison: the ones prefixed with <tt>generic-</tt> all go through
the full generic <tt>POSITION</tt>, while the others know the type of
the sequence at the call-site, and are able to sidestep a few
things.</p>
<p>So, if you at some point considered using bit-vectors, but decided
against them because <tt>POSITION</tt> wasn't up to snuff, now might
be a good time to revisit that decision.</p>
<p>Gory details at the end of <tt>src/code/bit-bash.lisp</tt>, full
story (including how the system dispatches to the specialized version)
best read from git.</p>
<p>Also, if you're looking for an SBCL project for next year, consider
the following:</p>
<ul>
<li>Using a similar strategy for <tt>POSITION</tt> on base-strings:
on a 64-bit system one memory read will net you 8 base-chars.</li>
<li>Using a similar strategy for <tt>POSITION</tt> on all vectors
with an element-type width of half a word or less.</li>
<li>Improving the performance of the generic <tt>POSITION</tt> for
other cases, using eg. specialized out-of-line versions.</li>
</ul>
<p>Happy Hacking and New Year!</p>
SBCL Threading News
http://random-state.net/log/sbcl-threading-news.html
Mon, 5 Dec 2011 18:47:29 +0300
<p><a href="http://www.sbcl.org/news.html#1.0.54">SBCL 1.0.54</a> is
barely out of the door, but I'm actually going to mention something
that went in the repository today, and will be in the next
release:</p>
<p>(TL;DR: Threads on Darwin are looking pretty solid right now. Go
give them a shake and let me know what falls out.)</p>
<pre>commit 8340bf74c31b29e9552ef8f705b6e1298547c6ab
Author: Nikodemus Siivola <[email protected]>
Date:   Fri Nov 18 22:37:22 2011 +0200

    semaphores in the runtime

    Trivial refactorings:

    * Rename STATE_SUSPENDED STATE_STOPPED for elegance. (Spells with
      the same number of letters as STATE_RUNNING, things line up
      nicer.)

    * Re-express make_fixnum in terms of MAKE_FIXNUM so that we can
      use the latter to define STATE_* names in a manner acceptable to
      use in switch-statements.

    * Move Mach exception handling initialization to darwin_init from
      create_initial_thread so that current_mach_task gets initialized
      before the first thread struct is initialized.

    The Beef:

    Replace condition variables in the runtime with semaphores.

    On most platforms use sem_t, but on Darwin use semaphore_t. Hide
    the difference behind os_sem_t, os_sem_init, os_sem_destroy,
    os_sem_post, and os_sem_wait.

    POSIX realtime semaphores are supposedly safe to use in signal
    handlers, unlike condition variables -- and experimentally at
    least Mach semaphores on Darwin are a lot less prone to
    problems.

    (Our pthread mutex usage isn't quite kosher either, but it's the
    pthread_cond_wait and pthread_cond_broadcast pair that seemed to
    be causing most of the trouble.)</pre>
<p>(There are some other neat things lurking in HEAD in addition to this, but
I'll let you discover them for yourself.)</p>