Showing posts with label factor. Show all posts

Thursday, September 16, 2010

tvnz-grab in factor

The program from the previous post, now in Factor. I wrote it so my Windows-using friends could have a single-binary solution. Download the zip archive and unzip it somewhere in the Windows path; C:\windows\system32 will do.

Usage: tvnz-grab <episode-page-url>. It will download all parts of the episode into the current directory.

Unfortunately, this only works for NZ content. Foreign content uses Adobe’s encrypted RTMP protocol. To get at episodes using that, check out rtmpsuck.

Imports first.

! Copyright (C) 2010 Jeremy Hughes.
! See http://factorcode.org/license.txt for BSD license.
USING: accessors assocs combinators.short-circuit fry
http.client io.streams.byte-array kernel namespaces make
regexp sequences xml xml.data locals splitting strings
io.encodings.binary io io.files command-line http system
math.parser destructors math math.functions io.pathnames
continuations xml.traversal ;
IN: tvnz-grab

Because I intend to extend this program into a small Qt demo, any word that displays UI information must dispatch on the type of UI.

SYMBOL: ui
SINGLETON: text

And the display methods themselves…

HOOK: show-progress ui ( chunk full -- )
HOOK: show-begin-fetch ui ( url -- )
HOOK: show-end-fetch ui ( -- )
HOOK: show-page-fetch ui ( -- )
HOOK: show-playlist ui ( seq -- )
HOOK: show-fatal-error ui ( error -- )
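As a rough sketch of how this kind of dispatch behaves, here is a self-contained miniature. The quiet singleton, the show-message hook, the with-ui word, and the hook-sketch vocabulary name are all made up for illustration; none of them are part of tvnz-grab.

```factor
USING: io kernel namespaces ;
IN: hook-sketch

SYMBOL: ui
SINGLETON: text
SINGLETON: quiet    ! hypothetical second UI, not in tvnz-grab

! A hook is a generic word that dispatches on the value of a
! variable (here ui) rather than on an object from the stack.
HOOK: show-message ui ( str -- )

M: text show-message print ;
M: quiet show-message drop ;

! Hypothetical helper: run a quotation with ui bound to a UI type.
: with-ui ( type quot -- ) [ ui ] dip with-variable ; inline
```

With this loaded, text [ "Fetching..." show-message ] with-ui prints the message, while the same call under quiet prints nothing. Adding a Qt front end would then mean adding another singleton and another set of methods, leaving the callers alone.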

M\ text show-progress uses the symbols bytes and count to keep track of the number of bytes downloaded and the proportion of progress-bar fill respectively.

: print-bar ( full chunk -- )
    count [
        [ swap / 50 * round ] dip [
            - CHAR: = <repetition> >string write
        ] [ drop ] 2bi
    ] change ;

M: text show-progress
    swap bytes [ + [ print-bar ] keep ] change flush ;

M: text show-begin-fetch
    "Fetching " write print "[" write flush ;

M: text show-end-fetch
    "]" print flush ;

M: text show-page-fetch
    "Fetching TVNZ page..." print flush ;

M: text show-playlist
    length "Found " write number>string write " parts." print
    flush ;

M: text show-fatal-error
    dup string? [ print ]
    [ drop "Oops! Something went wrong." print ] if 1 exit ;
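The arithmetic behind the 50-cell progress bar can be tried out in isolation. The following is only a sketch under assumptions: the filled symbol and the bar-fill and bar-delta words are hypothetical stand-ins for count and print-bar, and the string is returned instead of written so that it is easy to inspect.

```factor
USING: kernel math math.functions namespaces sequences strings ;
IN: bar-sketch

SYMBOL: filled    ! hypothetical: cells already drawn, out of 50

! Cells that should be filled once `bytes` of `full` have arrived.
: bar-fill ( full bytes -- n ) swap / 50 * round ;

! Return only the newly earned "=" characters and remember
! the new fill level for the next call.
: bar-delta ( full bytes -- str )
    bar-fill dup filled get - CHAR: = <repetition> >string
    swap filled set ;
```

Calling [ 0 filled set 1000 300 bar-delta ] with-scope yields a string of fifteen = characters, since 300/1000 of 50 cells is 15. A later call with 600 bytes inside the same scope would add only the next fifteen.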

Failed HTTP request errors need to be wrapped in a user-friendly explanation.

: wrap-failed-request ( err -- * )
    [
        "HTTP request failed: " % [ message>> % ]
        [ " (" % code>> number>string % ")" % ] bi
    ] "" make throw ;
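For readers who have not met make before, here is a minimal sketch of the same string-building pattern without the throw. The failure-text word and its two plain inputs are assumptions for illustration; the real word above reads message and code from the error object.

```factor
USING: kernel make math.parser ;
IN: make-sketch

! Inside the quotation, % appends a sequence to the string under
! construction; "" make says the result should be a string.
: failure-text ( message code -- str )
    [
        "HTTP request failed: " %
        swap %                    ! the message
        " (" % number>string % ")" %
    ] "" make ;
```

For example, "Not Found" 404 failure-text builds the text "HTTP request failed: Not Found (404)".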

The playlist parameter in each episode’s web page is in a section of code looking like this.

var videoVars = {
    playlist: '/content/3685181/ta_ent_smil_skin.smil',
    config: '/fmsconfig.xml',
    ord: ord
};

Given this code could change unpredictably, we’ll use nothing more robust than a regular expression to get at the playlist path.

: get-playlist ( url -- data )
    http-get [ check-response drop ]
    [ R/ (?<=playlist: ').*(?=')/ first-match ] bi* [
        "http://tvnz.co.nz" prepend http-get [
            [ check-response drop ]
            [ wrap-failed-request ] recover
        ] dip
    ] [ "Could not find playlist at address." throw ] if* ;
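The regular expression can be exercised without touching the network. This sketch assumes a hypothetical playlist-path word and feeds it one line of the JavaScript shown earlier; the lookbehind and lookahead keep the surrounding quotes out of the match.

```factor
USING: kernel regexp strings ;
IN: playlist-sketch

! first-match returns a slice of the input, or f when nothing
! matches, so only convert to a string when there is a match.
: playlist-path ( html -- path/f )
    R/ (?<=playlist: ').*(?=')/ first-match
    dup [ >string ] when ;
```

Given the line playlist: '/content/3685181/ta_ent_smil_skin.smil', the word should return just the path between the quotes, and f for input with no playlist entry. Note that .* is greedy, so a second quoted value on the same line would extend the match.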

The playlist is an XML file of which only the video elements concern us.

<video src="path-to-segment.flv" systemBitrate="700000"/>

700000 appears to be the highest bit rate so that is what we’ll go for.

parse-playlist takes the output of get-playlist and returns a list of URLs to episode segments.

: parse-playlist ( data -- urls )
    bytes>xml body>> "video" "700000" "systemBitrate"
    deep-tags-named-with-attr
    [ [ drop "src" ] [ attrs>> ] bi at ] map [ ] filter ;
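To see the filtering on a small input, here is a sketch that parses an inline document instead of downloaded bytes. The two-element sample playlist, the segment-urls and sample-urls words, and the use of attr and sift in place of the at and filter calls above are my assumptions, not the post's code.

```factor
USING: accessors kernel sequences xml xml.data xml.traversal ;
IN: playlist-xml-sketch

! Keep only the video elements at the 700000 bit rate and
! collect their src attributes, dropping any that are missing.
: segment-urls ( xml -- urls )
    body>> "video" "700000" "systemBitrate"
    deep-tags-named-with-attr
    [ "src" attr ] map sift ;

! Made-up playlist with one low and one high bit rate segment.
: sample-urls ( -- urls )
    "<smil><video src=\"low.flv\" systemBitrate=\"300000\"/><video src=\"high.flv\" systemBitrate=\"700000\"/></smil>"
    string>xml segment-urls ;
```

Only the high.flv entry should survive, because the other element carries a different systemBitrate.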

Each segment is downloaded in chunks.

: call-progress ( data -- )
    length response get check-response
    "content-length" header string>number show-progress ;

: process-chunk ( data stream -- )
    [ stream-write ] [ drop call-progress ] 2bi ;

: get-video-segment ( url -- )
    [ show-begin-fetch ] [ ]
    [ part-name binary <file-writer> ] tri
    [ '[ _ process-chunk ] with-http-get drop flush ]
    with-disposal show-end-fetch ;

: get-video-segments ( urls -- )
    [ get-video-segment ] each ;

grab-episode is where the action starts.

: (grab-episode) ( url -- )
    show-page-fetch get-playlist parse-playlist dup
    show-playlist [
        0 bytes count [ set ] bi-curry@ bi get-video-segments
    ] with-scope ;

: grab-episode ( url -- )
    [ (grab-episode) ] [ nip show-fatal-error ] recover ;

For the complete program see my GitHub.

Friday, August 20, 2010

Why I like Factor

Factor is a relatively new language and implementation in the tradition of Forth, Lisp, and Smalltalk. Like Forth, Factor is concatenative and uses a postfix syntax. Also like Forth, Factor emphasises small procedures and constant refactoring. Like many Lisp and Smalltalk implementations, Factor compiles code into a loadable image. Like Lisp, Factor can perform arbitrary computation at compile time using macros while parse time evaluation allows the creation of new syntax forms.

Unlike Forth, Lisp, and Smalltalk, Factor is modern and unencumbered. Lisp suffers from an ossified specification, Smalltalk has lived inside its own image for so long it's out of touch with the rest of the world, and Forth provides few of Factor's higher level features like garbage collection and dynamic code updating.

Factor’s focus on correctness and efficiency is not commonly found in other modern ‘dynamic’ languages. The results speak for themselves.

1. Factor vocabularies are compiled to machine code. When using the interactive listener the code is compiled before execution (like SBCL).

2. Factor is fast. Not C fast, but SBCL fast.

3. Almost all structures are modifiable at runtime. This includes classes at any position in the hierarchy as well as FFI bindings. When reloading a modified vocabulary, definitions no longer present will be deleted from the running image. These changes are propagated to all dependent code and the changes to the running image are kept consistent.

4. A simple foreign function interface. No quirky header files or IDL necessary; everything is done from within Factor. No reloading of Factor necessary either: create bindings incrementally and test them as you go.

5. Deployment images. Factor’s deployment tool only loads code the resultant binary will run. Minimal image size for a hello world program is a few hundred kilobytes.

6. Common Lisp style condition system. Don’t let stack unwinding lose an exception’s context. Recover anywhere between where the exception is caught and where it was thrown.

Why having a competent designer pays off

In contrast to other popular languages like Ruby and Python, Factor does not suffer from limiting creeds like, “There should be one way to do it,” or from BDFLs who don’t see the value in useful language constructs discovered as early as the 70s. Here's the payoff...

7. Correct lexical scoping. Usually in Factor one uses its postfix syntax and data flow combinators; however, it also sports a locals vocabulary, defined entirely in Factor, which implements lexical scoping and lambdas. Despite being an addition to a language where lexical variables are not the default, Factor's lexical scoping is correct and its lambdas are uncrippled. Python 2.x is unable to rebind a variable in a parent scope (3.x overcomes this by introducing a new keyword, ugh!), and Ruby only recently (1.9) gave blocks their own scope.

8. Usable higher order functions. Combinators are used in Factor all the time. For some unfathomable reason Ruby allows only one block argument per function while in Python the use of such esoterica as map and reduce is discouraged.

9. Tail call optimization. Yep, that’s right, that helpful recursion thingy. Another thing Python doesn't have because, oh I don’t know, ask Guido. Come to Factor and free your mind from loopiness!

10. Low, hi, and FFI. Factor scales well from highly abstracted garbage-collected code to micro-optimization using inline ASM. Along the way, the FFI allows Factor quotations to be used as C callbacks, and provides Factor-side memory allocation should you need to store values on the stack or in a memory pool.

11. Methods are orthogonal to classes. Generic words and their methods are defined outside classes. Adding a generic word to a preexisting class is as simple as defining it. Extensibility without nasty monkey patching or name clashes, and a nice fit with a hierarchy that can contain anything from tuples, to predicate classes and C structs. No hideously bloated base-class needed (I'm looking at you Smalltalk). If Programmer Pooh adds a display-in-opengl method to object he need not modify core code, nor will his method clash with any present or future methods on object.
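Point 11 is easy to show in a few lines. In this sketch the describe generic and the generic-sketch vocabulary are invented for illustration; the point is that methods are added to existing core classes from outside, without editing those classes.

```factor
USING: kernel math math.parser sequences strings ;
IN: generic-sketch

! A new generic word, defined in our own vocabulary.
GENERIC: describe ( obj -- str )

! Methods on classes we do not own: object, integer, string.
M: object describe drop "some object" ;
M: integer describe number>string "the integer " prepend ;
M: string describe "the string " prepend ;
```

Because describe lives in generic-sketch, it cannot clash with a describe word that some other vocabulary defines on the same classes.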

Why having a smart community pays off

Smart, or at least willing to learn new things.

12. No whiners! Try educating a bunch of Java programmers in the utility of higher order functions or tail-recursion...

13. Macros, syntax, and combinators, oh my! If you don’t whine, you get cool stuff. Macros akin to those in Common Lisp, but hygienic like Scheme, because if you don’t have variables, you can’t capture them. Syntax words like Lisp’s reader macros, but better because no dispatch character craziness is necessary. Observe the usefulness of regular expressions that are syntax checked at parse time, or the EBNF vocabulary which compiles an inline EBNF description into parsing code. For higher order goodness check out Factor’s unique dataflow combinators. This is not your ancestor’s stack shuffling!
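As a small taste of the dataflow combinators, here is a sketch with two made-up words, mean and spread. Both apply two quotations to the same input with bi, so no dup, swap, or rot is needed.

```factor
USING: kernel math sequences ;
IN: combinator-sketch

! bi runs each quotation on the same sequence, leaving
! both results on the stack for the final operation.
: mean ( seq -- x ) [ sum ] [ length ] bi / ;

! Difference between the last and first elements.
: spread ( seq -- n ) [ last ] [ first ] bi - ;
```

For { 1 2 3 } the mean is 2 and the spread is 2.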

More stuff I like

14. Live documentation. Factor has live documentation à la Emacs, but prettier and better linked. The documentation is kept in separate files from the code it describes. For those of us who hate hunting for scratchings of code amid screeds of API comments, this is a good thing.

An acceptable Lisp?

Someone once wrote about Ruby being an acceptable Lisp. They were wrong. Ruby doesn’t have macros, reader macros, native compilation, or a REPL from which everything can be modified. (Oh, but it does have three different kinds of lambda!)

Factor isn’t an acceptable Lisp either. Factor is a mighty fine Lisp. Factor is better than Lisp. It has all the things that make Lisp great and more. Factor will make your code beautiful. Factor will cook you breakfast. Factor reads like English, (lisp (like (not))). All hail Factor!

Summation

I like Factor because it hits the sweet spot of pandering to nothing but being a great language. It lifts much of the good stuff from great languages of yore and gives them an improved spin. It hasn’t yet succumbed to the whining hordes of mediocrity. It is written by a team that really know what they are doing. If that doesn’t describe your language, then give Factor a try.

Friday, July 17, 2009

alien.marshall: marshalling between factor and C

alien.inline is nice, but it would be even nicer to have factor values automatically marshalled to and from their C equivalents. alien.marshall enables this.

A short example:

USING: alien.inline.syntax alien.marshall.syntax ;
IN: marshall-test

C-LIBRARY: short-example

CM-STRUCTURE: rectangle
   { "int" "width" }
   { "int" "height" } ;

CM-FUNCTION: int area ( rectangle c )
   return c.width * c.height;
;

CM-FUNCTION: void incr ( int* a, int delta )
   *a += delta;
;

;C-LIBRARY

and output:

( scratchpad ) <rectangle> 3 >>width 5 >>height

--- Data stack:
T{ rectangle f ALIEN: 36777744 f }
( scratchpad ) area

--- Data stack:
15
( scratchpad ) 3 incr

--- Data stack:
18

As you can see, struct fields are marshalled, as are struct arguments. Output parameters are unmarshalled and pushed on the stack after the return value (if not void).

Non-false c-ptrs are not marshalled; they are passed to the C function unchanged. It is assumed that if you pass a c-ptr you know what you are doing and can clean up after yourself.

Return values and output parameters which are pointers are assumed to be pointers to a single value. Factor words which call C functions returning pointers to arrays (single- or multi-dimensional) will need to include hand-coded unmarshalling.

Return pointers and output pointers are freed after unmarshalling. Struct fields are an exception to this: fields containing pointers will need to be explicitly freed once the struct is no longer needed (overriding the struct’s dispose* method is a good way to do this).

alien.marshall words follow the same pattern as alien.inline, but with a CM- prefix instead of C-.

There are also M- prefixed words. These do not generate C code. They behave like their counterparts in alien.syntax with the addition of marshalling and unmarshalling of values.

Next up: alien.c++-templates

Thursday, July 9, 2009

alien.inline: inline C for factor

15/7/2009 updated example to match source changes

Factor's FFI is rather nice in that it doesn't use C headers; however, there are still times when writing a little glue code in C is necessary (e.g. to make macro values or C++ methods available to the Factor FFI). This task is made more tractable by alien.inline, which enables writing C code in Factor vocabularies. Said code is automatically compiled and linked when the vocabulary is loaded.

Here is a short example:

USING: alien.inline.syntax ;
IN: inline-test

C-LIBRARY: short-example

COMPILE-AS-C++

C-INCLUDE: <string>

C-TYPEDEF: std::string stdstr

C-FUNCTION: stdstr* new_std_str ( const-char* s )
    stdstr* x = new stdstr(s);
    return x;
;

ALIAS: <stdstr> new_std_str

C-FUNCTION: const-char* std_to_c_str ( stdstr* s )
    return s->c_str();
;

ALIAS: std>c-str std_to_c_str

C-STRUCTURE: rectangle
    { "int" "width" }
    { "int" "height" } ;

C-FUNCTION: int area ( rectangle c )
    return c.width * c.height;
;

;C-LIBRARY
      

and output, which is no different to normal FFI usage:

( scratchpad ) "abc" <stdstr>

--- Data stack:
ALIEN: 16240640
( scratchpad ) std>c-str

--- Data stack:
"abc"
( scratchpad ) "rectangle" <c-object> 3 over set-rectangle-width 5 over set-rectangle-height

--- Data stack:
B{ 3 0 0 0 5 0 0 0 }
( scratchpad ) area

--- Data stack:
15
      

Here are all the parsing words. Some of them perform the same function as their analogues in alien.syntax, but also generate the equivalent C code. Each parsing word has a runtime equivalent.

C-LIBRARY: name
    sets up variables
COMPILE-AS-C++
    treat generated code as C++
C-INCLUDE: name
    generates #include name
C-LINK: name
    adds -lname to linker command
C-FRAMEWORK: name
    adds -framework name to linker command (OS X only)
C-LINK/FRAMEWORK: name
    equivalent to C-FRAMEWORK: on OS X and C-LINK: everywhere else
C-TYPEDEF: old new
    like TYPEDEF: but generates a C typedef statement too
C-STRUCTURE: name pairs... ;
    like C-STRUCT: but also generates equivalent C code
C-FUNCTION: return name args body... ;
    like FUNCTION: but with a function body written in C or C++
RAW-C: multiline-string ;
    insert a string into the generated C/C++ file; useful for macros and other details not implemented in alien.inline; works the same as STRING:
;C-LIBRARY
    writes, compiles, and links generated code, then calls add-library; does nothing if the shared library is younger than the factor source file
DELETE-C-LIBRARY: name
    delete the shared library file corresponding to name (must be executed in the vocabulary where name is defined); mainly useful in unit tests

The library file produced by the above example would be named libinline-test_short-example; Factor would see it as inline-test_short-example. Source and unlinked object files are written to resource:temp/ and linked libraries are written to resource:alien-inline-libs/.
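The naming rule can be written down as a pair of one-liners. These helper words are hypothetical and only mirror the rule as described here; they are not alien.inline's own code, and the platform-specific file extension is left out.

```factor
USING: kernel sequences ;
IN: naming-sketch

! Name under which Factor sees the library: vocab_library.
: factor-library-name ( vocab library -- name ) "_" glue ;

! Stem of the shared library file: the same name with a lib prefix.
: library-file-stem ( vocab library -- stem )
    factor-library-name "lib" prepend ;
```

With "inline-test" and "short-example" as inputs, these give inline-test_short-example and libinline-test_short-example respectively.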

As of 9/7/2009 alien.inline is brand new, so there are likely to be corner cases it does not handle well. If you run into trouble, pipe up on the Factor mailing list or look for jedahu on #concatenative.

Next up: alien.marshall