Computing an MD5 hash in Rust
It's not hard, but whenever I actually need it I've forgotten how and end up looking it up again, so I'm leaving a note here.
It can be done with rust-crypto, as shown below.
Setting up a sample project:

    $ rustc -V
    rustc 1.9.0 (e4e8b6668 2016-05-18)
    $ cargo new md5 --bin
    $ cd md5
    $ echo 'rust-crypto = "*"' >> Cargo.toml
    $ vim src/main.rs  # see below

Usage example:

    extern crate crypto;

    use crypto::digest::Digest;
    use crypto::md5::Md5;

    fn main() {
        let mut md5 = Md5::new();
        md5.input(b"hoge");
        println!("hoge: {}", md5.result_str());
    }

Output:

    $ cargo run
         Running `target/debug/hoge`
    hoge: ea703e7aa1efda0064eaa507d9e8ab7e
Notes comparing DAWG construction performance in Rust and SBCL
Partly as a way to study Rust, I ported cl-dawg, a Common Lisp implementation of DAWG, and wrote a library called rust-dawg.
(A DAWG is a variant of a trie in which common suffixes can be shared. Both libraries represent that trie in DoubleArray form. I have written several posts in the past on how DAWGs and DoubleArrays are built, so I omit that here.)
I did a quick performance comparison of the two, so I'm recording the results here.
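To make the suffix sharing concrete, here is a minimal sketch. It is not the construction used by cl-dawg or rust-dawg: it simply builds a plain trie and then merges identical subtrees by giving every distinct (terminal flag, children) signature a single id. The names `TrieNode`, `minimize` and `dawg_node_count` are made up for this illustration.

```rust
use std::collections::{BTreeMap, HashMap};

/// Plain trie node: children keyed by byte, plus an end-of-word flag.
#[derive(Default)]
struct TrieNode {
    terminal: bool,
    children: BTreeMap<u8, TrieNode>,
}

/// Inserts one word into the trie, creating nodes as needed.
fn insert(root: &mut TrieNode, word: &str) {
    let mut node = root;
    for &b in word.as_bytes() {
        node = node.children.entry(b).or_default();
    }
    node.terminal = true;
}

/// Counts every node of the unshared trie, including the root.
fn count_trie_nodes(node: &TrieNode) -> usize {
    1 + node.children.values().map(count_trie_nodes).sum::<usize>()
}

/// Signature of a shared node: terminal flag plus (label, child id) pairs.
type Signature = (bool, Vec<(u8, usize)>);

/// Assigns one id per distinct subtree, so identical suffixes get the same id.
fn minimize(node: &TrieNode, registry: &mut HashMap<Signature, usize>) -> usize {
    let children: Vec<(u8, usize)> = node
        .children
        .iter()
        .map(|(&b, child)| (b, minimize(child, registry)))
        .collect();
    let next_id = registry.len();
    *registry.entry((node.terminal, children)).or_insert(next_id)
}

/// Returns (node count of the plain trie, node count after sharing).
fn dawg_node_count(words: &[&str]) -> (usize, usize) {
    let mut root = TrieNode::default();
    for w in words {
        insert(&mut root, w);
    }
    let mut registry = HashMap::new();
    minimize(&root, &mut registry);
    (count_trie_nodes(&root), registry.len())
}

fn main() {
    let (trie_nodes, dawg_nodes) = dawg_node_count(&["tap", "taps", "top", "tops"]);
    println!("trie nodes: {}, after sharing: {}", trie_nodes, dawg_nodes);
}
```

For these four words the plain trie needs 8 nodes, while merging equal subtrees leaves 5, because the "p"/"ps" tail is stored only once.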
Environment
- CPU: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz (4 logical cores)
- Memory: 8GB
- OS: Ubuntu-15.04 (64bit)
- rust-dawg: v0.1.0
- cl-dawg: v0.3.1
- Rust: rustc-1.5.0
- Common Lisp: SBCL-1.2.14
Measurement
What was measured
- Compared the performance of building a DAWG index file
  - Execution time and memory consumption
- The input is roughly 45 million character N-grams
  - I used the N-gram corpus with an occurrence frequency of 1000 or more from http://s-yata.jp/corpus/nwc2010/ngrams/
- Internally, the following two steps are performed, so the time taken by each is measured separately:
  1. Read the input and build a binary-tree trie (DAWG) in memory
  2. Convert that binary tree to DoubleArray form while writing it out to a file
  - For step 1, rust-dawg and cl-dawg use nearly identical code. For step 2, however, while implementing rust-dawg I came up with (what I believe is) a faster method and implemented it, so the measurement is quite likely biased in favor of the Rust version
    - This does not affect memory consumption
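The per-phase timing can be sketched with the standard library alone. This assumes `std::time::Instant` rather than the `time` crate used in the measurement patch later in this post, and the two closures are hypothetical stand-ins for the real build steps; `timed` is a helper name made up here.

```rust
use std::time::{Duration, Instant};

/// Runs `f` and returns its result together with the elapsed wall-clock time.
fn timed<T, F: FnOnce() -> T>(f: F) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

fn main() {
    // Stand-in for phase 1 (building an in-memory structure from the input).
    let (words, t1) = timed(|| (0..1_000u32).map(|i| i.to_string()).collect::<Vec<_>>());
    println!("phase 1 ... done: elapsed={} ms", t1.as_millis());

    // Stand-in for phase 2 (converting the structure built in phase 1).
    let (total_len, t2) = timed(|| words.iter().map(|w| w.len()).sum::<usize>());
    println!("phase 2 ... done: elapsed={} ms (total_len={})", t2.as_millis(), total_len);
}
```

`Instant` is monotonic, so the difference is not affected by wall-clock adjustments during a run of a minute or so.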
Preparation and commands
Preparing the input file

    # Fetch the N-gram corpus
    $ wget -xnH -i http://dist.s-yata.jp/corpus/nwc2010/ngrams/char/over99/filelist
    $ xz -d corpus/nwc2010/ngrams/char/over999/*/*.xz
    $ head -5 corpus/nwc2010/ngrams/char/over999/2gms/2gm-0000
    " "     123067
    " #     2867
    " $     1047
    " %     2055
    " &     3128

    # Extract only the string part and sort it
    $ cut -f 1 corpus/nwc2010/ngrams/char/over999/*gms/* | sed -e 's/ //g' > words.tmp
    $ LC_ALL=c sort words.tmp -o words
    $ wc -l words
    44280717 words  # roughly 45 million words
    $ du -h words
    652M    words

Building and running rust-dawg

    # Build
    $ git clone git@github.com:sile/rust-dawg.git
    $ cd rust-dawg
    $ patch -p1 < rust-dawg.patch  # apply the patch that embeds the measurement code
    $ cargo build --release

    # Run
    $ /usr/bin/time -v target/release/dawg_build rust-dawg.idx < words
    Building binary-tree trie ... done: elapsed=49999 ms
    Building double-array trie ... done: elapsed=11402 ms
    DONE
      Command being timed: "target/release/dawg_build rust-dawg.idx"
      User time (seconds): 60.90
      System time (seconds): 0.53
      Percent of CPU this job got: 100%
      Elapsed (wall clock) time (h:mm:ss or m:ss): 1:01.40
      Average shared text size (kbytes): 0
      Average unshared data size (kbytes): 0
      Average stack size (kbytes): 0
      Average total size (kbytes): 0
      Maximum resident set size (kbytes): 1921288
      Average resident set size (kbytes): 0
      Major (requiring I/O) page faults: 0
      Minor (reclaiming a frame) page faults: 3195
      Voluntary context switches: 1
      Involuntary context switches: 84
      Swaps: 0
      File system inputs: 0
      File system outputs: 409824
      Socket messages sent: 0
      Socket messages received: 0
      Signals delivered: 0
      Page size (bytes): 4096
      Exit status: 0

    $ du -h rust-dawg.idx
    201M    rust-dawg.idx

Building and running cl-dawg

    # Build
    $ git clone git@github.com:sile/cl-dawg.git
    $ cd cl-dawg
    $ patch -p1 < cl-dawg.patch  # apply the patch that embeds the measurement code
    $ sbcl --noinform --dynamic-space-size 7500

    ;; Build the dawg package
    * (require :asdf)
    * (setf asdf:*central-registry* (list (directory ".") (directory "lib/dict-0.2.0/")))
    * (asdf:load-system :dict-0.2.0)
    * (asdf:load-system :dawg)

    ;; Create an executable
    * (defun main ()
        (let ((input-file (second sb-ext:*posix-argv*))
              (output-file (third sb-ext:*posix-argv*)))
          (time (dawg:build :input input-file :output output-file))))
    * (sb-ext:save-lisp-and-die "dawg-build" :toplevel #'main :executable 't :save-runtime-options t)

    # Run
    $ /usr/bin/time -v ./dawg-build words cl-dawg.idx
    :BUILD-BINARY-TREE-TRIE
    Evaluation took:
      37.298 seconds of real time
      37.304000 seconds of total run time (35.300000 user, 2.004000 system)
      [ Run times consist of 12.180 seconds GC time, and 25.124 seconds non-GC time. ]
      100.02% CPU
      96,735,665,025 processor cycles
      12 page faults
      11,938,296,176 bytes consed

    :BUILD-DOUBLE-ARRAY-TRIE
    Evaluation took:
      33.487 seconds of real time
      33.496000 seconds of total run time (32.436000 user, 1.060000 system)
      [ Run times consist of 6.484 seconds GC time, and 27.012 seconds non-GC time. ]
      100.03% CPU
      86,850,782,635 processor cycles
      2 page faults
      10,786,507,072 bytes consed

      Command being timed: "./dawg-build words cl-dawg.idx"
      User time (seconds): 67.74
      System time (seconds): 3.09
      Percent of CPU this job got: 100%
      Elapsed (wall clock) time (h:mm:ss or m:ss): 1:10.82
      Average shared text size (kbytes): 0
      Average unshared data size (kbytes): 0
      Average stack size (kbytes): 0
      Average total size (kbytes): 0
      Maximum resident set size (kbytes): 4622404
      Average resident set size (kbytes): 0
      Major (requiring I/O) page faults: 25
      Minor (reclaiming a frame) page faults: 1190638
      Voluntary context switches: 30
      Involuntary context switches: 595
      Swaps: 0
      File system inputs: 4896
      File system outputs: 820952
      Socket messages sent: 0
      Socket messages received: 0
      Signals delivered: 0
      Page size (bytes): 4096
      Exit status: 0

    $ du -h cl-dawg.idx
    201M    cl-dawg.idx

Results (summary)
The results above, summarized in tables.
Elapsed time
| | Binary-tree construction | DoubleArray construction | Total |
|---|---|---|---|
| rust-dawg | 49.99s | 11.40s | 61.40s |
| cl-dawg | 37.30s (non-GC: 25.12s) | 33.49s (non-GC: 27.01s) | 70.80s (non-GC: 52.13s) |

Binary-tree construction took less time with cl-dawg, and DoubleArray construction took less time with rust-dawg.
The latter has the larger effect, so the total is also shorter for rust-dawg. However, as noted above, the DoubleArray construction part is implemented in a way that favors the Rust version, so if those conditions were equalized the CL version would probably come out faster in total as well.
Also, if the time spent in GC is excluded, cl-dawg (SBCL) gives considerably better results.
Memory consumption
| | Peak memory consumption |
|---|---|
| rust-dawg | 1.921GB |
| cl-dawg | 4.622GB |

As for memory consumption, the Rust version uses less than half as much (which is only natural, given that this compares a language without GC against one with GC).
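Impressions
The memory consumption of rust-dawg was as expected.
For the binary-tree construction part, I had expected the processing time to land at around cl-dawg (SBCL)'s "non-GC time + α", but the result was surprisingly underwhelming.
Admittedly many conditions favor the cl-dawg version ("SBCL is a quite fast implementation, the CL version is fairly well optimized, and I'm a Rust beginner"), but since Rust is "statically typed and needs no GC", I'd think its runtime cost ought to be correspondingly lower.
(That said, despite "no GC needed", rust-dawg makes heavy use of the Rc module, which could arguably be regarded as a form of GC.)
I have no plans to do so in the near future, but if I feel like it I may try to pin down what is causing the speed difference.
(e.g. what happens if Rc is replaced with raw pointers, or if Option is dropped and undefined is represented with a sentinel value, or if the implementation of the hash map used heavily inside is changed, or perhaps the compiler's optimization is simply insufficient to begin with)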
Patches applied for the measurement
rust-dawg.patch

    diff --git a/Cargo.toml b/Cargo.toml
    index a1daa96..f829e37 100644
    --- a/Cargo.toml
    +++ b/Cargo.toml
    @@ -7,2 +7,3 @@ [dependencies]
     bit-vec = "*"
     byteorder = "*"
    +time = "*"
    diff --git a/src/bin/dawg_build.rs b/src/bin/dawg_build.rs
    index 7677951..880fd5f 100644
    --- a/src/bin/dawg_build.rs
    +++ b/src/bin/dawg_build.rs
    @@ -4,6 +4,7 @@
     // see the LICENSE file at the top-level directory.
     
     extern crate dawg;
    +extern crate time;
     
     use std::env;
     use std::process;
    @@ -12,6 +13,11 @@ use std::io::BufRead;
     use dawg::binary_tree::Builder as BinaryTreeBuilder;
     use dawg::double_array::Builder as DoubleArrayBuilder;
     
    +fn now_ms() -> u64 {
    +    let t = time::now().to_timespec();
    +    (t.sec as u64 * 1000 + t.nsec as u64 / 1000 / 1000)
    +}
    +
     fn main() {
         let args: Vec<_> = env::args().collect();
         if args.len() != 2 {
    @@ -21,17 +27,25 @@ fn main() {
     
         let stdin = io::stdin();
         let output_file = &args[1];
    +
    +    print!("Building binary-tree trie ... ");
    +    let start = now_ms();
         let trie = BinaryTreeBuilder::new()
                        .build(stdin.lock().lines())
                        .unwrap_or_else(|e| {
                            println!("[ERROR] Can't build DAWG: reason={}", e);
                            process::exit(1);
                        });
    +    println!("done: elapsed={} ms", now_ms() - start);
    +
    +    print!("Building double-array trie ... ");
    +    let start = now_ms();
         let trie = DoubleArrayBuilder::new().build(trie);
         if let Err(e) = trie.save(output_file) {
             println!("[ERROR] Can't save dawg index: path={}, reason={}",
                      output_file, e);
             process::exit(1);
         }
    +    println!("done: elapsed={} ms", now_ms() - start);
         println!("DONE");
     }

cl-dawg.patch

    diff --git a/dawg.lisp b/dawg.lisp
    index aa5ea40..095dcbd 100644
    --- a/dawg.lisp
    +++ b/dawg.lisp
    @@ -61,11 +61,14 @@
       (declare ((or string pathname list) input)
                ((or string pathname) output)
                ((member :native :little :big) byte-order))
    -  (let ((trie (if (listp input)
    +  (let ((trie (time (progn (print :build-binary-tree-trie) (if (listp input)
                       (dawg.bintrie-builder:build-from-list input :show-progress show-progress)
                       (dawg.bintrie-builder:build-from-file input :show-progress show-progress))))
    +        ))
    +    (time (progn (print :build-double-array-trie)
         (dawg.double-array-builder:build-from-bintrie
          trie :output-file output :byte-order byte-order :show-progress show-progress))
    +    ))
       t)
     
     (defun load (index-path &key (byte-order :native))

Applying rustfmt automatically when saving a file in Emacs
Notes on the steps to have the code formatter run automatically on file save.
First, install rustfmt:

    # Version
    $ rustc -V
    rustc 1.5.0 (3d7cd77e4 2015-12-04)

    # Install
    $ cargo install rustfmt
    $ export PATH=$PATH:$HOME/.cargo/bin

    # Try it out
    $ echo 'fn hoge () -> u32 { 1+1 }' | rustfmt
    fn hoge() -> u32 {
        1 + 1
    }

Next, install rustfmt.el into Emacs following the emacs-rustfmt instructions:

    (require 'package)
    (add-to-list 'package-archives '("melpa" . "https://melpa.org/packages/") t)
    (package-initialize)
    ;; => M-x package-list-packages
    ;; => select `rustfmt` => `i` => `x`

Finally, add the following line to .emacs:

    (add-hook 'rust-mode-hook #'rustfmt-enable-on-save)

Done.
Getting the current time as a UNIX timestamp
It looks like this can be done with a library called chrono.

    $ cargo new sample --bin
    $ cd sample
    $ echo -e '\n[dependencies]\nchrono="*"\n' >> Cargo.toml

Edit main.rs:

    // src/main.rs
    extern crate chrono;

    use chrono::offset::local::Local;
    use chrono::duration::Duration;

    fn main() {
        // Current time
        println!("Now: {}", Local::now().timestamp());

        // While we're at it, show the time five minutes from now
        println!("After 5 minutes: {}", (Local::now() + Duration::minutes(5)).timestamp());
    }

Run:

    $ cargo run
    Now: 1447247053
    After 5 minutes: 1447247353

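A similar result can be obtained without an external crate. The sketch below assumes the standard library's `std::time::SystemTime` and `UNIX_EPOCH`, which are not part of the chrono example above; `unix_timestamp` is a helper name made up here.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Whole seconds elapsed since 1970-01-01T00:00:00Z at the given instant.
fn unix_timestamp(t: SystemTime) -> u64 {
    t.duration_since(UNIX_EPOCH)
        .expect("system clock is set before the UNIX epoch")
        .as_secs()
}

fn main() {
    let now = SystemTime::now();

    // Current time
    println!("Now: {}", unix_timestamp(now));

    // Five minutes from now
    println!("After 5 minutes: {}", unix_timestamp(now + Duration::from_secs(5 * 60)));
}
```

Since a UNIX timestamp is independent of time zone, no local-time handling is needed for this particular use.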
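Notes on distinguishing loglevel and severity
A personal note sorting out log-level terminology: how to use loglevel, severity, and severity level distinctly. I don't know whether the usage described here is generally considered correct.
severity, as the name suggests, expresses how serious a log message is. Its numeric representation is the severity level; the more serious the message, the smaller the value.
Example: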
severity level | severity |
---|---|
0 | fatal |
1 | alert |
2 | warning |
3 | info |
4 | debug |
loglevel is the threshold that determines which log messages are actually output: messages at or below the specified level are output (the higher the level, the wider the range of output).

                   | 4:debug   |
    loglevel=3 --> | 3:info    | ↑
                   | 2:warning | these are output
                   | 1:alert   | ↓
                   | 0:fatal   |

In short, severity is attached to each individual message, while loglevel is set on the logger responsible for outputting them.
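The distinction can be sketched in code as follows. This is only an illustration of the terminology above, not any particular logging library: `Severity`, `Logger` and `is_enabled` are names made up here.

```rust
/// Severity of a single message; the numeric value is its severity level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Fatal = 0,
    Alert = 1,
    Warning = 2,
    Info = 3,
    Debug = 4,
}

impl Severity {
    /// Numeric severity level (smaller means more serious).
    fn level(self) -> u8 {
        self as u8
    }
}

/// Hypothetical logger: `loglevel` is the threshold configured on the logger.
struct Logger {
    loglevel: u8,
}

impl Logger {
    /// A message is output when its severity level is at or below the loglevel.
    fn is_enabled(&self, severity: Severity) -> bool {
        severity.level() <= self.loglevel
    }

    fn log(&self, severity: Severity, msg: &str) {
        if self.is_enabled(severity) {
            println!("[{}:{:?}] {}", severity.level(), severity, msg);
        }
    }
}

fn main() {
    let logger = Logger { loglevel: 3 };
    logger.log(Severity::Debug, "suppressed, because 4 > 3");
    logger.log(Severity::Info, "output, because 3 <= 3");
    logger.log(Severity::Fatal, "output, because 0 <= 3");
}
```

The severity travels with each call to `log`, while the loglevel is a single setting on the `Logger` value.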
dialyzer's memory consumption becomes abnormally large with certain options
When the two options --get_warnings and -Wrace_conditions are specified together, memory usage grows very large for some reason.
For example, when running the command below, memory usage reached 8GB (the RAM size) partway through and the PLT could not be built.
(With only one of the two options specified, consumption stays at around a few hundred MB.)

    $ touch /tmp/dialyzer.plt
    $ dialyzer --build_plt --get_warnings -Wrace_conditions --apps kernel --plt /tmp/dialyzer.plt

I confirmed that this occurs on OTP17.5 and OTP18.0.
The same tendency appeared when stdlib was specified as the target application, so it doesn't seem to be application-specific.
It feels like a dialyzer bug, so it might be better to identify the cause and fix it, but for my purposes I'm not particularly inconvenienced by leaving out --get_warnings, so for now I'll work around it that way.
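Generating a sequence of primes
I thought up a way to implement a prime sequence, so here's a note.
It should basically be doing the same thing as the sieve of Eratosthenes, but the range of the sieve held at any one time is (probably) the minimum necessary.
(Each known prime consumes only one slot in the hash table.)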

    // file: prime.rs
    //
    // $ rustc --version
    // rustc 1.0.0 (a59de37e9 2015-05-13) (built 2015-05-14)
    use std::collections::HashMap;

    pub struct Prime {
        curr: u64,                // current number (not necessarily prime)
        sieve: HashMap<u64, u64>, // composite => prime: size always equals the number of primes below curr
    }

    impl Prime {
        pub fn is_prime(&mut self, n: u64) -> bool {
            match self.sieve.remove(&n) { // numbers already examined are removed from sieve (saves space)
                None        => {self.mark_composite(2, n); true},                // prime
                Some(prime) => {self.mark_composite(n/prime + 1, prime); false}, // composite
            }
        }

        fn mark_composite(&mut self, times: u64, prime: u64) {
            (times..).find(|i| !self.sieve.contains_key(&(i*prime)) ) // find a composite not yet registered
                     .and_then(|i| self.sieve.insert(i*prime, prime) );
        }
    }

    impl Iterator for Prime {
        type Item = u64;
        fn next(&mut self) -> Option<u64> {
            let prime = (self.curr..).find(|n| self.is_prime(*n) ).unwrap(); // find the smallest prime at or after curr
            self.curr = prime + 1;
            Some(prime)
        }
    }

    pub fn primes() -> Prime { Prime{curr: 2, sieve: HashMap::new()} }

    fn main() {
        use std::env;
        let n: usize = env::args().nth(1).unwrap().parse().unwrap();
        println!("{}-th prime: {}", n, primes().nth(n-1).unwrap());
    }

Example run:

    # CPU: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz

    # Compile
    $ rustc -O hoge.rs

    # Find the 1,000,000th prime
    $ time ./hoge 1000000
    1000000-th prime: 15485863

    real    0m10.086s  # took about 10 seconds
    user    0m10.060s
    sys     0m0.028s

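For contrast with the incremental approach, here is a minimal sketch of the conventional bounded sieve of Eratosthenes. It has to allocate one flag for every number up to a limit chosen in advance, whereas the hash-map version only keeps one entry per prime found so far. The function name `primes_up_to` is made up for this illustration.

```rust
/// Classic sieve of Eratosthenes: returns all primes <= `limit`.
/// The whole range [0, limit] is allocated up front.
fn primes_up_to(limit: usize) -> Vec<usize> {
    let mut is_composite = vec![false; limit + 1];
    let mut primes = Vec::new();
    for n in 2..=limit {
        if is_composite[n] {
            continue;
        }
        primes.push(n);
        // Mark multiples of n; smaller multiples were already marked
        // by smaller primes, so start at n * n.
        let mut m = n * n;
        while m <= limit {
            is_composite[m] = true;
            m += n;
        }
    }
    primes
}

fn main() {
    println!("{:?}", primes_up_to(30));
}
```

The drawback for a "give me the n-th prime" interface is that the upper bound must be known or estimated beforehand, which the iterator version avoids.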