RWC 2024 DICOM & ISO/IEC 2022

Ruby World Conference 2024

seki at druby.org

December 05, 2024

Transcript

  1. Attribute examples

    (0008,0005): character set
    (0010,0010): patient name
    (0020,0032): image position (patient coordinates)
    (0028,0010): number of pixels (Rows)
    (0028,0011): number of pixels (Columns)
  2. An attribute list written Ruby-style

    Each tag's value is encoded in a different way.

        dicom = [
          ...
          [[0x0008, 0x0005], "\\ISO 2022 IR 87\\ISO 2022 IR 13"], # charset
          ...
          [[0x0020, 0x0032], "-96.7773\\-36.77734\\-676.00"],     # Image Position
          ...
          [[0x0028, 0x0010], 512],                                # Rows
          [[0x0028, 0x0011], 512],                                # Cols
          ...
        ]
  3. Encoding an attribute, step 1

    Tag: two 16-bit integers
    VR: two characters (2 bytes) that name the value representation
    Data length: a 16-bit or 32-bit integer
    Value: encoded according to the VR, in an even number of bytes

        0008 0005 'CS' 001e "\\ISO 2022 IR 87\\ISO 2022 IR 13"

    (The attribute comes from the Ruby-style list in slide 2.)
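The four fields above can be packed by hand. The following is a minimal sketch, assuming Explicit VR Little Endian and a VR with a 16-bit length field; `encode_element` is a hypothetical helper name, not part of the talk's code.

```ruby
# Hypothetical helper: encode one attribute as tag + VR + 16-bit length + value.
# Assumes a short-form VR such as CS; VRs with a 32-bit length are not handled.
def encode_element(group, element, vr, value)
  value = value.b                      # work on raw bytes
  value += ' ' if value.bytesize.odd?  # pad text values to an even length
  [group, element].pack('v2') + vr.b + [value.bytesize].pack('v') + value
end

bytes = encode_element(0x0008, 0x0005, 'CS', "\\ISO 2022 IR 87\\ISO 2022 IR 13")
puts bytes.unpack1('H*').scan(/../).join(' ')
```

The first eight bytes printed are `08 00 05 00 43 53 1e 00`: the tag in little endian, the VR `CS`, and the length 0x1e (30 bytes).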
  4. Encoding an attribute, step 2: transfer syntax

    This file uses Explicit VR Little Endian ("Explicit Little"), which writes the VR explicitly.
    Implicit Little and Explicit Big also exist. There seems to be no Implicit Big.
    The meta information block states which transfer syntax is in use.
    The meta information itself is Explicit Little...
    Perhaps this is a remnant of each vendor and each generation having its own encoding.

        0008 0005 'CS' 001e "\\ISO 2022 IR 87\\ISO 2022 IR 13"

        00000140 08 00 05 00 43 53 1e 00 5c 49 53 4f 20 32 30 32 |....CS..\ISO 202|
        00000150 32 20 49 52 20 38 37 5c 49 53 4f 20 32 30 32 32 |2 IR 87\ISO 2022|
        00000160 20 49 52 20 31 33 08 00 08 00 43 53 16 00 4f 52 | IR 13....CS..OR|
  5. DICM: the "this is a DICOM file" mark

        00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
        *
        00000080 44 49 43 4d 02 00 00 00 55 4c 04 00 b0 00 00 00 |DICM....UL......|
        00000090 02 00 01 00 4f 42 00 00 02 00 00 00 00 01 02 00 |....OB..........|
        000000a0 02 00 55 49 1a 00 31 2e 32 2e 38 34 30 2e 31 30 |..UI..1.2.840.10|
        000000b0 30 30 38 2e 35 2e 31 2e 34 2e 31 2e 31 2e 32 00 |008.5.1.4.1.1.2.|
        000000c0 02 00 03 00 55 49 3e 00 31 2e 32 2e 33 39 32 2e |....UI>.1.2.392.|
        000000d0 32 30 30 30 33 36 2e 39 31 34 32 2e 31 30 30 30 |200036.9142.1000|
        000000e0 33 33 30 32 2e 31 30 32 30 34 33 38 30 30 31 2e |3302.1020438001.|
        000000f0 33 2e 32 30 32 34 30 37 30 34 31 33 30 30 35 34 |3.20240704130054|
        00000100 2e 38 30 30 31 33 02 00 10 00 55 49 14 00 31 2e |.80013....UI..1.|
        00000110 32 2e 38 34 30 2e 31 30 30 30 38 2e 31 2e 32 2e |2.840.10008.1.2.|
        00000120 31 00 02 00 12 00 55 49 16 00 31 2e 32 2e 33 39 |1.....UI..1.2.39|
        00000130 32 2e 32 30 30 30 33 36 2e 39 31 34 32 2e 31 00 |2.200036.9142.1.|
        00000140 08 00 05 00 43 53 1e 00 5c 49 53 4f 20 32 30 32 |....CS..\ISO 202|
        00000150 32 20 49 52 20 38 37 5c 49 53 4f 20 32 30 32 32 |2 IR 87\ISO 2022|
        00000160 20 49 52 20 31 33 08 00 08 00 43 53 16 00 4f 52 | IR 13....CS..OR|
  6. (0002,0000) is the length of the whole meta information

    The VR is 'UL', the data length is 4 bytes, and the value is 0xb0.
    The relevant row of the hex dump from slide 5:

        00000080 44 49 43 4d 02 00 00 00 55 4c 04 00 b0 00 00 00 |DICM....UL......|
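Reading that row back with `String#unpack` shows how the group length locates the end of the meta information. This is a sketch that hard-codes the 16 bytes at offset 0x80 of the dump; the variable names are illustrative only.

```ruby
# The 16 bytes at offset 0x80 of the dump: "DICM" followed by element (0002,0000).
header = %w(44 49 43 4d 02 00 00 00 55 4c 04 00 b0 00 00 00).map(&:hex).pack('C*')

magic          = header[0, 4]
group, element = header[4, 4].unpack('v2')     # tag: two 16-bit little-endian integers
vr             = header[8, 2]
length         = header[10, 2].unpack1('v')    # 16-bit length for VR 'UL'
value          = header[12, length].unpack1('V') # UL: unsigned 32-bit little endian

# The group length counts the bytes that follow this element.
dataset_offset = 0x80 + header.bytesize + value
printf("%s (%04x,%04x) %s len=%d value=0x%x dataset at 0x%x\n",
       magic, group, element, vr, length, value, dataset_offset)
# prints: DICM (0002,0000) UL len=4 value=0xb0 dataset at 0x140
```

The computed offset 0x140 is exactly where the (0008,0005) element starts in the dump.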
  7. This region is the meta information

    In the hex dump from slide 5, the meta information is the run of group 0002 elements that follows the DICM mark and ends just before offset 0x140.
  8. (0002,0010) is the transfer syntax

    1.2.840.10008.1.2.1 means Explicit Little.
    The relevant rows of the hex dump from slide 5:

        00000100 2e 38 30 30 31 33 02 00 10 00 55 49 14 00 31 2e |.80013....UI..1.|
        00000110 32 2e 38 34 30 2e 31 30 30 30 38 2e 31 2e 32 2e |2.840.10008.1.2.|
        00000120 31 00 02 00 12 00 55 49 16 00 31 2e 32 2e 33 39 |1.....UI..1.2.39|
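Mapping the UID to a name is a plain table lookup. The sketch below lists the three uncompressed transfer syntaxes mentioned in slide 4, with their UIDs as given in the DICOM standard; `TRANSFER_SYNTAXES` and `transfer_syntax_name` are hypothetical names.

```ruby
# UIDs from the DICOM standard; constant and method names are made up for this sketch.
TRANSFER_SYNTAXES = {
  '1.2.840.10008.1.2'   => 'Implicit VR Little Endian',
  '1.2.840.10008.1.2.1' => 'Explicit VR Little Endian',
  '1.2.840.10008.1.2.2' => 'Explicit VR Big Endian'
}.freeze

def transfer_syntax_name(raw_value)
  # UI values are padded with a trailing NUL to reach an even length.
  uid = raw_value.delete_suffix("\0")
  TRANSFER_SYNTAXES.fetch(uid, "unknown (#{uid})")
end

# The 0x14 value bytes of (0002,0010), copied from the dump.
raw = %w(31 2e 32 2e 38 34 30 2e 31 30 30 30 38 2e 31 2e 32 2e 31 00).map(&:hex).pack('C*')
puts transfer_syntax_name(raw)
```

Compressed transfer syntaxes (JPEG and so on) are left out of the table on purpose; an unknown UID falls through to the `"unknown (...)"` string.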
  9. (0008,0005) is the character set in use

    It declares that ISO 2022 IR 87, IR 13 and IR 6 are used.
    The relevant rows of the hex dump from slide 5:

        00000140 08 00 05 00 43 53 1e 00 5c 49 53 4f 20 32 30 32 |....CS..\ISO 202|
        00000150 32 20 49 52 20 38 37 5c 49 53 4f 20 32 30 32 32 |2 IR 87\ISO 2022|
        00000160 20 49 52 20 31 33 08 00 08 00 43 53 16 00 4f 52 | IR 13....CS..OR|
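  10. DICOM character sets

    DICOM has a mechanism for handling many character sets in a document.
    Without extension: a single character set for the whole document
      ASCII, Latin-1, utf-8, GB18030...
    With extension: several character sets within one document
      Based on the techniques of ISO/IEC 2022
    The without-extension case resembles Ruby's encoding.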
  11. ASCII: why this layout?

    [Figure: ASCII code chart]
  12. JIS X 0201

    [Figure: JIS X 0201 code chart, with half-width katakana in the upper half]
  13. ISO-8859-1

    [Figure: ISO-8859-1 code chart]
  14. ISO-8859-2

    [Figure: ISO-8859-2 code chart]
  15. Divide it into four regions

    [Figure: ISO-8859-2 code chart]
  16. Divide it into four regions

    [Figure: ISO-8859-1 code chart]
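  17. Divide it into four regions

    [Figure: the code chart split into four regions: CL, GL, CR, GR]
  18. Let's swap the characters in GL and GR

    [Figure: GL and GR, with the sets that can be placed there: left half of ASCII, left half of JIS X 0201, right half of Latin-1, katakana, kanji]
  19. Let's swap the characters in GL and GR

    [Figure: GL and GR, with the sets that can be placed there: left half of ASCII, left half of JIS X 0201, right half of Latin-1, katakana, kanji]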
  20. Operate in two steps

    Designate: assign a character set (left half of ASCII, left half of JIS X 0201, right half of Latin-1, katakana, kanji) to one of G0, G1, G2, G3.
    Invoke (shift): bring one of G0 to G3 into GL or GR.
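The two-step model can be pictured as a tiny state machine. This is a toy sketch for illustration, not the talk's implementation: `CodeExtensionState` and its method names are invented here, and character sets are represented by plain strings.

```ruby
# Toy model of ISO/IEC 2022 code extension state (all names hypothetical).
class CodeExtensionState
  def initialize
    @g = { g0: 'ASCII', g1: nil, g2: nil, g3: nil } # designated character sets
    @invoked = { gl: :g0, gr: :g1 }                 # which G-set each half shows
  end

  # Step 1: designate a character set to one of G0..G3.
  def designate(slot, charset)
    @g.fetch(slot) # raises KeyError for an unknown slot
    @g[slot] = charset
  end

  # Step 2: invoke (shift) one of G0..G3 into GL or GR.
  def invoke(half, slot)
    @invoked.fetch(half)
    @g.fetch(slot)
    @invoked[half] = slot
  end

  # Which character set does a graphic byte belong to right now?
  def charset_for(byte)
    half = byte < 0x80 ? :gl : :gr
    @g[@invoked[half]]
  end
end

state = CodeExtensionState.new
state.designate(:g2, 'katakana')
state.invoke(:gr, :g2)
puts state.charset_for(0x41) # a GL byte
puts state.charset_for(0xb1) # a GR byte
```

Keeping "what is in G0 to G3" separate from "what GL and GR currently show" is the point of the two steps: an escape sequence changes the first table, a shift changes the second.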
  21. iso-2022-jp

    [Figure: G0 to G3 and GL/GR, with the left half of ASCII, the left half of JIS X 0201 and kanji]
  22. iso-2022-jp

        "分散Ruby".encode('iso-2022-jp')
        1b 24 42 4a 2c 3b 36 1b 28 42 52 75 62 79

    Initial state: ASCII is designated to G0, and G0 is invoked into GL.
    1b 24 42: designate kanji to G0 (G0 stays invoked into GL).
    1b 28 42: designate ASCII to G0 (G0 stays invoked into GL).
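The byte sequence on this slide can be reproduced, and cut at its escape sequences, with a few lines of Ruby. This sketch only knows the two designations that appear in this example (ESC $ B and ESC ( B).

```ruby
# Encode and show the bytes as on the slide.
bytes = "分散Ruby".encode('iso-2022-jp').b
puts bytes.unpack1('H*').scan(/../).join(' ')

# Split on the two designation sequences, keeping them in the result.
parts = bytes.split(/(\e\$B|\e\(B)/n).reject(&:empty?)
parts.each { |part| puts part.unpack1('H*') }
```

The split yields four parts: the kanji designation, four bytes of kanji, the ASCII designation, and `Ruby`.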
  23. euc-jp

    [Figure: G0 = left half of ASCII, G1 = kanji (JIS X 0208), G2 = katakana, G3 = kanji (JIS X 0212); GL and GR]
  24. euc-jp

        '分散ﾙﾋﾞｰ'.encode('euc-jp')
        ca ac bb b6 8e d9 8e cb 8e de 8e b0

    Initial state: ASCII is designated to G0 and G0 is invoked into GL; kanji is designated to G1 and G1 is invoked into GR.
    8e: invoke G2 into GR for a single character, then return to the previous state (single shift).
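The single shift is easy to see when the EUC-JP bytes are tokenized. A minimal sketch, assuming only the three token shapes that occur in this example (SS2 plus one byte, a two-byte GR pair, a single GL byte):

```ruby
euc = "分散ﾙﾋﾞｰ".encode('euc-jp').b
puts euc.unpack1('H*').scan(/../).join(' ')

# 0x8e is SS2: the one byte after it is read from G2 (half-width katakana).
tokens = euc.scan(/\x8e.|[\xa1-\xfe]{2}|[\x00-\x7f]/n)
tokens.each do |t|
  kind = t.getbyte(0) == 0x8e ? 'SS2 + G2' : 'G1 (kanji)'
  puts format('%-10s %s', kind, t.unpack1('H*'))
end
```

Two kanji tokens come out first, followed by four SS2 tokens, one per half-width katakana character.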
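  25. It starts to look like operations!

    Start from the initial state fixed by the character encoding.
    Embed commands in the string using CL/CR control characters.

        "分散Ruby".encode('iso-2022-jp')
        1b 24 42 4a 2c 3b 36 1b 28 42 52 75 62 79

    Initial state: ASCII is designated to G0, and G0 is invoked into GL.
    1b 24 42: designate kanji to G0 (G0 stays invoked into GL).
    1b 28 42: designate ASCII to G0 (G0 stays invoked into GL).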
  26. Without extension

    IR apparently stands for the ISO registration number. Note that ISO_IR 13 is JIS X 0201 kana, so it is not really sjis.

        defined term   Ruby encoding
        (none)         ascii
        ISO_IR 100     windows-1252
        ISO_IR 101     iso-8859-2
        ISO_IR 109     iso-8859-3
        ISO_IR 110     iso-8859-4
        ISO_IR 144     iso-8859-5
        ISO_IR 127     iso-8859-6
        ISO_IR 126     iso-8859-7
        ISO_IR 138     iso-8859-8
        ISO_IR 148     windows-1254
        ISO_IR 203     iso-8859-15
        ISO_IR 13      shift-jis (a subset)
        ISO_IR 166     tis-620
        ISO_IR 192     utf-8
        GB18030        GB18030
        GBK            gbk
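Without extensions, decoding amounts to relabeling the bytes with the encoding from the table. A short sketch using three rows of that table; `WITHOUT_EXTENSION` and `decode_without_extension` are hypothetical names.

```ruby
# A few rows of the table; constant and method names are made up for this sketch.
WITHOUT_EXTENSION = {
  'ISO_IR 100' => 'windows-1252',
  'ISO_IR 101' => 'iso-8859-2',
  'ISO_IR 192' => 'utf-8'
}.freeze

def decode_without_extension(defined_term, bytes)
  encoding = WITHOUT_EXTENSION.fetch(defined_term)
  # Relabel a copy of the bytes, then transcode to UTF-8 for display.
  bytes.dup.force_encoding(encoding).encode('utf-8')
end

raw = "Pok\xE9mon".b
puts decode_without_extension('ISO_IR 100', raw)
```

`force_encoding` changes only the label, not the bytes, which is why this case maps so directly onto Ruby's encoding model.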
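  27. 2022 Extension

    Specify several character sets in (0008,0005).
    The listed character sets become available.
    The first character set gives the initial state.
    If the first one is omitted, it means IR 6 (ASCII).

        ["ISO 2022 IR 13", "ISO 2022 IR 87"]

    The default is ASCII + kana; kanji is used as well.

        ["", "ISO 2022 IR 87", "ISO 2022 IR 13"]

    The default is IR 6 (ASCII); kanji and kana are used as well.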
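  28. 2022 Extension

    Each character set has a fixed set of operations it may use.
    The operations of the character set written first form the initial state.
    The table is an excerpt. There are 16 defined terms and 17 operations.

    Operations usable inside a string:

        defined term      ESC           target  character set
        ISO 2022 IR 6     1b 28 42      G0/GL   ASCII
        ISO 2022 IR 100   1b 2d 41      G1/GR   right half of Latin-1
                          1b 28 42      G0/GL   ASCII
        ISO 2022 IR 13    1b 29 49      G1/GR   JIS X 0201 katakana
                          1b 28 4a      G0/GL   left half of JIS X 0201
        ISO 2022 IR 87    1b 24 42      G0/GL   JIS X 0208 kanji
        ISO 2022 IR 149   1b 24 29 43   G1/GR   Korean (euc-kr)

    If IR 13 comes first, the initial state has katakana in G1 and JIS X 0201 romaji in G0.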
  29. Strategy: stare at it

    The patient name example from the DICOM standard:

        ﾔﾏﾀﾞ^ﾀﾛｳ=山田^太郎=やまだ^たろう
        (0008,0005): ["ISO 2022 IR 13", "ISO 2022 IR 87"]

        d4 cf c0 de 5e c0 db b3 3d
        1b 24 42 3b 33 45 44 1b 28 4a 5e
        1b 24 42 42 40 4f 3a 1b 28 4a 3d
        1b 24 42 24 64 24 5e 24 40 1b 28 4a 5e
        1b 24 42 24 3f 24 6d 24 26 1b 28 4a
  30. Strategy

    Split the string into segments of the form (escape sequence) | (GL*) | (GR*) and process each one. That should do it!
    A bit like ERB, perhaps.

        ﾔﾏﾀﾞ^ﾀﾛｳ=山田^太郎=やまだ^たろう
        (0008,0005): ["ISO 2022 IR 13", "ISO 2022 IR 87"]

        d4 cf c0 de 5e c0 db b3 3d
        1b 24 42 3b 33 45 44 1b 28 4a 5e
        1b 24 42 42 40 4f 3a 1b 28 4a 3d
        1b 24 42 24 64 24 5e 24 40 1b 28 4a 5e
        1b 24 42 24 3f 24 6d 24 26 1b 28 4a

    Segment labels on the slide: GR, GL, GR, escape sequence, GL.
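The segmentation idea can be tried directly on the example bytes with `String#scan`. This is a simplified sketch, not the talk's `@reg`: the regular expression is written out by hand for the three escape sequences that IR 13 and IR 87 allow, and control characters are ignored.

```ruby
hex = %w(
  d4 cf c0 de 5e c0 db b3 3d
  1b 24 42 3b 33 45 44 1b 28 4a 5e
  1b 24 42 42 40 4f 3a 1b 28 4a 3d
  1b 24 42 24 64 24 5e 24 40 1b 28 4a 5e
  1b 24 42 24 3f 24 6d 24 26 1b 28 4a
)
name = hex.map(&:hex).pack('C*')

# Escape sequences first, then runs of GL bytes, then runs of GR bytes.
SEGMENT = /\e\(J|\e\)I|\e\$B|[\x20-\x7f]+|[\xa0-\xff]+/n

segments = name.scan(SEGMENT)
segments.each do |seg|
  first = seg.getbyte(0)
  kind = if first == 0x1b then 'ESC'
         elsif first < 0x80 then 'GL'
         else 'GR'
         end
  puts format('%-3s %s', kind, seg.unpack1('H*'))
end
```

The first five lines printed are GR, GL, GR, GL, ESC. Listing the escape sequences before the byte-range alternatives matters once control characters are added to the pattern, since ESC (0x1b) is itself a CL byte.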
  31. Looks doable! Done

    The API: create a converter called Context and call convert on it.

        charset = "ISO 2022 IR 100\\ISO 2022 IR 13"
        str = %w(50 6f 6b e9 6d 6f 6e 20 1b 29 49 ce df b9 d3 dd).map(&:hex).pack('C*')
        context = DCM_CharSet::Context.new(charset)
        ctext = context.convert(str)

        pp ctext
        # ["Pok", "\xE9", "mon ", "\xCE\xDF\xB9\xD3\xDD"]

        pp ctext.map {|x| [x, x.encoding, x.encode('utf-8')]}
        # [["Pok", #<Encoding:US-ASCII>, "Pok"],
        #  ["\xE9", #<Encoding:ISO-8859-1>, "é"],
        #  ["mon ", #<Encoding:US-ASCII>, "mon "],
        #  ["\xCE\xDF\xB9\xD3\xDD", #<Encoding:CP50221 (dummy)>, "ﾎﾟｹﾓﾝ"]]
  32. Let me walk through the code

        # coding: us-ascii
        module DCM_CharSet
          class InvalidCharSet < RuntimeError; end
          class NotAllowDefaultCharSet < InvalidCharSet; end

          class Context
            def initialize(dcm_00080005)
              ary = parse_charset(dcm_00080005)
              if ary.size == 1
                @wo_extensions = true
                @encoding = CharactorSetWOExtensions[ary.first]
              else
                @default_encoding = CharactorSet[ary.first]
                @wo_extensions = false
                @allow_encoding = {}
                ary.each do |x|
                  CharactorSet[x].each do |y|
                    @allow_encoding[y.escape_sequence] = y
                  end
                end
                seg = @allow_encoding.keys.map{Regexp.escape(_1)} +
                  ['[\000-\037]+', '[\200-\237]+', '[\040-\177]+', '[\200-\377]+']
                @reg = Regexp.new(seg.join('|'), 0)
              end
            end

            def parse_charset(charset)
              ary = charset ? charset.strip.split('\\').map {|x| x.strip.upcase} : []
              return ['ISO_IR 6'] if ary.empty?
              if ary.size == 1
                raise(InvalidCharSet.new(ary.first)) unless CharactorSetWOExtensions.include?(ary.first)
                return ary
              end
              ary[0] = 'ISO 2022 IR 6' if ary[0].empty?
              raise(NotAllowDefaultCharSet.new(ary[0])) if MultibyteCharactorSet.include?(ary[0])
              ary.each do |x|
                raise(InvalidCharSet.new(x)) unless CharactorSet.include?(x)
              end
              # (the slide cuts the listing off here)
  33. DCM_CharSet::Element

    This is Element!
    It has three attributes: a code_element that says "GL" or "GR", an escape sequence, and a Ruby encoding.
    Element#encode encodes a String in the way configured for this Element. That is its main job.

        # (the slide starts with the tail of Context#convert and convert_wo_extensions)
        module DCM_CharSet
          class Element
            def initialize(code_element, escape_sequence, encoding)
              @code_element = code_element
              @escape_sequence = escape_sequence.pack('c*')
              @encoding = encoding
            end
            attr_reader :escape_sequence, :encoding, :code_element

            def _encode(str)
              str.dup.force_encoding(@encoding)
            end

            def encode(str)
              s = _encode(str)
              s.instance_variable_set(:@dicom_encoding_element, self)
              s.freeze
              s
            end

            def inspect
              "#<#{self.class.to_s}:#{@escape_sequence.inspect} #{@encoding}>"
            end
          end

          module E_shift_to_GR
            # (the slide cuts the listing off here)
  34. Start of the conversion / initializing GL and GR

    Initialize the conversion method for each of GL and GR: a Hash such as
    { "GL" => DCM_CharSet::Element, "GR" => DCM_CharSet::Element }.
    graphic is the thing that gets "operated" on.

              ary = charset ? charset.strip.split('\\').map {|x| x.strip.upcase} : []
              return ['ISO_IR 6'] if ary.empty?
              if ary.size == 1
                raise(InvalidCharSet.new(ary.first)) unless CharactorSetWOExtensions.include?(ary.first)
                return ary
              end
              ary[0] = 'ISO 2022 IR 6' if ary[0].empty?
              raise(NotAllowDefaultCharSet.new(ary[0])) if MultibyteCharactorSet.include?(ary[0])
              ary.each do |x|
                raise(InvalidCharSet.new(x)) unless CharactorSet.include?(x)
              end
              ary
            end

            def convert(str)
              return convert_wo_extensions(str) if @wo_extensions
              graphic = @default_encoding.map {|e| [e.code_element, e]}.to_h
              result = []
              str.scan(@reg).each do |seg|
                case seg.bytes.first
                when 033
                  e = @allow_encoding[seg]
                  graphic[e.code_element] = e
                when 0..037, 0200..0237 # CL, CR
                  result << seg
                when 040..0177
                  result << graphic['GL'].encode(seg)
                else
                  result << graphic['GR'].encode(seg)
                end
              end
              result
            end

            def convert_wo_extensions(str)
              [str.force_encoding(@encoding)]
            end
  35. Process segment by segment

    (Same listing as slide 34.)
    An iterator that processes the string one segment at a time: scan fits perfectly!
    The regular expression @reg is explained later.
    The case statement branches on the kind of segment.
  36. Process segment by segment

    (Same listing as slide 34.)
    For an escape sequence, perform the "operation" that changes the settings in graphic:
    look up the matching Element and store it under the GL or GR it belongs to.
  37. Process segment by segment

    (Same listing as slide 34.)
    For CL/CR, append the segment without converting it.
  38. Process segment by segment

    (Same listing as slide 34.)
    Encode with the Element currently set for 'GL' in graphic.
    'GR' works the same way.
  39. Preparing @reg

    (Same listing as slide 32.)
  40. Preparing @reg

    (Same listing as slide 32.)
    This is the converter class for DICOM strings.
    Its argument is the character set setting from (0008,0005).
    parse_charset splits the (0008,0005) setting into an Array of character set names.
  41. Preparing @reg

    (Same listing as slide 32.)
    The without-extension case is skipped here.
  42. Preparing @reg

    (Same listing as slide 32.)
    CharactorSet is a Hash that maps a character set name to its Elements (described later).
    Collect the Elements usable in this document into a table (@allow_encoding), a Hash that maps an escape sequence to its Element.
  43. Preparing @reg

    (Same listing as slide 32.)
    Join the escape sequences used in this document and the regular expressions for CL, CR, GL and GR with | to build @reg.
  44. CharactorSet

    A Hash that maps a DICOM character set name to the list of corresponding operations (Elements).

            'ISO_IR 127' => 'iso-8859-6',
            'ISO_IR 126' => 'iso-8859-7',
            'ISO_IR 138' => 'iso-8859-8',
            'ISO_IR 148' => 'windows-1254', # FIXME
            'ISO_IR 203' => 'iso-8859-15',
            'ISO_IR 13' => 'shift-jis', #FIXME
            'ISO_IR 166' => 'tis-620',
            'ISO_IR 192' => 'utf-8',
            'GB18030' => 'GB18030',
            'GBK' => 'gbk'
          }

          AsciiElement = Element.new('GL', [0x1B, 0x28, 0x42], 'ascii')

          CharactorSet = {
            'ISO 2022 IR 6' => [AsciiElement],
            'ISO 2022 IR 100' => [
              AsciiElement,
              Element.new('GR', [0x1B, 0x2D, 0x41], 'iso-8859-1')
            ],
            'ISO 2022 IR 101' => [
              AsciiElement,
              Element.new('GR', [0x1B, 0x2D, 0x42], 'iso-8859-2')
            ],
            'ISO 2022 IR 109' => [
              AsciiElement,
              Element.new('GR', [0x1B, 0x2D, 0x43], 'iso-8859-3')
            ],
            'ISO 2022 IR 110' => [
              AsciiElement,
              Element.new('GR', [0x1B, 0x2D, 0x44], 'iso-8859-4')
            ],
            'ISO 2022 IR 144' => [
              AsciiElement,
              Element.new('GR', [0x1B, 0x2D, 0x4C], 'iso-8859-5')
            # (the slide cuts the listing off here)
  45. CharactorSet

    (Same listing as slide 44.)
    The arguments of Element.new are the invocation target, the escape sequence and the Ruby encoding.
    The operation for ascii appears many times, so it is kept in a constant.
  46. CharactorSet

    (Same listing as slide 44.)
    ISO 2022 IR 100 consists of an operation that invokes ascii into GL and an operation that invokes IR 100 into GR.
  47. A part that took some effort

    IR 87 and IR 159 are GL sets, but I decided to shift them to GR and process them as euc-jp.

              Element.new('GR', [0x1B, 0x2D, 0x46], 'iso-8859-7')
            ],
            'ISO 2022 IR 138' => [
              AsciiElement,
              Element.new('GR', [0x1B, 0x2D, 0x48], 'iso-8859-8')
            ],
            'ISO 2022 IR 148' => [
              AsciiElement,
              Element.new('GR', [0x1B, 0x2D, 0x4D], 'iso-8859-9')
            ],
            'ISO 2022 IR 203' => [
              AsciiElement,
              Element.new('GR', [0x1B, 0x2D, 0x62], 'iso-8859-15')
            ],
            'ISO 2022 IR 13' => [
              Element.new('GL', [0x1B, 0x28, 0x4A], 'cp50221'),
              Element.new('GR', [0x1B, 0x29, 0x49], 'cp50221')
            ],
            'ISO 2022 IR 166' => [
              AsciiElement,
              Element.new('GR', [0x1B, 0x2D, 0x54], 'tis-620')
            ],
            'ISO 2022 IR 87' => [
              Element.new('GL', [0x1B, 0x24, 0x42], 'euc-jp').extend(E_shift_to_GR)
            ],
            'ISO 2022 IR 159' => [
              Element.new('GL', [0x1B, 0x24, 0x28, 0x44], 'euc-jp').extend(E_shift_to_GR)
            ],
            'ISO 2022 IR 149' => [
              Element.new('GR', [0x1B, 0x24, 0x29, 0x43], 'euc-kr')
            ],
  48. iso-2022-jp

    [Figure: G0 to G3 and GL/GR, with the left half of ASCII, the left half of JIS X 0201 and kanji]
  49. euc-jp

    [Figure: G0 = left half of ASCII, G1 = kanji (JIS X 0208), G2 = katakana, G3 = kanji (JIS X 0212); GL and GR]
  50. iso-2022-jp

    [Figure: G0 to G3 and GL/GR, with the left half of ASCII, the left half of JIS X 0201 and kanji]
  51. Move it to GR and treat it as euc-jp

    [Figure: the iso-2022-jp diagram with the kanji set moved from GL to GR]
  52. A part that took some effort

    (Same listing as slide 47.)
    IR 87 and IR 159 are GL sets, but I decided to shift them to GR and process them as euc-jp.
  53. A part that took some effort

    This is the body of the module: x | 0x80.

        # (class Element is the same as in slide 33)
          module E_shift_to_GR
            def _encode(str)
              str.each_byte.map {|x| x > 0x20 ? x | 0x80 : x}.pack('c*').force_encoding(@encoding)
            end
          end

          CharactorSetWOExtensions = {
            'ISO_IR 6' => 'ascii',
            'ISO_IR 100' => 'windows-1252',
            'ISO_IR 101' => 'iso-8859-2',
            'ISO_IR 109' => 'iso-8859-3',
            # (the slide cuts the listing off here)
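The effect of `x | 0x80` can be checked in isolation on the kanji bytes from the patient name example (slide 29). This is a standalone sketch of the same bit operation, outside the Element class.

```ruby
# 山田 as JIS X 0208 bytes in GL, taken from the patient name example.
jis = "\x3b\x33\x45\x44".b

# Set the high bit of every graphic byte to move it from GL to GR.
euc = jis.bytes.map { |b| b > 0x20 ? b | 0x80 : b }.pack('C*')
puts euc.unpack1('H*')

# The shifted bytes are valid EUC-JP, so Ruby's own converter can take over.
text = euc.force_encoding('euc-jp').encode('utf-8')
puts text
```

Reusing `euc-jp` this way avoids handing stateful ISO-2022-JP text to the converter: each GL segment is shifted on its own and then transcoded as ordinary EUC-JP.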