The other day, Java SE 9 was finally released!
So I looked into how JEP 254: Compact Strings, which people had been talking about for a while, is actually implemented.
Overview of Compact Strings
Until now, classes such as String and StringBuilder encoded strings as UTF-16 internally and held them in a char array.
In other words, each character*1 always used one char = 2 bytes of memory.
With that layout, however, a string made up of LATIN1 characters (ASCII plus the Latin characters), each of which fits in 1 byte, wastes memory because half of its bytes are 0x00.
So the internal representation was changed: when a string consists only of LATIN1 characters, it was refactored to hold each character in 1 byte.
Incidentally, when a string contains any non-LATIN1 character (Japanese, for example), it is held as UTF-16 at 2 bytes per character, just as before.
With that in mind, I followed the source code.
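To make the size difference concrete, here is a minimal sketch that estimates how many bytes the character data would take under the scheme described above. The class and method names (Latin1Check, isLatin1, compactBytes) are hypothetical helpers written for this article, not JDK APIs, and object headers and other fields are ignored.

```java
// Hypothetical helper for illustration only; not part of the JDK.
final class Latin1Check {
    /** Returns true when every char fits in one byte (0x00 to 0xFF). */
    static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    /** Bytes needed for the character data alone: 1 per char if LATIN1, otherwise 2. */
    static int compactBytes(String s) {
        return isLatin1(s) ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        System.out.println(compactBytes("abc"));                // prints 3
        System.out.println(compactBytes("\u3042\u3044\u3046")); // three hiragana, prints 6
    }
}
```

Before Java 9 both strings would have needed 6 bytes of character data, so only the LATIN1 string gets smaller.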
What a string holds
In the String class, each object now has the following three fields (value, coder, hash).
The key point is that the type of value changed from a char array to a byte array.
This array is used 1 byte at a time for LATIN1 and 2 bytes at a time for UTF-16.
    // Storage for the contents of the string
    // Up to Java 8 this was char[] rather than byte[]
    private final byte[] value;

    // Encoding of value (added in Java 9)
    // 0 means LATIN1, 1 means UTF-16
    private final byte coder;

    // Cache of this string's hash code (already existed)
    // Lazily initialized the first time the hashCode method is called.
    private int hash; // Default to 0
In addition, a constant COMPACT_STRINGS (boolean) has been added.
When this value is true, switching between LATIN1 and UTF16 is enabled; when it is false, switching is disabled (always UTF16).
It is enabled (true) by default.
    static final boolean COMPACT_STRINGS;

    static {
        COMPACT_STRINGS = true;
    }
Although this value is a static final constant, it can be changed with a VM option*2.
-XX:-CompactStrings
Incidentally, this option does not seem to be documented.
I searched for it, but the only place I could find it mentioned was in the JDK tests.
jdk9/jdk9/jdk: 4f6e52f9dc79 test/java/lang/String/CompactString/VMOptionsTest.java
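One way to check which setting a running JVM is actually using is to ask the VM for the flag's value. The sketch below assumes a HotSpot-based JVM on Java 9 or later where com.sun.management.HotSpotDiagnosticMXBean is available; on other JVMs it simply reports "unknown". The class name CompactStringsFlag is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration; assumes a HotSpot-based JVM.
final class CompactStringsFlag {
    /** Returns "true" or "false" as reported by the VM, or "unknown" if it cannot be read. */
    static String read() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("CompactStrings").getValue();
        } catch (RuntimeException e) {
            // The bean or the flag is missing (for example, not HotSpot, or Java 8 and earlier).
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("CompactStrings = " + read());
    }
}
```

Launching this once normally and once with -XX:-CompactStrings should show the reported value change, which is a quick sanity check before comparing performance.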
How the String class does its work
So, what does the processing inside the String class look like now?
Apart from some of the comparison logic, most of the work has been delegated to the StringLatin1 class and the StringUTF16 class.
For example, all the charAt method does is check whether the string is Latin1 and hand the work to the corresponding class.
    public char charAt(int index) {
        if (isLatin1()) {
            return StringLatin1.charAt(value, index);
        } else {
            return StringUTF16.charAt(value, index);
        }
    }
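To see how a byte array plus a coder can serve both encodings, here is a self-contained sketch of the same dispatch idea. MiniString is a hypothetical class written for this article. The big-endian byte order used for the UTF-16 case is an assumption made for simplicity and is not necessarily what the JDK does.

```java
// Hypothetical miniature of the value/coder layout; not the JDK implementation.
final class MiniString {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    private final byte[] value; // 1 byte per char for LATIN1, 2 bytes per char for UTF16
    private final byte coder;

    MiniString(String s) {
        boolean latin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                latin1 = false;
                break;
            }
        }
        if (latin1) {
            value = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                value[i] = (byte) s.charAt(i);
            }
            coder = LATIN1;
        } else {
            value = new byte[s.length() * 2];
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                value[2 * i] = (byte) (c >> 8); // high byte first (assumed order)
                value[2 * i + 1] = (byte) c;    // low byte
            }
            coder = UTF16;
        }
    }

    /** Shifting by the coder halves the byte count when the string is UTF16. */
    int length() {
        return value.length >> coder;
    }

    int byteCount() {
        return value.length;
    }

    char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new StringIndexOutOfBoundsException(index);
        }
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }
}
```

For example, new MiniString("abc") holds 3 bytes, while new MiniString("a\u3042") holds 4 bytes for its 2 characters.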
The implementation of the StringUTF16 class is roughly the same as the previous String class implementation.
The StringLatin1 class, on the other hand, does not need to think about surrogate pairs, so methods such as codePointAt have become simpler.
Because of that, Latin1 strings will probably perform a little better than before.
(I could not work out exactly how much better, though. Maybe there are figures published somewhere.)
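The difference in complexity can be sketched as follows. These are hypothetical helpers written for this article, not the JDK code, and the UTF-16 side takes a char[] rather than the real byte[] to keep the example short.

```java
// Hypothetical illustration of why the LATIN1 path is simpler; not the JDK code.
final class CodePointSketch {
    /** LATIN1: every code point is a single byte, so there is nothing to check. */
    static int codePointAtLatin1(byte[] value, int index) {
        return value[index] & 0xFF;
    }

    /** UTF-16: a high surrogate followed by a low surrogate forms one code point. */
    static int codePointAtUtf16(char[] value, int index) {
        char c1 = value[index];
        if (Character.isHighSurrogate(c1) && index + 1 < value.length) {
            char c2 = value[index + 1];
            if (Character.isLowSurrogate(c2)) {
                return Character.toCodePoint(c1, c2);
            }
        }
        return c1;
    }
}
```

The LATIN1 version is a single masked array read, while the UTF-16 version needs up to two reads and two range checks per call.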
Handling of char arrays
One thing that caught my attention while reading was the long-standing constructor that takes a char array.
It used to simply copy the char array, but it now does more work in order to switch between LATIN1 and UTF16.
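- If the length is 0, reuse the array and coder of the empty string
- Otherwise, create a byte array on the assumption that the content is LATIN1, and copy one character at a time while checking that each one is LATIN1
- If every character is LATIN1, use that byte array
- If a character outside the LATIN1 range is found, discard that byte array and create a new UTF16 byte array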
    String(char[] value, int off, int len, Void sig) {
        if (len == 0) {
            this.value = "".value;
            this.coder = "".coder;
            return;
        }
        if (COMPACT_STRINGS) {
            // The call below copies one character at a time into a byte array.
            // If a character outside the LATIN1 range turns up during the copy, it returns null.
            byte[] val = StringUTF16.compress(value, off, len);
            if (val != null) {
                this.value = val;
                this.coder = LATIN1;
                return;
            }
        }
        this.coder = UTF16;
        this.value = StringUTF16.toBytes(value, off, len);
    }
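The speculative step can be sketched in isolation like this. SpeculativeCompress and its methods are hypothetical stand-ins written for this article. They mirror the behavior described above (return null as soon as a non-LATIN1 character is seen), but they are not the JDK's StringUTF16 code, and the big-endian layout is an assumption.

```java
// Hypothetical stand-in for the compress-then-fall-back flow; not the JDK code.
final class SpeculativeCompress {
    /** Returns LATIN1 bytes, or null as soon as a char above 0xFF is seen. */
    static byte[] compress(char[] src, int off, int len) {
        byte[] dst = new byte[len];
        for (int i = 0; i < len; i++) {
            char c = src[off + i];
            if (c > 0xFF) {
                return null; // the partially filled array is simply thrown away
            }
            dst[i] = (byte) c;
        }
        return dst;
    }

    /** Stores each char as 2 bytes, high byte first (assumed order). */
    static byte[] toUtf16Bytes(char[] src, int off, int len) {
        byte[] dst = new byte[len * 2];
        for (int i = 0; i < len; i++) {
            char c = src[off + i];
            dst[2 * i] = (byte) (c >> 8);
            dst[2 * i + 1] = (byte) c;
        }
        return dst;
    }

    /** Tries LATIN1 first and falls back to UTF-16 when compression fails. */
    static byte[] encode(char[] src) {
        byte[] latin1 = compress(src, 0, src.length);
        return latin1 != null ? latin1 : toUtf16Bytes(src, 0, src.length);
    }
}
```

Note that in the fallback case two arrays are allocated for one string, which is the overhead discussed in this section.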
The point is that it speculatively creates a LATIN1 array and works on that.
The same kind of processing can also be found in StringDecoderUTF8#decode.
This constructor is used fairly widely (at the very least, I confirmed that BufferedReader#readLine uses it), so even though the array is discarded right away, I wondered whether the overhead of creating an extra array is really acceptable.
If you already know that most of your strings contain characters outside the LATIN1 range, it may be worth disabling Compact Strings with the VM option mentioned earlier and comparing the performance.
Handling in StringBuilder
Compact Strings is implemented not only in the String class but also in the StringBuilder class.
For example, with new StringBuilder().append("abc"), the string is held internally as LATIN1.
So what happens when a UTF-16 string is appended after a LATIN1 string, as in new StringBuilder().append("abc").append("あいう");?
Checking the source code, I found that in this case the LATIN1 content is converted to UTF-16 and stored in a new array, and the characters are then appended after it.
I wondered whether there was a way to start out as UTF-16, but the only way I found was to disable Compact Strings.
    AbstractStringBuilder(int capacity) {
        if (COMPACT_STRINGS) {
            value = new byte[capacity];
            coder = LATIN1;
        } else {
            value = StringUTF16.newBytesFor(capacity);
            coder = UTF16;
        }
    }
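The conversion on append can be sketched with a small builder that starts as LATIN1 and widens its array the first time a wider character arrives. MiniBuilder, its inflations counter, and the big-endian layout are all assumptions made for illustration; this is not how AbstractStringBuilder is actually written.

```java
import java.util.Arrays;

// Hypothetical miniature of a builder that starts as LATIN1; not the JDK code.
final class MiniBuilder {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    private byte[] value = new byte[16];
    private byte coder = LATIN1;
    private int count;  // number of chars stored
    int inflations;     // hypothetical counter: how often the array was rebuilt as UTF16

    MiniBuilder append(String s) {
        for (int i = 0; i < s.length(); i++) {
            appendChar(s.charAt(i));
        }
        return this;
    }

    boolean isLatin1() {
        return coder == LATIN1;
    }

    private void appendChar(char c) {
        if (coder == LATIN1 && c > 0xFF) {
            inflate();
        }
        ensureCapacity(count + 1);
        if (coder == LATIN1) {
            value[count] = (byte) c;
        } else {
            value[2 * count] = (byte) (c >> 8);
            value[2 * count + 1] = (byte) c;
        }
        count++;
    }

    /** Copies the existing LATIN1 bytes into a new array with 2 bytes per char. */
    private void inflate() {
        byte[] wide = new byte[value.length * 2];
        for (int i = 0; i < count; i++) {
            wide[2 * i + 1] = value[i]; // the high byte stays 0
        }
        value = wide;
        coder = UTF16;
        inflations++;
    }

    private void ensureCapacity(int chars) {
        int needed = chars << coder;
        if (needed > value.length) {
            value = Arrays.copyOf(value, Math.max(needed, value.length * 2));
        }
    }

    @Override
    public String toString() {
        char[] out = new char[count];
        for (int i = 0; i < count; i++) {
            out[i] = coder == LATIN1
                    ? (char) (value[i] & 0xFF)
                    : (char) (((value[2 * i] & 0xFF) << 8) | (value[2 * i + 1] & 0xFF));
        }
        return new String(out);
    }
}
```

In this sketch the array is rebuilt even when the builder is still empty at the moment the first non-LATIN1 character arrives, because the builder always starts out as LATIN1.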
Also, even with new StringBuilder("あいう");, the array ends up being recreated.
This is because internally it amounts to new StringBuilder() followed by append("あいう"): at the time the constructor is called the builder is LATIN1, and the string added by the subsequent append is UTF16, so it goes through the same flow as above.
Since the builder is still empty in this case, I wish it would just use the new encoding from the start.
I wonder whether this will be refactored again in a future version.
Summary
Overall, I felt that performance will very likely improve for English-speaking users, but I am less sure that it holds up for Japanese-speaking users.
I suspect strings containing Japanese are less common than one might think, but if you want to tune thoroughly, it may be worth switching Compact Strings on and off and measuring which setting performs better.
Incidentally, Java 9 also includes other string-related improvements, such as JEP 250: Store Interned Strings in CDS Archives and JEP 280: Indify String Concatenation.
I hope to cover how those are implemented on another occasion!
Past articles
The String class was refactored in Java 7 - 地平線に行く
The String class was refactored further in Java 7 Update 6 - 地平線に行く
The StringBuilder/StringBuffer classes were refactored in Java 8 - 地平線に行く
The String class was refactored in Java 9 (replace method edition)
*1: Ignoring surrogate pairs and the like.
*2: Edited 2017/10/06. I originally wrote "-XX:-CompactStrings -DCompactStringEnabled=false", but the latter is a property used only by the tests and has no effect on the VM's behavior, so I removed it. (Thanks to hana for pointing this out in the comments.)