Introduction
Happy birthday! nikkie here.[1]
I have been using the csv module in Python's standard library for a long time, but only now have I finally, fully understood what the dialect argument of reader (and writer) means!
Here is my write-up.
Table of contents
- Introduction
- Table of contents
- It started with a hands-on for beginners
- Revisiting the csv module documentation
- Discovering the Dialect for TSV files
- Conclusion
- P.S. A Dialect can be set automatically!
It started with a hands-on for beginners
I first met the csv module in PyNyumon, a hands-on for beginners.[2]
pynyumon/2_scraping.md at 9d3a9cfc33b78043ea79bd00d4507eb1abea7402 · pynyumon/pynyumon · GitHub
    import csv

    with open('some.csv', 'r') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)
Whenever I work with a CSV file, I mostly start from this code.[3]
I have written it so many times that my hands remember it (I have probably seen and written this code more often than I have seen my own parents' faces).
As familiar as this code is, until yesterday I had no idea at all what the csv.reader(f) part was doing.
Revisiting the csv module documentation
csv.reader
https://docs.python.org/ja/3/library/csv.html#csv.reader
Checking the arguments again, the signature is csv.reader(csvfile, dialect='excel', **fmtparams).
When reading a comma-separated (CSV) file, I pass only the csvfile argument.
When reading a tab-separated (TSV) file, I pass delimiter="\t" to specify the delimiter.
In other words, I have been using the variable-length keyword arguments, fmtparams.
"The other optional fmtparams keyword arguments can be given to override individual formatting parameters in the current dialect."
The quoted passage says "override".
I see!
A group of formatting parameters is already specified, and I am overriding part of it (this realization is my biggest takeaway this time).
And what specifies that group of formatting parameters is the dialect argument!
"An optional dialect parameter can be given which is used to define a set of parameters specific to a particular CSV dialect."
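To convince myself of this "override" reading, here is a minimal sketch. It is not taken from the documentation; the sample TSV text and the variable names are made up for illustration. It passes only delimiter="\t" and then inspects the reader's dialect attribute, which the documentation describes as a read-only description of the dialect in use by the parser.

```python
import csv
import io

# Hypothetical in-memory TSV text standing in for a real file.
tsv_text = "name\tlang\nnikkie\tPython\n"

# Only the delimiter is passed; dialect is left at its default, 'excel'.
reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
rows = list(reader)

# The delimiter was overridden; the other settings still come from 'excel'.
overridden_delimiter = reader.dialect.delimiter
kept_quotechar = reader.dialect.quotechar
kept_doublequote = reader.dialect.doublequote

print(rows)
print(repr(overridden_delimiter), repr(kept_quotechar), kept_doublequote)
```

Only the delimiter changes; the quote character and the doublequote setting stay at the values of the default dialect.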
Let's look at the section that has the details, "Dialects and Formatting Parameters".
Dialects and Formatting Parameters
"To make it easier to specify the format of input and output records, specific formatting parameters are grouped together into dialects."
So there is a class called csv.Dialect!
https://docs.python.org/ja/3/library/csv.html#csv.Dialect
https://github.com/python/cpython/blob/v3.11.3/Lib/csv.py#L23-L52
    class Dialect:
        # excerpt
        delimiter = None
        quotechar = None
        escapechar = None
        doublequote = None
        skipinitialspace = None
        lineterminator = None
        quoting = None

        def __init__(self):
            ...  # omitted

        def _validate(self):
            ...  # omitted

The meaning of each attribute is explained in the "Dialects and Formatting Parameters" section.
For example, delimiter[4] is the separator character.
https://docs.python.org/ja/3/library/csv.html#csv.Dialect.delimiter
"A one-character string used to separate fields. It defaults to ','."
The default value of csv.reader's dialect argument was the string 'excel'.
That amounts to specifying the csv.excel class.
https://docs.python.org/ja/3/library/csv.html#csv.excel
"The excel class defines the usual properties of an Excel-generated CSV file. It is registered with the dialect name 'excel'."
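The link between the string 'excel' and the csv.excel class can be checked with a small sketch. It assumes only the documented functions csv.list_dialects() and csv.get_dialect(); the attribute tuple and variable names are my own.

```python
import csv

# Names of the dialects that the csv module has registered.
registered = csv.list_dialects()
print(registered)

# Look up the dialect registered under the name 'excel'.
by_name = csv.get_dialect("excel")

# Compare each formatting parameter with the csv.excel class attributes.
attributes = (
    "delimiter",
    "quotechar",
    "escapechar",
    "doublequote",
    "skipinitialspace",
    "lineterminator",
    "quoting",
)
same_settings = all(
    getattr(by_name, name) == getattr(csv.excel, name) for name in attributes
)
print(same_settings)
```

If same_settings is true, looking up 'excel' by name yields the same formatting parameters as the csv.excel class.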
I want to see the concrete settings, so I'll look at the source code.
https://github.com/python/cpython/blob/v3.11.3/Lib/csv.py#L54-L62
    class excel(Dialect):
        delimiter = ','
        quotechar = '"'
        doublequote = True
        skipinitialspace = False
        lineterminator = '\r\n'
        quoting = QUOTE_MINIMAL

Each attribute of Dialect is set here.
Attributes that do not appear here fall back to the base class defaults.
My understanding is that these attributes have invariants, which is probably why this is implemented as a class with a _validate method.[5]
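Both points can be poked at with a small sketch. This is my own experiment, not something the documentation spells out: BrokenDialect and try_instantiate are hypothetical names, and I am assuming, based on the v3.11 source linked above, that instantiating a Dialect subclass runs the validation and reports a violation as csv.Error.

```python
import csv

# csv.excel does not define escapechar itself, so the value is inherited
# from the csv.Dialect base class.
defined_on_excel = "escapechar" in vars(csv.excel)
inherited_value = csv.excel.escapechar


class BrokenDialect(csv.excel):
    """Hypothetical dialect that breaks the one-character delimiter rule."""

    delimiter = ",,"


def try_instantiate(dialect_class):
    """Instantiate a dialect class and report whether validation accepted it."""
    try:
        dialect_class()
    except csv.Error as exc:
        return f"rejected: {exc}"
    return "accepted"


excel_result = try_instantiate(csv.excel)
broken_result = try_instantiate(BrokenDialect)
print(defined_on_excel, inherited_value)
print(excel_result)
print(broken_result)
```

The design choice this hints at: because the parameters live in a class, the check for a consistent set of values can live in one place.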
This led me to the following realizations.
- When reading a CSV file, the csv.excel Dialect is all I need, so I pass no other arguments
- When reading a TSV file, I override only the delimiter in csv.excel with "\t"
Discovering the Dialect for TSV files
While browsing the documentation, I came across the TSV version of csv.excel.
Its name is csv.excel_tab!
https://docs.python.org/ja/3/library/csv.html#csv.excel_tab
"The excel_tab class defines the usual properties of an Excel-generated TAB-delimited file. It is registered with the dialect name 'excel-tab'."
Looking at the implementation, everything except delimiter is indeed shared with csv.excel!
https://github.com/python/cpython/blob/v3.11.3/Lib/csv.py#L64-L67
    class excel_tab(excel):
        delimiter = '\t'

Until now, I read TSV files like this.

    with open("some.tsv", encoding="utf8", newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        # process reader

It turns out this can also be written just by specifying the dialect argument.

    with open("some.tsv", encoding="utf8", newline="") as f:
        reader = csv.reader(f, dialect="excel-tab")
        # process reader

This does not reduce the typing (if anything, it increases it), but since the csv module provides it, I think I'll try writing it this way from now on.
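As a quick sanity check that the two spellings behave the same, here is a sketch that reads one piece of text three ways. The sample rows are hypothetical, and io.StringIO stands in for the opened file.

```python
import csv
import io

# Hypothetical TSV content; a real script would open a file with newline="".
tsv_text = "id\tname\n1\tspam\n2\tegg\n"

# 1. Override only the delimiter of the default 'excel' dialect.
rows_delimiter = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))

# 2. Select the registered dialect by its name.
rows_by_name = list(csv.reader(io.StringIO(tsv_text), dialect="excel-tab"))

# 3. Pass the dialect class itself.
rows_by_class = list(csv.reader(io.StringIO(tsv_text), dialect=csv.excel_tab))

print(rows_delimiter)
print(rows_delimiter == rows_by_name == rows_by_class)
```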
Conclusion
I wrote up what I came to understand about what csv.reader(f) is doing.
- A group of formatting parameters = a Dialect
  - The csv.excel Dialect is specified by default
- Individual formatting parameters can be overridden
  - For TSV I pass delimiter="\t", which overrides the Dialect's delimiter attribute
- A Dialect for TSV, csv.excel_tab, also exists
Now that I know, I want to dig a hole and bury myself for having used the module for so long without understanding any of this.
Still, now that I have learned it, I look forward to using the csv module with a much better understanding than before.
The individual pieces introduced here are also covered in section 13.1 of 『Python実践レシピ』.
I recommend 『Python実践レシピ』 as a step before reading the csv module documentation.
And for anyone going through the documentation, I think this entry can serve as one story carved out around Dialect!
P.S. A Dialect can be set automatically!
I also learned that something called csv.Sniffer exists.
https://docs.python.org/ja/3/library/csv.html#csv.Sniffer
"The Sniffer class is used to deduce the format of a CSV file."
It seems you let the csv.Sniffer().sniff method deduce the dialect and then pass it to csv.reader (see the example code in the documentation).
An explanation of the documentation's code can be found in 『Python実践レシピ』, which also introduces situations where it is useful.
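Here is a small sketch of that flow. It is my own variation rather than the documentation's example: the sample text is hypothetical, and I pass the optional delimiters argument of sniff to limit the candidates to a comma and a tab.

```python
import csv
import io

# Hypothetical sample: tab-separated text with a header row.
sample = "id\tname\n1\tspam\n2\tegg\n"

# Let the Sniffer deduce the dialect from the sample text.
sniffed = csv.Sniffer().sniff(sample, delimiters=",\t")

# Hand the deduced dialect to csv.reader instead of naming one myself.
rows = list(csv.reader(io.StringIO(sample), dialect=sniffed))

print(repr(sniffed.delimiter))
print(rows)
```

With a real file, the documentation's example reads the first part of the file for sniffing and rewinds before creating the reader.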
- [1] Happy birthday, Yuko-senpai (club president)!
  "Happy Birthday. Today, April 15, is the birthday of Yuko Yoshikawa, third-year trumpet player in the Kitauji High School concert band ✨ Happy birthday! #anime_eupho pic.twitter.com/9nVsZLNjha" (Anime "Sound! Euphonium" official, @anime_eupho, April 15, 2023)
- [2] This is also the hands-on where I first met dictionaries as heterogeneous collections. いったいいつから異種コレクションとしての辞書が僕の中で自然になってしまったのだろうか - nikkie-ftnextの日記
- [3] I change the arguments of the built-in open function in two places: I specify the newline and encoding arguments. For the former, see https://docs.python.org/ja/3/library/csv.html#id3
- [4] I often misspell delimiter, but I learned that there is a verb, delimit (to fix the boundaries of).
- [5] For invariants, see 読書ログ | 『ロバストPython』10章「クラス」、不変式を維持せよ!と声高に叫ぶ章(議論する前の理解のまとめ) - nikkie-ftnextの日記