
Adding a Series as a column in python pandas

Posted at 2017-01-24

I got badly stuck on this, so I'm posting what I found.

When the index of the Series being added starts at 0

>>> import pandas as pd
>>> s1 = pd.Series(data=[10,20,30])
>>> s1
0    10
1    20
2    30
dtype: int64

>>> s2 = pd.Series(data=[100,200,300])
>>> s2
0    100
1    200
2    300
dtype: int64

We add these two Series to a DataFrame as columns.

>>> df = pd.DataFrame()
>>> df[1]=s1
>>> df[2]=s2
>>> df
    1    2
0  10  100
1  20  200
2  30  300

This is easy.

When the Series being added have different indexes
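A minimal sketch of why this case works, assuming a reasonably recent pandas: an empty DataFrame has no index yet, so the first Series assigned as a column supplies it. The variable names `index_after` and `column_labels` are only illustrative.

```python
import pandas as pd

# An empty DataFrame has no index yet, so the first Series assigned
# as a column supplies the index (here 0, 1, 2).
df = pd.DataFrame()
df[1] = pd.Series(data=[10, 20, 30])
df[2] = pd.Series(data=[100, 200, 300])

index_after = list(df.index)      # labels taken from the first Series
column_labels = list(df.columns)  # the keys used in df[...] = ...
print(df)
```

Every later assignment is matched against that index, which is what matters in the next section.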

>>> s1 = pd.Series(data=[10,20,30], index=[1,2,3])
>>> s1
1    10
2    20
3    30
dtype: int64
>>> s2 = pd.Series(data=[100,200,300], index=[2,3,4])
>>> s2
2    100
3    200
4    300
dtype: int64

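The indexes of s1 and s2 do not start at 0, and some of their labels are not shared.

If we add them to the DataFrame from the previous section (whose index is 0, 1, 2) in the same way as before, we get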

>>> df[1]=s1
>>> df[2]=s2
>>> df
      1      2
0   NaN    NaN
1  10.0    NaN
2  20.0  100.0

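The number of rows stays fixed and the values are aligned to df's existing index (0, 1, 2): label 0 has no matching value and becomes NaN, and the values at labels 3 and 4 are silently dropped.

I wanted to see the original s1[3], but looking it up in df raises an error because df has no row labeled 3.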
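The result above can be reproduced by reindexing by hand. The following is a sketch, not the author's code; the column name "assigned" and the variable `manual` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({1: [10, 20, 30]})  # existing index is 0, 1, 2
s1 = pd.Series(data=[10, 20, 30], index=[1, 2, 3])

# Column assignment aligns the Series to df's existing index.
df["assigned"] = s1

# Reindexing by hand gives the same values: label 0 has no match (NaN),
# and label 3 is dropped because df has no such row.
manual = s1.reindex(df.index)

print(df["assigned"])
print(manual)
```

In other words, column assignment never adds rows; it only fills the rows the DataFrame already has.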

>>> s1[3]
30
>>> df[1][3]
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 603, in __getitem__
    result = self.index.get_value(self, key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.py", line 2169, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 98, in pandas.index.IndexEngine.get_value (pandas/index.c:3557)
  File "pandas/index.pyx", line 106, in pandas.index.IndexEngine.get_value (pandas/index.c:3240)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
  File "pandas/src/hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8564)
  File "pandas/src/hashtable_class_helper.pxi", line 410, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8508)
KeyError: 3
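If the goal is only to check whether a label survived the assignment, a sketch like the one below avoids the traceback. It assumes the same data as above; `has_label` and `value` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(index=[0, 1, 2])
df[1] = pd.Series(data=[10, 20, 30], index=[1, 2, 3])

# Membership test on the index: no row is labeled 3.
has_label = 3 in df.index

# Series.get returns a default (None here) instead of raising KeyError.
value = df[1].get(3)

print(has_label, value)
```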

In this situation, use pandas.concat.

>>> df = pd.DataFrame()
>>> df = pd.concat([df, s1], axis=1)
>>> df
    0
1  10
2  20
3  30
>>> df = pd.concat([df, s2], axis=1)
>>> df
      0      0
1  10.0    NaN
2  20.0  100.0
3  30.0  200.0
4   NaN  300.0

Passing axis=1 appends along the column direction. Positions that have no value are filled with numpy.nan.
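The NaN filling comes from concat's default join="outer", which keeps the union of the labels. As a small sketch for comparison (the names `outer` and `inner` are illustrative), join="inner" keeps only the labels both Series share:

```python
import pandas as pd

s1 = pd.Series(data=[10, 20, 30], index=[1, 2, 3])
s2 = pd.Series(data=[100, 200, 300], index=[2, 3, 4])

# Default join="outer": union of labels, NaN where a Series has no value.
outer = pd.concat([s1, s2], axis=1)

# join="inner": only labels present in both Series are kept.
inner = pd.concat([s1, s2], axis=1, join="inner")

print(outer)
print(inner)
```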

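However, the column name becomes 0.
I could not find an argument for specifying it directly, so here each Series is first converted to a one-column DataFrame and then passed to concat.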

>>> df = pd.DataFrame()
>>> df = pd.concat([df, pd.DataFrame(s1, columns=[1])], axis=1)
>>> df = pd.concat([df, pd.DataFrame(s2, columns=[2])], axis=1)
>>> df
      1      2
1  10.0    NaN
2  20.0  100.0
3  30.0  200.0
4   NaN  300.0
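As an alternative to wrapping each Series in a DataFrame, the column labels can also be supplied through concat's keys argument or by naming the Series beforehand. This is a sketch of those two options rather than the approach used in this post; `by_keys` and `by_name` are illustrative names.

```python
import pandas as pd

s1 = pd.Series(data=[10, 20, 30], index=[1, 2, 3])
s2 = pd.Series(data=[100, 200, 300], index=[2, 3, 4])

# keys= supplies the column labels when concatenating Series along axis=1.
by_keys = pd.concat([s1, s2], axis=1, keys=[1, 2])

# Alternatively, name each Series first; concat uses the names as labels.
by_name = pd.concat([s1.rename(1), s2.rename(2)], axis=1)

print(by_keys)
print(by_name)
```

Both variants avoid relying on how the DataFrame constructor treats an unnamed Series together with columns=.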
