12. Scene-Label-Based Features 2
• Segmentation methods other than FCN
Hierarchical Labeling [Munoz et al., 2010]
MRF-based Labeling [Yang et al., 2014]
D. Munoz, et al., “Stacked hierarchical labeling,” ECCV, 2010
J. Yang, et al., “Context driven scene parsing with attention to rare classes,” CVPR, 2014
14. Features Based on Local-Region Costs
• Represent how easily the target can move through each region (cost-map sketch after this slide)
⎻ Low cost: easy to traverse
⎻ High cost: hard to traverse
BoVW + NN [Walker et al., 2014]
Spatial Matching Network [Huang et al., 2016]
J. Walker, et al., “Patch to the future: Unsupervised visual prediction,” CVPR, 2014
S. Huang, et al., “Deep learning driven visual path prediction from a single image,” IEEE IP, 2016
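As a minimal illustration of the cost idea above, the sketch below maps semantic labels to hand-picked traversal costs. The label IDs and cost values are assumptions for illustration only; the surveyed methods learn such costs from data.

```python
import numpy as np

# Hypothetical label IDs and hand-picked costs (illustrative assumptions):
# 0 = road, 1 = sidewalk, 2 = grass, 3 = building (impassable).
TRAVERSAL_COST = {0: 1.0, 1: 0.5, 2: 2.0, 3: np.inf}

def cost_map(labels: np.ndarray) -> np.ndarray:
    """Convert an (H, W) semantic label map into per-pixel traversal costs."""
    cost = np.empty(labels.shape, dtype=float)
    for label, c in TRAVERSAL_COST.items():
        cost[labels == label] = c
    return cost

labels = np.random.randint(0, 4, size=(8, 8))  # toy label map
print(cost_map(labels))
```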
18. Extracting Feature Vectors That Describe the Target
• Implicitly encode the information the target carries
⎻ Extract a feature vector from the patch that contains the prediction target
Mid-level patch features [Singh et al., 2012]
⎻ Represent in-patch features with HOG + K-means (sketch after this slide)
[Figure 1 from Singh et al.: the top detected visual words (bottom) vs. mid-level discriminative patches (top), trained without any supervision on the same large unlabeled dataset]
S. Singh, et al., “Unsupervised discovery of mid-level discriminative patches,” ECCV, 2012
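A minimal sketch of the HOG + K-means step, assuming random stand-in patches; the patch size, HOG parameters, and cluster count are illustrative, not the paper's settings.

```python
import numpy as np
from skimage.feature import hog
from sklearn.cluster import KMeans

patches = np.random.rand(100, 64, 64)  # stand-ins for grayscale image patches

# One HOG descriptor per patch.
descs = np.array([
    hog(p, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    for p in patches
])

# Cluster descriptors; each cluster index acts as a "visual word".
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(descs)
print(kmeans.labels_[:10])  # cluster assignment per patch
```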
19. Features from the Target's Current Position
• Extract the target's motion from the past up to the present
⎻ Used for prediction in online processing
⎻ The past trajectory constrains the future movement direction
HOG + SVM detector [Dalal et al., 2005] (detection sketch after this slide)
Superpixel-based Bayesian online tracking [Wang et al., 2011]
N. Dalal, et al., “Histograms of oriented gradients for human detection,” CVPR, 2005
S. Wang, et al., “Superpixel tracking,” ICCV, 2011
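A minimal HOG + linear SVM pedestrian detection sketch using OpenCV's bundled default people model; this is an assumption for illustration (the surveyed systems train their own detectors), and "frame.png" is a hypothetical input.

```python
import cv2

# Dalal-Triggs-style detector: HOG features scored by a linear SVM.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("frame.png")  # hypothetical input frame
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)
for (x, y, w, h) in boxes:
    # Each box is a detected pedestrian; its center gives the current position.
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```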
20. Features from the Target's Orientation
Orientation network [Huang et al., 2016]
⎻ Constrains the target's movement direction
Bayesian orientation estimation [Enzweiler et al., 2010]
⎻ Estimates the orientation of the pedestrian's head
S. Huang, et al., “Deep learning driven visual path prediction from a single image,” IEEE IP, 2016
M. Enzweiler, et al., “Integrated pedestrian classification and orientation estimation,” CVPR, 2010
21. Features from Pedestrian Attributes
• Movement speed and route choice differ from person to person
⎻ Enables prediction focused on the individual
AlexNet-based multi-task learning [Ma et al., 2017] (sketch after this slide)
⎻ Estimates pedestrian attributes with a single network:
pedestrian orientation, age (young / old), gender (male / female)
[Figure: an AlexNet backbone predicts the attributes used to personalize the pathway]
W. Ma, et al., “Forecasting interactive dynamics of pedestrians with fictitious play,” CVPR, 2017
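A minimal multi-task sketch, assuming a torchvision AlexNet backbone with one head per attribute; the shared trunk and head sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import alexnet

class AttributeNet(nn.Module):
    """One backbone, three attribute heads (orientation / age / gender)."""
    def __init__(self, n_orient_bins: int = 8):
        super().__init__()
        backbone = alexnet(weights=None)
        self.features = backbone.features
        self.pool = nn.AdaptiveAvgPool2d((6, 6))
        self.trunk = nn.Sequential(
            nn.Flatten(), nn.Linear(256 * 6 * 6, 512), nn.ReLU()
        )
        self.orient = nn.Linear(512, n_orient_bins)  # discretized heading
        self.age = nn.Linear(512, 2)                 # young / old
        self.gender = nn.Linear(512, 2)              # male / female

    def forward(self, x):
        h = self.trunk(self.pool(self.features(x)))
        return self.orient(h), self.age(h), self.gender(h)

net = AttributeNet()
o, a, g = net(torch.randn(1, 3, 224, 224))  # one logit vector per task
```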
28. Energy-Minimization-Based Methods 1
Deep Learning Driven Visual Path Prediction [Huang et al., 2016]
⎻ Spatial Matching Network: estimates the reward (cost) of each local region
⎻ Orientation Network: estimates the target's orientation
⎻ Defines an objective function from the reward and the orientation
⎻ Estimates the shortest path with Dijkstra's algorithm (sketch after this slide)
S. Huang, et al., “Deep learning driven visual path prediction from a single image,” IEEE IP, 2016
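A minimal sketch of the final planning step, assuming a 4-connected grid whose random per-cell costs stand in for the network's predicted costs; the paper's actual objective also folds in the estimated orientation.

```python
import heapq
import numpy as np

def dijkstra(cost: np.ndarray, start, goal):
    """Shortest path on a 4-connected grid with per-cell entry costs."""
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    dist[start] = 0.0
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            break
        if d > dist[r, c]:
            continue  # stale queue entry
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + cost[nr, nc]
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(pq, (nd, (nr, nc)))
    path, node = [], goal  # walk predecessors back to the start
    while node != start:
        path.append(node)
        node = prev[node]
    return [start] + path[::-1]

cost = np.random.rand(16, 16) + 0.1  # stand-in for predicted costs
print(dijkstra(cost, (0, 0), (15, 15)))
```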
29. Energy-Minimization-Based Methods 2
Inferring ‘Dark Matter’ and ‘Dark Energy’ from Videos [Xie et al., 2013]
⎻ Assumes pedestrians move toward destinations in the scene
e.g. vending machines, chairs, exits
⎻ Generates an energy field that attracts pedestrians toward each destination (sketch after this slide)
⎻ Estimates the shortest path to each destination with Dijkstra's algorithm
D. Xie, et al., “Inferring ‘Dark Matter’ and ‘Dark Energy’ from videos,” ICCV, 2013
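A minimal sketch of a destination-centered attractive field: energy grows with distance to the nearest destination, so the negative gradient pulls agents toward a goal. The distance-based form and the goal locations are illustrative assumptions; Xie et al. infer their field from video.

```python
import numpy as np

H, W = 32, 32
destinations = [(5, 5), (25, 28)]  # hypothetical goals (e.g. exit, chairs)

# Energy = distance to the nearest destination.
ys, xs = np.mgrid[0:H, 0:W]
energy = np.min(
    [np.hypot(ys - gy, xs - gx) for gy, gx in destinations], axis=0
)

# Attractive force field = negative gradient of the energy.
fy, fx = np.gradient(-energy)
```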
32. LSTM-Based Prediction Methods
Social LSTM [Alahi et al., 2016]
⎻ Predicts the trajectories of multiple pedestrians simultaneously
⎻ Proposes the Social Pooling layer (sketch after this slide)
Takes the positions and hidden-layer outputs of surrounding pedestrians as input
Accounts for interactions between pedestrians
[Figure: qualitative comparison of trajectories — GT, SF, Linear, Social-LSTM]
A. Alahi, et al., “Social LSTM: Human trajectory prediction in crowded spaces,” CVPR, 2016
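A minimal NumPy sketch of Social Pooling: sum the hidden states of neighbors that fall into each cell of a grid centered on the target pedestrian. The grid size and cell size are assumptions, and a real implementation would run inside the LSTM step.

```python
import numpy as np

def social_pooling(pos, hidden, target, grid=4, cell=0.5):
    """pos: (N, 2) positions, hidden: (N, D) LSTM states, target: index."""
    D = hidden.shape[1]
    tensor = np.zeros((grid, grid, D))
    for j, (p, h) in enumerate(zip(pos, hidden)):
        if j == target:
            continue
        # Grid cell of neighbor j relative to the target pedestrian.
        gx, gy = np.floor((p - pos[target]) / cell).astype(int) + grid // 2
        if 0 <= gx < grid and 0 <= gy < grid:
            tensor[gx, gy] += h
    return tensor.reshape(-1)  # fed to the target's LSTM at the next step

pos = np.random.randn(5, 2)
hid = np.random.randn(5, 16)
print(social_pooling(pos, hid, target=0).shape)  # (4 * 4 * 16,)
```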
33. CNN-Based Prediction Methods
Pedestrian Behavior Understanding and Prediction with DNN [Yi et al., 2016]
⎻ Encodes the past trajectory as sparse volume data (sketch after this slide)
⎻ Predicts the trajectory by passing the volume through an encoder-decoder network
S. Yi, et al., “Pedestrian behavior understanding and prediction with deep neural networks,” ECCV, 2016
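A minimal sketch of the sparse input encoding: rasterize a pedestrian's past positions into a (T, H, W) volume that is zero everywhere except at the visited cells. This occupancy version is a simplification of the paper's richer displacement encoding.

```python
import numpy as np

def trajectory_volume(traj, H=64, W=64):
    """traj: (T, 2) array of integer (row, col) positions."""
    vol = np.zeros((len(traj), H, W), dtype=np.float32)
    for t, (r, c) in enumerate(traj):
        vol[t, r, c] = 1.0  # sparse: one nonzero cell per time step
    return vol  # input to an encoder-decoder CNN

traj = np.cumsum(np.random.randint(0, 2, size=(10, 2)), axis=0) + 20
print(trajectory_volume(traj).shape)  # (10, 64, 64)
```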
39. IRL-Based Prediction Methods 1
Activity Forecasting [Kitani et al., 2012]
⎻ First method to introduce inverse reinforcement learning to path prediction
⎻ Assumes the surrounding environment influences route choice (soft value iteration sketch after this slide)
K. Kitani, et al., “Activity forecasting,” ECCV, 2012
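A minimal sketch of the soft (maximum-entropy) value iteration underlying this family of methods, on a toy grid with random negative rewards; in the paper the reward is a learned linear function of scene features.

```python
import numpy as np

def soft_value_iteration(reward, goal, n_iters=200):
    """Soft Bellman backups: V(s) = log sum_a exp(r(s) + V(s'))."""
    H, W = reward.shape
    V = np.full((H, W), -1e3)
    for _ in range(n_iters):
        V_new = np.full((H, W), -1e3)
        for r in range(H):
            for c in range(W):
                qs = []
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < H and 0 <= nc < W:
                        qs.append(reward[r, c] + V[nr, nc])
                V_new[r, c] = np.logaddexp.reduce(qs)  # soft max over actions
        V_new[goal] = 0.0  # absorbing goal state
        V = V_new
    return V

reward = -0.1 - np.random.rand(8, 8)  # negative rewards act as costs
V = soft_value_iteration(reward, goal=(7, 7))
# Stochastic policy: pi(a|s) proportional to exp(r(s) + V(s')).
```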
40. IRL-Based Prediction Methods 2
Forecasting Interactive Dynamics of Pedestrians with Fictitious Play [Ma et al., 2017]
⎻ Predicts the trajectories of multiple pedestrians simultaneously
⎻ Introduces Fictitious Play, a concept from game theory (sketch after this slide)
⎻ Estimates each pedestrian's action at each time step sequentially
[Figure: agents take turns forecasting; agent n is sampled from U(t)_n while the others are sampled from the empirical distribution µ(t)_¬n; the forecasted distributions of the yellow, green, and red agents grow with time (t = 3, 6, 9, 12)]
W. Ma, et al., “Forecasting interactive dynamics of pedestrians with fictitious play,” CVPR, 2017
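A minimal sketch of fictitious play on a two-player matrix game: each player best-responds to the empirical distribution of the opponent's past actions. Ma et al. apply the same idea with pedestrians as players and trajectories as actions; the payoff matrix here is a toy assumption.

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0]])  # common-payoff coordination game

counts = [np.ones(2), np.ones(2)]  # action counts per player
for _ in range(1000):
    emp = [c / c.sum() for c in counts]  # empirical mixed strategies
    br0 = np.argmax(A @ emp[1])          # best response of player 0
    br1 = np.argmax(A.T @ emp[0])        # best response of player 1
    counts[0][br0] += 1
    counts[1][br1] += 1

print([c / c.sum() for c in counts])  # converges toward an equilibrium
```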
41. Applications of IRL-Based Path Prediction
• Seabird flight path prediction [Hirakawa et al., 2017]
⎻ Seabirds fly only over the sea
Depends on environmental attributes
⎻ Learns the influence of each environmental attribute from training data
T. Hirakawa, et al., “Travel time-dependent IRL for seabird trajectory prediction,” ACPR, 2017
[Figure: example environment maps; prediction results for female and male birds]
44. Other Approaches 1 ~Social Force Model~
• A model that accounts for interactions (force-update sketch after this slide)
⎻ Attraction- and repulsion-like energies
⎻ Avoids collisions with obstacles and other people
Learning Social Etiquette [Robicquet et al., 2016]
⎻ Extracts a feature called Social Sensitivity
Assigns each pedestrian to a cluster based on the feature
Chooses a trajectory that avoids collisions
A. Robicquet, et al., “Learning social etiquette: Human trajectory understanding in crowded scenes,” ECCV, 2016
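A minimal sketch of a social-force update in the Helbing-Molnár style: a driving force toward the goal plus exponential repulsion from other agents. All constants are illustrative assumptions; Robicquet et al. learn such parameters per cluster.

```python
import numpy as np

def social_force_step(pos, vel, goals, dt=0.1, v0=1.3, tau=0.5, A=2.0, B=0.3):
    n = len(pos)
    force = np.zeros_like(pos)
    for i in range(n):
        # Driving force: relax toward preferred speed v0 along the goal direction.
        e = goals[i] - pos[i]
        e /= np.linalg.norm(e) + 1e-9
        force[i] += (v0 * e - vel[i]) / tau
        # Repulsive force: exponential push away from every other agent.
        for j in range(n):
            if i != j:
                d = pos[i] - pos[j]
                r = np.linalg.norm(d) + 1e-9
                force[i] += A * np.exp(-r / B) * d / r
    vel = vel + dt * force
    return pos + dt * vel, vel

pos = np.random.rand(4, 2) * 5
vel = np.zeros((4, 2))
goals = np.random.rand(4, 2) * 5
pos, vel = social_force_step(pos, vel, goals)
```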
45. Other Approaches 2 ~Data Driven~
Egocentric Future Localization [Park et al., 2016]
⎻ Retrieves training scenes similar to the prediction scene (retrieval sketch after this slide)
⎻ Transfers trajectories from the training data to the prediction scene
H.S. Park, et al., “Egocentric future localization,” CVPR, 2016
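A minimal sketch of the data-driven retrieve-and-transfer idea: find the training scene whose global descriptor is closest to the query scene and reuse its recorded trajectories. The random descriptors are stand-ins for real scene features (e.g. CNN activations).

```python
import numpy as np

train_feats = np.random.rand(500, 128)  # one descriptor per training scene
train_trajs = [np.random.rand(10, 2) for _ in range(500)]  # stored paths

query = np.random.rand(128)  # descriptor of the prediction scene
nearest = np.argmin(np.linalg.norm(train_feats - query, axis=1))
predicted_path = train_trajs[nearest]  # trajectory transferred to the query scene
```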
48. Bird's-Eye-View Video Datasets 1
• UCY Dataset [Lerner et al., 2007]
⎻ Pedestrians: 786
⎻ Scenes: 3
• ETH Dataset [Pellegrini et al., 2009]
⎻ Pedestrians: 750
⎻ Scenes: 2
A. Lerner, et al., “Crowds by example,” Computer Graphics Forum, 2007
S. Pellegrini, et al., “You’ll never walk alone: Modeling social behavior for multi-target tracking,” ICCV, 2009
49. Bird's-Eye-View Video Datasets 2
• Edinburgh Informatics Forum Pedestrian Dataset [Majecka, 2009]
⎻ Pedestrians: 95,998
⎻ Scenes: 1
• Stanford Drone Dataset [Robicquet et al., 2016]
⎻ Pedestrians: 11,216
⎻ Scenes: 8
⎻ Other prediction targets: car, bus, biker, skater, carts
B. Majecka, “Statistical models of pedestrian behavior in the forum,” PhD thesis, 2009
A. Robicquet, et al., “Learning social etiquette: Human trajectory understanding in crowded scenes,” ECCV, 2016
50. Surveillance Camera Video Datasets
• VIRAT Video Dataset [Oh et al., 2011]
⎻ Pedestrians: 4,021
⎻ Scenes: 11
⎻ Additional annotations: object coordinates, action categories
• Town Centre Dataset [Benfold et al., 2011]
⎻ Pedestrians: 230
⎻ Scenes: 1
⎻ Additional annotations: head coordinates
• Grand Central Station Dataset [Yi et al., 2015]
⎻ Pedestrians: 12,600
⎻ Scenes: 1
S. Oh, et al., “A large-scale benchmark dataset for event recognition in surveillance video,” CVPR, 2011
B. Benfold, et al., “Stable multi-target tracking in real-time surveillance video,” CVPR, 2011
S. Yi, et al., “Understanding pedestrian behaviors from stationary crowd groups,” CVPR, 2015
51. Vehicle-Mounted Camera Video Datasets
• Daimler Pedestrian Path Prediction Benchmark Dataset [Schneider et al., 2013]
⎻ Pedestrians: 68
⎻ Additional data: stereo camera
• KITTI Vision Benchmark Suite [Geiger et al., 2012]
⎻ Pedestrians: 6,336
⎻ Additional data: stereo camera, LIDAR, map information
N. Schneider, et al., “Pedestrian path prediction with recursive Bayesian filters: A comparative study,” GCPR, 2013
A. Geiger, et al., “Are we ready for autonomous driving? The KITTI vision benchmark suite,” CVPR, 2012
52. First-Person Video Datasets
• EgoMotion Dataset [Park et al., 2016]
⎻ Scenes: 26
⎻ Additional data: stereo camera
• First-Person Continuous Activity Dataset [Rhinehart et al., 2017]
⎻ Scenes: 17
⎻ Additional data: object information
Dataset not publicly released
H.S. Park, et al., “Egocentric future localization,” CVPR, 2016
N. Rhinehart, et al., “First-person activity forecasting with online inverse reinforcement learning,” ICCV, 2017
56. DL-Based Prediction Methods That Consider the Environment
DESIRE [Lee et al., 2017]
⎻ A combined DL + IRL framework (sample-and-rank sketch after this slide)
RNN Encoder-Decoder: generates multiple prediction hypotheses
IRL: ranks and refines the hypotheses
⎻ Extracts scene features with a pooling-free CNN
⎻ Trained end-to-end
N. Lee, et al., “DESIRE: Distant future prediction in dynamic scenes with interacting agents,” CVPR, 2017
[Figure: proposed model and prediction results (Obs., GT, Result)]
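A minimal sample-and-rank sketch in the spirit of DESIRE: draw several candidate futures, score each against a reward map, and keep the best. The random-walk sampler and random reward map are toy assumptions standing in for the paper's CVAE sampler and learned IRL scores.

```python
import numpy as np

H = W = 32
reward = np.random.rand(H, W)  # stand-in for learned per-cell rewards
start = np.array([16, 16])

def sample_path(T=12):
    """Toy sampler: a clipped random walk from the current position."""
    steps = np.random.randint(-1, 2, size=(T, 2))
    return np.clip(start + np.cumsum(steps, axis=0), 0, H - 1)

candidates = [sample_path() for _ in range(20)]
scores = [reward[p[:, 0], p[:, 1]].sum() for p in candidates]
best = candidates[int(np.argmax(scores))]  # highest-reward hypothesis
```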
57. References Cited in This Survey ~Prediction Methods~
N. Schneider and D.M. Gavrila, “Pedestrian path prediction with recursive Bayesian filters: A comparative study,” GCPR, 2013.
J.F.P. Kooij, et al., “Context-based pedestrian path prediction,” ECCV, 2014.
L. Ballan, et al., “Knowledge transfer for scene-specific motion prediction,” ECCV, 2016.
D. Xie, et al., “Inferring ‘Dark Matter’ and ‘Dark Energy’ from videos,” ICCV, 2013.
○ J. Walker, et al., “Patch to the future: Unsupervised visual prediction,” CVPR, 2014.
S. Huang, et al., “Deep learning driven visual path prediction from a single image,” IEEE IP, 2016.
S. Yi, et al., “Pedestrian behavior understanding and prediction with deep neural networks,” ECCV, 2016.
○ A. Alahi, et al., “Social LSTM: Human trajectory prediction in crowded spaces,” CVPR, 2016.
T. Fernando, et al., “Soft + hardwired attention: An LSTM framework for human trajectory prediction and abnormal event detection,” arXiv preprint, 2017.
T. Fernando, et al., “Tree memory networks for modelling long-term temporal dependencies,” arXiv preprint, 2017.
N. Lee, et al., “DESIRE: distant future prediction in dynamic scenes with interacting agents,” CVPR, 2017.
○ K.M. Kitani, et al., “Activity forecasting,” ECCV, 2012.
N. Lee and K.M. Kitani, “Predicting wide receiver trajectories in American football,” WACV, 2016.
S.Z. Bokhari and K.M. Kitani, “Long-term activity forecasting using first-person vision,” ACCV, 2016.
N. Rhinehart and K.M. Kitani, “First-person activity forecasting with online inverse reinforcement learning,” ICCV, 2017.
W. Ma, et al., “Forecasting interactive dynamics of pedestrians with fictitious play,” CVPR, 2017.
E. Rehder, et al., “Pedestrian prediction by planning using deep neural networks,” arXiv preprint, 2017.
C.G. Keller and D.M. Gavrila, “Will the pedestrian cross? A study on pedestrian path prediction,” IEEE ITS, 2014.
E. Rehder and H. Kloeden, “Goal-directed pedestrian prediction,” ICCV Workshop, 2015.
H.S. Park, et al., “Egocentric future localization,” CVPR, 2016.
S. Su, et al., “Predicting behaviors of basketball players from first person videos,” CVPR, 2017.
K. Yamaguchi, et al., “Who are you with and where are you going?,” CVPR, 2011.
A. Robicquet, et al., “Learning social etiquette: Human trajectory understanding in crowded scenes,” ECCV, 2016.
○ … code available
58. References Cited in This Survey ~Feature Extraction Methods~
D. Munoz, et al., “Stacked hierarchical labeling,” ECCV, 2010.
○ J. Yang, et al., “Context driven scene parsing with attention to rare classes,” CVPR, 2014.
○ J. Long, et al., “Fully convolutional networks for semantic segmentation,” CVPR, 2015.
○ E. Shelhamer, et al., “Fully convolutional networks for semantic segmentation,” IEEE PAMI, 2017.
S. Huang, et al., “Deep learning driven visual path prediction from a single image,” IEEE IP, 2016.
○ A. Krizhevsky, et al., “ImageNet classification with deep convolutional neural networks,” NIPS, 2012.
○ J. Bromley, et al., “Signature verification using a "siamese" time delay neural network,” NIPS, 1994.
○ N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” CVPR, 2005.
M. Enzweiler and D.M. Gavrila, “Integrated pedestrian classification and orientation estimation,” CVPR, 2010.
W. Ma, et al., “Forecasting interactive dynamics of pedestrians with fictitious play,” CVPR, 2017.
○ S. Singh, et al., “Unsupervised discovery of mid-level discriminative patches,” ECCV, 2012.
○ … code available