Isn't an RDBMS good enough?
Read at least one book on the subject before you decide.
https://neo4j.com/book-graph-databases/
http://www.allthingsdistributed.com/2015/08/titan-graphdb-integration-in-dynamodb.html
In this way, graphs can scale to billions of vertices and edges, while allowing efficient queries and traversal of any subset of the graph with consistent low latency that doesn’t grow proportionally to the overall graph size. This is an important benefit for many use cases that involve accessing and traversing small subsets of a large graph. A concrete example is generating a product recommendation based on purchase interests of a user’s friends, where the relevant social connections are a small subset of the total network. Another example is for tracking inventory in a vast logistics system, where only a subset of its locations is relevant for a specific item.
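To make the "small subset of a large graph" point concrete, here is a minimal Python sketch of the friends'-purchases recommendation from the quote above. The dictionaries stand in for per-vertex adjacency lists. The user names, items, and the `recommend` function are hypothetical, and this is not how Titan or DynamoDB actually store data.

```python
from collections import Counter

# Toy data (hypothetical): each user's adjacency lists, the way a graph store keeps
# a vertex's edges next to the vertex itself.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob"],
}
purchases = {
    "alice": ["book"],
    "bob": ["camera", "book"],
    "carol": ["camera", "tripod"],
    "dave": ["guitar"],
}


def recommend(user, friends, purchases, limit=3):
    """Rank items bought by the user's direct friends that the user has not bought.

    Only the user's friend edges and each friend's purchase edges are read, so the
    work grows with this small neighbourhood, not with the size of the whole graph.
    Returns (items, number_of_friends_visited).
    """
    owned = set(purchases.get(user, []))
    scores = Counter()
    visited = 0
    for friend in friends.get(user, []):
        visited += 1
        for item in purchases.get(friend, []):
            if item not in owned:
                scores[item] += 1
    return [item for item, _ in scores.most_common(limit)], visited


if __name__ == "__main__":
    print(recommend("alice", friends, purchases))  # → (['camera', 'tripod'], 2)
```

Padding `friends` with thousands of unrelated users does not change the work done for `alice`: the visited count stays at 2.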
https://www.sitepoint.com/why-you-should-use-neo4j-in-your-next-ruby-app/#comment-2689399402
Why is this great? Imagine a world with no foreign keys. Each entity in your database can have many relationships referring directly to other entities. If you want to explore the relationships there are no table or index scans, just a few connections to follow. This matches up well with the typical object model. It is more powerful, though, because Neo4j, while providing a lot of the database functionality that we expect, gives us tools to query for complex patterns in our data.
My typical answer is that something like a database where you're doing logging of lots of repetitive data is usually a better fit for an RDBMS (or even Mongo, since you wouldn't generally have foreign keys there). For a graph database, things that are already graphy are obviously on the other side of the spectrum (e.g. social networks or hierarchical structures).
Another typical answer I've seen is that you'd want Neo4j for when your data has a lot of relationships. I've found, though, that relationships start coming from places that you don't expect when you have the ability to create them easily ;)
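After a quick look at Neo4j and another graph database called Titan, I concluded that Neo4j is the better choice. (Of course, the right choice depends on the use case.)
Setting the hard parts aside: once you install it, there is an extremely well-made tutorial, so just go ahead and try it yourself.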
https://neo4j.com/download/
After that, reading the following should give you a rough grasp of things:
https://neo4j.com/developer/get-started/
http://titan.thinkaurelius.com/
What attracted me most about Titan was that it is (apparently?) officially supported on AWS.
Details here:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.TitanDB.html
This may also work well as an introduction to graph databases in general, not just Titan:
https://blogs.aws.amazon.com/bigdata/post/Tx12NN92B1F5K0C/Building-a-Graph-Database-on-AWS-Using-Amazon-DynamoDB-and-Titan
The biggest reason (for preferring Neo4j) is the data structure.
http://s3.thinkaurelius.com/docs/titan/current/data-model.html
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.TitanDB.BestPractices.html
Chapter 6, "Native Graph Storage", of https://neo4j.com/book-graph-databases/ covers this in detail.
Read both, and... well, I'll leave the rest to you.
How good is Enterprise Clustering in practice?
https://neo4j.com/customers/
LinkedIn and others use it, after all.
Hosting services for Neo4j
https://neo4j.com/developer/guide-cloud-deployment/
http://neo4jrb.readthedocs.io/en/7.0.x/
If you have built an app with Rails before, this should be easy to pick up; below are some notes on things that caught my attention.
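- After creating the Rails app, the first thing I recommend is enabling pretty_logged_cypher_queries; the readable query logs also help you learn Cypher.
  http://neo4jrb.readthedocs.io/en/7.0.x/Configuration.html
- Unlike in an RDBMS, relationships in a graph database are first-class objects, so instead of ActiveRecord there are ActiveNode and ActiveRel (for relationships).
- http://neo4jrb.readthedocs.io/en/7.0.x/ActiveNode.html
  I see, so inheritance in Neo4j is expressed by attaching multiple Labels to a node. (On Labels, see http://neo4jrb.readthedocs.io/en/7.0.x/Introduction.html#terminology.)
- http://neo4jrb.readthedocs.io/en/7.0.x/ActiveNode.html#eager-loading
  with_associations corresponds to eager_load in ActiveRecord.
  If you specify nothing, associations are loaded in a preload-like way on their own. (This feels nice, but also a bit like unsolicited help.)
- http://neo4jrb.readthedocs.io/en/7.0.x/Querying.html
  In the "Chaining associations" section: you can chain has_many associations as in student.lessons.teachers (in a single query, of course).
- proxy_as
- ActiveNode generates a UUID to use as the id, but you'll notice that the Cypher queries it actually runs often use a different ID.
  This is the ID Neo4j generates internally (you can get it with Cypher's id() function), and it directly represents the record's position in the storage structure.
  In other words, internally Neo4j doesn't even need the B-tree search familiar from RDBMSs.
  (For details, see Chapter 6, "Native Graph Storage", of https://neo4j.com/book-graph-databases/.)
  Note that this ID can change, e.g. across version upgrades, so it apparently should only be used transiently when building queries: https://github.com/neo4jrb/neo4j/blob/master/docs/UniqueIDs.rst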
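To illustrate the "ID = position" idea above, here is a minimal Python sketch assuming a store of fixed-size records: the internal id becomes a byte offset with a single multiplication, while a user-facing key such as a UUID still needs an index lookup first. The 8-byte record layout, the shortened UUID strings, and the helper names are all hypothetical; Neo4j's actual record formats are different (see Chapter 6 of the book).

```python
import bisect
import struct

# Hypothetical fixed-size node record: in-use flag (1 byte), id of the node's first
# relationship (4 bytes), 3 padding bytes. Neo4j's real record layout differs.
RECORD = struct.Struct("<?i3x")  # 8 bytes per record


def build_store(first_rel_ids):
    """Pack one record per node; a node's internal id is its index in this list."""
    return b"".join(RECORD.pack(True, rel) for rel in first_rel_ids)


def read_node(store, node_id):
    """Internal id -> byte offset by multiplication: no index or B-tree search."""
    return RECORD.unpack_from(store, node_id * RECORD.size)


# A user-facing key such as the UUID ActiveNode generates still needs an index to find
# the internal id. A sorted list searched with bisect stands in for a B-tree here.
uuid_keys = ["0a1", "3f2", "7c9", "e44"]  # hypothetical, shortened UUIDs (sorted)
uuid_to_internal = [3, 0, 2, 1]           # internal node id for each key above


def find_internal_id(uuid):
    """O(log n) lookup of the internal id for a UUID, or None if it is unknown."""
    i = bisect.bisect_left(uuid_keys, uuid)
    if i < len(uuid_keys) and uuid_keys[i] == uuid:
        return uuid_to_internal[i]
    return None


if __name__ == "__main__":
    store = build_store([10, 20, 30, 40])
    node_id = find_internal_id("7c9")          # index lookup, done once
    print(node_id, read_node(store, node_id))  # → 2 (True, 30)
```

Once the starting node is found, following relationships to neighbours works the same way, by offset, which is why generated queries prefer the internal id.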
Cypher, Neo4j's counterpart to SQL
After finishing the tutorial, browse https://neo4j.com/graphgists/ with https://neo4j.com/docs/cypher-refcard/current/ at hand, and you'll start to feel like you can actually use it.
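Label = table (a node can have several; to express inheritance, attach the parent label as well)
OPTIONAL MATCH (works something like a LEFT JOIN)
WITH (used somewhat like a subquery)
On post-processing UNION results: the approach in this post also works for SQL-subquery-like uses. https://neo4j.com/blog/cypher-union-query-using-collect-clause/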
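A minimal Python sketch of the OPTIONAL MATCH and WITH analogies above, using lists of tuples as stand-ins for Cypher result rows. The data and function names are hypothetical; the point is only the row semantics: MATCH drops a person with no ACTED_IN relationship, OPTIONAL MATCH keeps them with `None` (Cypher's null), and a WITH-style aggregation produces new rows that later clauses can filter.

```python
# Hypothetical data: Person names and the ACTED_IN relationships that exist.
people = ["Tom Hanks", "Meg Ryan", "New Hire"]
acted_in = [
    ("Tom Hanks", "Sleepless in Seattle"),
    ("Meg Ryan", "Sleepless in Seattle"),
    ("Meg Ryan", "Top Gun"),
]


def match(people, rels):
    """Like MATCH (p)-[:ACTED_IN]->(m): people with no relationship produce no row."""
    return [(p, m) for p in people for (actor, m) in rels if actor == p]


def optional_match(people, rels):
    """Like OPTIONAL MATCH: an unmatched person still yields one row with m = None."""
    rows = []
    for p in people:
        movies = [m for (actor, m) in rels if actor == p]
        if movies:
            rows.extend((p, m) for m in movies)
        else:
            rows.append((p, None))
    return rows


def with_count(rows):
    """Like `WITH p, count(m) AS n`: one entry per person; count() skips nulls."""
    counts = {}
    for p, m in rows:
        counts[p] = counts.get(p, 0) + (1 if m is not None else 0)
    return counts


if __name__ == "__main__":
    rows = optional_match(people, acted_in)
    print(len(match(people, acted_in)), len(rows))  # → 3 4
    # Later clauses see the aggregated rows, e.g. `WHERE n = 0` finds people with no movies.
    print([p for p, n in with_count(rows).items() if n == 0])  # → ['New Hire']
```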
https://neo4j.com/blog/tuning-cypher-queries/
The following explanation matters less for tuning than for understanding graph databases in general:
With the first create index on, we are setting an index on the title of a :movie; with the second, we are setting an index on the name of a :person, both of which allow us to create unique indexes. In graph databases, indexes are only used to find the starting point for queries, while in relational databases indexes are used to return a set of rows that are then used for JOINS.
Then compare how the result changes. I think it's a good example.
Again, an index is used to find the starting point. We have now directed our query to zoom in on Tom Hanks and Meg Ryan and find the connections between them. This gives the query plan a very different shape:
Given how a graph database stores its data, it's clear that the plan on the right (in the article's figure) is far more efficient.
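The Tom Hanks / Meg Ryan query above can be sketched in Python to show why anchoring both ends is cheap: the name index is consulted only to locate the two starting nodes, and the rest of the work reads just their own relationship lists before intersecting them on the movie. The data, node ids, and `shared_movies` helper are hypothetical, and the operators Neo4j's planner actually chooses may differ.

```python
# Hypothetical slice of the movie graph. Each person's movies are kept as an
# adjacency list on the person node (node id -> list of movie titles).
person_name_index = {"Tom Hanks": 1, "Meg Ryan": 2, "Keanu Reeves": 3}  # :Person(name)
acted_in = {
    1: ["Sleepless in Seattle", "You've Got Mail", "Joe Versus the Volcano", "Cast Away"],
    2: ["Sleepless in Seattle", "You've Got Mail", "Joe Versus the Volcano", "Top Gun"],
    3: ["The Matrix"],
}


def shared_movies(name_a, name_b):
    """Seek both anchors through the index, expand each one, and join on the movie."""
    a = person_name_index[name_a]  # the index is used only to find a starting point
    b = person_name_index[name_b]  # ... and the other end of the pattern
    expanded = len(acted_in[a]) + len(acted_in[b])  # relationships actually read
    common = sorted(set(acted_in[a]) & set(acted_in[b]))  # hash-join-style intersection
    return common, expanded


if __name__ == "__main__":
    print(shared_movies("Tom Hanks", "Meg Ryan"))
```

Adding more people to the graph does not change `expanded` for this pair; only the two anchors' own lists are read.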
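Conversely, from a MySQL point of view, this right-hand plan looks just like the pattern you keep running into when monitoring slow queries:
joining two temporary tables means no index can be used, so once the data grows past a certain size the query suddenly becomes slow.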
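A rough Python sketch of why that happens, assuming two in-memory lists stand in for unindexed temporary tables: a nested-loop join does `len(left) * len(right)` comparisons, while building a hash index on one side first costs about one lookup per row. The function names are hypothetical and this is not how MySQL's executor is written; which join strategy MySQL actually uses depends on its version and the query.

```python
def nested_loop_join(left, right, key):
    """No usable index on either side: compare every left row with every right row."""
    out, comparisons = [], 0
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if l_row[key] == r_row[key]:
                out.append((l_row, r_row))
    return out, comparisons


def indexed_join(left, right, key):
    """Build a hash index on one side first; each left row then costs one lookup."""
    index = {}
    for r_row in right:
        index.setdefault(r_row[key], []).append(r_row)
    out, lookups = [], 0
    for l_row in left:
        lookups += 1
        out.extend((l_row, r_row) for r_row in index.get(l_row[key], []))
    return out, lookups


if __name__ == "__main__":
    for n in (100, 1000):
        left = [{"id": i} for i in range(n)]
        right = [{"id": i} for i in range(0, 2 * n, 2)]
        _, cmp_count = nested_loop_join(left, right, "id")
        _, lookup_count = indexed_join(left, right, "id")
        print(n, cmp_count, lookup_count)  # comparisons grow as n*n, lookups as n
```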
This is another reason application engineers need a solid grasp of how both kinds of database work internally.
- With > Split MATCH Clauses to Reduce Cardinality
- Using Size with Relationships
  Judging from the gists and elsewhere, people don't seem to pay much attention to this one.
- Using SCAN
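A minimal Python sketch of why "Split MATCH Clauses to Reduce Cardinality" helps, counting intermediate rows for a single actor. It assumes a hypothetical query that counts an actor's movies (`(a)-[:ACTED_IN]->(m)`) and followees (`(a)-[:FOLLOWS]->(f)`): putting both patterns in one MATCH yields their cross product, which also inflates `count(m)`, whereas aggregating the first pattern with WITH before matching the second keeps the rows small. The data and function names are made up for illustration.

```python
from itertools import product


def one_match(movies, follows):
    """Both patterns in a single MATCH: every movie row is paired with every follow row.

    Returns (intermediate rows, what count(m) sees, what count(DISTINCT m) sees).
    """
    rows = list(product(movies, follows))
    naive_count = sum(1 for m, _f in rows if m is not None)
    distinct_count = len({m for m, _f in rows})
    return len(rows), naive_count, distinct_count


def split_with_with(movies, follows):
    """MATCH the first pattern, `WITH a, count(m) AS n` to collapse it to one row,
    then MATCH the second pattern starting from that single row."""
    movie_count = len(movies)                   # aggregated before expanding further
    rows = [(movie_count, f) for f in follows]  # one row per FOLLOWS relationship
    return len(rows), movie_count


if __name__ == "__main__":
    movies = ["m1", "m2", "m3"]         # hypothetical ACTED_IN targets of one actor
    follows = ["f1", "f2", "f3", "f4"]  # hypothetical FOLLOWS targets of the same actor
    print(one_match(movies, follows))        # → (12, 12, 3)
    print(split_with_with(movies, follows))  # → (4, 3)
```

`count(DISTINCT m)` would fix the number in the single-MATCH version, but the 12 intermediate rows would still be produced.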
- https://neo4j.com/graphgist/ac0b2c27-2a5f-4943-8b4b-100273cb285e
  Fun as a novelty.
  Cypher examples
- http://portal.graphgist.org/graph_gists/zombie-apocalypse
  Fun as a novelty.
- http://portal.graphgist.org/graph_gists/bank-fraud-detection
  I don't think this model would work for ordinary use; it was probably built specifically for detection.
- http://portal.graphgist.org/graph_gists/project-management
  Even the critical path comes out of a single simple query.
  Setting Earliest Start/Finish within the query brings back memories.
- http://portal.graphgist.org/graph_gists/network-dependency-graph
- http://portal.graphgist.org/graph_gists/credit-card-fraud-detection
  http://linkurio.us/
- http://portal.graphgist.org/graph_gists/competency-management-a-matter-of-filtering-and-recommendation-engines
  I haven't fully understood it, but I learned a lot. This looks like something you could use at work as-is.
However, it may be critical to only select those candidates who have skills, required by activities, within a certain competency area. Therefore, not to filter through node properties, but through their links or data relationships. Furthermore, it is also essential to expand the search to (and eventually beyond) 3rd degree connections between skills, activities and areas. In other words, we are looking for how potential candidates are connected to competency areas, within a depth of 3. A SQL database will need to execute more JOIN operations to provide the answer – a task that is difficult to code and creates a time-consuming query. As the depth of connections queried expands, this search will become increasingly difficult with an RDBMS and will result in incredibly poor performance.
After 1 year of operations, these parameters result in a graph of approximately 1M nodes. For a graph of this size, the query traversing paths of depth 3 (see above) requires over 30 seconds for a RDBMS to perform, but will only take less than 0.2 seconds with Neo4j [23]. The difference can be critical, whenever querying the database is part of an online tool.
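The depth-3 search described in the quote can be sketched as a breadth-first search that stops expanding after three hops, which is roughly what a bounded variable-length pattern such as `[*..3]` asks Neo4j to do. The toy graph (treated as undirected), the node names, and the helper functions are hypothetical and much simpler than the gist's model.

```python
from collections import deque

# Hypothetical toy graph, treated as undirected: candidate - skill - activity - area.
EDGES = [
    ("ann", "python"), ("python", "data-pipeline"), ("data-pipeline", "analytics"),
    ("ben", "excel"), ("excel", "reporting"), ("reporting", "finance"),
    ("cat", "sql"), ("sql", "python"),
]


def build_adjacency(edges):
    """Store each node's neighbours next to it, in both directions."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def hops_within(adj, start, target, max_depth=3):
    """Breadth-first search that never expands past max_depth hops.

    Returns the shortest hop count from start to target, or None if the target
    is farther away than max_depth (or unreachable).
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if node == target:
            return depth
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    candidates = ["ann", "ben", "cat"]
    print([c for c in candidates if hops_within(adj, c, "analytics") is not None])  # → ['ann']
```

Here `cat` reaches `analytics` only in four hops, so the depth limit excludes it; raising `max_depth` to 4 would include it.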