chore(docs): add comparison to awkarray and zarr

hanxiao · hanxiao · commit beeb3226b61f · 2022-01-11T13:53:16.000+01:00
diff --git a/docs/get-started/what-is.md b/docs/get-started/what-is.md
@@ -40,6 +40,59 @@ DocArray is designed to maximize the local experience, with the requirement of c
 | Rich functions for data types   |&#9989;|&#10060;| &#10060;    |&#9989;|&#10060;|
 
 
+There are two other packages that people often compare DocArray to, yet I haven't used them extensively. It would be unfair to put them in the above list, so here is a dedicated section for them.
+
+## To AwkwardArray
+
+[AwkwardArray](https://awkward-array.org/quickstart.html) is a library for manipulating JSON/dict data via NumPy idioms. Instead of working with dynamically typed Python objects, AwkwardArray converts the data into contiguous arrays and processes them with precompiled routines. Hence, it is highly efficient.
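The sketch below illustrates the idea of "contiguous data" using only the standard library. It is not how AwkwardArray is implemented. The same records are stored once as a list of Python dicts and once as one contiguous `array.array` buffer of C doubles per field. The helper `to_columns` is hypothetical.

```python
from array import array

# Row-oriented: one dynamically typed Python dict per record.
records = [{"x": float(i), "y": float(2 * i)} for i in range(5)]

def to_columns(rows, fields):
    """Hypothetical helper: pack each numeric field into one contiguous
    buffer of C doubles (typecode 'd')."""
    return {f: array("d", (r[f] for r in rows)) for f in fields}

columns = to_columns(records, ["x", "y"])

# Column-wise work scans a single typed buffer instead of doing a key
# lookup in every dict. Libraries such as NumPy or AwkwardArray go further
# and run compiled kernels directly over buffers like these.
total_x = sum(columns["x"])
print(total_x)                # 10.0
print(columns["y"].tolist())  # [0.0, 2.0, 4.0, 6.0, 8.0]
```

The row form is convenient for handling one record at a time. The column form keeps each field's values packed together, which is what makes bulk numeric operations cheap.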
+
+DocArray and AwkwardArray are designed for different purposes. DocArray comes from deep learning engineering, which works on streams of multi/cross-modal Documents. AwkwardArray comes from particle physics, where high-performance number-crunching is the priority. Both share the idea of a generic data structure, but each is designed to maximize productivity in its own domain. As a result, they offer different sets of features.
+
+When it comes to speed, AwkwardArray is fast at column access, whereas DocArray is fast at row access (streaming):
+
+```python
+import awkward as ak
+import numpy as np
+from docarray import DocumentArray
+from toytime import TimeContext
+
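+# 100k Documents, each with a 64-d random embedding and a short text
+da = DocumentArray.empty(100_000)
+da.embeddings = np.random.random([len(da), 64])
+
+da.texts = [f'hello {j}' for j in range(len(da))]
+
+# DocumentArray -> list of dicts -> AwkwardArray
+ak_array = ak.from_iter(da.to_list())
+
+# Row access: iterate over all elements one by one
+with TimeContext('iter via DocArray'):
+    for d in da:
+        pass
+
+with TimeContext('iter via awkward'):
+    for r in ak_array:
+        pass
+
+# Column access: fetch one field for all elements at once
+with TimeContext('access text via DocArray'):
+    da.texts
+
+with TimeContext('access text via awkward'):
+    ak_array['text']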
+
+```text
+iter via DocArray ...	0.004s
+iter via awkward ...	1.664s
+access text via DocArray ...	0.031s
+access text via awkward ...	0.000s
+```
+
+As the example shows, a DocumentArray can be converted into an AwkwardArray via `.to_list()` and `ak.from_iter()`.
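The timings above are consistent with the two memory layouts. The standard-library sketch below keeps data row-wise, one dict per element, similar in spirit to `da.to_list()`. It then transposes the data column-wise, roughly what building a columnar array from those dicts involves. The names `rows_to_columns` and `iter_rows` are hypothetical, and neither library is implemented this way.

```python
# Row layout: one dict per Document.
rows = [{"id": str(i), "text": f"hello {i}"} for i in range(3)]

def rows_to_columns(rows):
    """Hypothetical: transpose a list of dicts into a dict of lists."""
    fields = rows[0].keys() if rows else []
    return {f: [r[f] for r in rows] for f in fields}

def iter_rows(columns):
    """Rebuild each row from the columns, creating one new dict per step."""
    names = list(columns)
    for values in zip(*(columns[n] for n in names)):
        yield dict(zip(names, values))

columns = rows_to_columns(rows)

# Column access: a ready-made list in the column layout,
# but a scan over every record in the row layout.
texts_from_columns = columns["text"]
texts_from_rows = [r["text"] for r in rows]

# Row iteration: free in the row layout, but every row
# must be reassembled in the column layout.
rebuilt = list(iter_rows(columns))

print(texts_from_columns)  # ['hello 0', 'hello 1', 'hello 2']
print(rebuilt == rows)     # True
```

Whichever layout is chosen, one access pattern is cheap and the other pays a per-element cost.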
+
+## To Zarr
+
+[Zarr](https://zarr.readthedocs.io/en/stable/) is a format for the storage of chunked, compressed, N-dimensional arrays. I have known Zarr for a long time; to me, it is the package to reach for when a `numpy.ndarray` is too big to fit into memory. Zarr provides a comprehensive set of functions to chunk, compress and stream large ndarrays. From that perspective, Zarr, like `numpy.ndarray`, focuses on numerical representation and computation.
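To make "chunked, compressed" concrete, here is a toy standard-library sketch of the general idea. It is not Zarr's API or on-disk format. A 1-D float sequence is split into fixed-size chunks, and each chunk is compressed independently. A slice read then decompresses only the chunks it overlaps. The functions `write_chunks` and `read` and the chunk size are hypothetical.

```python
import struct
import zlib

CHUNK = 4  # elements per chunk (hypothetical chunk size)

def write_chunks(values, chunk=CHUNK):
    """Split a 1-D float sequence into fixed-size chunks and
    zlib-compress each chunk's packed little-endian doubles."""
    store = {}
    for k in range(0, len(values), chunk):
        block = values[k:k + chunk]
        raw = struct.pack(f"<{len(block)}d", *block)
        store[k // chunk] = zlib.compress(raw)
    return store

def read(store, start, stop, chunk=CHUNK):
    """Return values[start:stop], decompressing only the overlapping chunks.
    Assumes 0 <= start <= stop <= number of stored values."""
    out = []
    for key in range(start // chunk, (stop - 1) // chunk + 1):
        raw = zlib.decompress(store[key])
        block = struct.unpack(f"<{len(raw) // 8}d", raw)
        lo = max(start - key * chunk, 0)
        hi = min(stop - key * chunk, len(block))
        out.extend(block[lo:hi])
    return out

data = [float(i) for i in range(10)]
store = write_chunks(data)
print(len(store))         # 3 (chunks of 4 + 4 + 2 elements)
print(read(store, 3, 6))  # [3.0, 4.0, 5.0], only chunks 0 and 1 are decompressed
```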
+
+In DocArray, the basic element one works with is a Document, not an `ndarray`. Support for `ndarray` is important, but it is not the full story: in deep learning engineering, an `ndarray` is often just an intermediate representation of a Document used for computation and then thrown away. Therefore, having a consistent data structure that lives *long enough* to cover creating, storing, computing, transferring, returning and rendering is a motivation behind DocArray.
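The following sketch illustrates that lifecycle with a hypothetical minimal `Doc` dataclass. It is not DocArray's `Document` class. The array-like embedding is computed, used and then dropped. The Document itself is serialized, transferred and restored. The toy `embed` function stands in for a real model.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

@dataclass
class Doc:
    """Hypothetical minimal Document, not DocArray's actual class."""
    text: str
    tags: dict = field(default_factory=dict)
    embedding: Optional[List[float]] = None  # intermediate, may be dropped

def embed(doc):
    # Toy stand-in for a model: a 2-d "embedding" derived from the text.
    doc.embedding = [float(len(doc.text)), float(doc.text.count(" "))]
    return doc

doc = embed(Doc(text="hello world"))        # create + compute
doc.tags["matched"] = doc.embedding[0] > 5  # use the array...
doc.embedding = None                        # ...then throw it away
wire = json.dumps(asdict(doc))              # store / transfer
restored = Doc(**json.loads(wire))          # receive / return
print(restored)  # Doc(text='hello world', tags={'matched': True}, embedding=None)
```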
+
 ## To Jina Users
 
 Jina 2.0-2.6 *kind of* have their own "DocArray", with very similar `Document` and `DocumentArray` interface. However, it is an old design and codebase. Since then, many redesigns and improvements have been made to boost its efficiency, usability and portability. DocArray is now an independent package that other frameworks such as future Jina 3.x and Finetuner will rely on.