Stars
A curated list of data-oriented design resources.
Learning Amazon Kinesis Development
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activelo…
Preparation links and resources for system design questions
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Problems from https://datascienceprep.com/
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex di…
A pattern-based approach for learning technical interview questions
☁️ Build multimodal AI applications with cloud-native stack
Multi-Task Deep Neural Networks for Natural Language Understanding
Persistent dict, backed by sqlite3 and pickle, multithread-safe.
End-to-End Object Detection with Transformers
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Distributed Asynchronous Hyperparameter Optimization in Python
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems"
A package that efficiently applies any function to a pandas DataFrame or Series in the fastest available manner
Fast and lightweight header-only C++ library (with Python bindings) for approximate nearest neighbor search
Python Approximate Nearest Neighbor Search in very high dimensional spaces with optimised indexing.
LambdaRank Neural Network model using Keras.
My Solutions to "A Collection of Data Science Take-Home Challenges" by Giulio Palombo.
A curated list of data mining papers about fraud detection.
The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity
NBoost is a scalable, search-api-boosting platform for deploying transformer models to improve the relevance of search results on different platforms (e.g. Elasticsearch)