Highlights
-
whisperX Public
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
-
webvid Public
Large-scale text-video dataset. 10 million captioned short videos.
-
clip-hitchhiker Public
A CLIP-Hitchhiker's Guide to Long Video Retrieval [arXiv 2022]
-
transformers Public
Forked from huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Python · Apache License 2.0 · Updated Nov 12, 2023
-
PaddleOCR Public
Forked from PaddlePaddle/PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
-
video2dataset Public
Forked from iejMac/video2dataset
Easily create large video dataset from video urls
-
whisper-asr-webservice Public
Forked from ahmetoner/whisper-asr-webservice
OpenAI Whisper ASR Webservice API
Python · MIT License · Updated Oct 5, 2023
-
LAVIS Public
Forked from salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
-
CLIP Public
Forked from openai/CLIP
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
-
SimpleDiarization Public
Forked from JaesungHuh/SimpleDiarization
Simple Diarization model
-
pyannote-audio Public
Forked from pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
-
whisper Public
Forked from openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
Jupyter Notebook · MIT License · Updated Jan 8, 2023
-
CondensedMovies Public
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]
-
video_features Public
Forked from v-iashin/video_features
Extract video features from raw videos using multiple GPUs. We support RAFT and PWC flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, ResNet features.
-
frozen-in-time Public
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
-
hydra Public
Forked from facebookresearch/hydra
Hydra is a framework for elegantly configuring complex applications
Python · MIT License · Updated Nov 18, 2021
-
Automated Audiovisual Behaviour Recognition in Wild Primates
1 · Updated Sep 22, 2021
-
conceptual-12m Public
Forked from google-research-datasets/conceptual-12m
Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.
-
video-transformers Public
Implementations of Transformers for Video
-
pytorch-image-models Public
Forked from huggingface/pytorch-image-models
PyTorch image models, scripts, pretrained weights -- (SE)ResNet/ResNeXT, DPN, EfficientNet, MixNet, MobileNet-V3/V2, MNASNet, Single-Path NAS, FBNet, and more
-
slurm_gpustat Public
Forked from albanie/slurm_gpustat
A simple command line tool to show GPU usage on a SLURM cluster
-
torchvggish Public
Forked from harritaylor/torchvggish
PyTorch port of Google Research's VGGish model used for extracting audio features.
-
collaborative-experts Public
Forked from albanie/collaborative-experts
Video embeddings for retrieval - code for the paper "Use What You Have: Video retrieval using representations from collaborative experts"
-
bert-as-service Public
Forked from llSourcell/bert-as-service
Mapping a variable-length sentence to a fixed-length vector using BERT model
Python · MIT License · Updated Mar 20, 2019
-
pytorch-multi-label-classifier Public
Forked from pangwong/pytorch-multi-label-classifier
A PyTorch classifier for multi-label classification