<![CDATA[Explosion · RSS Feed]]> <![CDATA[Explosion is a software company specializing in developer tools and tailored solutions for Artificial Intelligence and Natural Language Processing. We’re the makers of spaCy, one of the leading open-source libraries for advanced NLP.]]> https://explosion.ai https://explosion.ai/icon.png Explosion · RSS Feed https://explosion.ai RSS for Node Wed, 14 Jan 2026 09:21:49 GMT Wed, 14 Jan 2026 09:21:49 GMT <![CDATA[All rights reserved 2026, ExplosionAI GmbH]]> <![CDATA[en]]> <![CDATA[Explosion]]> <![CDATA[Explosion]]> <![CDATA[RiCoRecA: rich cooking recipe annotation schema]]> <![CDATA[The annotation process consists of two sections. Firstly, the annotator utilized a customized Prodigy interface to complete the NER and RC annotation tasks.]]> https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1550604/full https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1550604/full <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[Explosion]]> Mon, 12 Jan 2026 00:00:00 GMT <![CDATA[Engineering a human-aligned LLM evaluation workflow with Prodigy and DSPy]]> <![CDATA[This post demonstrates a human-in-the-loop workflow for developing and evaluating LLMs, using Prodigy and DSPy to create task-specific, human-aligned metrics that guide model optimization beyond generic evaluation measures.]]> https://explosion.ai/blog/human-aligned-llm-evaluation-dspy blog:human-aligned-llm-evaluation-dspy <![CDATA[blog]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[evaluation]]> <![CDATA[strategy]]> <![CDATA[Magdalena Anioł, Matthew Honnibal]]> Mon, 01 Dec 2025 00:00:00 GMT <![CDATA[Building AI with AI]]> <![CDATA[AI-powered coding assistants have transformed the way we build software, and AI itself. 
In this talk, Ines shows why we should use LLMs to build systems instead of as systems, and why code is more important than ever, not less.]]> https://speakerdeck.com/inesmontani/building-ai-with-ai event:pycon-ireland-2025 <![CDATA[talk]]> <![CDATA[llms]]> <![CDATA[annotation]]> <![CDATA[strategy]]> <![CDATA[Ines Montani]]> Sat, 15 Nov 2025 00:00:00 GMT <![CDATA[How AI is reshaping IT skills]]> <![CDATA[German article featuring Ines’ take on the impact of AI on future-proof skills for IT professionals.]]> https://www.connect-professional.de/markt/wie-kuenstliche-intelligenz-it-kompetenzen-neu-sortiert.334426.html https://www.connect-professional.de/markt/wie-kuenstliche-intelligenz-it-kompetenzen-neu-sortiert.334426.html <![CDATA[interview]]> <![CDATA[Ines Montani]]> Fri, 11 Jul 2025 00:00:00 GMT <![CDATA[Sovereign AI systems instead of black box solutions]]> <![CDATA[German article featuring Ines’ take on AI in industry, the role of open source, and using Generative AI to create systems.]]> https://www.it-daily.net/it-management/ki/souverane-ki-systeme https://www.it-daily.net/it-management/ki/souverane-ki-systeme <![CDATA[interview]]> <![CDATA[llms]]> <![CDATA[Ines Montani]]> Tue, 17 Jun 2025 00:00:00 GMT <![CDATA[Conquering PDFs: document understanding beyond plain text]]> <![CDATA[In this talk, Ines presents a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem.]]> https://speakerdeck.com/inesmontani/conquering-pdfs-document-understanding-beyond-plain-text event:pydata-london-2025 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[annotation]]> <![CDATA[strategy]]> <![CDATA[computer-vision]]> <![CDATA[Ines Montani]]> Sat, 07 Jun 2025 00:00:00 GMT <![CDATA[Applied NLP in the Age of Generative AI: Future-Proof Strategies for Banking and Finance]]> <![CDATA[A modern approach and mindset for building future-proof NLP pipelines in-house, focusing on 
use cases from banking, finance and economics.]]> https://speakerdeck.com/inesmontani/applied-nlp-in-the-age-of-generative-ai-future-proof-strategies-for-banking-and-finance event:econdat-2025 <![CDATA[talk]]> <![CDATA[llms]]> <![CDATA[finance]]> <![CDATA[Ines Montani]]> Thu, 05 Jun 2025 00:00:00 GMT <![CDATA[E^2GraphRAG: Streamlining Graph-based RAG for High Efficiency and Effectiveness]]> <![CDATA[Instead of using LLMs for entity extraction, we employ the traditional NLP tool spaCy to extract entities, and use their co-occurrence in a chunk as relations.]]> https://arxiv.org/abs/2505.24226 https://arxiv.org/abs/2505.24226 <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Fri, 30 May 2025 00:00:00 GMT <![CDATA[Developer Trends in 2025]]> <![CDATA[Discussion with Michael Kennedy, Calvin Hendryx-Parker, Gina Häußge, Richard Campbell and Ines.]]> https://talkpython.fm/episodes/show/504/developer-trends-in-2025 event:talkpython-2025-panel <![CDATA[interview]]> <![CDATA[Ines Montani]]> Mon, 05 May 2025 00:00:00 GMT <![CDATA[AI in Reality Fireside Chat: Enterprise AI & Open-Source Innovation]]> <![CDATA[Panel discussion with Alexander CS Hendorf, Dr. 
Alexander Beck, Walid Mehanna and Ines.]]> https://www.youtube.com/watch?v=sAmh5S0MGhs event:pycon-de-2025-panel <![CDATA[talk]]> <![CDATA[Ines Montani]]> Thu, 24 Apr 2025 00:00:00 GMT <![CDATA[Feminist AI LAN Party]]> <![CDATA[Three days of workshops, hacking, creating, publishing and connecting locally, featuring a data development workshop with Prodigy and a session on hacking LLMs.]]> https://feministai.party/events/2025-04-23 event:pycon-de-2025-feminist-ai <![CDATA[talk]]> <![CDATA[llms]]> <![CDATA[humanities]]> <![CDATA[Ines Montani]]> Wed, 23 Apr 2025 00:00:00 GMT <![CDATA[Conquering PDFs: document understanding beyond plain text]]> <![CDATA[In this talk, Ines presents a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem.]]> https://speakerdeck.com/inesmontani/conquering-pdfs-document-understanding-beyond-plain-text event:pycon-de-2025 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[annotation]]> <![CDATA[strategy]]> <![CDATA[computer-vision]]> <![CDATA[Ines Montani]]> Wed, 23 Apr 2025 00:00:00 GMT <![CDATA[Keyword Extraction, and Aspect Classification in Sinhala, English, and Code-Mixed Content]]> <![CDATA[Keyword extraction in English is performed with a hybrid approach comprising a fine-tuned spaCy NER model, FinBERT-based KeyBERT embeddings, YAKE, and EmbedRank, which results in a combined accuracy of 91.2%.]]> https://arxiv.org/abs/2504.10679 https://arxiv.org/abs/2504.10679 <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[media]]> <![CDATA[Explosion]]> Mon, 14 Apr 2025 00:00:00 GMT <![CDATA[KI ohne Ketten: Warum Open Source gegen Big Tech gewinnen kann]]> <![CDATA[Interview with Ines on open source, LLMs, ethics and sustainable AI development.]]> https://open.spotify.com/episode/3h74zykyYEIgcNtHEgl3CK event:unmute-it-2025 <![CDATA[interview]]> <![CDATA[Ines Montani]]> Sun, 13 Apr 2025 00:00:00 GMT <![CDATA[KI zwischen Freiheit und 
Kontrolle: The AI Revolution Will Not Be Monopolized]]> <![CDATA[How should we envision the use of AI in practice? And are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big tech monopolies?]]> https://speakerdeck.com/inesmontani/ki-zwischen-freiheit-und-kontrolle-the-ai-revolution-will-not-be-monopolized event:data-unplugged-2025 <![CDATA[talk]]> <![CDATA[Ines Montani]]> Thu, 10 Apr 2025 00:00:00 GMT <![CDATA[How to advocate for modular NLP in the age of Generative AI]]> <![CDATA[With all the hype around Generative AI, many are led to believe it’s the solution to everything. So how can you, as a developer, communicate the nuances and advocate for new and modular solutions that are better, easier and cheaper?]]> https://explosion.ai/blog/modular-nlp-generative-ai blog:modular-nlp-generative-ai <![CDATA[blog]]> <![CDATA[strategy]]> <![CDATA[llms]]> <![CDATA[annotation]]> <![CDATA[Ines Montani]]> Mon, 31 Mar 2025 00:00:00 GMT <![CDATA[How Love Without Sound helps the music industry recover millions in revenue for artists with NLP, spaCy and Prodigy]]> <![CDATA[A case study on Love Without Sound’s innovative AI-powered tools for the music industry and law firms specializing in royalty negotiations.]]> https://explosion.ai/blog/love-without-sound-nlp-music-industry blog:love-without-sound-nlp-music-industry <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[case_study]]> <![CDATA[legal]]> <![CDATA[finance]]> <![CDATA[media]]> <![CDATA[strategy]]> <![CDATA[annotation]]> <![CDATA[llms]]> <![CDATA[Ines Montani]]> Mon, 24 Mar 2025 00:00:00 GMT <![CDATA[📚 spacy-layout v0.0.12]]> <![CDATA[Support processing PDFs with context, add document index tables and more docs]]> https://github.com/explosion/spacy-layout release:spacy-layout_0.0.12 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[computer-vision]]> <![CDATA[Explosion]]> Sat, 08 Mar 2025 00:00:00 GMT <![CDATA[Künstliche Intelligenz: Technologie der 
Zukunft – und warum Open Source die Karten neu mischt]]> <![CDATA[German talk on the future of Artificial Intelligence and the impact of open-source software and models.]]> https://speakerdeck.com/inesmontani/kunstliche-intelligenz-technologie-der-zukunft-und-warum-open-source-die-karten-neu-mischt event:heise-2025 <![CDATA[talk]]> <![CDATA[Ines Montani]]> Fri, 07 Mar 2025 00:00:00 GMT <![CDATA[✨ prodigy v1.18.0]]> <![CDATA[Text editing during NER and span annotation, custom translations and more JavaScript features]]> https://prodi.gy/docs/changelog#v1.18.0 release:prodigy_1.18.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Mon, 24 Feb 2025 00:00:00 GMT <![CDATA[Mastering spaCy]]> <![CDATA[Build structured NLP solutions with custom components and models powered by LLMs. By the end of the book, you will be able to build robust NLP pipelines and integrate them with web applications to create end-to-end solutions.]]> https://www.packtpub.com/en-us/product/mastering-spacy-9781835880463 https://www.packtpub.com/en-us/product/mastering-spacy-9781835880463 <![CDATA[book]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Duygu Altinok]]> Fri, 14 Feb 2025 00:00:00 GMT <![CDATA[Prozessvisualisierung mit generativer KI im Praxistest]]> <![CDATA[German article by Nils Durner on visualizing technical processes with Generative AI, featuring spaCy and Presidio for PII anonymization.]]> https://www.heise.de/ratgeber/Prozessvisualisierung-mit-generativer-KI-im-Praxistest-10266093.html https://www.heise.de/ratgeber/Prozessvisualisierung-mit-generativer-KI-im-Praxistest-10266093.html <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Mon, 10 Feb 2025 00:00:00 GMT <![CDATA[What the history of the web can teach us about the future of AI]]> <![CDATA[How will AI development look in the future? There is a lot we can learn from another groundbreaking technology: the web. 
This blog post takes a look at what the history of the web can teach us, and what this means for developers, models, open source and regulation.]]> https://explosion.ai/blog/history-web-future-ai blog:history-web-future-ai <![CDATA[blog]]> <![CDATA[strategy]]> <![CDATA[llms]]> <![CDATA[Ines Montani]]> Mon, 27 Jan 2025 00:00:00 GMT <![CDATA[What the history of the web can teach us about the future of AI]]> <![CDATA[In this talk, Ines takes a look at what the history of the web can teach us about the future of AI, and what this means for developers, models, open source and regulation.]]> https://speakerdeck.com/inesmontani/what-the-history-of-the-web-can-teach-us-about-the-future-of-ai event:pyconweb-2025 <![CDATA[talk]]> <![CDATA[llms]]> <![CDATA[strategy]]> <![CDATA[Ines Montani]]> Sat, 25 Jan 2025 00:00:00 GMT <![CDATA[Using natural language processing to identify emergency department patients with incidental lung nodules requiring follow-up]]> <![CDATA[CT reports were annotated by MD raters using Prodigy software to develop a stepwise NLP “pipeline” that first excluded prior or known malignancy, determined the presence of a lung nodule, and then categorized any recommended follow-up. 
NLP was developed using a RoBERTa large language model on the spaCy platform.]]> https://onlinelibrary.wiley.com/doi/abs/10.1111/acem.15080 https://onlinelibrary.wiley.com/doi/abs/10.1111/acem.15080 <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[biomedical]]> <![CDATA[Explosion]]> Fri, 17 Jan 2025 00:00:00 GMT <![CDATA[Best Way to OCR a PDF in Python]]> <![CDATA[Tutorial by WJB Mattingly on how to use the new spaCy Layout package and Docling to convert PDFs to text.]]> https://www.youtube.com/watch?v=quJtzVxoMtE https://www.youtube.com/watch?v=quJtzVxoMtE <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[computer-vision]]> <![CDATA[humanities]]> <![CDATA[Explosion]]> Tue, 14 Jan 2025 00:00:00 GMT <![CDATA[Streaming spaCy]]> <![CDATA[Join spaCy author and core developer Matt as he works on the library, develops features and fixes bugs, while chatting about all things NLP and open source. Every Thursday at 2pm CET and Friday at 11am CET.]]> https://www.youtube.com/playlist?list=PLBmcuObd5An5_iAxNYLJa_xWmNzsYce8c https://www.youtube.com/playlist?list=PLBmcuObd5An5_iAxNYLJa_xWmNzsYce8c <![CDATA[video]]> <![CDATA[spacy]]> <![CDATA[thinc]]> <![CDATA[Matthew Honnibal]]> Thu, 09 Jan 2025 00:00:00 GMT <![CDATA[Prodigy Dashboard Plugin]]> <![CDATA[The new dashboard plugin adds a web application for managing annotations, data analytics and annotation progress, and is now available for early beta testing.]]> https://support.prodi.gy/t/prodigy-dashboard-beta-testers-wanted-for-new-plugin/7468 https://support.prodi.gy/t/prodigy-dashboard-beta-testers-wanted-for-new-plugin/7468 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Thu, 19 Dec 2024 00:00:00 GMT <![CDATA[Cracking the Code: How to Start a Career in AI]]> <![CDATA[Short video interview with Ines about the 4 skills job hunters can cultivate for a career in artificial intelligence.]]> https://www.linkedin.com/feed/update/urn:li:activity:7274774107907407872/ 
https://www.linkedin.com/feed/update/urn:li:activity:7274774107907407872/ <![CDATA[interview]]> <![CDATA[Ines Montani]]> Tue, 17 Dec 2024 00:00:00 GMT <![CDATA[spaCy Natural Language Processing: From Beginner to Advanced]]> <![CDATA[The first Chinese-language book on spaCy for beginners and experienced practitioners, covering traditional NLP techniques and how to leverage LLMs for various NLP tasks.]]> https://www.linkedin.com/feed/update/urn:li:activity:7274542396934119425/ https://www.linkedin.com/feed/update/urn:li:activity:7274542396934119425/ <![CDATA[book]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Mon, 16 Dec 2024 00:00:00 GMT <![CDATA[PyLadies entrepreneurs and career development]]> <![CDATA[Panel discussion about career challenges and starting your own business with Cheuk Ting Ho, Tereza Iofciu, Anwesha Das, Una Galyeva and Ines.]]> https://www.youtube.com/watch?v=V73KeBCzXpM event:pyladiescon-2024 <![CDATA[talk]]> <![CDATA[Ines Montani]]> Sat, 07 Dec 2024 00:00:00 GMT <![CDATA[Recognising non-named spatial entities in literary texts: a novel spatial entities classifier]]> <![CDATA[In this paper, we present a case study on the prediction of what we call ‘non-named spatial entities’ (NNSE) in a historical corpus of Swiss-German novels using a deep learning model in conjunction with BERT and Prodigy.]]> https://2024.computational-humanities-research.org/papers/paper59/ https://2024.computational-humanities-research.org/papers/paper59/ <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[humanities]]> <![CDATA[Explosion]]> Wed, 04 Dec 2024 00:00:00 GMT <![CDATA[From PDFs to AI-ready structured data: a deep dive]]> <![CDATA[This blog post presents a new modular workflow for converting PDFs and similar documents to structured data and shows you how to build end-to-end document understanding and information extraction pipelines for industry use cases.]]> https://explosion.ai/blog/pdfs-nlp-structured-data blog:pdfs-nlp-structured-data <![CDATA[blog]]> <![CDATA[spacy]]> 
<![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[annotation]]> <![CDATA[strategy]]> <![CDATA[computer-vision]]> <![CDATA[Ines Montani]]> Mon, 02 Dec 2024 00:00:00 GMT <![CDATA[📚 spacy-layout v0.0.6]]> <![CDATA[Add support for tables and convert tabular data to pandas.DataFrame]]> https://github.com/explosion/spacy-layout release:spacy-layout_0.0.6 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[computer-vision]]> <![CDATA[Explosion]]> Sun, 24 Nov 2024 00:00:00 GMT <![CDATA[🔌 prodigy-pdf v0.4.0]]> <![CDATA[Add text-based span annotation for PDFs]]> https://github.com/explosion/prodigy-pdf/releases/tag/v0.4.0 release:prodigy-pdf_0.4.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[computer-vision]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Mon, 25 Nov 2024 00:00:00 GMT <![CDATA[✨ prodigy v1.17.0]]> <![CDATA[Pages UI for multi-page tasks like longer documents, PDFs or collections of images]]> https://prodi.gy/docs/changelog#v1.17.0 release:prodigy_1.17.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Mon, 18 Nov 2024 00:00:00 GMT <![CDATA[🔌 prodigy-pdf v0.3.0]]> <![CDATA[Support multi-page PDFs in a single view]]> https://github.com/explosion/prodigy-pdf/releases/tag/v0.3.0 release:prodigy-pdf_0.3.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[computer-vision]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Mon, 18 Nov 2024 00:00:00 GMT <![CDATA[📚 spacy-layout v0.0.1]]> <![CDATA[Process PDFs, Word documents and more with spaCy]]> https://github.com/explosion/spacy-layout release:spacy-layout_0.0.1 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[computer-vision]]> <![CDATA[Explosion]]> Mon, 18 Nov 2024 00:00:00 GMT <![CDATA[uOttawa at LegalLens-2024: Transformer-based Classification Experiments]]> <![CDATA[Our training utilizes the spaCy pipeline configured with a transformer model and a transition-based parser for NER tasks. 
The deberta-v3-base model has been selected for the main transformer architecture.]]> https://arxiv.org/abs/2410.21139 https://arxiv.org/abs/2410.21139 <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[legal]]> <![CDATA[Explosion]]> Mon, 28 Oct 2024 00:00:00 GMT <![CDATA[Distill Your LLMs and Surpass Their Performance]]> <![CDATA[In her presentation at InfoQ Dev Summit, Ines Montani provided the audience with practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components.]]> https://www.infoq.com/news/2024/10/efficient-mlops-llm-distillation/ https://www.infoq.com/news/2024/10/efficient-mlops-llm-distillation/ <![CDATA[universe]]> <![CDATA[llms]]> <![CDATA[strategy]]> <![CDATA[Explosion]]> Sat, 26 Oct 2024 00:00:00 GMT <![CDATA[Serverless custom NLP with LLMs, Modal and Prodigy]]> <![CDATA[In this blog post, we’ll show you how you can go from an idea and little data to a fully custom information extraction model using Prodigy and Modal, no infrastructure or GPU setup required.]]> https://explosion.ai/blog/modal-prodigy-serverless-nlp blog:modal-prodigy-serverless-nlp <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[annotation]]> <![CDATA[Ines Montani, Magdalena Anioł]]> Tue, 22 Oct 2024 00:00:00 GMT <![CDATA[✨ prodigy v1.16.0]]> <![CDATA[Modal plugin for on-demand deployment, cross-platform wheels and UI fixes]]> https://prodi.gy/docs/changelog#v1.16.0 release:prodigy_1.16.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Tue, 22 Oct 2024 00:00:00 GMT <![CDATA[Accelerate your Career with Open-Source AI]]> <![CDATA[Panel discussion about making a career out of open-source software, featuring Gael Varoquaux (scikit-learn), Steeve Morin (ZML) and Ines.]]> https://www.youtube.com/watch?v=neJ4J4PfdCE event:dotai-2024-discussion <![CDATA[talk]]> <![CDATA[Ines Montani]]> Thu, 17 Oct 2024 00:00:00 GMT 
<![CDATA[Reality is not an End-to-End Prediction Problem: Applied NLP in the Age of Generative AI]]> https://speakerdeck.com/inesmontani/reality-is-not-an-end-to-end-prediction-problem-applied-nlp-in-the-age-of-generative-ai event:dotai-2024-ines <![CDATA[talk]]> <![CDATA[llms]]> <![CDATA[strategy]]> <![CDATA[Ines Montani]]> Thu, 17 Oct 2024 00:00:00 GMT <![CDATA[Applied NLP with LLMs: Beyond Black-Box Monoliths]]> <![CDATA[In this talk, Ines shows some practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components.]]> https://speakerdeck.com/inesmontani/applied-nlp-with-llms-beyond-black-box-monoliths event:pyberlin-2024 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[strategy]]> <![CDATA[Ines Montani]]> Wed, 09 Oct 2024 00:00:00 GMT <![CDATA[The 100 who are shaping AI in Europe]]> <![CDATA[Ines is featured among the top 100 individuals who are shaping Artificial Intelligence in Europe, compiled by French newspaper l’Opinion.]]> https://www.lopinion.fr/les-100-qui-font-l-ia-en-europe https://www.lopinion.fr/les-100-qui-font-l-ia-en-europe <![CDATA[blog]]> <![CDATA[Explosion]]> Tue, 08 Oct 2024 00:00:00 GMT <![CDATA[💫 spacy v3.8.0]]> <![CDATA[Memory management for persistent services, numpy 2.0 support]]> https://github.com/explosion/spaCy/releases/tag/release-v3.8.2 release:spacy_3.8.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Tue, 01 Oct 2024 00:00:00 GMT <![CDATA[Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation]]> <![CDATA[LLMs have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and data privacy. 
In this talk, Ines shows some practical solutions for using the latest models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.]]> https://speakerdeck.com/inesmontani/taking-llms-out-of-the-black-box-a-practical-guide-to-human-in-the-loop-distillation event:infoq-munich-2024 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[annotation]]> <![CDATA[strategy]]> <![CDATA[Ines Montani]]> Thu, 26 Sep 2024 00:00:00 GMT <![CDATA[Applied NLP in the Age of Generative AI]]> <![CDATA[In this talk, Ines shares the most important lessons we’ve learned from solving real-world information extraction problems in industry, and shows you a new approach and mindset for designing robust and modular NLP pipelines in the age of Generative AI.]]> https://speakerdeck.com/inesmontani/applied-nlp-in-the-age-of-generative-ai event:pydata-amsterdam-2024-keynote <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[strategy]]> <![CDATA[Ines Montani]]> Fri, 20 Sep 2024 00:00:00 GMT <![CDATA[Combining the Best of Two Worlds: From TF-IDF to Llama LLM]]> <![CDATA[Talk by William Arias, Staff Developer Advocate at GitLab, on combining traditional NLP techniques and LLMs to solve hallucination issues and create robust spaCy applications.]]> https://osseu2024.sched.com/event/b3c1139bb641e25f16be9451b5123365 https://osseu2024.sched.com/event/b3c1139bb641e25f16be9451b5123365 <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Wed, 18 Sep 2024 00:00:00 GMT <![CDATA[How GitLab uses spaCy to analyze support tickets and empower their community]]> <![CDATA[A case study on GitLab’s large-scale NLP pipelines for extracting actionable insights from support tickets and usage questions.]]> https://explosion.ai/blog/gitlab-support-insights blog:gitlab-support-insights <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[case_study]]> 
<![CDATA[strategy]]> <![CDATA[Ines Montani]]> Mon, 16 Sep 2024 00:00:00 GMT <![CDATA[Szczecin stolicą programowania]]> <![CDATA[News segment about EuroSciPy 2024 on local Polish television, featuring Ines’ talk and interviews with the organizers.]]> https://szczecin.tvp.pl/82006550/szczecin-stolica-programowania-miedzynarodowa-konferencja-na-politechnice-morskiej https://szczecin.tvp.pl/82006550/szczecin-stolica-programowania-miedzynarodowa-konferencja-na-politechnice-morskiej <![CDATA[universe]]> <![CDATA[Explosion]]> Wed, 28 Aug 2024 00:00:00 GMT <![CDATA[10 Years of Open Source: Navigating the Next AI Revolution]]> <![CDATA[In this talk, Ines shares the most important lessons we’ve learned in 10 years of working on open-source software, the core philosophies that helped us adapt to an ever-changing AI landscape, and why open source and interoperability still win over black-box, proprietary APIs.]]> https://speakerdeck.com/inesmontani/10-years-of-open-source-navigating-the-next-ai-revolution event:euroscipy-2024 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[Ines Montani]]> Wed, 28 Aug 2024 00:00:00 GMT <![CDATA[Assessing Fine-Tuned NER Models with Limited Data in French: Automating Detection of New Technologies, Technological Domains, and Startup Names in Renewable Energy]]> <![CDATA[In order to assure the uniformity of the process of fine-tuning each model, we decided to use the spaCy library. 
This library, one of the most widely used for NLP tasks, allows us to directly modify a simple configuration file in order to define the model.]]> https://www.mdpi.com/2504-4990/6/3/96 https://www.mdpi.com/2504-4990/6/3/96 <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[humanities]]> <![CDATA[Explosion]]> Tue, 27 Aug 2024 00:00:00 GMT <![CDATA[The NLP and AI Revolution with the spaCy Creators]]> <![CDATA[In this interview with Hugo Bowne-Anderson, we delve into the forefront of NLP and the future of AI development, covering topics like human-in-the-loop distillation, open-source AI and Explosion’s journey.]]> https://vanishinggradients.fireside.fm/34 event:vanishing-gradients-2024 <![CDATA[interview]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[Ines Montani, Matthew Honnibal]]> Thu, 15 Aug 2024 00:00:00 GMT <![CDATA[spaCy Chunks v0.0.2]]> <![CDATA[spaCy extension and pipeline component for generating overlapping chunks of sentences or tokens from a document.]]> https://github.com/wjbmattingly/spacy-chunks https://github.com/wjbmattingly/spacy-chunks <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Wed, 14 Aug 2024 00:00:00 GMT <![CDATA[Practical Tips for Bootstrapping Information Extraction Pipelines]]> <![CDATA[This talk presents approaches for bootstrapping NLP pipelines and retrieval via information extraction, including tips for training, modelling and data annotation.]]> https://speakerdeck.com/honnibal/practical-tips-for-bootstrapping-information-extraction-pipelines event:datahack-summit-2024-matt <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[strategy]]> <![CDATA[annotation]]> <![CDATA[Matthew Honnibal]]> Fri, 09 Aug 2024 00:00:00 GMT <![CDATA[Toward Automatic Summarization of Hospital Discharge Notes]]> <![CDATA[For NLP tasks, vectorizers include spaCy token features such as part of speech (POS) tags, named entity recognition (NER) tags, dependency head relations and depth.]]> 
https://indigo.uic.edu/articles/thesis/Toward_Automatic_Summarization_of_Hospital_Discharge_Notes/27153255/1 https://indigo.uic.edu/articles/thesis/Toward_Automatic_Summarization_of_Hospital_Discharge_Notes/27153255/1 <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[biomedical]]> <![CDATA[Explosion]]> Thu, 01 Aug 2024 00:00:00 GMT <![CDATA[Back to our roots: Company update and future plans]]> <![CDATA[We’re back to running Explosion as a smaller, independent-minded and self-sufficient company. spaCy and Prodigy will stay stable and sustainable, maintained by their original authors. We’ll keep updating our stack with the latest technologies, without changing its core identity or purpose.]]> https://explosion.ai/blog/back-to-our-roots-company-update blog:back-to-our-roots-company-update <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[Matthew Honnibal, Ines Montani]]> Wed, 17 Jul 2024 00:00:00 GMT <![CDATA[Building the Future of NLP: Insights on spaCy, Prodigy and Generative AI]]> https://community.analyticsvidhya.com/c/leading-with-data/ines event:leading-with-data-2024-ines <![CDATA[interview]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[Ines Montani]]> Wed, 10 Jul 2024 00:00:00 GMT <![CDATA[Happy 10th Birthday, spaCy!]]> <![CDATA[10 years ago today Matt pushed the first commit to spaCy. Since then, the library has evolved as the field moved forward, but also stayed true to its core mission: industrial-strength NLP.]]> https://www.linkedin.com/feed/update/urn:li:activity:7214245844407988225/ https://www.linkedin.com/feed/update/urn:li:activity:7214245844407988225/ <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Ines Montani]]> Wed, 03 Jul 2024 00:00:00 GMT <![CDATA[The AI Revolution Will Not Be Monopolized]]> <![CDATA[Open-source initiatives are pivotal in democratizing AI technology, offering transparent, extensible tools that empower users. 
Daniel Dominguez summarizes the key takeaways from Ines’ recent talk for InfoQ.]]> https://www.infoq.com/articles/ai-revolution-not-monopolized/ https://www.infoq.com/articles/ai-revolution-not-monopolized/ <![CDATA[universe]]> <![CDATA[llms]]> <![CDATA[strategy]]> <![CDATA[Explosion]]> Wed, 03 Jul 2024 00:00:00 GMT <![CDATA[Once a Maintainer: Sofie Van Landeghem]]> <![CDATA[Interview with Sofie about her work as a core maintainer of spaCy, the evolution of NLP, and why dependency management in Python is so terrible.]]> https://onceamaintainer.substack.com/p/once-a-maintainer-sofie-van-landeghem https://onceamaintainer.substack.com/p/once-a-maintainer-sofie-van-landeghem <![CDATA[interview]]> <![CDATA[spacy]]> <![CDATA[Sofie Van Landeghem]]> Fri, 28 Jun 2024 00:00:00 GMT <![CDATA[A practical guide to human-in-the-loop distillation]]> <![CDATA[This blog post presents practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.]]> https://explosion.ai/blog/human-in-the-loop-distillation blog:human-in-the-loop-distillation <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[annotation]]> <![CDATA[strategy]]> <![CDATA[Ines Montani]]> Thu, 27 Jun 2024 00:00:00 GMT <![CDATA[How S&P Global is making markets more transparent with NLP, spaCy and Prodigy]]> <![CDATA[A case study on S&P Global’s efficient information extraction pipelines for real-time commodities trading insights in a high-security environment.]]> https://explosion.ai/blog/sp-global-commodities blog:sp-global-commodities <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[case_study]]> <![CDATA[finance]]> <![CDATA[strategy]]> <![CDATA[llms]]> <![CDATA[annotation]]> <![CDATA[Ines Montani, India Kerle, Helena Steckmeister]]> Fri, 21 Jun 2024 00:00:00 GMT <![CDATA[Exploring the AI nexus with the mind behind spaCy]]> <![CDATA[In this 
episode, Matt takes you on a deep dive into the future of data and the challenges facing current Large Language Models (LLMs).]]> https://community.analyticsvidhya.com/c/leading-with-data/matthew-honnibal event:leading-with-data-2024-matt <![CDATA[interview]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Matthew Honnibal]]> Wed, 12 Jun 2024 00:00:00 GMT <![CDATA[How to uncover and avoid structural biases in evaluating your Machine Learning/NLP projects]]> <![CDATA[This talk highlights common pitfalls that occur when evaluating ML and NLP approaches. It provides comprehensive advice on how to set up a solid evaluation procedure in general, and dives into a few specific use cases to demonstrate how artificial bias can unknowingly creep in.]]> https://speakerdeck.com/sofievl/2024-06-16-pydata-london event:pydata-london-2024-sofie <![CDATA[talk]]> <![CDATA[consulting]]> <![CDATA[annotation]]> <![CDATA[strategy]]> <![CDATA[Sofie Van Landeghem]]> Sun, 16 Jun 2024 00:00:00 GMT <![CDATA[Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation]]> <![CDATA[LLMs have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and data privacy. 
In this talk, Ines shows some practical solutions for using the latest models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.]]> https://speakerdeck.com/inesmontani/taking-llms-out-of-the-black-box-a-practical-guide-to-human-in-the-loop-distillation event:pydata-london-2024-ines <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[annotation]]> <![CDATA[strategy]]> <![CDATA[Ines Montani]]> Sat, 15 Jun 2024 00:00:00 GMT <![CDATA[Simply Simplify Language]]> <![CDATA[Interactive app by the Canton of Zurich, Switzerland, using LLMs and spaCy to analyze and simplify institutional communication and make bureaucratic German more inclusive.]]> https://github.com/machinelearningZH/simply-simplify-language https://github.com/machinelearningZH/simply-simplify-language <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[legal]]> <![CDATA[Explosion]]> Fri, 14 Jun 2024 00:00:00 GMT <![CDATA[Towards Structured Data: LLMs from Prototype to Production]]> <![CDATA[This talk presents pragmatic and practical approaches for how to use LLMs beyond just chat bots, how to ship more successful NLP projects from prototype to production and how to use the latest state-of-the-art models in real-world applications.]]> https://speakerdeck.com/inesmontani/towards-structured-data-llms-from-prototype-to-production event:cods-2024 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[strategy]]> <![CDATA[annotation]]> <![CDATA[humanities]]> <![CDATA[Ines Montani]]> Wed, 12 Jun 2024 00:00:00 GMT <![CDATA[spaCy meets LLMs: Using Generative AI for Structured Data]]> <![CDATA[This talk dives deeper into spaCy’s LLM integration, which provides a robust framework for extracting structured information from text, distilling large models into smaller components, and closing the gap between prototype and production.]]> 
https://speakerdeck.com/inesmontani/spacy-meets-llms-using-generative-ai-for-structured-data event:budapest-meetup-2024 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[strategy]]> <![CDATA[Ines Montani]]> Tue, 11 Jun 2024 00:00:00 GMT <![CDATA[The AI Revolution Won’t Be Monopolized]]> <![CDATA[There hasn’t been a boom like the AI boom since the .com days. And it may look like a space destined to be controlled by a couple of tech giants. But Ines Montani thinks open source will play an important role in the future of AI.]]> https://talkpython.fm/episodes/show/465/the-ai-revolution-wont-be-monopolized event:talkpython-2024-ines <![CDATA[interview]]> <![CDATA[llms]]> <![CDATA[Ines Montani]]> Sat, 08 Jun 2024 00:00:00 GMT <![CDATA[KI – Die künstlerische Intelligenz?]]> <![CDATA[Panelists are discussing the latest developments in Generative AI, hype vs. reality and what those new technologies mean for people, businesses, art, creativity and the music industry.]]> https://www.linkedin.com/feed/update/urn:li:ugcPost:7202944655083089920/ event:immergut-2024 <![CDATA[talk]]> <![CDATA[llms]]> <![CDATA[Ines Montani]]> Sat, 01 Jun 2024 00:00:00 GMT <![CDATA[ZenML v0.58.0]]> <![CDATA[New out-of-the-box Prodigy integration in ZenML for LLMs and beyond, to make data development and annotation a core part of your MLOps lifecycle.]]> https://docs.zenml.io/stacks-and-components/component-guide/annotators/prodigy https://docs.zenml.io/stacks-and-components/component-guide/annotators/prodigy <![CDATA[universe]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Mon, 27 May 2024 00:00:00 GMT <![CDATA[Getting Started with NLP and spaCy]]> <![CDATA[There is a lot of text data out there and maybe you're interested in getting structured data out of it. 
There are a lot of options out there and this course will introduce you to the field by focussing on spaCy while also exploring other tools.]]> https://training.talkpython.fm/courses/getting-started-with-spacy https://training.talkpython.fm/courses/getting-started-with-spacy <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Wed, 15 May 2024 00:00:00 GMT <![CDATA[The application of natural language processing for the extraction of mechanistic information in toxicology]]> <![CDATA[All steps were conducted using the open-source Python package spaCy. Specifically, the NER model was trained using scispaCy en-core-sci-lg (Neumann et al., 2019) as a starting point, which allowed for a vocabulary (word vectors) and grammar trained on scientific literature.]]> https://www.frontiersin.org/journals/toxicology/articles/10.3389/ftox.2024.1393662/full https://www.frontiersin.org/journals/toxicology/articles/10.3389/ftox.2024.1393662/full <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[biomedical]]> <![CDATA[Explosion]]> Fri, 10 May 2024 00:00:00 GMT <![CDATA[Economies of Scale Can’t Monopolise the AI Revolution]]> <![CDATA[During her presentation at QCon London, Ines Montani stated that economies of scale are not enough to create monopolies in the AI space and that open-source techniques and models will allow everybody to keep up with the “Gen AI revolution”.]]> https://www.infoq.com/news/2024/05/ai-revolution-monopol/ https://www.infoq.com/news/2024/05/ai-revolution-monopol/ <![CDATA[universe]]> <![CDATA[llms]]> <![CDATA[strategy]]> <![CDATA[Explosion]]> Fri, 03 May 2024 00:00:00 GMT <![CDATA[spaCyEx v0.0.2]]> <![CDATA[Extension for spaCy’s powerful, linguistically-aware pattern matching that introduces a RegEx-like syntax.]]> https://github.com/wjbmattingly/spacyex https://github.com/wjbmattingly/spacyex <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Fri, 03 May 2024 00:00:00 GMT <![CDATA[The AI Revolution Will Not Be Monopolized: How 
open-source beats economies of scale, even for LLMs]]> <![CDATA[With the latest advancements in NLP and LLMs, and big companies like OpenAI dominating the space, many people wonder: Are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big tech monopolies?]]> https://speakerdeck.com/inesmontani/the-ai-revolution-will-not-be-monopolized-how-open-source-beats-economies-of-scale-even-for-llms event:pycon-de-2024 <![CDATA[talk]]> <![CDATA[llms]]> <![CDATA[strategy]]> <![CDATA[Ines Montani]]> Wed, 24 Apr 2024 00:00:00 GMT <![CDATA[The AI Revolution Will Not Be Monopolized: Behind the scenes]]> <![CDATA[A more in-depth look at the concepts and ideas, academic literature, related experiments and preliminary results for distilled task-specific models.]]> https://speakerdeck.com/inesmontani/the-ai-revolution-will-not-be-monopolized-behind-the-scenes event:python-oss-berlin-2024 <![CDATA[talk]]> <![CDATA[llms]]> <![CDATA[Ines Montani]]> Sun, 21 Apr 2024 00:00:00 GMT <![CDATA[🔮 thinc v9.0.0]]> <![CDATA[Better learning rate schedules and integration of thinc-apple-ops]]> https://github.com/explosion/thinc/releases/tag/v9.0.0 release:thinc_9.0.0 <![CDATA[release]]> <![CDATA[thinc]]> <![CDATA[Explosion]]> Fri, 19 Apr 2024 00:00:00 GMT <![CDATA[🤖 curated-transformers v2.0.0]]> <![CDATA[Model registry, in-place loading, better HF Hub integration]]> https://github.com/explosion/curated-transformers/releases/tag/v2.0.0 release:curated-transformers_2.0.0 <![CDATA[release]]> <![CDATA[Explosion]]> Wed, 17 Apr 2024 00:00:00 GMT <![CDATA[Ines Montani on Natural Language Processing]]> <![CDATA[Ines speaks with host Jeremy Jung about solving problems using natural language processing. They cover generative vs. 
predictive tasks, creating a pipeline and breaking down problems, labeling examples for training, fine-tuning models, using LLMs to label data and build prototypes, and the spaCy NLP library.]]> https://se-radio.net/2024/04/se-radio-611-ines-montani-on-natural-language-processing/ event:se-radio-ines <![CDATA[interview]]> <![CDATA[Ines Montani]]> Tue, 09 Apr 2024 00:00:00 GMT <![CDATA[The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs]]> https://speakerdeck.com/inesmontani/the-ai-revolution-will-not-be-monopolized-how-open-source-beats-economies-of-scale-even-for-llms-qcon-london event:qcon-2024 <![CDATA[talk]]> <![CDATA[llms]]> <![CDATA[strategy]]> <![CDATA[Ines Montani]]> Mon, 08 Apr 2024 00:00:00 GMT <![CDATA[The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs]]> <![CDATA[With the latest advancements in NLP and LLMs, and big companies like OpenAI dominating the space, many people wonder: Are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big tech monopolies?]]> https://speakerdeck.com/inesmontani/the-ai-revolution-will-not-be-monopolized-how-open-source-beats-economies-of-scale-even-for-llms event:pycon-lt-2024-keynote <![CDATA[talk]]> <![CDATA[llms]]> <![CDATA[strategy]]> <![CDATA[Ines Montani]]> Fri, 05 Apr 2024 00:00:00 GMT <![CDATA[Designing for tomorrow’s programming workflows]]> <![CDATA[Modern editors and AI-powered tools like GitHub Copilot and ChatGPT are changing how people program and are transforming our workflows and developer productivity. 
But what does this mean for how we should be writing and designing our APIs and libraries?]]> https://speakerdeck.com/honnibal/designing-for-tomorrows-programming-workflows event:pycon-lt-2024 <![CDATA[talk]]> <![CDATA[llms]]> <![CDATA[Matthew Honnibal]]> Thu, 04 Apr 2024 00:00:00 GMT <![CDATA[🦦 weasel v0.4.0]]> <![CDATA[Allow a Git repo file as an asset and drop support for Python 3.6]]> https://github.com/explosion/weasel/releases/v0.4.0 release:weasel_0.4.0 <![CDATA[release]]> <![CDATA[Explosion]]> Thu, 04 Apr 2024 00:00:00 GMT <![CDATA[🔌 prodigy-evaluate v0.1.0]]> <![CDATA[Evaluate spaCy pipelines, print confusion matrices and more]]> https://github.com/explosion/prodigy-evaluate/releases/tag/v0.1.0 release:prodigy-evaluate_0.1.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[spacy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Tue, 26 Mar 2024 00:00:00 GMT <![CDATA[Zero-Shot NER with GLiNER and spaCy]]> <![CDATA[Tutorial by WJB Mattingly on how to integrate the generalist GLiNER model for Named Entity Recognition with spaCy's versatile NLP environment.]]> https://www.youtube.com/watch?v=kPOtaXk-K-0 https://www.youtube.com/watch?v=kPOtaXk-K-0 <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[humanities]]> <![CDATA[Explosion]]> Wed, 20 Mar 2024 00:00:00 GMT <![CDATA[Constructing a knowledge base with spaCy and spacy-llm]]> <![CDATA[This blog post shows how to use spaCy and LLMs to extract entities and relationships from text and quickly tackle the complex problem of constructing a knowledge base graph from a corpus.]]> https://medium.com/mantisnlp/constructing-a-knowledge-base-with-spacy-and-spacy-llm-f65b50ea534d https://medium.com/mantisnlp/constructing-a-knowledge-base-with-spacy-and-spacy-llm-f65b50ea534d <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Thu, 22 Feb 2024 00:00:00 GMT <![CDATA[T-RAG: Lessons from the LLM Trenches]]> <![CDATA[An important application area is question answering over private enterprise 
documents where the main considerations are data security, which necessitates applications that can be deployed on-prem, [and] limited computational resources. [...] In addition to retrieving contextual documents, we use the spaCy library with custom rules to detect named entities from the organization.]]> https://arxiv.org/abs/2402.07483 https://arxiv.org/abs/2402.07483 <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Mon, 12 Feb 2024 00:00:00 GMT <![CDATA[✨ prodigy v1.15.0]]> <![CDATA[New company plugins and support for SSO]]> https://prodi.gy/docs/changelog#v1.15.0 release:prodigy_1.15.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Thu, 15 Feb 2024 00:00:00 GMT <![CDATA[How Nesta uses NLP to process 7m job ads and shed light on the UK’s labor market]]> <![CDATA[A case study on Nesta’s workflow for extracting 7 million job ads to better understand UK skill demand, using a custom mapping step to match skills to any government taxonomy.]]> https://explosion.ai/blog/nesta-skills blog:nesta-skills <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[case_study]]> <![CDATA[strategy]]> <![CDATA[annotation]]> <![CDATA[India Kerle, Helena Steckmeister]]> Mon, 05 Feb 2024 00:00:00 GMT <![CDATA[Describing Images Fast and Slow: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic Processes]]> <![CDATA[We use the spaCy library for tokenization, part-of-speech tagging, and lemmatization of the words in the descriptions.]]> https://arxiv.org/abs/2402.01352 https://arxiv.org/abs/2402.01352 <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[computer-vision]]> <![CDATA[Explosion]]> Fri, 02 Feb 2024 00:00:00 GMT <![CDATA[KAZU v1.5]]> <![CDATA[A biomedical NLP framework designed to handle production workloads, built by AstraZeneca and Korea University and using spaCy under the hood.]]> https://github.com/AstraZeneca/KAZU https://github.com/AstraZeneca/KAZU <![CDATA[universe]]> 
<![CDATA[spacy]]> <![CDATA[biomedical]]> <![CDATA[Explosion]]> Mon, 29 Jan 2024 00:00:00 GMT <![CDATA[spacy-llm: From quick prototyping with LLMs to more reliable and efficient NLP solutions]]> <![CDATA[LLMs are paving the way for fast prototyping of NLP applications. Here, Sofie showcases how to build a structured NLP pipeline to mine clinical trials, using spaCy and spacy-llm. Moving beyond a fast prototype, she offers pragmatic solutions to make the pipeline more reliable and cost efficient.]]> https://speakerdeck.com/sofievl/2024-01-23-az event:az-2024 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[biomedical]]> <![CDATA[Sofie Van Landeghem]]> Tue, 23 Jan 2024 00:00:00 GMT <![CDATA[Microsoft Presidio v2.2.352]]> <![CDATA[Context aware, pluggable and customizable PII de-identification and anonymization service for text and images, featuring a spaCy back-end.]]> https://github.com/microsoft/presidio https://github.com/microsoft/presidio <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[legal]]> <![CDATA[computer-vision]]> <![CDATA[Explosion]]> Mon, 22 Jan 2024 00:00:00 GMT <![CDATA[🦙 spacy-llm v0.7.0]]> <![CDATA[Supporting arbitrarily long docs and various new tasks]]> https://github.com/explosion/spacy-llm/releases/tag/v0.7.0 release:spacy-llm_0.7.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Fri, 19 Jan 2024 00:00:00 GMT <![CDATA[Muted: Multilingual Targeted Offensive Speech Identification and Visualization]]> <![CDATA[Muted can leverage any transformer-based HAP-classification model [...] to identify toxic spans, without further fine-tuning. 
In addition, we use the spaCy library to identify the specific targets and arguments for the words predicted by the attention heatmaps.]]> https://arxiv.org/abs/2312.11344 https://arxiv.org/abs/2312.11344 <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[media]]> <![CDATA[humanities]]> <![CDATA[Explosion]]> Mon, 18 Dec 2023 00:00:00 GMT <![CDATA[Prodigy-Segment for Pixel Segmentation]]> <![CDATA[Use Meta’s “Segment Anything” model in Prodigy to help you select the right pixels in images.]]> https://www.youtube.com/watch?v=W-wYXumFJRE https://www.youtube.com/watch?v=W-wYXumFJRE <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[computer-vision]]> <![CDATA[Vincent D. Warmerdam]]> Wed, 13 Dec 2023 00:00:00 GMT <![CDATA[🔌 prodigy-segment v0.1.0]]> <![CDATA[Select pixels in Prodigy via Meta’s “Segment Anything” model]]> https://github.com/explosion/prodigy-segment/releases/tag/v0.1.0 release:prodigy-segment_0.1.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[computer-vision]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Wed, 13 Dec 2023 00:00:00 GMT <![CDATA[✨ prodigy v1.14.12]]> <![CDATA[Audio UI improvements, resetQueue callback, prodigy-segment plugin]]> https://prodi.gy/docs/changelog#v1.14.12 release:prodigy_1.14.12 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Wed, 13 Dec 2023 00:00:00 GMT <![CDATA[Herding LLMs Towards Structured NLP]]> <![CDATA[This talk shows how we integrate LLMs into spaCy, leveraging its modular and customizable framework. 
This allows for cheaper, faster and more robust NLP - driven by cutting-edge LLMs, without compromising on having structured, validated data.]]> https://speakerdeck.com/rmitsch/herding-llms-towards-structured-nlp event:global-ai-2023 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[biomedical]]> <![CDATA[Raphael Mitsch]]> Tue, 12 Dec 2023 00:00:00 GMT <![CDATA[DeepZensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility]]> <![CDATA[A linguistic feature mapper that translates spaCy to wordpieces, which are token sub-units with associated vectors, is also accessible as an easy to configure module.]]> https://aclanthology.org/2023.nlposs-1.16/ https://aclanthology.org/2023.nlposs-1.16/ <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Fri, 01 Dec 2023 00:00:00 GMT <![CDATA[On the Creation of Classifiers to Support Assessment of E-Portfolios]]> <![CDATA[In this workflow, Prodigy selects and presents text examples that were classified with a very low degree of certainty. The annotator reviews the proposed classifications and corrects them, if necessary.]]> https://aisop.de/publikationen/ https://aisop.de/publikationen/ <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Fri, 01 Dec 2023 00:00:00 GMT <![CDATA[Prodigy in 2023: LLMs, task routers, QA and plugins]]> <![CDATA[We have made a ton of new updates in Prodigy this year with v1.12, v1.13, and v1.14 releases. So we decided to write a post about them.]]> https://explosion.ai/blog/prodigy-2023-updates blog:prodigy-2023-updates <![CDATA[blog]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[annotation]]> <![CDATA[Magdalena Anioł, Vincent D. Warmerdam, Kabir Khan, Ryan Wesslen]]> Wed, 29 Nov 2023 00:00:00 GMT <![CDATA[Neuradicon: operational representation learning of neuroimaging reports]]> <![CDATA[Labelled data for each task was produced using the Prodigy labelling tool. Each report was labelled in a paired-annotation manner. 
[...] We used the grammatical dependency parse produced by the spaCy parser as input and implemented the patterns using the spaCy dependency matcher.]]> https://arxiv.org/abs/2107.10021 https://arxiv.org/abs/2107.10021 <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[spacy]]> <![CDATA[biomedical]]> <![CDATA[Explosion]]> Mon, 27 Nov 2023 00:00:00 GMT <![CDATA[Who said what: using machine learning to correctly attribute quotes]]> <![CDATA[How the Guardian uses spaCy and Prodigy to train a custom coreference resolution model.]]> https://www.theguardian.com/info/2023/nov/21/who-said-what-using-machine-learning-to-correctly-attribute-quotes https://www.theguardian.com/info/2023/nov/21/who-said-what-using-machine-learning-to-correctly-attribute-quotes <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[media]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Tue, 21 Nov 2023 00:00:00 GMT <![CDATA[Launching the Explosion Merch Store]]> <![CDATA[Spread the love and support us and our open-source work with some of our unique, custom-designed swag. 
All orders come with free shipping and stickers!]]> https://explosion.ai/merch https://explosion.ai/merch <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Ines Montani, Helena Steckmeister]]> Mon, 20 Nov 2023 00:00:00 GMT <![CDATA[Impoliteness and morality as instruments of destructive informal social control in online harassment targeting Swedish journalists]]> <![CDATA[In the annotation tool Prodigy used for this process, the tweets directed towards journalists were displayed alongside the initial tweet that initiated the conversation thread and the subsequent reply from the journalist.]]> https://www.sciencedirect.com/science/article/pii/S0271530923000678 https://www.sciencedirect.com/science/article/pii/S0271530923000678 <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[media]]> <![CDATA[humanities]]> <![CDATA[Explosion]]> Thu, 16 Nov 2023 00:00:00 GMT <![CDATA[calamanCy: A Tagalog Natural Language Processing Toolkit]]> <![CDATA[We introduce calamanCy, an open-source toolkit for constructing NLP pipelines for Tagalog. It is built on top of spaCy, enabling easy experimentation and integration with other frameworks.]]> https://arxiv.org/abs/2311.07171 https://arxiv.org/abs/2311.07171 <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[Lj Miranda]]> Mon, 13 Nov 2023 00:00:00 GMT <![CDATA[Developing a Named Entity Recognition Dataset for Tagalog]]> <![CDATA[We used Prodigy as our annotation tool. 
We set up a web server on the Google Cloud Platform and routed the examples through Prodigy’s built-in task router.]]> https://arxiv.org/abs/2311.07161 https://arxiv.org/abs/2311.07161 <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[Lj Miranda]]> Mon, 13 Nov 2023 00:00:00 GMT <![CDATA[🔌 prodigy-whisper v0.1.0]]> <![CDATA[Audio transcription with OpenAI’s Whisper model in the loop]]> https://github.com/explosion/prodigy-whisper release:prodigy-whisper_0.1.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Sun, 12 Nov 2023 00:00:00 GMT <![CDATA[Introducing Prodigy-HF]]> <![CDATA[Last week, Explosion introduced Prodigy-HF, a new Prodigy plugin offering code recipes that directly integrate with the Hugging Face stack.]]> https://huggingface.co/blog/prodigy-hf https://huggingface.co/blog/prodigy-hf <![CDATA[universe]]> <![CDATA[prodigy]]> <![CDATA[Vincent D. Warmerdam]]> Tue, 07 Nov 2023 00:00:00 GMT <![CDATA[State-of-the-Art Transformer Pipelines in spaCy]]> <![CDATA[In this talk, we will show you how you can use transformer models (from pretrained models such as XLM-RoBERTa to large language models like Llama2) to create state-of-the-art annotation pipelines for text annotation tasks such as named entity recognition.]]> https://www.youtube.com/watch?v=clnhMaTq1ZA event:aigrunn-2023 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Daniël de Kok, Madeesh Kannan]]> Fri, 10 Nov 2023 00:00:00 GMT <![CDATA[Half hour of labeling power: Can we beat GPT?]]> <![CDATA[Large Language Models (LLMs) offer a lot of value for modern NLP and can typically achieve surprisingly good accuracy on predictive NLP tasks. But can we do even better than that? 
In this workshop we show how to use LLMs at development time to create high-quality datasets and train specific, smaller, private and more accurate models for your business problems.]]> https://speakerdeck.com/inesmontani/workshop-half-hour-of-labeling-power-can-we-beat-gpt event:pydata-nyc-2023 <![CDATA[talk]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[Ines Montani, Ryan Wesslen]]> Wed, 01 Nov 2023 00:00:00 GMT <![CDATA[GERNERMED++: Semantic annotation in German medical NLP through transfer-learning, translation and word alignment]]> <![CDATA[The training of our entity recognition model employs the entity recognition parser from the spaCy library which follows a transducer-based parsing approach with a BILOU scheme instead of a state-agnostic token tagging approach.]]> https://www.sciencedirect.com/science/article/pii/S1532046423002344 https://www.sciencedirect.com/science/article/pii/S1532046423002344 <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[biomedical]]> <![CDATA[Explosion]]> Wed, 01 Nov 2023 00:00:00 GMT <![CDATA[Prodigy-ANN for Image Retrieval via CLIP]]> <![CDATA[Dealing with a huge bucket of images that you want to annotate? The new image retrieval features in Prodigy-ANN (approximate nearest neighbors) might help!]]> https://www.youtube.com/watch?v=vhbyekSsG8o https://www.youtube.com/watch?v=vhbyekSsG8o <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[computer-vision]]> <![CDATA[Vincent D. Warmerdam]]> Mon, 30 Oct 2023 00:00:00 GMT <![CDATA[Explosion, NLP, Generative AI, Entrepreneurship]]> https://www.youtube.com/watch?v=XNFqFT-DZwo event:learning-from-ml-ines <![CDATA[interview]]> <![CDATA[Ines Montani]]> Thu, 26 Oct 2023 00:00:00 GMT <![CDATA[How many Labelled Examples do you need for a BERT-sized Model to Beat GPT-4 on Predictive Tasks?]]> <![CDATA[How does in-context learning compare to supervised approaches on predictive tasks? 
How many labelled examples do you need on different problems before a BERT-sized model can beat GPT-4 in accuracy? The answer might surprise you: models with fewer than 1b parameters are actually very good at classic predictive NLP, while in-context learning struggles on many problem shapes.]]> https://speakerdeck.com/honnibal/how-many-labelled-examples-do-you-need-for-a-bert-sized-model-to-beat-gpt-4-on-predictive-tasks event:gen-ai-summit-2023 <![CDATA[talk]]> <![CDATA[llms]]> <![CDATA[Matthew Honnibal]]> Wed, 25 Oct 2023 00:00:00 GMT <![CDATA[✨ prodigy v1.14.5]]> <![CDATA[Toggle for character vs. token highlighting, CSS and JS from local and remote paths]]> https://prodi.gy/docs/changelog#v1.14.5 release:prodigy_1.14.5 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Tue, 24 Oct 2023 00:00:00 GMT <![CDATA[Prodigy-PDF for PDF annotation and OCR]]> <![CDATA[Want to annotate PDF files? Our new Prodigy plugin can help with that! To explain how to use PDF segmentation and OCR, Vincent made a small demo video.]]> https://www.youtube.com/watch?v=rwyze49ne8I https://www.youtube.com/watch?v=rwyze49ne8I <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[computer-vision]]> <![CDATA[Vincent D. 
Warmerdam]]> Tue, 24 Oct 2023 00:00:00 GMT <![CDATA[Toward a Critical Toponymy Framework for Named Entity Recognition: A Case Study of Airbnb in New York City]]> <![CDATA[All annotation was performed using Prodigy following an initial training session where annotators collaboratively annotated a randomly chosen set of samples.]]> https://arxiv.org/abs/2310.15302 https://arxiv.org/abs/2310.15302 <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[humanities]]> <![CDATA[Explosion]]> Mon, 23 Oct 2023 00:00:00 GMT <![CDATA[🔌 prodigy-hf v0.1.0]]> <![CDATA[Train Hugging Face models with Prodigy annotations]]> https://github.com/explosion/prodigy-hf/releases/tag/v0.1.0 release:prodigy-hf_0.1.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Mon, 23 Oct 2023 00:00:00 GMT <![CDATA[Identifying Signs and Symptoms of Urinary Tract Infection from Emergency Department Clinical Notes Using Large Language Models]]> <![CDATA[For annotation we employed Prodigy, a scriptable annotation tool designed to maximize efficiency, enabling data scientists to perform the annotation tasks themselves and facilitating rapid iterative development in natural language processing (NLP) projects.]]> https://www.medrxiv.org/content/10.1101/2023.10.20.23297156v1 https://www.medrxiv.org/content/10.1101/2023.10.20.23297156v1 <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[biomedical]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Fri, 20 Oct 2023 00:00:00 GMT <![CDATA[DaCy v2.7.2]]> <![CDATA[State-of-the-Art Danish NLP pipelines for spaCy]]> https://centre-for-humanities-computing.github.io/DaCy/ https://centre-for-humanities-computing.github.io/DaCy/ <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Wed, 11 Oct 2023 00:00:00 GMT <![CDATA[Natural Language Processing and Python]]> https://www.pythonshow.com/p/18-natural-language-processing-and#details event:python-show-ines <![CDATA[interview]]> <![CDATA[Ines Montani]]> Wed, 11 Oct 2023 00:00:00 GMT <![CDATA[✨ prodigy 
v1.14.3]]> <![CDATA[Inter-annotator agreement for document-level and token-level annotations, new plugins]]> https://prodi.gy/docs/changelog#v1.14.3 release:prodigy_1.14.3 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Fri, 06 Oct 2023 00:00:00 GMT <![CDATA[🦙 spacy-llm v0.6.0]]> <![CDATA[PaLM, Azure OpenAI, Mistral & fixed OS model responses]]> https://github.com/explosion/spacy-llm/releases/tag/v0.6.0 release:spacy-llm_0.6.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Thu, 05 Oct 2023 00:00:00 GMT <![CDATA[🤖 curated-transformers v1.3.0]]> <![CDATA[Custom model repositories, NVTX Ranges, store config in models]]> https://github.com/explosion/curated-transformers/releases/tag/v1.3.0 release:curated-transformers_1.3.0 <![CDATA[release]]> <![CDATA[Explosion]]> Mon, 02 Oct 2023 00:00:00 GMT <![CDATA[💫 spacy v3.7.0]]> <![CDATA[Trained pipelines using Curated Transformers and support for Python 3.12]]> https://github.com/explosion/spaCy/releases/tag/v3.7.0 release:spacy_3.7.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Mon, 02 Oct 2023 00:00:00 GMT <![CDATA[🔌 prodigy-lunr v0.1.0]]> <![CDATA[Document search via LUNR to fetch relevant data subsets to label]]> https://github.com/explosion/prodigy-lunr/releases/tag/v0.1.0 release:prodigy-lunr_0.1.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[computer-vision]]> <![CDATA[Explosion]]> Thu, 05 Oct 2023 00:00:00 GMT <![CDATA[🔌 prodigy-ann v0.1.0]]> <![CDATA[Use ANN techniques to fetch relevant data subsets to label]]> https://github.com/explosion/prodigy-ann/releases/tag/v0.1.0 release:prodigy-ann_0.1.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Thu, 05 Oct 2023 00:00:00 GMT <![CDATA[🔌 prodigy-pdf v0.1.0]]> <![CDATA[Annotate and segment PDF files and perform OCR]]> https://github.com/explosion/prodigy-pdf/releases/tag/v0.1.0 release:prodigy-pdf_0.1.0 <![CDATA[release]]> 
<![CDATA[prodigy]]> <![CDATA[computer-vision]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Thu, 05 Oct 2023 00:00:00 GMT <![CDATA[scispacy v0.5.3]]> <![CDATA[A Python package containing spaCy models for processing biomedical, scientific or clinical text, developed by AI2.]]> https://allenai.github.io/scispacy/ https://allenai.github.io/scispacy/ <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[biomedical]]> <![CDATA[Explosion]]> Sat, 30 Sep 2023 00:00:00 GMT <![CDATA[✨ prodigy v1.14.1]]> <![CDATA[Custom event hooks for custom UI interactivity]]> https://prodi.gy/docs/changelog#v1.14.1 release:prodigy_1.14.1 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Fri, 29 Sep 2023 00:00:00 GMT <![CDATA[MP Interests Tracker: Utilising GenAI to uncover insights in the UK Register of Financial Interest]]> <![CDATA[Project from teams at The Times and BBC using spacy-llm to make complex financial interests data more accessible.]]> https://www.journalismai.info/blog/mp-interests-tracker-utilising-genai-for-uncovering-insights-in-the-uk-register-of-financial-interest https://www.journalismai.info/blog/mp-interests-tracker-utilising-genai-for-uncovering-insights-in-the-uk-register-of-financial-interest <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[media]]> <![CDATA[finance]]> <![CDATA[Explosion]]> Tue, 26 Sep 2023 00:00:00 GMT <![CDATA[Panel: Large Language Models]]> <![CDATA[with Ines, Alejandro Saucedo (Zalando, Institute for Ethical AI & ML), Alina Lehnhard (Cerence), Michael Gerz (Heidelberg University), Alexander CS Hendorf (Königsweg)]]> https://www.youtube.com/watch?v=I9f4n_sUgV8 event:pydata-bbq-panel-2023 <![CDATA[talk]]> <![CDATA[llms]]> <![CDATA[Ines Montani]]> Thu, 21 Sep 2023 00:00:00 GMT <![CDATA[Natural Intelligence is All You Need[tm]]]> https://www.youtube.com/watch?v=C9p7suS-NGk event:pydata-amsterdam-2023-keynote <![CDATA[talk]]> <![CDATA[Vincent D. Warmerdam]]> Thu, 14 Sep 2023 00:00:00 GMT
<![CDATA[🛸 spacy-transformers v1.3.1]]> <![CDATA[Support for newer versions of Transformers]]> https://github.com/explosion/spacy-transformers/releases/tag/v1.3.1 release:spacy-transformers_1.3.1 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Tue, 26 Sep 2023 00:00:00 GMT <![CDATA[🦙 spacy-llm v0.5.0]]> <![CDATA[Improved user API and novel Chain-of-Thought prompting for more accurate NER]]> https://github.com/explosion/spacy-llm/releases/tag/v0.5.0 release:spacy-llm_0.5.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Fri, 08 Sep 2023 00:00:00 GMT <![CDATA[✨ prodigy v1.13.2]]> <![CDATA[New LLM recipes for terms generation and prompt engineering]]> https://prodi.gy/docs/changelog#v1.13.2 release:prodigy_1.13.2 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Thu, 07 Sep 2023 00:00:00 GMT <![CDATA[Into the Single Cell Multiverse: an End-to-End Dataset for Procedural Knowledge Extraction in Biomedical Texts]]> <![CDATA[Tissue, cell type, tool, and method were annotated using the Prodigy software tool developed by Explosion AI for easy tracking of token-level tags.]]> https://arxiv.org/abs/2309.01812 https://arxiv.org/abs/2309.01812 <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[biomedical]]> <![CDATA[Explosion]]> Mon, 04 Sep 2023 00:00:00 GMT <![CDATA[Models as annotators in Prodigy]]> <![CDATA[How to use models and LLMs as annotators to find disagreements and prioritize examples to annotate first.]]> https://www.youtube.com/watch?v=SuFAXOgw35U https://www.youtube.com/watch?v=SuFAXOgw35U <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[llms]]> <![CDATA[Vincent D. Warmerdam]]> Tue, 29 Aug 2023 00:00:00 GMT
<![CDATA[✨ prodigy v1.13.1]]> <![CDATA[Use models and LLMs as annotators to find disagreements]]> https://prodi.gy/docs/changelog#v1.13.1 release:prodigy_1.13.1 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Wed, 23 Aug 2023 00:00:00 GMT <![CDATA[✨ prodigy v1.13.0]]> <![CDATA[LLM support for NER, text classification and span categorization]]> https://prodi.gy/docs/changelog#v1.13.0 release:prodigy_1.13.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Tue, 15 Aug 2023 00:00:00 GMT <![CDATA[🦦 weasel v0.3.0]]> <![CDATA[Updates for requirements checks]]> https://github.com/explosion/weasel/releases/v0.3.0 release:weasel_0.3.0 <![CDATA[release]]> <![CDATA[Explosion]]> Mon, 14 Aug 2023 00:00:00 GMT <![CDATA[🔮 thinc v8.2.0]]> <![CDATA[Updates for automatic imports]]> https://github.com/explosion/thinc/releases/v8.2.0 release:thinc_8.2.0 <![CDATA[release]]> <![CDATA[thinc]]> <![CDATA[Explosion]]> Fri, 11 Aug 2023 00:00:00 GMT <![CDATA[How to Host Your Own API of Open Language Models For Free]]> <![CDATA[Powered by Explosion’s curated-transformers, FastAPI and ngrok.]]> https://levelup.gitconnected.com/how-to-host-your-own-api-of-open-language-models-for-free-92cdaa6e8b64 https://levelup.gitconnected.com/how-to-host-your-own-api-of-open-language-models-for-free-92cdaa6e8b64 <![CDATA[universe]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Wed, 09 Aug 2023 00:00:00 GMT <![CDATA[🦦 weasel v0.2.0]]> <![CDATA[Support for Pydantic v2 and cloudpathlib]]> https://github.com/explosion/weasel/releases/v0.2.0 release:weasel_0.2.0 <![CDATA[release]]> <![CDATA[Explosion]]> Fri, 04 Aug 2023 00:00:00 GMT <![CDATA[🤖 curated-transformers v1.0.0]]> <![CDATA[Lightweight, composable PyTorch transformers]]> https://github.com/explosion/curated-transformers/releases/tag/v1.0.0 release:curated-transformers_1.0.0 <![CDATA[release]]>
<![CDATA[Explosion]]> Thu, 03 Aug 2023 00:00:00 GMT <![CDATA[Large Language Models: From Prototype to Production]]> <![CDATA[Large Language Models (LLMs) have shown some impressive capabilities and their impact is the topic of the moment. In this talk, Ines presents visions for NLP in the age of LLMs and a pragmatic, practical approach for how to use Large Language Models to ship more successful NLP projects from prototype to production today.]]> https://speakerdeck.com/inesmontani/large-language-models-from-prototype-to-production-europython-keynote event:europython-2023-keynote <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[Ines Montani]]> Wed, 19 Jul 2023 00:00:00 GMT <![CDATA[Task Routers in Prodigy]]> <![CDATA[How to use the new task routers to customize how examples are assigned in multi-annotator workflows.]]> https://www.youtube.com/watch?v=vyOtq-UXP-E https://www.youtube.com/watch?v=vyOtq-UXP-E <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Vincent D. Warmerdam]]> Tue, 18 Jul 2023 00:00:00 GMT
<![CDATA[ACL LAW Workshop Poster]]> https://github.com/explosion/assets/blob/main/Prodigy/ACL%20LAW%20Workshop%202023%20Poster.pdf https://github.com/explosion/assets/blob/main/Prodigy/ACL%20LAW%20Workshop%202023%20Poster.pdf <![CDATA[blog]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Victoria Slocum, Ryan Wesslen]]> Thu, 13 Jul 2023 00:00:00 GMT <![CDATA[Introducing spaCy v3.6]]> <![CDATA[spaCy v3.6 introduces the span finder component and trained pipelines for Slovenian.]]> https://explosion.ai/blog/spacy-v3-6 blog:spacy-v3-6 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal, Ines Montani, Sofie Van Landeghem, Adriane Boyd, Raphael Mitsch, Ákos Kádár, Daniël de Kok, Madeesh Kannan, Victoria Slocum, Basile Dura, Vinit Ravishankar, Helena Steckmeister]]> Fri, 07 Jul 2023 00:00:00 GMT <![CDATA[🦙 spacy-llm v0.4.0]]> <![CDATA[Falcon, sentiment analysis, summarization, backend refactoring]]> https://github.com/explosion/spacy-llm/releases/tag/v0.4.0 release:spacy-llm_0.4.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Thu, 06 Jul 2023 00:00:00 GMT <![CDATA[Prodigy v1.12: OpenAI integration, prompt engineering, task routers, deployment docs and more]]> https://www.youtube.com/watch?v=-JiwLH9RG1E https://www.youtube.com/watch?v=-JiwLH9RG1E <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[llms]]> <![CDATA[Vincent D. Warmerdam]]> Wed, 05 Jul 2023 00:00:00 GMT
<![CDATA[✨ prodigy v1.12.0]]> <![CDATA[LLM-assisted workflows for annotation and prompt engineering, task routing for multi-annotator setups]]> https://prodi.gy/docs/changelog#v1.12.0 release:prodigy_1.12.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Wed, 05 Jul 2023 00:00:00 GMT <![CDATA[🍬 confection v0.1.0]]> <![CDATA[Improved JSON parsing, updated utils and warnings]]> https://github.com/explosion/confection/releases/v0.1.0 release:confection_0.1.0 <![CDATA[release]]> <![CDATA[Explosion]]> Thu, 29 Jun 2023 00:00:00 GMT <![CDATA[Concepts and measures of bureaucratic constraints in European Union laws from hand-coding to machine-learning]]> <![CDATA[The models “learn” the relations between the text tokens and the entity categories from two randomly selected samples of sentences that are extracted from a pre-processed corpus and have been manually annotated using the Python-implemented platform “Prodigy”.]]> https://onlinelibrary.wiley.com/doi/full/10.1111/rego.12543 https://onlinelibrary.wiley.com/doi/full/10.1111/rego.12543 <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[legal]]> <![CDATA[Explosion]]> Wed, 28 Jun 2023 00:00:00 GMT <![CDATA[spaCy: a customizable NLP toolkit designed for developers]]> https://speakerdeck.com/sofievl/2023-06-15-odsc event:odsc-2023 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Sofie Van Landeghem]]> Thu, 15 Jun 2023 00:00:00 GMT <![CDATA[🦙 spacy-llm v0.3.0]]> <![CDATA[Cohere, Anthropic, OpenLLaMa, StableLM, logging, streamlit demo, lemmatization task]]> https://github.com/explosion/spacy-llm/releases/tag/v0.3.0 release:spacy-llm_0.3.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Wed, 14 Jun 2023 00:00:00 GMT <![CDATA[🦦 weasel v0.1.0]]> <![CDATA[A small and easy workflow system]]> https://github.com/explosion/weasel/releases/v0.1.0 release:weasel_0.1.0 <![CDATA[release]]>
<![CDATA[Explosion]]> Wed, 14 Jun 2023 00:00:00 GMT <![CDATA[Inter-rater agreement for the annotation of neurologic signs and symptoms in electronic health records]]> <![CDATA[Prodigy was used to annotate neurologic concepts in the EHR physician notes.]]> https://www.frontiersin.org/journals/digital-health/articles/10.3389/fdgth.2023.1075771/full https://www.frontiersin.org/journals/digital-health/articles/10.3389/fdgth.2023.1075771/full <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[biomedical]]> <![CDATA[Explosion]]> Tue, 13 Jun 2023 00:00:00 GMT <![CDATA[How Good is the Model in Model-in-the-loop Event Coreference Resolution Annotation?]]> <![CDATA[Figure 6 illustrates the interface design of the annotation methodology on the popular model-in-the-loop annotation tool - Prodigy. We use this tool for the simplicity it offers in plugging in the various ranking methods we explained. ]]> https://arxiv.org/abs/2306.05434 https://arxiv.org/abs/2306.05434 <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[spacy]]> <![CDATA[annotation]]> <![CDATA[media]]> <![CDATA[humanities]]> <![CDATA[Rehan Ahmed]]> Tue, 06 Jun 2023 00:00:00 GMT <![CDATA[Large Language Models: From Prototype to Production]]> https://speakerdeck.com/inesmontani/large-language-models-from-prototype-to-production event:pydata-london-2023 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[Ines Montani]]> Sat, 03 Jun 2023 00:00:00 GMT <![CDATA[SpanCat with spaCy and Prodigy on real data]]> <![CDATA[YouTube series by WJB Mattingly showing an end-to-end project, from cultivating and annotating data to training, testing and visualizing a model.]]> https://www.youtube.com/watch?v=6S52SUBFZxc&list=PL2VXyKi-KpYtKSdydjcsI3L8dUj4Ck3iP https://www.youtube.com/watch?v=6S52SUBFZxc&list=PL2VXyKi-KpYtKSdydjcsI3L8dUj4Ck3iP <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[humanities]]> <![CDATA[Explosion]]> Fri, 02 Jun 2023 00:00:00 GMT <![CDATA[🦙 spacy-llm v0.2.0]]> 
<![CDATA[REL and spancat tasks, reading prompt templates from file]]> https://github.com/explosion/spacy-llm/releases/tag/v0.2.0 release:spacy-llm_0.2.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Tue, 30 May 2023 00:00:00 GMT <![CDATA[Large Disagreement Modelling]]> <![CDATA[“In this blogpost I’d like to talk about large language models. There’s a bunch of hype, sure, but there’s also an opportunity to revisit one of my favourite machine learning techniques: disagreement.”]]> https://koaning.io/posts/large-disagreement-models/ https://koaning.io/posts/large-disagreement-models/ <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[Vincent D. Warmerdam]]> Fri, 26 May 2023 00:00:00 GMT <![CDATA[Against LLM maximalism]]> <![CDATA[LLMs are not a direct solution to most of the NLP use-cases companies have been working on. They are extremely useful, but if you want to deliver reliable software you can improve over time, you can't just write a prompt and call it a day. 
Once you're past prototyping and want to deliver the best system you can, supervised learning will often give you better efficiency, accuracy and reliability.]]> https://explosion.ai/blog/against-llm-maximalism blog:against-llm-maximalism <![CDATA[blog]]> <![CDATA[llms]]> <![CDATA[strategy]]> <![CDATA[Matthew Honnibal]]> Thu, 18 May 2023 00:00:00 GMT <![CDATA[🦙 spacy-llm v0.1.0]]> <![CDATA[Integrating LLMs into structured NLP pipelines]]> https://github.com/explosion/spacy-llm/releases/tag/v0.1.0 release:spacy-llm_0.1.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[llms]]> <![CDATA[Explosion]]> Thu, 11 May 2023 00:00:00 GMT <![CDATA[Efficient Information Extraction From Text With spaCy]]> <![CDATA[This webinar takes you through building a spaCy project that uses a named entity recognition (NER) model to extract entities of interest from restaurant reviews, like prices, opening hours and ratings.]]> https://www.youtube.com/watch?v=1S8icpu9dX0 event:jetbrains-2023 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[Victoria Slocum]]> Thu, 11 May 2023 00:00:00 GMT <![CDATA[spaCy Plugin for VSCode]]> <![CDATA[The spaCy VSCode Extension provides additional tooling and features for working with spaCy’s config files. Version 1.0.0 includes hover descriptions for registry functions, variables, and section names within the config as an installable extension.]]> https://marketplace.visualstudio.com/items?itemName=Explosion.spacy-extension https://marketplace.visualstudio.com/items?itemName=Explosion.spacy-extension <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Edward Schmuhl, Victoria Slocum]]> Wed, 03 May 2023 00:00:00 GMT <![CDATA[Implementing a custom trainable component for relation extraction]]> <![CDATA[Relation extraction refers to the process of predicting and labeling semantic relationships between named entities. In this blog post, we'll go over the process of building a custom relation extraction component using spaCy and Thinc. 
We'll also add a Hugging Face transformer to improve performance at the end of the post. You'll see how you can utilize Thinc's flexible and customizable system to build an NLP pipeline for biomedical relation extraction.]]> https://explosion.ai/blog/relation-extraction blog:relation-extraction <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[thinc]]> <![CDATA[biomedical]]> <![CDATA[Sofie Van Landeghem, Victoria Slocum]]> Fri, 28 Apr 2023 00:00:00 GMT <![CDATA[You are what you read: Building a personal internet front-page with spaCy and Prodigy]]> https://speakerdeck.com/victorialslocum/pydata-berlin-2023 event:pycon-de-2023-victoria <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[Victoria Slocum]]> Tue, 18 Apr 2023 00:00:00 GMT <![CDATA[Incorporating LLMs into practical NLP workflows]]> https://speakerdeck.com/inesmontani/incorporating-llms-into-practical-nlp-workflows event:pycon-de-2023-ines <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[llms]]> <![CDATA[Ines Montani]]> Mon, 17 Apr 2023 00:00:00 GMT <![CDATA[Predicting relations between SOAP note sections: The value of incorporating a clinical information model]]> <![CDATA[To support human annotation, we first annotate 100 Assessment and Plan subsections manually using Prodigy, and then use spacy-transformers to fine-tune a general domain RoBERTa-base model pretrained on OntoNotes 5 for both the Assessment and Plan section NER tagging.]]> https://www.sciencedirect.com/science/article/abs/pii/S1532046423000813?casa_token=xwJ-nM7yrPMAAAAA:lQmA8sCmWcHhxq9-0ducDxtT0lmsHVT185-7PjRAPTp-rXkbx5cx05KnzvJodXubWBl3Jhl5VgA https://www.sciencedirect.com/science/article/abs/pii/S1532046423000813?casa_token=xwJ-nM7yrPMAAAAA:lQmA8sCmWcHhxq9-0ducDxtT0lmsHVT185-7PjRAPTp-rXkbx5cx05KnzvJodXubWBl3Jhl5VgA <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[spacy]]> <![CDATA[biomedical]]> <![CDATA[Explosion]]> Fri, 14 Apr 2023 00:00:00 GMT <![CDATA[Intro to NLP with spaCy for Digital Humanities]]> 
https://speakerdeck.com/victorialslocum/princeton-workshop-presentation event:princeton-2023 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[humanities]]> <![CDATA[Victoria Slocum, Ákos Kádár]]> Tue, 04 Apr 2023 00:00:00 GMT <![CDATA[The Tale of Bloom Embeddings and Unseen Entities]]> <![CDATA[The default Bloom embedding layer in spaCy is unconventional, but very powerful and efficient. We wrote about it before and showed the advantages it provides in terms of memory efficiency for our floret embeddings. Now we have released the first technical report by Explosion, where we explain Bloom embeddings in more detail and rigorously compare them to traditional embeddings. In this post we'll highlight some of our results with a special focus on unseen entities.]]> https://explosion.ai/blog/technical-report blog:technical-report <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Ákos Kádár, Lj Miranda, Victoria Slocum, Sofie Van Landeghem]]> Mon, 03 Apr 2023 00:00:00 GMT <![CDATA[Creating Custom Event Data Without Dictionaries: A Bag-of-Tricks]]> <![CDATA[While in the past the process of generating training case has been quite time consuming and tedious, newer approaches such as those incorporated into the web-based Prodigy annotation system allow this to be done much more quickly.]]> https://arxiv.org/abs/2304.01331 https://arxiv.org/abs/2304.01331 <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[spacy]]> <![CDATA[media]]> <![CDATA[humanities]]> <![CDATA[Explosion]]> Mon, 03 Apr 2023 00:00:00 GMT <![CDATA[textaCy v0.13.0]]> <![CDATA[Utility library for NLP tasks before and after spaCy, including preprocessing, normalization and additional information extraction features.]]> https://textacy.readthedocs.io/en/latest/ https://textacy.readthedocs.io/en/latest/ <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Sun, 02 Apr 2023 00:00:00 GMT
<![CDATA[Rulers, NER, and data iteration]]> <![CDATA[About the power of Rules + ML and the importance of iteration on your pipeline and your data.]]> https://blog.victoriaslocum.com/post/spanruler-ner-data https://blog.victoriaslocum.com/post/spanruler-ner-data <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Victoria Slocum]]> Thu, 30 Mar 2023 00:00:00 GMT <![CDATA[Slovak Dataset for Multilingual Question Answering]]> <![CDATA[We used the Prodigy annotation tool to annotate the questions and answers. One annotation task corresponds to one web application deployment and different configurations.]]> https://ieeexplore.ieee.org/document/10082887 https://ieeexplore.ieee.org/document/10082887 <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[Explosion]]> Mon, 27 Mar 2023 00:00:00 GMT <![CDATA[Modular Journalism: The new way to find stories?]]> https://soundcloud.com/interhacktives/modularjournalism-the-new-way-to-find-stories event:interhacktives-ines <![CDATA[interview]]> <![CDATA[media]]> <![CDATA[Ines Montani]]> Wed, 15 Mar 2023 00:00:00 GMT <![CDATA[The Nesta Skills Extractor Library]]> <![CDATA[A new library for extracting skills from job adverts and mapping them to a taxonomy of your choice, built on top of spaCy.]]> https://www.escoe.ac.uk/the-skills-extractor-library/ https://www.escoe.ac.uk/the-skills-extractor-library/ <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[India Kerle]]> Mon, 13 Mar 2023 00:00:00 GMT <![CDATA[Fiscal data in text: Information extraction from audit reports using Natural Language Processing]]> <![CDATA[I relied on the text annotation software Prodigy in Python that offers a friendly user interface where the reviewer can read the text and assign a label to each paragraph.]]> https://www.cambridge.org/core/journals/data-and-policy/article/fiscal-data-in-text-information-extraction-from-audit-reports-using-natural-language-processing/F4CAA159BD8C5C71873D85FCF1E4AA96
https://www.cambridge.org/core/journals/data-and-policy/article/fiscal-data-in-text-information-extraction-from-audit-reports-using-natural-language-processing/F4CAA159BD8C5C71873D85FCF1E4AA96 <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[finance]]> <![CDATA[Explosion]]> Tue, 28 Feb 2023 00:00:00 GMT <![CDATA[NLP: From Prototype to Production]]> https://www.youtube.com/watch?v=QstuufSBvy4 event:outerbounds-2023 <![CDATA[interview]]> <![CDATA[Ines Montani]]> Fri, 24 Feb 2023 00:00:00 GMT <![CDATA[Deploying a Prodigy cloud service for Posh’s financial chatbots]]> <![CDATA[A Prodigy case study of Posh AI's production-ready annotation platform and custom chatbot annotation tasks for banking customers.]]> https://explosion.ai/blog/posh-prodigy-financial-chatbots blog:posh-prodigy-financial-chatbots <![CDATA[blog]]> <![CDATA[prodigy]]> <![CDATA[case_study]]> <![CDATA[finance]]> <![CDATA[Ryan Wesslen, Victoria Slocum]]> Thu, 16 Feb 2023 00:00:00 GMT <![CDATA[🕊️ radicli v0.0.3]]> <![CDATA[Radically lightweight command-line interfaces]]> https://github.com/explosion/radicli release:radicli_0.0.3 <![CDATA[release]]> <![CDATA[Explosion]]> Thu, 09 Feb 2023 00:00:00 GMT <![CDATA[Towards a Tagalog NLP pipeline]]> <![CDATA[In this blog post, Lj talks about how he built an NER pipeline for Tagalog, the gold-standard dataset, benchmarking results, and his hopes for the future of Tagalog NLP.]]> https://ljvmiranda921.github.io/notebook/2023/02/04/tagalog-pipeline/ https://ljvmiranda921.github.io/notebook/2023/02/04/tagalog-pipeline/ <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Lj Miranda]]> Sat, 04 Feb 2023 00:00:00 GMT <![CDATA[AI/ML for the rest of us]]> https://www.linkedin.com/pulse/ai-rest-us-github/ https://www.linkedin.com/pulse/ai-rest-us-github/ <![CDATA[interview]]> <![CDATA[Ines Montani]]> Fri, 03 Feb 2023 00:00:00 GMT <![CDATA[Introducing spaCy v3.5]]> <![CDATA[spaCy v3.5 introduces new CLI commands, fuzzy matching, improvements for entity linking and more.]]> 
https://explosion.ai/blog/spacy-v3-5 blog:spacy-v3-5 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal, Ines Montani, Sofie Van Landeghem, Adriane Boyd, Paul O’Leary McCann, Edward Schmuhl, Raphael Mitsch, Daniël de Kok, Madeesh Kannan, Richard Hudson, Lj Miranda, Peter Baumgartner, Victoria Slocum, Helena Steckmeister]]> Mon, 30 Jan 2023 00:00:00 GMT <![CDATA[Calmcode, Explosion, Data Science]]> https://www.youtube.com/watch?v=yvgxRzqx1Jg event:learning-from-ml-vincent <![CDATA[interview]]> <![CDATA[Vincent D. Warmerdam]]> Tue, 31 Jan 2023 00:00:00 GMT <![CDATA[Explosion in 2022: Our Year in Review]]> <![CDATA[It's been another exciting year at Explosion! We've developed a new end-to-end neural coref component for spaCy, improved the speed of our CNN pipelines up to 60%, and published new pre-trained pipelines for Finnish, Korean, Swedish and Croatian. We've also released several updates to Prodigy and introduced new recipes to kickstart annotation with zero- or few-shot learning.]]> https://explosion.ai/blog/year-in-review-2022 blog:year-in-review-2022 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[Matthew Honnibal, Ines Montani, Walter Henry, Victoria Slocum, Philip Vollet, Daniël de Kok, Madeesh Kannan, Sofie Van Landeghem, Helena Steckmeister]]> Mon, 30 Jan 2023 00:00:00 GMT <![CDATA[Robust solutions with Explosion’s applied NLP philosophy]]> https://drive.google.com/file/d/1Mo5qEkTXxkg3DZ0tZZwBBFZ8t1ddymGB/view?usp=sharing event:unc-charlotte-2023 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[strategy]]> <![CDATA[Ryan Wesslen, Damian Romero]]> Mon, 23 Jan 2023 00:00:00 GMT <![CDATA[Training spaCy NER Models with Prodigy]]> <![CDATA[This handy flowchart contains our most common tips, tricks, and best practices for training and updating spaCy named entity recognition models with Prodigy.]]> https://github.com/explosion/assets/blob/main/Prodigy/Prodigy_NER_flowchart_v2_0_0_light.pdf 
https://github.com/explosion/assets/blob/main/Prodigy/Prodigy_NER_flowchart_v2_0_0_light.pdf <![CDATA[blog]]> <![CDATA[prodigy]]> <![CDATA[spacy]]> <![CDATA[annotation]]> <![CDATA[Victoria Slocum, Damian Romero, Helena Steckmeister]]> Sat, 14 Jan 2023 00:00:00 GMT <![CDATA[🛸 spacy-transformers v1.2.0]]> <![CDATA[Better alignment for fast tokenizers]]> https://github.com/explosion/spacy-transformers/releases/tag/v1.2.0 release:spacy-transformers_1.2.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Sat, 14 Jan 2023 00:00:00 GMT <![CDATA[Reflections on a year of spaCy consulting at Explosion]]> <![CDATA[In this post, Peter shares some lessons learned from chatting with practitioners about their NLP challenges, developing production-ready NLP pipelines for clients, and working with an open-source development team.]]> https://www.linkedin.com/pulse/reflections-year-spacy-consulting-explosion-peter-baumgartner/ https://www.linkedin.com/pulse/reflections-year-spacy-consulting-explosion-peter-baumgartner/ <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[consulting]]> <![CDATA[Peter Baumgartner]]> Tue, 10 Jan 2023 00:00:00 GMT <![CDATA[Extracting Structured Information from Greek Legislation Data]]> <![CDATA[Worth noting is the existence of an application, called Prodigy, which takes advantage of an active learning framework and provides users with an interactive interface for data annotation.]]> https://repository.ihu.edu.gr/xmlui/handle/11544/30135 https://repository.ihu.edu.gr/xmlui/handle/11544/30135 <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[legal]]> <![CDATA[Explosion]]> Thu, 05 Jan 2023 00:00:00 GMT <![CDATA[WW2 spaCy v0.0.9]]> <![CDATA[spaCy pipeline for processing primary and secondary sources for World War 2 texts.]]> https://github.com/wjbmattingly/ww2-spacy https://github.com/wjbmattingly/ww2-spacy <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[humanities]]> <![CDATA[Explosion]]> Thu, 05 Jan 2023 00:00:00 GMT
<![CDATA[Setting your ML project up for success]]> <![CDATA[“What can you do to maximize probability of success for your Machine Learning solution? Throughout my 15 years as data scientist in academia, big pharma and through consulting, one common theme has emerged: the most reliable predictor of success for any NLP or ML-based solution is whether or not you involve the data science team early on.”]]> https://www.linkedin.com/pulse/setting-your-ml-project-up-success-sofie-van-landeghem/ consulting-sofie <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[consulting]]> <![CDATA[strategy]]> <![CDATA[Sofie Van Landeghem]]> Mon, 19 Dec 2022 00:00:00 GMT <![CDATA[Multi hash embeddings in spaCy]]> <![CDATA[In this technical report we lay out a bit of history and introduce the embedding methods in spaCy in detail. Second, we critically evaluate the hash embedding architecture with multi-embeddings on Named Entity Recognition datasets from a variety of domains and languages. The experiments validate most key design choices behind spaCy’s embedders, but we also uncover a few surprising results.]]> https://arxiv.org/abs/2212.09255 https://arxiv.org/abs/2212.09255 <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[Lj Miranda, Ákos Kádár, Adriane Boyd, Sofie Van Landeghem, Matthew Honnibal]]> Mon, 19 Dec 2022 00:00:00 GMT <![CDATA[Data is the new coffee]]> https://www.youtube.com/watch?v=GrQcVU-eapc event:normconf-2022-peter <![CDATA[talk]]> <![CDATA[Peter Baumgartner]]> Thu, 15 Dec 2022 00:00:00 GMT <![CDATA[Group-by statements that save the day]]> https://www.youtube.com/watch?v=S7vhi6RjBZA event:normconf-2022-vincent <![CDATA[talk]]> <![CDATA[Vincent D. Warmerdam]]> Thu, 15 Dec 2022 00:00:00 GMT
<![CDATA[How the Guardian uses AI to analyse articles]]> https://www.journalismaifestival.com event:journalismai-2022 <![CDATA[talk]]> <![CDATA[media]]> <![CDATA[Ines Montani]]> Thu, 08 Dec 2022 00:00:00 GMT <![CDATA[Custom Interfaces with blocks]]> <![CDATA[With Prodigy's blocks feature, you can create custom annotation layouts by combining the annotation widgets that Prodigy provides. This video explains how to use this feature by building a custom interface for manually annotating and transcribing audio.]]> https://www.youtube.com/watch?v=lZdM2HScvVo https://www.youtube.com/watch?v=lZdM2HScvVo <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Vincent D. Warmerdam]]> Wed, 07 Dec 2022 00:00:00 GMT <![CDATA[The triangulation of ethical leader signals using qualitative, experimental, and data science methods]]> <![CDATA[This additional text was labeled by the same coding team using Prodigy, [...] a flexible user interface tool built on top of spaCy, a leading open source library in python for natural language processing. We created a spaCy end‐to‐end project workflow including package versioning, data pre‐processing, data ingestion into a database, annotation sessions using Prodigy’s user interface, model training, model evaluation, python packaging, and visual app for testing the model.]]> https://www.sciencedirect.com/science/article/abs/pii/S1048984322000613 https://www.sciencedirect.com/science/article/abs/pii/S1048984322000613 <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[spacy]]> <![CDATA[finance]]> <![CDATA[humanities]]> <![CDATA[Ryan Wesslen]]> Mon, 05 Dec 2022 00:00:00 GMT <![CDATA[Is it possible to have entities within entities within entities?]]> <![CDATA[Named entity recognition models might not be able to handle a wide variety of spans, but Spancat certainly can!
Dive into named entity recognition, its limitations, and how we’ve solved them with a solution-focused talk and practical applications.]]> https://speakerdeck.com/victorialslocum/pydata-global-2022-spancat event:pydata-global-2022 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[Victoria Slocum]]> Sat, 03 Dec 2022 00:00:00 GMT <![CDATA[Fast transformer inference with Metal Performance Shaders]]> <![CDATA[We are happy to introduce support for Metal Performance Shaders in Thinc PyTorch layers. This makes it possible to run spaCy transformer-based pipelines on GPU on Apple Silicon Macs and improves inference speed up to 4.7 times. ]]> https://explosion.ai/blog/metal-performance-shaders blog:metal-performance-shaders <![CDATA[blog]]> <![CDATA[thinc]]> <![CDATA[Daniël de Kok, Madeesh Kannan]]> Thu, 24 Nov 2022 00:00:00 GMT <![CDATA[Tools to Improve Training Data]]> https://www.youtube.com/watch?v=KRQJDLyc1uM event:cohere-2022-vincent <![CDATA[interview]]> <![CDATA[Vincent D. Warmerdam]]> Wed, 23 Nov 2022 00:00:00 GMT <![CDATA[Coreference Resolution in spaCy]]> <![CDATA[In everyday conversation, we use pronouns or other expressions to refer to entities in many different ways, but we effortlessly understand these references. In NLP this is a challenging problem known as Coreference Resolution. 
In this video, we’ll show how to train spaCy’s new component for Coreference Resolution and how to apply the pipeline to resolve references in a text.]]> https://www.youtube.com/watch?v=fio3BejnRsM https://www.youtube.com/watch?v=fio3BejnRsM <![CDATA[video]]> <![CDATA[spacy]]> <![CDATA[Edward Schmuhl]]> Wed, 02 Nov 2022 00:00:00 GMT <![CDATA[Finetuning and Bulk Labelling Images with Prodigy ]]> <![CDATA[In this video, we’ll show how you might be able to improve the annotation experience by using bulk labelling for image classification.]]> https://www.youtube.com/watch?v=DmH3JmX3w2I https://www.youtube.com/watch?v=DmH3JmX3w2I <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[computer-vision]]> <![CDATA[Vincent D. Warmerdam]]> Thu, 27 Oct 2022 00:00:00 GMT <![CDATA[medspacy v1.0]]> <![CDATA[A library of tools for performing clinical NLP and text processing tasks with spaCy.]]> https://github.com/medspacy/medspacy https://github.com/medspacy/medspacy <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[biomedical]]> <![CDATA[Explosion]]> Tue, 18 Oct 2022 00:00:00 GMT <![CDATA[spaCy Cheat Sheet]]> <![CDATA[Everything you need to know about spaCy as a handy two-page PDF.]]> https://github.com/explosion/assets/blob/main/spaCy/spaCy-cheat-sheet.pdf https://github.com/explosion/assets/blob/main/spaCy/spaCy-cheat-sheet.pdf <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Victoria Slocum, Ines Montani]]> Tue, 18 Oct 2022 00:00:00 GMT <![CDATA[How the Guardian approaches quote extraction with NLP]]> <![CDATA[A case study of the Guardian's spaCy-Prodigy workflow to modularize quote extraction for content creation. 
This study includes iterative annotation guidelines and custom interface functionality.]]> https://explosion.ai/blog/guardian blog:guardian <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[case_study]]> <![CDATA[strategy]]> <![CDATA[annotation]]> <![CDATA[media]]> <![CDATA[Ryan Wesslen, Victoria Slocum, Chung-Fan Tsai]]> Thu, 13 Oct 2022 00:00:00 GMT <![CDATA[Finding Video Games with Sense2Vec]]> <![CDATA[In this video, we’ll show how you can improve the annotation experience by leveraging sense2vec to pre-fill named entities.]]> https://www.youtube.com/watch?v=EoYHbUHr0fM https://www.youtube.com/watch?v=EoYHbUHr0fM <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Vincent D. Warmerdam]]> Wed, 11 Oct 2023 00:00:00 GMT <![CDATA[End-to-end Neural Coreference Resolution in spaCy]]> <![CDATA[Coreference resolution is the problem of resolving entities in texts to references such as pronouns. Even if you've never heard of it, it's something we all do constantly every day, and is a key to understanding natural language. We recently added an experimental implementation of an end-to-end neural coreference component to spaCy. This post explains the architecture of our model in detail. ]]> https://explosion.ai/blog/coref blog:coref <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Ákos Kádár, Paul O’Leary McCann, Richard Hudson, Edward Schmuhl, Sofie Van Landeghem, Adriane Boyd, Madeesh Kannan, Victoria Slocum]]> Thu, 06 Oct 2022 00:00:00 GMT <![CDATA[🧪 spacy-experimental v0.6.0]]> <![CDATA[Added Coref components and models]]> https://github.com/explosion/spacy-experimental/releases/tag/v0.6.0 release:spacy-experimental_0.6.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Wed, 28 Sep 2022 00:00:00 GMT <![CDATA[spaCy behind the scenes: library patterns & design concepts explained]]> <![CDATA[Developer productivity has been central to our design of spaCy, both in smaller decisions and some of the bigger architectural questions. 
We believe in embracing the complexities of machine learning, not hiding them away under leaky abstractions, while also maintaining the developer experience. Read on to learn some of the design patterns within the library, how we've implemented them, and most importantly, why.]]> https://explosion.ai/blog/spacy-design-concepts blog:spacy-design-concepts <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[strategy]]> <![CDATA[Ines Montani, Victoria Slocum]]> Wed, 31 Aug 2022 00:00:00 GMT <![CDATA[floret: lightweight, robust word vectors]]> <![CDATA[An exploration of floret vectors: lightweight vectors for noisy data, novel words, rich morphology and more.]]> https://explosion.ai/blog/floret-vectors blog:floret-vectors <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Adriane Boyd, Vincent D. Warmerdam]]> Tue, 23 Aug 2022 00:00:00 GMT <![CDATA[Speech acts in the Dutch COVID-19 Press Conferences]]> <![CDATA[We used the annotation tool Prodigy. Prodigy provides a simple interface in which the annotator sees a sentence and selects the applicable speech acts. The use of Prodigy considerably sped up the annotation process, allowing the annotators to annotate around 200 sentences per hour.]]> https://link.springer.com/article/10.1007/s10579-022-09602-7 https://link.springer.com/article/10.1007/s10579-022-09602-7 <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[biomedical]]> <![CDATA[media]]> <![CDATA[humanities]]> <![CDATA[Explosion]]> Sat, 09 Jul 2022 00:00:00 GMT <![CDATA[🍏 thinc-apple-ops v0.1.0]]> <![CDATA[Many performance improvements]]> https://github.com/explosion/thinc-apple-ops/releases/tag/v0.1.0 release:thinc-apple-ops_0.1.0 <![CDATA[release]]> <![CDATA[thinc]]> <![CDATA[Explosion]]> Tue, 19 Jul 2022 00:00:00 GMT <![CDATA[Introducing Holmes 4.0]]> <![CDATA[A few weeks ago we released version 4.0 of Holmes, which we are now able to offer under a permissive MIT license.
Holmes is a library in the spaCy Universe that runs on top of spaCy and enables information extraction and intelligent search, currently for English and German. Holmes goes beyond simple matching algorithms and allows you to look for a specified idea or ideas in a corpus of documents.]]> https://explosion.ai/blog/introduction-to-holmes blog:introduction-to-holmes <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Richard Hudson, Chung-Fan Tsai]]> Wed, 13 Jul 2022 00:00:00 GMT <![CDATA[Introducing spaCy v3.4]]> <![CDATA[spaCy v3.4 brings typing and speed improvements along with new vectors for English CNN pipelines and new trained pipelines for Croatian.]]> https://explosion.ai/blog/spacy-v3-4 blog:spacy-v3-4 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal, Ines Montani, Sofie Van Landeghem, Adriane Boyd, Paul O’Leary McCann, Daniël de Kok, Edward Schmuhl, Lj Miranda, Philip Vollet, Peter Baumgartner, Richard Hudson, Vincent D. Warmerdam, Madeesh Kannan, Raphael Mitsch, Helena Steckmeister]]> Tue, 12 Jul 2022 00:00:00 GMT <![CDATA[Bulk Labelling and Prodigy]]> <![CDATA[In this video, we’ll show a bulk labelling technique that can help you prepare data for Prodigy.]]> https://www.youtube.com/watch?v=gDk7_f3ovIk https://www.youtube.com/watch?v=gDk7_f3ovIk <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Vincent D. Warmerdam]]> Tue, 05 Jul 2022 00:00:00 GMT <![CDATA[Introducing Span Categorization in Prodigy and spaCy]]> <![CDATA[In this video, we’ll show you how to use Prodigy for spaCy’s Span Categorizer. 
We’ll be annotating food recipes and looking into ways to help with consistent annotations and speed up the process with patterns and temporary models.]]> https://www.youtube.com/watch?v=xgV3Rlj49lQ https://www.youtube.com/watch?v=xgV3Rlj49lQ <![CDATA[video]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[Edward Schmuhl]]> Wed, 22 Jun 2022 00:00:00 GMT <![CDATA[Spancat: a new approach for span labeling]]> <![CDATA[The SpanCategorizer is a spaCy component that answers the NLP community's need to have structured annotation for a wide variety of labeled spans, including long phrases, non-named entities, or overlapping annotations. In this blog post, we're excited to talk more about spancat and showcase new features to help with your span labeling needs! ]]> https://explosion.ai/blog/spancat blog:spancat <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Edward Schmuhl, Lj Miranda, Ákos Kádár, Sofie Van Landeghem, Adriane Boyd]]> Tue, 14 Jun 2022 00:00:00 GMT <![CDATA[🧪 spacy-experimental v0.5.0]]> <![CDATA[Added SpanFinder, Span suggesters and bugfixes]]> https://github.com/explosion/spacy-experimental/releases/tag/v0.5.0 release:spacy-experimental_0.5.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Sat, 11 Jun 2022 00:00:00 GMT <![CDATA[Evolution of spaCy]]> https://www.youtube.com/watch?v=kz2EWrmfw8Y&ab_channel=DeepakJohnReji event:d4-data-ines <![CDATA[interview]]> <![CDATA[Ines Montani]]> Sat, 11 Jun 2022 00:00:00 GMT <![CDATA[Finding Bad Labels for Text Classification with Jupyter and Prodigy]]> <![CDATA[In this video, we’ll show you how to set up Prodigy to find bad labels in text classification tasks. While many of the techniques are applied to text classification, they can also be used for classification tasks in general.]]> https://www.youtube.com/watch?v=khZ5-AN-n2Y https://www.youtube.com/watch?v=khZ5-AN-n2Y <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Vincent D.
Warmerdam]]> Fri, 02 Jun 2023 00:00:00 GMT <![CDATA[Diary of a spaCy project: Predicting GitHub Tags]]> <![CDATA[Many people assume that working on an NLP project involves a lot of machine learning. Our experience is that it's much less about flowing tensors, and more about making a tailored solution. This blog post demonstrates how a typical spaCy project could be initiated, implemented and executed towards a custom solution.]]> https://explosion.ai/blog/diary-of-github-spacy-project blog:diary-of-github-spacy-project <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[consulting]]> <![CDATA[strategy]]> <![CDATA[annotation]]> <![CDATA[Vincent D. Warmerdam, Sofie Van Landeghem]]> Tue, 31 May 2022 00:00:00 GMT <![CDATA[Solutions for Advanced NLP for Diverse Languages]]> <![CDATA[This talk discusses spaCy’s philosophy for modern NLP, its extensible design and recent features that enable the development of advanced natural language processing pipelines for typologically diverse languages.]]> https://speakerdeck.com/inesmontani/advanced-nlp-for-diverse-languages event:princeton-2022 <![CDATA[talk]]> <![CDATA[humanities]]> <![CDATA[Ines Montani]]> Thu, 12 May 2022 00:00:00 GMT <![CDATA[Introducing spaCy v3.3]]> <![CDATA[spaCy v3.3 improves the speed of core pipeline components, adds a new trainable lemmatizer, and introduces trained pipelines for Finnish, Korean and Swedish.]]> https://explosion.ai/blog/spacy-v3-3 blog:spacy-v3-3 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal, Ines Montani, Sofie Van Landeghem, Adriane Boyd, Paul O’Leary McCann, Daniël de Kok, Edward Schmuhl, Lj Miranda, Philip Vollet, Peter Baumgartner, Richard Hudson, Vincent D.
Warmerdam, Madeesh Kannan, Raphael Mitsch]]> Fri, 29 Apr 2022 00:00:00 GMT <![CDATA[How we built a Stack Overflow Community questions analyzer]]> <![CDATA[How GitLab used spaCy to analyze and better understand Stack Overflow community questions about their tools and products.]]> https://about.gitlab.com/blog/2022/04/28/how-we-built-a-stack-overflow-community-questions-analyzer-and-you-can-too/ https://about.gitlab.com/blog/2022/04/28/how-we-built-a-stack-overflow-community-questions-analyzer-and-you-can-too/ <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Thu, 28 Apr 2022 00:00:00 GMT <![CDATA[Finding Bad Image Data using UMAP and Prodigy]]> <![CDATA[In this video, we’ll show you how to use Prodigy to find bad examples in the Google QuickDraw dataset. We will be leveraging a technique that involves UMAP to find strange images semi-automatically.]]> https://www.youtube.com/watch?v=s0Y45xscE-0 https://www.youtube.com/watch?v=s0Y45xscE-0 <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[computer-vision]]> <![CDATA[Vincent D. Warmerdam]]> Wed, 27 Apr 2022 00:00:00 GMT <![CDATA[Compact word vectors with Bloom embeddings]]> <![CDATA[An introduction to the compact word vectors with Bloom embeddings used in Thinc, spaCy and floret.]]> https://explosion.ai/blog/bloom-embeddings blog:bloom-embeddings <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal, Adriane Boyd, Vincent D. 
Warmerdam]]> Tue, 26 Apr 2022 00:00:00 GMT <![CDATA[Automated Identification of Clinical Procedures in Free-Text Electronic Clinical Records with a Low-Code Named Entity Recognition Workflow]]> <![CDATA[The use of a low-code annotation software tool [Prodigy] allows the rapid creation of a custom annotation dataset to train a NER model to identify clinical procedures stored in free-text electronic clinical notes.]]> https://www.thieme-connect.com/products/ejournals/abstract/10.1055/s-0042-1749358 https://www.thieme-connect.com/products/ejournals/abstract/10.1055/s-0042-1749358 <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[biomedical]]> <![CDATA[Explosion]]> Wed, 13 Apr 2022 00:00:00 GMT <![CDATA[Finding Duplicates in Tabular Data with Jupyter and Prodigy]]> <![CDATA[In this video, we’ll show you how to use Prodigy to train a named entity recognition model from scratch, by taking advantage of semi-automatic annotation and modern transfer learning techniques.]]> https://www.youtube.com/watch?v=kJ5Jb56T5uc https://www.youtube.com/watch?v=kJ5Jb56T5uc <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Vincent D. 
Warmerdam]]> Wed, 12 Apr 2023 00:00:00 GMT <![CDATA[skweak v0.3.1]]> <![CDATA[Weak supervision with flexible labelling functions and aggregation, integrated with spaCy.]]> https://github.com/NorskRegnesentral/skweak https://github.com/NorskRegnesentral/skweak <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Fri, 25 Mar 2022 00:00:00 GMT <![CDATA[Applied Language Technology]]> <![CDATA[Extensive online course on applied language technology with spaCy by Tuomo Hiippala, designed for students new to NLP and programming.]]> https://applied-language-technology.mooc.fi/html/index.html https://applied-language-technology.mooc.fi/html/index.html <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[humanities]]> <![CDATA[Explosion]]> Thu, 24 Mar 2022 00:00:00 GMT <![CDATA[🧪 spacy-experimental v0.4.0]]> <![CDATA[Added biaffine parser and other fixes for experimental tools]]> https://github.com/explosion/spacy-experimental/releases/tag/v0.4.0 release:spacy-experimental_0.4.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Tue, 22 Mar 2022 00:00:00 GMT <![CDATA[Introducing spaCy Tailored Pipelines]]> <![CDATA[Explosion is pleased to announce a new development services offering, spaCy Tailored Pipelines.
We’ll build you a custom natural language processing pipeline, delivered in a standardized format using spaCy’s projects system.]]> https://explosion.ai/blog/introducing-spacy-tailored-pipelines blog:introducing-spacy-tailored-pipelines <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[consulting]]> <![CDATA[strategy]]> <![CDATA[Matthew Honnibal, Ines Montani]]> Tue, 08 Feb 2022 00:00:00 GMT <![CDATA[When Women Make Headlines]]> <![CDATA[Using spaCy and other packages from the NLP ecosystem for analyzing more than 382,000 headlines to see how women are represented (or misrepresented) in the news.]]> https://pudding.cool/2022/02/women-in-headlines/ https://pudding.cool/2022/02/women-in-headlines/ <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[media]]> <![CDATA[Explosion]]> Tue, 01 Feb 2022 00:00:00 GMT <![CDATA[Creating Tools that Spark Joy with Ines Montani]]> https://www.youtube.com/watch?v=Bj1NyfumGDw event:zenml-ines <![CDATA[interview]]> <![CDATA[Ines Montani]]> Thu, 27 Jan 2022 00:00:00 GMT <![CDATA[Explosion in 2021: Our Year in Review]]> <![CDATA[The year 2021 is coming to an end, and like the previous year, it was shaped by unique challenges that impacted our work together. For Explosion, it was a very productive year. We found an investor that fits our strategy, the work on Prodigy Teams is in full swing, and the team has grown a lot. So here's our look back at our highlights of the year 2021.]]> https://explosion.ai/blog/year-in-review-2021 blog:year-in-review-2021 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[Ines Montani, Matthew Honnibal, Sofie Van Landeghem, Philip Vollet]]> Fri, 31 Dec 2021 00:00:00 GMT <![CDATA[Healthsea: an end-to-end spaCy pipeline for exploring health supplement effects]]> <![CDATA[Create better access to health with machine learning and natural language processing. 
Read about our journey of developing Healthsea, an end-to-end spaCy pipeline for analyzing user reviews of supplement products and extracting their potential effects on health. ]]> https://explosion.ai/blog/healthsea blog:healthsea <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[biomedical]]> <![CDATA[Edward Schmuhl]]> Wed, 15 Dec 2021 00:00:00 GMT <![CDATA[Universal Dependencies v2.5 Benchmarks for spaCy]]> <![CDATA[We present Universal Dependencies v2.5 benchmarks for spaCy v3.2 that show the competitive performance of spaCy in a direct comparison with Stanza and Trankit using the end-to-end evaluation from the CoNLL 2018 Shared Task. ]]> https://explosion.ai/blog/ud-benchmarks-v3-2 blog:ud-benchmarks-v3-2 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Adriane Boyd]]> Tue, 14 Dec 2021 00:00:00 GMT <![CDATA[Neural edit-tree lemmatization for spaCy]]> <![CDATA[We are happy to introduce a new, experimental, machine learning-based lemmatizer that posts accuracies above 95% for many languages. This lemmatizer learns to predict lemmatization rules from a corpus of examples and removes the need to write an exhaustive set of per-language lemmatization rules. 
]]> https://explosion.ai/blog/edit-tree-lemmatizer blog:edit-tree-lemmatizer <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Daniël de Kok]]> Wed, 24 Nov 2021 00:00:00 GMT <![CDATA[Talking sense: using machine learning to understand quotes]]> <![CDATA[How the Guardian uses spaCy and Prodigy to train a machine learning model that helps extract quotes from news articles and match them to the correct source.]]> https://www.theguardian.com/info/2021/nov/25/talking-sense-using-machine-learning-to-understand-quotes https://www.theguardian.com/info/2021/nov/25/talking-sense-using-machine-learning-to-understand-quotes <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[media]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Wed, 17 Nov 2021 00:00:00 GMT <![CDATA[spaCy v3's project and config systems are pretty great]]> <![CDATA[The road to production has become increasingly hard. Machine Learning Engineers who turn prototypes into production-ready software face difficulties with the lack of tooling and best practices. spaCy v3, with its configuration and project system, introduced a way to solve this problem. Here's my take on how it works, and how it can ramp up your team! 
]]> https://explosion.ai/blog/spacy-v3-project-config-systems blog:spacy-v3-project-config-systems <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[strategy]]> <![CDATA[Lj Miranda]]> Wed, 17 Nov 2021 00:00:00 GMT <![CDATA[Introducing spaCy v3.2]]> <![CDATA[spaCy v3.2 features usability improvements for custom training and scoring, improved performance and support for floret, our new fastText word vectors algorithm.]]> https://explosion.ai/blog/spacy-v3-2 blog:spacy-v3-2 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal, Ines Montani, Sofie Van Landeghem, Adriane Boyd, Paul O’Leary McCann, Daniël de Kok, Duygu Altinok, Edward Schmuhl, Lj Miranda, Philip Vollet]]> Fri, 05 Nov 2021 00:00:00 GMT <![CDATA[🌸 floret v0.10.0]]> <![CDATA[fastText + Bloom embeddings for compact, full-coverage vectors with spaCy]]> https://github.com/explosion/floret release:floret_0.10.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Wed, 27 Oct 2021 00:00:00 GMT <![CDATA[🛸 spacy-transformers v1.1.0]]> <![CDATA[Better serialization, full ModelOutput, mixed-precision training and more]]> https://github.com/explosion/spacy-transformers/releases/tag/v1.1.0 release:spacy-transformers_1.1.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Mon, 18 Oct 2021 00:00:00 GMT <![CDATA[Anecdotes from 11 Role Models in Machine Learning]]> https://towardsdatascience.com/anecdotes-from-11-role-models-in-machine-learning-d01bc0d65dcd https://towardsdatascience.com/anecdotes-from-11-role-models-in-machine-learning-d01bc0d65dcd <![CDATA[interview]]> <![CDATA[Ines Montani, Matthew Honnibal]]> Tue, 21 Sep 2021 00:00:00 GMT <![CDATA[We’ve sold 5% of Explosion]]> <![CDATA[Since founding Explosion in 2016, we’ve run the company as a profitable business and we decided to only consider external investment if we could find a deal that wouldn’t compromise the direction or stability of the company. 
We’re pleased to announce that we’ve found an investment that ticks all the boxes.]]> https://explosion.ai/blog/weve-sold-5-percent-of-explosion blog:weve-sold-5-percent-of-explosion <![CDATA[blog]]> <![CDATA[Ines Montani, Matthew Honnibal]]> Thu, 02 Sep 2021 00:00:00 GMT <![CDATA[Reproducible spaCy NLP Experiments with Weights & Biases]]> <![CDATA[This tutorial will show how to add Weights & Biases to any spaCy NLP project to track your experiments, save model checkpoints, and version your datasets.]]> https://wandb.ai/wandb/wandb_spacy_integration/reports/Reproducible-spaCy-NLP-Experiments-with-Weights-Biases--Vmlldzo4NjM2MDk https://wandb.ai/wandb/wandb_spacy_integration/reports/Reproducible-spaCy-NLP-Experiments-with-Weights-Biases--Vmlldzo4NjM2MDk <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[strategy]]> <![CDATA[Explosion]]> Thu, 12 Aug 2021 00:00:00 GMT <![CDATA[✨ prodigy v1.11.0]]> <![CDATA[spaCy v3 support, annotation for overlapping and nested spans, better installation & more]]> https://prodi.gy/docs/changelog#v1.11.0 release:prodigy_1.11.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Wed, 12 Aug 2020 00:00:00 GMT <![CDATA[Welcome spaCy to the Hugging Face Hub]]> <![CDATA[Hugging Face makes it really easy to share your spaCy pipelines with the community! 
With a single command, you can upload any pipeline package, with a pretty model card and all required metadata auto-generated for you.]]> https://huggingface.co/blog/spacy https://huggingface.co/blog/spacy <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[Ines Montani]]> Tue, 13 Jul 2021 00:00:00 GMT <![CDATA[Introduction to Japanese Natural Language Processing]]> <![CDATA[A thorough guide for programmers working with Japanese text, covering fundamental issues like tokenization and recent research topics like generating natural language texts.]]> https://www.japanesenlp.com https://www.japanesenlp.com <![CDATA[book]]> <![CDATA[spacy]]> <![CDATA[Paul O’Leary McCann]]> Sun, 11 Jul 2021 00:00:00 GMT <![CDATA[Mastering spaCy]]> <![CDATA[An end-to-end practical guide to implementing NLP applications using the Python ecosystem. By the end of this book, you'll be able to confidently use spaCy, including its linguistic features, word vectors, and classifiers, to create your own NLP apps.]]> https://www.amazon.de/-/en/Mastering-spaCy-end-end-implementing/dp/1800563353 https://www.amazon.de/-/en/Mastering-spaCy-end-end-implementing/dp/1800563353 <![CDATA[book]]> <![CDATA[spacy]]> <![CDATA[Duygu Altinok]]> Fri, 09 Jul 2021 00:00:00 GMT <![CDATA[Introducing spaCy v3.1]]> <![CDATA[It’s been great to see the adoption of spaCy v3, which introduced transformer-based pipelines, a new training system and more. 
Version 3.1 adds more on top of it, including the ability to use predicted annotations during training, a component for predicting arbitrary and overlapping spans and new pipelines for Catalan and Danish.]]> https://explosion.ai/blog/spacy-v3-1 blog:spacy-v3-1 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal, Ines Montani, Sofie Van Landeghem, Adriane Boyd, Paul O’Leary McCann]]> Wed, 07 Jul 2021 00:00:00 GMT <![CDATA[🤗 spacy-huggingface-hub v0.0.1]]> <![CDATA[Upload spaCy pipelines to the Hugging Face Hub]]> https://github.com/explosion/spacy-huggingface-hub release:spacy-huggingface-hub_0.0.1 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Tue, 06 Jul 2021 00:00:00 GMT <![CDATA[Applied NLP Thinking: How to Translate Problems into Solutions]]> <![CDATA[We’ve been running Explosion for about five years now, which has given us a lot of insights into what Natural Language Processing looks like in industry contexts. In this blog post, I’m going to discuss some of the biggest challenges for applied NLP and translating business problems into machine learning solutions.]]> https://explosion.ai/blog/applied-nlp-thinking blog:applied-nlp-thinking <![CDATA[blog]]> <![CDATA[strategy]]> <![CDATA[Ines Montani]]> Sat, 19 Jun 2021 00:00:00 GMT <![CDATA[Interview: Ines Montani & Sebastián Ramírez]]> https://www.youtube.com/watch?v=pi-MhNMe-_Y&t=10199s event:pyfest-2021 <![CDATA[interview]]> <![CDATA[Ines Montani]]> Mon, 14 Jun 2021 00:00:00 GMT <![CDATA[What does “real-world NLP” look like and how can students get ready for it?]]> https://speakerdeck.com/inesmontani/applied-nlp-thinking event:naacl-2021 <![CDATA[talk]]> <![CDATA[Ines Montani]]> Thu, 10 Jun 2021 00:00:00 GMT <![CDATA[spaCy v3: State-of-the-art NLP from Prototype to Production]]> https://www.youtube.com/watch?v=rUvINwHJoQc&ab_channel=RobertMonarch event:bay-area-nlp-2021 <![CDATA[talk]]> <![CDATA[Matthew Honnibal]]> Fri, 04 Jun 2021 00:00:00 GMT <![CDATA[Corpus-Level Evaluation for Event QA: 
The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence]]> <![CDATA[Figure A2 shows a stylized version of the custom interface we built using the Prodigy annotation tool. Annotators are presented with an entire document, with sentences sequentially highlighted.]]> https://arxiv.org/abs/2105.12936 https://arxiv.org/abs/2105.12936 <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[media]]> <![CDATA[humanities]]> <![CDATA[Explosion]]> Thu, 27 May 2021 00:00:00 GMT <![CDATA[Building Industrial-Strength NLP Applications]]> https://snorkel.ai/building-industrial-strength-nlp-applications-with-ines-montani/ event:snorkel-science-talks <![CDATA[interview]]> <![CDATA[Ines Montani]]> Fri, 30 Apr 2021 00:00:00 GMT <![CDATA[A Bit of AI Episode 7]]> https://www.youtube.com/watch?v=R6ZtnkuPkQM event:a-bit-of-ai <![CDATA[interview]]> <![CDATA[Ines Montani]]> Thu, 22 Apr 2021 00:00:00 GMT <![CDATA[Intro to NLP with spaCy (6): Detecting programming languages]]> https://www.youtube.com/watch?v=k77RrmMaKEI&list=PLBmcuObd5An559HbDr_alBnwVsGq-7uTF&index=6 https://www.youtube.com/watch?v=k77RrmMaKEI&list=PLBmcuObd5An559HbDr_alBnwVsGq-7uTF&index=6 <![CDATA[video]]> <![CDATA[spacy]]> <![CDATA[Vincent D. 
Warmerdam]]> Wed, 17 Mar 2021 00:00:00 GMT <![CDATA[How We Found Pricey Provisions in New Jersey Police Contracts]]> <![CDATA[ProPublica and the Asbury Park Press scoured hundreds of police union agreements for details on publicly funded payouts to cops, using spaCy under the hood.]]> https://www.propublica.org/article/how-we-found-pricey-provisions-in-new-jersey-police-contracts https://www.propublica.org/article/how-we-found-pricey-provisions-in-new-jersey-police-contracts <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[media]]> <![CDATA[legal]]> <![CDATA[finance]]> <![CDATA[Explosion]]> Mon, 08 Feb 2021 00:00:00 GMT <![CDATA[🦆 sense2vec v2.0.0]]> <![CDATA[Update component for spaCy v3]]> https://github.com/explosion/sense2vec/releases/tag/v2.0.0 release:sense2vec_2.0.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Sun, 07 Feb 2021 00:00:00 GMT <![CDATA[spaCy v3: Custom trainable relation extraction component]]> <![CDATA[spaCy v3.0 features new transformer-based pipelines that get spaCy’s accuracy right up to the current state-of-the-art, and a new training config and workflow system to help you take projects from prototype to production. 
In this video, Sofie shows you how to apply all these new features when implementing a custom trainable component from scratch.]]> https://www.youtube.com/watch?v=8HL-Ap5_Axo https://www.youtube.com/watch?v=8HL-Ap5_Axo <![CDATA[video]]> <![CDATA[spacy]]> <![CDATA[thinc]]> <![CDATA[biomedical]]> <![CDATA[Sofie Van Landeghem]]> Mon, 01 Feb 2021 00:00:00 GMT <![CDATA[spaCy v3: Design concepts explained (behind the scenes)]]> <![CDATA[In this video, Ines shows you some of the new design concepts and explains what’s going on under the hood, how we’ve implemented them and most importantly, why.]]> https://www.youtube.com/watch?v=BWhh3r6W-qE https://www.youtube.com/watch?v=BWhh3r6W-qE <![CDATA[video]]> <![CDATA[spacy]]> <![CDATA[Ines Montani]]> Mon, 01 Feb 2021 00:00:00 GMT <![CDATA[🛸 spacy-transformers v1.0.0]]> <![CDATA[Update components for spaCy v3.0]]> https://github.com/explosion/spacy-transformers/releases/tag/v1.0.0 release:spacy-transformers_1.0.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Mon, 01 Feb 2021 00:00:00 GMT <![CDATA[spaCy v3: State-of-the-art NLP from Prototype to Production]]> https://www.youtube.com/watch?v=9k_EfV7Cns0 https://www.youtube.com/watch?v=9k_EfV7Cns0 <![CDATA[video]]> <![CDATA[spacy]]> <![CDATA[Ines Montani, Matthew Honnibal]]> Mon, 01 Feb 2021 00:00:00 GMT <![CDATA[Introducing spaCy v3.0]]> <![CDATA[spaCy v3.0 is a huge release! It features new transformer-based pipelines that get spaCy's accuracy right up to the current state-of-the-art, and a new workflow system to help you take projects from prototype to production.
It's much easier to configure and train your pipeline, and there are lots of new and improved integrations with the rest of the NLP ecosystem.]]> https://explosion.ai/blog/spacy-v3 blog:spacy-v3 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal, Ines Montani, Sofie Van Landeghem, Adriane Boyd]]> Mon, 01 Feb 2021 00:00:00 GMT <![CDATA[🔮 thinc v8.0.0]]> <![CDATA[Type checking, wrap PyTorch, TensorFlow & MXNet, integrated config system]]> https://thinc.ai release:thinc_8.0.0 <![CDATA[release]]> <![CDATA[thinc]]> <![CDATA[Explosion]]> Sun, 24 Jan 2021 00:00:00 GMT <![CDATA[Building a Data Science Startup]]> https://talkpython.fm/episodes/show/300/building-a-data-science-startup-panel event:talkpython-2021-ines <![CDATA[interview]]> <![CDATA[Ines Montani]]> Fri, 22 Jan 2021 00:00:00 GMT <![CDATA[Explosion in 2020: Our Year in Review]]> <![CDATA[While 2020 hasn’t been easy for anyone, at Explosion we’ve considered ourselves relatively fortunate in this most interesting year. We’ve always worked remotely, so we’ve been able to take both pride and comfort in continuing to ship good software.
Here’s a look back at what we’ve been up to.]]> https://explosion.ai/blog/year-in-review-2020 blog:year-in-review-2020 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[Matthew Honnibal, Ines Montani, Walter Henry]]> Thu, 31 Dec 2020 00:00:00 GMT <![CDATA[How to build resilient NLP applications]]> https://www.youtube.com/watch?v=Reg-esi12o8&ab_channel=Rasa event:rasa-chats-ines-matt <![CDATA[interview]]> <![CDATA[Ines Montani, Matthew Honnibal]]> Tue, 24 Nov 2020 00:00:00 GMT <![CDATA[Building Industrial-Strength NLP Pipelines]]> https://www.youtube.com/watch?v=v550Ve66vEc&ab_channel=Weights%26Biases event:wandb-ines-sofie <![CDATA[interview]]> <![CDATA[Ines Montani, Sofie Van Landeghem]]> Thu, 29 Oct 2020 00:00:00 GMT <![CDATA[Ines becomes a Python Software Foundation Fellow]]> https://pyfound.blogspot.com/2020/10/python-software-foundation-fellow.html https://pyfound.blogspot.com/2020/10/python-software-foundation-fellow.html <![CDATA[blog]]> <![CDATA[Ines Montani]]> Tue, 27 Oct 2020 00:00:00 GMT <![CDATA[spaCy v3.0: Bringing State-of-the-art NLP from Prototype to Production]]> https://globalai.live/october-sessions-natural-language-processing/keynote-2/ event:globalai-2020 <![CDATA[talk]]> <![CDATA[Ines Montani]]> Thu, 22 Oct 2020 00:00:00 GMT <![CDATA[Ines Montani brought linguistic and computers together]]> https://devjourney.info/Guests/122-InesMontani.html event:devjourney-ines <![CDATA[interview]]> <![CDATA[Ines Montani]]> Tue, 13 Oct 2020 00:00:00 GMT <![CDATA[How We Analyzed Google’s Search Results]]> <![CDATA[Using the Prodigy annotation tool, we created a user interface and a coder manual for two annotators to spot-check 741 stained images randomly sampled from our dataset.]]> https://themarkup.org/google-the-giant/2020/07/28/how-we-analyzed-google-search-results-web-assay-parsing-tool https://themarkup.org/google-the-giant/2020/07/28/how-we-analyzed-google-search-results-web-assay-parsing-tool <![CDATA[universe]]> <![CDATA[prodigy]]> 
<![CDATA[media]]> <![CDATA[Explosion]]> Tue, 28 Jul 2020 00:00:00 GMT <![CDATA[The Physical Traits that Define Men and Women in Literature]]> <![CDATA[Analysis of physical traits most tied to gender in literature using spaCy.]]> https://pudding.cool/2020/07/gendered-descriptions/ https://pudding.cool/2020/07/gendered-descriptions/ <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[media]]> <![CDATA[Explosion]]> Wed, 01 Jul 2020 00:00:00 GMT <![CDATA[👑 spacy-streamlit v0.0.2]]> <![CDATA[spaCy building blocks and visualizers for Streamlit apps]]> https://github.com/explosion/spacy-streamlit release:spacy-streamlit_0.0.2 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Tue, 23 Jun 2020 00:00:00 GMT <![CDATA[🦉 srsly v2.1.0]]> <![CDATA[Support YAML]]> https://github.com/explosion/srsly/releases/tag/v2.1.0 release:srsly_2.1.0 <![CDATA[release]]> <![CDATA[Explosion]]> Mon, 22 Jun 2020 00:00:00 GMT <![CDATA[Designing Practical NLP Solutions]]> https://speakerdeck.com/inesmontani/designing-practical-nlp-solutions event:l3-ai-2020 <![CDATA[talk]]> <![CDATA[strategy]]> <![CDATA[Ines Montani]]> Thu, 18 Jun 2020 00:00:00 GMT <![CDATA[Prodigy v1.10: Dependencies, relations, audio, video & more]]> <![CDATA[Version 1.10 of Prodigy includes tons of new features, including manual dependency and relation annotation, audio and video annotation, a new and improved image UI, new recipe callbacks, more settings for manual NER, plus various new config options and settings.]]> https://www.youtube.com/watch?v=KCrIa538u4I https://www.youtube.com/watch?v=KCrIa538u4I <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[computer-vision]]> <![CDATA[Ines Montani]]> Wed, 17 Jun 2020 00:00:00 GMT <![CDATA[✨ prodigy v1.10.0]]> <![CDATA[Dependency and relation annotation, audio, video, character-based NER & more]]> https://prodi.gy/docs/changelog#v1.10.0 release:prodigy_1.10.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Tue, 16 Jun 
2020 00:00:00 GMT <![CDATA[Introducing spaCy v2.3]]> <![CDATA[spaCy now speaks Chinese, Japanese, Danish, Polish and Romanian! Version 2.3 of the spaCy Natural Language Processing library adds models for five new languages. We've also updated all 15 model families with word vectors and improved accuracy, while also decreasing model size and loading times for models with vectors.]]> https://explosion.ai/blog/spacy-v2-3 blog:spacy-v2-3 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Adriane Boyd, Matthew Honnibal]]> Tue, 16 Jun 2020 00:00:00 GMT <![CDATA[🛸 spacy-transformers v0.6.0]]> <![CDATA[Update to transformers v2.5.0]]> https://github.com/explosion/spacy-transformers/releases/tag/v0.6.0 release:spacy-transformers_0.6.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Sun, 24 May 2020 00:00:00 GMT <![CDATA[Identifying Predictors of Suicide in Severe Mental Illness: A Feasibility Study of a Clinical Prediction Rule]]> <![CDATA[The named entity recognition model was developed in two phases: 1) training with “gold-standard” annotations collected with GATE and 2) model fine-tuning with Prodigy, an active learning-based annotation tool.]]> https://www.frontiersin.org/journals/psychiatry/articles/10.3389/fpsyt.2020.00268/full https://www.frontiersin.org/journals/psychiatry/articles/10.3389/fpsyt.2020.00268/full <![CDATA[paper]]> <![CDATA[prodigy]]> <![CDATA[biomedical]]> <![CDATA[Explosion]]> Wed, 15 Apr 2020 00:00:00 GMT <![CDATA[Building customizable NLP pipelines with spaCy]]> https://speakerdeck.com/sofievl/2020-02-19-spacy-pipelines event:turku-ai-2020 <![CDATA[talk]]> <![CDATA[Sofie Van Landeghem]]> Wed, 19 Feb 2020 00:00:00 GMT <![CDATA[Image Captioning with Prodigy & PyTorch]]> <![CDATA[In this video, we’ll show you how you can use Prodigy to script fully custom annotation workflows in Python, how to plug in your own machine learning models and how to mix and match different interfaces for your specific use case.]]> 
https://www.youtube.com/watch?v=zlyq9z7hdUA https://www.youtube.com/watch?v=zlyq9z7hdUA <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[computer-vision]]> <![CDATA[Ines Montani]]> Tue, 24 Mar 2020 00:00:00 GMT <![CDATA[Training a Named Entity Recognition Model with Prodigy and Transfer Learning]]> <![CDATA[In this video, we’ll show you how to use Prodigy to train a named entity recognition model from scratch, by taking advantage of semi-automatic annotation and modern transfer learning techniques.]]> https://www.youtube.com/watch?v=59BKHO_xBPA https://www.youtube.com/watch?v=59BKHO_xBPA <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Ines Montani]]> Mon, 16 Mar 2020 00:00:00 GMT <![CDATA[Intro to NLP with spaCy (5): Detecting programming languages]]> https://www.youtube.com/watch?v=f4sqeLRzkPg&list=PLBmcuObd5An559HbDr_alBnwVsGq-7uTF&index=5 https://www.youtube.com/watch?v=f4sqeLRzkPg&list=PLBmcuObd5An559HbDr_alBnwVsGq-7uTF&index=5 <![CDATA[video]]> <![CDATA[spacy]]> <![CDATA[Vincent D. Warmerdam]]> Sat, 13 Jun 2020 00:00:00 GMT <![CDATA[Training a custom entity linking model with spaCy]]> <![CDATA[In this video, we show you how to create a custom Entity Linking model in spaCy to disambiguate different mentions of the person “Emerson” to unique identifiers in a knowledge base.]]> https://www.youtube.com/watch?v=8u57WSXVpmw https://www.youtube.com/watch?v=8u57WSXVpmw <![CDATA[video]]> <![CDATA[spacy]]> <![CDATA[Sofie Van Landeghem]]> Thu, 07 May 2020 00:00:00 GMT <![CDATA[Intro to NLP with spaCy (4): Detecting programming languages]]> https://www.youtube.com/watch?v=IqOJU1-_Fi0&list=PLBmcuObd5An559HbDr_alBnwVsGq-7uTF&index=4 https://www.youtube.com/watch?v=IqOJU1-_Fi0&list=PLBmcuObd5An559HbDr_alBnwVsGq-7uTF&index=4 <![CDATA[video]]> <![CDATA[spacy]]> <![CDATA[Vincent D. 
Warmerdam]]> Mon, 02 Mar 2020 00:00:00 GMT <![CDATA[PyCon Colombia Speaker Interview]]> <![CDATA[Karo and Ines talked about getting into tech and machine learning, and what’s next for spaCy and our other tools.]]> https://www.youtube.com/watch?v=1oBlKPET530 https://www.youtube.com/watch?v=1oBlKPET530 <![CDATA[interview]]> <![CDATA[Ines Montani]]> Sat, 08 Feb 2020 00:00:00 GMT <![CDATA[The Future of NLP in Python]]> <![CDATA[The data community came to Python for the language, and stayed for each other – once it got critical mass, it’s the ecosystem that counts. We’ve been proud to be part of that. So what does the future hold for NLP in Python?]]> https://speakerdeck.com/inesmontani/the-future-of-nlp-in-python-keynote-pycon-colombia-2020 event:pycon-colombia-2020 <![CDATA[talk]]> <![CDATA[Ines Montani]]> Sat, 08 Feb 2020 00:00:00 GMT <![CDATA[Explosion in 2019: Our Year in Review]]> <![CDATA[As 2019 draws to a close and we step into the 2020s, we thought we’d take a look back at the year and all we’ve accomplished. And we realized we had so much that we could give you a month-by-month rundown of everything that happened.]]> https://explosion.ai/blog/year-in-review-2019 blog:year-in-review-2019 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[Matthew Honnibal, Ines Montani, Walter Henry]]> Sun, 29 Dec 2019 00:00:00 GMT <![CDATA[✨ prodigy v1.9.0]]> <![CDATA[Custom UI blocks, text input UI, better training and data conversion]]> https://prodi.gy/docs/changelog#v1.9.0 release:prodigy_1.9.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Wed, 18 Dec 2019 00:00:00 GMT <![CDATA[Intro to NLP with spaCy (3): Detecting programming languages]]> https://www.youtube.com/watch?v=4V0JDdohxAk&list=PLBmcuObd5An559HbDr_alBnwVsGq-7uTF&index=3 https://www.youtube.com/watch?v=4V0JDdohxAk&list=PLBmcuObd5An559HbDr_alBnwVsGq-7uTF&index=3 <![CDATA[video]]> <![CDATA[spacy]]> <![CDATA[Vincent D. 
Warmerdam]]> Sat, 07 Dec 2019 00:00:00 GMT <![CDATA[Künstliche Intelligenz Beyond the Hype]]> <![CDATA[“Artificial intelligence” is everywhere in the headlines. Many futuristic-sounding things suddenly seem possible. It’s not easy to judge what all these technological advances mean. What is hype and what really works? And how should we imagine the future?]]> https://speakerdeck.com/inesmontani/kunstliche-intelligenz-beyond-the-hype event:zuendfunk-2019 <![CDATA[talk]]> <![CDATA[Ines Montani]]> Sat, 09 Nov 2019 00:00:00 GMT <![CDATA[spaCy meets Transformers]]> https://www.youtube.com/watch?v=40koTT6FocE&ab_channel=HackingMachineLearning event:hacking-ml-2019 <![CDATA[talk]]> <![CDATA[Ines Montani]]> Fri, 08 Nov 2019 00:00:00 GMT <![CDATA[sense2vec reloaded: contextually-keyed word vectors]]> <![CDATA[In 2016 we trained a sense2vec model on the 2015 portion of the Reddit comments corpus, leading to a useful library and one of our most popular demos. That work is now due for an update. In this post, we present a new version and a demo NER project that we trained to usable accuracy in just a few hours.]]> https://explosion.ai/blog/sense2vec-reloaded blog:sense2vec-reloaded <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[Matthew Honnibal, Ines Montani]]> Fri, 22 Nov 2019 00:00:00 GMT <![CDATA[🦆 sense2vec v1.0.0]]> <![CDATA[More features, 2019 Reddit vectors model and Prodigy recipes]]> https://github.com/explosion/sense2vec/releases/tag/v1.0.0 release:sense2vec_1.0.0 <![CDATA[release]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Fri, 22 Nov 2019 00:00:00 GMT <![CDATA[Interview with Ines Montani]]> <![CDATA[Ines talks about how she got into programming, how to stay up to date with the latest developments in our field and the ideas behind the PyCon India keynote “Let Them Write Code”.]]> https://sayakpaul.medium.com/an-interview-with-ines-montani-co-founder-at-explosion-114afef7b48a 
https://sayakpaul.medium.com/an-interview-with-ines-montani-co-founder-at-explosion-114afef7b48a <![CDATA[interview]]> <![CDATA[Ines Montani]]> Wed, 23 Oct 2019 00:00:00 GMT <![CDATA[Using spaCy with Hugging Face Transformers]]> <![CDATA[Transformer models like BERT have set a new standard for accuracy on almost every NLP leaderboard. However, these models are very new, and most of the software ecosystem surrounding them is oriented towards the many opportunities for further research. In this talk, Matt describes how you can now use these models in spaCy to work on real problems, and the many opportunities transfer learning offers for production NLP, regardless of which software packages you choose.]]> https://speakerdeck.com/honnibal/spacy-meets-transformers event:pycon-india-2019 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal]]> Sat, 12 Oct 2019 00:00:00 GMT <![CDATA[Let Them Write Code]]> https://speakerdeck.com/inesmontani/let-them-write-code-keynote-pycon-india-2019 event:pycon-india-2019-keynote <![CDATA[talk]]> <![CDATA[Ines Montani]]> Sun, 13 Oct 2019 00:00:00 GMT <![CDATA[spaCy and the future of multi-lingual NLP]]> https://speakerdeck.com/inesmontani/spacy-and-the-future-of-multi-lingual-nlp event:meta-forum-2019 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[humanities]]> <![CDATA[Ines Montani, Matthew Honnibal]]> Wed, 09 Oct 2019 00:00:00 GMT <![CDATA[Explosion awarded META Seal of Recognition]]> <![CDATA[We’re proud to accept the META Seal of Recognition at META-FORUM in Brussels, along with Mozilla. 
The META-FORUM is an international conference series, backed by the European Union, on powerful and innovative Language Technologies for a multilingual information society.]]> http://www.meta-net.eu/meta-seal http://www.meta-net.eu/meta-seal <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[humanities]]> <![CDATA[Ines Montani, Matthew Honnibal]]> Thu, 10 Oct 2019 00:00:00 GMT <![CDATA[Introducing spaCy v2.2]]> <![CDATA[Version 2.2 of the spaCy Natural Language Processing library is leaner, cleaner and even more user-friendly. In addition to new model packages and features for training, evaluation and serialization, we've made lots of bug fixes, improved debugging and error handling, and greatly reduced the size of the library on disk.]]> https://explosion.ai/blog/spacy-v2-2 blog:spacy-v2-2 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal, Ines Montani]]> Wed, 02 Oct 2019 00:00:00 GMT <![CDATA[Entity linking for spaCy: Grounding textual mentions]]> https://speakerdeck.com/sofievl/2019-10-01-el-spacy event:belgium-nlp-2019 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[Sofie Van Landeghem]]> Tue, 01 Oct 2019 00:00:00 GMT <![CDATA[Intro to NLP with spaCy (2): Detecting programming languages]]> https://www.youtube.com/watch?v=KL4-Mpgbahw&list=PLBmcuObd5An559HbDr_alBnwVsGq-7uTF&index=2 https://www.youtube.com/watch?v=KL4-Mpgbahw&list=PLBmcuObd5An559HbDr_alBnwVsGq-7uTF&index=2 <![CDATA[video]]> <![CDATA[spacy]]> <![CDATA[Vincent D. Warmerdam]]> Tue, 24 Sep 2019 00:00:00 GMT <![CDATA[Millennials Kill Everything]]> <![CDATA[Analysis of media reporting on millennials using spaCy. 
From napkins to marriage to Applebees, just looking at headlines you’d guess that for the past decade the millennial generation’s been on a rampage.]]> https://pudding.cool/2019/09/millennials/ https://pudding.cool/2019/09/millennials/ <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[media]]> <![CDATA[Explosion]]> Sun, 01 Sep 2019 00:00:00 GMT <![CDATA[Intro to NLP with spaCy (1): Detecting programming languages]]> <![CDATA[In this new video series, data science instructor Vincent Warmerdam gets started with spaCy, an open-source library for Natural Language Processing in Python. His mission: building a system to automatically detect programming languages in large volumes of text.]]> https://www.youtube.com/watch?v=WnGPv6HnBok&list=PLBmcuObd5An559HbDr_alBnwVsGq-7uTF&index=1 https://www.youtube.com/watch?v=WnGPv6HnBok&list=PLBmcuObd5An559HbDr_alBnwVsGq-7uTF&index=1 <![CDATA[video]]> <![CDATA[spacy]]> <![CDATA[Vincent D. Warmerdam]]> Wed, 21 Aug 2019 00:00:00 GMT <![CDATA[Blackstone v0.1.15]]> <![CDATA[A spaCy pipeline and model for NLP on unstructured legal text]]> https://github.com/ICLRandD/Blackstone https://github.com/ICLRandD/Blackstone <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[legal]]> <![CDATA[Explosion]]> Wed, 07 Aug 2019 00:00:00 GMT <![CDATA[spaCy meets Transformers: Fine-tune BERT, XLNet and GPT-2]]> <![CDATA[Huge transformer models like BERT, GPT-2 and XLNet have set a new standard for accuracy on almost every NLP leaderboard. 
You can now use these models in spaCy, via a new interface library we've developed that connects spaCy to Hugging Face's awesome implementations.]]> https://explosion.ai/blog/spacy-transformers blog:spacy-transformers <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal, Ines Montani]]> Fri, 02 Aug 2019 00:00:00 GMT <![CDATA[PyDev of the Week: Ines Montani]]> https://www.blog.pythonlibrary.org/2019/07/29/pydev-of-the-week-ines-montani/ https://www.blog.pythonlibrary.org/2019/07/29/pydev-of-the-week-ines-montani/ <![CDATA[interview]]> <![CDATA[Ines Montani]]> Mon, 29 Jul 2019 00:00:00 GMT <![CDATA[Episode 139: f"Yes!" for the f-strings]]> https://pythonbytes.fm/episodes/show/139/f-yes-for-the-f-strings event:pythonbytes-2019 <![CDATA[interview]]> <![CDATA[Ines Montani]]> Thu, 18 Jul 2019 00:00:00 GMT <![CDATA[spaCy and Explosion: past, present & future]]> https://speakerdeck.com/inesmontani/spacy-and-explosion-past-present-and-future event:spacy-irl-2019-matt-ines <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal, Ines Montani]]> Sat, 06 Jul 2019 00:00:00 GMT <![CDATA[David Dodson: spaCy in the News: Quartz’s NLP pipeline]]> https://www.youtube.com/watch?v=azrVX8JksMU&list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc&index=11 https://www.youtube.com/watch?v=azrVX8JksMU&list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc&index=11 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[media]]> <![CDATA[Explosion]]> Sat, 06 Jul 2019 00:00:00 GMT <![CDATA[McKenzie Marshall: NLP in Asset Management (Barings)]]> https://www.youtube.com/watch?v=kX14Ycieju8&list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc&index=10 https://www.youtube.com/watch?v=kX14Ycieju8&list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc&index=10 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[finance]]> <![CDATA[Explosion]]> Sat, 06 Jul 2019 00:00:00 GMT <![CDATA[Patrick Harrison: Financial NLP at S&P Global]]> https://www.youtube.com/watch?v=rdmaR4WRYEM&list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc&index=9 
https://www.youtube.com/watch?v=rdmaR4WRYEM&list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc&index=9 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[finance]]> <![CDATA[Explosion]]> Sat, 06 Jul 2019 00:00:00 GMT <![CDATA[Mark Neumann: ScispaCy: A spaCy pipeline & models for scientific & biomedical text]]> https://www.youtube.com/watch?v=2_HSKDALwuw&list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc&index=8 https://www.youtube.com/watch?v=2_HSKDALwuw&list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc&index=8 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[biomedical]]> <![CDATA[Explosion]]> Sat, 06 Jul 2019 00:00:00 GMT <![CDATA[Applied NLP: Lessons from the Field]]> https://docs.google.com/presentation/d/10wsqCTs4GqzWJlyrH2vhiwKSSPdmoP1N7D2Qr6MMKFk/edit#slide=id.p event:spacy-irl-2019-peter <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[Peter Baumgartner]]> Sat, 06 Jul 2019 00:00:00 GMT <![CDATA[Entity linking functionality in spaCy]]> https://drive.google.com/file/d/1EuGxcQLcXvjjkZ-KRUlwpr_doBVyEBEG/view event:spacy-irl-2019-sofie <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[Sofie Van Landeghem]]> Sat, 06 Jul 2019 00:00:00 GMT <![CDATA[spaCy IRL 2019: 2 days of NLP in Berlin]]> <![CDATA[We were pleased to invite the spaCy community and other folks working on Natural Language Processing to Berlin this summer for a small and intimate event.]]> https://irl.spacy.io/2019 https://irl.spacy.io/2019 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Sat, 06 Jul 2019 00:00:00 GMT <![CDATA[The Brains behind spaCy]]> https://soundcloud.com/datahack-radio/ines-montani-matthew-honnibal-the-brains-behind-spacy event:datahack-ines-matt <![CDATA[interview]]> <![CDATA[Matthew Honnibal, Ines Montani]]> Mon, 03 Jun 2019 00:00:00 GMT <![CDATA[✨ prodigy v1.8.0]]> <![CDATA[Support for spaCy v2.1, basic auth, multi-user sessions, review workflow & more]]> https://prodi.gy/docs/changelog#v1.8.0 release:prodigy_1.8.0 <![CDATA[release]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Explosion]]> Mon, 20 May 2019 
00:00:00 GMT <![CDATA[Practical transfer learning for NLP with spaCy and Prodigy]]> https://www.youtube.com/watch?v=dkJnI70mTk4&ab_channel=Infoshare event:infoshare-2019 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[Ines Montani]]> Thu, 09 May 2019 00:00:00 GMT <![CDATA[Practical Natural Language Processing with spaCy and Prodigy]]> https://twimlai.com/podcast/twimlai/practical-natural-language-processing-spacy-prodigy-ines-montani/ event:twimlai-2019 <![CDATA[interview]]> <![CDATA[Ines Montani]]> Tue, 07 May 2019 00:00:00 GMT <![CDATA[Advanced NLP with spaCy: A free online course]]> <![CDATA[In this free and interactive online course, you’ll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches.]]> https://course.spacy.io https://course.spacy.io <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Ines Montani]]> Wed, 17 Apr 2019 00:00:00 GMT <![CDATA[Introducing spaCy v2.1]]> <![CDATA[Version 2.1 of the spaCy Natural Language Processing library includes a huge number of features, improvements and bug fixes. 
In this post, we highlight some of the things we're especially pleased with, and explain some of the most challenging parts of preparing this big release.]]> https://explosion.ai/blog/spacy-v2-1 blog:spacy-v2-1 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal, Ines Montani]]> Mon, 18 Mar 2019 00:00:00 GMT <![CDATA[Building a software business with Python]]> https://www.youtube.com/watch?v=-DoKDNVjzkg&ab_channel=TalkPython event:talkpython-2019 <![CDATA[interview]]> <![CDATA[Ines Montani]]> Sat, 09 Mar 2019 00:00:00 GMT <![CDATA[FAQ #1: Tips & tricks for NLP, annotation & training with Prodigy and spaCy]]> <![CDATA[In this video, Ines talks about a few frequently asked questions and shares some general tips and tricks for how to structure your NLP annotation projects, how to design your label schemes and how to solve common problems.]]> https://www.youtube.com/watch?v=tMAU3gLbKII https://www.youtube.com/watch?v=tMAU3gLbKII <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[spacy]]> <![CDATA[annotation]]> <![CDATA[strategy]]> <![CDATA[Ines Montani]]> Wed, 06 Feb 2019 00:00:00 GMT <![CDATA[Practical transfer learning for NLP with spaCy and Prodigy]]> https://speakerdeck.com/inesmontani/practical-transfer-learning-for-nlp-with-spacy-and-prodigy event:applied-ml-days-2019 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[Ines Montani]]> Mon, 28 Jan 2019 00:00:00 GMT <![CDATA[Frag deinen Kühlschrank: Wie künstliche Intelligenz die Welt verändert]]> <![CDATA[In this documentary, we explore what it feels like to work with intelligent machines. At large research centers and small start-ups, we meet the people who decide how and what AI learns today. Ines Montani teaches machines to understand the meaning of texts. 
Even for the young programmer, artificial intelligence is not magic, but a technology that everyone should understand.]]> https://www.br.de/fernsehen/ard-alpha/programmkalender/ausstrahlung-1642110.html event:ard-alpha-doc <![CDATA[interview]]> <![CDATA[Ines Montani]]> Wed, 16 Jan 2019 00:00:00 GMT <![CDATA[Where Do Corpora Come From?]]> https://soundcloud.com/nlp-highlights/78-where-do-corpora-come-from-with-matt-honnibal-and-ines-montani event:nlp-highlights <![CDATA[interview]]> <![CDATA[Matthew Honnibal, Ines Montani]]> Tue, 15 Jan 2019 00:00:00 GMT <![CDATA[The AI Revolution will not be Monopolized]]> <![CDATA[Who’s going to "win at AI"? There are now several large companies eager to claim that title. Others say that China will take over, leaving Europe and the US far behind. But short of true Artificial General Intelligence, there’s no reason to believe that machine learning or data science will have a single winner. Instead, AI will follow the same trajectory as other technologies for building software: lots of developers, a rich ecosystem, many failed projects and a few shining success stories.]]> https://speakerdeck.com/inesmontani/the-ai-revolution-will-not-be-monopoilized event:hack-talks-2018 <![CDATA[talk]]> <![CDATA[Ines Montani]]> Thu, 22 Nov 2018 00:00:00 GMT <![CDATA[The process: Transforming spaCy’s docs]]> <![CDATA[Making your documentation work for users with vastly different needs is a challenge. Here’s how spaCy, an open-source library for natural language processing, did it.]]> https://increment.com/documentation/transforming-spacys-docs/ https://increment.com/documentation/transforming-spacys-docs/ <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Ines Montani]]> Fri, 10 Aug 2018 00:00:00 GMT <![CDATA[How to Ignore Most Startup Advice and Build a Decent Software Business]]> <![CDATA[“In this talk, I’m not going to give you one "weird trick" or tell you to ~* just follow your dreams *~. 
But I’ll share some of the things we’ve learned from building a successful software company around commercial developer tools and our open-source library spaCy.”]]> https://speakerdeck.com/inesmontani/how-to-ignore-most-startup-advice-and-build-a-decent-software-business event:europython-2018-keynote <![CDATA[talk]]> <![CDATA[Ines Montani]]> Thu, 26 Jul 2018 00:00:00 GMT <![CDATA[What 1.2 million parliamentary speeches can teach us about gender representation]]> <![CDATA[Analysis of parliamentary speeches using spaCy.]]> https://pudding.cool/2018/07/women-in-parliament/ https://pudding.cool/2018/07/women-in-parliament/ <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[media]]> <![CDATA[legal]]> <![CDATA[Explosion]]> Tue, 10 Jul 2018 00:00:00 GMT <![CDATA[Building new NLP solutions with spaCy and Prodigy]]> <![CDATA[“Commercial machine learning projects are currently like start-ups: many projects fail, but some are extremely successful, justifying the total investment. While some people will tell you to embrace failure, I say failure sucks — so what can we do to fight it? In this talk, I will discuss how to address some of the most likely causes of failure for new NLP projects.”]]> https://www.youtube.com/watch?v=jpWqz85F_4Y event:pydata-berlin-2018 <![CDATA[talk]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[strategy]]> <![CDATA[Matthew Honnibal]]> Sat, 07 Jul 2018 00:00:00 GMT <![CDATA[Embed, encode, attend, predict]]> <![CDATA[While there is a wide literature on developing neural networks for natural language understanding, the networks all have the same general architecture. 
This talk explains the four components (embed, encode, attend, predict), gives a brief history of approaches to each subproblem, and explains two sophisticated networks in terms of this framework.]]> https://speakerdeck.com/honnibal/embed-encode-attend-predict-a-four-step-framework-for-understanding-neural-network-approaches-to-natural-language-understanding-problems event:data-science-tel-aviv-2018-2 <![CDATA[talk]]> <![CDATA[Matthew Honnibal]]> Mon, 28 May 2018 00:00:00 GMT <![CDATA[Rapid NLP annotation]]> <![CDATA[This talk presents a fast, flexible and even somewhat fun approach to named entity annotation. Using our approach, a model can be trained for a new entity type in only a few hours, starting from only a feed of unannotated text and a handful of seed terms.]]> https://www.youtube.com/watch?v=WgwWlWoP_G4&ab_channel=DataScienceSummit event:data-science-tel-aviv-2018-1 <![CDATA[talk]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Matthew Honnibal]]> Mon, 28 May 2018 00:00:00 GMT <![CDATA[Can You Verifi This? Studying Uncertainty and Decision-Making About Misinformation]]> <![CDATA[HCI interface to identify misinformation on social media using spaCy for NER.]]> https://wesslen.netlify.app/publication/icwsm-2018-1/ https://wesslen.netlify.app/publication/icwsm-2018-1/ <![CDATA[paper]]> <![CDATA[spacy]]> <![CDATA[media]]> <![CDATA[Ryan Wesslen]]> Fri, 15 Jun 2018 00:00:00 GMT <![CDATA[Increasing Data Science Productivity: spaCy & Prodigy]]> https://www.youtube.com/watch?v=jB1-NukGZm0& event:sf-meetup-2018 <![CDATA[talk]]> <![CDATA[Matthew Honnibal, Ines Montani]]> Sat, 14 Apr 2018 00:00:00 GMT <![CDATA[Explosion in 2017: Our Year in Review]]> <![CDATA[We founded Explosion in October 2016, so this was our first full calendar year in operation. We set ourselves ambitious goals this year, and we're very happy with how we achieved them. 
Here's what we got done.]]> https://explosion.ai/blog/year-in-review-2017 blog:year-in-review-2017 <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[prodigy]]> <![CDATA[Matthew Honnibal, Ines Montani]]> Sat, 13 Jan 2018 00:00:00 GMT <![CDATA[Training a new entity type with Prodigy – annotation powered by active learning]]> <![CDATA[In this video, we’ll show you how to use Prodigy to train a phrase recognition system for a new concept. Specifically, we’ll train a model to detect references to drugs, using text from Reddit.]]> https://www.youtube.com/watch?v=l4scwf8KeIA https://www.youtube.com/watch?v=l4scwf8KeIA <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[spacy]]> <![CDATA[annotation]]> <![CDATA[Matthew Honnibal]]> Mon, 18 Dec 2017 00:00:00 GMT <![CDATA[More than a Million Pro-Repeal Net Neutrality Comments were Likely Faked]]> <![CDATA[Analysis of net neutrality comments by Jeff Kao using spaCy for word vectors.]]> https://hackernoon.com/more-than-a-million-pro-repeal-net-neutrality-comments-were-likely-faked-e9f0e3ed36a6 https://hackernoon.com/more-than-a-million-pro-repeal-net-neutrality-comments-were-likely-faked-e9f0e3ed36a6 <![CDATA[universe]]> <![CDATA[spacy]]> <![CDATA[media]]> <![CDATA[legal]]> <![CDATA[Explosion]]> Wed, 22 Nov 2017 00:00:00 GMT <![CDATA[spaCy’s entity recognition model: incremental parsing with Bloom embeddings & residual CNNs]]> <![CDATA[spaCy v2.0’s Named Entity Recognition system features a sophisticated word embedding strategy using subword features and "Bloom" embeddings, a deep convolutional neural network with residual connections, and a novel transition-based approach to named entity parsing.]]> https://www.youtube.com/watch?v=sqDHBH9IjRU https://www.youtube.com/watch?v=sqDHBH9IjRU <![CDATA[video]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal]]> Sun, 12 Nov 2017 00:00:00 GMT <![CDATA[Introducing custom pipelines and extensions for spaCy v2.0]]> <![CDATA[As the release candidate for spaCy v2.0 gets closer, we've been excited to 
implement some of the last outstanding features. One of the best improvements is a new system for adding pipeline components and registering extensions to the Doc, Span and Token objects. In this post, we'll introduce you to the new functionality, and finish with an example extension package, spacymoji.]]> https://explosion.ai/blog/spacy-v2-pipelines-extensions blog:spacy-v2-pipelines-extensions <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Ines Montani]]> Mon, 16 Oct 2017 00:00:00 GMT <![CDATA[Training an insults classifier with Prodigy in ~1 hour]]> <![CDATA[In this video, we’ll show you how to use Prodigy to train a classifier to detect disparaging or insulting comments. Prodigy makes text classification particularly powerful, because you can try out new ideas very quickly.]]> https://www.youtube.com/watch?v=5di0KlKl0fE https://www.youtube.com/watch?v=5di0KlKl0fE <![CDATA[video]]> <![CDATA[prodigy]]> <![CDATA[spacy]]> <![CDATA[annotation]]> <![CDATA[Ines Montani]]> Wed, 06 Sep 2017 00:00:00 GMT <![CDATA[Why Python’s the best language for AI (and how to make it even better)]]> https://www.youtube.com/watch?v=yJR3qCUB27I&ab_channel=PyConIsrael event:pycon-israel-2017 <![CDATA[talk]]> <![CDATA[Matthew Honnibal]]> Tue, 13 Jun 2017 00:00:00 GMT <![CDATA[Pseudo-rehearsal: A simple solution to catastrophic forgetting for NLP]]> <![CDATA[Sometimes you want to fine-tune a pre-trained model to add a new label or correct some specific errors. This can introduce the "catastrophic forgetting" problem. 
Pseudo-rehearsal is a good solution: use the original model to label examples, and mix them through your fine-tuning updates.]]> https://explosion.ai/blog/pseudo-rehearsal-catastrophic-forgetting blog:pseudo-rehearsal-catastrophic-forgetting <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal]]> Wed, 23 Aug 2017 00:00:00 GMT <![CDATA[Building Prodigy: Our new tool for efficient machine teaching]]> <![CDATA[The philosophy behind Prodigy’s features and its cloud-free design.]]> https://ines.io/blog/prodigy-annotation-tool/ https://ines.io/blog/prodigy-annotation-tool/ <![CDATA[blog]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Ines Montani]]> Sat, 05 Aug 2017 00:00:00 GMT <![CDATA[Prodigy: A new tool for radically efficient machine teaching]]> <![CDATA[Machine learning systems are built from both code and data. It's easy to reuse the code but hard to reuse the data, so building AI mostly means doing annotation. This is good, because the examples are how you program the behaviour – the learner itself is really just a compiler. What's not good is the current technology for creating the examples. That's why we're pleased to introduce Prodigy, a downloadable tool for radically efficient machine teaching.]]> https://explosion.ai/blog/prodigy-annotation-tool-active-learning blog:prodigy-annotation-tool-active-learning <![CDATA[blog]]> <![CDATA[prodigy]]> <![CDATA[annotation]]> <![CDATA[Ines Montani, Matthew Honnibal]]> Fri, 04 Aug 2017 00:00:00 GMT <![CDATA[Reflections on running spaCy: commercial open-source NLP]]> <![CDATA[As more and more people and companies are getting involved with open-source software, balancing the expectations of an open community and a traditional provider vs. consumer relationship is becoming increasingly difficult. Are maintainers becoming too authoritarian? Are users becoming too demanding? 
Are large companies selling out open-source?]]> https://ines.io/blog/spacy-commercial-open-source-nlp/ https://ines.io/blog/spacy-commercial-open-source-nlp/ <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Ines Montani]]> Wed, 10 May 2017 00:00:00 GMT <![CDATA[Supervised learning is great — it's data collection that's broken]]> <![CDATA[Short of Artificial General Intelligence, we'll always need some way of specifying what we're trying to compute. Labelled examples are a great way to do that, but the process is often tedious. However, the dissatisfaction with supervised learning is misplaced. Instead of waiting for the unsupervised messiah to arrive, we need to fix the way we're collecting and reusing human knowledge.]]> https://explosion.ai/blog/supervised-learning-data-collection blog:supervised-learning-data-collection <![CDATA[blog]]> <![CDATA[annotation]]> <![CDATA[Ines Montani, Matthew Honnibal]]> Sun, 02 Apr 2017 00:00:00 GMT <![CDATA[Supervised similarity: Learning symmetric relations from duplicate question data]]> <![CDATA[Supervised models for text-pair classification let you create software that assigns a label to two texts, based on some relationship between them. When the relationship is symmetric, it can be useful to incorporate this constraint into the model. This post shows how a siamese convolutional neural network performs on two duplicate question data sets with experimental results.]]> https://explosion.ai/blog/supervised-similarity-siamese-cnn blog:supervised-similarity-siamese-cnn <![CDATA[blog]]> <![CDATA[Matthew Honnibal]]> Wed, 01 Mar 2017 00:00:00 GMT <![CDATA[Deep text-pair classification with Quora's 2017 question dataset]]> <![CDATA[Quora recently released the first dataset from their platform: a set of 400,000 question pairs, with annotations indicating whether the questions request the same information. This data set is large, real, and relevant — a rare combination. 
In this post, I'll explain how to solve text-pair tasks with deep learning, using both new and established tips and technologies.]]> https://explosion.ai/blog/quora-deep-text-pair-classification blog:quora-deep-text-pair-classification <![CDATA[blog]]> <![CDATA[Matthew Honnibal]]> Mon, 13 Feb 2017 00:00:00 GMT <![CDATA[Embed, encode, attend, predict: The new deep learning formula for state-of-the-art NLP models]]> <![CDATA[Over the last six months, a powerful new neural network playbook has come together for Natural Language Processing. The new approach can be summarised as a simple four-step formula: embed, encode, attend, predict. This post explains the components of this new approach, and shows how they're put together in two recent systems.]]> https://explosion.ai/blog/deep-learning-formula-nlp blog:deep-learning-formula-nlp <![CDATA[blog]]> <![CDATA[Matthew Honnibal]]> Thu, 10 Nov 2016 00:00:00 GMT <![CDATA[spaCy v1.0: Deep Learning with custom pipelines and Keras]]> <![CDATA[I'm pleased to announce the 1.0 release of spaCy, the fastest NLP library in the world. By far the best part of the 1.0 release is a new system for integrating custom models into spaCy. This post introduces you to the changes, and shows you how to use the new custom pipeline functionality to add a Keras-powered LSTM sentiment analysis model into a spaCy pipeline.]]> https://explosion.ai/blog/spacy-deep-learning-keras blog:spacy-deep-learning-keras <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal]]> Wed, 19 Oct 2016 00:00:00 GMT <![CDATA[An open-source named entity visualizer for the modern web]]> <![CDATA[Named Entity Recognition is a crucial technology for NLP. Whatever you're doing with text, you usually want to handle names, numbers, dates and other entities differently from regular words. To help you make use of NER, we've released displaCy-ent.js. 
This post explains how the library works, and how to use it.]]> https://explosion.ai/blog/displacy-ent-named-entity-visualizer blog:displacy-ent-named-entity-visualizer <![CDATA[blog]]> <![CDATA[Ines Montani]]> Wed, 05 Oct 2016 00:00:00 GMT <![CDATA[displaCy.js: An open-source NLP visualizer for the modern web]]> <![CDATA[With new offerings from Google, Microsoft and others, there is now a range of excellent cloud APIs for syntactic dependencies. A key part of these services is the interactive demo, where you enter a sentence and see the resulting annotation. We're pleased to announce the release of displaCy.js, a modern and service-independent visualization library. We hope this makes it easy to compare different services, and explore your own in-house models.]]> https://explosion.ai/blog/displacy-js-nlp-visualizer blog:displacy-js-nlp-visualizer <![CDATA[blog]]> <![CDATA[Ines Montani]]> Mon, 03 Oct 2016 00:00:00 GMT <![CDATA[Introducing Explosion AI]]> <![CDATA[The problem with developing a machine learning model is that you don't know how well it'll work until you try, and trying is very expensive. Obviously, this risk is unappealing, but the existing solutions on the market, one-size-fits-all cloud services, are even worse. We're launching Explosion AI to give you a better option.]]> https://explosion.ai/blog/introducing-explosion-ai blog:introducing-explosion-ai <![CDATA[blog]]> <![CDATA[Matthew Honnibal, Ines Montani]]> Mon, 03 Oct 2016 00:00:00 GMT <![CDATA[How front-end development can improve Artificial Intelligence]]> <![CDATA[What's holding back Artificial Intelligence? While researchers rightly focus on better algorithms, there are a lot more things to be done. 
In this post I'll discuss three ways in which front-end development can improve AI technology: by improving the collection of annotated data, communicating the capabilities of the technology to key stakeholders, and exploring the system's behaviours and errors.]]> https://explosion.ai/blog/how-front-end-can-improve-ai blog:how-front-end-can-improve-ai <![CDATA[blog]]> <![CDATA[Ines Montani]]> Mon, 22 Aug 2016 00:00:00 GMT <![CDATA[A natural language user interface is just a user interface]]> <![CDATA[Let’s say you’re writing an application, and you want to give it a conversational interface: your users will type some command, and your application will do something in response, possibly after asking for clarification.]]> https://explosion.ai/blog/natural-user-interface blog:natural-user-interface <![CDATA[blog]]> <![CDATA[Matthew Honnibal]]> Tue, 28 Jun 2016 00:00:00 GMT <![CDATA[SyntaxNet in context: Understanding Google's new TensorFlow NLP model]]> <![CDATA[Yesterday, Google open-sourced their TensorFlow-based dependency parsing library, SyntaxNet. The library gives access to a line of neural network parsing models published by Google researchers over the last two years. I've been following this work closely since it was published, and have been looking forward to the software being released. This post tries to provide some context around the release: what's new here, and how important is it?]]> https://explosion.ai/blog/syntaxnet-in-context blog:syntaxnet-in-context <![CDATA[blog]]> <![CDATA[Matthew Honnibal]]> Fri, 13 May 2016 00:00:00 GMT <![CDATA[Multi-threading spaCy's parser and named entity recognizer]]> <![CDATA[In v0.100.3, we quietly rolled out support for GIL-free multi-threading for spaCy's syntactic dependency parsing and named entity recognition models. Because these models take up a lot of memory, we've wanted to release the global interpreter lock (GIL) around them for a long time. 
When we finally did, it seemed a little too good to be true, so we delayed celebration — and then quickly moved on to other things. It's now past time for a write-up.]]> https://explosion.ai/blog/multithreading-with-cython blog:multithreading-with-cython <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal]]> Wed, 11 May 2016 00:00:00 GMT <![CDATA[spaCy now speaks German]]> <![CDATA[Many people have asked us to make spaCy available for their language. As we're based in Berlin, German was an obvious choice for our first second language. Now spaCy can do all the cool things you use for processing English on German text too. But more importantly, teaching spaCy to speak German required us to drop some comfortable but English-specific assumptions about how language works and made spaCy fit to learn more languages in the future.]]> https://explosion.ai/blog/german-model blog:german-model <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Explosion]]> Mon, 09 May 2016 00:00:00 GMT <![CDATA[Statistical NLP in the Ten Hundred Most Common English Words]]> <![CDATA[When I was little, my favorite TV shows all had talking computers. Now I’m big and there are still no talking computers, so I’m trying to make some myself. Well, we can make computers say things. But when we say things back, they don’t really understand. Why not?]]> https://explosion.ai/blog/eli5-computers-learn-reading blog:eli5-computers-learn-reading <![CDATA[blog]]> <![CDATA[Matthew Honnibal]]> Mon, 04 Apr 2016 00:00:00 GMT <![CDATA[Sense2vec with spaCy and Gensim]]> <![CDATA[If you were doing text analytics in 2015, you were probably using word2vec. Sense2vec (Trask et al., 2015) is a new twist on word2vec that lets you learn more interesting, detailed and context-sensitive word vectors. 
This post motivates the idea, explains our implementation, and comes with an interactive demo that we've found surprisingly addictive.]]> https://explosion.ai/blog/sense2vec-with-spacy blog:sense2vec-with-spacy <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal]]> Mon, 15 Feb 2016 00:00:00 GMT <![CDATA[Dead Code Should Be Buried]]> <![CDATA[Natural Language Processing moves fast, so maintaining a good library means constantly throwing things away. Most libraries are failing badly at this, as academics hate to editorialize. This post explains the problem, why it's so damaging, and why I wrote spaCy to do things differently.]]> https://explosion.ai/blog/dead-code-should-be-buried blog:dead-code-should-be-buried <![CDATA[blog]]> <![CDATA[Matthew Honnibal]]> Fri, 04 Sep 2015 00:00:00 GMT <![CDATA[How spaCy Works]]> <![CDATA[This post was pushed out in a hurry, immediately after spaCy was released. It explains some of how spaCy is designed and implemented, and provides some quick notes explaining which algorithms were used. The post pre-dates spaCy's named entity recogniser, but it provides some detail about the tokenisation algorithm, general design, and efficiency concerns.]]> https://explosion.ai/blog/how-spacy-works blog:how-spacy-works <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal]]> Thu, 19 Feb 2015 00:00:00 GMT <![CDATA[Introducing spaCy]]> <![CDATA[Computers don't understand text. This is unfortunate, because that's what the web almost entirely consists of. We want to recommend people text based on other text they liked. We want to shorten text to display it on a mobile screen. We want to aggregate it, link it, filter it, categorise it, generate it and correct it. 
spaCy provides a library of utility functions that help programmers build such products.]]> https://explosion.ai/blog/introducing-spacy blog:introducing-spacy <![CDATA[blog]]> <![CDATA[spacy]]> <![CDATA[Matthew Honnibal]]> Thu, 19 Feb 2015 00:00:00 GMT <![CDATA[Writing C in Cython]]> <![CDATA[For the last two years, I’ve done almost all of my work in Cython. And I don’t mean, I write Python, and then “Cythonize” it, with various type-declarations et cetera. I just, write Cython. I use "raw" C structs and arrays, and occasionally C++ vectors, with a thin wrapper around malloc/free that I wrote myself. The code is almost always exactly as fast as C/C++, because that's really all it is, but with Python right there, if I want it.]]> https://explosion.ai/blog/writing-c-in-cython blog:writing-c-in-cython <![CDATA[blog]]> <![CDATA[Matthew Honnibal]]> Tue, 21 Oct 2014 00:00:00 GMT <![CDATA[Parsing English in 500 Lines of Python]]> <![CDATA[This post explains how transition-based dependency parsers work, and argues that this algorithm represents a breakthrough in natural language understanding. A concise sample implementation is provided, in 500 lines of Python, with no external dependencies. This post was written in 2013. As of 2015, this type of parser is increasingly dominant.]]> https://explosion.ai/blog/parsing-english-in-python blog:parsing-english-in-python <![CDATA[blog]]> <![CDATA[Matthew Honnibal]]> Wed, 18 Dec 2013 00:00:00 GMT <![CDATA[A Good Part-of-Speech Tagger in about 200 Lines of Python]]> <![CDATA[Up-to-date knowledge about natural language processing is mostly locked away in academia. And academics are mostly pretty self-conscious when we write. We’re careful. We don’t want to stick our necks out too much. 
But under-confident recommendations suck, so here’s how to write a good part-of-speech tagger.]]> https://explosion.ai/blog/part-of-speech-pos-tagger-in-python blog:part-of-speech-pos-tagger-in-python <![CDATA[blog]]> <![CDATA[Matthew Honnibal]]> Wed, 18 Sep 2013 00:00:00 GMT