spaCy within MLflow
spaCy is an industrial-strength natural language processing library designed from the ground up for production use. Created by Explosion AI, spaCy combines current research with practical engineering to deliver fast, accurate NLP pipelines that power applications such as chatbots, content analysis, document processing, and knowledge extraction systems.
spaCy's production-first philosophy sets it apart from academic NLP libraries. With its streamlined API, extensive pre-trained models, and robust pipeline architecture, spaCy enables developers to build sophisticated NLP applications without sacrificing speed or maintainability.
Logging spaCy Models to MLflow
Basic Model Logging
MLflow provides native support for spaCy models through the mlflow.spacy.log_model() function:
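```python
import mlflow
import spacy

# Load or train your spaCy model
# (the pretrained package must be installed first: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# Log the model to MLflow
with mlflow.start_run():
    mlflow.spacy.log_model(nlp, name="spacy_model")
```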
What Gets Automatically Captured
Model Components & Architecture
- 🧠 Pipeline Components: The tokenizer and every pipeline component, such as the tagger, parser, NER, and text categorizer
- 📐 Model Configuration: Architecture details, hyperparameters, and component settings
- 🎯 Component Metadata: Individual component configurations and performance metrics