spaCy - Getting Started
  • 时间:2024-10-18


This chapter introduces the latest version of spaCy: its new features and improvements, its compatibility, and how to install it.

Latest version

spaCy v3.0 is the latest version and is available as a nightly release. This is an experimental alpha release, distributed through a separate channel named spacy-nightly. It reflects “future spaCy” and should not be used in production.

To prevent potential conflicts, use a fresh virtual environment.

Use the following pip command to install it −


pip install spacy-nightly --pre

New Features and Improvements

The new features and improvements in the latest version of spaCy are explained below −

Transformer-based pipelines

spaCy v3.0 features all-new transformer-based pipelines with support for multi-task learning. These pipelines make spaCy one of the most accurate frameworks (within 1% of the best available).

Because spaCy’s transformer support interoperates with other frameworks like PyTorch and HuggingFace transformers, you can access thousands of pretrained models for your pipeline.

New training workflow and config system

spaCy v3.0 uses a single configuration file to describe a training run.

There are no hidden defaults, which makes it easy to rerun experiments and track changes.

Custom models using any ML framework

The new configuration system of spaCy v3.0 makes it easy to customise neural network (NN) models and implement your own architectures via the ML library Thinc.

Manage end-to-end workflows and projects

spaCy projects let us manage and share end-to-end workflows for various use cases and domains.

They also let us organise training, packaging, and serving of our custom pipelines.

We can also integrate with other data science and ML tools like DVC (Data Version Control), Prodigy, Streamlit, FastAPI, Ray, etc.

Parallel training and distributed computing with Ray

To speed up the training process, we can use Ray, a fast and simple framework for building and running distributed applications, to train spaCy on one or more remote machines.

New built-in pipeline components

spaCy v3.0 introduces new trainable and rule-based components that we can add to our pipeline.

These components are as follows −

    SentenceRecognizer

    Morphologizer

    Lemmatizer

    AttributeRuler

    Transformer

    TrainablePipe

New pipeline component API

spaCy v3.0 provides a new and improved pipeline component API, along with decorators, which make defining, configuring, reusing, training, and analyzing components easier and more convenient.
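As a rough illustration of the decorator style, the sketch below registers a small stateless component with Language.component and adds it to a blank English pipeline by name. The component name token_counter and the user_data key are hypothetical names chosen for this example, and the imports sit inside the function only so that the snippet can be defined before spaCy v3 is installed.

```python
def build_pipeline():
    """Build a blank English pipeline with one custom component (sketch)."""
    # Lazy imports: assumes spaCy v3 is available when the function is called.
    import spacy
    from spacy.language import Language

    # Register a function component under a string name.
    # "token_counter" is a hypothetical name used only in this example.
    @Language.component("token_counter")
    def token_counter(doc):
        # Store the token count on the Doc; the component must return the Doc.
        doc.user_data["token_count"] = len(doc)
        return doc

    nlp = spacy.blank("en")
    # In v3, components are added by their registered name.
    nlp.add_pipe("token_counter")
    return nlp
```

Calling build_pipeline() once and then running nlp("Hello world") should leave the token count in doc.user_data, and the component name should appear in nlp.pipe_names.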

Dependency matching

spaCy v3.0 provides the new DependencyMatcher, which lets us match patterns within the dependency parse. It uses Semgrex operators.
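The following is a minimal sketch of a DependencyMatcher pattern, assuming spaCy v3. To avoid depending on a trained model, it builds a Doc with a hand-written parse; the sentence, the pattern name SUBJECT_OF_VERB, and the function name are illustrative choices, not part of spaCy.

```python
def find_subject_of(verb_text="founded"):
    """Return tuples of token texts matched by a verb -> subject pattern."""
    # Lazy imports: assumes spaCy v3 is available when the function is called.
    from spacy.matcher import DependencyMatcher
    from spacy.tokens import Doc
    from spacy.vocab import Vocab

    vocab = Vocab()
    # Hand-annotated parse for illustration; heads are absolute token indices.
    doc = Doc(
        vocab,
        words=["Smith", "founded", "a", "company"],
        heads=[1, 1, 3, 1],
        deps=["nsubj", "ROOT", "det", "dobj"],
    )

    pattern = [
        # Anchor node: the verb itself.
        {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": verb_text}},
        # The ">" operator: "verb" is the immediate head of "subject".
        {
            "LEFT_ID": "verb",
            "REL_OP": ">",
            "RIGHT_ID": "subject",
            "RIGHT_ATTRS": {"DEP": "nsubj"},
        },
    ]

    matcher = DependencyMatcher(vocab)
    matcher.add("SUBJECT_OF_VERB", [pattern])

    # Each match is (match_id, token_ids); convert the ids to token texts.
    return [
        tuple(doc[i].text for i in token_ids)
        for _match_id, token_ids in matcher(doc)
    ]
```

In a real pipeline the Doc would come from a trained parser rather than a hand-built parse, so the dependency labels available to the pattern depend on the model in use.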

New and updated documentation

It has new and updated documentation including −

    A new usage guide on embeddings, transformers, and transfer learning.

    A guide on training pipelines and models.

    Details about the new spaCy projects and updated usage documentation on custom pipeline components.

    New illustrations and new API reference pages documenting spaCy’s ML model architectures and expected data formats.

Compatibility

spaCy can run on all major operating systems such as Windows, macOS/OS X, and Unix/Linux. It is compatible with 64-bit CPython 2.7/3.5+ versions.
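One way to check these requirements before installing is a short standard-library script. This is only a sketch: the function names are made up for this example, and the version range simply mirrors the one quoted above.

```python
import platform
import struct
import sys


def version_supported(version):
    """True for Python 2.7 or 3.5 and later, the range quoted above."""
    major_minor = tuple(version[:2])
    return major_minor == (2, 7) or major_minor >= (3, 5)


def interpreter_problems():
    """Return a list of compatibility problems; an empty list means none found."""
    problems = []
    implementation = platform.python_implementation()
    if implementation != "CPython":
        problems.append("interpreter is %s, not CPython" % implementation)
    # The size of a C pointer tells us whether this is a 64-bit build.
    if struct.calcsize("P") * 8 != 64:
        problems.append("interpreter is not a 64-bit build")
    if not version_supported(sys.version_info):
        problems.append("unsupported Python version %d.%d" % sys.version_info[:2])
    return problems


if __name__ == "__main__":
    for line in interpreter_problems() or ["no problems found"]:
        print(line)
```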

Installing spaCy

The different options to install spaCy are explained below −

Using package manager

The latest release versions of spaCy are available through both package managers, pip and conda. Let us check out how we can use them to install spaCy −

pip − To install spaCy using pip, you can use the following command −


pip install -U spacy

To avoid modifying system state, it is recommended to install spaCy packages in a virtual environment as follows −


python -m venv .env
source .env/bin/activate
pip install spacy

conda − To install spaCy via conda-forge, you can use the following command −


conda install -c conda-forge spacy

From source

You can also install spaCy by cloning its GitHub repository and building it from source. It is the most common way to make changes to the code base.

For this, you need a Python distribution that includes the following −

    Header files

    A compiler

    pip

    virtualenv

    git

Use the following commands −

First, update pip as follows −


python -m pip install -U pip

Now, clone spaCy with the command given below −


git clone https://github.com/explosion/spaCy

Now, navigate into the directory with the following command −


cd spaCy

Next, create a virtual environment in .env, as shown below −


python -m venv .env

Now, activate the virtual environment created above −


source .env/bin/activate

Next, set the Python path to the spaCy directory as follows −


export PYTHONPATH=`pwd`

Now, install all requirements as follows −


pip install -r requirements.txt

Finally, compile spaCy −


python setup.py build_ext --inplace

Ubuntu

Use the following command to install system-level dependencies in Ubuntu Operating System (OS) −


sudo apt-get install build-essential python-dev git

macOS/OS X

macOS and OS X come with Python and git preinstalled, so we only need to install a recent version of Xcode, including the Command Line Tools (CLT).

Windows

The table below lists the Visual C++ Build Tools or Visual Studio Express version that matches each official distribution of the Python interpreter. Choose the one that fits your requirements and install it −

Distribution    Version
Python 2.7      Visual Studio 2008
Python 3.4      Visual Studio 2010
Python 3.5+     Visual Studio 2015

Upgrading spaCy

The following points should be kept in mind while upgrading spaCy −

    Start with a clean virtual environment.

    For upgrading spaCy to a new major version, you must have the latest compatible models installed.

    There should be no old shortcut links or incompatible model packages in your virtual environment.

    If you have trained your own models, the training and runtime inputs must match, i.e. you must retrain your models with the newer version as well.

spaCy v2.0 and above provides a validate command, which lets the user verify whether all the installed models are compatible with the installed spaCy version.

If there are any incompatible models, the validate command prints tips and installation instructions. This command can also detect out-of-sync model links created in various virtual environments.

You can use the validate command as follows −


pip install -U spacy
python -m spacy validate

In the above command, python -m is used to make sure that we are executing the correct version of spaCy.
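The same idea can be applied from a script: building the command around sys.executable ensures that the spaCy being validated belongs to the interpreter that is currently running. The sketch below uses only the standard library; the helper names spacy_cli_command and run_validate are hypothetical.

```python
import subprocess
import sys


def spacy_cli_command(*args):
    """Build a spaCy CLI invocation bound to the current interpreter."""
    # sys.executable plays the role of "python" in "python -m spacy ...".
    return [sys.executable, "-m", "spacy"] + list(args)


def run_validate():
    """Run "spacy validate" and return its exit code (requires spaCy)."""
    completed = subprocess.run(spacy_cli_command("validate"), check=False)
    return completed.returncode
```

A non-zero return code from run_validate() may simply mean that spaCy is not installed for this interpreter, so it is worth reading the command's output rather than relying on the code alone.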

Running spaCy with GPU

spaCy v2.0 and above comes with neural network (NN) models that are implemented in Thinc. GPU support builds on Chainer’s CuPy module, which provides a numpy-compatible interface for GPU arrays.

You can install spaCy on GPU by specifying the following −

    spaCy[cuda]

    spaCy[cuda90]

    spaCy[cuda91]

    spaCy[cuda92]

    spaCy[cuda100]

    spaCy[cuda101]

    spaCy[cuda102]

If you know your CUDA version, using the more explicit specifier allows CuPy to be installed from a wheel, which saves compilation time.

Use the following command for the installation −


pip install -U spacy[cuda92]

After a GPU-enabled installation, activate it by calling spacy.prefer_gpu or spacy.require_gpu as follows −


import spacy
spacy.prefer_gpu()
nlp_model = spacy.load("en_core_web_sm")
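The difference between the two calls is that spacy.prefer_gpu() returns a boolean and falls back to the CPU when no GPU is available, whereas spacy.require_gpu() raises an error instead. The sketch below wraps the fallback behaviour in a helper; the function name load_pipeline and the status strings are assumptions made for this example.

```python
def load_pipeline(model_name="en_core_web_sm"):
    """Load a model, using the GPU if possible.

    Returns (nlp, status); nlp is None when spaCy or the model is missing.
    """
    try:
        import spacy
    except ImportError:
        return None, "spaCy is not installed"

    # prefer_gpu() returns True if a GPU was activated, False otherwise.
    # It should be called before the model is loaded.
    on_gpu = spacy.prefer_gpu()

    try:
        nlp = spacy.load(model_name)
    except OSError:
        # spacy.load raises OSError when the model package cannot be found.
        return None, "model %s is not installed" % model_name

    return nlp, "gpu" if on_gpu else "cpu"
```

Use spacy.require_gpu() in place of prefer_gpu() when running on the CPU should be treated as an error rather than a silent fallback.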