spaCy - Introduction
In this chapter, we will look at the features, extensions and visualisers of spaCy. A feature comparison is also provided to help readers analyse the functionalities provided by spaCy as compared to the Natural Language Toolkit (NLTK) and CoreNLP. Here, NLP refers to Natural Language Processing.
What is spaCy?
spaCy, developed by the software developers Matthew Honnibal and Ines Montani, is an open-source software library for advanced NLP. It is written in Python and Cython (a C extension of Python, mainly designed to give C-like performance to Python programs).
spaCy is a relatively new framework, but it is one of the most powerful and advanced libraries used to implement NLP.
Features
Some of the features of spaCy that make it popular are explained below −
Fast − spaCy is specially designed to be as fast as possible.
Accurate − spaCy's labelled dependency parser makes it one of the most accurate frameworks of its kind (within 1% of the best available).
Batteries included − The batteries included in spaCy are as follows −
Index preserving tokenization.
“Alpha tokenization” support for more than 50 languages.
Part-of-speech tagging.
Pre-trained word vectors.
Built-in, easy-to-use visualizers for named entities and syntax.
Text classification.
Extensible − You can easily use spaCy with other existing tools like TensorFlow, Gensim, scikit-learn, etc.
Deep learning integration − It has Thinc, a deep learning framework designed for NLP tasks.
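As an illustration of the index-preserving tokenization listed above, the following is a minimal sketch rather than a definitive recipe. It assumes spaCy is installed and uses `spacy.blank("en")`, which builds a tokenizer-only English pipeline with no trained model. The sample sentence is an assumption made for the example. Each token's `idx` attribute is its character offset in the original text, so the original string can be sliced back out token by token.

```python
import spacy

# Tokenizer-only English pipeline; no trained model is downloaded or loaded.
nlp = spacy.blank("en")

# Sample text chosen for this sketch (an assumption, not from any dataset).
text = "spaCy is written in Python and Cython."
doc = nlp(text)

for token in doc:
    start = token.idx            # character offset of the token in `text`
    end = start + len(token)     # len(token) is the token's length in characters
    # Slicing the original string with these offsets gives back the token text.
    print(f"{token.text!r:12} chars {start}-{end} -> {text[start:end]!r}")

# The Doc also reproduces the original string exactly, whitespace included.
print(doc.text == text)
```

Part-of-speech tagging, dependency parsing and named entity recognition need a trained pipeline (for example one loaded with `spacy.load`), which this sketch deliberately leaves out.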
Extensions and visualisers
Some of the easy-to-use extensions and visualisers that come with spaCy, all of them free, open-source libraries, are listed below −
Thinc − It is a Machine Learning (ML) library optimised for Central Processing Unit (CPU) usage. It is also designed for deep learning with text input and NLP tasks.
sense2vec − This library is for computing word similarities. It is based on Word2vec.
displaCy − It is an open-source dependency parse tree visualiser. It is built with JavaScript, CSS (Cascading Style Sheets), and SVG (Scalable Vector Graphics).
displaCy ENT − It is a built-in named entity visualiser that comes with spaCy. It is built with JavaScript and CSS. It lets users check their model's predictions in the browser.
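To give a feel for the two visualisers, here is a hedged sketch that assumes spaCy is installed and uses the `displacy.render` function from Python. To avoid depending on a trained model, it passes hand-written annotations with `manual=True`. The sentences, entity offsets and dependency arcs below are illustrative assumptions, not model predictions.

```python
from spacy import displacy

# Hand-written entity annotations (hypothetical example data, not model output).
# "start" and "end" are character offsets into "text".
ent_data = {
    "text": "Matthew Honnibal and Ines Montani develop spaCy.",
    "ents": [
        {"start": 0, "end": 16, "label": "PERSON"},
        {"start": 21, "end": 33, "label": "PERSON"},
    ],
    "title": None,
}

# Hand-written dependency parse (also hypothetical).
# "start" and "end" are word indices; "dir" is the arrow direction.
dep_data = {
    "words": [
        {"text": "spaCy", "tag": "PROPN"},
        {"text": "parses", "tag": "VERB"},
        {"text": "text", "tag": "NOUN"},
    ],
    "arcs": [
        {"start": 0, "end": 1, "label": "nsubj", "dir": "left"},
        {"start": 1, "end": 2, "label": "dobj", "dir": "right"},
    ],
}

# manual=True tells displaCy to render the dictionaries as given;
# jupyter=False makes render() return the markup as a string.
ent_html = displacy.render(ent_data, style="ent", manual=True, page=False, jupyter=False)
dep_svg = displacy.render(dep_data, style="dep", manual=True, page=False, jupyter=False)

print(ent_html[:80])
print(dep_svg[:80])
```

The returned strings can be written to an `.html` or `.svg` file and opened in a browser. With a trained pipeline, a processed `Doc` would normally be passed instead of manual dictionaries.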
Feature Comparison
The following table compares the functionalities provided by spaCy, NLTK, and CoreNLP −
Features | spaCy | NLTK | CoreNLP |
---|---|---|---|
Python API | Yes | Yes | No |
Easy installation | Yes | Yes | Yes |
Multi-language Support | Yes | Yes | Yes |
Integrated word vectors | Yes | No | No |
Tokenization | Yes | Yes | Yes |
Part-of-speech tagging | Yes | Yes | Yes |
Sentence segmentation | Yes | Yes | Yes |
Dependency parsing | Yes | No | Yes |
Entity Recognition | Yes | Yes | Yes |
Entity linking | Yes | No | No |
Coreference Resolution | No | No | Yes |
Benchmarks
At the time of these benchmarks, spaCy was reported to have the fastest syntactic parser, with accuracy within 1% of the best available system.
The following table shows the benchmark results for spaCy and other systems −
System | Year | Language | Accuracy |
---|---|---|---|
spaCy v2.x | 2017 | Python and Cython | 92.6 |
spaCy v1.x | 2015 | Python and Cython | 91.8 |
ClearNLP | 2015 | Java | 91.7 |
CoreNLP | 2015 | Java | 89.6 |
MATE | 2015 | Java | 92.5 |
Turbo | 2015 | C++ | 92.4 |