Rule-Based Matching in spaCy

Rule-based matching is a very useful feature in spaCy. It lets you extract information from a document using a token pattern or a combination of patterns.
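To make "a combination of patterns" concrete, here is a minimal sketch. The sentence, the "COUNTRY" label, and the use of spacy.blank("en") (a tokenizer-only pipeline, chosen so that no model download is needed) are my own assumptions for illustration. Each pattern is a list of dictionaries, one per token, and several patterns can be registered under one label:

```python
import spacy
from spacy.matcher import Matcher

# Tokenizer-only English pipeline (assumption: enough for text-based patterns)
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# One dict per token: a one-token pattern and a two-token pattern
single_token = [{"LOWER": "america"}]
two_tokens = [{"LOWER": "united"}, {"LOWER": "states"}]

# Both patterns are registered under the same (hypothetical) label
matcher.add("COUNTRY", [single_token, two_tokens])

doc = nlp("America is also called the United States.")

# Each match is a (match_id, start, end) tuple of token indices
found = sorted(doc[start:end].text for _, start, end in matcher(doc))
print(found)
```

A match for either pattern is reported under the shared label, so related phrasings can be grouped without writing separate matchers.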

I will use an Obama speech from http://obamaspeeches.com/ as an illustration. Suppose we want to count how many times Obama said “America” in this speech. With the speech saved as obama.txt, you can use the rule-based Matcher in spaCy to parse the text and count the matches as follows:

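import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")

# Match every token whose exact text is "America" (case-sensitive)
matcher = Matcher(nlp.vocab)
pattern = [{"TEXT": "America"}]
matcher.add("Obama", [pattern])

# Read the saved speech (assumes the file is UTF-8 encoded)
with open("obama.txt", encoding="utf-8") as f:
    text = f.read()

doc = nlp(text)
matches = matcher(doc)  # list of (match_id, start, end) tuples
count = len(matches)
print("No of times Obama used America is", count)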

Output:
No of times Obama used America is 10
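Note that the "TEXT" attribute matches the exact token text, so the count is case-sensitive. The sketch below compares it with the "LOWER" attribute on a made-up sample string (not taken from the speech); the helper count_matches and the use of spacy.blank("en") are my own assumptions for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Tokenizer-only pipeline; TEXT and LOWER need no trained model
nlp = spacy.blank("en")

# Hypothetical sample text, not from the speech
sample = "America is strong. AMERICA is united. Americans love America."

def count_matches(pattern):
    """Count how often a single pattern matches in the sample text."""
    matcher = Matcher(nlp.vocab)
    matcher.add("AMERICA", [pattern])
    return len(matcher(nlp(sample)))

exact = count_matches([{"TEXT": "America"}])      # exact token text only
any_case = count_matches([{"LOWER": "america"}])  # compares lowercased text

print("TEXT matches:", exact)      # 2: "AMERICA" is skipped
print("LOWER matches:", any_case)  # 3: "AMERICA" is included
```

In both cases "Americans" is not counted, because it is a different token. Which attribute is right depends on whether differently cased mentions should count for your purpose.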



May 23, 2021

NLP, spacy