Rule-Based Matching in spaCy
Rule-based matching is one of spaCy's most useful features. It lets you extract information from a document by describing the tokens you are looking for with a pattern or a combination of patterns.
As an illustration, I will use an Obama speech from http://obamaspeeches.com/, saved locally as obama.txt, and count the number of times Obama said “America” in it. You can use spaCy's rule-based Matcher to parse the text and extract this information as follows:
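import spacy
from spacy.matcher import Matcher
# Load the small English pipeline and create a matcher that shares its vocabulary
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
# Match any single token whose exact text is "America" (case-sensitive)
pattern = [{"TEXT": "America"}]
matcher.add("Obama", [pattern])
# Read the speech and run it through the pipeline
with open("obama.txt") as f:
    text = f.read()
doc = nlp(text)
# Each match is a (match_id, start, end) tuple, so the count is the list length
matches = matcher(doc)
count = len(matches)
print("Number of times Obama used America is", count)
Output: Number of times Obama used America is 10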
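The same idea extends to a combination of patterns. The sketch below is only an illustration under a few assumptions: it uses a blank English pipeline (tokenizer only) instead of en_core_web_sm, a made-up sample sentence rather than the speech, and a hypothetical rule name "COUNTRY". It registers two patterns under one key and prints the matched text instead of only counting.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough here, because the
# LOWER attribute only needs tokenization, not a trained model.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Two alternative patterns under one hypothetical key "COUNTRY":
# the single token "america" in any casing, or the phrase "united states".
patterns = [
    [{"LOWER": "america"}],
    [{"LOWER": "united"}, {"LOWER": "states"}],
]
matcher.add("COUNTRY", patterns)

# Made-up sample text, not taken from the speech.
doc = nlp("America is strong. The United States of America will endure.")

# Each match is a (match_id, start, end) tuple of token indices,
# so slicing the doc gives the matched span.
found = [doc[start:end].text for _, start, end in matcher(doc)]
print(found)
print(len(found))
```

Using LOWER rather than TEXT makes the match case-insensitive, which may or may not be what you want when counting a proper noun such as “America”.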
May 23, 2021