
Extracting Related Date And Location From A Sentence

I'm working with written text (paragraphs of articles and books) that includes both locations and dates. I want to extract from the texts pairs that contain locations and dates tha

Solution 1:

This looks like a Named Entity Recognition (NER) problem. The steps below set up the Stanford NER tagger and use it from Python. For a detailed walkthrough, please refer to this article.

  1. Download Stanford NER from here
  2. Unzip the downloaded archive and save the extracted folder on a local drive
  3. Copy “stanford-ner.jar” from the folder and save a copy just outside the folder.
  4. Download the caseless models from https://stanfordnlp.github.io/CoreNLP/history.html by clicking on “caseless”. The models from the first link also work; however, the caseless models help identify named entities even when they are not capitalized as formal grammar rules require.
  5. Run the following Python code. Please note that this code was tested on a 64-bit Windows 10 machine with Python 2.7.

Note: Please ensure that every path in the code is updated to the corresponding path on your local machine.

# Import all the required libraries.
import os
from nltk.tag import StanfordNERTagger
import pandas as pd  # imported in the original answer; not used below

# Set environment variables programmatically.
# Set the classpath to the path where the jar file is located.
os.environ['CLASSPATH'] = "<your path>/stanford-ner-2015-04-20/stanford-ner.jar"
# Set the Stanford models to the path where the models are stored.
os.environ['STANFORD_MODELS'] = '<your path>/stanford-corenlp-caseless-2015-04-20-models/edu/stanford/nlp/models/ner'
# Set the Java JDK path. This code worked with this particular JDK.
java_path = "C:/Program Files/Java/jdk1.8.0_191/bin/java.exe"
os.environ['JAVAHOME'] = java_path

# Set the path to the model that you would like to use.
stanford_classifier = '<your path>/stanford-corenlp-caseless-2015-04-20-models/edu/stanford/nlp/models/ner/english.muc.7class.caseless.distsim.crf.ser.gz'
# Build the NER tagger object.
st = StanfordNERTagger(stanford_classifier)

# A sample text for NER tagging.
text = 'The man left Amsterdam on January and reached Nepal on October 21st'
# Tag the sentence and print the output.
tagged = st.tag(str(text).split())
print(tagged)
# [(u'The', u'O'), (u'man', u'O'), (u'left', u'O'),
#  (u'Amsterdam', u'LOCATION'), (u'on', u'O'), (u'January', u'DATE'),
#  (u'and', u'O'), (u'reached', u'O'), (u'Nepal', u'LOCATION'),
#  (u'on', u'O'), (u'October', u'DATE'), (u'21st', u'DATE')]

This approach tags locations and dates correctly in the majority of cases.
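The tagger only labels individual tokens; it does not say which date belongs to which location. One possible way to get from the tagged tokens to pairs is sketched below. It is a simple heuristic, not part of Stanford NER or NLTK: the helper names `merge_entities` and `pair_locations_with_dates` are hypothetical, and the pairing rule (each location is matched with the next date that follows it) is an assumption that fits sentences shaped like the sample one. The `tagged` list is hard-coded from the output shown above so the sketch runs without Java or the models.

```python
from itertools import groupby


def merge_entities(tagged):
    """Collapse consecutive tokens that share a non-'O' tag into one entity.

    Note: two distinct entities of the same type that sit directly next to
    each other would be merged as well; this sketch does not handle that.
    """
    entities = []
    for tag, group in groupby(tagged, key=lambda pair: pair[1]):
        if tag != 'O':
            entities.append((' '.join(token for token, _ in group), tag))
    return entities


def pair_locations_with_dates(entities):
    """Pair each LOCATION with the first DATE that follows it (assumed rule)."""
    pairs = []
    current_location = None
    for text, tag in entities:
        if tag == 'LOCATION':
            current_location = text
        elif tag == 'DATE' and current_location is not None:
            pairs.append((current_location, text))
            current_location = None
    return pairs


# Tagger output copied from the example above.
tagged = [('The', 'O'), ('man', 'O'), ('left', 'O'),
          ('Amsterdam', 'LOCATION'), ('on', 'O'), ('January', 'DATE'),
          ('and', 'O'), ('reached', 'O'), ('Nepal', 'LOCATION'),
          ('on', 'O'), ('October', 'DATE'), ('21st', 'DATE')]

entities = merge_entities(tagged)
print(pair_locations_with_dates(entities))
# [('Amsterdam', 'January'), ('Nepal', 'October 21st')]
```

Sentences where the date comes before the location, or where several locations share one date, would need a different rule (for example, pairing by token distance or by a dependency parse).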
