NATURAL LANGUAGE PROCESSING WITH PYTHON
Named Entity Extraction Project with spaCy
Robert is an Assistant Professor of Computer Science at Monmouth College (US).
He has worked as a research intern and software engineer at Huawei and Cerner Corporation.
Robert holds a PhD in Computer Science (Washington University in St. Louis) and a Bachelor of Science in Mathematics and Computer Science.
This task is known as Named Entity Recognition (NER). It seeks to locate named entity mentions in unstructured text and classify them into predefined categories such as person names, organizations, and locations.
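As a rough illustration of the idea, the sketch below filters the entities of a spaCy document down to people. The function names `person_names` and `extract_people` are hypothetical and are not taken from the project solution. The model name `en_core_web_sm` is an assumption as well, since the project may use a different pipeline. A small stand-in object is used for the demonstration so that the filtering logic can be tried without installing a model.

```python
from types import SimpleNamespace


def person_names(doc):
    """Return the text of every entity labelled PERSON in a spaCy-style doc."""
    return [ent.text for ent in doc.ents if ent.label_ == "PERSON"]


def extract_people(text, model="en_core_web_sm"):
    """Run a spaCy pipeline over `text` and return the person names found.

    Assumes spaCy and the named model are installed, for example via
    `python -m spacy download en_core_web_sm`.
    """
    import spacy  # imported lazily so person_names() works without spaCy

    nlp = spacy.load(model)
    return person_names(nlp(text))


# Stand-in document that mimics the `.ents`, `.text` and `.label_`
# attributes of a real spaCy Doc (illustrative data only).
fake_doc = SimpleNamespace(ents=[
    SimpleNamespace(text="Angela Merkel", label_="PERSON"),
    SimpleNamespace(text="Berlin", label_="GPE"),
    SimpleNamespace(text="Siemens", label_="ORG"),
])
people = person_names(fake_doc)
print(people)
```

With a real model installed, `extract_people("...")` would return whatever the statistical model tags as PERSON, which can include mistakes, so the results are worth inspecting by hand.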
In this project, you are given a single text file containing multiple news articles. The goal is to scan each article for the names of people and print the results according to several formatting conditions (sorting, whitespace, blank lines, etc.).
Additionally, you are asked to implement functions that count, modify, and add characters in the results.
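The exact output rules are defined in the project description, so the following is only a minimal sketch under assumed rules: names are de-duplicated, whitespace is collapsed, and results are sorted alphabetically with one name per line. The helpers `clean_name`, `format_article`, and `count_letters` are hypothetical names chosen for illustration.

```python
import re
from collections import Counter


def clean_name(name):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", name).strip()


def format_article(names):
    """Return unique cleaned names, sorted alphabetically, one per line."""
    unique = sorted({clean_name(n) for n in names if n.strip()})
    return "\n".join(unique)


def count_letters(names):
    """Count how often each letter occurs across all names, ignoring case."""
    return Counter(ch for n in names for ch in n.lower() if ch.isalpha())


# Illustrative input: duplicated, irregularly spaced, and empty entries.
raw = ["Tim  Cook", "Ada\nLovelace", "Tim Cook", "  "]
report = format_article(raw)
letters = count_letters(["Ada", "Bob"])
print(report)
```

Keeping cleaning, formatting, and counting in separate functions makes each rule easy to adjust once the real conditions are known.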
DOWNLOAD / CONTENT
You will receive an email with a password-protected ZIP file and the password to open it. If you are a registered user, the download is also always available in your account.
1) One PDF with the project description along with hints to guide you during the project.
2) A .py file with the solved project. It contains not only the source code but also detailed explanations and comments on how the code works. For specific topics, links to online tutorials are provided.
3) Two .txt files: the output of the code and the dataset of news articles.
WHAT YOU WILL PRACTICE
– Libraries: you will work with the following libraries: spaCy, re, datetime.
– Python functions, loops (for) and conditional statements (if/else).
– Reading CSV and TXT files (with open()), list comprehensions and dictionaries.
– Basic Regular Expressions
– argparse for running the script from the Linux command line.
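Several of the items above can be combined in one small script skeleton. This is a sketch under assumptions, not the project solution: the argument names, the `split_articles` helper, and the idea that articles are separated by blank lines are all hypothetical, and the real dataset may use a different delimiter.

```python
import argparse
import re
from datetime import datetime


def build_parser():
    """Create the command-line interface (argument names are assumptions)."""
    parser = argparse.ArgumentParser(description="Extract names from news articles.")
    parser.add_argument("input", help="path to the text file with the articles")
    parser.add_argument("--encoding", default="utf-8", help="file encoding")
    return parser


def split_articles(text):
    """Split on one or more blank lines; the real delimiter may differ."""
    return [part.strip() for part in re.split(r"\n\s*\n", text) if part.strip()]


def read_articles(path, encoding="utf-8"):
    """Read the whole file and return a list of article strings."""
    with open(path, encoding=encoding) as handle:
        return split_articles(handle.read())


def main(argv=None):
    """Parse arguments, load the articles, and report how long it took."""
    args = build_parser().parse_args(argv)
    started = datetime.now()
    articles = read_articles(args.input, args.encoding)
    elapsed = (datetime.now() - started).total_seconds()
    print(f"{len(articles)} articles read in {elapsed:.3f}s")
    return articles


# Demonstration without touching the file system.
args = build_parser().parse_args(["news.txt"])
parts = split_articles("First article.\n\n\nSecond article.\n")
print(args.input, len(parts))
```

From a Linux shell, calling `main()` at the end of a script saved as, say, `extract.py` would let it be run as `python extract.py news.txt`, where both file names are placeholders.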
If you need additional information, do not hesitate to contact us.