Agile Data Science - Data Enrichment
Data enrichment refers to a range of processes used to enhance, refine and improve raw data. It transforms raw data into useful information. The goal of data enrichment is to make data a valuable asset for a modern business or enterprise.
The most common data enrichment process is the correction of spelling mistakes or typographical errors in a database through the use of specific decision algorithms. Data enrichment tools also add useful information to simple data tables.
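The idea of adding useful information to a simple data table can be sketched as a lookup join. This is only a minimal illustration, not the method of any particular enrichment tool: the `customers` records, the `COUNTRY_INFO` lookup table and the `enrich` helper are all hypothetical names invented for this example.

```python
# Hypothetical raw records: a simple table with only a country code.
customers = [
    {"id": 1, "name": "Asha", "country_code": "IN"},
    {"id": 2, "name": "Bruno", "country_code": "BR"},
    {"id": 3, "name": "Chen", "country_code": "XX"},  # code missing from the lookup
]

# Hypothetical reference data used to enrich the table.
COUNTRY_INFO = {
    "IN": {"country": "India", "continent": "Asia"},
    "BR": {"country": "Brazil", "continent": "South America"},
}


def enrich(records, lookup, key):
    """Return copies of `records` with the fields from `lookup` merged in.

    Records whose key is not present in `lookup` get None for every added field.
    """
    added_fields = sorted({field for info in lookup.values() for field in info})
    enriched = []
    for record in records:
        extra = lookup.get(record[key], {})
        new_record = dict(record)  # copy, so the raw data stays untouched
        for field in added_fields:
            new_record[field] = extra.get(field)
        enriched.append(new_record)
    return enriched


enriched_customers = enrich(customers, COUNTRY_INFO, "country_code")
for row in enriched_customers:
    print(row)
```

Returning new records instead of modifying the originals keeps the raw data available for later checks, and filling unknown codes with `None` makes the gaps in the reference data easy to spot.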
Consider the following code for spell correction of words −
import re
from collections import Counter

def words(text):
    "Split text into a list of lowercase words."
    return re.findall(r'\w+', text.lower())

# Word frequencies taken from the reference text file.
WORDS = Counter(words(open('big.txt').read()))

def P(word, N=sum(WORDS.values())):
    "Probability of `word`."
    return WORDS[word] / N

def correction(word):
    "Most probable spelling correction for `word`."
    return max(candidates(word), key=P)

def candidates(word):
    "Generate possible spelling corrections for `word`."
    return (known([word]) or known(edits1(word)) or known(edits2(word)) or [word])

def known(words):
    "The subset of `words` that appear in the dictionary of WORDS."
    return set(w for w in words if w in WORDS)

def edits1(word):
    "All edits that are one edit away from `word`."
    letters = 'abcdefghijklmnopqrstuvwxyz'
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    "All edits that are two edits away from `word`."
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))

print(correction('speling'))
print(correction('korrectud'))
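This program uses “big.txt”, a text file of correctly spelled words, as its dictionary. For each input word, it generates candidates that are one or two edits away (deletions, transpositions, replacements and insertions), keeps those that appear in the text file, and prints the most frequent one.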
Output
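The above code prints the most probable correction for each of the two misspelled inputs, one per line. The exact words depend on the contents of “big.txt”.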
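Because the program above depends on an external “big.txt” file, the same technique can be tried on a small in-memory corpus. The following is a minimal sketch under that assumption: `SAMPLE_TEXT` is a made-up stand-in for the real word list, and the helper names `probability` and `LETTERS` are chosen for this example only.

```python
import re
from collections import Counter

# Tiny stand-in corpus (an assumption for illustration; the program above reads big.txt).
SAMPLE_TEXT = """
spelling mistakes are corrected by the spelling checker
the checker corrected the data and the data was enriched
"""

WORDS = Counter(re.findall(r"\w+", SAMPLE_TEXT.lower()))
TOTAL = sum(WORDS.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def probability(word):
    """Relative frequency of `word` in the sample corpus (0 if unseen)."""
    return WORDS[word] / TOTAL


def edits1(word):
    """All strings that are one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """The subset of `words` that appear in the sample corpus."""
    return {w for w in words if w in WORDS}


def correction(word):
    """Most frequent known word within two edits, or `word` itself if none is found."""
    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or [word])
    return max(candidates, key=probability)


for misspelled in ["speling", "korrectud", "dta", "zzzzzz"]:
    print(misspelled, "->", correction(misspelled))
```

With this sample corpus, “speling” is one insertion away from “spelling” and “korrectud” is two replacements away from “corrected”. A word with no known neighbour within two edits, such as “zzzzzz”, is returned unchanged.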