- Inquire
- Formula Auditing
- Working with Multiple Sheets
- Financial Analysis
- Data Validation
- Data Visualization
- PivotTables
- Lookup Functions
- Quick Analysis
- Subtotals with Ranges
- Filtering
- Sorting
- Conditional Formatting
- Working with Time Values
- Cleaning Data Contains Date Values
- Cleaning Data with Text Functions
- Tables
- Working with Range Names
- Excel Data Analysis - Overview
- Data Analysis - Process
- Data Analysis - Overview
- Excel Data Analysis - Home
Advanced Data Analysis
- Key Performance Indicators
- Aesthetic Power View Reports
- Exploring Data with Hierarchies
- Exploring Data Power View Tiles
- Exploring Data PowerView Multiples
- Exploring Data Power View Maps
- Exploring Data Power View Charts
- Exploring Data with Power View
- Exploring Data with Powerpivot
- Exploring Data with PivotTables
- Data Model
- Importing Data into Excel
- Optimization with Excel Solver
- What-If Analysis with Goal Seek
- What-If Analysis Scenario Manager
- What-If Analysis with Data Tables
- What-If Analysis
- Data Consolidation
- Advanced Data Analysis - Overview
Excel Data Analysis Resources
Selected Reading
- Who is Who
- Computer Glossary
- HR Interview Questions
- Effective Resume Writing
- Questions and Answers
- UPSC IAS Exams Notes
Cleaning Data with Text Functions
The data that you obtain from different sources many not be in a form ready for analysis. In this chapter, you will understand how to prepare your data that is in the form of text for analysis.
Initially, you need to clean the data. Data cleaning includes removing unwanted characters from text. Next, you need to structure the data in the form you require for further analysis. You can do the same by −
Finding required text patterns with the text functions.
Extracting data values from text.
Formatting data with text functions.
Executing data operations with the text functions.
Removing Unwanted Characters from Text
When you import data from another apppcation, it can have nonprintable characters and/or excess spaces. The excess spaces can be −
leading spaces, and/or
extra spaces between words.
If you sort or analyze such data, you will get erroneous results.
Consider the following example −
This is the raw data that you have obtained on product information containing the Product ID, Product description and the price. The character “|” separates the field in each row.
When you import this data into Excel worksheet, it looks as follows −
As you observe, the entire data is in a single column. You need to structure this data to perform data analysis. However, initially you need to clean the data.
You need to remove any nonprintable characters and excess spaces that might be present in the data. You can use the CLEAN function and TRIM function for this purpose.
S.No. | Function & Description |
---|---|
1. | CLEAN Removes all nonprintable characters from text |
2. | TRIM Removes spaces from text |
Select the Cells C3 – C11.
Type =TRIM (CLEAN (B3)) and then press CTRL + Enter.
The formula is filled in the cells C3 – C11.
The result will be as shown below −
Finding required Text Patterns with the Text Functions
To structure your data, you might have to do certain Text Pattern matching based on which you can extract the Data Values. Some of the Text Functions that are useful for this purpose are −
S.No. | Function & Description |
---|---|
1. | EXACT Checks to see if two text values are identical |
2. | FIND Finds one text value within another (case-sensitive) |
3. | SEARCH Finds one text value within another (not case-sensitive) |
Extracting Data Values from Text
You need to extract the required data from text in order to structure the same. In the above example, say, you need to place the data in three columns – ProductID, Product_Description and Price.
You can extract data in one of the following ways −
Extracting Data Values with Convert Text to Columns Wizard
Extracting Data Values with Text Functions
Extracting Data Values with Flash Fill
Extracting Data Values with Convert Text to Columns Wizard
You can use the Convert Text to Columns Wizard to extract Data Values into Excel columns if your fields are −
Depmited by a character, or
Apgned in columns with spaces between each field.
In the above example, the fields are depmited by the character “|”. Hence, you can use the Convert Text to Columns wizard.
Select the data.
Copy and paste values in the same place. Otherwise, Convert Text to Columns takes the functions rather than the data itself as the input.
Select the data.
Cpck on Text to Columns in the Data Tools group under Data Tab on the Ribbon.
Step 1 − Convert Text to Columns Wizard - Step 1 of 3 appears.
Select Depmited.
Cpck Next.
Step 2 − Convert Text to Columns Wizard - Step 2 of 3 appears.
Under Depmiters, select Other.
In the box next to Other, type the character |
Cpck Next.
Step 3 − Convert Text to Columns Wizard - Step 3 of 3 appears.
In this screen, you can select each column of your data in the wizard and set the format for that column.
For Destination, select the cell D3.
You can cpck Advanced, and set Decimal Separator and Thousands Separator in the Advanced Text Import Settings dialog box that appears.
Cpck Finish.
Your data, which is converted to columns appears in the three Columns – D, E and F.
Name the Column headers as ProductID, Product_Description and Price.
Extracting Data Values with Text Functions
Suppose the fields in your data neither are depmited by a character nor are apgned in columns with spaces between each field, you can use text functions to extract data values. Even in the case the fields are depmited, you can still use text functions to extract data.
Some of the text functions that are useful for this purpose are −
S.No. | Function & Description |
---|---|
1. | LEFT Returns the leftmost characters from a text value |
2. | RIGHT Returns the rightmost characters from a text value |
3. | MID Returns a specific number of characters from a text string starting at the position you specify |
4. | LEN Returns the number of characters in a text string |
You can also combine two or more of these text functions as per the data you have at hand, to extract the required data values. For example, using a combination of LEFT, RIGHT and VALUE functions or using a combination of FIND, LEFT, LEN and MID functions.
In the above example,
All the characters left to the first | give the name ProductID.
All the characters right to the second | give the name Price.
All the characters that pe between the first | and second | give the name Product_Description.
Each | has a space before and after.
Observing this information, you can extract the data values with the following steps −
Find the Position of First | - First | Position
You can use FIND function
Find the Position of Second | - Second | Position
You can use FIND function again
Beginning to (First | Position – 2) Characters of the Text give ProductID
You can use LEFT Function
(First | Position + 2) to (Second | Position - 2) Characters of the Text give Product_Description
You can use MID Function
(Second | Position + 2) to End Characters of the Text give Price
You can use RIGHT Function
The result will be as shown below −
You can observe that the values in the price column are text values. To perform calculations on these values, you have to format the corresponding cells. You can look at the section given below to understand formatting text.
Extracting Data Values with Flash Fill
Using Excel Flash Fill is another way to extract data values from text. However, this works only when Excel is able to find a pattern in the data.
Step 1 − Create three columns for ProductID, Product_Description and Price next to the data.
Step 2 − Copy and paste the values for C3, D3 and E3 from B3.
Step 3 − Select cell C3 and cpck Flash Fill in the Data Tools group on the Data tab. All the values for ProductID get filled.
Step 4 − Repeat the above given steps for Product_Description and Price. The data is filled.
Formatting Data with Text Functions
Excel has several built-in text functions that you can use for formatting data containing text. These include −
Functions that format the Text as per your need −
S.No. | Function & Description |
---|---|
1. | LOWER Converts text to lowercase |
S.No. | Function & Description |
---|---|
1. | UPPER Converts text to uppercase |
2. | PROPER Capitapzes the first letter in each word of a text value |
Functions that convert and/or format the Numbers as Text −
S.No. | Function & Description |
---|---|
1. | DOLLAR Converts a number to text, using the $ (dollar) currency format |
2. | FIXED Formats a number as text with a fixed number of decimals |
3. | TEXT Formats a number and converts it to text |
Functions that convert the Text to Numbers −
S.No. | Function & Description |
---|---|
1. | VALUE Converts a text argument to a number |
Executing Data Operations with the Text Functions
You might have to perform certain Text Operations on your Data. For example, if Login-IDs for the Employees are changed to a New Format in an Organization, based on the Format Change, Text Replacements might have to be done.
Following Text Functions help you in performing Text Operations on your data containing Text −
S.No. | Function & Description |
---|---|
1. | REPLACE Replaces characters within text |
2. | SUBSTITUTE Substitutes new text for old text in a text string |
3. | CONCATENATE Joins several text items into one text item |
4. | CONCAT Combines the text from multiple ranges and/or strings, but it does not provide the depmiter or IgnoreEmpty arguments. |
5. | TEXTJOIN Combines the text from multiple ranges and/or strings, and includes a depmiter you specify between each text value that will be combined. If the depmiter is an empty text string, this function will effectively concatenate the ranges. |
6. | REPT Repeats text a given number of times |