How to read excel file in python

How to read excel file in python.

When we start working on a new project, the first thing that we need is the data or the inputs. Inputs are generally given in the form of excel sheets or csv files or text files. Most of the data are saved in excel sheets in most of the organization, only disadvantage of excel is the limit of rows to 10,48,576 and in such scenario, we prefer to use csv files. Here in this lesson, we will show how to read excel file in python and discuss on various options and few libraries.

you can read Excel files in Python without using any external libraries, but it involves more effort because you have to handle the Excel file format and parsing manually. Excel files are binary files with a complex structure, so it’s recommended to use libraries for most use cases.

Reading Excel files in Python can be accomplished using several libraries, with the most popular ones being `pandas`, `openpyxl`, and `xlrd`. Each library has its own set of pros and cons. Let’s explore these options in detail:

Please make sure to install the necessary libraries (`pandas`, `openpyxl`, or `xlrd`) before running the code. You can install them using `pip` for corresponding library.

pip install pandas

Option 1: Using `pandas`

import pandas as pd
data = pd.read_excel('file.xlsx')

The sample excel file is shown here for reference, we will read this in python to see the output:

when we read this excel we see the output as shown:

 

Pros:

– Ease of Use: Pandas provides a high-level data manipulation interface. It makes reading Excel files simple and intuitive.

– Data Manipulation: Pandas offers powerful data manipulation and analysis capabilities after reading the data.

– Performance: Pandas is optimized for performance, making it suitable for large datasets.

Cons:

– Dependencies: Pandas has dependencies, and it might be overkill if you only need to read data from Excel files without performing complex data operations.

– Memory Usage: For very large datasets, pandas can consume a significant amount of memory.

Option 2: Using `openpyxl`

from openpyxl import load_workbook
wb = load_workbook('file.xlsx')
sheet = wb.active
# Iterate through rows and columns and print cell values
for row in sheet.iter_rows(values_only=True):
    for cell_value in row:
        print(cell_value, end='\t')
    print()  # Move to the next line for the next row
# Close the workbook
workbook.close()

Pros:

– Low-Level Control: Openpyxl provides fine-grained control over Excel files, allowing you to read specific cells or ranges.

– Writing Capabilities: It supports creating and modifying Excel files, not just reading.

 

Cons:

– Learning Curve: Openpyxl has a steeper learning curve compared to pandas for simple read operations.

– Performance: It might be slower for large datasets compared to pandas.

Option 3: Using `xlrd`

import xlrd
workbook = xlrd.open_workbook('file.xlsx')
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(row, col) for col in range(sheet.ncols)] for row in range(sheet.nrows)]

Pros:

– Simplicity: Simple and easy to use for basic Excel reading operations.

– Lightweight: Lightweight library without external dependencies.

 

Cons:

– Read-Only: xlrd is read-only and does not support writing or modifying Excel files.

– Limited Features: Limited functionality compared to pandas and openpyxl.

Reading Excel files with different formats using Pandas

Reading Excel files with different formats in Python typically involves using libraries like `pandas`, `openpyxl`, or `xlrd`, which can handle various Excel file formats. These libraries are capable of reading files in different formats, such as `.xls`, `.xlsx`, and `.csv`. Here’s how you can read Excel files of different formats using `pandas`, a popular data manipulation library:

1. Reading `.xlsx` (Excel 2007+ format):

```python
import pandas as pd
data_xlsx = pd.read_excel('file.xlsx', sheet_name='Sheet1')  # Read .xlsx file
```

2. Reading `.xls` (Excel 97-2003 format):

```python
import pandas as pd
data_xls = pd.read_excel('file.xls', sheet_name='Sheet1')  # Read .xls file
```

3. Reading `.csv` (Comma-separated values):

```python
import pandas as pd
data_csv = pd.read_csv('file.csv')  # Read .csv file
```

Reading Different Excel File Formats Using openpyxl:

```python
from openpyxl import load_workbook
# For .xlsx (Excel 2007+ format)
workbook_xlsx = load_workbook('file.xlsx')
sheet_xlsx = workbook_xlsx.active
# For .xls (Excel 97-2003 format)
workbook_xls = load_workbook('file.xls')
sheet_xls = workbook_xls.active
```

Reading Different Excel File Formats Using xlrd:

```python
import xlrd
# For .xls (Excel 97-2003 format)
workbook_xls = xlrd.open_workbook('file.xls')
sheet_xls = workbook_xls.sheet_by_index(0)
# Note: xlrd doesn't support the .xlsx format.

Conclusion:
– Use Pandas If: You need to perform complex data analysis and manipulation after reading Excel files. Pandas is excellent for data wrangling, cleaning, and analysis due to its rich functionality and ease of use.

– Use Openpyxl If: You require low-level control over Excel files, especially if you need to read specific cells, modify data, or create new Excel files. Openpyxl is a good choice for more advanced Excel operations.

– Use xlrd If: You have a simple read-only requirement, and you want a lightweight solution without additional dependencies. If you only need basic data extraction from Excel files, xlrd can be a suitable choice. There are more libraries available like excel wings and few more… but here we have discussed the most frequently used. with recent improvements color coding formatting are also included in both pandas and openpyxl.

In summary, choose the library that best fits your specific use case. If you’re primarily focused on data analysis, Pandas is the go-to option due to its ease of use and powerful features. If you need more control or have specific requirements, consider Openpyxl or xlrd accordingly.

Was this article helpful?
YesNo

Leave a Comment