The Python os library is used to list the files in a directory. The os.listdir() method returns a list of every file and folder in a directory, and the os.walk() function walks an entire file tree, yielding the names of every file and folder in it.
Often, when you’re working with files in Python, you’ll encounter situations where you want to list the files in a directory. For instance, if you’re creating a program that analyzes book sales using a list of books, you may want to check whether the raw data you’ll be analyzing is in the right folder before your program starts analyzing the data.
The Python os library offers a number of methods that can be used to list files in a directory. This tutorial will discuss how to use os.listdir() to get the files and folders in a directory, and os.walk() to get the files and folders in a directory and in its subdirectories. We'll also explore an example of each of these methods in action to show how you can use them in your code.
Python os Library
The Python os library provides a number of functions that you can use to interact with the operating system. Most of the functions in the os module, including the ones covered in this tutorial, work on any modern operating system, whether it is Windows, Linux, or macOS.
The os library is part of Python's standard library, so there is nothing to install, but we still need to import it into our code before we start using it. We can do so using the following code:
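The import is a single statement, usually placed at the top of the script:

```python
# Make the os module (and its os.path submodule) available to the script
import os
```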
Now that we’ve imported the os library into our code, we can start using its functions to list items in a directory.
In Python, the os.listdir() method lists the files and folders in a given directory. The method does not return the special entries '.' and '..', which the operating system uses to refer to the current directory and its parent. It also does not look beyond the first level of the directory: os.listdir() does not return anything inside the subfolders it finds. The names come back in arbitrary order, so don't rely on them being sorted.
The os.listdir() function accepts one optional parameter: the path of the directory whose file and folder names you want to retrieve. If you leave it out, os.listdir() lists the current working directory.
Here’s the syntax for the listdir method:
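The general form is os.listdir(path), where path defaults to '.', the current working directory. The short sketch below, which only assumes it is run from a readable directory, calls the method both ways and compares the results:

```python
import os

# General form: os.listdir(path='.')
# With no argument, os.listdir() lists the current working directory
names_default = os.listdir()
# Passing '.' explicitly names the same directory
names_explicit = os.listdir('.')

# Both calls see the same entries (compare sorted lists, since the order is arbitrary)
print(sorted(names_default) == sorted(names_explicit))  # True
```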
Let’s walk through an example to showcase how to use this method in a Python program.
Say that we are creating a program that analyzes the stock market performance of Netflix over the last decade. We have a folder (/home/data_analysis/netflix) with all of our raw data, and before our program starts running, we want to make sure that the file raw_data_2019.csv exists within that folder. In order to function properly, our program needs that particular file to be stored in that particular folder.
We could use the following code to retrieve a list of the files in the /home/data_analysis/netflix directory:
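```python
import os

# Directory whose contents we want to list
path = '/home/data_analysis/netflix'

# os.listdir() returns the names of the files and folders in path
files = os.listdir(path)

# Print each name on its own line
for f in files:
    print(f)
```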
Our program retrieves a list of all files and folders in the specified directory and prints the following:
```
README.md
app.py
raw_data_2016.csv
raw_data_2017.csv
raw_data_2018.csv
raw_data_2019.csv
processed_data
```
Now, we can check to see if the file raw_data_2019.csv is in the folder. As you can see, it is.
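Rather than reading the printed list by eye, a program can perform this check itself. Below is a minimal sketch; the helper name has_required_file is hypothetical, and a temporary directory stands in for /home/data_analysis/netflix so the example runs anywhere:

```python
import os
import tempfile


def has_required_file(directory, filename):
    """Hypothetical helper: True if filename is among the entries os.listdir() finds in directory."""
    return filename in os.listdir(directory)


# A throwaway directory stands in for /home/data_analysis/netflix
with tempfile.TemporaryDirectory() as demo_dir:
    # Create an empty raw_data_2019.csv so there is something to find
    open(os.path.join(demo_dir, 'raw_data_2019.csv'), 'w').close()
    found = has_required_file(demo_dir, 'raw_data_2019.csv')
    missing = has_required_file(demo_dir, 'raw_data_2020.csv')

print(found, missing)  # True False
```

In the Netflix program, you might call has_required_file('/home/data_analysis/netflix', 'raw_data_2019.csv') before the analysis starts and stop with an error if it returns False. Note that this only confirms that an entry with that name exists; os.path.isfile() would additionally confirm that it is a regular file.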
Let's break down our code. On the first line, we import the os module, which we need to do in order to access the os.listdir() function. Then, we declare a variable called path, which stores the path of the directory whose contents we want to retrieve.
On the next line, we use the os.listdir() method to get a list of the files and folders in the /home/data_analysis/netflix directory. Finally, we create a for loop that iterates through every item in the list produced by os.listdir(), and we print out each individual file and directory name.
In our example, the /home/data_analysis/netflix directory contained six files and one directory. The directory is called processed_data; here we can tell it apart from the other entries because it has no file extension, but that is only a hint, since files can also lack extensions and folder names can contain dots.
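A more reliable way to tell files and folders apart is to ask the file system, for example with os.path.isdir(). This sketch uses a hypothetical helper, split_files_and_dirs, and a temporary directory containing a file without an extension (README) to show why the extension is not a dependable signal:

```python
import os
import tempfile


def split_files_and_dirs(directory):
    """Hypothetical helper: sort os.listdir() entries into (files, directories)."""
    files, dirs = [], []
    for name in os.listdir(directory):
        # os.listdir() returns bare names, so build the full path before checking it
        if os.path.isdir(os.path.join(directory, name)):
            dirs.append(name)
        else:
            # Anything that is not a directory is treated as a file here
            files.append(name)
    return sorted(files), sorted(dirs)


with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, 'README'), 'w').close()  # a file with no extension
    os.mkdir(os.path.join(demo_dir, 'processed_data'))   # a directory
    files, dirs = split_files_and_dirs(demo_dir)

print(files, dirs)  # ['README'] ['processed_data']
```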
The os.listdir() method retrieves the contents of a single folder. But what if we want to get the contents of a folder and the contents of every folder contained within that folder (i.e., in subfolders)? The os.walk() function can do this.
The os.walk() function retrieves the files and folders contained within a directory tree. It visits each directory in the tree in turn and, for each one, reports the names of the files and folders it contains, so you get the name of every file and folder within a directory and any of its subdirectories.
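Each step of os.walk() produces a three-item tuple: the path of the directory being visited, a list of its subdirectory names, and a list of its file names. The sketch below builds a small throwaway tree with tempfile (the names a.txt, sub, and b.txt are made up for illustration) and records what each step yields:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Build a tiny tree: top/a.txt and top/sub/b.txt
    open(os.path.join(top, 'a.txt'), 'w').close()
    os.mkdir(os.path.join(top, 'sub'))
    open(os.path.join(top, 'sub', 'b.txt'), 'w').close()

    # Each step of os.walk() yields (dirpath, dirnames, filenames)
    steps = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Show dirpath relative to top so the output doesn't depend on the temp location
        rel = os.path.relpath(dirpath, top)
        steps.append((rel, sorted(dirnames), sorted(filenames)))
        print(rel, dirnames, filenames)
```

With the default top-down order, the first step describes top itself ('.', with subdirectory sub and file a.txt) and the second describes sub (no subdirectories, file b.txt).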
The syntax for the os.walk() method is as follows:
```python
os.walk(top, topdown=True, onerror=None, followlinks=False)
```
The os.walk() method accepts four parameters:
- top is the top directory whose component file and folder names you want to retrieve (required)
- topdown, when set to True (the default), scans directories from the top down, so each directory is reported before its subdirectories; if set to False, directories are scanned from the bottom up, so each directory is reported after its subdirectories (optional)
- onerror is a function that os.walk() calls with the error if it cannot list a directory; by default, such errors are silently ignored (optional)
- followlinks, if set to True, visits folders referenced by symbolic links; it defaults to False (optional)
For this tutorial, we are going to focus on the first two parameters since onerror and followlinks are more advanced and are not as commonly used.
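The effect of topdown is easy to see on a small nested tree. This sketch assumes a throwaway tree top/a/b created in a temporary directory; visit_order is a hypothetical helper that records the order in which os.walk() visits directories:

```python
import os
import tempfile


def visit_order(top, topdown):
    """Hypothetical helper: directories visited by os.walk(), relative to top, in visit order."""
    return [os.path.relpath(dirpath, top)
            for dirpath, _dirnames, _filenames in os.walk(top, topdown=topdown)]


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'a', 'b'))  # nested tree: top/a/b
    down = visit_order(top, topdown=True)
    up = visit_order(top, topdown=False)

print(down)  # ['.', 'a', 'a/b'] on Linux/macOS
print(up)    # ['a/b', 'a', '.'] on Linux/macOS
```

For a single chain of folders like this one, the bottom-up order is exactly the top-down order reversed.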
Let's say that we want to retrieve the names of all files in the /home/data_analysis/netflix directory, but we also want to find out what's enclosed within all subdirectories in that folder.
As we saw above, the netflix directory contains one folder: processed_data. We could use the following code to retrieve the names of all files in the /home/data_analysis/netflix directory and the names of all files enclosed within all subdirectories in that directory:
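```python
import os

# Top of the directory tree we want to walk
path = '/home/data_analysis/netflix'

# os.walk() yields (root, directories, files) for each directory in the tree
for root, directories, files in os.walk(path, topdown=True):
    # Print the full path of every file in the current directory
    for name in files:
        print(os.path.join(root, name))
    # Print the full path of every subdirectory in the current directory
    for name in directories:
        print(os.path.join(root, name))
```
Here's the output from our code:
```
/home/data_analysis/netflix/README.md
/home/data_analysis/netflix/app.py
/home/data_analysis/netflix/raw_data_2016.csv
/home/data_analysis/netflix/raw_data_2017.csv
/home/data_analysis/netflix/raw_data_2018.csv
/home/data_analysis/netflix/raw_data_2019.csv
/home/data_analysis/netflix/processed_data
/home/data_analysis/netflix/processed_data/final.csv
```
Let's break down our code. On the first line, we import the os module, which provides both os.walk() and the os.path.join() method we use later in our code. Then, we declare a variable called path, which stores the path of the directory whose contents we want to discover.
We then create a for loop that uses os.walk() to walk through the directory tree that starts at path. On each pass, os.walk() gives us three values: root, the directory it is currently visiting; directories, a list of the subdirectory names in root; and files, a list of the file names in root. It's worth noting that we specify the topdown=True parameter in the os.walk() method, which tells our code to conduct a top-down search: the netflix directory is reported before its processed_data subdirectory, which is why final.csv appears last in the output. Because True is the default, os.walk(path) would behave the same way.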
Our for loop then iterates through each file and directory discovered by the os.walk() method using two additional, nested for loops: one for files and one for directories. In our code above, here are our loops:
```python
for root, directories, files in os.walk(path):
    for name in files:
        print(os.path.join(root, name))
    for name in directories:
        print(os.path.join(root, name))
```
Then, our program uses os.path.join() to join together the root folder of each file (e.g., /home/data_analysis/netflix) and the name of the file (e.g., raw_data_2019.csv). The root folder refers to the directory path in which a file exists, and the name of the file is the name given to the specific file or directory found by os.walk().
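If you need the full paths for later processing rather than just printing them, the same loop can collect them into a list. A minimal sketch, with a hypothetical list_all_paths helper and a temporary directory that loosely mirrors the netflix folder:

```python
import os
import tempfile


def list_all_paths(top):
    """Hypothetical helper: full paths of every file and folder under top."""
    paths = []
    for root, directories, files in os.walk(top):
        # Join each name with the directory it was found in
        for name in files:
            paths.append(os.path.join(root, name))
        for name in directories:
            paths.append(os.path.join(root, name))
    return paths


with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, 'processed_data'))
    open(os.path.join(top, 'processed_data', 'final.csv'), 'w').close()
    open(os.path.join(top, 'app.py'), 'w').close()
    # Make the paths relative to top so the result doesn't depend on the temp location
    found = sorted(os.path.relpath(p, top) for p in list_all_paths(top))

print(found)  # ['app.py', 'processed_data', 'processed_data/final.csv'] on Linux/macOS
```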
As you can see, there are a few differences between what our code in the first example returned and what our code in the second example returned.
First, our os.walk() example printed the full path of each item because we used the os.path.join() method, which joined the root directory name with the name of the file, whereas os.listdir() only returned the names of the files and folders in a directory. Second, os.walk() returned the contents of the processed_data folder, which in this case was one file: final.csv.
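The second difference matters when a file might live in a subfolder. As a sketch (find_file is a hypothetical helper, and the tree is a temporary stand-in for the netflix folder), os.walk() can locate final.csv inside processed_data, while os.listdir() on the top folder cannot see it:

```python
import os
import tempfile


def find_file(top, filename):
    """Hypothetical helper: full path of the first file named filename under top, or None."""
    for root, _directories, files in os.walk(top):
        if filename in files:
            return os.path.join(root, filename)
    return None


with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, 'processed_data'))
    open(os.path.join(top, 'processed_data', 'final.csv'), 'w').close()

    # os.listdir() only sees the first level, so final.csv is not listed
    in_top_level = 'final.csv' in os.listdir(top)
    # os.walk() descends into processed_data and finds it
    location = find_file(top, 'final.csv')
    relative = os.path.relpath(location, top) if location else None
    absent = find_file(top, 'missing.csv')

print(in_top_level, relative, absent)  # False processed_data/final.csv None (on Linux/macOS)
```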
When coding in Python, it is sometimes useful to see a list of all files and folders within a directory. You can use the Python os.listdir() method to do this. You can also use the os.walk() method, which lists everything in a directory, including anything within subdirectories.
This guide explored, with examples, how to use the os.listdir() and os.walk() methods to list files and folders in a directory in Python. Now you have the skills you need to list files in a directory in Python like an expert!