The Python os library is used to list the files in a directory. The os.listdir() function returns a list of every file and folder in a directory. The os.walk() function goes further and yields the files and folders in an entire directory tree.
Often, when you’re working with files in Python, you’ll encounter situations where you want to list the files in a directory. For instance, you may want to find all of the Python files in a folder.
The Python os library offers a number of methods that can be used to list files in a directory. This tutorial will discuss how to use os.listdir() to get the files and folders in a directory. We’ll also talk about using os.walk() to get the files and folders in a directory and in its subdirectories.
Python os Library
The Python os library provides a number of functions that you can use to interact with the operating system. Most of the functions in the os module, including the ones covered here, work on any modern operating system, whether it is Windows, Linux, or macOS.
The os library is part of Python’s standard library, so there is nothing to install, but we still need to import it into our code before we start using it. We can do so using a Python import statement:
import os
Now that we’ve imported the os library into our code, we can start using its functions to list items in a directory.
In Python, the os.listdir() method lists files and folders in a given directory. The method does not return special entries such as ‘.’ and ‘..’, which the operating system uses to navigate through different directories.
os.listdir() also does not return files and folders beyond the first level of folders. In other words, os.listdir() does not return anything within subfolders discovered by the method.
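The os.listdir() function accepts one parameter: the file path of the directory whose file and folder names you want to retrieve. In Python 3, this parameter is optional; if you omit it, os.listdir() lists the current working directory.
Here’s the syntax for the listdir method:
os.listdir(path)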
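The behavior described above can be sketched in a few lines. This is only an illustration: demo_dir, notes.txt, archive, and old.txt are made-up names, and the folder is created in a temporary location so the snippet runs on any machine. Because os.listdir() returns names in arbitrary order, the sketch sorts them before printing.

```python
import os
import tempfile

# Build a throwaway directory (hypothetical names) so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_dir:
    # One file at the top level and one file inside a subfolder.
    open(os.path.join(demo_dir, "notes.txt"), "w").close()
    os.mkdir(os.path.join(demo_dir, "archive"))
    open(os.path.join(demo_dir, "archive", "old.txt"), "w").close()

    # os.listdir() returns names only (not full paths), covers the first
    # level only, and leaves out the special entries '.' and '..'.
    entries = sorted(os.listdir(demo_dir))

print(entries)  # ['archive', 'notes.txt']
```

Notice that old.txt does not appear, because it sits one level down inside archive.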
Let’s walk through an example to showcase how to use this method in a Python program.
os.listdir() Python Example
Say that we are creating a program that analyzes the stock market performance of Netflix over the last decade. All of our raw data is stored in the folder /home/data_analysis/netflix, and before our program starts running, we want to make sure that the file raw_data_2019.csv exists within that folder.
In order to function properly, our program needs that particular file to be stored in that particular folder.
We could use the following code to retrieve a list of the files in the /home/data_analysis/netflix work directory:
```python
import os

path = '/home/data_analysis/netflix'

# Get the names of the files and folders in the directory
files = os.listdir(path)

for f in files:
    print(f)
```
Our program retrieves a list of all files and folders in the specified directory and prints the following (the order of the names may differ on your system):
```
README.md
app.py
raw_data_2016.csv
raw_data_2017.csv
raw_data_2018.csv
raw_data_2019.csv
processed_data
```
Now, we can check to see if the file raw_data_2019.csv is in the folder. As you can see, it is.
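Rather than reading through the printed list, the check can be done in code. The sketch below is one possible way to do it, not the only one: file_in_directory is a hypothetical helper name, and a temporary folder stands in for /home/data_analysis/netflix so the snippet runs anywhere.

```python
import os
import tempfile

def file_in_directory(directory, filename):
    """Hypothetical helper: True if `filename` is an entry in `directory`."""
    return filename in os.listdir(directory)

# A temporary folder stands in for the real data directory.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "raw_data_2019.csv"), "w").close()

    found = file_in_directory(demo_dir, "raw_data_2019.csv")
    missing = file_in_directory(demo_dir, "raw_data_2020.csv")

print(found, missing)  # True False
```

In the Netflix program, the call would look like file_in_directory('/home/data_analysis/netflix', 'raw_data_2019.csv').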
Let’s break down our code. On the first line, we import the os module, which we need to do in order to access the os.listdir() function. Then, we declare a Python variable called path, which stores the name of the path whose contents we want to retrieve.
On the next line, we use the os.listdir() method to get a list of the files and folders in the /home/data_analysis/netflix directory. Finally, we create a Python for loop. This loop iterates through every item in the list produced by os.listdir(). We print out the name of each file to the console using a Python print() statement.
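The /home/data_analysis/netflix directory contains six files and one directory, which is called processed_data. In this listing it stands out because it does not have an extension, but that is only a naming convention: files can lack extensions, and folder names can contain dots. os.listdir() itself does not say which entries are folders; the os.path.isdir() function can tell you for certain.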
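Since an extension is only a hint, a more dependable way to separate files from folders is to ask the operating system with os.path.isdir(). The following is a minimal sketch under a few assumptions: split_entries is a hypothetical helper name, and the temporary folder and its contents are invented for the demonstration.

```python
import os
import tempfile

def split_entries(directory):
    """Hypothetical helper: return (files, folders) at the first level."""
    files, folders = [], []
    for name in sorted(os.listdir(directory)):
        # os.listdir() gives bare names, so build the full path first.
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            folders.append(name)
        else:
            files.append(name)
    return files, folders

with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "README"), "w").close()  # file with no extension
    os.mkdir(os.path.join(demo_dir, "processed_data"))
    files, folders = split_entries(demo_dir)

print(files, folders)  # ['README'] ['processed_data']
```

Here README is correctly reported as a file even though it has no extension.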
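The os.walk() function retrieves the files contained within a directory tree. It visits each directory in the tree in turn and, for each one, yields a tuple holding the path of that directory, a list of its subfolder names, and a list of its file names. Looping over os.walk() therefore gives you the name of every file and folder within a directory and any of its subdirectories.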
The syntax for the os.walk() method is as follows:
os.walk(top, topdown=True, onerror=None, followlinks=False)
The os.walk() method accepts four parameters:
- top is the top directory whose component file and folder names you want to retrieve (required)
- topdown, when set to True, specifies that directories should be scanned from the top down. If this value is set to False, directories will be scanned from the bottom up (optional)
- onerror provides an error handler if an error is encountered (optional)
- followlinks, if set to True, visits folders referenced by symbolic links (optional)
We are going to focus on the first two parameters since onerror and followlinks are more advanced and are not as commonly used.
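To make the effect of topdown concrete, here is a small sketch that records the order in which os.walk() visits directories. The folder layout is invented for the demonstration (demo_dir is a temporary folder containing a single processed_data subfolder), and '.' stands for the starting folder itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "processed_data"))

    # Record each visited directory, relative to the starting folder.
    top_down = [os.path.relpath(root, demo_dir)
                for root, dirs, files in os.walk(demo_dir, topdown=True)]
    bottom_up = [os.path.relpath(root, demo_dir)
                 for root, dirs, files in os.walk(demo_dir, topdown=False)]

print(top_down)   # ['.', 'processed_data']
print(bottom_up)  # ['processed_data', '.']
```

With topdown=True the starting folder is reported before its subfolders; with topdown=False the subfolders come first.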
os.walk() Python Example
Let’s say that we want to retrieve the names of all files in the /home/data_analysis/netflix directory. We also want to find out what’s enclosed within all subdirectories in that folder.
As we discussed above, the netflix directory contains one folder: processed_data. We could use the following code to retrieve the names of all files in the /home/data_analysis/netflix directory and its subdirectories:
```python
import os

path = '/home/data_analysis/netflix'

# Walk the tree from the top down, printing files first and then folders
for root, directories, files in os.walk(path, topdown=True):
    for name in files:
        print(os.path.join(root, name))
    for name in directories:
        print(os.path.join(root, name))
```
Here’s the output from our code:
```
/home/data_analysis/netflix/README.md
/home/data_analysis/netflix/app.py
/home/data_analysis/netflix/raw_data_2016.csv
/home/data_analysis/netflix/raw_data_2017.csv
/home/data_analysis/netflix/raw_data_2018.csv
/home/data_analysis/netflix/raw_data_2019.csv
/home/data_analysis/netflix/processed_data
/home/data_analysis/netflix/processed_data/final.csv
```
We import the os module from which we reference the os.walk() and os.path.join() methods later in our code. Then, we declare a variable called path, which stores the path whose file names we want to discover.
We then create a for loop that uses os.walk() to visit every directory in the tree that starts at path. On each pass, os.walk() gives us the current directory (root), the folders it contains (directories), and the files it contains (files). It’s worth noting that we specify the topdown=True parameter in the os.walk() method, which tells our code to conduct a top-down search. This is also the default; passing topdown=False instead would list the contents of processed_data before the contents of the netflix folder.
Inside that loop, two additional for loops iterate through the files and then the folders found in the current directory, and print the full path of each one to the console.
In our code above, here are our for loops:
```python
for root, directories, files in os.walk(path, topdown=True):
    for name in files:
        print(os.path.join(root, name))
    for name in directories:
        print(os.path.join(root, name))
```
Then, our program uses os.path.join() to join together the root folder of each file (e.g. /home/data_analysis/netflix) and the name of the file (e.g. raw_data_2019.csv). The root folder refers to the directory path in which a file exists.
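As a quick illustration of that joining step on its own, the sketch below combines one root folder and one file name taken from the example, then splits the result apart again with os.path.split(). The variable names are chosen just for this snippet.

```python
import os

root = "/home/data_analysis/netflix"
name = "raw_data_2019.csv"

# os.path.join() inserts the separator used by the current operating system.
full_path = os.path.join(root, name)
print(full_path)  # on Linux or macOS: /home/data_analysis/netflix/raw_data_2019.csv

# os.path.split() reverses the operation, giving back the folder and the name.
folder, filename = os.path.split(full_path)
print(folder, filename)
```

Using os.path.join() instead of gluing strings together with "/" keeps the code portable, since Windows uses a different separator.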
To list the files in a directory, you can use the Python os.listdir() method. You can also use the os.walk() method, which lists everything in a directory, including anything within subdirectories.
This guide explored, with examples, how to use the os.listdir() and os.walk() methods to list files and folders in a directory in Python. Now you have the skills you need to list files in a directory in Python like an expert!