I'm trying to write a script that lists all directories, subdirectories, and files in a given directory. This is what I tried:

import sys, os

root = "/home/patate/directory/"
path = os.path.join(root, "targetdirectory")

for r, d, f in os.walk(path):
    for file in f:
        print(os.path.join(root, file))

Unfortunately, it doesn't work properly. I get all the files, but not their full paths.

For example, if the directory structure is:

/home/patate/directory/targetdirectory/123/456/789/file.txt

it prints:

/home/patate/directory/targetdirectory/file.txt

What I need is the first result. Any help would be greatly appreciated! Thanks.

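Current answer

With any supported Python version (3.4+), you should use pathlib.Path.rglob to recursively list the contents of the current directory and all subdirectories: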

from pathlib import Path


def generate_all_files(root: Path, only_files: bool = True):
    for p in root.rglob("*"):
        if only_files and not p.is_file():
            continue
        yield p


for p in generate_all_files(Path("."), only_files=False):
    print(p)

The snippet above can be copied and pasted as is.

Example

Folder structure:

$ tree . -a
.
├── a.txt
├── bar
├── b.py
├── collect.py
├── empty
├── foo
│   └── bar.bz.gz2
├── .hidden
│   └── secrect-file
└── martin
    └── thoma
        └── cv.pdf

gives:

$ python collect.py 
bar
empty
.hidden
collect.py
a.txt
b.py
martin
foo
.hidden/secrect-file
martin/thoma
martin/thoma/cv.pdf
foo/bar.bz.gz2
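
The output above is relative to the root handed to rglob, while the question asks for full paths. A minimal sketch of one way to get them, assuming it is acceptable to resolve the root first; iter_full_paths is a hypothetical helper name, and the temporary tree only mimics the layout from the question:

```python
import tempfile
from pathlib import Path


def iter_full_paths(root: Path):
    """Yield the absolute path of every file below root (hypothetical helper)."""
    # resolve() makes root absolute, so every path rglob yields is absolute too
    for p in root.resolve().rglob("*"):
        if p.is_file():
            yield p


# Build a small throwaway tree that mimics the layout in the question
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp, "targetdirectory", "123", "456", "789")
    nested.mkdir(parents=True)
    (nested / "file.txt").write_text("hello")

    for p in iter_full_paths(Path(tmp, "targetdirectory")):
        print(p)  # full path, including every intermediate directory
```

Resolving once at the root is cheaper than calling resolve() on each result, and it keeps the generator lazy.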

2022-03-19 08:20:54

Other answers

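Another option is to use the glob module from the standard library:

import glob

pattern = "/home/patate/directory/targetdirectory/**"

# With recursive=True, "**" matches files and directories at any depth
for path in glob.glob(pattern, recursive=True):
    print(path)

If you need an iterator, you can use iglob instead:

for path in glob.iglob(pattern, recursive=True):
    print(path)  # paths are produced lazily, one at a time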
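
Because "**" matches directories as well as files, a files-only listing needs one more filter. A rough sketch, assuming only regular files are wanted; list_files is a hypothetical helper name and the temporary directory is just for demonstration:

```python
import glob
import os
import tempfile


def list_files(top):
    """Return every regular file below top, found via a recursive glob."""
    pattern = os.path.join(top, "**")
    # With recursive=True, "**" matches directories too, so filter them out
    return [p for p in glob.iglob(pattern, recursive=True) if os.path.isfile(p)]


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "sub", "file.txt"), "w").close()
    print(list_files(tmp))
```

Note that glob skips names starting with a dot by default, so hidden files and directories will not appear in this listing.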

2021-11-11 23:25:43

Use os.path.join to join the directory and the file name:

for path, subdirs, files in os.walk(root):
    for name in files:
        print(os.path.join(path, name))

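Note the use of path rather than root in the join. os.walk yields the directory it is currently visiting as the first item of each tuple, so joining with root would place every file directly under the top-level directory.

The pathlib module was added in Python 3.4 to simplify path manipulation. The equivalent of os.path.join is: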
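
To see the difference concretely, here is a small sketch that builds both variants side by side. walk_paths is a hypothetical helper, and the temporary tree stands in for the directory from the question:

```python
import os
import tempfile


def walk_paths(root):
    """Return (wrong, right) path lists for the files below root."""
    wrong, right = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            wrong.append(os.path.join(root, name))     # always prepends the top directory
            right.append(os.path.join(dirpath, name))  # prepends the directory being visited
    return wrong, right


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "123", "456"))
    open(os.path.join(tmp, "123", "456", "file.txt"), "w").close()

    wrong, right = walk_paths(tmp)
    print(os.path.exists(wrong[0]))  # False: the file is not directly in tmp
    print(os.path.exists(right[0]))  # True
```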

pathlib.PurePath(path, name)

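The advantage of pathlib is that paths come with a variety of useful methods. If you use the concrete Path variant, you can also make actual OS calls through it, such as listing a directory, deleting the path, or opening the file it points to.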
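
A short illustration of that split between pure and concrete paths. The file names are made up for the example, and the concrete part runs inside a temporary directory so nothing real is touched:

```python
import tempfile
from pathlib import Path, PurePath

# PurePath only manipulates the path text; it never touches the file system
pure = PurePath("some", "dir", "file.txt")
print(pure.name, pure.suffix, pure.parent.name)  # file.txt .txt dir

with tempfile.TemporaryDirectory() as tmp:
    # A concrete Path offers the same manipulation plus real file-system calls
    target = Path(tmp, "notes.txt")
    target.write_text("hello")   # create the file and write to it
    print(target.exists())       # True
    print(target.read_text())    # hello
    target.unlink()              # delete the file
    print(target.exists())       # False
```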

2010-05-26 03:46:18

A very simple solution is to run a couple of subprocess calls that export the file listing to CSV format:

import subprocess

# Global variables for the directory being mapped
location = '.'    # Enter the path here.
pattern = '*.py'  # Use this if you only want to return certain file types.
rootDir = location.rpartition('/')[-1]
outputFile = rootDir + '_directory_contents.csv'

# Find the requested data and export it to CSV (-fprintf requires GNU find).
# Passing the arguments as a list keeps the shell from expanding the pattern.
# %T+ is the last modification time, matching the ModifiedTime column added below.
find_cmd = ['find', location, '-name', pattern,
            '-fprintf', outputFile, r'%Y%M,%n,%u,%g,%s,%T+,%P\n']
subprocess.call(find_cmd)

The command produces comma-separated values that can easily be analyzed in Excel.

f-rwxrwxrwx,1,cathy,cathy,2642,2021-06-01+00:22:00.2970880000,content-audit.py

The resulting CSV file has no header row, but you can add one with a second command.

# Add headers to the CSV
headers_cmd = 'sed -i.bak 1i"Permissions,Links,Owner,Group,Size,ModifiedTime,FilePath" ' + outputFile
subprocess.call(headers_cmd, shell=True)
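
The two commands above depend on GNU find and sed. Where those are not available, a similar export can be sketched with the standard library alone. This is only an approximation under stated assumptions: export_listing is a hypothetical helper, and the column set is reduced to size, modification time, and relative path:

```python
import csv
import fnmatch
import os
import tempfile
import time


def export_listing(location, pattern, output_file):
    """Write one CSV row per matching file below location (hypothetical helper)."""
    with open(output_file, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Size", "ModifiedTime", "FilePath"])
        for dirpath, dirnames, filenames in os.walk(location):
            for name in fnmatch.filter(filenames, pattern):
                full = os.path.join(dirpath, name)
                info = os.stat(full)
                modified = time.strftime("%Y-%m-%d %H:%M:%S",
                                         time.localtime(info.st_mtime))
                # Store the path relative to location, similar to find's %P
                writer.writerow([info.st_size, modified,
                                 os.path.relpath(full, location)])


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "script.py"), "w") as fh:
        fh.write("print('hi')\n")
    out = os.path.join(tmp, "listing.csv")
    export_listing(tmp, "*.py", out)
    with open(out) as fh:
        print(fh.read())
```

Writing the header in the same pass removes the need for a second command.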

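Depending on how much data is returned, you can process it further with Pandas. Here are some things I found useful, especially when dealing with several levels of directories.

Add these to your imports: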

import numpy as np
import pandas as pd

Then add the following to your code:

# Convert a size in bytes to something more human readable.
# Defined first so that it exists before it is applied below.
def convert_bytes(size):
    for unit in ['b', 'K', 'M', 'G', 'T']:
        if size < 1024:
            return "%3.1f %s" % (size, unit)
        size /= 1024
    return "%3.1f %s" % (size, 'P')

# Create a DataFrame from the CSV file created above.
df = pd.read_csv(outputFile)

# Format columns
# Get the file name and file extension from the file path
df['FileName'] = df['FilePath'].str.rsplit('/', n=1).str[-1]
df['FileExt'] = df['FileName'].str.rsplit('.', n=1).str[1]

# Get the directory part of each path. A path without a "/" is in the root directory.
df['FullPath'] = df['FilePath'].str.rsplit('/', n=1).str[0]
df['FullPath'] = np.where(df['FilePath'].str.contains('/'), df['FullPath'], rootDir)

# Split the path into columns for the parent directory and its children
df['ParentDir'] = df['FullPath'].str.split('/', n=1).str[0]
df['SubDirs'] = df['FullPath'].str.split('/', n=1).str[1]
# A missing value means the file sits directly in the parent directory
df['SubDirs'] = df['SubDirs'].fillna('')

# Determine if the item is a directory or file.
df['Type'] = np.where(df['Permissions'].str.startswith('d'), 'Dir', 'File')

# Split the time stamp into date and time columns
df[['ModifiedDate', 'Time']] = df['ModifiedTime'].str.rsplit('+', n=1, expand=True)
df['Time'] = df['Time'].str.split('.').str[0]

# Show only files. The output includes paths, so the individual directories are not needed.
df = df[df['Type'] == 'File']

# Set the columns to show and their order. copy() avoids modifying a view of the original frame.
df = df[['FileName', 'ParentDir', 'SubDirs', 'FullPath', 'FileExt', 'ModifiedDate', 'Time', 'Size']].copy()

# Convert the file sizes from bytes to something more readable.
df['Size'] = df['Size'].apply(convert_bytes)

# Send the data to an Excel workbook with one sheet per parent directory
with pd.ExcelWriter("scripts_directory_contents.xlsx") as writer:
    for directory, data in df.groupby('ParentDir'):
        data.to_excel(writer, sheet_name=directory, index=False)

2021-06-01 04:34:11

Just in case... To get all files in a directory and its subdirectories that match some pattern (e.g. *.py):

import os
from fnmatch import fnmatch

root = '/some/directory'
pattern = "*.py"

for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            print(os.path.join(path, name))
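
The same loop can be wrapped in a generator that accepts several patterns at once. A minimal sketch; find_files is a hypothetical name, and the temporary directory only serves the demonstration:

```python
import os
import tempfile
from fnmatch import fnmatch


def find_files(root, patterns):
    """Yield the full path of each file below root matching any of the patterns."""
    for path, subdirs, files in os.walk(root):
        for name in files:
            if any(fnmatch(name, pattern) for pattern in patterns):
                yield os.path.join(path, name)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "pkg"))
    for name in ("a.py", "b.txt", os.path.join("pkg", "c.md")):
        open(os.path.join(tmp, name), "w").close()

    for found in find_files(tmp, ["*.py", "*.md"]):
        print(found)
```

Returning a generator keeps memory use flat on large trees, since paths are produced one at a time as os.walk descends.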

2012-11-04 00:38:36
