我如何能看到什么是在S3桶与boto3?(例如,写一个“ls”)?

做以下事情:

import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('some/path/')

返回:

s3.Bucket(name='some/path/')

我如何看到它的内容?


当前回答

所以你在boto3中要求等同于aws s3 ls。这将列出所有顶级文件夹和文件。这是我能得到的最接近的结果;它只列出所有顶级文件夹。这么简单的操作居然这么难。

import boto3

def s3_ls():
  s3 = boto3.resource('s3')
  bucket = s3.Bucket('example-bucket')
  result = bucket.meta.client.list_objects(Bucket=bucket.name,
                                           Delimiter='/')
  for o in result.get('CommonPrefixes'):
    print(o.get('Prefix'))

其他回答

如果你想传递ACCESS和SECRET密钥(你不应该这样做,因为这是不安全的):

from boto3.session import Session

ACCESS_KEY='your_access_key'
SECRET_KEY='your_secret_key'

session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')
your_bucket = s3.Bucket('your_bucket')

for s3_file in your_bucket.objects.all():
    print(s3_file.key)

下面是一个简单的函数,它返回所有文件的文件名或具有特定类型的文件,如'json', 'jpg'。

def get_file_list_s3(bucket, prefix="", file_extension=None):
            """Return the list of all file paths (prefix + file name) with certain type or all
            Parameters
            ----------
            bucket: str
                The name of the bucket. For example, if your bucket is "s3://my_bucket" then it should be "my_bucket"
            prefix: str
                The full path to the the 'folder' of the files (objects). For example, if your files are in 
                s3://my_bucket/recipes/deserts then it should be "recipes/deserts". Default : ""
            file_extension: str
                The type of the files. If you want all, just leave it None. If you only want "json" files then it
                should be "json". Default: None       
            Return
            ------
            file_names: list
                The list of file names including the prefix
            """
            import boto3
            s3 = boto3.resource('s3')
            my_bucket = s3.Bucket(bucket)
            file_objs =  my_bucket.objects.filter(Prefix=prefix).all()
            file_names = [file_obj.key for file_obj in file_objs if file_extension is not None and file_obj.key.split(".")[-1] == file_extension]
            return file_names
import boto3
s3 = boto3.resource('s3')

## Bucket to use
my_bucket = s3.Bucket('city-bucket')

## List objects within a given prefix
for obj in my_bucket.objects.filter(Delimiter='/', Prefix='city/'):
  print obj.key

输出:

city/pune.csv
city/goa.csv

我花了一整晚的时间在这个问题上因为我只想知道子文件夹下的文件数但它也返回了一个额外的文件在子文件夹本身的内容中,

在研究之后,我发现这就是s3的工作方式 一个场景,我卸载数据从红移在以下目录

s3://bucket_name/subfolder/<10 number of files>

当我用

paginator.paginate(Bucket=price_signal_bucket_name,Prefix=new_files_folder_path+"/")

它只会返回10个文件,但当我在s3桶上创建文件夹时,它也会返回子文件夹

结论

如果整个文件夹都上传到s3,那么列出only将返回前缀下的文件 但是如果文件夹是在s3桶本身创建的,那么使用boto3客户端列出它也将返回子文件夹和文件

从lambda函数运行aws cli命令也是一个不错的选择

import subprocess
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def run_command(command):
    command_list = command.split(' ')

    try:
        logger.info("Running shell command: \"{}\"".format(command))
        result = subprocess.run(command_list, stdout=subprocess.PIPE);
        logger.info("Command output:\n---\n{}\n---".format(result.stdout.decode('UTF-8')))
    except Exception as e:
        logger.error("Exception: {}".format(e))
        return False

    return True

def lambda_handler(event, context):
    run_command('/opt/aws s3 ls s3://bucket-name')