How can I see what is inside an S3 bucket with boto3 (i.e. do an "ls")?

Doing the following:

import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('some/path/')

returns:

s3.Bucket(name='some/path/')

How do I see its contents?


Current answer

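So you're asking for the equivalent of aws s3 ls in boto3, which would list all the top-level folders and files. This is the closest I could get; it lists only the top-level folders. It is surprising how difficult such a simple operation is.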

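import boto3

def s3_ls():
  s3 = boto3.resource('s3')
  bucket = s3.Bucket('example-bucket')
  # Delimiter='/' groups keys by their first path segment
  result = bucket.meta.client.list_objects(Bucket=bucket.name,
                                           Delimiter='/')
  # 'CommonPrefixes' is absent when the bucket has no "folders"
  for o in result.get('CommonPrefixes', []):
    print(o.get('Prefix'))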
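
To get closer to aws s3 ls, the files at the top level can be read from the same responses: "folders" come back under CommonPrefixes and objects under Contents. Below is a minimal sketch under a few assumptions: the function name and its parameters are mine, the caller passes in an S3 client, and the list_objects_v2 paginator is used so that listings longer than one page are still covered.

```python
def s3_ls(client, bucket_name, prefix=""):
    """Return (folders, files) found directly under `prefix`."""
    paginator = client.get_paginator("list_objects_v2")
    folders, files = [], []
    # Delimiter="/" makes S3 roll deeper keys up into CommonPrefixes,
    # so only the entries directly under `prefix` are returned.
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix, Delimiter="/"):
        # Either key may be missing from a page, hence the defaults.
        folders.extend(p["Prefix"] for p in page.get("CommonPrefixes", []))
        files.extend(o["Key"] for o in page.get("Contents", []))
    return folders, files
```

It could be called as s3_ls(boto3.client('s3'), 'example-bucket'), where 'example-bucket' is the placeholder bucket name used above.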

Other answers

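I spent a whole night on this because I just wanted to count the files under a subfolder, but the listing also returned one extra entry: the subfolder itself.

After some research I found that this is simply how S3 works. My scenario: I had unloaded data from Redshift into the following directory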

s3://bucket_name/subfolder/<10 number of files>

and when I used

paginator.paginate(Bucket=price_signal_bucket_name,Prefix=new_files_folder_path+"/")

it returned only the 10 files. But when I created the folder in the S3 bucket itself, it also returned the subfolder.

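Conclusion

If the whole folder is uploaded to S3, listing returns only the files under the prefix. But if the folder was created in the S3 bucket itself, listing it with the boto3 client returns the subfolder as well as the files.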
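
If only the file count is wanted, the folder entry can be skipped while counting. The sketch below assumes that the extra entry is a placeholder object whose key ends with "/" (which is what creating a folder in the S3 console produces); count_files is a hypothetical helper name, and it takes the pages yielded by a list_objects_v2 paginator.

```python
def count_files(pages):
    """Count objects across listing pages, skipping folder placeholders."""
    total = 0
    for page in pages:
        # 'Contents' is missing from a page that holds no objects.
        for obj in page.get("Contents", []):
            # Assumption: a key ending in "/" is a folder placeholder, not a file.
            if obj["Key"].endswith("/"):
                continue
            total += 1
    return total
```

With the paginate call shown earlier, this would be used as count_files(paginator.paginate(Bucket=price_signal_bucket_name, Prefix=new_files_folder_path+"/")).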

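A more parsimonious way: rather than iterating through a for loop, you could also just print the original object, which contains all the files in your S3 bucket: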

from boto3.session import Session

# aws_access_key_id and aws_secret_access_key are assumed to be defined by the caller
session = Session(aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)
s3 = session.resource('s3')
bucket = s3.Bucket('bucket_name')

files_in_s3 = bucket.objects.all()
# you can print this iterable with print(list(files_in_s3))

One way to see the contents would be:

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object)

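This is similar to an 'ls', but it does not take the prefix folder convention into account and will list every object in the bucket. It is left to the reader to filter out the prefixes that are part of the Key name.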
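
One way to do that filtering on the client side is to group the flat keys by their first path segment. This is only a sketch: ls_from_keys is a hypothetical helper, not part of boto3, and it works on any iterable of key strings.

```python
def ls_from_keys(keys, prefix="", delimiter="/"):
    """Return the sorted entries directly under `prefix`, 'ls' style."""
    entries = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue  # the key is the prefix itself, e.g. a folder placeholder
        # Keep only the first path segment; a trailing delimiter marks a "folder".
        head, sep, _ = rest.partition(delimiter)
        entries.add(head + sep)
    return sorted(entries)
```

For the bucket above it could be called as ls_from_keys(obj.key for obj in my_bucket.objects.all()), optionally with a prefix such as 'some/path/'.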

In Python 2:

from boto.s3.connection import S3Connection

conn = S3Connection() # assumes boto.cfg setup
bucket = conn.get_bucket('bucket_name')
for obj in bucket.get_all_keys():
    print(obj.key)

In Python 3:

from boto3 import client

conn = client('s3')  # again assumes boto.cfg setup, assume AWS S3
# 'Contents' is absent for an empty bucket; one call returns at most 1000 keys
for key in conn.list_objects(Bucket='bucket_name').get('Contents', []):
    print(key['Key'])

ObjectSummary:

There are two identifiers attached to the ObjectSummary:

bucket_name
key

boto3 S3: ObjectSummary

More on object keys from the AWS S3 documentation:

Object Keys: When you create an object, you specify the key name, which uniquely identifies the object in the bucket. For example, in the Amazon S3 console (see AWS Management Console), when you highlight a bucket, a list of objects in your bucket appears. These names are the object keys. The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1024 bytes long.

The Amazon S3 data model is a flat structure: you create a bucket, and the bucket stores objects. There is no hierarchy of subbuckets or subfolders; however, you can infer logical hierarchy using key name prefixes and delimiters as the Amazon S3 console does. The Amazon S3 console supports a concept of folders. Suppose that your bucket (admin-created) has four objects with the following object keys:

Development/Projects1.xls
Finance/statement1.pdf
Private/taxdocument.pdf
s3-dg.pdf

Reference: AWS S3: Object Keys

Here is some sample code that demonstrates how to get the bucket name and the object key.

Example:

import boto3

def main():

    def enumerate_s3():
        s3 = boto3.resource('s3')
        # Walk every bucket visible to the configured credentials
        for bucket in s3.buckets.all():
            print("Name: {}".format(bucket.name))
            print("Creation Date: {}".format(bucket.creation_date))
            # Each item is an ObjectSummary; 'obj' avoids shadowing the builtin 'object'
            for obj in bucket.objects.all():
                print("Object: {}".format(obj))
                print("Object bucket_name: {}".format(obj.bucket_name))
                print("Object key: {}".format(obj.key))

    enumerate_s3()


if __name__ == '__main__':
    main()