How do I search an Amazon S3 bucket?

I ran into the same problem. Searching in S3 should be much easier than it currently is, which is why I built this open-source tool for searching S3.

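search is a fully open-source S3 search tool. It was built with performance as the key consideration, and according to the benchmarks it searches a bucket containing ~1000 files within seconds.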

Setup is simple. You only need to download the docker-compose file and run it:

docker-compose up

The search tool will start up, and you can then search for anything in any of your buckets.

2020-02-22 13:07:51

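S3 has no native "search this bucket" feature, because the actual content is unknown to it. Also, since S3 is key/value based, there is no native way to access many nodes at once, the way more traditional datastores let you with a SELECT * FROM … (in the SQL model).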

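What you need to do is perform a ListBucket to get a listing of the objects in the bucket, then iterate over every item and run a custom operation that you implement. That operation is your search.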
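The list-then-filter approach described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes a boto3-style S3 client and uses the `list_objects_v2` paginator as the modern equivalent of ListBucket. The function name `search_keys` and the `matches` callback are made up for illustration.

```python
from typing import Callable, Iterator


def search_keys(client, bucket: str, matches: Callable[[dict], bool]) -> Iterator[dict]:
    """Yield every object listing in `bucket` for which `matches(obj)` is true.

    `client` is assumed to behave like a boto3 S3 client.
    """
    # A single list call returns a limited number of keys, so the paginator
    # is used to walk the whole bucket page by page.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # A page for an empty bucket (or prefix) has no "Contents" entry.
        for obj in page.get("Contents", []):
            if matches(obj):
                yield obj
```

With boto3 installed and credentials configured, a call might look like `search_keys(boto3.client("s3"), "my-bucket", lambda o: "invoice" in o["Key"])`, where "my-bucket" is a placeholder. Note that this only sees listing metadata such as the key, size and last-modified time. Searching inside the files would mean fetching each object as well.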

2011-02-12 16:52:19

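Another option is to mirror the S3 bucket on your web server and traverse it locally. The trick is that the local files are empty and only serve as a skeleton. Alternatively, the local files could hold useful metadata that you would normally need to fetch from S3 (e.g. file size, mimetype, author, timestamp, uuid). When you need to provide a URL to download a file, search locally but link to the S3 address.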

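Local file traversal is easy, and this approach to S3 management is language agnostic. It also avoids maintaining and querying a database of files, and avoids the delay of making a series of remote API calls to authenticate and fetch the bucket contents.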
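As a rough illustration of searching the local skeleton and handing back S3 links, here is a minimal sketch. It assumes the mirror's directory layout matches the S3 keys exactly and that the bucket is reachable through a virtual-hosted-style URL. The function name `search_mirror` and its parameters are hypothetical.

```python
from pathlib import Path
from urllib.parse import quote


def search_mirror(root: Path, pattern: str, bucket: str) -> list[str]:
    """Search the local skeleton by glob pattern and return S3 URLs."""
    urls = []
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            # The path relative to the mirror root doubles as the S3 key.
            key = path.relative_to(root).as_posix()
            # Percent-encode the key but keep "/" as the separator.
            urls.append(f"https://{bucket}.s3.amazonaws.com/{quote(key)}")
    return urls
```

For example, `search_mirror(Path("/var/www/mirror"), "*.pdf", "my-bucket")` would list links for every PDF in the skeleton without touching S3 at all. Both the path and the bucket name are placeholders.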

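You could let users upload files directly to your server via FTP or HTTP, then transfer a batch of new and updated files to Amazon at off-peak times by recursing over the directories and picking up every file that has any size. Once a file transfer to Amazon completes, replace the web server file with an empty one of the same name. If a local file has any size, serve it directly, because it is still awaiting batch transfer.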
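The off-peak batch transfer might look something like the sketch below. It is only an outline under stated assumptions: the `upload` callable is a stand-in for whatever actually sends the file to S3, it is assumed to raise on failure, and `flush_pending` is a made-up name.

```python
from pathlib import Path
from typing import Callable


def flush_pending(root: Path, upload: Callable[[Path, str], None]) -> list[str]:
    """Upload every non-empty file under `root`, then truncate it.

    Returns the keys that were transferred.
    """
    transferred = []
    for path in sorted(root.rglob("*")):
        # A file with any size is still waiting for transfer.
        if path.is_file() and path.stat().st_size > 0:
            key = path.relative_to(root).as_posix()
            # If upload raises, the local copy is left untouched.
            upload(path, key)
            # Leave an empty placeholder with the same name.
            path.write_bytes(b"")
            transferred.append(key)
    return transferred
```

With boto3, `upload` could be something like `lambda p, k: s3.upload_file(str(p), "my-bucket", k)`, with "my-bucket" as a placeholder. The size check only works for the empty-skeleton variant. If the local files hold metadata instead, a different marker for "pending" would be needed.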

2011-09-20 17:43:48

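Here's how I did it: I have thousands of files in S3. I opened the properties panel for one file in the list. You can see the file's URI there, so I copied and pasted it into the browser. It was a text file and it rendered fine. Then I replaced the uuid in the URL with the uuid I had on hand, and the file came right up.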

I wish AWS had a better way to search for files, but this worked for me.

2015-11-07 00:55:22

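Although it is not an AWS native service, there is Mixpeek, which runs text extraction such as Tika, Tesseract and ImageAI on your S3 files, then places them in a Lucene index to make them searchable.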

See the documentation here.

You integrate it as follows:

Download the module: https://github.com/mixpeek/mixpeek-python

Import the module and your API keys:

    from mixpeek import Mixpeek, S3
    from config import mixpeek_api_key, aws

Instantiate the S3 class (which uses boto3 and requests):

    s3 = S3(
        aws_access_key_id=aws['aws_access_key_id'],
        aws_secret_access_key=aws['aws_secret_access_key'],
        region_name='us-east-2',
        mixpeek_api_key=mixpeek_api_key
    )

Upload one or more existing S3 files:

    # upload all S3 files in bucket "demo"
    s3.upload_all(bucket_name="demo")

    # upload one single file called "prescription.pdf" in bucket "demo"
    s3.upload_one(s3_file_name="prescription.pdf", bucket_name="demo")

Now simply search using the Mixpeek module:

    # mixpeek api direct
    mix = Mixpeek(
        api_key=mixpeek_api_key
    )
    # search
    result = mix.search(query="Heartgard")
    print(result)

Where result can be:

    [
        {
            "_id": "REDACTED",
            "api_key": "REDACTED",
            "highlights": [
                {
                    "path": "document_str",
                    "score": 0.8759502172470093,
                    "texts": [
                        {
                            "type": "text",
                            "value": "Vetco Prescription\nVetcoClinics.com\n\nCustomer:\n\nAddress: Canine\n\nPhone: Australian Shepherd\n\nDate of Service: 2 Years 8 Months\n\nPrescription\nExpiration Date:\n\nWeight: 41.75\n\nSex: Female\n\n℞ "
                        },
                        {
                            "type": "hit",
                            "value": "Heartgard"
                        },
                        {
                            "type": "text",
                            "value": " Plus Green 26-50 lbs (Ivermectin 135 mcg/Pyrantel 114 mg)\n\nInstructions: Give one chewable tablet by mouth once monthly for protection against heartworms, and the treatment and\ncontrol of roundworms, and hookworms. "
                        }
                    ]
                }
            ],
            "metadata": {
                "date_inserted": "2021-10-07 03:19:23.632000",
                "filename": "prescription.pdf"
            },
            "score": 0.13313256204128265
        }
    ]

Then parse the results.
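For the parsing step, something along these lines might work. It is a sketch that relies only on the shape of the sample response shown above ("metadata", "highlights", "texts", and the "hit" type). Whether every response has those fields is an assumption, and `summarize_hits` is a made-up helper, not part of the Mixpeek module.

```python
def summarize_hits(results: list[dict]) -> list[tuple[str, list[str]]]:
    """Return (filename, [matched terms]) for each search result."""
    summary = []
    for doc in results:
        # Fall back to an empty name if the metadata is missing.
        filename = doc.get("metadata", {}).get("filename", "")
        # Keep only the text fragments flagged as actual matches.
        hits = [
            text["value"]
            for highlight in doc.get("highlights", [])
            for text in highlight.get("texts", [])
            if text.get("type") == "hit"
        ]
        summary.append((filename, hits))
    return summary
```

Calling `summarize_hits(result)` on a response shaped like the sample pairs each filename with the terms that matched in it.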

2022-02-25 14:41:46