I have a bucket filled with thousands of files. How can I search the bucket?


Current answer

There are several options, none of which is a simple "one-shot" full-text solution:

Key name pattern search: search for keys starting with a given string. If you design key names carefully, this can be a rather quick solution.

Search metadata attached to keys: when posting a file to AWS S3, you can process the content, extract some meta information, and attach it to the key as custom headers. This lets you fetch key names and headers without fetching the complete content. The search has to be done sequentially; there is no "SQL-like" search option for this. With large files this can save a lot of network traffic and time.

Store metadata on SimpleDB: as in the previous point, but with the metadata stored on SimpleDB. Here you have SQL-like select statements. With large data sets you may hit SimpleDB limits, which can be overcome (by partitioning metadata across multiple SimpleDB domains), but if you go really far you may need to use another type of database for the metadata.

Sequential full-text search of the content: process all the keys one by one. Very slow if you have too many keys to process.
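The first two options can be combined: narrow the listing by key prefix, then inspect each key's custom metadata with a HEAD request so the object body is never downloaded. The following is a minimal sketch, assuming a boto3 S3 client is passed in; the function name, bucket name, prefix, and metadata key are hypothetical and chosen only for illustration.

```python
def find_keys_by_metadata(s3, bucket, prefix, meta_key, meta_value):
    """Return keys under `prefix` whose user metadata has meta_key == meta_value.

    `s3` is expected to behave like a boto3 S3 client (an assumption of this sketch).
    """
    matches = []
    # list_objects_v2 returns at most 1000 keys per call, so use the paginator.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            # HEAD fetches only the headers (including custom metadata), not the body.
            head = s3.head_object(Bucket=bucket, Key=obj["Key"])
            if head.get("Metadata", {}).get(meta_key) == meta_value:
                matches.append(obj["Key"])
    return matches


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   s3 = boto3.client("s3")
#   find_keys_by_metadata(s3, "my-cool-bucket", "reports/2020/", "author", "alice")
```

This still issues one request per key, so it is sequential as described above; the prefix is what keeps the number of requests manageable.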

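For several years we have been storing 1440 versions of a file per day (one per minute) using a versioned bucket, which is easy to do. But getting to some older version takes time, because one has to go through the versions sequentially, one by one. Sometimes I use a simple CSV index of records showing publication time and version id; with this, I can jump to an older version quite quickly.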
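A CSV index like this can be searched with a binary search instead of walking the versions. The sketch below assumes a two-column layout (publication time, version id) with timestamps in one fixed ISO 8601 UTC format, so that they sort correctly as plain strings; the actual layout of the index is not given above, and the function names are hypothetical.

```python
import bisect
import csv


def load_index(csv_lines):
    """Read (timestamp, version_id) rows and return two parallel lists sorted by timestamp.

    `csv_lines` is any iterable of CSV text lines, e.g. an open file.
    """
    rows = sorted((row[0], row[1]) for row in csv.reader(csv_lines) if row)
    return [t for t, _ in rows], [v for _, v in rows]


def version_at(timestamps, version_ids, when):
    """Return the id of the latest version published at or before `when`, or None."""
    i = bisect.bisect_right(timestamps, when)
    if i == 0:
        return None  # `when` is earlier than the first indexed version
    return version_ids[i - 1]


# Usage (hypothetical file name):
#   with open("versions.csv", newline="") as f:
#       timestamps, version_ids = load_index(f)
#   vid = version_at(timestamps, version_ids, "2020-01-01T00:01:30Z")
```

The returned id can then be passed as the VersionId argument when fetching the object, so only one request is needed to reach the old version.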

As you can see, AWS S3 is not designed for full-text search on its own; it is a simple storage service.

Other answers

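Considering you are on AWS... I would think you would want to use their CloudSearch tool. Put the data you want to search into their service... and have it point to the S3 keys.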

http://aws.amazon.com/cloudsearch/

Use Amazon Athena to query the S3 bucket. Alternatively, load the data into Amazon Elasticsearch. Hope this helps.

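Fast forward to 2020, with aws-okta as our 2FA, the command below works well, although it is very slow to iterate through all of the objects and folders in this particular bucket (+270,000).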

aws-okta exec dev -- aws s3 ls my-cool-bucket --recursive | grep needle-in-haystax.txt

There are (at least) two different use cases that could be described as "searching the bucket":

1. Search for something inside every object stored in the bucket; this assumes a common format for all the objects in that bucket (say, text files). For something like this, you're forced to do what Cody Caughlan just answered. The AWS S3 docs have example code showing how to do this with the AWS SDK for Java: Listing Keys Using the AWS SDK for Java (there you'll also find PHP and C# examples).

2. Search for something in the object keys contained in that bucket; S3 does have partial support for this, in the form of exact prefix matches plus collapsing of matches after a delimiter. This is explained in more detail in the AWS S3 Developer Guide. This allows you, for example, to implement "folders" by using object keys such as folder/subfolder/file.txt. If you follow this convention, most of the S3 GUIs (such as the AWS Console) will show you a folder view of your bucket.
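The prefix-plus-delimiter behaviour in the second case can be sketched as follows: one call level returns the "subfolders" (keys collapsed at the delimiter) separately from the "files" directly under the prefix. This is a minimal sketch assuming a boto3 S3 client is passed in; the function name and the example bucket are hypothetical.

```python
def list_folder(s3, bucket, prefix=""):
    """Return (subfolders, files) directly under `prefix`, like one directory level.

    `s3` is expected to behave like a boto3 S3 client (an assumption of this sketch).
    """
    subfolders, files = [], []
    paginator = s3.get_paginator("list_objects_v2")
    # Delimiter="/" makes S3 collapse deeper keys into CommonPrefixes entries.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        subfolders += [p["Prefix"] for p in page.get("CommonPrefixes", [])]
        files += [o["Key"] for o in page.get("Contents", [])]
    return subfolders, files


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   folders, files = list_folder(boto3.client("s3"), "my-cool-bucket", "folder/")
```

Because matching happens on the server only for exact prefixes, any search on the middle or end of a key still means listing the keys and filtering on the client.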

Status as of 2018-07: Amazon now has native SQL-like search for CSV and JSON files!

https://aws.amazon.com/blogs/developer/introducing-support-for-amazon-s3-select-in-the-aws-sdk-for-javascript/
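The linked post covers the JavaScript SDK; the same S3 Select operation is exposed in boto3 as select_object_content. The sketch below assumes a CSV object with a header row and a boto3 S3 client passed in; the function name, bucket, key, and column names are hypothetical. Note that S3 Select queries one object per call, and its availability may have changed since that post, so check the current AWS documentation before relying on it.

```python
def select_rows(s3, bucket, key, sql):
    """Run an S3 Select SQL expression against one CSV object and return the result text.

    `s3` is expected to behave like a boto3 S3 client (an assumption of this sketch).
    """
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=sql,
        # Treat the first line as a header so columns can be referenced by name.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The payload is a stream of events; only "Records" events carry result bytes.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   sql = "SELECT s.name FROM S3Object s WHERE s.city = 'Berlin'"
#   print(select_rows(boto3.client("s3"), "my-cool-bucket", "people.csv", sql))
```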