S3 Client

Alluxio supports a RESTful API that is compatible with the basic operations of the Amazon S3 API.

The REST API documentation is generated as part of the Alluxio build.

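Going through an HTTP proxy has some performance cost, mainly because the proxy adds an extra network hop. For the best performance, run the proxy server and an Alluxio worker on the same compute node. Alternatively, place all proxy servers behind a load balancer.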

Supported Features

Language Support

The Alluxio S3 client can be used from many programming languages, such as C++, Java, Python, Golang, and Ruby. This document uses curl REST calls and the Python S3 client as usage examples.

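Example Usage

For example, you can issue the following RESTful API calls against an Alluxio cluster running on localhost. The Alluxio proxy listens on port 39999 by default.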
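The endpoint layout used by the curl calls in this document can be captured in a small helper. This is only a sketch: the function name `s3_url` and its parameters are hypothetical, while the host, the port 39999, and the base path `/api/v1/s3` are taken from the commands shown in this document.

```python
from urllib.parse import quote, urlencode

# Values taken from the curl examples in this document.
PROXY_HOST = "localhost"
PROXY_PORT = 39999
S3_BASE_PATH = "/api/v1/s3"


def s3_url(bucket, key=None, params=None):
    """Build a proxy URL for a bucket or object (hypothetical helper)."""
    path = S3_BASE_PATH + "/" + quote(bucket)
    if key is not None:
        path += "/" + quote(key)
    url = "http://{}:{}{}".format(PROXY_HOST, PROXY_PORT, path)
    if params:
        # Query parameters such as max-keys or uploadId go after '?'.
        url += "?" + urlencode(params)
    return url
```

Building the URL in code also avoids the shell-escaping of `?`, `=`, and `&` that the curl commands need.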

Get the bucket (list objects)

  $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:35:00 GMT
  Content-Type: application/xml
  Content-Length: 200
  Server: Jetty(9.2.z-SNAPSHOT)
  <ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>0</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated></ListBucketResult>

Put an object

Assume a file named LICENSE exists in the current local directory.

  $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/testobject
  HTTP/1.1 100 Continue
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:36:03 GMT
  ETag: "9347237b67b0be183499e5893128704e"
  Content-Length: 0
  Server: Jetty(9.2.z-SNAPSHOT)
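The ETag returned by the PUT can be used as a quick integrity check. The sketch below assumes that the ETag is the hex MD5 digest of the uploaded bytes, as Amazon S3 does for simple single-request uploads; this document does not state how the Alluxio proxy computes it, so treat that as an assumption. The helper names `md5_hex` and `etag_matches` are hypothetical.

```python
import hashlib


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def etag_matches(path, etag_header):
    """Compare a local file against an ETag header value.

    ASSUMPTION: the ETag is the quoted hex MD5 of the object body.
    """
    return md5_hex(path) == etag_header.strip('"').lower()
```

For instance, `etag_matches("LICENSE", '"9347237b67b0be183499e5893128704e"')` would compare the local file with the ETag from the response above.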

Get the object

  $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:37:34 GMT
  Last-Modified: Tue, 29 Aug 2017 22:36:03 GMT
  Content-Type: application/xml
  Content-Length: 26847
  Server: Jetty(9.2.z-SNAPSHOT)
  .................. Content of the test file ...................

List a bucket containing a single object

  $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:38:48 GMT
  Content-Type: application/xml
  Content-Length: 363
  Server: Jetty(9.2.z-SNAPSHOT)
  <ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>1</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>testobject</Key><LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

List a bucket containing multiple objects

  $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key1
  $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key2
  $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key3
  $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:40:45 GMT
  Content-Type: application/xml
  Content-Length: 537
  Server: Jetty(9.2.z-SNAPSHOT)
  <ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken>key3</NextContinuationToken><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>true</IsTruncated><Contents><Key>key1</Key><LastModified>2017-08-29T15:40:42.213Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>key2</Key><LastModified>2017-08-29T15:40:43.269Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
  $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2\&continuation-token\=key3
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:41:18 GMT
  Content-Type: application/xml
  Content-Length: 540
  Server: Jetty(9.2.z-SNAPSHOT)
  <ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken>key3</ContinuationToken><NextContinuationToken/><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>key3</Key><LastModified>2017-08-29T15:40:44.002Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>testobject</Key><LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

You can also verify that these objects appear as Alluxio files under the /testbucket directory.

  $ ./bin/alluxio fs ls -R /testbucket
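The `max-keys` and `continuation-token` listing shown earlier in this section can also be followed from a script. The sketch below parses a `ListBucketResult` body and keeps requesting pages until `IsTruncated` is false. The function names and the `fetch_page` callback (which takes a continuation token or `None` and returns the XML text of one page) are hypothetical. Element names are matched by local name so that the code works whether or not the response declares an XML namespace.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def parse_list_bucket_result(xml_text):
    """Return (keys, is_truncated, next_token) for one listing page."""
    root = ET.fromstring(xml_text)
    keys, truncated, token = [], False, None
    for child in root:
        name = _local(child.tag)
        if name == "Contents":
            for field in child:
                if _local(field.tag) == "Key":
                    keys.append(field.text)
        elif name == "IsTruncated":
            truncated = (child.text or "").strip() == "true"
        elif name == "NextContinuationToken":
            token = child.text or None
    return keys, truncated, token


def list_all_keys(fetch_page):
    """Collect keys from every page; fetch_page(token) returns XML text."""
    keys, token = [], None
    while True:
        page_keys, truncated, token = parse_list_bucket_result(fetch_page(token))
        keys.extend(page_keys)
        if not truncated or token is None:
            return keys
```

In practice `fetch_page` would issue the GET request with `continuation-token` set when the token is not `None`.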

Delete objects

  $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key1
  HTTP/1.1 204 No Content
  Date: Tue, 29 Aug 2017 22:43:22 GMT
  Server: Jetty(9.2.z-SNAPSHOT)

  $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key2
  $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key3
  $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject

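Initiate a multipart upload

Upload a part

The request below uploads part 1 of the multipart upload with ID 2, which must already have been initiated for testobject. The URL is quoted so that the shell does not interpret the `&`.

  $ curl -i -X PUT 'http://localhost:39999/api/v1/s3/testbucket/testobject?partNumber=1&uploadId=2'
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:43:22 GMT
  ETag: "b54357faf0632cce46e942fa68356b38"
  Server: Jetty(9.2.z-SNAPSHOT)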

List the uploaded parts

  $ curl -i -X GET 'http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2'
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:43:22 GMT
  Content-Length: 985
  Server: Jetty(9.2.z-SNAPSHOT)
  <?xml version="1.0" encoding="UTF-8"?>
  <ListPartsResult xmlns="">
  <Bucket>testbucket</Bucket>
  <Key>testobject</Key>
  <UploadId>2</UploadId>
  <StorageClass>STANDARD</StorageClass>
  <IsTruncated>false</IsTruncated>
  <Part>
  <PartNumber>1</PartNumber>
  <LastModified>2017-08-29T20:48:34.000Z</LastModified>
  <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
  <Size>10485760</Size>
  </Part>
  </ListPartsResult>

Complete a multipart upload

  $ curl -i -X POST 'http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2' -d '
  <CompleteMultipartUpload>
  <Part>
  <PartNumber>1</PartNumber>
  <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
  </Part>
  </CompleteMultipartUpload>'
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:43:22 GMT
  Server: Jetty(9.2.z-SNAPSHOT)
  <?xml version="1.0" encoding="UTF-8"?>
  <CompleteMultipartUploadResult xmlns="">
  <Location>/testbucket/testobject</Location>
  <Bucket>testbucket</Bucket>
  <Key>testobject</Key>
  <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
  </CompleteMultipartUploadResult>
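The `CompleteMultipartUpload` request body used above lists each part number with the ETag returned when that part was uploaded. A minimal sketch of generating that body follows; the function name `build_complete_body` and its input format (an iterable of `(part_number, etag)` pairs) are hypothetical. Parts are sorted by number because the S3 API expects them in ascending order.

```python
import xml.etree.ElementTree as ET


def build_complete_body(parts):
    """Build a CompleteMultipartUpload XML body.

    parts: iterable of (part_number, etag) pairs, where etag is the
    value returned by the corresponding upload-part request.
    """
    root = ET.Element("CompleteMultipartUpload")
    for number, etag in sorted(parts):
        part = ET.SubElement(root, "Part")
        ET.SubElement(part, "PartNumber").text = str(number)
        ET.SubElement(part, "ETag").text = etag
    return ET.tostring(root, encoding="unicode")
```

The returned string can be sent as the POST body in place of the hand-written XML.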

Abort a multipart upload

  HTTP/1.1 204 OK
  Date: Tue, 29 Aug 2017 22:43:22 GMT
  Content-Length: 0
  Server: Jetty(9.2.z-SNAPSHOT)

Delete an empty bucket

  $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket
  HTTP/1.1 204 No Content
  Date: Tue, 29 Aug 2017 22:45:19 GMT

Python S3 Client

Create a connection

  import boto
  import boto.s3.connection

  # Credentials are left empty; the proxy address matches the curl examples.
  conn = boto.connect_s3(
      aws_access_key_id='',
      aws_secret_access_key='',
      host='localhost',
      port=39999,
      path='/api/v1/s3',
      is_secure=False,
      calling_format=boto.s3.connection.OrdinaryCallingFormat(),
  )

Create a bucket

  bucketName = 'bucket-for-testing'
  bucket = conn.create_bucket(bucketName)

Put a small object

  smallObjectKey = 'small.txt'
  smallObjectContent = 'Hello World!'
  key = bucket.new_key(smallObjectKey)
  key.set_contents_from_string(smallObjectContent)

Upload a large object

Create an 8MB file on the local file system.

  $ dd if=/dev/zero of=8mb.data bs=1048576 count=8

Then use the Python S3 client to upload it as an object.

  largeObjectKey = 'large.txt'
  largeObjectFile = '8mb.data'
  key = bucket.new_key(largeObjectKey)
  with open(largeObjectFile, 'rb') as f:
      key.set_contents_from_file(f)
  # Keep a local copy of the bytes to compare against the downloaded object.
  with open(largeObjectFile, 'rb') as f:
      largeObject = f.read()

Get the large object

  assert largeObject == key.get_contents_as_string()

Delete objects

  bucket.delete_key(smallObjectKey)
  bucket.delete_key(largeObjectKey)

Initiate a multipart upload

  mp = bucket.initiate_multipart_upload(largeObjectFile)

Upload parts

  import math, os
  from filechunkio import FileChunkIO

  # Use a chunk size of 1MB (feel free to change this)
  sourceSize = os.stat(largeObjectFile).st_size
  chunkSize = 1048576
  chunkCount = int(math.ceil(sourceSize / float(chunkSize)))
  for i in range(chunkCount):
      offset = chunkSize * i
      # The final part may be smaller than chunkSize.
      partSize = min(chunkSize, sourceSize - offset)
      with FileChunkIO(largeObjectFile, 'r', offset=offset, bytes=partSize) as fp:
          mp.upload_part_from_file(fp, part_num=i + 1)
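The offset arithmetic in the loop above can be checked on its own, without a running proxy. The sketch below computes the same part numbers, offsets, and sizes as a plain function; the name `part_ranges` is hypothetical and not part of boto.

```python
def part_ranges(total_size, chunk_size):
    """Return a list of (part_number, offset, size) tuples.

    Part numbers start at 1, and the last part holds whatever
    remains after the full-size chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (index + 1, offset, min(chunk_size, total_size - offset))
        for index, offset in enumerate(range(0, total_size, chunk_size))
    ]
```

For the 8MB file and 1MB chunk size used in this document, this yields eight parts of 1048576 bytes each.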

Complete the multipart upload

  mp.complete_upload()

Abort the multipart upload
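As a hedged sketch, assuming `mp` is the boto `MultiPartUpload` object returned by `initiate_multipart_upload` earlier in this document: boto provides `cancel_upload()` on that object to abort the upload. The wrapper below aborts whenever uploading the parts fails and completes the upload otherwise. The wrapper name `upload_parts_or_abort` and the `upload_parts` callback are hypothetical.

```python
def upload_parts_or_abort(mp, upload_parts):
    """Run upload_parts(mp); abort the multipart upload if it raises.

    mp is expected to offer cancel_upload() and complete_upload(),
    as boto's MultiPartUpload object does.
    """
    try:
        upload_parts(mp)
    except Exception:
        # Abort so the already uploaded parts are discarded.
        mp.cancel_upload()
        raise
    return mp.complete_upload()
```

Calling `mp.cancel_upload()` directly is enough when no wrapper is needed.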