Unicode and Special Symbols in Object Names

AIStore provides seamless support for object names containing Unicode characters (like Japanese, Chinese, or Korean text), emojis, and special symbols.

This README shows how to handle such names correctly in both native (ais://) buckets and Cloud storage.

Client vs. Server Responsibilities

In accordance with standard HTTP behavior, clients are responsible for percent-encoding object names into valid URL paths. This is especially important when object names contain characters like spaces, slashes, or symbols such as ?, #, or ".

Like any standard HTTP server, the AIS server automatically decodes (unescapes) the request path before processing it.

For example:

  • If a client sends a request to read or write bucket/my%20file, the server will unescape it and handle it as bucket/my file.

  • In terms of the REST API, when a client sends a request with the URL path /v1/objects/bucket/my%20file, AIS handles it as /v1/objects/bucket/my file.

In short, AIStore always works with the original UTF-8 string decoded from the HTTP path, that is, the name as it was before any percent-encoding you may (or may not) have applied on the client side.
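The client/server round trip described above can be sketched with Python's standard urllib.parse module: the client percent-encodes the name (as UTF-8 by default) and the server-side unescape recovers the original string. This is a minimal illustration of generic HTTP path handling, not AIStore's server code; the helper names are hypothetical, and the /v1/objects/ prefix is taken from the example above.

```python
from urllib.parse import quote, unquote


def to_request_path(bucket: str, objname: str) -> str:
    """Client side (hypothetical helper): percent-encode the object name as UTF-8.

    safe='' means '/' inside the name is encoded as well.
    """
    return f"/v1/objects/{bucket}/{quote(objname, safe='')}"


def from_request_path(path: str) -> str:
    """Server side (illustrative): unescape the path before processing it."""
    return unquote(path)


for name in ["my file", "こんにちは世界"]:
    path = to_request_path("bucket", name)
    # Encoded path on the wire, then the decoded path the server works with
    print(path, "->", from_request_path(path))
```

Non-ASCII characters are encoded byte by byte from their UTF-8 representation, which is why each Japanese character turns into three %XX escapes.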

For convenience, the AIS CLI provides the --encode-objname flag, which does this encoding automatically so that you can use natural-looking object names even when they contain special characters.

If you use tools such as curl, however, you must encode URLs manually.

Below are examples that demonstrate this behavior.

Working with Unicode Object Names

```console
#
# Set a Unicode object name (Japanese "Hello World")
#
$ export helloworld="こんにちは世界"

#
# Put an object with Unicode name into an AIS bucket
#
$ ais put LICENSE ais://onebucket/$helloworld
PUT "LICENSE" => ais://onebucket/こんにちは世界

#
# Content is preserved correctly
#
$ ais object cat ais://onebucket/$helloworld
MIT License
Copyright (c) 2017 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
...

#
# List the object details - Unicode name displays properly
#
$ ais ls ais://onebucket/$helloworld
PROPERTY   VALUE
atime      09 Apr 25 16:49 EDT
checksum   xxhash2[ed5b3e74f9f3516a]
name       ais://onebucket/こんにちは世界
size       1.05KiB
```

Cross-Backend Compatibility (S3)

```console
#
# Put the same object into an S3 bucket
#
$ ais put LICENSE s3://twobucket/$helloworld
PUT "LICENSE" => s3://twobucket/こんにちは世界

#
# Verify with native S3 tools - Unicode is preserved
#
$ s3cmd ls s3://twobucket/$helloworld
2025-04-09 20:50 1075 s3://twobucket/こんにちは世界

#
# Retrieve with native S3 tools - content is preserved
#
$ s3cmd get s3://twobucket/$helloworld - | cat
download: 's3://twobucket/こんにちは世界' -> '-' [1 of 1]
download: 's3://twobucket/こんにちは世界' -> '-' [1 of 1]
 1075 of 1075 100% in 0s 897.27 KB/s
MIT License
Copyright (c) 2017 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
...
 1075 of 1075 100% in 0s 868.32 KB/s done
```

Terminal and Environment Considerations

For proper display of Unicode characters in your terminal:

  1. Ensure your terminal supports UTF-8 (most modern terminals do)
  2. Set your locale to UTF-8: export LANG=en_US.UTF-8
  3. If you use Vim to edit configuration files that contain Unicode, add the following to your .vimrc:

     ```vim
     set encoding=utf-8
     set fileencoding=utf-8
     set termencoding=utf-8
     ```
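To confirm that the settings above took effect, a quick check from Python (assumed to be installed) can report the encoding derived from your locale and the one used by standard output. The is_utf8 helper is hypothetical and not part of AIS or its CLI.

```python
import codecs
import locale
import sys


def is_utf8(encoding_name) -> bool:
    """Return True if encoding_name (e.g. 'UTF-8', 'utf8') refers to the UTF-8 codec."""
    if not encoding_name:
        return False  # e.g. sys.stdout.encoding can be None when stdout is replaced
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        return False  # unknown codec name


# Preferred text encoding, derived from locale settings such as LANG
# (Python's UTF-8 mode, if enabled, forces this to UTF-8)
preferred = locale.getpreferredencoding(False)
print("locale:", preferred, "UTF-8" if is_utf8(preferred) else "NOT UTF-8")

# Encoding used for this process's standard output (usually the terminal)
stdout_enc = getattr(sys.stdout, "encoding", None)
print("stdout:", stdout_enc, "UTF-8" if is_utf8(stdout_enc) else "NOT UTF-8")
```

If either line reports NOT UTF-8, revisit the locale setting in step 2.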

Curl

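For programmatic access to objects with Unicode names, keep in mind that the request URL should be properly percent-encoded. The example below passes the raw Unicode name, which works here; names that contain spaces or symbols such as ?, #, or % must be encoded before they are placed in the URL: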

```console
$ export helloworld="こんにちは世界"
$ curl -L -X GET "http://ais-endpoint/v1/objects/onebucket/$helloworld"

MIT License
Copyright (c) 2017 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
...
```
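Characters such as ? and # change how the URL itself is parsed, so leaving them unencoded does not just produce a wrong name: part of the name ends up outside the path. The sketch below uses Python's standard urllib.parse with the placeholder endpoint from the curl example and a hypothetical object name. An unencoded # turns the rest of the name into a fragment (which HTTP clients do not send to the server), and an unencoded ? starts a query string, whereas the percent-encoded name stays entirely in the path.

```python
from urllib.parse import quote, urlsplit

endpoint = "http://ais-endpoint"   # placeholder endpoint, as in the curl example
name = "report?v=2#final"          # hypothetical object name with special symbols

raw = urlsplit(f"{endpoint}/v1/objects/onebucket/{name}")
enc = urlsplit(f"{endpoint}/v1/objects/onebucket/{quote(name, safe='')}")

# Unencoded: the name is split across path, query, and fragment
print("raw path:", raw.path, "| query:", raw.query, "| fragment:", raw.fragment)

# Encoded: the whole name stays in the path
print("encoded path:", enc.path)
```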

Special Symbols in Object Names

Special symbols such as ;, :, ', ", <, >, /, \, |, ?, #, %, +, and & may need shell quoting, URL percent-encoding, or both, depending on your shell and toolchain.

When using the CLI, specify the --encode-objname flag with GET and PUT commands:

```console
$ ais put LICENSE "ais://threebucket/aaa bbb ccc" --encode-objname
PUT "LICENSE" => ais://threebucket/aaa bbb ccc

$ ais ls ais://threebucket
NAME          SIZE
aaa bbb ccc   1.05KiB

$ ais object cat "ais://threebucket/aaa bbb ccc" --encode-objname

MIT License
Copyright (c) 2017 NVIDIA Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
...
```
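For reference, the following sketch prints the percent-encoded form of each symbol listed above, using Python's urllib.parse.quote with safe='' so that even / is encoded. Encoding + is worthwhile as well: in form-encoded query strings + stands for a space, so a raw + can be misread by some tools.

```python
from urllib.parse import quote, unquote

# Symbols listed above; with safe='' each becomes a %XX escape
symbols = [";", ":", "'", '"', "<", ">", "/", "\\", "|", "?", "#", "%", "+", "&"]

for ch in symbols:
    encoded = quote(ch, safe="")
    assert unquote(encoded) == ch  # decoding restores the original symbol
    print(f"{ch!r:>6} -> {encoded}")
```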

Note: object names are always displayed in their original, human-readable form.


Encoding Helper (Python)

For programmatic clients, here’s how to encode object names using Python:

```python
import urllib.parse

name = 'my weird/obj?name#with!symbols'
# safe='' also encodes '/'; only letters, digits, and '_.-~' are left as-is
encoded = urllib.parse.quote(name, safe='')
print(encoded)
# Output: my%20weird%2Fobj%3Fname%23with%21symbols
```

This encoded string can be used in direct HTTP requests (e.g., with curl). Similar encoding functions exist in other programming languages; refer to your language’s URL-encoding documentation.
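As a sketch of such a direct request in Python itself, the snippet below builds a GET with the standard urllib.request module. It assumes the placeholder endpoint http://ais-endpoint and the /v1/objects/<bucket>/<object> path from the curl example; make_get_request is a hypothetical helper. The request is only constructed, not sent, so the snippet runs without a cluster.

```python
import urllib.parse
import urllib.request


def make_get_request(endpoint: str, bucket: str, objname: str) -> urllib.request.Request:
    """Hypothetical helper: build a GET request for an object, encoding its name."""
    encoded = urllib.parse.quote(objname, safe="")  # UTF-8 percent-encoding
    url = f"{endpoint}/v1/objects/{bucket}/{encoded}"
    return urllib.request.Request(url, method="GET")


req = make_get_request("http://ais-endpoint", "onebucket", "こんにちは世界")
print(req.get_method(), req.full_url)

# To actually send it (requires a reachable AIS endpoint):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```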


If you’re building your own clients in Go or another language, make sure the object name is correctly percent-encoded in the request URL path, just as you would for any other HTTP API (in Go, for instance, url.URL.String() escapes URL.Path for you).

For more information, see the full AIStore documentation at https://github.com/NVIDIA/aistore