Unicode and Special Symbols in Object Names
AIStore provides seamless support for object names containing Unicode characters (like Japanese, Chinese, or Korean text), emojis, and special symbols.
This README demonstrates handling these names properly across both native (ais://) buckets and Cloud storage.
Table of Contents
- Client vs. Server Responsibilities
- Working with Unicode Object Names
- Cross-Backend Compatibility (S3)
- Terminal and Environment Considerations
- Curl
- Special Symbols in Object Names
- Encoding Helper (Python)
Client vs. Server Responsibilities
In accordance with standard HTTP behavior, clients are responsible for percent-encoding object names into valid URL paths. This is especially important when object names contain characters like spaces, slashes, or symbols such as ?, #, or ".
The AIS server, like most HTTP servers, automatically decodes (unescapes) the request path before processing it.
For example:
- If a client sends a request to read or write bucket/my%20file, the server will unescape it and handle it as bucket/my file.
- In terms of RESTful API, when a client sends a request with URL path /v1/objects/bucket/my%20file, AIS will handle it as /v1/objects/bucket/my file.
The bottom line: AIStore always uses the original UTF-8 string decoded from the HTTP path, that is, the path that you may (or may not) have encoded on the client side.
For convenience, AIS CLI provides the --encode-objname flag to do this automatically, so you can use natural-looking object names even if they contain special characters.
But if you use tools like curl, you must encode URLs manually.
Below are examples that demonstrate this behavior.
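The encode/decode contract described above can be sketched in a few lines of Python. This is only an illustration: client_encode and server_decode are hypothetical helper names, and the server side is imitated with the standard library's unquote rather than with any AIS code.

```python
from urllib.parse import quote, unquote


def client_encode(bucket: str, objname: str) -> str:
    """Build the percent-encoded request path a client would send (hypothetical helper)."""
    return "/v1/objects/" + quote(bucket) + "/" + quote(objname)


def server_decode(path: str) -> str:
    """Imitate the unescaping an HTTP server applies to the request path."""
    return unquote(path)


encoded = client_encode("bucket", "my file")
decoded = server_decode(encoded)

print(encoded)  # /v1/objects/bucket/my%20file
print(decoded)  # /v1/objects/bucket/my file
```

The decoded path carries the object name exactly as the client wrote it before encoding, which is the string the server works with.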
Working with Unicode Object Names
Cross-Backend Compatibility (S3)
Terminal and Environment Considerations
For proper display of Unicode characters in your terminal:
- Ensure your terminal supports UTF-8 (most modern terminals do)
- Set your locale to UTF-8: export LANG=en_US.UTF-8
- If using VIM to edit configuration files with Unicode:
Curl
For programmatic access to objects with Unicode names, remember that the URL must be properly encoded:
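As a rough sketch of what "properly encoded" means for a Unicode name, the snippet below builds a request URL and prints a curl command line for it. The endpoint http://localhost:8080, the helper name object_url, and the use of curl's -L option (to follow redirects) are assumptions for illustration; only the /v1/objects/<bucket>/<object> path shape comes from the text above.

```python
import shlex
from urllib.parse import quote

# Assumption: replace with the address of your own AIS endpoint.
ENDPOINT = "http://localhost:8080"


def object_url(bucket: str, objname: str, endpoint: str = ENDPOINT) -> str:
    """Return the object URL with the name percent-encoded as UTF-8 bytes (hypothetical helper)."""
    return f"{endpoint}/v1/objects/{quote(bucket)}/{quote(objname)}"


url = object_url("bucket", "日本語.txt")
print(url)
# http://localhost:8080/v1/objects/bucket/%E6%97%A5%E6%9C%AC%E8%AA%9E.txt

# shlex.quote adds shell quoting only when the URL needs it.
print("curl -L " + shlex.quote(url))
```

Each non-ASCII character is first encoded to its UTF-8 bytes, and every byte then becomes one %XX escape.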
Special Symbols in Object Names
Special symbols such as ;, :, ', ", <, >, /, \, |, ?, #, %, +, and & may require encoding depending on your shell or toolchain.
When using the CLI, specify the --encode-objname flag with GET and PUT commands:
Note: object names are always displayed in their original, human-readable form.
Encoding Helper (Python)
For programmatic clients, here’s how to encode object names using Python:
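A minimal sketch of such a helper, assuming the standard library's urllib.parse.quote. The function name encode_object_name and its keep_slashes parameter are hypothetical. Whether to leave "/" unescaped is a choice for your client: since the server decodes the path, either form should resolve to the same name, but that is an assumption worth verifying against your deployment.

```python
from urllib.parse import quote, unquote


def encode_object_name(name: str, keep_slashes: bool = True) -> str:
    """Percent-encode an object name for use in a URL path (hypothetical helper).

    With keep_slashes=True, "/" is left as-is; otherwise it becomes %2F.
    """
    safe = "/" if keep_slashes else ""
    return quote(name, safe=safe)


for name in ["my file", "a/b?c#d", "50%+tax&tip", "emoji-😀.bin"]:
    enc = encode_object_name(name)
    # Decoding must give back the original name unchanged.
    assert unquote(enc) == name
    print(f"{name!r} -> {enc}")
```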
This encoded string can be used in direct HTTP requests (e.g., with curl).
Similar encoding functions exist in other programming languages; refer to your language’s URL encoding documentation.
If you’re building your own clients in Go or another language, make sure to encode the object name into a valid URL.Path — just as you would for any other HTTP API.
For more information, see the full AIStore documentation at https://github.com/NVIDIA/aistore