KV

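Basic concepts

openYuanrong datasystem (referred to below as "the data system") provides a near-compute KV cache. It uses shared memory for zero-copy KV reads and writes, which gives high-performance data caching. The KV interface also provides data reliability semantics by integrating with external components.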

Sample code

Python:

from yr.datasystem.ds_client import DsClient

client = DsClient("127.0.0.1", 31501)
client.init()

key = "key"
expected_val = b"value"
client.kv().set(key, expected_val)

val = client.kv().get([key])
assert val[0] == expected_val

client.kv().delete([key])

C++:

#include "datasystem/datasystem.h"

ConnectOptions connectOptions = { .host = "127.0.0.1", .port = 31501 };
auto client = std::make_shared<DsClient>(connectOptions);
ASSERT_TRUE(client->Init().IsOk());

std::string key = "testKey";
std::string value = "Hello kv client";
std::string value2 = "Hello modify";
Status status = client->KV()->Set(key, value);
ASSERT_TRUE(status.IsOk());

std::string getValue;
status = client->KV()->Get(key, getValue);
ASSERT_TRUE(status.IsOk());
ASSERT_TRUE(getValue == value);

status = client->KV()->Set(key, value2);
ASSERT_TRUE(status.IsOk());

status = client->KV()->Get(key, getValue);
ASSERT_TRUE(status.IsOk());
ASSERT_TRUE(getValue == value2);

status = client->KV()->Del(key);
ASSERT_TRUE(status.IsOk());

status = client->KV()->Get(key, getValue);
ASSERT_TRUE(status.IsError());

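Usage limits

  • A key may contain only uppercase letters, lowercase letters, digits, and the following characters: -_!@#%^*()+=:;

  • The maximum key length is 255 bytes.

  • There is no fixed limit on value length, but a value must not exceed the configured shared memory size.

  • Data that has not been written to the L2 cache has no reliability guarantee and may be lost when a failure occurs.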
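
The key limits above can be checked on the client side before a write. The following is a minimal sketch, not part of the data system SDK: `is_valid_key` and `MAX_KEY_BYTES` are hypothetical names, and rejecting an empty key is an assumption that the limits above do not state.

```python
import re

# Allowed characters as listed above: letters, digits, and -_!@#%^*()+=:;
_KEY_PATTERN = re.compile(r"[A-Za-z0-9\-_!@#%^*()+=:;]+")
MAX_KEY_BYTES = 255  # documented maximum key length


def is_valid_key(key: str) -> bool:
    """Return True if key satisfies the documented character and length limits."""
    # Assumption: an empty key is rejected (not stated in the limits above).
    if not key:
        return False
    # The limit is expressed in bytes, so measure the encoded length.
    if len(key.encode("utf-8")) > MAX_KEY_BYTES:
        return False
    return _KEY_PATTERN.fullmatch(key) is not None
```

For example, `is_valid_key("user:42;v=1")` passes, while a key containing a space or a slash is rejected.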

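More about KV

Data consistency

The KV interface supports causal consistency for data reads and writes. For definitions of the consistency models, see Consistency Models.

Spilling data to disk

KV data is stored in the data system's shared memory. When memory runs short, data can be automatically spilled to disk and removed from memory; when that data is read, it is automatically loaded from disk back into shared memory. If disk space also runs short and the data has already been written to the L2 cache, the data is automatically deleted from both the local disk and memory; when it is read, it is automatically loaded from the L2 cache into shared memory. To use KV spill, specify the relevant parameters at deployment time. Spill is disabled by default.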

# The path of the spilling, empty means local_disk spill disabled.
# It will create a new subdirectory("datasystem_spill_data") under the SPILL_DIRECTORY to store the spill file.
# Example: If SPILL_DIRECTORY is "/home/spill", spill files will exist in the "/home/spill/datasystem_spill_data".
spillDirectory: ""
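
The directory rule in the comments above can be sketched as a small helper. This only illustrates the documented path layout; `spill_data_path` is a hypothetical name and is not part of the data system.

```python
import posixpath
from typing import Optional

# Subdirectory name taken from the configuration comment above.
SPILL_SUBDIR = "datasystem_spill_data"


def spill_data_path(spill_directory: str) -> Optional[str]:
    """Return where spill files would be stored, or None when spill is disabled."""
    if not spill_directory:
        # An empty spillDirectory means local disk spill is disabled.
        return None
    return posixpath.join(spill_directory, SPILL_SUBDIR)
```

With `spillDirectory: "/home/spill"` this gives `/home/spill/datasystem_spill_data`, matching the example in the comment.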

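The following spill parameters are available for performance tuning. They set the disk space limit, the number of concurrent spill threads, the spill file size, whether data may be migrated to remote memory, and so on.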

# Maximum amount of spilled data that can be stored in the spill directory. If spill is enabled and spillSizeLimit is 0, spillSizeLimit is set to 95% of the spill directory.
# The unit of spillSizeLimit is bytes.
spillSizeLimit: "0"
# It represents the maximum parallelism of writing files, more threads will consume more CPU and I/O resources.
spillThreadNum: 8
# The size limit of a single spill file. Objects larger than this value are spilled one object per file.
# If there are many large objects, you can increase this value to avoid running out of inodes quickly.
# The valid range is 200-10240.
spillFileMaxSizeMb: 200
# The maximum number of open file descriptors used for spill. If the number of open files exceeds this value,
# some files are temporarily closed to avoid exceeding the system limit. Reduce this value if your system resources are limited.
# The valid range is greater than or equal to 8.
spillFileOpenLimit: 512
# Disabling readahead can mitigate read amplification for offset reads. Default is true.
spillEnableReadahead: true
# Thread number of eviction for object cache.
evictionThreadNum: 1
# When node resources are insufficient, allow spilling memory to the memory of other nodes.
# When enabled, if local node memory reaches the high watermark, the system attempts to migrate objects to other workers' shared memory. If no worker has available memory, objects spill to disk.
spillToRemoteWorker: false
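
The value ranges stated in the comments above can be checked before deployment. The sketch below assumes the parameters have been loaded into a plain dictionary; `check_spill_params` is a hypothetical helper, and it covers only the two ranges the comments state explicitly.

```python
from typing import Any, Dict, List


def check_spill_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # Documented valid range: 200-10240 (default 200).
    file_max_mb = params.get("spillFileMaxSizeMb", 200)
    if not 200 <= file_max_mb <= 10240:
        problems.append("spillFileMaxSizeMb must be in the range 200-10240")

    # Documented valid range: greater than or equal to 8 (default 512).
    open_limit = params.get("spillFileOpenLimit", 512)
    if open_limit < 8:
        problems.append("spillFileOpenLimit must be greater than or equal to 8")

    return problems
```

Calling it with the defaults shown above returns an empty list; a value such as `spillFileMaxSizeMb: 100` produces one reported problem.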

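Data reliability

The data system KV interface provides reliability semantics. On write, the writeMode parameter sets the data reliability level. Reliability is guaranteed only when writeMode is WRITE_THROUGH_L2_CACHE or WRITE_BACK_L2_CACHE; otherwise, data may be lost when a failure occurs or space runs short.
The data system achieves data reliability by using an external storage component as an L2 cache. The currently supported L2 cache components are OBS/SFS.
When deploying a cluster, configure the L2 cache parameters in the data system's deployment parameters. If they are not configured, KV writes with writeMode set to WRITE_THROUGH_L2_CACHE or WRITE_BACK_L2_CACHE fail.

When deploying a cluster, specify the L2 cache type with the following parameter.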

# Specifies the L2 cache type. Valid values: 'obs', 'sfs'.
# The default is 'none', which means no L2 cache is supported.
l2CacheType: "none"
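
The relationship between writeMode and l2CacheType can be expressed as a preflight check. This sketch reads the two reliable write mode names as WRITE_THROUGH_L2_CACHE and WRITE_BACK_L2_CACHE and models them as plain strings rather than the SDK's own types; `write_would_fail` is a hypothetical helper, not a data system API.

```python
# Write modes that require an L2 cache, modelled here as plain strings.
L2_WRITE_MODES = {"WRITE_THROUGH_L2_CACHE", "WRITE_BACK_L2_CACHE"}
# l2CacheType values that enable an L2 cache; "none" is the default.
L2_CACHE_TYPES = {"obs", "sfs"}


def write_would_fail(write_mode: str, l2_cache_type: str = "none") -> bool:
    """Return True if a write with this mode is expected to fail for lack of an L2 cache."""
    needs_l2 = write_mode in L2_WRITE_MODES
    has_l2 = l2_cache_type in L2_CACHE_TYPES
    return needs_l2 and not has_l2
```

A deployment script could run this check once at startup so that a missing L2 cache configuration is reported before the first reliable write is attempted.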

The configuration parameters for each external component are as follows:

obs:
  # The access key for obs AK/SK authentication. If the value of encryptKit is not plaintext, encryption is required.
  obsAccessKey: ""
  # The secret key for obs AK/SK authentication. If the value of encryptKit is not plaintext, encryption is required.
  obsSecretKey: ""
  # OBS endpoint. Example: "xxx.hwcloudtest.cn"
  obsEndpoint: ""
  # OBS bucket name.
  obsBucket: ""
  # Whether to enable the https in obs. false: use HTTP (default), true: use HTTPS
  obsHttpsEnabled: false
  # Use cloud service token rotation to connect obs.
  cloudServiceTokenRotation:
    # Whether to use ccms credential rotation mode to access OBS. Default is false. If enabled, you must specify at least
    # iamHostName, identityProvider, projectId, and regionId.
    # In addition, obsEndpoint and obsBucket must be specified.
    enable: false
    # Domain name of the IAM token to be obtained. Example:  iam.example.com.
    iamHostName: ""
    # Provider that provides permissions for the ds-worker. Example: csms-datasystem.
    identityProvider: ""
    # Project id of the OBS to be accessed. Example: fb6a00ff7ae54a5fbb8ff855d0841d00.
    projectId: ""
    # Region id of the OBS to be accessed. Example: cn-beijing-4.
    regionId: ""
    # Whether to access OBS of other accounts by agency. Default is false. If true, you must specify tokenAgencyName
    # and tokenAgencyDomain.
    enableTokenByAgency: false
    # Agency name for proxy access to other accounts. Example: obs_access.
    tokenAgencyName: ""
    # Agency domain for proxy access to other accounts. Example: op_svc_cff.
    tokenAgencyDomain: ""
sfsTurbo:
  # Endpoint of sfs-turbo, which is used to concatenate the shared path in sfs-turbo. Example: '172.21.7.239'.
  endpoint: ""
  # Sfs-turbo sub-path mounted to ds-worker. If this parameter is not specified, the root directory '/' of sfs-turbo
  # is mounted by default.
  subPath: ""
  # Specifies the sfs-turbo ID, which can be viewed on the sfs-turbo page.
  id: "0"
  # Specifies the sfs-turbo enterprise project ID, which can be viewed on the sfs-turbo page.
  projectId: "0"
  # Specifies the capacity of using sfs-turbo. Note that the size must be smaller than the size of sfsTurbo.
  capacity: "500Gi"
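
The dependencies among the OBS token rotation fields described in the comments above can be checked before deployment. The sketch below assumes the `obs` section has been loaded into a nested dictionary with the same key names; `missing_obs_fields` is a hypothetical helper, not part of the data system.

```python
from typing import Any, Dict, List


def missing_obs_fields(obs: Dict[str, Any]) -> List[str]:
    """Return required fields that are empty when token rotation is enabled."""
    rotation = obs.get("cloudServiceTokenRotation", {})
    if not rotation.get("enable", False):
        # Rotation is off, so none of the rotation requirements apply.
        return []

    missing = []
    # obsEndpoint and obsBucket are required in addition to the rotation fields.
    for name in ("obsEndpoint", "obsBucket"):
        if not obs.get(name):
            missing.append(name)

    required = ["iamHostName", "identityProvider", "projectId", "regionId"]
    # Agency access additionally needs the agency name and domain.
    if rotation.get("enableTokenByAgency", False):
        required += ["tokenAgencyName", "tokenAgencyDomain"]
    for name in required:
        if not rotation.get(name):
            missing.append("cloudServiceTokenRotation." + name)
    return missing
```

With the default values shown above (rotation disabled) the function returns an empty list; enabling rotation without filling in the other fields lists each one that is still empty.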