vllm.config.ec_transfer ¶
ECTransferConfig ¶
Configuration for distributed EC cache transfer.
Source code in vllm/config/ec_transfer.py
ec_buffer_device class-attribute instance-attribute ¶
ec_buffer_device: str | None = 'cuda'
The device used by the EC connector to buffer the EC cache. Currently only 'cuda' is supported.
ec_buffer_size class-attribute instance-attribute ¶
ec_buffer_size: float = 1000000000.0
The buffer size for TorchDistributedConnector, measured in bytes. Recommended value: 1e9 (about 1 GB).
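As a quick check on the units, the default of 1e9 bytes is 1 GB in decimal units, or roughly 0.93 GiB in binary units. A minimal sketch of the conversion follows; the helper name `describe_bytes` is hypothetical and not part of vLLM.

```python
def describe_bytes(num_bytes: float) -> tuple[float, float]:
    """Return (decimal gigabytes, binary gibibytes) for a byte count."""
    gb = num_bytes / 1e9       # 1 GB  = 10**9 bytes
    gib = num_bytes / 2**30    # 1 GiB = 2**30 bytes
    return gb, gib


# Default value of ec_buffer_size as documented above.
gb, gib = describe_bytes(1e9)
print(f"{gb:.2f} GB, {gib:.2f} GiB")
```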
ec_connector class-attribute instance-attribute ¶
ec_connector: str | None = None
The EC connector for vLLM to transmit EC caches between vLLM instances.
ec_connector_extra_config class-attribute instance-attribute ¶
Any extra configuration that the connector may need.
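Connector-specific options sit in this free-form mapping alongside the typed fields. The sketch below builds such a configuration as a plain dictionary and serializes it to JSON. The connector name `MyECConnector` and the `shared_storage_path` key are hypothetical placeholders, and how the resulting JSON is handed to vLLM is not covered on this page.

```python
import json

# Field names mirror the attributes documented on this page.
ec_transfer_config = {
    "ec_connector": "MyECConnector",  # hypothetical connector name
    "ec_role": "ec_producer",
    "ec_connector_extra_config": {
        # Hypothetical connector-specific option; real keys depend on the connector.
        "shared_storage_path": "/tmp/ec_cache",
    },
}

# Serialize with sorted keys so the output is stable across runs.
config_json = json.dumps(ec_transfer_config, sort_keys=True)
print(config_json)
```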
ec_connector_module_path class-attribute instance-attribute ¶
ec_connector_module_path: str | None = None
The Python module path to dynamically load the EC connector from. Only supported in V1.
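For readers unfamiliar with dynamic loading, the general Python mechanism is to import a module from its dotted path and look up a class by name. The sketch below illustrates that mechanism only; it is not vLLM's loader. The helper `load_class` is hypothetical, and a standard-library module stands in for a connector module.

```python
import importlib


def load_class(module_path: str, class_name: str) -> type:
    """Import `module_path` and return the attribute named `class_name`."""
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Stand-in example: a real setup would use the connector's own module path
# and class name instead of a standard-library class.
cls = load_class("collections", "OrderedDict")
print(cls.__name__)
```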
ec_ip class-attribute instance-attribute ¶
ec_ip: str = '127.0.0.1'
The EC connector IP address, used to establish the distributed connection.
ec_parallel_size class-attribute instance-attribute ¶
ec_parallel_size: int = 1
The number of parallel instances for EC cache transfer. For PyNcclConnector, this should be 2.
ec_port class-attribute instance-attribute ¶
ec_port: int = 14579
The EC connector port, used to establish the distributed connection.
ec_rank class-attribute instance-attribute ¶
ec_rank: int | None = None
The rank of this vLLM instance in the EC cache transfer. Typical values: 0 for the encoder instance, 1 for the PD (prefill/decode) instance. Currently only 1P1D is supported.
ec_role class-attribute instance-attribute ¶
ec_role: ECRole | None = None
Whether this vLLM instance produces EC cache, consumes it, or both. Choices are 'ec_producer' and 'ec_consumer'.
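To show how the role, rank, and address fields fit together for one encoder and one PD instance, here is a minimal sketch using a stand-in dataclass. `ECTransferSketch` is hypothetical and only mirrors the field names and defaults documented here. The validation rules in `__post_init__` are assumptions for illustration, not vLLM's actual checks.

```python
from dataclasses import dataclass
from typing import Optional

VALID_ROLES = ("ec_producer", "ec_consumer")


@dataclass
class ECTransferSketch:
    """Hypothetical stand-in that mirrors a few documented fields."""

    ec_role: Optional[str] = None
    ec_rank: Optional[int] = None
    ec_parallel_size: int = 1
    ec_ip: str = "127.0.0.1"
    ec_port: int = 14579

    def __post_init__(self) -> None:
        # Assumed checks: the role must be a documented choice and the rank
        # must fall inside [0, ec_parallel_size).
        if self.ec_role is not None and self.ec_role not in VALID_ROLES:
            raise ValueError(f"unsupported ec_role: {self.ec_role!r}")
        if self.ec_rank is not None and not 0 <= self.ec_rank < self.ec_parallel_size:
            raise ValueError(f"ec_rank {self.ec_rank} out of range")


# Encoder produces EC cache (rank 0); the PD instance consumes it (rank 1).
encoder = ECTransferSketch(ec_role="ec_producer", ec_rank=0, ec_parallel_size=2)
pd_instance = ECTransferSketch(ec_role="ec_consumer", ec_rank=1, ec_parallel_size=2)
print(encoder)
print(pd_instance)
```

Both instances keep the same `ec_ip` and `ec_port` so that they can reach the same endpoint when the connection is established.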
engine_id class-attribute instance-attribute ¶
engine_id: str | None = None
The engine ID for EC transfers.
__post_init__ ¶
Source code in vllm/config/ec_transfer.py
compute_hash ¶
compute_hash() -> str
WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.
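One common way to build such a hash is to collect the graph-affecting fields into a factors list and digest its string form. The sketch below is a minimal illustration of that pattern, assuming SHA-256 and a hypothetical helper `hash_factors`. The digest function and the factors that vLLM actually uses are not stated on this page.

```python
import hashlib
from typing import Any


def hash_factors(factors: list[Any]) -> str:
    """Digest the string form of a list of graph-affecting factors."""
    return hashlib.sha256(str(factors).encode()).hexdigest()


# Equal factor lists produce equal hashes; any change alters the hash.
h_empty_a = hash_factors([])
h_empty_b = hash_factors([])
h_changed = hash_factors(["some_graph_affecting_field"])  # hypothetical factor
print(h_empty_a == h_empty_b, h_empty_a == h_changed)
```

The warning above follows from this design: a field left out of the factors list cannot influence the hash, so two configurations that differ only in that field would be treated as identical.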