vllm.distributed.ec_transfer.ec_connector.base ¶
ECConnectorBase Class for Distributed Encoder Cache & P2P Encoder cache communication in V1
The class provides the following primitives
Scheduler-side: runs in the scheduler, binds metadata, which is used by the worker-side to load/save Encoder cache. check_caches_exist() - Check whether Encoder cache of requests exist update_state_after_alloc() - update ECConnector state after allocate. This will decide to load the cache or not request_finished() - called when a request is finished, free the cache with the requests
Worker-side: runs in each worker, loads/saves Encoder Cache to/from the Connector based on the metadata. start_load_ec() - starts loading all ECs (maybe async) wait_for_save() - blocks until all saves are done
get_finished() - called with ids of finished requests, returns
ids of requests that have completed async sending/recving.
ECConnectorBase ¶
Bases: ABC
Source code in vllm/distributed/ec_transfer/ec_connector/base.py
59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 | |
__init__ ¶
__init__(vllm_config: VllmConfig, role: ECConnectorRole)
Source code in vllm/distributed/ec_transfer/ec_connector/base.py
_get_connector_metadata ¶
_get_connector_metadata() -> ECConnectorMetadata
Get the connector metadata.
This function should only be called inside the connector.
Returns:
| Name | Type | Description |
|---|---|---|
ConnectorMetadata | ECConnectorMetadata | the connector metadata. |
Source code in vllm/distributed/ec_transfer/ec_connector/base.py
bind_connector_metadata ¶
bind_connector_metadata(
connector_metadata: ECConnectorMetadata,
) -> None
Set the connector metadata from the scheduler.
This function should be called by the model runner every time before the model execution. The metadata will be used for runtime EC cache loading.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
connector_metadata | dict | the connector metadata. | required |
Source code in vllm/distributed/ec_transfer/ec_connector/base.py
build_connector_meta abstractmethod ¶
build_connector_meta(
scheduler_output: SchedulerOutput,
) -> ECConnectorMetadata
Build the connector metadata for this step.
This function should NOT modify fields in the scheduler_output. Also, calling this function will reset the state of the connector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
scheduler_output | SchedulerOutput | the scheduler output object. | required |
Source code in vllm/distributed/ec_transfer/ec_connector/base.py
clear_connector_metadata ¶
Clear the connector metadata.
This function should be called by the model runner every time after the model execution.
get_finished ¶
Notifies worker-side connector ids of requests that have finished generating tokens on the worker. The scheduler process (via the Executors) will use this output to track which workers are done.
Returns:
| Type | Description |
|---|---|
set[str] | None | ids of requests that have finished asynchronous transfer |
set[str] | None | (requests that previously returned True from request_finished()), |
tuple[set[str] | None, set[str] | None] | tuple of (sending/saving ids, recving/loading ids). |
tuple[set[str] | None, set[str] | None] | The finished saves/sends req ids must belong to a set provided in a |
tuple[set[str] | None, set[str] | None] | call to this method (this call or a prior one). |
Source code in vllm/distributed/ec_transfer/ec_connector/base.py
has_caches abstractmethod ¶
Check if encoder cache exists for each mm data of requests
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
request | Request | the request object. | required |
Returns:
| Type | Description |
|---|---|
list[bool] | A list bool where ith value is True if cache exist for |
list[bool] | ith mm_data of requests |
Source code in vllm/distributed/ec_transfer/ec_connector/base.py
register_caches ¶
Initialize with the EC caches. Args: ec_caches: dictionary of encoder cache
Source code in vllm/distributed/ec_transfer/ec_connector/base.py
request_finished ¶
Called when a request has finished, before its encoder cache is freed.
Returns:
| Type | Description |
|---|---|
bool | True if the request is being saved/sent asynchronously and cached |
dict[str, Any] | None | should not be freed until the request_id is returned from |
tuple[bool, dict[str, Any] | None] | get_finished(). |
Source code in vllm/distributed/ec_transfer/ec_connector/base.py
save_caches abstractmethod ¶
Save the encoder cache to the connector.
This method saves the encoder cache from the worker's local storage to shared storage or another external connector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
encoder_cache | dict[str, Tensor] | A dictionary mapping multimodal data hashes ( | required |
mm_hash | str | The hash of the multimodal data whose cache is being saved. | required |
kwargs | dict | Additional keyword arguments for the connector. | {} |
Source code in vllm/distributed/ec_transfer/ec_connector/base.py
start_load_caches abstractmethod ¶
Start loading the cache from the connector into vLLM's encoder cache.
This method loads the encoder cache based on metadata provided by the scheduler. It is called before _gather_mm_embeddings for the EC Connector. For EC, the encoder_cache and mm_hash are stored in kwargs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
encoder_cache | dict[str, Tensor] | A dictionary mapping multimodal data hashes ( | required |
kwargs | dict | Additional keyword arguments for the connector. | {} |
Source code in vllm/distributed/ec_transfer/ec_connector/base.py
update_connector_output ¶
update_connector_output(
connector_output: ECConnectorOutput,
)
Update ECConnector state from worker-side connectors output.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
connector_output | ECConnectorOutput | the worker-side connectors output. | required |
Source code in vllm/distributed/ec_transfer/ec_connector/base.py
ECConnectorMetadata ¶
Bases: ABC
Abstract Metadata used to communicate between the Scheduler ECConnector and Worker ECConnector.