vllm.logprobs ¶
PromptLogprobs module-attribute ¶
PromptLogprobs = (
FlatLogprobs | list[LogprobsOnePosition | None]
)
FlatLogprobs dataclass ¶
Bases: MutableSequence[LogprobsOnePosition]
Flattened logprobs of a request, stored as multiple primitive-type lists.
Compared to list[dict[int, Logprob]], this data structure significantly reduces GC overhead, as it flattens the logprob information for all positions and ranks into multiple primitive-type lists (i.e. logprobs, token_ids, ranks per token_id, decoded_tokens). So regardless of the sequence length and the top_logprobs setting, FlatLogprobs introduces only a constant number of objects.
As each position may contain a different number of ranks, start_indices and end_indices are used to locate the logprob range for each position.
NOTE: To reduce migration overhead and improve backward compatibility, the key Sequence APIs of list are supported, so FlatLogprobs can act as a list[LogprobsOnePosition].
Source code in vllm/logprobs.py
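To make the layout concrete, here is a minimal sketch (illustrative values, not part of the module) of how a two-position request, with two ranked candidates at position 0 and one at position 1, lands in the flat lists:

```python
from vllm.logprobs import FlatLogprobs

# Six flat primitive-type lists hold all per-token data, so the number of
# Python objects stays constant regardless of sequence length/top_logprobs.
flat = FlatLogprobs(
    start_indices=[0, 2],  # position i spans slots start_indices[i]:end_indices[i]
    end_indices=[2, 3],
    token_ids=[11, 12, 7],
    logprobs=[-0.1, -2.3, -0.5],
    ranks=[1, 2, 1],
    decoded_tokens=["a", "b", "c"],
)
assert len(flat) == 2  # list-style API: length counts positions, not tokens
```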
decoded_tokens class-attribute instance-attribute ¶
decoded_tokens: list[str | None] = list()
end_indices class-attribute instance-attribute ¶
end_indices: list[int] = list()
start_indices class-attribute instance-attribute ¶
start_indices: list[int] = list()
__delitem__ ¶
__getitem__ ¶
__getitem__(position: int) -> LogprobsOnePosition
__getitem__(s: slice) -> FlatLogprobs
Extracts the logprobs at a given position or slice
Source code in vllm/logprobs.py
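A short sketch of the two overloads (assuming LogprobsOnePosition is the dict[int, Logprob] mapping mentioned above):

```python
from vllm.logprobs import FlatLogprobs, Logprob

flat = FlatLogprobs()
flat.append({11: Logprob(logprob=-0.1, rank=1, decoded_token="a")})
flat.append({12: Logprob(logprob=-0.7, rank=1, decoded_token="b")})

one = flat[0]    # int index -> LogprobsOnePosition for position 0
head = flat[:2]  # slice -> a new FlatLogprobs restricted to positions 0 and 1
```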
__init__ ¶
__init__(
start_indices: list[int] = list(),
end_indices: list[int] = list(),
token_ids: list[int] = list(),
logprobs: list[float] = list(),
ranks: list[int | None] = list(),
decoded_tokens: list[str | None] = list(),
) -> None
__iter__ ¶
__iter__() -> Iterator[LogprobsOnePosition]
Iterates the container and yields LogprobsOnePosition for each position.
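For example (a sketch, again assuming the dict[int, Logprob] shape for each position):

```python
from vllm.logprobs import FlatLogprobs, Logprob

flat = FlatLogprobs()
flat.append({11: Logprob(logprob=-0.1, rank=1, decoded_token="a")})

# Each iteration step materializes the LogprobsOnePosition for one position.
for position in flat:
    for token_id, lp in position.items():
        print(token_id, lp.logprob, lp.rank, lp.decoded_token)
```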
__setitem__ ¶
append ¶
append(
logprobs_one_position: LogprobsOnePosition | None,
) -> None
Appends logprobs for the next position to the container
Source code in vllm/logprobs.py
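For example (a hedged sketch; the dict literal mirrors the list[dict[int, Logprob]] shape that append flattens into the primitive lists):

```python
from vllm.logprobs import FlatLogprobs, Logprob

flat = FlatLogprobs()
# Top-2 candidates for the next position, supplied in dict form and
# flattened internally into token_ids/logprobs/ranks/decoded_tokens.
flat.append({
    11: Logprob(logprob=-0.1, rank=1, decoded_token="a"),
    12: Logprob(logprob=-2.3, rank=2, decoded_token="b"),
})
flat.append(None)  # None is accepted per the signature, e.g. for a
                   # position that carries no logprobs
```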
append_fast ¶
append_fast(
token_ids: list[int],
logprobs: list[float],
ranks: chain[int],
decoded_tokens: Iterable[str | None],
) -> None
Appends logprobs for the next position without creating the intermediate logprob dictionary.
Source code in vllm/logprobs.py
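A sketch of the fast path; the rank values and their chain packaging are illustrative assumptions about the caller:

```python
from itertools import chain
from vllm.logprobs import FlatLogprobs

flat = FlatLogprobs()
# Feed the already-flat per-rank data directly, skipping the dict that
# append() would otherwise have to unpack.
flat.append_fast(
    token_ids=[11, 12],
    logprobs=[-0.1, -2.3],
    ranks=chain([1], [2]),      # the signature expects an itertools.chain
    decoded_tokens=["a", "b"],
)
```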
extend ¶
Extends the container with logprobs for multiple subsequent positions
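For example (a sketch relying on the list-style Sequence API noted above; each element adds one position):

```python
from vllm.logprobs import FlatLogprobs, Logprob

flat = FlatLogprobs()
flat.extend([
    {11: Logprob(logprob=-0.1, rank=1, decoded_token="a")},
    {12: Logprob(logprob=-0.9, rank=1, decoded_token="b")},
])
assert len(flat) == 2
```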
Logprob dataclass ¶
Information for supporting OpenAI-compatible logprobs and token ranks.
Attributes:
| Name | Type | Description |
|---|---|---|
| `logprob` | `float` | The logprob of the chosen token |
| `rank` | `int \| None` | The vocab rank of the chosen token (>=1) |
| `decoded_token` | `str \| None` | The decoded chosen token index |
Source code in vllm/logprobs.py
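For example:

```python
from vllm.logprobs import Logprob

# rank is 1-based in the vocab ordering; rank and decoded_token may be None.
lp = Logprob(logprob=-0.25, rank=1, decoded_token="hello")
```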
append_logprobs_for_next_position ¶
append_logprobs_for_next_position(
request_logprobs: PromptLogprobs | SampleLogprobs,
token_ids: list[int],
logprobs: list[float],
decoded_tokens: Iterable[str | None],
rank: int,
num_logprobs: int,
) -> None
Appends logprobs for the next position
Source code in vllm/logprobs.py
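A hedged usage sketch (values are illustrative; it assumes the convention that token_ids carries the sampled token followed by the top-num_logprobs candidates):

```python
from vllm.logprobs import (
    append_logprobs_for_next_position,
    create_sample_logprobs,
)

sample_logprobs = create_sample_logprobs()
append_logprobs_for_next_position(
    sample_logprobs,
    token_ids=[42, 11, 12],          # sampled token + 2 top candidates
    logprobs=[-1.5, -0.1, -0.9],
    decoded_tokens=["x", "a", "b"],
    rank=3,                          # vocab rank of the sampled token
    num_logprobs=2,
)
```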
create_prompt_logprobs ¶
create_prompt_logprobs() -> PromptLogprobs
Creates a container to store prompt logprobs for a request
Source code in vllm/logprobs.py
create_sample_logprobs ¶
create_sample_logprobs() -> SampleLogprobs
Creates a container to store sample logprobs for a request
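Both factories return a container satisfying the list-style API; per the PromptLogprobs union above, the concrete type may be FlatLogprobs or the legacy list form:

```python
from vllm.logprobs import create_prompt_logprobs, create_sample_logprobs

prompt_logprobs = create_prompt_logprobs()
sample_logprobs = create_sample_logprobs()
# Downstream code can treat either as a list[LogprobsOnePosition | None].
```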