agent_inspect.metrics.observed package

Submodules

agent_inspect.metrics.observed.latency module

class agent_inspect.metrics.observed.latency.AverageLatency(config=None)[source]

Bases: LatencyMetric

ObservedMetric to measure the average latency in ms of agent responses (per turn) per evaluation sample.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for average latency metric initialization.

evaluate(agent_turn_traces)[source]

Calculate the average latency in ms of the agent’s responses.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the average latency score in ms per turn (float).
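
A minimal usage sketch of the calculation this metric performs. The `TurnTrace` field `latency_ms` and the `NumericalScore.value` attribute are stand-in assumptions here, since their exact definitions are not shown in this reference:

```python
from dataclasses import dataclass
from typing import List

# Stand-ins for agent_inspect types; the field names are assumptions.
@dataclass
class TurnTrace:
    latency_ms: float

@dataclass
class NumericalScore:
    value: float

def average_latency(agent_turn_traces: List[TurnTrace]) -> NumericalScore:
    """Sketch of AverageLatency.evaluate: mean per-turn latency in ms."""
    total = sum(t.latency_ms for t in agent_turn_traces)
    return NumericalScore(value=total / len(agent_turn_traces))

traces = [TurnTrace(latency_ms=120.0), TurnTrace(latency_ms=80.0)]
print(average_latency(traces).value)  # 100.0
```

In the library itself this would be `AverageLatency(config).evaluate(traces)`; the function above only illustrates the averaging step.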

class agent_inspect.metrics.observed.latency.LatencyMetric(config=None)[source]

Bases: ObservedMetric

Abstract class that should be extended by concrete latency metric implementations. Initialise an instance of LatencyMetric.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for latency metric initialization.

abstract evaluate(agent_turn_traces)[source]

This is an abstract method and should be implemented in a concrete class. Calculate the latency of the agent’s response.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the latency score (float).

class agent_inspect.metrics.observed.latency.TotalLatency(config=None)[source]

Bases: LatencyMetric

ObservedMetric to measure the total latency of agent responses per evaluation sample.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for total latency metric initialization.

evaluate(agent_turn_traces)[source]

Calculate the total latency in ms of the agent’s responses.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the total latency score in ms (float).

agent_inspect.metrics.observed.observed_metric module

class agent_inspect.metrics.observed.observed_metric.ObservedMetric(config=None)[source]

Bases: ABC

This is an abstract base class that should be extended by concrete metric implementations.

Parameters:

config (Optional[Dict[str, Any]]) – configuration for metric initialization.

abstract evaluate(agent_turn_traces)[source]

This is an abstract method and should be implemented in a concrete class.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object.
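
The extension pattern described above can be sketched with stand-in types. The `ObservedMetric` skeleton below mirrors the signature in this reference, but the `TurnTrace` field, the `NumericalScore` wrapper, and the `TurnCount` subclass are all hypothetical illustrations:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class TurnTrace:          # stand-in; the real fields are not shown here
    agent_output: str

@dataclass
class NumericalScore:     # stand-in wrapper around a float score
    value: float

class ObservedMetric(ABC):
    def __init__(self, config: Optional[Dict[str, Any]] = None):
        self.config = config or {}

    @abstractmethod
    def evaluate(self, agent_turn_traces: List[TurnTrace]) -> NumericalScore:
        """Implemented by concrete subclasses."""

class TurnCount(ObservedMetric):
    """Hypothetical concrete metric: number of turns in the trajectory."""
    def evaluate(self, agent_turn_traces: List[TurnTrace]) -> NumericalScore:
        return NumericalScore(value=float(len(agent_turn_traces)))

metric = TurnCount()
print(metric.evaluate([TurnTrace("hi"), TurnTrace("done")]).value)  # 2.0
```

Attempting to instantiate `ObservedMetric` directly raises `TypeError`, which is what enforces the "extend for actual implementations" contract.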

agent_inspect.metrics.observed.token_count module

class agent_inspect.metrics.observed.token_count.InputTotalTokenCount(config=None)[source]

Bases: TokenConsumptionMetric

Metric to measure the input token consumption by the agent.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for input token consumption metric initialization.

evaluate(agent_turn_traces)[source]

Calculate the input token consumption by the agent.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the total input token consumption count.

class agent_inspect.metrics.observed.token_count.OutputTotalTokenCount(config=None)[source]

Bases: TokenConsumptionMetric

Metric to measure the output token consumption by the agent.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for output token consumption metric initialization.

evaluate(agent_turn_traces)[source]

Calculate the output token consumption by the agent.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the total output token consumption count.

class agent_inspect.metrics.observed.token_count.ReasoningTotalTokenCount(config=None)[source]

Bases: TokenConsumptionMetric

Metric to measure the reasoning token consumption by the agent.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for reasoning token consumption metric initialization.

evaluate(agent_turn_traces)[source]

Calculate the reasoning token consumption by the agent.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the total reasoning token consumption count.

class agent_inspect.metrics.observed.token_count.TokenConsumptionMetric(config=None)[source]

Bases: ObservedMetric

ObservedMetric to measure the token consumption of agent responses per evaluation sample. Initialise an instance of TokenConsumptionMetric.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for token consumption metric initialization.

abstract evaluate(agent_turn_traces)[source]

This is an abstract method and should be implemented in a concrete class. Calculate the token consumption by the agent.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the token consumption count.

class agent_inspect.metrics.observed.token_count.TotalTokenConsumption(config=None)[source]

Bases: TokenConsumptionMetric

Metric to measure the total token consumption by the agent, consisting of input, output, and reasoning tokens.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for total token consumption metric initialization.

evaluate(agent_turn_traces)[source]

Calculate the total token consumption by the agent.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the total token consumption count.
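
A sketch of the aggregation TotalTokenConsumption describes, summing input, output, and reasoning tokens across all turns. The per-turn token field names on `TurnTrace` are assumptions, not the library's actual attributes:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TurnTrace:  # stand-in; the token field names are assumptions
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int

@dataclass
class NumericalScore:
    value: float

def total_token_consumption(agent_turn_traces: List[TurnTrace]) -> NumericalScore:
    """Sketch of TotalTokenConsumption.evaluate: sum of all token types."""
    total = sum(
        t.input_tokens + t.output_tokens + t.reasoning_tokens
        for t in agent_turn_traces
    )
    return NumericalScore(value=float(total))

traces = [
    TurnTrace(input_tokens=100, output_tokens=50, reasoning_tokens=20),
    TurnTrace(input_tokens=200, output_tokens=80, reasoning_tokens=0),
]
print(total_token_consumption(traces).value)  # 450.0
```

The Input/Output/ReasoningTotalTokenCount metrics above each correspond to summing just one of these three terms.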

agent_inspect.metrics.observed.tool_call_count module

class agent_inspect.metrics.observed.tool_call_count.ToolCallCount(config=None)[source]

Bases: ObservedMetric

ObservedMetric to measure the total number of tool calls made by the agent per evaluation sample.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for tool call count metric initialization.

evaluate(agent_turn_traces)[source]

Calculate the total number of tool calls made by the agent.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the total number of tool calls.
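
A sketch of the count this metric produces: tool calls are tallied across every turn, not just the last one. The `tool_calls` field on `TurnTrace` is an assumed stand-in for however the library records tool invocations:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TurnTrace:  # stand-in; the tool_calls field is an assumption
    tool_calls: List[str] = field(default_factory=list)

@dataclass
class NumericalScore:
    value: float

def tool_call_count(agent_turn_traces: List[TurnTrace]) -> NumericalScore:
    """Sketch of ToolCallCount.evaluate: total tool calls across all turns."""
    total = sum(len(t.tool_calls) for t in agent_turn_traces)
    return NumericalScore(value=float(total))

traces = [
    TurnTrace(tool_calls=["search", "calculator"]),
    TurnTrace(tool_calls=["search"]),
]
print(tool_call_count(traces).value)  # 3.0
```

Note this counts call events, so a tool invoked twice contributes two to the total.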

Module contents

class agent_inspect.metrics.observed.AverageLatency(config=None)[source]

Bases: LatencyMetric

ObservedMetric to measure the average latency in ms of agent responses (per turn) per evaluation sample.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for average latency metric initialization.

evaluate(agent_turn_traces)[source]

Calculate the average latency in ms of the agent’s responses.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the average latency score in ms per turn (float).

class agent_inspect.metrics.observed.InputTotalTokenCount(config=None)[source]

Bases: TokenConsumptionMetric

Metric to measure the input token consumption by the agent.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for input token consumption metric initialization.

evaluate(agent_turn_traces)[source]

Calculate the input token consumption by the agent.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the total input token consumption count.

class agent_inspect.metrics.observed.ObservedMetric(config=None)[source]

Bases: ABC

This is an abstract base class that should be extended by concrete metric implementations.

Parameters:

config (Optional[Dict[str, Any]]) – configuration for metric initialization.

abstract evaluate(agent_turn_traces)[source]

This is an abstract method and should be implemented in a concrete class.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object.

class agent_inspect.metrics.observed.OutputTotalTokenCount(config=None)[source]

Bases: TokenConsumptionMetric

Metric to measure the output token consumption by the agent.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for output token consumption metric initialization.

evaluate(agent_turn_traces)[source]

Calculate the output token consumption by the agent.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the total output token consumption count.

class agent_inspect.metrics.observed.ReasoningTotalTokenCount(config=None)[source]

Bases: TokenConsumptionMetric

Metric to measure the reasoning token consumption by the agent.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for reasoning token consumption metric initialization.

evaluate(agent_turn_traces)[source]

Calculate the reasoning token consumption by the agent.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the total reasoning token consumption count.

class agent_inspect.metrics.observed.TokenConsumptionMetric(config=None)[source]

Bases: ObservedMetric

ObservedMetric to measure the token consumption of agent responses per evaluation sample. Initialise an instance of TokenConsumptionMetric.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for token consumption metric initialization.

abstract evaluate(agent_turn_traces)[source]

This is an abstract method and should be implemented in a concrete class. Calculate the token consumption by the agent.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the token consumption count.

class agent_inspect.metrics.observed.ToolCallCount(config=None)[source]

Bases: ObservedMetric

ObservedMetric to measure the total number of tool calls made by the agent per evaluation sample.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for tool call count metric initialization.

evaluate(agent_turn_traces)[source]

Calculate the total number of tool calls made by the agent.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the total number of tool calls.

class agent_inspect.metrics.observed.TotalLatency(config=None)[source]

Bases: LatencyMetric

ObservedMetric to measure the total latency of agent responses per evaluation sample.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for total latency metric initialization.

evaluate(agent_turn_traces)[source]

Calculate the total latency in ms of the agent’s responses.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the total latency score in ms (float).

class agent_inspect.metrics.observed.TotalTokenConsumption(config=None)[source]

Bases: TokenConsumptionMetric

Metric to measure the total token consumption by the agent, consisting of input, output, and reasoning tokens.

Parameters:

config (Optional[Dict[str, Any]]) – Configuration for total token consumption metric initialization.

evaluate(agent_turn_traces)[source]

Calculate the total token consumption by the agent.

Parameters:

agent_turn_traces (List[TurnTrace]) – a list of TurnTrace objects constructed from the agent’s trajectory information, from the first turn up to the current turn.

Return type:

NumericalScore

Returns:

a NumericalScore object containing the total token consumption count.