InfoGrab Docs

Teleport Metrics

Summary

Teleport metrics are intended for performance monitoring. To monitor Teleport usage, consider using the Event Handler plugin to push audit events to your preferred log aggregation system (Elastic, Splunk, Sumo Logic, and so on).

The following metrics are available:

Auth Service and backends

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| audit_failed_disk_monitoring | counter | Teleport Audit Log | Number of times disk monitoring failed. |
| audit_failed_emit_events | counter | Teleport Audit Log | Number of times emitting audit events failed. |
| audit_percentage_disk_space_used | gauge | Teleport Audit Log | Percentage of disk space used. |
| audit_server_open_files | gauge | Teleport Audit Log | Number of open audit files. |
| auth_generate_requests_throttled_total | counter | Teleport Auth | Number of throttled requests to generate new server keys. |
| auth_generate_requests_total | counter | Teleport Auth | Number of requests to generate new server keys. |
| auth_generate_requests | gauge | Teleport Auth | Number of current generate requests. |
| auth_generate_seconds | histogram | Teleport Auth | Latency for generate requests. |
| backend_batch_read_requests_total | counter | cache | Number of batch read requests to the backend. |
| backend_batch_read_seconds | histogram | cache | Latency for backend batch read operations. |
| backend_batch_write_requests_total | counter | cache | Number of batch write requests to the backend. |
| backend_batch_write_seconds | histogram | cache | Latency for backend batch write operations. |
| backend_read_requests_total | counter | cache | Number of read requests to the backend. |
| backend_read_seconds | histogram | cache | Latency for backend read operations. |
| backend_requests | counter | cache | Number of requests to the backend (reads, writes, and keepalives). |
| backend_write_requests_total | counter | cache | Number of write requests to the backend. |
| backend_write_seconds | histogram | cache | Latency for backend write operations. |
| cluster_name_not_found_total | counter | Teleport Auth | Number of times a cluster was not found. |
| dynamo_requests_total | counter | DynamoDB | Total number of requests to the DynamoDB API. |
| dynamo_requests | counter | DynamoDB | Total number of requests to the DynamoDB API grouped by result. |
| dynamo_requests_seconds | histogram | DynamoDB | Latency of DynamoDB API requests. |
| etcd_backend_batch_read_requests | counter | etcd | Number of batch read requests to the etcd database. |
| etcd_backend_batch_read_seconds | histogram | etcd | Latency for etcd batch read operations. |
| etcd_backend_read_requests | counter | etcd | Number of read requests to the etcd database. |
| etcd_backend_read_seconds | histogram | etcd | Latency for etcd read operations. |
| etcd_backend_tx_requests | counter | etcd | Number of transaction requests to the etcd database. |
| etcd_backend_tx_seconds | histogram | etcd | Latency for etcd transaction operations. |
| etcd_backend_write_requests | counter | etcd | Number of write requests to the etcd database. |
| etcd_backend_write_seconds | histogram | etcd | Latency for etcd write operations. |
| teleport_etcd_events | counter | etcd | Total number of etcd events processed. |
| teleport_etcd_event_backpressure | counter | etcd | Total number of times event processing encountered backpressure. |
| firestore_events_backend_batch_read_requests | counter | GCP Cloud Firestore | Number of batch read requests to Cloud Firestore events. |
| firestore_events_backend_batch_read_seconds | histogram | GCP Cloud Firestore | Latency for Cloud Firestore events batch read operations. |
| firestore_events_backend_batch_write_requests | counter | GCP Cloud Firestore | Number of batch write requests to Cloud Firestore events. |
| firestore_events_backend_batch_write_seconds | histogram | GCP Cloud Firestore | Latency for Cloud Firestore events batch write operations. |
| firestore_events_backend_write_requests | counter | GCP Cloud Firestore | Number of write requests to Cloud Firestore events. |
| firestore_events_backend_write_seconds | histogram | GCP Cloud Firestore | Latency for Cloud Firestore events write operations. |
| gcs_event_storage_downloads_seconds | histogram | GCP GCS | Latency for GCS download operations. |
| gcs_event_storage_downloads | counter | GCP GCS | Number of downloads from the GCS backend. |
| gcs_event_storage_uploads_seconds | histogram | GCP GCS | Latency for GCS upload operations. |
| gcs_event_storage_uploads | counter | GCP GCS | Number of uploads to the GCS backend. |
| grpc_server_started_total | counter | Teleport Auth | Total number of RPCs started on the server. |
| grpc_server_handled_total | counter | Teleport Auth | Total number of RPCs completed on the server, regardless of success or failure. |
| grpc_server_msg_received_total | counter | Teleport Auth | Total number of RPC stream messages received on the server. |
| grpc_server_msg_sent_total | counter | Teleport Auth | Total number of gRPC stream messages sent by the server. |
| heartbeat_connections_received_total | counter | Teleport Auth | Number of times the Auth Service received a heartbeat connection, representing the total number of heartbeating Agents. |
| s3_requests_total | counter | Amazon S3 | Total number of requests to the S3 API. |
| s3_requests | counter | Amazon S3 | Total number of requests to the S3 API grouped by result. |
| s3_requests_seconds | histogram | Amazon S3 | Request latency for the S3 API. |
| teleport_audit_emit_events | counter | Teleport Audit Log | Number of audit events emitted. |
| teleport_audit_parquetlog_batch_processing_seconds | histogram | Teleport Audit Log | Duration of processing a single batch of events in the Parquet-format audit log. |
| teleport_audit_parquetlog_s3_flush_seconds | histogram | Teleport Audit Log | Duration of flushing Parquet files to S3 in the Parquet-format audit log. |
| teleport_audit_parquetlog_delete_events_seconds | histogram | Teleport Audit Log | Duration of deleting events from SQS in the Parquet-format audit log. |
| teleport_audit_parquetlog_batch_size | histogram | Teleport Audit Log | Overall size of events in a single batch in the Parquet-format audit log. |
| teleport_audit_parquetlog_batch_count | counter | Teleport Audit Log | Total number of events in a single batch in the Parquet-format audit log. |
| teleport_audit_parquetlog_last_processed_timestamp | gauge | Teleport Audit Log | Timestamp of the last processing in the Parquet-format audit log. |
| teleport_audit_parquetlog_age_oldest_processed_message | gauge | Teleport Audit Log | Age of the oldest processed event in the Parquet-format audit log. |
| teleport_audit_parquetlog_errors_from_collect_count | counter | Teleport Audit Log | Number of collect failures in the Parquet-format audit log. |
| teleport_connected_resources | gauge | Teleport Auth | Number and type of resources connected via keepalives. |
| teleport_postgres_events_backend_write_requests | counter | Postgres (Events) | Number of write requests to Postgres events, labeled with the request status (success or failure). |
| teleport_postgres_events_backend_batch_read_requests | counter | Postgres (Events) | Number of batch read requests to Postgres events, labeled with the request status (success or failure). |
| teleport_postgres_events_backend_batch_delete_requests | counter | Postgres (Events) | Number of batch delete requests to Postgres events, labeled with the request status (success or failure). |
| teleport_postgres_events_backend_write_seconds | histogram | Postgres (Events) | Latency for Postgres events write operations, in seconds. |
| teleport_postgres_events_backend_batch_read_seconds | histogram | Postgres (Events) | Latency for Postgres events batch read operations, in seconds. |
| teleport_postgres_events_backend_batch_delete_seconds | histogram | Postgres (Events) | Latency for Postgres events batch delete operations, in seconds. |
| teleport_registered_servers | gauge | Teleport Auth | The number of Teleport services that are connected to an Auth Service instance, grouped by version. |
| teleport_registered_servers_by_install_methods | gauge | Teleport Auth | The number of Teleport services that are connected to an Auth Service instance, grouped by install method. |
| teleport_roles_total | gauge | Teleport Auth | The number of roles that exist in the cluster. |
| teleport_migrations | gauge | Teleport Auth | Tracks, for each migration, whether it is active (1) or not (0). |
| teleport_bot_instances | gauge | Teleport Auth | The number of bot instances across the entire cluster, grouped by version. |
| user_login_total | counter | Teleport Auth | Number of user logins. |
| watcher_event_sizes | histogram | cache | Overall size of events emitted. |
| watcher_events | histogram | cache | Per resource size of events emitted. |
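Several of the metrics above are histograms (for example backend_read_seconds), which Prometheus exposes as cumulative buckets rather than as ready-made percentiles. The sketch below shows one way a latency quantile might be estimated from such buckets using linear interpolation inside the matching bucket. The function name and the bucket values are hypothetical and chosen only for illustration; they are not taken from a real Teleport scrape.

```python
from typing import List, Tuple


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative histogram buckets.

    buckets holds (upper_bound, cumulative_count) pairs sorted by upper
    bound and ending with the +Inf bucket, as in the exposition format.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        raise ValueError("at least one bucket is required")

    total = buckets[-1][1]
    if total == 0:
        return float("nan")  # no observations recorded yet

    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            if upper_bound == float("inf"):
                # Nothing to interpolate towards in the open-ended bucket,
                # so report the highest finite bound instead.
                return lower_bound
            in_bucket = cumulative - lower_count
            if in_bucket == 0:
                return upper_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative
    return lower_bound


# Hypothetical bucket data: (upper bound in seconds, cumulative count).
example_buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 100.0), (float("inf"), 100.0)]
p95 = estimate_quantile(0.95, example_buckets)
```

The even-spread assumption means the estimate is only as precise as the bucket boundaries allow, so a result that lands in a wide bucket should be read as a rough indication.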

Session recording summarizer

These metrics are exported by the Auth Service. They are all labeled with an inference_model_name label, which is the metadata.name field of the corresponding inference_model resource.

General metrics

These metrics apply to all inference providers.

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| teleport_summarizer_summarizations_total | counter | Teleport Auth | Total number of summarization jobs started. |
| teleport_summarizer_summarization_errors | counter | Teleport Auth | Number of failed summarization jobs. |
| teleport_summarizer_summarization_jobs_pending | gauge | Teleport Auth | Number of summarization jobs currently awaiting execution. |
| teleport_summarizer_summarization_jobs_running | gauge | Teleport Auth | Number of summarization jobs currently being executed. |
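Because the summarizer metrics carry an inference_model_name label, they can be broken down per model. The following sketch parses a few lines of Prometheus text output and computes an error ratio per model. The sample text, the model names, and the values are made up, and the parser assumes each line carries only the inference_model_name label; real output may include more labels, in which case a full exposition-format parser would be a better fit.

```python
import re
from collections import defaultdict
from typing import Dict

# Hypothetical scrape output; model names and values are invented.
SAMPLE = """\
teleport_summarizer_summarizations_total{inference_model_name="model-a"} 200
teleport_summarizer_summarizations_total{inference_model_name="model-b"} 50
teleport_summarizer_summarization_errors{inference_model_name="model-a"} 10
"""

# Matches a sample line with exactly one label, inference_model_name.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'\{inference_model_name="(?P<model>[^"]*)"\}\s+(?P<value>\S+)$'
)


def error_ratio_by_model(text: str) -> Dict[str, float]:
    """Return failed / started summarization jobs for each model."""
    started: Dict[str, float] = defaultdict(float)
    failed: Dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # comments, blank lines, and unrelated metrics
        value = float(match.group("value"))
        if match.group("name") == "teleport_summarizer_summarizations_total":
            started[match.group("model")] += value
        elif match.group("name") == "teleport_summarizer_summarization_errors":
            failed[match.group("model")] += value
    # Models with no error sample are reported with a ratio of 0.0.
    return {
        model: failed[model] / count
        for model, count in started.items()
        if count > 0
    }


ratios = error_ratio_by_model(SAMPLE)
```

Since both metrics are counters, the ratio over raw values covers the whole process lifetime; comparing two scrapes would give a ratio for a specific time window instead.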

OpenAI-specific metrics

These metrics apply to jobs executed using the OpenAI inference provider, including OpenAI-compatible proxies.

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| teleport_summarizer_openai_api_requests | counter | Teleport Auth | Total number of OpenAI API requests. |
| teleport_summarizer_openai_api_errors | counter | Teleport Auth | Number of errors returned by the OpenAI API. Additionally labeled with api_error_code, which denotes the OpenAI API error code. |
| teleport_summarizer_openai_api_requests_in_flight | gauge | Teleport Auth | Number of OpenAI requests currently awaiting a response. |

Enhanced Session Recording / BPF

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| bpf_lost_command_events | counter | BPF | Number of lost command events. |
| bpf_lost_disk_events | counter | BPF | Number of lost disk events. |
| bpf_lost_network_events | counter | BPF | Number of lost network events. |

Proxy Service

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| failed_connect_to_node_attempts_total | counter | Teleport Proxy | Number of failed SSH connection attempts to the SSH Service. Use with teleport_connect_to_node_attempts_total to get the failure rate. |
| failed_login_attempts_total | counter | Teleport Proxy | Number of failed tsh login or tsh ssh login attempts. |
| grpc_client_started_total | counter | Teleport Proxy | Total number of RPCs started on the client. |
| grpc_client_handled_total | counter | Teleport Proxy | Total number of RPCs completed on the client, regardless of success or failure. |
| grpc_client_msg_received_total | counter | Teleport Proxy | Total number of RPC stream messages received on the client. |
| grpc_client_msg_sent_total | counter | Teleport Proxy | Total number of gRPC stream messages sent by the client. |
| proxy_connection_limit_exceeded_total | counter | Teleport Proxy | Number of connections that exceeded the Proxy Service connection limit. |
| proxy_peer_client_dial_error_total | counter | Teleport Proxy | Total number of errors encountered dialing peer Proxy Service instances. |
| proxy_peer_client_connections | gauge | Teleport Proxy | Number of currently open connections to peer Proxy Service instances. |
| proxy_peer_client_rpc | gauge | Teleport Proxy | Number of current client RPC requests. |
| proxy_peer_client_rpc_total | counter | Teleport Proxy | Total number of client RPC requests. |
| proxy_peer_client_rpc_duration_seconds | histogram | Teleport Proxy | Duration in seconds of RPCs sent by the client. |
| proxy_peer_client_message_sent_size | histogram | Teleport Proxy | Size of messages sent by the client. |
| proxy_peer_client_message_received_size | histogram | Teleport Proxy | Size of messages received by the client. |
| proxy_peer_server_connections | gauge | Teleport Proxy | Number of currently open connections to peer Proxy Service clients. |
| proxy_peer_server_rpc | gauge | Teleport Proxy | Number of current server RPC requests. |
| proxy_peer_server_rpc_total | counter | Teleport Proxy | Total number of server RPC requests. |
| proxy_peer_server_rpc_duration_seconds | histogram | Teleport Proxy | Duration in seconds of RPCs sent by the server. |
| proxy_peer_server_message_sent_size | histogram | Teleport Proxy | Size of messages sent by the server. |
| proxy_peer_server_message_received_size | histogram | Teleport Proxy | Size of messages received by the server. |
| proxy_ssh_sessions_total | gauge | Teleport Proxy | Number of active sessions through this Proxy Service instance. |
| proxy_missing_ssh_tunnels | gauge | Teleport Proxy | Number of missing SSH tunnels. Used to debug if Teleport instances have discovered all Proxy Service instances. |
| remote_clusters | gauge | Teleport Proxy | Number of inbound connections from leaf clusters. |
| teleport_connect_to_node_attempts_total | counter | Teleport Proxy | Number of SSH connection attempts to an SSH Service. Use with failed_connect_to_node_attempts_total to get the failure rate. |
| teleport_reverse_tunnels_connected | gauge | Teleport Proxy | Number of reverse SSH tunnels connected to the Teleport Proxy Service by Teleport instances. |
| teleport_proxy_db_connection_setup_time_seconds | histogram | Teleport Proxy | Time to establish a connection to the DB service from the Proxy service. |
| teleport_proxy_db_connection_dial_attempts_total | counter | Teleport Proxy | Number of dial attempts made from the Proxy to the DB service. |
| teleport_proxy_db_connection_dial_failures_total | counter | Teleport Proxy | Number of failed dial attempts made from the Proxy to the DB service. |
| teleport_proxy_db_attempted_servers_total | histogram | Teleport Proxy | Number of servers processed during a connection attempt to the DB service from the Proxy service. |
| teleport_proxy_db_connection_tls_config_time_seconds | histogram | Teleport Proxy | Time to fetch the TLS configuration for the connection to the DB service from the Proxy service. |
| teleport_proxy_db_active_connections_total | gauge | Teleport Proxy | Number of currently active connections to the DB service from the Proxy service. |
| trusted_clusters | gauge | Teleport Proxy | Number of outbound connections to leaf clusters. |
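The table notes that failed_connect_to_node_attempts_total and teleport_connect_to_node_attempts_total can be combined to get a failure rate. As both are counters that only grow until the process restarts, the rate for a time window comes from the difference between two scrapes. The sketch below is one possible way to do that; the function name and sample values are hypothetical, and it assumes the attempts counter includes the failed attempts.

```python
def node_connect_failure_rate(
    prev_attempts: float,
    prev_failures: float,
    cur_attempts: float,
    cur_failures: float,
) -> float:
    """Return the share of failed SSH connection attempts between two scrapes.

    prev_* and cur_* are the values of teleport_connect_to_node_attempts_total
    and failed_connect_to_node_attempts_total at the earlier and later scrape.
    """
    attempts = cur_attempts - prev_attempts
    failures = cur_failures - prev_failures
    if attempts < 0 or failures < 0:
        # A counter went backwards, which suggests the process restarted.
        # Treat the current values as the increase since the restart.
        attempts, failures = cur_attempts, cur_failures
    if attempts == 0:
        return 0.0  # no connection attempts in this window
    return failures / attempts


# Hypothetical scrape values: 100 new attempts, 5 of them failed.
rate = node_connect_failure_rate(1000, 20, 1100, 25)
```

Working from deltas rather than raw totals keeps an old burst of failures from skewing the rate long after it has passed.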

Database Service

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| teleport_db_messages_from_client_total | counter | Teleport Database Service | Number of messages (packets) received from the DB client. |
| teleport_db_messages_from_server_total | counter | Teleport Database Service | Number of messages (packets) received from the DB server. |
| teleport_db_method_call_count_total | counter | Teleport Database Service | Number of times a DB method was called. |
| teleport_db_method_call_latency_seconds | histogram | Teleport Database Service | Latency of DB method calls. |
| teleport_db_initialized_connections_total | counter | Teleport Database Service | Number of initialized DB connections. |
| teleport_db_active_connections_total | gauge | Teleport Database Service | Number of active DB connections. |
| teleport_db_connection_durations_seconds | histogram | Teleport Database Service | Duration of DB connections. |
| teleport_db_connection_setup_time_seconds | histogram | Teleport Database Service | Initial time to set up a DB connection, before any requests are handled. |
| teleport_db_errors_total | counter | Teleport Database Service | Number of synthetic DB errors sent to the client. |

Kubernetes access

The following tables identify all metrics available in the Teleport Proxy Service if at least one Kubernetes cluster is enrolled in your Teleport cluster.

Client

The following table identifies all metrics available when the service connects to upstream servers. In the case of the proxy, the upstream server can be a kubernetes_service or a Kubernetes cluster if it is running in legacy mode.

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| teleport_kubernetes_client_in_flight_requests | gauge | Teleport Kubernetes Proxy | In-flight requests waiting for the upstream response. |
| teleport_kubernetes_client_requests_total | counter | Teleport Kubernetes Proxy | Total number of requests sent to the upstream Teleport proxy, kube_service, or Kubernetes cluster servers. |
| teleport_kubernetes_client_tls_duration_seconds | histogram | Teleport Kubernetes Proxy | Latency distribution of TLS handshakes. |
| teleport_kubernetes_client_got_conn_duration_seconds | histogram | Teleport Kubernetes Proxy | Latency distribution of time to dial the upstream server, using a reverse tunnel or direct dialer. |
| teleport_kubernetes_client_first_byte_response_duration_seconds | histogram | Teleport Kubernetes Proxy | Latency distribution of time to receive the first response byte from the upstream server. |
| teleport_kubernetes_client_request_duration_seconds | histogram | Teleport Kubernetes Proxy | Latency distribution of the upstream request time. |

Server

The following table identifies all metrics available for incoming connections.

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| teleport_kubernetes_server_in_flight_requests | gauge | Teleport Kubernetes Proxy | In-flight requests currently handled by the server. |
| teleport_kubernetes_server_api_requests_total | counter | Teleport Kubernetes Proxy | Total number of requests handled by the server. |
| teleport_kubernetes_server_request_duration_seconds | histogram | Teleport Kubernetes Proxy | Latency distribution of the total request time. |
| teleport_kubernetes_server_response_size_bytes | histogram | Teleport Kubernetes Proxy | Distribution of the response size. |
| teleport_kubernetes_server_exec_in_flight_sessions | gauge | Teleport Kubernetes Proxy | Number of active kubectl exec sessions. |
| teleport_kubernetes_server_exec_sessions_total | counter | Teleport Kubernetes Proxy | Total number of kubectl exec sessions. |
| teleport_kubernetes_server_portforward_in_flight_sessions | gauge | Teleport Kubernetes Proxy | Number of active kubectl portforward sessions. |
| teleport_kubernetes_server_portforward_sessions_total | counter | Teleport Kubernetes Proxy | Total number of kubectl portforward sessions. |
| teleport_kubernetes_server_join_in_flight_sessions | gauge | Teleport Kubernetes Proxy | Number of active joining sessions. |
| teleport_kubernetes_server_join_sessions_total | counter | Teleport Kubernetes Proxy | Total number of joining sessions. |

Teleport SSH Service

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| user_max_concurrent_sessions_hit_total | counter | Teleport SSH | Number of times a user exceeded their concurrent session limit. |

Teleport Kubernetes Service

The following table identifies all metrics available when the service connects to upstream servers. In the case of kubernetes_service, the upstream server is always a Kubernetes cluster.

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| teleport_kubernetes_client_in_flight_requests | gauge | Teleport Kubernetes Service | In-flight requests waiting for the upstream response. |
| teleport_kubernetes_client_requests_total | counter | Teleport Kubernetes Service | Total number of requests sent to the upstream Teleport proxy, kube_service, or Kubernetes cluster servers. |
| teleport_kubernetes_client_tls_duration_seconds | histogram | Teleport Kubernetes Service | Latency distribution of TLS handshakes. |
| teleport_kubernetes_client_got_conn_duration_seconds | histogram | Teleport Kubernetes Service | Latency distribution of time to dial the upstream server, using a reverse tunnel or direct dialer. |
| teleport_kubernetes_client_first_byte_response_duration_seconds | histogram | Teleport Kubernetes Service | Latency distribution of time to receive the first response byte from the upstream server. |
| teleport_kubernetes_client_request_duration_seconds | histogram | Teleport Kubernetes Service | Latency distribution of the upstream request time. |

The following table identifies all metrics available for incoming connections.

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| teleport_kubernetes_server_in_flight_requests | gauge | Teleport Kubernetes Service | In-flight requests currently handled by the server. |
| teleport_kubernetes_server_api_requests_total | counter | Teleport Kubernetes Service | Total number of requests handled by the server. |
| teleport_kubernetes_server_request_duration_seconds | histogram | Teleport Kubernetes Service | Latency distribution of the total request time. |
| teleport_kubernetes_server_response_size_bytes | histogram | Teleport Kubernetes Service | Distribution of the response size. |
| teleport_kubernetes_server_exec_in_flight_sessions | gauge | Teleport Kubernetes Service | Number of active kubectl exec sessions. |
| teleport_kubernetes_server_exec_sessions_total | counter | Teleport Kubernetes Service | Total number of kubectl exec sessions. |
| teleport_kubernetes_server_portforward_in_flight_sessions | gauge | Teleport Kubernetes Service | Number of active kubectl portforward sessions. |
| teleport_kubernetes_server_portforward_sessions_total | counter | Teleport Kubernetes Service | Total number of kubectl portforward sessions. |
| teleport_kubernetes_server_join_in_flight_sessions | gauge | Teleport Kubernetes Service | Number of active joining sessions. |
| teleport_kubernetes_server_join_sessions_total | counter | Teleport Kubernetes Service | Total number of joining sessions. |

All Teleport instances

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| process_state | gauge | Teleport | State of the teleport process: 0 - ok, 1 - recovering, 2 - degraded, 3 - starting. |
| certificate_mismatch_total | counter | Teleport | Number of SSH server login failures due to a certificate mismatch. |
| rx | counter | Teleport | Number of bytes received during an SSH connection. |
| server_interactive_sessions_total | gauge | Teleport | Number of active sessions. |
| teleport_build_info | gauge | Teleport | Provides build information of Teleport including gitref (git describe --long --tags), Go version, and Teleport version. The value of this gauge will always be 1. |
| teleport_breaker_connector_executions_total | counter | Teleport | Number of requests to the Teleport Auth Service API that go through a circuit breaker done by Teleport services, labeled by role of the connector (almost always Instance), state of the associated circuit breaker and success as interpreted by the breaker. |
| teleport_cache_events | counter | Teleport | Number of events received by a Teleport service cache. Teleport's Auth Service, Proxy Service, and other services cache incoming events related to their service. |
| teleport_cache_stale_events | counter | Teleport | Number of stale events received by a Teleport service cache. A high percentage of stale events can indicate a degraded backend. |
| tx | counter | Teleport | Number of bytes transmitted during an SSH connection. |
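Two entries in this table need a little interpretation: process_state is a numeric code, and teleport_cache_stale_events is most useful as a percentage. The sketch below shows how both might be handled. The helper names are hypothetical, and the percentage assumes that stale events are also counted in teleport_cache_events, which is not confirmed here.

```python
# Numeric codes for process_state, as listed in the table above.
PROCESS_STATES = {0: "ok", 1: "recovering", 2: "degraded", 3: "starting"}


def describe_process_state(value: float) -> str:
    """Translate a process_state gauge value into its state name."""
    # Float samples such as 2.0 match the integer keys; anything else
    # (including NaN) falls back to "unknown".
    return PROCESS_STATES.get(value, "unknown")


def stale_event_percentage(cache_events: float, stale_events: float) -> float:
    """Return stale events as a percentage of all cache events.

    Assumes teleport_cache_events already includes the stale events
    reported by teleport_cache_stale_events.
    """
    if cache_events <= 0:
        return 0.0  # nothing received yet, so nothing can be stale
    return 100.0 * stale_events / cache_events


# Hypothetical sample values for illustration only.
state = describe_process_state(2.0)
stale_pct = stale_event_percentage(cache_events=4000, stale_events=100)
```

What counts as a "high" stale percentage is not defined by the table, so any alert threshold built on this value would need to be tuned per deployment.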

Teleport Health Checks

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| teleport_resources_health_status_healthy | gauge | Teleport Health Check | Number of healthy resources. |
| teleport_resources_health_status_unhealthy | gauge | Teleport Health Check | Number of unhealthy resources. |
| teleport_resources_health_status_unknown | gauge | Teleport Health Check | Number of resources in an unknown health state. |

Go runtime metrics

These metrics are surfaced by the Go runtime and are not specific to Teleport.

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| go_gc_duration_seconds | summary | Internal Go | A summary of GC invocation durations. |
| go_goroutines | gauge | Internal Go | Number of goroutines that currently exist. |
| go_info | gauge | Internal Go | Information about the Go environment. |
| go_memstats_alloc_bytes_total | counter | Internal Go | Total number of bytes allocated, even if freed. |
| go_memstats_alloc_bytes | gauge | Internal Go | Number of bytes allocated and still in use. |
| go_memstats_buck_hash_sys_bytes | gauge | Internal Go | Number of bytes used by the profiling bucket hash table. |
| go_memstats_frees_total | counter | Internal Go | Total number of frees. |
| go_memstats_gc_cpu_fraction | gauge | Internal Go | The fraction of this program's available CPU time used by the GC since the program started. |
| go_memstats_gc_sys_bytes | gauge | Internal Go | Number of bytes used for garbage collection system metadata. |
| go_memstats_heap_alloc_bytes | gauge | Internal Go | Number of heap bytes allocated and still in use. |
| go_memstats_heap_idle_bytes | gauge | Internal Go | Number of heap bytes waiting to be used. |
| go_memstats_heap_inuse_bytes | gauge | Internal Go | Number of heap bytes that are in use. |
| go_memstats_heap_objects | gauge | Internal Go | Number of allocated objects. |
| go_memstats_heap_released_bytes | gauge | Internal Go | Number of heap bytes released to the OS. |
| go_memstats_heap_sys_bytes | gauge | Internal Go | Number of heap bytes obtained from the system. |
| go_memstats_last_gc_time_seconds | gauge | Internal Go | Number of seconds since the Unix epoch of the last garbage collection. |
| go_memstats_lookups_total | counter | Internal Go | Total number of pointer lookups. |
| go_memstats_mallocs_total | counter | Internal Go | Total number of mallocs. |
| go_memstats_mcache_inuse_bytes | gauge | Internal Go | Number of bytes in use by mcache structures. |
| go_memstats_mcache_sys_bytes | gauge | Internal Go | Number of bytes used for mcache structures obtained from the system. |
| go_memstats_mspan_inuse_bytes | gauge | Internal Go | Number of bytes in use by mspan structures. |
| go_memstats_mspan_sys_bytes | gauge | Internal Go | Number of bytes used for mspan structures obtained from the system. |
| go_memstats_next_gc_bytes | gauge | Internal Go | Number of heap bytes at which the next garbage collection will take place. |
| go_memstats_other_sys_bytes | gauge | Internal Go | Number of bytes used for other system allocations. |
| go_memstats_stack_inuse_bytes | gauge | Internal Go | Number of bytes in use by the stack allocator. |
| go_memstats_stack_sys_bytes | gauge | Internal Go | Number of bytes obtained from the system for the stack allocator. |
| go_memstats_sys_bytes | gauge | Internal Go | Number of bytes obtained from the system. |
| go_threads | gauge | Internal Go | Number of OS threads created. |
| process_cpu_seconds_total | counter | Internal Go | Total user and system CPU time spent in seconds. |
| process_max_fds | gauge | Internal Go | Maximum number of open file descriptors. |
| process_open_fds | gauge | Internal Go | Number of open file descriptors. |
| process_resident_memory_bytes | gauge | Internal Go | Resident memory size in bytes. |
| process_start_time_seconds | gauge | Internal Go | Start time of the process since the Unix epoch in seconds. |
| process_virtual_memory_bytes | gauge | Internal Go | Virtual memory size in bytes. |
| process_virtual_memory_max_bytes | gauge | Internal Go | Maximum amount of virtual memory available in bytes. |
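Some of the process-level gauges above are easier to read once combined. As a rough sketch, the helper below derives file descriptor utilization from process_open_fds and process_max_fds, and uptime from process_start_time_seconds. The function name, the dictionary layout, and the sample values are hypothetical and serve only to illustrate the arithmetic.

```python
from typing import Dict


def derive_process_stats(samples: Dict[str, float], now: float) -> Dict[str, float]:
    """Derive uptime and file descriptor utilization from raw gauge values.

    samples maps metric names to their current values; now is the current
    time in seconds since the Unix epoch.
    """
    max_fds = samples["process_max_fds"]
    return {
        # process_start_time_seconds is an absolute timestamp, so uptime
        # is the difference from the current time.
        "uptime_seconds": now - samples["process_start_time_seconds"],
        # Share of the file descriptor limit currently in use (0.0 to 1.0).
        "fd_utilization": (
            samples["process_open_fds"] / max_fds if max_fds > 0 else float("nan")
        ),
    }


# Hypothetical gauge values for illustration only.
example_samples = {
    "process_open_fds": 256.0,
    "process_max_fds": 1024.0,
    "process_start_time_seconds": 1_700_000_000.0,
}
stats = derive_process_stats(example_samples, now=1_700_003_600.0)
```

In practice, time.time() could be passed as now; it is a parameter here so the calculation stays deterministic.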

Prometheus

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| promhttp_metric_handler_requests_in_flight | gauge | prometheus | Current number of scrapes being served. |
| promhttp_metric_handler_requests_total | counter | prometheus | Total number of scrapes by HTTP status code. |

Teleport 메트릭

원문 보기
요약

Teleport 메트릭은 성능 모니터링을 위한 것입니다. These metrics are exported by the Auth Service. These metrics apply to all inference providers.

Teleport 메트릭은 성능 모니터링을 위한 것입니다. Teleport 사용량을 모니터링하려면 이벤트 핸들러 플러그인을 사용하여 감사 이벤트를 선호하는 로그 집계 시스템(Elastic, Splunk, Sumo Logic 등)으로 푸시하는 것을 고려하세요.

다음 메트릭을 사용할 수 있습니다:

Auth Service and backends#

Name Type Component Description
audit_failed_disk_monitoring counter Teleport Audit Log Number of times disk monitoring failed.
audit_failed_emit_events counter Teleport Audit Log Number of times emitting audit events failed.
audit_percentage_disk_space_used gauge Teleport Audit Log Percentage of disk space used.
audit_server_open_files gauge Teleport Audit Log Number of open audit files.
auth_generate_requests_throttled_total counter Teleport Auth Number of throttled requests to generate new server keys.
auth_generate_requests_total counter Teleport Auth Number of requests to generate new server keys.
auth_generate_requests gauge Teleport Auth Number of current generate requests.
auth_generate_seconds histogram Teleport Auth Latency for generate requests.
backend_batch_read_requests_total counter cache Number of read requests to the backend.
backend_batch_read_seconds histogram cache Latency for batch read operations.
backend_batch_write_requests_total counter cache Number of batch write requests to the backend.
backend_batch_write_seconds histogram cache Latency for backend batch write operations.
backend_read_requests_total counter cache Number of read requests to the backend.
backend_read_seconds histogram cache Latency for read operations.
backend_requests counter cache Number of requests to the backend (reads, writes, and keepalives).
backend_write_requests_total counter cache Number of write requests to the backend.
backend_write_seconds histogram cache Latency for backend write operations.
cluster_name_not_found_total counter Teleport Auth Number of times a cluster was not found.
dynamo_requests_total counter DynamoDB Total number of requests to the DynamoDB API.
dynamo_requests counter DynamoDB Total number of requests to the DynamoDB API grouped by result.
dynamo_requests_seconds histogram DynamoDB Latency of DynamoDB API requests.
etcd_backend_batch_read_requests counter etcd Number of read requests to the etcd database.
etcd_backend_batch_read_seconds histogram etcd Latency for etcd read operations.
etcd_backend_read_requests counter etcd Number of read requests to the etcd database.
etcd_backend_read_seconds histogram etcd Latency for etcd read operations.
etcd_backend_tx_requests counter etcd Number of transaction requests to the database.
etcd_backend_tx_seconds histogram etcd Latency for etcd transaction operations.
etcd_backend_write_requests counter etcd Number of write requests to the database.
etcd_backend_write_seconds histogram etcd Latency for etcd write operations.
teleport_etcd_events counter etcd Total number of etcd events processed.
teleport_etcd_event_backpressure counter etcd Total number of times event processing encountered backpressure.
firestore_events_backend_batch_read_requests counter GCP Cloud Firestore Number of batch read requests to Cloud Firestore events.
firestore_events_backend_batch_read_seconds histogram GCP Cloud Firestore Latency for Cloud Firestore events batch read operations.
firestore_events_backend_batch_write_requests counter GCP Cloud Firestore Number of batch write requests to Cloud Firestore events.
firestore_events_backend_batch_write_seconds histogram GCP Cloud Firestore Latency for Cloud Firestore events batch write operations.
firestore_events_backend_write_requests counter GCP Cloud Firestore Number of write requests to Cloud Firestore events.
firestore_events_backend_write_seconds histogram GCP Cloud Firestore Latency for Cloud Firestore events write operations.
gcs_event_storage_downloads_seconds histogram GCP GCS Latency for GCS download operations.
gcs_event_storage_downloads counter GCP GCS Number of downloads from the GCS backend.
gcs_event_storage_uploads_seconds histogram GCP GCS Latency for GCS upload operations.
gcs_event_storage_uploads counter GCP GCS Number of uploads to the GCS backend.
grpc_server_started_total counter Teleport Auth Total number of RPCs started on the server.
grpc_server_handled_total counter Teleport Auth Total number of RPCs completed on the server, regardless of success or failure.
grpc_server_msg_received_total counter Teleport Auth Total number of RPC stream messages received on the server.
grpc_server_msg_sent_total counter Teleport Auth Total number of gRPC stream messages sent by the server.
heartbeat_connections_received_total counter Teleport Auth Number of times the Auth Service received a heartbeat connection, representing total heart beating Agents.
s3_requests_total counter Amazon S3 Total number of requests to the S3 API.
s3_requests counter Amazon S3 Total number of requests to the S3 API grouped by result.
s3_requests_seconds histogram Amazon S3 Request latency for the S3 API.
teleport_audit_emit_events counter Teleport Audit Log Number of audit events emitted.
teleport_audit_parquetlog_batch_processing_seconds histogram Teleport Audit Log Duration of processing single batch of events in the Parquet-format audit log.
teleport_audit_parquetlog_s3_flush_seconds histogram Teleport Audit Log Duration of flushing parquet files to S3 in Parquet-format audit log.
teleport_audit_parquetlog_delete_events_seconds histogram Teleport Audit Log Duration of deletion events from SQS in Parquet-format audit log.
teleport_audit_parquetlog_batch_size histogram Teleport Audit Log Overall size of events in single batch in Parquet-format audit log.
teleport_audit_parquetlog_batch_count counter Teleport Audit Log Total number of events in single batch in Parquet-format audit log.
teleport_audit_parquetlog_last_processed_timestamp gauge Teleport Audit Log Number of last processing time in Parquet-format audit log.
teleport_audit_parquetlog_age_oldest_processed_message gauge Teleport Audit Log Number of age of oldest event in Parquet-format audit log.
teleport_audit_parquetlog_errors_from_collect_count counter Teleport Audit Log Number of collect failures in Parquet-format audit log.
teleport_connected_resources gauge Teleport Auth Number and type of resources connected via keepalives.
teleport_postgres_events_backend_write_requests counter Postgres (Events) Number of write requests to postgres events, labeled with the request status (success or failure).
teleport_postgres_events_backend_batch_read_requests counter Postgres (Events) Number of batch read requests to postgres events, labeled with the request status (success or failure).
teleport_postgres_events_backend_batch_delete_requests counter Postgres (Events) Number of batch delete requests to postgres events, labeled with the request status (success or failure).
teleport_postgres_events_backend_write_seconds histogram Postgres (Events) Latency for postgres events write operations, in seconds.
teleport_postgres_events_backend_batch_read_seconds histogram Postgres (Events) Latency for postgres events batch read operations, in seconds.
teleport_postgres_events_backend_batch_delete_seconds histogram Postgres (Events) Latency for postgres events batch delete operations, in seconds.
teleport_registered_servers gauge Teleport Auth The number of Teleport services that are connected to an Auth Service instance grouped by version.
teleport_registered_servers_by_install_methods gauge Teleport Auth The number of Teleport services that are connected to an Auth Service instance grouped by install methods.
teleport_roles_total gauge Teleport Auth The number of roles that exist in the cluster.
teleport_migrations gauge Teleport Auth Tracks for each migration if it is active (1) or not (0).
teleport_bot_instances gauge Teleport Auth The number of bot instances across the entire cluster grouped by version.
user_login_total counter Teleport Auth Number of user logins.
watcher_event_sizes histogram cache Overall size of events emitted.
watcher_events histogram cache Per resource size of events emitted.

Session recording summarizer#

These metrics are exported by the Auth Service. They are all labeled with an inference_model_name label, which is the metadata.name field of the corresponding inference_model resource.

General metrics#

These metrics apply to all inference providers.

Name Type Component Description
teleport_summarizer_summarizations_total counter Teleport Auth Total number of summarization jobs started
teleport_summarizer_summarization_errors counter Teleport Auth Number of failed summarization jobs
teleport_summarizer_summarization_jobs_pending gauge Teleport Auth Number of summarization jobs currently awaiting execution
teleport_summarizer_summarization_jobs_running gauge Teleport Auth Number of summarization jobs currently being executed
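
Because every summarizer metric carries the inference_model_name label, a failure ratio can be computed per model. The following is a minimal Python sketch, assuming the metrics have already been scraped as Prometheus text exposition. The sample values, the model names, and the helper names parse_samples and error_ratio_by_model are hypothetical and serve only as illustration.

```python
import re
from collections import defaultdict

# Matches one sample line: metric name, optional {labels}, numeric value.
# Lines that carry a trailing timestamp are not handled by this sketch.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def error_ratio_by_model(text):
    """Return {inference_model_name: failed jobs / started jobs}."""
    started = defaultdict(float)
    failed = defaultdict(float)
    for name, labels, value in parse_samples(text):
        model = labels.get("inference_model_name")
        if model is None:
            continue
        if name == "teleport_summarizer_summarizations_total":
            started[model] += value
        elif name == "teleport_summarizer_summarization_errors":
            failed[model] += value
    # Models with no started jobs are left out to avoid dividing by zero.
    return {m: failed[m] / n for m, n in started.items() if n > 0}


# Hypothetical scrape output, not taken from a real cluster.
SAMPLE = """
# TYPE teleport_summarizer_summarizations_total counter
teleport_summarizer_summarizations_total{inference_model_name="model-a"} 40
teleport_summarizer_summarizations_total{inference_model_name="model-b"} 10
teleport_summarizer_summarization_errors{inference_model_name="model-a"} 4
"""

ratios = error_ratio_by_model(SAMPLE)
```

A model that has not reported any errors yet simply has no error series, so the sketch treats its failure count as zero.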

OpenAI-specific metrics#

These metrics apply to jobs executed using the OpenAI inference provider, including OpenAI-compatible proxies.

Name Type Component Description
teleport_summarizer_openai_api_requests counter Teleport Auth Total number of OpenAI API requests
teleport_summarizer_openai_api_errors counter Teleport Auth Number of errors returned by the OpenAI API. Additionally labeled with api_error_code which denotes the OpenAI API error code.
teleport_summarizer_openai_api_requests_in_flight gauge Teleport Auth Number of OpenAI requests currently awaiting response

Enhanced Session Recording / BPF#

Name Type Component Description
bpf_lost_command_events counter BPF Number of lost command events.
bpf_lost_disk_events counter BPF Number of lost disk events.
bpf_lost_network_events counter BPF Number of lost network events.

Proxy Service#

Name Type Component Description
failed_connect_to_node_attempts_total counter Teleport Proxy Number of failed SSH connection attempts to the SSH Service. Use with teleport_connect_to_node_attempts_total to get the failure rate.
failed_login_attempts_total counter Teleport Proxy Number of failed tsh login or tsh ssh attempts.
grpc_client_started_total counter Teleport Proxy Total number of RPCs started on the client.
grpc_client_handled_total counter Teleport Proxy Total number of RPCs completed on the client, regardless of success or failure.
grpc_client_msg_received_total counter Teleport Proxy Total number of RPC stream messages received on the client.
grpc_client_msg_sent_total counter Teleport Proxy Total number of gRPC stream messages sent by the client.
proxy_connection_limit_exceeded_total counter Teleport Proxy Number of connections that exceeded the Proxy Service connection limit.
proxy_peer_client_dial_error_total counter Teleport Proxy Total number of errors encountered dialing peer Proxy Service instances.
proxy_peer_client_connections gauge Teleport Proxy Number of currently open connections to peer Proxy Service instances.
proxy_peer_client_rpc gauge Teleport Proxy Number of current client RPC requests.
proxy_peer_client_rpc_total counter Teleport Proxy Total number of client RPC requests.
proxy_peer_client_rpc_duration_seconds histogram Teleport Proxy Duration in seconds of RPCs sent by the client.
proxy_peer_client_message_sent_size histogram Teleport Proxy Size of messages sent by the client.
proxy_peer_client_message_received_size histogram Teleport Proxy Size of messages received by the client.
proxy_peer_server_connections gauge Teleport Proxy Number of currently open connections to peer Proxy Service clients.
proxy_peer_server_rpc gauge Teleport Proxy Number of current server RPC requests.
proxy_peer_server_rpc_total counter Teleport Proxy Total number of server RPC requests.
proxy_peer_server_rpc_duration_seconds histogram Teleport Proxy Duration in seconds of RPCs sent by the server.
proxy_peer_server_message_sent_size histogram Teleport Proxy Size of messages sent by the server.
proxy_peer_server_message_received_size histogram Teleport Proxy Size of messages received by the server.
proxy_ssh_sessions_total gauge Teleport Proxy Number of active sessions through this Proxy Service instance.
proxy_missing_ssh_tunnels gauge Teleport Proxy Number of missing SSH tunnels. Used to debug if Teleport instances have discovered all Proxy Service instances.
remote_clusters gauge Teleport Proxy Number of inbound connections from leaf clusters.
teleport_connect_to_node_attempts_total counter Teleport Proxy Number of SSH connection attempts to an SSH Service. Use with failed_connect_to_node_attempts_total to get the failure rate.
teleport_reverse_tunnels_connected gauge Teleport Proxy Number of reverse SSH tunnels connected to the Teleport Proxy Service by Teleport instances.
teleport_proxy_db_connection_setup_time_seconds histogram Teleport Proxy Time to establish connection to DB service from Proxy service.
teleport_proxy_db_connection_dial_attempts_total counter Teleport Proxy Number of dial attempts made from Proxy to DB service.
teleport_proxy_db_connection_dial_failures_total counter Teleport Proxy Number of failed dial attempts made from Proxy to DB service.
teleport_proxy_db_attempted_servers_total histogram Teleport Proxy Number of servers processed during connection attempt to the DB service from Proxy service.
teleport_proxy_db_connection_tls_config_time_seconds histogram Teleport Proxy Time to fetch TLS configuration for the connection to DB service from Proxy service.
teleport_proxy_db_active_connections_total gauge Teleport Proxy Number of currently active connections to DB service from Proxy service.
trusted_clusters gauge Teleport Proxy Number of outbound connections to leaf clusters.
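
The failure rate mentioned for failed_connect_to_node_attempts_total and teleport_connect_to_node_attempts_total can be sketched as follows. This is a minimal Python illustration, assuming two consecutive scrapes of both counters are available. The helper names counter_delta and failure_rate and the sample numbers are hypothetical.

```python
def counter_delta(previous: float, current: float) -> float:
    """Increase of a counter between two scrapes.

    If the counter went backwards, the process is assumed to have
    restarted and the current value is taken as the increase.
    """
    return current - previous if current >= previous else current


def failure_rate(failed_delta: float, attempts_delta: float) -> float:
    """Fraction of SSH connection attempts that failed in the window."""
    if attempts_delta <= 0:
        return 0.0
    return failed_delta / attempts_delta


# Hypothetical values from two scrapes of the Proxy Service.
failed = counter_delta(previous=10, current=15)       # failed_connect_to_node_attempts_total
attempts = counter_delta(previous=100, current=150)   # teleport_connect_to_node_attempts_total
rate = failure_rate(failed, attempts)
```

Working on deltas rather than raw totals keeps the rate tied to a recent window instead of the whole lifetime of the process.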

Database Service#

Name Type Component Description
teleport_db_messages_from_client_total counter Teleport Database Service Number of messages (packets) received from the DB client.
teleport_db_messages_from_server_total counter Teleport Database Service Number of messages (packets) received from the DB server.
teleport_db_method_call_count_total counter Teleport Database Service Number of times a DB method was called.
teleport_db_method_call_latency_seconds histogram Teleport Database Service Latency of DB method calls.
teleport_db_initialized_connections_total counter Teleport Database Service Number of initialized DB connections.
teleport_db_active_connections_total gauge Teleport Database Service Number of active DB connections.
teleport_db_connection_durations_seconds histogram Teleport Database Service Duration of DB connection.
teleport_db_connection_setup_time_seconds histogram Teleport Database Service Initial time to set up a DB connection, before any requests are handled.
teleport_db_errors_total counter Teleport Database Service Number of synthetic DB errors sent to the client.
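
Prometheus histograms such as teleport_db_method_call_latency_seconds expose `_sum` and `_count` series alongside their buckets, so a mean latency over a window can be derived from two scrapes. The sketch below assumes those four values have already been read. The function name mean_latency_seconds and the sample numbers are hypothetical.

```python
from typing import Optional


def mean_latency_seconds(
    sum_before: float,
    count_before: float,
    sum_after: float,
    count_after: float,
) -> Optional[float]:
    """Average observed latency between two scrapes of a histogram.

    Uses the increase of <name>_sum divided by the increase of
    <name>_count. Returns None when no new observations were recorded
    (or the counters were reset), since no mean can be computed.
    """
    calls = count_after - count_before
    if calls <= 0:
        return None
    return (sum_after - sum_before) / calls


# Hypothetical scrape values: 6 new calls taking 3.0 seconds in total.
mean = mean_latency_seconds(2.0, 10, 5.0, 16)
```

A mean hides tail latency, so the bucket series remain the better source when percentiles matter.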

Kubernetes access#

The following tables identify all metrics available in the Teleport Proxy Service if at least one Kubernetes cluster is enrolled in your Teleport cluster.

Client#

The following table identifies all metrics available when the service connects to upstream servers. In the case of proxy, the upstream server can be a kubernetes_service or Kubernetes Cluster if it's running in legacy mode.

Name Type Component Description
teleport_kubernetes_client_in_flight_requests gauge Teleport Kubernetes Proxy In-flight requests waiting for the upstream response.
teleport_kubernetes_client_requests_total counter Teleport Kubernetes Proxy Total number of requests sent to the upstream Teleport proxy, kube_service or Kubernetes Cluster servers.
teleport_kubernetes_client_tls_duration_seconds histogram Teleport Kubernetes Proxy Latency distribution of TLS handshakes.
teleport_kubernetes_client_got_conn_duration_seconds histogram Teleport Kubernetes Proxy Latency distribution of time to dial to the upstream server - using reverse tunnel or direct dialer.
teleport_kubernetes_client_first_byte_response_duration_seconds histogram Teleport Kubernetes Proxy Latency distribution of time to receive the first response byte from the upstream server.
teleport_kubernetes_client_request_duration_seconds histogram Teleport Kubernetes Proxy Latency distribution of the upstream request time.

Server#

The following table identifies all metrics available for incoming connections.

Name Type Component Description
teleport_kubernetes_server_in_flight_requests gauge Teleport Kubernetes Proxy In-flight requests currently handled by the server.
teleport_kubernetes_server_api_requests_total counter Teleport Kubernetes Proxy Total number of requests handled by the server.
teleport_kubernetes_server_request_duration_seconds histogram Teleport Kubernetes Proxy Latency distribution of the total request time.
teleport_kubernetes_server_response_size_bytes histogram Teleport Kubernetes Proxy Distribution of the response size.
teleport_kubernetes_server_exec_in_flight_sessions gauge Teleport Kubernetes Proxy Number of active kubectl exec sessions.
teleport_kubernetes_server_exec_sessions_total counter Teleport Kubernetes Proxy Total number of kubectl exec sessions.
teleport_kubernetes_server_portforward_in_flight_sessions gauge Teleport Kubernetes Proxy Number of active kubectl portforward sessions.
teleport_kubernetes_server_portforward_sessions_total counter Teleport Kubernetes Proxy Total number of kubectl portforward sessions.
teleport_kubernetes_server_join_in_flight_sessions gauge Teleport Kubernetes Proxy Number of active joining sessions.
teleport_kubernetes_server_join_sessions_total counter Teleport Kubernetes Proxy Total number of joining sessions.

Teleport SSH Service#

Name Type Component Description
user_max_concurrent_sessions_hit_total counter Teleport SSH Number of times a user exceeded their concurrent session limit.

Teleport Kubernetes Service#

The following table identifies all metrics available when the service connects to upstream servers. In the case of kubernetes_service, the upstream server is always a Kubernetes cluster.

Name Type Component Description
teleport_kubernetes_client_in_flight_requests gauge Teleport Kubernetes Service In-flight requests waiting for the upstream response.
teleport_kubernetes_client_requests_total counter Teleport Kubernetes Service Total number of requests sent to the upstream Teleport proxy, kube_service or Kubernetes Cluster servers.
teleport_kubernetes_client_tls_duration_seconds histogram Teleport Kubernetes Service Latency distribution of TLS handshakes.
teleport_kubernetes_client_got_conn_duration_seconds histogram Teleport Kubernetes Service Latency distribution of time to dial to the upstream server - using reverse tunnel or direct dialer.
teleport_kubernetes_client_first_byte_response_duration_seconds histogram Teleport Kubernetes Service Latency distribution of time to receive the first response byte from the upstream server.
teleport_kubernetes_client_request_duration_seconds histogram Teleport Kubernetes Service Latency distribution of the upstream request time.

The following table identifies all metrics available for incoming connections.

Name Type Component Description
teleport_kubernetes_server_in_flight_requests gauge Teleport Kubernetes Service In-flight requests currently handled by the server.
teleport_kubernetes_server_api_requests_total counter Teleport Kubernetes Service Total number of requests handled by the server.
teleport_kubernetes_server_request_duration_seconds histogram Teleport Kubernetes Service Latency distribution of the total request time.
teleport_kubernetes_server_response_size_bytes histogram Teleport Kubernetes Service Distribution of the response size.
teleport_kubernetes_server_exec_in_flight_sessions gauge Teleport Kubernetes Service Number of active kubectl exec sessions.
teleport_kubernetes_server_exec_sessions_total counter Teleport Kubernetes Service Total number of kubectl exec sessions.
teleport_kubernetes_server_portforward_in_flight_sessions gauge Teleport Kubernetes Service Number of active kubectl portforward sessions.
teleport_kubernetes_server_portforward_sessions_total counter Teleport Kubernetes Service Total number of kubectl portforward sessions.
teleport_kubernetes_server_join_in_flight_sessions gauge Teleport Kubernetes Service Number of active joining sessions.
teleport_kubernetes_server_join_sessions_total counter Teleport Kubernetes Service Total number of joining sessions.

All Teleport instances#

Name Type Component Description
process_state gauge Teleport State of the teleport process: 0 - ok, 1 - recovering, 2 - degraded, 3 - starting.
certificate_mismatch_total counter Teleport Number of SSH server login failures due to a certificate mismatch.
rx counter Teleport Number of bytes received during an SSH connection.
server_interactive_sessions_total gauge Teleport Number of active sessions.
teleport_build_info gauge Teleport Provides build information of Teleport including gitref (git describe --long --tags), Go version, and Teleport version. The value of this gauge will always be 1.
teleport_breaker_connector_executions_total counter Teleport Number of requests to the Teleport Auth Service API that go through a circuit breaker done by Teleport services, labeled by role of the connector (almost always Instance), state of the associated circuit breaker and success as interpreted by the breaker.
teleport_cache_events counter Teleport Number of events received by a Teleport service cache. Teleport's Auth Service, Proxy Service, and other services cache incoming events related to their service.
teleport_cache_stale_events counter Teleport Number of stale events received by a Teleport service cache. A high percentage of stale events can indicate a degraded backend.
tx counter Teleport Number of bytes transmitted during an SSH connection.
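
The numeric values of process_state and the stale-event percentage described above can be interpreted with a few lines of code. This is a minimal Python sketch, assuming teleport_cache_events counts all received events including stale ones, which the table does not state explicitly. The helper names and sample numbers are hypothetical.

```python
# Mapping taken from the process_state description in the table above.
PROCESS_STATES = {0: "ok", 1: "recovering", 2: "degraded", 3: "starting"}


def describe_process_state(value: float) -> str:
    """Translate the process_state gauge value into its state name."""
    return PROCESS_STATES.get(int(value), "unknown")


def stale_event_percentage(stale_events: float, total_events: float) -> float:
    """Percentage of cache events that were stale.

    Assumes total_events (teleport_cache_events) already includes the
    stale events counted by teleport_cache_stale_events.
    """
    if total_events <= 0:
        return 0.0
    return 100.0 * stale_events / total_events


# Hypothetical gauge and counter readings.
state = describe_process_state(2.0)
stale_pct = stale_event_percentage(5, 200)
```

What counts as a high percentage depends on the cluster, so any alert threshold built on this value is a local choice.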

Teleport Health Checks#

Name Type Component Description
teleport_resources_health_status_healthy gauge Teleport Health Check Number of healthy resources.
teleport_resources_health_status_unhealthy gauge Teleport Health Check Number of unhealthy resources.
teleport_resources_health_status_unknown gauge Teleport Health Check Number of resources in an unknown health state.

Go runtime metrics#

These metrics are surfaced by the Go runtime and are not specific to Teleport.

Name Type Component Description
go_gc_duration_seconds summary Internal Go A summary of GC invocation durations.
go_goroutines gauge Internal Go Number of goroutines that currently exist.
go_info gauge Internal Go Information about the Go environment.
go_memstats_alloc_bytes_total counter Internal Go Total number of bytes allocated, even if freed.
go_memstats_alloc_bytes gauge Internal Go Number of bytes allocated and still in use.
go_memstats_buck_hash_sys_bytes gauge Internal Go Number of bytes used by the profiling bucket hash table.
go_memstats_frees_total counter Internal Go Total number of frees.
go_memstats_gc_cpu_fraction gauge Internal Go The fraction of this program's available CPU time used by the GC since the program started.
go_memstats_gc_sys_bytes gauge Internal Go Number of bytes used for garbage collection system metadata.
go_memstats_heap_alloc_bytes gauge Internal Go Number of heap bytes allocated and still in use.
go_memstats_heap_idle_bytes gauge Internal Go Number of heap bytes waiting to be used.
go_memstats_heap_inuse_bytes gauge Internal Go Number of heap bytes that are in use.
go_memstats_heap_objects gauge Internal Go Number of allocated objects.
go_memstats_heap_released_bytes gauge Internal Go Number of heap bytes released to the OS.
go_memstats_heap_sys_bytes gauge Internal Go Number of heap bytes obtained from the system.
go_memstats_last_gc_time_seconds gauge Internal Go Number of seconds since the Unix epoch of the last garbage collection.
go_memstats_lookups_total counter Internal Go Total number of pointer lookups.
go_memstats_mallocs_total counter Internal Go Total number of mallocs.
go_memstats_mcache_inuse_bytes gauge Internal Go Number of bytes in use by mcache structures.
go_memstats_mcache_sys_bytes gauge Internal Go Number of bytes used for mcache structures obtained from system.
go_memstats_mspan_inuse_bytes gauge Internal Go Number of bytes in use by mspan structures.
go_memstats_mspan_sys_bytes gauge Internal Go Number of bytes used for mspan structures obtained from system.
go_memstats_next_gc_bytes gauge Internal Go Number of heap bytes when the next garbage collection will take place.
go_memstats_other_sys_bytes gauge Internal Go Number of bytes used for other system allocations.
go_memstats_stack_inuse_bytes gauge Internal Go Number of bytes in use by the stack allocator.
go_memstats_stack_sys_bytes gauge Internal Go Number of bytes obtained from the system for stack allocator.
go_memstats_sys_bytes gauge Internal Go Number of bytes obtained from the system.
go_threads gauge Internal Go Number of OS threads created.
process_cpu_seconds_total counter Internal Go Total user and system CPU time spent in seconds.
process_max_fds gauge Internal Go Maximum number of open file descriptors.
process_open_fds gauge Internal Go Number of open file descriptors.
process_resident_memory_bytes gauge Internal Go Resident memory size in bytes.
process_start_time_seconds gauge Internal Go Start time of the process since the Unix epoch in seconds.
process_virtual_memory_bytes gauge Internal Go Virtual memory size in bytes.
process_virtual_memory_max_bytes gauge Internal Go Maximum amount of virtual memory available in bytes.

Prometheus#

Name Type Component Description
promhttp_metric_handler_requests_in_flight gauge prometheus Current number of scrapes being served.
promhttp_metric_handler_requests_total counter prometheus Total number of scrapes by HTTP status code.