# Latency

Latency is the time between sending a request to an AI model and receiving its response.

**Detailed Explanation:**

* **Factors:** Model size, server load, and network speed affect latency.
* **Trade-off:** Larger models with higher accuracy often have longer response times.
* **Optimization:** Using smaller models, or caching responses to frequently repeated queries, can reduce latency.
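The caching idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production setup: `query_model` is a hypothetical stand-in for a real model API call, with a `sleep` simulating network and inference latency, and `functools.lru_cache` serves as the cache for repeated prompts.

```python
import functools
import time

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a hosted model API call;
    the sleep simulates network and inference latency."""
    time.sleep(0.1)
    return f"response to: {prompt}"

@functools.lru_cache(maxsize=1024)
def cached_query(prompt: str) -> str:
    """Cache responses so repeated identical prompts skip the model call."""
    return query_model(prompt)

def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

_, miss = timed(cached_query, "What is latency?")  # cache miss: pays full latency
_, hit = timed(cached_query, "What is latency?")   # cache hit: returns instantly
print(f"cold call: {miss:.3f}s, cached call: {hit:.3f}s")
```

Note that this only helps when identical queries recur; a real deployment would also need cache invalidation and, for semantically similar (but not identical) prompts, a similarity-based cache.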

**Example:**\
A delay of around 5 seconds might occur when querying a large model such as GPT-4 for a long, detailed output.
