In the realm of large language models (LLMs), the ability to understand and utilize context is paramount to generating relevant, coherent, and accurate responses. A Model Context Protocol refers to the established strategies, techniques, and architectural patterns employed to effectively manage and leverage the information provided to an LLM during a specific interaction. It’s the blueprint for how an AI system ingests, processes, and acts upon the contextual cues it receives.
Think of it as the set of rules and procedures that dictate how a conversation, a document, or any other form of input is presented to the model so it can perform its task optimally. Without a well-defined context protocol, even the most powerful LLM might struggle to understand the user’s intent, recall relevant information from past interactions, or ground its responses in specific data.
Why is a Deep Understanding of Context Protocol Crucial?
The effectiveness of any application built upon an LLM hinges on its ability to handle context effectively. A robust context protocol addresses several key challenges:
- Context Window Limitations: LLMs have a finite “context window,” which is the maximum amount of text they can process at once. Efficient protocols are needed to select and prioritize the most relevant information within this window (a minimal trimming sketch follows this list).
- Maintaining Coherence and Relevance: In multi-turn conversations or complex tasks, the model needs to remember previous interactions and maintain a consistent understanding of the topic.
- Grounding Responses in Specific Data: For tasks like question answering or information retrieval, the model must be able to access and utilize external knowledge sources or provided documents.
- Personalization and Customization: Tailoring responses to individual user preferences or specific situations requires the model to be aware of the relevant contextual factors.
- Avoiding Hallucinations and Inaccuracies: By providing relevant and accurate context, we can guide the model to generate more reliable and truthful responses.
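To make the first point concrete, here is a minimal sketch of one way to keep a conversation history inside a fixed budget by retaining only the most recent messages. The four-characters-per-token estimate and the default budget are illustrative assumptions; a real system would count tokens with the target model’s own tokenizer and limits.
```
def trim_context(messages, max_tokens=3000):
    """Keep the most recent messages that fit within a rough token budget.

    Assumes ~4 characters per token as a crude estimate; a production system
    would count tokens with the target model's tokenizer.
    """
    budget_chars = max_tokens * 4
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        size = len(message["content"])
        if used + size > budget_chars:
            break
        kept.append(message)
        used += size
    return list(reversed(kept))  # restore chronological order
```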
How a Model Context Protocol Works
The exact implementation of a Model Context Protocol can vary depending on the specific use case and the architecture of the AI system. However, the underlying principles and common steps often include the following (a minimal end-to-end sketch follows the list):
- Input Analysis and Intent Recognition: The system first analyzes the user’s current input to understand their intent and identify any explicit or implicit contextual requirements. This might involve natural language understanding (NLU) techniques to extract keywords, identify entities, and determine the user’s goal.
- Contextual Information Retrieval: Based on the input analysis, the system identifies and retrieves relevant contextual information. This can come from various sources:
- Conversation History: For chatbots and conversational agents, the previous turns of the conversation form a crucial part of the context.
- User Profiles and Preferences: Information about the user’s past interactions, interests, and settings can be used to personalize responses.
- External Knowledge Sources: Databases, knowledge graphs, documents, and web searches can provide the model with the necessary information to answer questions or perform specific tasks.
- Application State: The current state of the application or workflow can also be relevant context.
- Context Construction and Formatting: Once the relevant contextual information is retrieved, it needs to be structured and formatted in a way that the LLM can effectively understand and utilize. This might involve:
- Concatenating relevant information: Combining the current input with the retrieved context.
- Using specific delimiters or formatting: Clearly separating different pieces of contextual information.
- Employing prompt engineering techniques: Crafting prompts that explicitly instruct the model on how to use the provided context.
- Summarization or filtering: Condensing or selecting the most important parts of the context to fit within the model’s context window.
- Model Inference: The constructed input, containing both the user’s query and the relevant context, is then passed to the LLM. The model processes this information and generates a response based on its understanding of the context.
- Response Generation and Refinement: The LLM’s output is then further processed and refined to ensure it is coherent, relevant, and aligned with the user’s expectations. This might involve post-processing steps like formatting, filtering, or adding additional information.
- Context Update and Management: For ongoing interactions, the context needs to be updated and managed for subsequent turns. This might involve adding the current turn to the conversation history, updating the application state, or refreshing information from external sources.
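Putting these steps together, the skeleton below sketches one turn of such a protocol. The three callables (retrieve_context, build_prompt, call_llm) are hypothetical stand-ins for whatever retrieval, prompt-construction, and model-inference components a real system would plug in.
```
def handle_turn(user_input, history, retrieve_context, build_prompt, call_llm):
    """Run one turn of a simple context protocol.

    retrieve_context, build_prompt, and call_llm are placeholder callables;
    any concrete system would supply its own implementations.
    """
    # Steps 1-2: analyze the input and retrieve relevant contextual information
    retrieved = retrieve_context(user_input, history)

    # Step 3: construct and format the context into a single prompt
    prompt = build_prompt(user_input, history, retrieved)

    # Step 4: model inference
    answer = call_llm(prompt)

    # Steps 5-6: record the turn so the next request sees the updated context
    history.append({"role": "user", "content": user_input})
    history.append({"role": "assistant", "content": answer})
    return answer
```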
Model Context Protocol Use Cases
A well-designed Model Context Protocol is essential for a wide range of applications leveraging LLMs:
Enhanced Chatbots and Conversational Agents: By maintaining conversation history and accessing user profiles, chatbots can provide more personalized and context-aware interactions. For example, a customer support chatbot can remember previous issues raised by the user and provide more efficient solutions.
Question Answering Systems over Large Documents: When a user asks a question about a lengthy document, the system can use a context protocol to identify relevant sections of the document and provide them as context to the LLM, enabling it to answer the question accurately. This is crucial for legal document analysis, research paper summarization, and technical documentation support.
Personalized Recommendation Systems: By considering a user’s past purchases, browsing history, and preferences as context, LLMs can generate more relevant and effective product or content recommendations.
Code Generation and Completion Tools: When a developer is writing code, the IDE can provide the LLM with the current code context (e.g., surrounding lines of code, function definitions) to enable more intelligent code completion and suggestion.
Meeting Summarization and Action Item Extraction: Providing the transcript of a meeting as context allows an LLM to generate a concise summary of the key discussion points and identify any assigned action items.
Content Creation with Specific Guidelines: When generating marketing copy or other forms of content, providing the LLM with specific brand guidelines, target audience information, and desired tone as context ensures the generated content aligns with the requirements.
Multi-turn Reasoning and Problem Solving: For complex tasks that require multiple steps or considerations, a context protocol can help the LLM maintain its understanding of the problem and the steps taken so far, leading to more effective solutions.
Detailed Example: Question Answering over a PDF Document
Let’s consider a scenario where a user wants to ask a question about the content of a large PDF document. Here’s how a Model Context Protocol might work in this case (a small retrieval-and-prompting sketch follows the walkthrough):
1. User Input: The user asks: “What are the key findings of the study regarding the impact of climate change on agricultural yields?”
2. Input Analysis: The system analyzes the query and identifies keywords like “key findings,” “climate change,” and “agricultural yields.”
3. Contextual Information Retrieval:
The system accesses the provided PDF document.
It uses information retrieval techniques (e.g., keyword search, semantic search) to identify sections or paragraphs within the document that are most likely to contain information related to climate change and agricultural yields.
Let’s say the system identifies the following relevant snippets:
* **Snippet 1:** "The study revealed a significant negative correlation between rising global temperatures and the productivity of major cereal crops such as wheat and rice."
* **Snippet 2:** "Changes in precipitation patterns, including increased frequency of droughts and floods, were also found to have a detrimental impact on agricultural output in several regions."
* **Snippet 3:** "Our analysis indicates that without significant mitigation efforts, global agricultural yields could decline by as much as 15-20% by the end of the century."
- Context Construction and Formatting:
The system constructs a prompt that includes the user’s question and the retrieved snippets. A possible prompt structure could be:
```
Answer the following question based on the provided document snippets: "What are the key findings of the study regarding the impact of climate change on agricultural yields?"
Document Snippets:
---
Snippet 1: The study revealed a significant negative correlation between rising global temperatures and the productivity of major cereal crops such as wheat and rice.
---
Snippet 2: Changes in precipitation patterns, including increased frequency of droughts and floods, were also found to have a detrimental impact on agricultural output in several regions.
---
Snippet 3: Our analysis indicates that without significant mitigation efforts, global agricultural yields could decline by as much as 15-20% by the end of the century.
```
- Model Inference: The constructed prompt is sent to the LLM.
- Response Generation: The LLM processes the information and generates a response based on the provided context. A possible response could be:
"Based on the study, the key findings regarding the impact of climate change on agricultural yields are:
* Rising global temperatures have a significant negative correlation with the productivity of major cereal crops like wheat and rice.
* Changes in precipitation patterns, such as increased droughts and floods, negatively affect agricultural output in various regions.
* Without significant mitigation, global agricultural yields could decrease by 15-20% by the end of the century."
- Context Update (Not applicable in this single-turn example): In a multi-turn conversation, if the user asked a follow-up question, the system would likely include the previous question and answer in the context for the next turn.
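To make the retrieval and prompt-construction steps of this walkthrough concrete, here is a small sketch. It scores pre-extracted document chunks by simple keyword overlap with the question, a deliberately naive stand-in for the keyword or semantic search mentioned in step 3; the chunking, scoring, and top_k choices are illustrative assumptions rather than a prescribed method.
```
def retrieve_snippets(question, chunks, top_k=3):
    """Rank pre-extracted document chunks by keyword overlap with the question."""
    keywords = set(question.lower().split())
    scored = [(len(keywords & set(chunk.lower().split())), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score > 0]


def build_qa_prompt(question, snippets):
    """Assemble a prompt in the structure shown above: question first, then snippets."""
    lines = [
        f'Answer the following question based on the provided document snippets: "{question}"',
        "",
        "Document Snippets:",
    ]
    for i, snippet in enumerate(snippets, start=1):
        lines.append("---")
        lines.append(f"Snippet {i}: {snippet}")
    return "\n".join(lines)
```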
Coding an MCP Server and Application
Let’s code a simple Model Context Protocol (MCP) server and application step by step using Python: Flask for the server and the requests library for the client.
Step 1: Setting up the Server (mcp_server.py)
This server will handle user queries, maintain a simple in-memory context for each session, and simulate a basic response generation based on the context.
```
from flask import Flask, request, jsonify
import uuid

app = Flask(__name__)

# In-memory storage for context (session_id -> list of messages)
context_store = {}

@app.route('/query', methods=['POST'])
def handle_query():
    data = request.get_json()
    query = data.get('query')
    session_id = data.get('session_id')

    if not query:
        return jsonify({"error": "Query cannot be empty"}), 400

    if not session_id:
        session_id = str(uuid.uuid4())
        context_store[session_id] = []

    if session_id not in context_store:
        context_store[session_id] = []

    # Add the user's query to the context
    context_store[session_id].append({"role": "user", "content": query})

    # --- Simulate a simple response based on the context ---
    response_content = "Acknowledged: " + query
    if len(context_store[session_id]) > 1:
        previous_interaction = context_store[session_id][-2]
        response_content += f" (Previous interaction: '{previous_interaction['content']}')"
    else:
        response_content += " (This is the first interaction.)"

    response = {"role": "assistant", "content": response_content}
    context_store[session_id].append(response)
    # --- End of simulated response ---

    return jsonify({"response": response["content"], "session_id": session_id})

if __name__ == '__main__':
    print("MCP Server is starting...")
    app.run(debug=True)
```
Explanation of the Server Code:
- Import Libraries: We import Flask to create the web server, request to handle incoming requests, jsonify to convert responses to JSON, and uuid to generate unique session IDs.
- Initialize Flask App: app = Flask(__name__) creates our Flask application instance.
- context_store: This dictionary acts as our simple in-memory database to store the context for each session. The key is the session_id, and the value is a list of messages (each message being a dictionary with role and content).
- /query Endpoint: This route handles POST requests to /query.
- Get Data: It retrieves the query and session_id from the JSON data sent in the request.
- Handle New Session: If no session_id is provided, it generates a new unique ID and initializes an empty list in context_store for this session.
- Ensure Session Exists: It checks if the session_id exists in context_store and initializes it if not.
- Update Context: The user’s query is added to the context list for the corresponding session_id. Each message in the context is represented as a dictionary with a role (either “user” or “assistant”) and the content of the message.
- Simulate Response: This is a very basic simulation of an LLM. It simply acknowledges the user’s query and, if there was a previous interaction, mentions it. In a real-world scenario, this is where you would integrate with an actual LLM.
- Add Assistant Response to Context: The simulated response is also added to the context.
- Return Response: The server returns a JSON response containing the response content and the session_id. The client needs to store this session_id for future interactions in the same session.
- Run the Server: The if __name__ == '__main__': block starts the Flask development server when the script is executed. debug=True enables debugging features, which is useful during development. (A quick way to exercise the endpoint is sketched below.)
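Before writing the client, you can exercise the endpoint directly with a few lines of requests code, assuming the server is already running on the default address. The printed first response follows directly from the simulated logic above.
```
import requests

# First turn: no session_id is sent, so the server creates one and returns it
first = requests.post("http://127.0.0.1:5000/query", json={"query": "Hello"}).json()
print(first["response"])  # Acknowledged: Hello (This is the first interaction.)

# Second turn: reuse the returned session_id so the server sees the previous turn
second = requests.post(
    "http://127.0.0.1:5000/query",
    json={"query": "What is your name?", "session_id": first["session_id"]},
)
print(second.json()["response"])
```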
Step 2: Setting up the Application (mcp_app.py)
This application will allow the user to interact with the MCP server.
```
import requests
import json

SERVER_URL = "http://127.0.0.1:5000/query"  # Default Flask server address
session_id = None  # To store the session ID

def send_query(query, session_id):
    payload = {"query": query, "session_id": session_id}
    headers = {'Content-Type': 'application/json'}
    try:
        response = requests.post(SERVER_URL, headers=headers, data=json.dumps(payload))
        response.raise_for_status()  # Raise an exception for bad status codes
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error communicating with the server: {e}")
        return None

if __name__ == "__main__":
    print("Simple MCP Application")
    while True:
        user_input = input("You: ")
        if user_input.lower() == "exit":
            break
        server_response_data = send_query(user_input, session_id)
        if server_response_data:
            print(f"Server: {server_response_data.get('response')}")
            session_id = server_response_data.get('session_id')
        else:
            print("Failed to get a response from the server.")
```
Explanation of the Application Code:
- Import Libraries: We import requests to make HTTP requests to the server and json to work with JSON data.
- SERVER_URL: This constant defines the address of our MCP server. Make sure it matches the address where your Flask server is running (usually http://127.0.0.1:5000/).
- session_id: This variable will store the session ID received from the server. It is initialized to None.
- send_query Function:
- Takes the user’s query and the current session_id as input.
- Creates a payload dictionary containing the query and session ID.
- Sets the Content-Type header to application/json to indicate that we are sending JSON data.
- Uses requests.post to send a POST request to the server’s endpoint with the payload.
- response.raise_for_status() checks if the server returned a successful status code (e.g., 200 OK). If not, it raises an exception.
- If the request is successful, it parses the JSON response from the server and returns it.
- Includes basic error handling for network issues or server errors.
- Main Execution Block (if __name__ == "__main__":)
- Prints a welcome message.
- Enters a loop to continuously take user input.
- Prompts the user to enter their query. If the user types “exit”, the loop breaks.
- Calls the send_query function to send the user’s input to the server along with the current session_id.
- If the server returns a valid response, it prints the server’s response and updates session_id with the new or existing session ID received from the server. This is crucial for maintaining context in subsequent turns.
- If there’s an error communicating with the server, it prints an error message.
Step 3: Running the Server and Application
- Save the server code as mcp_server.py and the application code as mcp_app.py in the same directory.
- Open two separate terminal windows.
- In the first terminal, navigate to the directory where you saved the files and run the server:
python mcp_server.py
You should see output indicating that the Flask development server has started (usually on http://127.0.0.1:5000/).
- In the second terminal, navigate to the same directory and run the application:
python mcp_app.py
You should see the “Simple MCP Application” message.
Step 4: Interacting with the Application
Now you can type queries into the application terminal and see the responses from the server.
Example Interaction:
Application Terminal:
Simple MCP Application
You: Hello
Server: Acknowledged: Hello (This is the first interaction.)
You: What is your name?
Server: Acknowledged: What is your name? (Previous interaction: 'Hello')
You: exit
Server Terminal: the Flask development server will log each incoming POST request to /query.
Explanation of the Interaction:
- In the first interaction, the application sends “Hello” to the server. Since no session_id was provided initially, the server generates one and responds with “Acknowledged: Hello (This is the first interaction.)” along with the new session_id. The application stores this session_id.
- In the second interaction, the application sends “What is your name?” along with the session_id it received in the previous response. The server recognizes the session_id , retrieves the context for that session, and responds with “Acknowledged: What is your name? (Previous interaction: ‘Hello’)”. This demonstrates that the server is maintaining context across turns.
- When the user types “exit”, the application loop breaks.
Further Improvements and Considerations
- More Sophisticated Context Management: Instead of just storing a list of messages, you might want to store more structured context information.
- Integration with a Real LLM: The key improvement would be to replace the simulated response generation with an actual call to an LLM API (like OpenAI’s GPT-3/4, Google’s Gemini, etc.). You would send the current query and the relevant context to the LLM and then return its response to the client (a minimal sketch follows this list).
- Context Window Management: For real LLMs with limited context windows, you would need strategies to manage the size of the context being sent (e.g., summarizing old parts of the conversation, prioritizing recent interactions).
- External Knowledge Retrieval: You could integrate mechanisms to fetch relevant information from external databases or knowledge sources based on the user’s query and add it to the context before sending it to the LLM.
- User Authentication and Session Management: For a production application, you would need more robust user authentication and session management.
- Error Handling and Logging: Add more comprehensive error handling and logging to both the server and the application.
- Scalability: For handling many users, you would need to consider using a more scalable backend framework and a dedicated context store (like Redis or a database) instead of in-memory storage.
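As a sketch of the first two improvements (calling a real LLM and keeping the context bounded), the snippet below shows one way the simulated response in handle_query could be replaced. It assumes the official openai Python package with an OPENAI_API_KEY environment variable; the model name and the keep-the-last-ten-messages rule are illustrative choices, not requirements.
```
from openai import OpenAI  # assumes the openai package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_response(context_messages):
    """Send the (trimmed) conversation context to an LLM instead of simulating a reply."""
    trimmed = context_messages[-10:]  # naive context-window management: keep the last 10 messages
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "system", "content": "You are a helpful assistant."}] + trimmed,
    )
    return completion.choices[0].message.content
```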
This simple example provides a basic foundation for understanding how a Model Context Protocol server and application can work together to manage context for AI interactions. Remember that the “protocol” in a real-world scenario would likely involve more complex logic and techniques tailored to the specific use case and the capabilities of the underlying language model.