Local LLM Implementation with Ollama, ChatBot, MCP-Client, and MCP-SERVER

English Summary

Overview

This technical documentation outlines the implementation of a production-ready application utilizing local language models with Model Context Protocol (MCP). The case study presented is a phone directory assistant that demonstrates the integration of local Large Language Models, web interfaces, and custom services for enterprise applications.

Technical Prerequisites

  • Completion of foundational MCP-SERVER implementation modules
  • Ollama installed for running large language models locally
  • Microsoft Phi4 model available in Ollama for inference
  • OpenAI Python package, used as the client for Ollama's OpenAI-compatible API
  • Streamlit installed for the front-end chat interface

System Architecture

1. The user enters a query in the Streamlit interface.

2. The local language model classifies the intent of the query.

3. General knowledge queries are answered directly by the LLM.

4. Directory-specific queries are routed to the MCP client, which calls the search tool on MCP-SERVER.

5. MCP-SERVER searches the local Excel file for matching entries.

6. Results travel back through the MCP client to the interface, where they are formatted and shown to the user.
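
The steps above can be sketched as a single routing function. This is a minimal sketch rather than the application's own code: `handle_query`, `fake_classify`, and `fake_search` are hypothetical names, and the two callables stand in for the Phi4 call and the MCP tool call so the example runs without Ollama or an MCP server.

```python
from typing import Callable


def handle_query(
    user_text: str,
    classify: Callable[[str], dict],
    search: Callable[[str], str],
) -> str:
    """Route one user query (hypothetical helper, for illustration only).

    classify: stand-in for the local LLM; returns a dict such as
              {"action": "search_phone", "args": {"query": "..."}} or
              {"action": "none", "answer": "..."}.
    search:   stand-in for the MCP-SERVER tool call.
    """
    decision = classify(user_text)  # step 2: intent classification
    if decision.get("action") == "search_phone":
        # Steps 4-5: directory query, delegated to the MCP tool.
        query = decision.get("args", {}).get("query", user_text)
        return search(query)
    # Step 3: general knowledge, answered by the LLM directly.
    return decision.get("answer", "")


# Stub callables (assumptions) so the sketch is runnable on its own.
def fake_classify(text: str) -> dict:
    if "phone" in text.lower():
        return {"action": "search_phone", "args": {"query": "Alice"}}
    return {"action": "none", "answer": "General answer"}


def fake_search(query: str) -> str:
    return f"Results for {query}"


print(handle_query("What is Alice's phone number?", fake_classify, fake_search))
print(handle_query("What is MCP?", fake_classify, fake_search))
```

Passing the LLM and the MCP call in as parameters keeps the routing logic testable in isolation; in the real application they would be replaced by the Ollama request and the asynchronous MCP client call.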

Implementation Specifications

  • Dependency management for core libraries (Streamlit, OpenAI, asyncio, MCP modules)
  • OpenAI client configuration for Ollama instance communication
  • MCP client implementation for standardized tool invocation protocols
  • JSON response sanitization utilities for LLM output normalization
  • User interface architecture with session state management and interaction logging
  • System prompt engineering for consistent JSON schema generation
  • Exception handling and graceful degradation protocols
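
As an illustration of the JSON sanitization item above, the following sketch pulls the first JSON object out of a model reply even when the model wraps it in a Markdown fence or adds surrounding text. The function name `extract_json_object` is hypothetical, and this is one possible approach under those assumptions, not the utility used in the application.

```python
import json
import re
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first JSON object found in text, or None (hypothetical helper)."""
    # Drop Markdown fence markers such as ```json or ``` if the model added them.
    cleaned = re.sub(r"```(?:json)?", "", text).strip()
    start = cleaned.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores any trailing text.
        obj, _ = json.JSONDecoder().raw_decode(cleaned[start:])
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


fenced = '```json\n{"action": "search_phone", "args": {"query": "Wang"}}\n```'
print(extract_json_object(fenced))
print(extract_json_object("no json here"))
```

Returning `None` instead of raising lets the caller fall back to showing the raw model text, which matches the graceful degradation item in the list.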

Enterprise Application Potential

This reference architecture establishes a foundation for extension to various enterprise systems, including ERP integrations, workflow automation systems, and document processing operations, enabling systematic LLM interaction with proprietary business systems while maintaining security and compliance standards.

中文摘要

概述

本技術文檔概述了使用本地語言模型與模型上下文協議(MCP)實現的生產級應用程序。呈現的案例是電話目錄助手,展示了本地大型語言模型、網絡界面和自定義服務的集成,適用於企業應用。

技術先決條件

  • 完成基礎MCP-SERVER實施模塊
  • 安裝Ollama平台以部署本地大型語言模型
  • 實施Microsoft Phi4模型進行推理操作
  • 整合OpenAI套件以進行模型交互
  • 配置Streamlit框架用於前端界面開發

系統架構

1. 用戶查詢輸入通過Streamlit界面層進行處理

2. 通過本地語言模型進行查詢處理以進行意圖分類

3. 一般知識查詢由LLM推理管道直接處理

4. 特定於目錄的查詢觸發MCP模塊啟動協議

5. MCP-SERVER針對本地Excel存儲庫執行數據檢索操作

6. 結果通過系統傳播,用於基於LLM的響應格式化和用戶展示

實施規範

  • 核心庫的依賴管理(Streamlit、OpenAI、asyncio、MCP模塊)
  • OpenAI客戶端配置用於Ollama實例通信
  • MCP客戶端實施標準化工具調用協議
  • 用於LLM輸出規範化的JSON響應淨化工具
  • 具有會話狀態管理和交互日誌記錄的用戶界面架構
  • 系統提示工程以生成一致的JSON模式
  • 異常處理和優雅降級協議

企業應用潛力

此參考架構為擴展到各種企業系統奠定了基礎,包括ERP集成、工作流自動化系統和文檔處理操作,使LLM能夠與專有業務系統系統地交互,同時維持安全性和合規性標準。

System Architecture Flow Diagram

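  1. 👨 User: enters a query
  2. 💻 Streamlit front end: displays the chat interface and forwards the input
  3. 🤖 OpenAI LLM chat engine: interprets the intent and generates the query instruction
  4. 🔄 MCP protocol module: translates the instruction and calls the API
  5. 📞 Local phone directory service: queries the contact database
  6. 🧠 LLM: composes the reply in natural language
  7. 🖥️ Streamlit: displays the response
  8. 👀 User: sees the reply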

Code Implementation Summary

Phone Directory Server (MCP-SERVER)

The phone_directory_server.py implements a Model Context Protocol (MCP) server that provides search functionality for a phone directory stored in an Excel file. The server exposes two key functionalities:

  1. Search Phone Tool: A function that searches an Excel file for phone directory entries by name or phone number, returning formatted results including optional address and notes fields.
  2. Greeting Resource: A simple greeting function that demonstrates the resource pattern in MCP.

The server is designed for local deployment and handles exceptions gracefully, providing appropriate error messages when the Excel file is missing required fields or cannot be read.
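
To make the matching rules concrete, here is a dependency-free stand-in for the search tool's logic. It is a sketch under stated assumptions: rows are plain dictionaries instead of a pandas DataFrame, `search_rows` is a hypothetical name, and the messages are in English for readability. The column names 姓名 (name), 電話 (phone), 地址 (address), and 備註 (notes) are the ones the server expects.

```python
from typing import Dict, List

REQUIRED = ("姓名", "電話")  # name, phone: columns the server requires
OPTIONAL = ("地址", "備註")  # address, notes: shown only when present


def search_rows(rows: List[Dict[str, str]], query: str) -> str:
    """Case-insensitive substring search over name and phone (illustrative only)."""
    if rows:
        # The first row stands in for the spreadsheet header check.
        for col in REQUIRED:
            if col not in rows[0]:
                return f"Missing required column: {col}"
    needle = query.lower()
    lines = []
    for row in rows:
        if any(needle in (row.get(col) or "").lower() for col in REQUIRED):
            parts = [f"{col}: {row.get(col) or ''}" for col in REQUIRED]
            # Optional fields are appended only when they hold a value.
            parts += [f"{col}: {row[col]}" for col in OPTIONAL if row.get(col)]
            lines.append(", ".join(parts))
    return "\n".join(lines) if lines else "No matching entries found."


sample = [
    {"姓名": "Alice Wang", "電話": "0912-345-678", "地址": "Taipei", "備註": ""},
    {"姓名": "Bob Chen", "電話": "02-2345-6789"},
]
print(search_rows(sample, "alice"))
print(search_rows(sample, "2345"))
```

The query is treated as literal text rather than a pattern, so characters that are common in phone numbers, such as "+" or "(", are matched as-is.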

Chatbot User Interface (Streamlit)

The chatbot_ui.py implements a Streamlit-based user interface that connects to a local LLM (Microsoft Phi4) via Ollama and communicates with the MCP server. The key components include:

  1. LLM Integration: Uses the OpenAI client library to communicate with a locally running Ollama instance serving the Phi4 model.
  2. MCP Client: An asynchronous client that connects to the phone directory server and calls its tools.
  3. JSON Processing: Functions to clean and parse JSON outputs from the LLM to determine which actions to take.
  4. Chat Interface: A Streamlit UI that maintains chat history, displays messages, and handles user input.
  5. Prompt Engineering: Uses system prompts to instruct the LLM on how to format responses for function calling.
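
The prompt engineering and chat history items can be illustrated with a small helper that assembles the message list sent to the model. This is a hedged sketch: `build_conversation` is a hypothetical name, and the English `SYSTEM_PROMPT` is a paraphrase of the Chinese prompt in chatbot_ui.py, not the original text.

```python
from typing import Dict, List

Message = Dict[str, str]

# English paraphrase (assumption) of the system prompt used in chatbot_ui.py.
SYSTEM_PROMPT = (
    "You are an assistant that can call tools. "
    "To look up a phone number, reply with JSON only: "
    '{"action": "search_phone", "args": {"query": "..."}}. '
    'Otherwise reply with: {"action": "none", "answer": "..."}.'
)


def build_conversation(
    history: List[Message], system_prompt: str = SYSTEM_PROMPT
) -> List[Message]:
    """Prepend the system prompt to the stored chat history without mutating it."""
    return [{"role": "system", "content": system_prompt}] + list(history)


history = [{"role": "user", "content": "Find Alice's number"}]
messages = build_conversation(history)
for message in messages:
    print(message["role"], "->", message["content"][:40])
```

The resulting list has the shape expected by the `messages` argument of `client.chat.completions.create`. Building it fresh on every turn keeps the system prompt out of the stored history, so it is never displayed to the user.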

System Interaction Flow

When a user enters a query in the chatbot interface:

  1. The query is sent to the LLM (Phi4) along with a system prompt instructing it to format responses as JSON.
  2. The LLM decides whether the query is about searching a phone directory or is a general question.
  3. If it is a phone search query, the LLM returns a JSON object with action: "search_phone" and the query parameters under args.
  4. The UI parses this JSON and calls the MCP server asynchronously with the search parameters.
  5. The MCP server searches the Excel file and returns formatted results.
  6. The UI presents these results to the user and stores the interaction in the chat history.
  7. For general queries, the LLM instead returns a JSON object with action: "none" and its reply in the answer field, which the UI displays directly.
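
The decision in steps 3, 4, and 7 comes down to interpreting one JSON object. A minimal sketch of that interpretation follows; `parse_action` is a hypothetical helper name, and the fallback behavior (treating unparseable output as a plain answer) is modeled on the error handling in chatbot_ui.py.

```python
import json
from typing import Tuple


def parse_action(raw: str) -> Tuple[str, dict, str]:
    """Turn raw LLM text into (action, args, answer); hypothetical helper."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = None
    if not isinstance(parsed, dict):
        # Not a JSON object: treat the raw text as a plain answer.
        return "none", {}, raw
    action = parsed.get("action", "none")
    args = parsed.get("args")
    if not isinstance(args, dict):
        args = {}
    answer = parsed.get("answer") or ""
    return action, args, str(answer)


print(parse_action('{"action": "search_phone", "args": {"query": "Lin"}}'))
print(parse_action('{"action": "none", "answer": "Hello"}'))
print(parse_action("plain text"))
```

Always returning the same three-element tuple means the calling code needs no special cases for malformed model output.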

Bilingual Support

The implementation supports both English and Chinese (Traditional), with comments and user interface elements in Chinese and a flexible architecture that can handle multilingual content in the phone directory.

Source Code: phone_directory_server.py

import pandas as pd
from mcp.server.fastmcp import FastMCP

# 初始化 FastMCP Server
mcp = FastMCP("Phone Directory Server")

# 工具:搜尋電話
@mcp.tool()
def search_phone(query: str) -> str:
    """
    搜尋電話簿中的資料
    :param query: 查詢關鍵字
    :return: 查詢結果
    """
    try:
        df = pd.read_excel("phone_directory.xlsx", dtype=str)
        df.columns = [col.strip() for col in df.columns]

        for col in ['姓名', '電話']:
            if col not in df.columns:
                return f"電話表缺少必要欄位:{col}"

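        # Missing names/phones become empty strings so they never match or print as "nan"
        df[['姓名', '電話']] = df[['姓名', '電話']].fillna("")

        # regex=False treats the query as literal text, so "+" or "(" in a phone number cannot raise a regex error
        mask = (
            df['姓名'].str.contains(query, case=False, na=False, regex=False) |
            df['電話'].str.contains(query, case=False, na=False, regex=False)
        )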
        results = df[mask]

        if results.empty:
            return "找不到符合查詢條件的資料。"

        response_lines = []
        for _, row in results.iterrows():
            line = f"姓名:{row['姓名']}, 電話:{row['電話']}"
            if '地址' in row and pd.notna(row['地址']):
                line += f", 地址:{row['地址']}"
            if '備註' in row and pd.notna(row['備註']):
                line += f", 備註:{row['備註']}"
            response_lines.append(line)

        return "\n".join(response_lines)

    except Exception as e:
        return f"電話表讀取失敗: {e}"

# 資源:問候語
@mcp.resource("greeting://{name}")
def get_greeting(name: str) -> str:
    return f"Hello, {name}!"

# 啟動 MCP Server
if __name__ == "__main__":
    mcp.run()
            

Source Code: chatbot_ui.py

#1.引入必要的函式庫
import streamlit as st
import openai
import asyncio
from mcp import ClientSession
from mcp.client.stdio import stdio_client, StdioServerParameters
import platform
import json

#2.設定異步事件迴圈策略(針對 Windows 系統)
if platform.system() == "Windows":
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())


#3.設定 Ollama 本地模型的 API 金鑰和基礎 URL
client = openai.OpenAI(
    api_key="ollama",
    base_url="http://localhost:11434/v1"  # 根據您的 Ollama 設定
)


#4.MCP Server 串接
async def call_mcp_tool(tool_name, args):
    server_params = StdioServerParameters(command="python", args=["phone_directory_server.py"])

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            if tool_name == "search_phone":
                return await session.call_tool("search_phone", args)
            elif tool_name == "greeting":
                content, _ = await session.read_resource(f"greeting://{args['name']}")
                return content
    return "MCP 工具呼叫失敗"


#5.定義清理 LLM 輸出的函式
def clean_llm_output(text):
    # Strip surrounding whitespace first so a leading newline does not defeat the ``` check
    text = text.strip()
    # 去除 markdown 的 ```json 或 ``` 包裝
    if text.startswith("```"):
        text = text.strip("`")  # 去除反引號
        lines = text.split("\n")
        # 如果第一行是 json 語言標記就跳過
        if lines[0].strip().lower() == "json":
            lines = lines[1:]
        text = "\n".join(lines)
    return text.strip()

    
#6.streamlit UI
st.title("📞 電話簿助理 Chatbot (Function Calling Demo)")

# 聊天紀錄
chat_history = st.session_state.get("chat_history", [])

# 使用者輸入
user_input = st.chat_input("請輸入您的訊息...")

if user_input:
    # 加入使用者訊息
    chat_history.append({"role": "user", "content": user_input})

    # 這裡示範在系統訊息中,教 LLM 怎麼輸出 JSON
    system_prompt = (
        "你是一個能呼叫工具的助理。"
        "如果需要查電話,請回傳 JSON格式:\n"
        '{ "action": "search_phone", "args": { "query": "xxx" } }\n'
        "如果不需要呼叫任何工具,就回傳:\n"
        '{ "action": "none", "answer": "你要回答的內容" }\n'
        "不要輸出任何多餘文字,不要有多餘註解。"
    )

    # 你可以在最前面插入一則 system message
    conversation = [
        {"role": "system", "content": system_prompt}
    ] + chat_history

    # 送去 LLM
    response = client.chat.completions.create(
        model="phi4:latest",  # 或你的模型名稱
        messages=conversation
    )

    llm_output = response.choices[0].message.content
    llm_output_clean = clean_llm_output(llm_output)
   
    st.write("🔍 LLM Output 原始內容:", llm_output)
    # 解析 LLM 回應
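    try:
        parsed = json.loads(llm_output_clean)
        # Valid JSON that is not an object (e.g. a list or bare string) has no .get()
        if not isinstance(parsed, dict):
            raise ValueError("LLM output is not a JSON object")
        action = parsed.get("action")
        args = parsed.get("args") or {}
        answer = parsed.get("answer") or ""
    except ValueError as e:  # json.JSONDecodeError is a subclass of ValueError
        st.error(f"❌ JSON 解析錯誤:{e}")
        st.write("⚠️ LLM 回傳的內容:", llm_output)
        parsed = {"action": "none", "answer": llm_output}
        action = "none"
        args = {}
        answer = llm_output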


    final_reply = ""  # 最終回覆內容

    if action == "search_phone":
        # 如果 LLM 決定要搜尋電話
        query_str = args.get("query", user_input)  # fallback 用 user_input
        with st.spinner("正在查詢電話簿..."):
            mcp_result = asyncio.run(call_mcp_tool("search_phone", {"query": query_str}))
            
        if hasattr(mcp_result, "content") and mcp_result.content:
            texts = [item.text for item in mcp_result.content if hasattr(item, "text")]
            final_reply = "\n".join(texts)
        else:
            final_reply = "⚠️ 查無資料"            
        # 最終回覆可將 MCP 結果組合進去
        final_reply = f"以下是查詢結果:\n{final_reply}"
    elif action == "none":
        # 如果不需要呼叫工具,就純顯示 LLM 結果
        final_reply = answer
    else:
        # 萬一解析出奇怪的 action,就當成普通文字回覆
        final_reply = llm_output

    # 將最終回覆加入聊天
    chat_history.append({"role": "assistant", "content": final_reply})

    # 顯示聊天記錄
    for msg in chat_history:
        st.chat_message(msg["role"]).write(msg["content"])

    # 存回 session_state
    st.session_state["chat_history"] = chat_history