A minimal operator-chat panel: six live plant tags, a TextBox, a Button, and the ChatRequest action — no scripting required. The smallest end-to-end example of FrameworX Local AI on a Display.

How-to GuidesSolution ExamplesLocal AI → Local AI Chat Example

Version 10.1.5+



Download the solution Local AI Chat Example.dbsln.

This example demonstrates the simplest possible operator chat panel backed by FrameworX Local AI — six plant tags as live grounding context, one ChatSession transcript widget, one TextBox for the question, one Button wired to ActionType=ChatRequest. No CodeBehind, no script call. The smallest reachable footprint of the canonical pattern described on the Local AI parent page.

The example ships pre-configured for qwen2.5:3b-instruct against a local Ollama at http://127.0.0.1:11434/v1/chat/completions — the same demo-recommended 3B model the shipping FrameworX demos use, so the example works end-to-end on a no-GPU workshop laptop with no FrameworX configuration. To move to a higher-quality production tier (qwen2.5:7b-instruct or larger) or a remote endpoint, edit the AI Engine tile on Solution → Capabilities; see the Moving to a production tier section below.

This example runs the LLM on the same host as TServer. That keeps the example self-contained and downloadable, but it means every Local AI call competes with the runtime for CPU. For production deployments — especially anything beyond occasional operator chat — move the LLM endpoint to a separate VM or host. See Local AI Deployment and Performance for the four topologies, the decision matrix by workload, and the sizing guidance.



Prerequisites

  • FrameworX 10.1.5 or later.

  • Local Ollama installed and reachable on http://127.0.0.1:11434. Run AISetup\Install-LocalAI.ps1 from the FrameworX install root if you have not already — see the Local AI - First Install Walkthrough.

  • The demo model pulled into Ollama: ollama pull qwen2.5:3b-instruct (~2 GB). This is the model the example ships configured for. To use a higher-quality production-tier model instead, see Moving to a production tier below.


What it contains

Open the solution in Designer and you will see exactly eight UNS tags (six under Plant/, two under Chat/), one Display, one SolutionSettings configuration. Nothing else.

Plant tags — the live context the operator and the model both see

Tag

Type

StartValue

Purpose

Plant/TankLevel

Double, units gallons

784.5

Reactor R-101 fill level. Range 0–1000 gallons.

Plant/Temperature

Double, units degC

67.2

Reactor R-101 jacket temperature.

Plant/Pressure

Double, units bar

4.45

Reactor R-101 head pressure.

Plant/PumpRunning

Digital

1

Feed pump P-201 status. 1 = running, 0 = stopped.

Plant/BatchID

Text

BATCH-2026-0525-A

Current batch identifier.

Plant/OperatorName

Text

Marco

Operator on shift.

Chat tags — the operator-to-AI wiring

Tag

Type

Purpose

Chat/Query

Text

Operator types into a TextBox bound here. The ChatRequest action reads this as the question.

Chat/Reply

JSON

Receives the full reply envelope. JSON type lets Display expressions extract fields via JsonString("text") / JsonString("status") with no scripting.

That is every UNS tag the example needs. No Devices, no Alarms, no Historian, no Scripts. The plant tags carry StartValue so the runtime has live numbers as soon as you click Run; in a real solution they would be driven by a TagProvider, a Device, or a script.


The ChatTest display — pixel-by-pixel

One Canvas display at 1200 × 800. Seventeen elements total, all theme-bound (no hard-coded colors).

  • Header strip (Uid 0–1). TextBlock title "Local LLM AI — Integration Test" + a one-line subtitle explaining the panel.

  • Live tags row (Uid 2–8). Seven TextBlocks. The first is a section heading ("LIVE TAGS IN SCOPE"); the next six bind to the six plant tags via inline expansion in the form {@Tag.Plant/TankLevel} gal. These render in real time as the tags change.

  • Quick-prompt buttons (Uid 9–12). Three Buttons, each with an ActionDynamic that sets @Tag.Chat/Query to a pre-built question. Clicking a button stages the question; the operator can then edit it in the TextBox before sending.

  • ChatSession control (Uid 13). The native FrameworX transcript widget. Renders the per-Display-panel conversation history with bubble styling, role labels (OperatorPlant AI), and auto-scroll to bottom. Width 1152, height 420. No data-binding needed — the transcript is owned by the runtime's cached path.

  • Question TextBox (Uid 14–15). Section label + multi-line TextBox bound to @Tag.Chat/Query.

  • Ask AI Button (Uid 16). The single load-bearing element. ActionDynamic with ActionType="ChatRequest", ObjectLink="@Tag.Chat/Query", ObjectValueLink="@Tag.Chat/Reply". One click sends the current Query to the LLM, receives the reply envelope on Chat/Reply, and the ChatSession control auto-appends the turn to its transcript.

  • Status footer (Uid 17). A single TextBlock binding @Tag.Chat/Reply.JsonString("status") and @Tag.Chat/Reply.JsonString("latencyMs") — live readout of the most recent envelope.


How to run it

  1. Open Local AI Chat Example.dbsln in Designer.

  2. Confirm SolutionCapabilities[LocalAI].Enabled = true on Solution → Capabilities → AI Engine tile. The status dot should be green ("Reachable").

  3. Confirm qwen2.5:3b-instruct is pulled in your local Ollama (ollama list). This is the model the example ships configured for. For higher-quality production-tier alternatives, see Moving to a production tier below.

  4. Click Run. The runtime starts and the RichClient (or your configured client) opens to the ChatTest panel.

  5. Click one of the three quick-prompt buttons to stage a question, or type your own in the TextBox.

  6. Click → Ask AI. Within a few seconds the model's reply text appears in the ChatSession transcript above and the footer shows status=ok with the latency in milliseconds.

The ChatSession control retains the conversation per Display panel — you can ask follow-ups and the model has context. If you want a fresh conversation, close the panel and reopen it.


What to try next

  • Change a plant tag value mid-conversation. Use Designer's Watch panel to drive Plant/TankLevel from 784.5 to 950, then ask the AI "What is the current tank level and how close are we to the cap?". The model sees the new value because the live tags expansion in the display passes the current snapshot on every call.

  • Add a fresh question on the canvas. Duplicate one of the three quick-prompt buttons and change its ObjectValueLink to a new question string. Click it during Run and the new question stages in the TextBox.

  • Inspect the full envelope. The footer shows only status and latencyMs. Add a fourth TextBlock bound to {@Tag.Chat/Reply.JsonString("warnings")} if you want to see warnings; bind the toolTrace field if you enable the UNS / Alarm / Historian tool bits on SolutionSettings.ModelOptions (see Local AI Configuration) and want to inspect autonomous tool calls.

  • Move the LLM off the TServer host. If you have access to a second machine (or a sibling VM), install Ollama there, pull the model on that host, and change the URL field in SolutionCapabilities[LocalAI].Settings from localhost to the remote IP or hostname. The example continues to work unchanged — TServer no longer fights the LLM for CPU. Full guidance on when this matters: Local AI Deployment and Performance.

  • Try a cloud LLM endpoint. Point URL at any OpenAI-compatible chat-completions endpoint (cloud or otherwise); store the API key in SecuritySecrets and reference it via /secret:<Name> in the Authorization field. See SecuritySecrets Authentication for Local AI.


Moving to a production tier — 7B, larger models, remote endpoints

The example ships configured for qwen2.5:3b-instruct so it runs end-to-end on a workshop laptop with no GPU. For production deployments — where reply quality, autonomous tool-call reliability, and concurrent operator load matter — switch to a larger model (recommended: qwen2.5:7b-instruct), or move the LLM off the TServer host entirely:

  1. Pull the production model on the LLM host: ollama pull qwen2.5:7b-instruct (~4.7 GB).

  2. In Designer, open the solution and go to Solution → Capabilities.

  3. On the AI Engine tile, click Edit Configuration.

  4. Paste the JSON below into the dialog and save (adjust URL if the LLM runs on a different host):

{
  "URL": "http://127.0.0.1:11434/v1/chat/completions",
  "Name": "qwen2.5:7b-instruct",
  "Authorization": "NoAuth",
  "Headers": "",
  "Info": "Production tier — 7B-instruct (~4.7 GB). Best reply quality and tool-call reliability.",
  "TimeoutSeconds": 60
}

Restart the runtime; the next chat call uses the production model. Replies are noticeably richer and autonomous tool dispatch is more reliable, in exchange for higher latency per call. For remote endpoints, cloud LLMs, and the full topology decision tree, see Local AI Deployment and Performance.


Performance baseline — what to expect

Indicative latencies for the production-tier qwen2.5:7b-instruct model, CPU-only inference on a typical x64 development laptop (Windows 11, no GPU, 16 GB RAM). These numbers are the upper bound for what the example reaches at production quality. The shipped 3B default runs at roughly half these latencies on the same hardware (~2× faster) in exchange for shorter, less nuanced replies. The first call after runtime start pays a model-load cost; subsequent calls run against the warm model.

Question shape

First call (cold model)

Subsequent calls (warm)

Short factual (e.g. “what is the current tank level?”)

~5 s

1.5–3 s

Three-bullet list (~40 tokens out)

~17 s

11–13 s

Two-sentence summary with one tag (~50 tokens)

~25 s

17–25 s

Two-sentence summary across six tags (~55 tokens)

~21 s

9–15 s

Two-sentence reasoning with hedge (~85 tokens)

~28 s

9–15 s

Numbers scale roughly linearly with output token count: token-throughput on CPU is ~3–5 tokens/second across all cores. The qwen2.5:3b-instruct model is ~2× faster than 7B on the same hardware (trade against reply quality). Adding a GPU cuts latency 3–5× for either size. For deeper sizing guidance — including when latency in the same-host topology starts hurting other runtime work — see Local AI Deployment and Performance.


How it relates to the deeper Local AI features

This example is intentionally minimal — it demonstrates only the ChatRequest action. For richer patterns:

  • LocalAI KnowledgeGraph Demo — a full shipping solution with operator chat, alarm-driven anomaly narration, knowledge-graph asset tree, and end-to-end grounded narrative generation.

  • The three server-side AI.Execute examples on the Local AI parent page show the atomic script path (alarm root-cause hypothesis, multi-language alert translation, end-of-shift summary).

  • Local AI Configuration covers tool-category bits (SolutionSettings.ModelOptions), alternate endpoints (cloud LLMs, other local models), and SecuritySecrets-backed authorization for non-local endpoints.

  • Local AI Deployment and Performance covers the four hosting topologies (same-host, sibling VM, dedicated host, cloud), the workload × topology decision matrix, and the sizing guidance — essential reading before moving past a workshop / single-operator deployment.

  • Local AI Developer Reference covers reply-envelope schema details, cached-path hooks, and the AI.Execute deep semantics.


In this section...