FrameworX 10.1.5 ships configured to talk to a local Ollama with qwen2.5:7b-instruct as the recommended default model (~4.7 GB) — the best balance of reasoning and reliable JSON tool-call output. It expects a machine with a GPU. This walkthrough covers a first Local AI setup end to end — install Ollama, pull the model, point FrameworX at the endpoint, and verify — plus what to expect on disk, time, and latency. FrameworX ships no local installer; you install Ollama yourself, and the per-OS commands (Windows, macOS, Linux) are on Local AI - Installing Models (Windows, macOS, Linux). On limited hardware with no GPU, switch the solution's SolutionCapabilities[LocalAI].Settings.Name field to the qwen2.5:3b-instruct fallback — see the Switching the model section below. Everything runs locally with no internet connection.
AI Integration → Local AI → First Install Walkthrough
Quick start
Three steps, on the machine that will serve the model — its own server for production, or the FrameworX Server box itself for a workshop or demo. The per-OS install detail (Windows, macOS, Linux) is on Local AI - Installing Models (Windows, macOS, Linux).
# 1. Install Ollama (Windows shown; see Installing Models for macOS / Linux) winget install Ollama.Ollama # 2. Pull the recommended default model (~4.7 GB) ollama pull qwen2.5:7b-instruct # 3. Confirm Ollama is serving and the model is present ollama list curl http://localhost:11434/v1/models
Then point your solution at it in Designer — see Next: configure your model in Designer below. On no-GPU hardware, pull qwen2.5:3b-instruct instead.
The recommended model: qwen2.5:7b-instruct
Item | Value |
|---|---|
Name |
|
License | Apache 2.0 — commercial use permitted, no royalty, no per-seat fee. |
Size on disk | ~4.7 GB (quantized Q4_K_M; pulled into |
Why this model | FrameworX recommends |
Hardware expectations
The 7B model runs on standard SCADA-class hardware; a GPU is expected for a responsive interactive-chat experience (Ollama auto-detects and uses it). It still runs on CPU, but slower — CPU-only machines should prefer the 3B fallback for atomic tasks.
Resource | Recommended | Notes |
|---|---|---|
Disk | ~6.5 GB free | 1.8 GB Ollama runtime + 4.7 GB model. Plan for ~10 GB if you expect to keep both the 3B (fallback) and 7B (default) variants installed. |
RAM | 16 GB or more | The model occupies ~5 GB while loaded; the rest of the SCADA stack (TServer, Designer, plant Displays) needs headroom. Smaller machines (8 GB) work for evaluation but compete with everything else for memory. |
CPU | Modern x64 | Any current Intel / AMD desktop or server CPU runs the 7B model. First call after load takes ~10–15 seconds; subsequent calls land in ~500 ms on a typical workstation. |
GPU | Recommended | Expected for a responsive 7B interactive-chat experience. Ollama auto-detects an NVIDIA or AMD GPU and offloads layers transparently — no configuration needed; per-call latency drops by roughly 3–10×. The model still runs on CPU if no GPU is present, but slower. |
What a first install involves
Step | Action | Already done if |
|---|---|---|
1 | Check port 11434 is free of any conflicting service | port is already held by Ollama |
2 | Install Ollama ( |
|
3 | Make sure Ollama is serving — it runs as a background service; | endpoint at |
4 | Pull | model already in |
5 | Verify inference end to end (the Designer status indicator, or a | — |
None of these steps is destructive; re-doing them on an already-set-up machine is safe and changes nothing.
What to expect on first install
Item | Value |
|---|---|
Total disk usage | ~6.5 GB (1.8 GB Ollama runtime in |
First-run time | ~5 minutes on a 50 MB/s connection (1.8 GB Ollama installer + 4.7 GB model pull). Slower connections scale linearly. |
Permissions required | None. Ollama installs per-user — no UAC / admin elevation needed. |
First chat latency | ~15 seconds. The model loads from disk into RAM on first call after startup or after the keep-alive window expires. |
Subsequent chat latency | ~500 milliseconds on a typical CPU; faster with a GPU. |
Keep-alive | Default 5 minutes. Idle longer than that and the next call pays cold-load again. Set |
Verify it works
Two quick checks confirm the runtime is ready before you wire up the solution. First, that Ollama is serving and the model is present:
# Models on disk
> ollama list
NAME ID SIZE MODIFIED
qwen2.5:7b-instruct a1b2c3d4e5f6 4.7 GB 2 minutes ago
# Endpoint reachable (lists the served models as JSON)
> curl http://localhost:11434/v1/models
{"object":"list","data":[{"id":"qwen2.5:7b-instruct","object":"model"}]}
Second, the end-to-end path FrameworX actually uses: the Status indicator on the AI Engine tile in Solution → Capabilities turns Reachable (green), and a ChatRequest from any Display panel returns a reply envelope with status = "ok" and a populated text field. Any failure there points at the same short list of causes: Ollama not started, model not pulled, or the port held by another process.
Next: configure your model in Designer
With Ollama serving and the model pulled, open Designer and connect the solution:
- Open your solution in Designer.
- Navigate to Solution → Capabilities → AI Engine.
- Tick Enable Local AI to flip
SolutionCapabilities[LocalAI].Enabledtotrue. The status indicator should resolve to Reachable within a few seconds. - Click Edit Configuration to set the endpoint URL, model name, response timeout, or authorization — or to point at a remote / cloud LLM instead of the default local Ollama.
If the indicator stays on Unreachable, confirm Ollama is up on the endpoint host (ollama list and curl http://localhost:11434/v1/models on that machine) — the usual causes are Ollama not started, the model not pulled, or the port held by another process. If it shows Auth required, see SecuritySecrets Authentication for Local AI. Full configuration reference: Local AI Configuration.
If port 11434 is already in use
If port 11434 is held by a different process (LM Studio, llama.cpp server, oobabooga, an old test server, etc.) rather than Ollama, Ollama cannot serve on it. Identify the holder first:
# Windows: which PID holds 11434 netstat -ano | findstr 11434
- Stop the conflicting service, or move it to a different port.
- Start Ollama — it claims 11434 once the port is free.
Do not blindly kill whatever holds the port — 11434 is heavily used by the LLM ecosystem, and a silent process kill is the wrong default. Confirm what it is first.
Running the model on a different host
By default Ollama binds localhost only. To run Ollama on a separate machine (typically a GPU server) and have FrameworX talk to it over the network:
- On the Ollama host: set
OLLAMA_HOST=0.0.0.0:11434in the system environment, then restart Ollama. - Open inbound TCP 11434 in the Ollama host's firewall.
- In FrameworX, edit
SolutionCapabilities[LocalAI].Settingsto pointURLathttp://<ollama-host-ip>:11434/v1/chat/completions.
Authentication note. Ollama has no built-in authentication. For deployments beyond a trusted LAN, restrict the firewall rule to the FX server's IP, or front port 11434 with a reverse proxy that adds an API key — then set the FX Authorization field on the LocalAI capability to BearerToken with that key. Do not expose port 11434 directly to an untrusted network.
Switching the model
Recommended default and limited-hardware fallback
FrameworX recommends qwen2.5:7b-instruct as the default (best reasoning and tool-call reliability); it expects a GPU. On a machine with no GPU, use the qwen2.5:3b-instruct fallback (~2 GB) — lower speed and quality, suited to atomic tasks rather than interactive chat. For maximum reasoning on a strong GPU, qwen2.5:32b-instruct is the performance tier.
Switching any solution between models is a two-step change. The common cases: moving to the qwen2.5:3b-instruct fallback on a no-GPU machine, or moving an older copy that was configured for 3B (some demo solutions used to ship configured for 3B for no-GPU evaluation) up to the recommended qwen2.5:7b-instruct.
- Pull the target model with
ollama pull <name>— for exampleollama pull qwen2.5:7b-instruct. Models already on disk are not removed; multiple tiers coexist. - Update the solution's Local AI configuration. In Designer, open Solution → Capabilities → AI Engine, click Edit Configuration, and set the
Namefield to the target model. Save. The next chat orAI.Executecall uses the new model — no restart required.
To confirm the switch took effect: ask the chat panel any question and inspect the reply. The status field on the reply JSON envelope reports ok and the latencyMs field will be slightly higher than with 3B (the 7B model takes longer per call — usually ~500 ms vs. ~200 ms on the same CPU). Reply quality — specifically multi-step reasoning, tool selection, and JSON tool-call formatting — improves visibly.
Choosing a different model entirely
The default qwen2.5:7b-instruct is a balance of quality and footprint for typical SCADA hardware. To use a different model:
- Pull it with Ollama:
ollama pull <model-name>(for example,qwen2.5:3b-instructfor low-RAM / no-GPU machines,qwen2.5:32b-instructfor strong-GPU servers, or another OpenAI-compatible model your hardware supports). - In FrameworX, edit
SolutionCapabilities[LocalAI].Settingsand set theNamefield to the new model name.
Any OpenAI-compatible endpoint works — including cloud LLMs (OpenAI, Azure OpenAI, Anthropic via OpenAI-compat proxy). Set URL and Authorization accordingly.
See also
- Local AI - Installing Models (Windows, macOS, Linux) — per-OS Ollama install and model-pull commands.
- Local AI — the main reference page (parent of this one).
- AI Integration — the broader AI surface in FrameworX.
In this section...