minimost.preview

Link preview generation for URLs posted in chat messages.

When a user sends a message containing a URL, the client fetches a preview card from /link_preview?url=<url> and renders it below the message. This module implements the server-side logic for generating those cards.

Three preview strategies are tried in order:

  1. Bitbucket Cloud — URLs on bitbucket.org. Fetches raw file content via the Bitbucket Cloud REST API and returns a code snippet with optional line-number highlighting.

  2. Bitbucket Server / Data Center — Self-hosted Bitbucket instances matching the /projects/{P}/repos/{R}/browse/{path} URL pattern. Fetches raw content via the Bitbucket Server REST API.

  3. OpenGraph / generic — Falls back to fetching the HTML page and extracting <meta property="og:…"> tags (plus <title> and Twitter card meta tags) to build a rich preview card.

Security:

  • Private and loopback IP addresses are blocked by _is_safe_url() to prevent Server-Side Request Forgery (SSRF).

  • Only http and https schemes are accepted.

  • A 5-second timeout and a read limit (64 KiB by default, up to 512 KiB for raw source files in code previews) are applied to all outgoing requests to prevent resource exhaustion.

Caching:

Results are cached in an in-process FIFO dictionary (_CACHE) with a maximum of 200 entries. This is intentionally simple: entries are never invalidated or refreshed, only evicted (oldest first) once the cache is full, and the whole cache is lost on server restart.

Module-level attributes

_CACHE (dict)

In-process preview result cache. Keys are URL strings; values are the result dicts returned by fetch_preview().

_CACHE_MAX (int)

Maximum number of entries in _CACHE before the oldest entry is evicted (FIFO).

_TIMEOUT (int)

HTTP request timeout in seconds (5).

_HEADERS (dict)

Request headers sent with all outgoing HTTP requests, including a browser-like User-Agent to avoid bot-detection blocks.

_PRIVATE_RANGES (re.Pattern)

Regex that matches hostnames known to be private or loopback addresses (used as a fast pre-filter before the DNS-based _resolves_to_public_ip check).

minimost.preview.is_text_filename(name)[source]

Return True if name (a basename) denotes a previewable text file.

Matches by extension (_TEXT_EXTENSIONS), by exact filename (_TEXT_FILENAMES), or by the jenkinsfile prefix — which covers Jenkinsfile, Jenkinsfile.prod and similar (case-insensitive).
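
The three matching rules can be illustrated with a minimal sketch. The sets below are hypothetical stand-ins (the real contents of _TEXT_EXTENSIONS and _TEXT_FILENAMES are not shown here), and the sketch assumes that all three checks are case-insensitive.

```python
import os

# Hypothetical stand-ins; the module's real _TEXT_EXTENSIONS and
# _TEXT_FILENAMES are not listed in this documentation.
TEXT_EXTENSIONS = {".py", ".js", ".md", ".txt", ".yaml"}
TEXT_FILENAMES = {"makefile", "dockerfile"}


def is_text_filename_sketch(name: str) -> bool:
    """Return True if the basename looks like a previewable text file."""
    lowered = name.lower()
    # Prefix rule: Jenkinsfile, Jenkinsfile.prod, ...
    if lowered.startswith("jenkinsfile"):
        return True
    # Exact-filename rule (assumed case-insensitive here).
    if lowered in TEXT_FILENAMES:
        return True
    # Extension rule.
    _, ext = os.path.splitext(lowered)
    return ext in TEXT_EXTENSIONS
```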

class minimost.preview._MetaParser[source]

Bases: HTMLParser

Streaming HTML parser that extracts page metadata for link previews.

Parses the <head> section of an HTML document and collects metadata from the following sources (in priority order via dict.setdefault):

  • <meta property="og:*"> — OpenGraph protocol tags.

  • <meta name="twitter:title|description|image"> — Twitter Card tags.

  • <meta name="description"> — generic description tag.

  • <title> — plain HTML title element (lowest priority).

Parsing stops immediately when the <body> tag is encountered, since all relevant metadata is in the <head>. This minimises memory usage for large pages.

Attributes

og (dict)

Collected metadata keyed by OpenGraph property name (without the og: prefix), e.g. {"title": "...", "description": "...", "image": "..."}.

Example:

parser = _MetaParser()
parser.feed('<head><meta property="og:title" content="Hello"></head>')
assert parser.title == "Hello"
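
To make the setdefault-based priority and the stop-at-<body> behaviour concrete, here is a simplified sketch of such a parser. It is an illustration under assumptions, not the module's actual implementation: the class name MetaParserSketch and the private attribute names are hypothetical, and tags seen after <body> are simply ignored.

```python
from html.parser import HTMLParser


class MetaParserSketch(HTMLParser):
    """Hypothetical, simplified stand-in for _MetaParser."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self._title_text = ""
        self._in_title = False
        self._done = False  # set once <body> is seen

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "body":
            self._done = True
            return
        if tag == "title":
            self._in_title = True
            return
        if tag != "meta":
            return
        attr = dict(attrs)
        content = attr.get("content") or ""
        prop = attr.get("property") or ""
        name = attr.get("name") or ""
        if not content:
            return
        # setdefault keeps the first value stored for each key.
        if prop.startswith("og:"):
            self.og.setdefault(prop[len("og:"):], content)
        elif name in ("twitter:title", "twitter:description", "twitter:image"):
            self.og.setdefault(name[len("twitter:"):], content)
        elif name == "description":
            self.og.setdefault("description", content)

    def handle_data(self, data):
        if self._in_title and not self._done:
            self._title_text += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    @property
    def title(self):
        # <title> text is only a fallback for a missing meta title.
        return self.og.get("title") or self._title_text.strip()
```

Because setdefault keeps the first value stored per key, the effective priority in this sketch depends on the order in which the tags appear in the document.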

__init__()[source]

Initialize and reset this instance.

If convert_charrefs is True (the default), all character references are automatically converted to the corresponding Unicode characters.

handle_starttag(tag, attrs)[source]

Process an opening HTML tag.

Stops processing entirely once <body> is seen. Extracts content from <meta> tags according to the priority rules described in the class docstring. Sets an internal flag when <title> is opened.

Parameters:
  • tag (str) – Lowercase tag name.

  • attrs (list of tuple) – List of (name, value) attribute pairs.

handle_data(data)[source]

Accumulate text data inside <title> elements.

Parameters:

data (str) – Raw text content from the parser.

handle_endtag(tag)[source]

Clear the title-tracking flag when the </title> tag is seen.

Parameters:

tag (str) – Lowercase tag name.

property title

The best available title string.

Returns the OpenGraph/Twitter title if one was found in <meta> tags, otherwise falls back to the content of the <title> element.

Return type:

str

property description

The page description from <meta> tags, or an empty string.

Return type:

str

property image

The preview image URL from <meta> tags, or an empty string.

Return type:

str

minimost.preview._is_safe_url(url)[source]

Check that a URL does not point to a private or loopback address.

Parses the hostname from url and tests it against _PRIVATE_RANGES. This is the primary SSRF (Server-Side Request Forgery) mitigation: it prevents the preview endpoint from being used to probe internal network services.

Blocked address patterns:

  • localhost

  • 127.x.x.x (loopback)

  • 10.x.x.x (RFC 1918 private)

  • 172.16–31.x.x (RFC 1918 private)

  • 192.168.x.x (RFC 1918 private)

  • ::1 (IPv6 loopback)
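
The blocked patterns listed above could be expressed as a single anchored regex, roughly as follows. The pattern and the function name are assumptions for illustration; the module's actual _PRIVATE_RANGES is not reproduced in this documentation, and this sketch covers only the regex pre-filter, not the DNS-based check.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern covering the listed cases only.
PRIVATE_RANGES = re.compile(
    r"^(?:localhost"
    r"|127\.\d{1,3}\.\d{1,3}\.\d{1,3}"
    r"|10\.\d{1,3}\.\d{1,3}\.\d{1,3}"
    r"|172\.(?:1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3}"
    r"|192\.168\.\d{1,3}\.\d{1,3}"
    r"|::1)$",
    re.IGNORECASE,
)


def is_safe_url_sketch(url: str) -> bool:
    """Return False for unparseable URLs and for blocked hostnames."""
    try:
        host = urlparse(url).hostname
    except ValueError:  # e.g. malformed IPv6 literal
        return False
    if not host:
        return False
    return PRIVATE_RANGES.match(host) is None
```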

Parameters:

url (str) – The URL to validate.

Returns:

True if the URL is safe to fetch, False if it resolves to a private/loopback address or cannot be parsed.

Return type:

bool

minimost.preview._resolves_to_public_ip(hostname)[source]

Return True if hostname resolves only to public IP addresses.
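
One plausible way to implement such a check with the standard library is sketched below. Treating "public" as ipaddress's is_global, and returning False when resolution fails, are assumptions of this sketch rather than documented behaviour.

```python
import ipaddress
import socket


def resolves_to_public_ip_sketch(hostname: str) -> bool:
    """Return True only if every resolved address is globally routable."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        return False  # assumption: unresolvable hosts are rejected
    if not infos:
        return False
    for info in infos:
        address = info[4][0]
        # Drop an IPv6 zone suffix such as "%eth0" before parsing.
        ip = ipaddress.ip_address(address.split("%", 1)[0])
        if not ip.is_global:
            return False
    return True
```

Checking every returned address, rather than only the first, matters because a hostname can resolve to a mix of public and private addresses.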

minimost.preview._fetch(url, max_bytes=65536)[source]

Fetch the body of an HTTP/HTTPS URL with safety limits.

Sends a GET request using _HEADERS (browser-like User-Agent) with a timeout of _TIMEOUT seconds. Reads at most max_bytes bytes from the response body.

Only http and https schemes are accepted; any other scheme raises ValueError.
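
A minimal sketch of a fetch helper with these limits, using urllib from the standard library, might look like this. Whether the module uses urllib is not stated here, and the User-Agent string is a hypothetical placeholder.

```python
import urllib.request
from urllib.parse import urlparse

TIMEOUT = 5  # seconds, as documented for _TIMEOUT
# Hypothetical header value; the real _HEADERS contents are not shown.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; preview-sketch)"}


def fetch_sketch(url: str, max_bytes: int = 65536) -> bytes:
    """GET a URL, reading at most max_bytes of the body."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=TIMEOUT) as response:
        # read(n) returns no more than n bytes.
        return response.read(max_bytes)
```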

Parameters:
  • url (str) – The URL to fetch.

  • max_bytes (int) – Maximum number of bytes to read from the response. Defaults to 65536 (64 KiB). Use a larger value when fetching raw source files for code previews.

Returns:

Raw response body bytes.

Return type:

bytes

Raises:

ValueError – If the URL scheme is not http or https.

minimost.preview._build_code_result(raw, filepath, line_start, line_end, url)[source]

Build a code preview result dict from raw file text.

Shared by both Bitbucket Cloud and Bitbucket Server preview functions. Slices the file content to show a relevant snippet and annotates it with metadata needed for client-side syntax highlighting and line-number display.

Snippet selection:

  • If line_start is provided: shows the highlighted line(s) plus ±3 lines of context.

  • If line_start is None: shows the first 25 lines of the file.

Parameters:
  • raw (str) – The full raw text content of the file (UTF-8 decoded).

  • filepath (str) – The file path within the repository (e.g. "src/minimost/chat.py").

  • line_start (int or None) – 1-based start line to highlight, or None for no highlighting.

  • line_end (int or None) – 1-based end line to highlight (inclusive), or None if only one line is highlighted.

  • url (str) – The original browser URL that triggered the preview, used to link back from the preview card.

Returns:

A code preview dict with keys:

  • type (str): Always "code".

  • filename (str): Basename of the file.

  • filepath (str): Full repository path.

  • language (str): File extension (lowercase), used for syntax highlighting (e.g. "py", "js").

  • first_line_num (int): Line number (1-based) of the first line of the code snippet.

  • highlight_start (int or None): First highlighted line.

  • highlight_end (int or None): Last highlighted line.

  • code (str): Newline-joined snippet text.

  • total_lines (int): Total number of lines in the full file.

  • url (str): Original browser URL.

Return type:

dict
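
The snippet selection and result keys described above can be sketched as follows. This is an illustration under assumptions: the function name and the context/head_lines parameters are hypothetical, and passing line_end through unchanged as highlight_end is a guess about the real behaviour.

```python
import os


def build_code_result_sketch(raw, filepath, line_start, line_end, url,
                             context=3, head_lines=25):
    """Slice a snippet out of raw and wrap it in a code preview dict."""
    lines = raw.splitlines()
    total = len(lines)
    if line_start is None:
        # No highlight requested: show the head of the file.
        first, last = 1, min(head_lines, total)
    else:
        end = line_end if line_end is not None else line_start
        # Highlighted range plus surrounding context, clamped to the file.
        first = max(1, line_start - context)
        last = min(total, end + context)
    filename = os.path.basename(filepath)
    language = os.path.splitext(filename)[1].lstrip(".").lower()
    return {
        "type": "code",
        "filename": filename,
        "filepath": filepath,
        "language": language,
        "first_line_num": first,
        "highlight_start": line_start,
        "highlight_end": line_end,  # assumption: passed through as given
        "code": "\n".join(lines[first - 1:last]),
        "total_lines": total,
        "url": url,
    }
```

With a highlight of lines 50 to 60 and three lines of context, the snippet in this sketch starts at line 47.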

minimost.preview._parse_bb_cloud(url)[source]

Parse a Bitbucket Cloud file URL into its components.

Accepts URLs of the form:

https://bitbucket.org/{workspace}/{repo}/src/{ref}/{path}[#lines-N[:M]]

Parameters:

url (str) – The Bitbucket Cloud URL to parse.

Returns:

A tuple (workspace, repo, ref, filepath, line_start, line_end) if the URL matches, or None if it does not.

  • workspace (str): Bitbucket workspace/organization slug.

  • repo (str): Repository slug.

  • ref (str): Git ref (branch, tag, or commit SHA).

  • filepath (str): Path to the file within the repository.

  • line_start (int or None): 1-based start line from the fragment.

  • line_end (int or None): 1-based end line from the fragment.

Return type:

tuple or None
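
A parser for this URL shape could be sketched with two small regexes, as below. The patterns and names are assumptions; in particular, this sketch treats the ref as a single path segment, so branch names containing a slash would be split incorrectly.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns for the documented path and fragment shapes.
CLOUD_PATH = re.compile(r"^/([^/]+)/([^/]+)/src/([^/]+)/(.+)$")
CLOUD_LINES = re.compile(r"^lines-(\d+)(?::(\d+))?$")


def parse_bb_cloud_sketch(url):
    """Return (workspace, repo, ref, filepath, line_start, line_end) or None."""
    parts = urlparse(url)
    if parts.hostname != "bitbucket.org":
        return None
    path_match = CLOUD_PATH.match(parts.path)
    if path_match is None:
        return None
    workspace, repo, ref, filepath = path_match.groups()
    line_start = line_end = None
    lines_match = CLOUD_LINES.match(parts.fragment)
    if lines_match:
        line_start = int(lines_match.group(1))
        if lines_match.group(2):
            line_end = int(lines_match.group(2))
    return workspace, repo, ref, filepath, line_start, line_end
```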

minimost.preview._bitbucket_cloud_preview(url)[source]

Generate a code preview for a Bitbucket Cloud file URL.

Calls _parse_bb_cloud() to validate and decompose the URL, then fetches up to 512 KiB of the raw file content from the Bitbucket Cloud REST API:

https://api.bitbucket.org/2.0/repositories/{workspace}/{repo}/src/{ref}/{path}

Passes the raw text to _build_code_result() to produce the final preview dict.
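
Building the API URL from the parsed components might look like the following sketch. The helper name is hypothetical, and percent-encoding each component with urllib.parse.quote is an assumption about how special characters are handled.

```python
from urllib.parse import quote


def bb_cloud_raw_url_sketch(workspace, repo, ref, filepath):
    """Format the Bitbucket Cloud src endpoint for one file."""
    return (
        "https://api.bitbucket.org/2.0/repositories/"
        f"{quote(workspace, safe='')}/{quote(repo, safe='')}"
        # Keep "/" unescaped inside the file path only.
        f"/src/{quote(ref, safe='')}/{quote(filepath, safe='/')}"
    )
```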

Parameters:

url (str) – A Bitbucket Cloud file browser URL.

Returns:

A code preview dict (see _build_code_result()) on success, or {} if the URL does not match or the API call fails.

Return type:

dict

minimost.preview._parse_bb_server(url)[source]

Parse a Bitbucket Server / Data Center file URL into its components.

Accepts URLs of the form:

http(s)://{host}/projects/{PROJECT}/repos/{repo}/browse/{path}[#{start}-{end}]

The URL scheme (http or https) is preserved in the returned base URL so that plain-HTTP self-hosted instances work correctly.

Parameters:

url (str) – The Bitbucket Server URL to parse.

Returns:

A tuple (base, project, repo, filepath, line_start, line_end) if the URL matches, or None if it does not.

  • base (str): Scheme and host, e.g. "https://bitbucket.example.com".

  • project (str): Project key.

  • repo (str): Repository slug.

  • filepath (str): Path to the file within the repository.

  • line_start (int or None): 1-based start line from the fragment.

  • line_end (int or None): 1-based end line from the fragment.

Return type:

tuple or None
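
The Server / Data Center URL shape can be parsed in a similar way; the sketch below shows how the scheme is carried into the base URL. Names and patterns are hypothetical, and instances served under a context path (for example /bitbucket/projects/...) are not handled.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns for the documented path and fragment shapes.
SERVER_PATH = re.compile(r"^/projects/([^/]+)/repos/([^/]+)/browse/(.+)$")
SERVER_LINES = re.compile(r"^(\d+)(?:-(\d+))?$")


def parse_bb_server_sketch(url):
    """Return (base, project, repo, filepath, line_start, line_end) or None."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    path_match = SERVER_PATH.match(parts.path)
    if path_match is None:
        return None
    project, repo, filepath = path_match.groups()
    line_start = line_end = None
    lines_match = SERVER_LINES.match(parts.fragment)
    if lines_match:
        line_start = int(lines_match.group(1))
        if lines_match.group(2):
            line_end = int(lines_match.group(2))
    # Preserve the original scheme so plain-HTTP hosts keep working.
    base = f"{parts.scheme}://{parts.netloc}"
    return base, project, repo, filepath, line_start, line_end
```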

minimost.preview._bitbucket_server_preview(url)[source]

Generate a code preview for a Bitbucket Server / Data Center file URL.

Calls _parse_bb_server() to validate and decompose the URL, then fetches up to 512 KiB of the raw file content from the Bitbucket Server REST API:

{scheme}://{host}/rest/api/1.0/projects/{PROJECT}/repos/{repo}/raw/{path}

The URL scheme is inherited from the original URL, so self-hosted HTTP instances work without modification.

Parameters:

url (str) – A Bitbucket Server browse URL.

Returns:

A code preview dict (see _build_code_result()) on success, or {} if the URL does not match or the API call fails.

Return type:

dict

minimost.preview._og_preview(url)[source]

Generate a generic OpenGraph preview for any web URL.

Fetches up to 64 KiB of the page HTML (the default _fetch() limit) and parses it with _MetaParser. If no title can be extracted, returns {} — it is not useful to render a preview card without a title.

Title and description are capped at 200 and 400 characters respectively to prevent excessively large preview cards.
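
The title requirement and the length caps can be sketched as a small helper that assembles the result dict from already-extracted metadata. The function name is hypothetical, and plain slicing is assumed as the truncation method.

```python
from urllib.parse import urlparse


def build_og_result_sketch(title, description, image, url):
    """Return an OpenGraph-style preview dict, or {} without a title."""
    title = (title or "").strip()
    if not title:
        return {}
    return {
        "type": "og",
        "title": title[:200],                     # cap at 200 characters
        "description": (description or "")[:400], # cap at 400 characters
        "image": image or "",
        "domain": urlparse(url).hostname or "",
        "url": url,
    }
```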

Parameters:

url (str) – The URL to generate a preview for.

Returns:

An OpenGraph preview dict with keys type, title, description, image, domain, and url, or {} if the request fails or no title is found.

Return type:

dict

minimost.preview._text_file_preview(url)[source]

Generate a code preview for a direct link to a text/source file.

Checks the URL path’s file extension (or filename) against _TEXT_EXTENSIONS / _TEXT_FILENAMES. If it matches, fetches the raw content and passes it through _build_code_result().

Parameters:

url (str) – The URL to inspect and potentially fetch.

Returns:

A code preview dict on success, or {} if the URL does not point to a recognised text file or the fetch fails.

Return type:

dict

minimost.preview.fetch_preview(url)[source]

Return a preview dict for a URL, using the cache when available.

This is the main entry point called by the /link_preview route in minimost.chat.

Strategy (tried in order):

  1. Return the cached result if url is already in _CACHE.

  2. Reject the URL if the scheme is not http/https, or if _is_safe_url() returns False (SSRF protection).

  3. Try _bitbucket_cloud_preview() if the host is bitbucket.org.

  4. Try _bitbucket_server_preview() if the URL matches the Bitbucket Server path pattern.

  5. Fall back to _og_preview() for any other URL.

  6. Cache the result (even {}) and return it.

FIFO cache eviction:

When the cache reaches _CACHE_MAX entries, the oldest entry is removed by deleting the first key from the dictionary (relies on Python 3.7+ insertion-ordered dicts).
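
The cache lookup, strategy order, and FIFO eviction described above can be sketched as follows. Several details are assumptions of this sketch: strategies are passed in as callables and tried until one returns a non-empty dict, rejected URLs are returned as {} without being cached, and all names are hypothetical.

```python
from urllib.parse import urlparse


def cache_put(cache, key, value, max_entries=200):
    """Insert into an insertion-ordered dict, evicting the oldest key."""
    if key not in cache and len(cache) >= max_entries:
        oldest = next(iter(cache))  # first inserted key (Python 3.7+)
        del cache[oldest]
    cache[key] = value


def fetch_preview_sketch(url, strategies, cache, is_safe=lambda u: True):
    """Return a cached or freshly generated preview dict for url."""
    if url in cache:
        return cache[url]
    if urlparse(url).scheme not in ("http", "https") or not is_safe(url):
        return {}  # assumption: rejections are not cached
    result = {}
    for strategy in strategies:
        result = strategy(url)
        if result:
            break
    cache_put(cache, url, result)  # empty results are cached too
    return result
```

Caching empty results avoids refetching URLs that are known not to produce a preview, at the cost of never retrying them until they are evicted.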

Parameters:

url (str) – The URL to preview.

Returns:

A preview dict (see the route docstring for key details), or {} if no preview could be generated.

Return type:

dict

Preview Type Reference

Code preview (returned for Bitbucket URLs and for direct links to text files):

{
    "type": "code",
    "filename": "chat.py",          # basename of the file
    "filepath": "src/chat.py",      # full path within the repo
    "language": "py",               # file extension (for highlighting)
    "first_line_num": 47,           # line number of the first snippet line
    "highlight_start": 50,          # first highlighted line (1-based), or None
    "highlight_end": 60,            # last highlighted line (1-based), or None
    "code": "def send(channel):\n...",  # snippet text
    "total_lines": 616,             # total lines in the full file
    "url": "https://bitbucket.org/...", # original URL
}

OpenGraph preview (returned for generic web pages):

{
    "type": "og",
    "title": "Page Title",          # truncated to 200 chars
    "description": "...",           # truncated to 400 chars
    "image": "https://...",         # og:image URL
    "domain": "example.com",        # hostname only
    "url": "https://example.com/page",  # original URL
}

No preview available:

{}

Supported Bitbucket URL Formats

Bitbucket Cloud:

https://bitbucket.org/{workspace}/{repo}/src/{ref}/{path}
https://bitbucket.org/{workspace}/{repo}/src/{ref}/{path}#lines-N
https://bitbucket.org/{workspace}/{repo}/src/{ref}/{path}#lines-N:M

Bitbucket Server / Data Center:

https://{host}/projects/{PROJECT}/repos/{repo}/browse/{path}
https://{host}/projects/{PROJECT}/repos/{repo}/browse/{path}#{line}
https://{host}/projects/{PROJECT}/repos/{repo}/browse/{path}#{start}-{end}
http://{host}/projects/{PROJECT}/repos/{repo}/browse/{path}   (plain HTTP also works)