minimost.preview
Link preview generation for URLs posted in chat messages.
When a user sends a message containing a URL, the client fetches a preview
card from /link_preview?url=<url> and renders it below the message.
This module implements the server-side logic for generating those cards.
Three preview strategies are tried in order:
Bitbucket Cloud: URLs on bitbucket.org. Fetches raw file content via the Bitbucket Cloud REST API and returns a code snippet with optional line-number highlighting.
Bitbucket Server / Data Center: Self-hosted Bitbucket instances matching the /projects/{P}/repos/{R}/browse/{path} URL pattern. Fetches raw content via the Bitbucket Server REST API.
OpenGraph / generic: Falls back to fetching the HTML page and extracting <meta property="og:…"> tags (plus <title> and Twitter card meta tags) to build a rich preview card.
Security:
Private and loopback IP addresses are blocked by _is_safe_url() to prevent Server-Side Request Forgery (SSRF).
Only http and https schemes are accepted.
A 5-second timeout and a read limit (64 KiB by default, 512 KiB for Bitbucket file content) are applied to outgoing requests to prevent resource exhaustion.
Caching:
Results are cached in an in-process FIFO dictionary (_CACHE) with a
maximum of 200 entries. This is intentionally simple — cache entries are
never invalidated, and the cache is lost on server restart.
Module-level attributes
- _CACHE (dict)
In-process preview result cache. Keys are URL strings; values are the result dicts returned by fetch_preview().
- _CACHE_MAX (int)
Maximum number of entries in _CACHE before the oldest entry is evicted (FIFO).
- _TIMEOUT (int)
HTTP request timeout in seconds (5).
- _HEADERS (dict)
Request headers sent with all outgoing HTTP requests, including a browser-like User-Agent to avoid bot-detection blocks.
- _PRIVATE_RANGES (re.Pattern)
Regex that matches hostnames known to be private or loopback addresses (used as a fast pre-filter before the DNS-based _resolves_to_public_ip check).
- minimost.preview.is_text_filename(name)[source]
Return True if name (a basename) denotes a previewable text file.
Matches by extension (_TEXT_EXTENSIONS), by exact filename (_TEXT_FILENAMES), or by the jenkinsfile prefix, which covers Jenkinsfile, Jenkinsfile.prod and similar (case-insensitive).
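The three matching rules above can be sketched roughly as follows. The set contents here are hypothetical placeholders (the real _TEXT_EXTENSIONS and _TEXT_FILENAMES values are not listed in this documentation), and lowercasing the whole name before every comparison is an assumption.

```python
import os

# Hypothetical contents: the module's real _TEXT_EXTENSIONS and
# _TEXT_FILENAMES sets are not shown in this documentation.
TEXT_EXTENSIONS = {".py", ".js", ".md", ".txt", ".yml"}
TEXT_FILENAMES = {"makefile", "dockerfile"}


def is_text_filename_sketch(name: str) -> bool:
    """Return True if the basename looks like a previewable text file."""
    lowered = name.lower()  # assumption: all three checks ignore case
    _, ext = os.path.splitext(lowered)
    return (
        ext in TEXT_EXTENSIONS                 # match by extension
        or lowered in TEXT_FILENAMES           # match by exact filename
        or lowered.startswith("jenkinsfile")   # Jenkinsfile, Jenkinsfile.prod, ...
    )
```

With these placeholder sets, "chat.py", "Makefile" and "Jenkinsfile.prod" are accepted, while "photo.png" is rejected.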
- class minimost.preview._MetaParser[source]
Bases: HTMLParser
Streaming HTML parser that extracts page metadata for link previews.
Parses the <head> section of an HTML document and collects metadata from the following sources (in priority order via dict.setdefault):
<meta property="og:*"> (OpenGraph protocol tags)
<meta name="twitter:title|description|image"> (Twitter Card tags)
<meta name="description"> (generic description tag)
<title> (plain HTML title element, lowest priority)
Parsing stops immediately when the <body> tag is encountered, since all relevant metadata is in the <head>. This minimises memory usage for large pages.
Attributes
- og (dict)
Collected metadata keyed by OpenGraph property name (without the og: prefix), e.g. {"title": "...", "description": "...", "image": "..."}.
Example:
parser = _MetaParser()
parser.feed('<head><meta property="og:title" content="Hello"></head>')
assert parser.title == "Hello"
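For orientation, a parser with this behaviour could be built on html.parser.HTMLParser roughly as shown here. This is a sketch under assumptions, not the module's code: the class name MetaParserSketch and the private attribute names are made up, and because dict.setdefault keeps the first value seen for a key, the effective priority in this sketch depends on the order in which tags appear in the page.

```python
from html.parser import HTMLParser


class MetaParserSketch(HTMLParser):
    """Illustrative stand-in for _MetaParser (names are hypothetical)."""

    def __init__(self):
        super().__init__()
        self.og = {}            # metadata keyed without the "og:" prefix
        self._title_text = ""   # text collected inside <title>
        self._in_title = False
        self._done = False      # set once <body> is seen

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "body":
            self._done = True   # ignore everything after <head>
            return
        if tag == "title":
            self._in_title = True
            return
        if tag != "meta":
            return
        attr = dict(attrs)
        content = attr.get("content")
        if not content:
            return
        prop = attr.get("property") or ""
        name = attr.get("name") or ""
        if prop.startswith("og:"):
            self.og.setdefault(prop[3:], content)
        elif name in ("twitter:title", "twitter:description", "twitter:image"):
            self.og.setdefault(name[len("twitter:"):], content)
        elif name == "description":
            self.og.setdefault("description", content)

    def handle_data(self, data):
        if self._in_title and not self._done:
            self._title_text += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    @property
    def title(self):
        # Prefer the <meta> title, then fall back to the <title> element.
        return self.og.get("title") or self._title_text.strip()
```

HTMLParser.feed() has no built-in way to abort mid-document, so the sketch uses a flag to skip all later callbacks instead of truly stopping the parse.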
- __init__()[source]
Initialize and reset this instance.
If convert_charrefs is True (the default), all character references are automatically converted to the corresponding Unicode characters.
- handle_starttag(tag, attrs)[source]
Process an opening HTML tag.
Stops processing entirely once <body> is seen. Extracts content from <meta> tags according to the priority rules described in the class docstring. Sets an internal flag when <title> is opened.
- handle_data(data)[source]
Accumulate text data inside <title> elements.
- Parameters:
data (str) – Raw text content from the parser.
- handle_endtag(tag)[source]
Clear the title-tracking flag when the </title> tag is seen.
- Parameters:
tag (str) – Lowercase tag name.
- property title
The best available title string.
Returns the OpenGraph/Twitter title if one was found in <meta> tags, otherwise falls back to the content of the <title> element.
- Return type:
str
- minimost.preview._is_safe_url(url)[source]
Check that a URL does not point to a private or loopback address.
Parses the hostname from url and tests it against _PRIVATE_RANGES. This is the primary SSRF (Server-Side Request Forgery) mitigation: it prevents the preview endpoint from being used to probe internal network services.
Blocked address patterns:
localhost
127.x.x.x (loopback)
10.x.x.x (RFC 1918 private)
172.16–31.x.x (RFC 1918 private)
192.168.x.x (RFC 1918 private)
::1 (IPv6 loopback)
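The blocked patterns listed above could be expressed as a single regular expression along these lines. The pattern and the function name are hypothetical (the actual _PRIVATE_RANGES regex is not shown in this documentation), and treating a URL with no hostname as unsafe is an assumption.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern covering the documented blocked hosts; the module's
# real _PRIVATE_RANGES regex may differ.
PRIVATE_RANGES = re.compile(
    r"^(localhost"
    r"|127\.\d{1,3}\.\d{1,3}\.\d{1,3}"              # loopback
    r"|10\.\d{1,3}\.\d{1,3}\.\d{1,3}"               # RFC 1918
    r"|172\.(1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3}"   # RFC 1918 (172.16-31)
    r"|192\.168\.\d{1,3}\.\d{1,3}"                  # RFC 1918
    r"|::1)$",                                      # IPv6 loopback
    re.IGNORECASE,
)


def is_safe_url_sketch(url: str) -> bool:
    """Return False for URLs whose hostname matches a blocked pattern."""
    host = urlparse(url).hostname  # strips the port and IPv6 brackets
    if not host:
        return False  # assumption: no hostname means "not safe"
    return PRIVATE_RANGES.match(host) is None
```

A hostname check of this kind only sees the literal host string. A public-looking name that resolves to a private address passes it, which is why a DNS-based check is documented separately.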
- minimost.preview._resolves_to_public_ip(hostname)[source]
Return True if hostname resolves only to public IP addresses.
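One plausible way to implement such a check uses socket.getaddrinfo together with the ipaddress module. The function name and the use of is_global as the definition of "public" are assumptions; the module's own criteria are not documented here.

```python
import ipaddress
import socket


def resolves_to_public_ip_sketch(hostname: str) -> bool:
    """Return True only if every resolved address is globally routable."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        return False  # assumption: unresolvable hosts are treated as unsafe
    if not infos:
        return False
    for info in infos:
        # info[4] is the sockaddr tuple; its first item is the address.
        # Drop any IPv6 zone suffix such as "%eth0" before parsing.
        address = info[4][0].split("%", 1)[0]
        if not ipaddress.ip_address(address).is_global:
            return False
    return True
```

Requiring every returned address to be public, rather than just one, avoids accepting a hostname that has both a public and a private record.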
- minimost.preview._fetch(url, max_bytes=65536)[source]
Fetch the body of an HTTP/HTTPS URL with safety limits.
Sends a GET request using _HEADERS (browser-like User-Agent) with a timeout of _TIMEOUT seconds. Reads at most max_bytes bytes from the response body.
Only http and https schemes are accepted; any other scheme raises ValueError.
- Parameters:
url (str) – The URL to fetch.
max_bytes (int) – Maximum number of bytes to read from the response body (default 65536).
- Returns:
Raw response body bytes.
- Return type:
bytes
- Raises:
ValueError – If the URL scheme is not http or https.
urllib.error.URLError – If the request fails (network error, DNS failure, etc.).
urllib.error.HTTPError – If the server returns a non-2xx status.
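A fetch helper with these limits might look like the following sketch, written against urllib.request because the documented exceptions come from urllib.error. The header value and the names fetch_sketch, TIMEOUT and HEADERS are placeholders, not the module's actual values.

```python
import urllib.request
from urllib.parse import urlparse

TIMEOUT = 5  # seconds, matching the documented _TIMEOUT
# Placeholder header value; the real _HEADERS content is not documented.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; preview-sketch)"}


def fetch_sketch(url: str, max_bytes: int = 65536) -> bytes:
    """Fetch at most max_bytes of an http(s) response body."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    request = urllib.request.Request(url, headers=HEADERS)
    # urlopen raises urllib.error.HTTPError / URLError on failure.
    with urllib.request.urlopen(request, timeout=TIMEOUT) as response:
        return response.read(max_bytes)
```

Checking the scheme before building the request matters because urllib.request would otherwise also accept schemes such as file:.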
- minimost.preview._build_code_result(raw, filepath, line_start, line_end, url)[source]
Build a code preview result dict from raw file text.
Shared by both Bitbucket Cloud and Bitbucket Server preview functions. Slices the file content to show a relevant snippet and annotates it with metadata needed for client-side syntax highlighting and line-number display.
Snippet selection:
If line_start is provided: shows the highlighted line(s) plus ±3 lines of context.
If line_start is None: shows the first 25 lines of the file.
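The snippet selection rules above can be sketched as a small helper. The function name select_snippet and its context and head parameters are illustrative, and passing line_end through unchanged as highlight_end is an assumption.

```python
def select_snippet(raw, line_start=None, line_end=None, context=3, head=25):
    """Pick the lines to show and report where the snippet starts."""
    lines = raw.splitlines()
    total = len(lines)
    if line_start is None:
        # No highlight requested: show the top of the file.
        first, last = 1, min(head, total)
    else:
        end = line_end if line_end is not None else line_start
        # Highlighted range plus `context` lines on each side, clamped
        # to the bounds of the file.
        first = max(1, line_start - context)
        last = min(total, end + context)
    return {
        "first_line_num": first,
        "highlight_start": line_start,
        "highlight_end": line_end,
        "code": "\n".join(lines[first - 1:last]),
        "total_lines": total,
    }
```

For a 100-line file with lines 50 to 60 highlighted, this yields a snippet covering lines 47 to 63, so first_line_num is 47.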
- Parameters:
raw (str) – The full raw text content of the file (UTF-8 decoded).
filepath (str) – The file path within the repository (e.g. "src/minimost/chat.py").
line_start (int or None) – 1-based start line to highlight, or None for no highlighting.
line_end (int or None) – 1-based end line to highlight (inclusive), or None if only one line is highlighted.
url (str) – The original browser URL that triggered the preview, used to link back from the preview card.
- Returns:
A code preview dict with keys:
type (str): Always "code".
filename (str): Basename of the file.
filepath (str): Full repository path.
language (str): File extension (lowercase), used for syntax highlighting (e.g. "py", "js").
first_line_num (int): Line number of the first line in the code snippet (1-based).
highlight_start (int or None): First highlighted line.
highlight_end (int or None): Last highlighted line.
code (str): Newline-joined snippet text.
total_lines (int): Total number of lines in the full file.
url (str): Original browser URL.
- Return type:
dict
- minimost.preview._parse_bb_cloud(url)[source]
Parse a Bitbucket Cloud file URL into its components.
Accepts URLs of the form:
https://bitbucket.org/{workspace}/{repo}/src/{ref}/{path}[#lines-N[:M]]
- Parameters:
url (str) – The Bitbucket Cloud URL to parse.
- Returns:
A tuple (workspace, repo, ref, filepath, line_start, line_end) if the URL matches, or None if it does not.
workspace (str): Bitbucket workspace/organization slug.
repo (str): Repository slug.
ref (str): Git ref (branch, tag, or commit SHA).
filepath (str): Path to the file within the repository.
line_start (int or None): 1-based start line from the fragment.
line_end (int or None): 1-based end line from the fragment.
- Return type:
tuple or None
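As an illustration of the URL shape described above, a parser could be written with urllib.parse and two regular expressions. This is a sketch with a hypothetical function name. It assumes the ref contains no slash (a branch such as feature/x would be split incorrectly) and requires the host to be exactly bitbucket.org.

```python
import re
from urllib.parse import urlparse

_PATH_RE = re.compile(r"^/([^/]+)/([^/]+)/src/([^/]+)/(.+)$")
_LINES_RE = re.compile(r"^lines-(\d+)(?::(\d+))?$")


def parse_bb_cloud_sketch(url):
    """Split a Bitbucket Cloud file URL into its components, or return None."""
    parts = urlparse(url)
    if parts.hostname != "bitbucket.org":
        return None
    match = _PATH_RE.match(parts.path)
    if not match:
        return None
    workspace, repo, ref, filepath = match.groups()
    line_start = line_end = None
    fragment = _LINES_RE.match(parts.fragment)  # "#lines-N" or "#lines-N:M"
    if fragment:
        line_start = int(fragment.group(1))
        if fragment.group(2):
            line_end = int(fragment.group(2))
    return workspace, repo, ref, filepath, line_start, line_end
```

The workspace and repository names used in any call to this helper (for example "acme" and "minimost") are placeholders.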
- minimost.preview._bitbucket_cloud_preview(url)[source]
Generate a code preview for a Bitbucket Cloud file URL.
Calls _parse_bb_cloud() to validate and decompose the URL, then fetches up to 512 KiB of the raw file content from the Bitbucket Cloud REST API:
https://api.bitbucket.org/2.0/repositories/{workspace}/{repo}/src/{ref}/{path}
Passes the raw text to _build_code_result() to produce the final preview dict.
- Parameters:
url (str) – A Bitbucket Cloud file browser URL.
- Returns:
A code preview dict (see _build_code_result()) on success, or {} if the URL does not match or the API call fails.
- Return type:
dict
- minimost.preview._parse_bb_server(url)[source]
Parse a Bitbucket Server / Data Center file URL into its components.
Accepts URLs of the form:
http(s)://{host}/projects/{PROJECT}/repos/{repo}/browse/{path}[#{start}-{end}]
The URL scheme (http or https) is preserved in the returned base URL so that plain-HTTP self-hosted instances work correctly.
- Parameters:
url (str) – The Bitbucket Server URL to parse.
- Returns:
A tuple (base, project, repo, filepath, line_start, line_end) if the URL matches, or None if it does not.
base (str): Scheme and host, e.g. "https://bitbucket.example.com".
project (str): Project key.
repo (str): Repository slug.
filepath (str): Path to the file within the repository.
line_start (int or None): 1-based start line from the fragment.
line_end (int or None): 1-based end line from the fragment.
- Return type:
tuple or None
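A parser for the browse URL shape described above could look like this sketch. The function name is hypothetical, and the pattern assumes Bitbucket is served from the root of the host; an instance deployed under a context path such as /bitbucket would not match.

```python
import re
from urllib.parse import urlparse

_BROWSE_RE = re.compile(r"^/projects/([^/]+)/repos/([^/]+)/browse/(.+)$")
_FRAGMENT_RE = re.compile(r"^(\d+)(?:-(\d+))?$")


def parse_bb_server_sketch(url):
    """Split a Bitbucket Server browse URL into its components, or return None."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    match = _BROWSE_RE.match(parts.path)
    if not match:
        return None
    project, repo, filepath = match.groups()
    # Keep the original scheme so plain-HTTP instances keep working.
    base = f"{parts.scheme}://{parts.netloc}"
    line_start = line_end = None
    fragment = _FRAGMENT_RE.match(parts.fragment)  # "#12" or "#12-20"
    if fragment:
        line_start = int(fragment.group(1))
        if fragment.group(2):
            line_end = int(fragment.group(2))
    return base, project, repo, filepath, line_start, line_end
```

Using netloc rather than hostname keeps an explicit port such as :7990 in the base URL.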
- minimost.preview._bitbucket_server_preview(url)[source]
Generate a code preview for a Bitbucket Server / Data Center file URL.
Calls _parse_bb_server() to validate and decompose the URL, then fetches up to 512 KiB of the raw file content from the Bitbucket Server REST API:
{scheme}://{host}/rest/api/1.0/projects/{PROJECT}/repos/{repo}/raw/{path}
The URL scheme is inherited from the original URL, so self-hosted HTTP instances work without modification.
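To make the mapping from browse URL to REST URL concrete, a minimal sketch of the URL construction is shown here. The helper name is made up, and the percent-encoding step is an assumption that expects already-decoded components; whether the module encodes path segments is not stated in this documentation.

```python
from urllib.parse import quote


def bb_server_raw_url(base, project, repo, filepath):
    """Build the raw-content REST URL from parsed browse-URL components."""
    # Assumption: inputs are decoded strings. Quoting an already
    # percent-encoded path would double-encode it.
    return (
        f"{base}/rest/api/1.0"
        f"/projects/{quote(project, safe='')}"
        f"/repos/{quote(repo, safe='')}"
        f"/raw/{quote(filepath)}"  # quote() keeps "/" by default
    )
```

Because base already carries the scheme and host from the original URL, a plain-HTTP instance produces a plain-HTTP REST URL.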
- Parameters:
url (str) – A Bitbucket Server browse URL.
- Returns:
A code preview dict (see _build_code_result()) on success, or {} if the URL does not match or the API call fails.
- Return type:
dict
- minimost.preview._og_preview(url)[source]
Generate a generic OpenGraph preview for any web URL.
Fetches up to 64 KiB of the page HTML (the default _fetch() limit) and parses it with _MetaParser. If no title can be extracted, returns {}, since a preview card without a title is not useful to render.
Title and description are capped at 200 and 400 characters respectively to prevent excessively large preview cards.
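The card-building step described above might be sketched as follows, taking already-parsed metadata as input. The helper name build_og_card is hypothetical; truncating by plain slicing (no ellipsis) and using an empty string for a missing image or description are assumptions.

```python
from urllib.parse import urlparse


def build_og_card(meta, page_title, url):
    """Turn parsed metadata into an "og" preview dict, or {} without a title."""
    title = (meta.get("title") or page_title or "").strip()
    if not title:
        return {}  # no title, no card
    return {
        "type": "og",
        "title": title[:200],                                  # capped at 200 chars
        "description": (meta.get("description") or "")[:400],  # capped at 400 chars
        "image": meta.get("image", ""),
        "domain": urlparse(url).hostname or "",
        "url": url,
    }
```

Here meta plays the role of the parser's og dict and page_title the content of the <title> element.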
- minimost.preview._text_file_preview(url)[source]
Generate a code preview for a direct link to a text/source file.
Checks the URL path's file extension (or filename) against _TEXT_EXTENSIONS / _TEXT_FILENAMES. If it matches, fetches the raw content and passes it through _build_code_result().
- minimost.preview.fetch_preview(url)[source]
Return a preview dict for a URL, using the cache when available.
This is the main entry point called by the /link_preview route in minimost.chat.
Strategy (tried in order):
Return the cached result if url is already in _CACHE.
Reject the URL if the scheme is not http/https, or if _is_safe_url() returns False (SSRF protection).
Try _bitbucket_cloud_preview() if the host is bitbucket.org.
Try _bitbucket_server_preview() if the URL matches the Bitbucket Server path pattern.
Fall back to _og_preview() for any other URL.
Cache the result (even {}) and return it.
FIFO cache eviction:
When the cache reaches _CACHE_MAX entries, the oldest entry is removed by deleting the first key from the dictionary (relies on Python 3.7+ insertion-ordered dicts).
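The eviction rule can be illustrated with a plain dict, relying on insertion order as described. The names CACHE, cache_put and cached_preview are illustrative, and generate stands in for whichever preview strategy applies.

```python
CACHE = {}
CACHE_MAX = 200  # matches the documented maximum


def cache_put(url, result):
    """Store a result, evicting the oldest entry once the cache is full."""
    if url not in CACHE and len(CACHE) >= CACHE_MAX:
        oldest = next(iter(CACHE))  # first inserted key (dicts keep order)
        del CACHE[oldest]
    CACHE[url] = result


def cached_preview(url, generate):
    """Return the cached result for url, or compute and cache it."""
    if url in CACHE:
        return CACHE[url]
    result = generate(url)
    cache_put(url, result)  # empty results ({}) are cached too
    return result
```

Caching empty results means a URL that produced no preview is not fetched again, at the cost of never retrying it until the process restarts.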
Preview Type Reference
Code preview (returned for Bitbucket URLs):
{
"type": "code",
"filename": "chat.py", # basename of the file
"filepath": "src/chat.py", # full path within the repo
"language": "py", # file extension (for highlighting)
"first_line_num": 47, # line number of the first snippet line
"highlight_start": 50, # first highlighted line (1-based), or None
"highlight_end": 60, # last highlighted line (1-based), or None
"code": "def send(channel):\n...", # snippet text
"total_lines": 616, # total lines in the full file
"url": "https://bitbucket.org/...", # original URL
}
OpenGraph preview (returned for generic web pages):
{
"type": "og",
"title": "Page Title", # truncated to 200 chars
"description": "...", # truncated to 400 chars
"image": "https://...", # og:image URL
"domain": "example.com", # hostname only
"url": "https://example.com/page", # original URL
}
No preview available:
{}
Supported Bitbucket URL Formats
Bitbucket Cloud:
https://bitbucket.org/{workspace}/{repo}/src/{ref}/{path}
https://bitbucket.org/{workspace}/{repo}/src/{ref}/{path}#lines-N
https://bitbucket.org/{workspace}/{repo}/src/{ref}/{path}#lines-N:M
Bitbucket Server / Data Center:
https://{host}/projects/{PROJECT}/repos/{repo}/browse/{path}
https://{host}/projects/{PROJECT}/repos/{repo}/browse/{path}#{line}
https://{host}/projects/{PROJECT}/repos/{repo}/browse/{path}#{start}-{end}
http://{host}/projects/{PROJECT}/repos/{repo}/browse/{path} (plain HTTP also works)