CSS Scraper Provider (`css_scraper`)
The CSS Scraper is a versatile provider that can extract a price from any public webpage using a CSS selector. It is one of the Asset Providers and does not rely on a specific financial data API.
User Guide: CSS Scraper - User Manual
How it Works
- Configuration: When assigning this provider to an asset, you must provide:
    - `identifier`: The URL of the webpage to scrape.
    - `identifier_type`: `URL`
    - `provider_params`:
        - `current_css_selector` (required): The CSS selector to locate the price element on the page (e.g., `#sp-last`, `.price-value`).
        - `currency` (required): The currency of the price (ISO 4217).
        - `decimal_format` (optional): `us` (e.g., `1,234.56`) or `eu` (e.g., `1.234,56`). Default: `us`.
        - `timeout` (optional): HTTP request timeout in seconds. Default: `30`.
        - `user_agent` (optional): Custom User-Agent header. Default: `LibreFolio/1.0`.
- Execution:
    - Fetches the HTML of the specified URL via `httpx` (async HTTP client).
    - Uses BeautifulSoup to parse the HTML and find the element matching the CSS selector.
    - Extracts the text content and parses it into a `Decimal` value, handling different number formats based on `decimal_format`.
- `get_asset_url()`: Returns the `identifier` URL itself (the page being scraped).
- `params_schema`: Exposes all 5 configuration fields for dynamic form generation in the frontend.
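As a rough illustration of the configuration and the selector step described above, the sketch below fills in the documented defaults and extracts the text of the first element matching `current_css_selector`. The helper names (`with_defaults`, `extract_price_text`), the example URL, and the sample HTML are hypothetical and are not taken from LibreFolio's code; only the parameter names and default values come from the list above.

```python
from bs4 import BeautifulSoup

# Documented defaults for the optional provider_params fields.
DEFAULTS = {"decimal_format": "us", "timeout": 30, "user_agent": "LibreFolio/1.0"}
REQUIRED = ("current_css_selector", "currency")


def with_defaults(provider_params: dict) -> dict:
    """Check required fields and fill in defaults (hypothetical helper)."""
    missing = [key for key in REQUIRED if not provider_params.get(key)]
    if missing:
        raise ValueError(f"missing required provider_params: {', '.join(missing)}")
    return {**DEFAULTS, **provider_params}


def extract_price_text(html: str, css_selector: str) -> str:
    """Return the stripped text of the first element matching the selector."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one(css_selector)
    if element is None:
        raise ValueError(f"no element matches selector {css_selector!r}")
    return element.get_text(strip=True)


# Hypothetical asset assignment; the URL and selector are placeholders.
asset = {
    "identifier": "https://example.com/quote/ACME",
    "identifier_type": "URL",
    "provider_params": {"current_css_selector": "#sp-last", "currency": "USD"},
}
params = with_defaults(asset["provider_params"])

# Static HTML stands in for the fetched page so the sketch needs no network.
sample_html = '<html><body><span id="sp-last"> 1,234.56 </span></body></html>'
price_text = extract_price_text(sample_html, params["current_css_selector"])
```

Raising an error when the selector matches nothing is a design choice of this sketch; it makes a changed page layout fail loudly rather than return an empty value.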
Decimal Format Remapping
The `decimal_format` parameter controls how the scraped text is parsed into a number:
| Format | Input Example | Parsed Value |
|---|---|---|
| `us` (default) | `1,234.56` | `1234.56` |
| `eu` | `1.234,56` | `1234.56` |
Group separators (`,` for `us`, `.` for `eu`) are removed first. The parser then strips all remaining non-numeric characters except the decimal separator and converts the result to `Decimal`.
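The parsing rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the provider's actual parser: the function name `parse_price` and the `SEPARATORS` table are hypothetical. The final step, mapping the `eu` decimal comma to a point, is included because Python's `Decimal` constructor only accepts `.` as the decimal point.

```python
import re
from decimal import Decimal, InvalidOperation

# (group separator, decimal separator) for each documented format.
SEPARATORS = {"us": (",", "."), "eu": (".", ",")}


def parse_price(text: str, decimal_format: str = "us") -> Decimal:
    """Parse scraped text such as '$1,234.56' or '1.234,56 €' into a Decimal."""
    if decimal_format not in SEPARATORS:
        raise ValueError(f"unknown decimal_format: {decimal_format!r}")
    group_sep, decimal_sep = SEPARATORS[decimal_format]

    # 1. Remove group separators first.
    cleaned = text.replace(group_sep, "")
    # 2. Strip everything that is not a digit or the decimal separator
    #    (currency symbols, whitespace, letters).
    cleaned = re.sub(rf"[^0-9{re.escape(decimal_sep)}]", "", cleaned)
    # 3. Decimal() expects '.' as the decimal point, so remap ',' for 'eu'.
    cleaned = cleaned.replace(decimal_sep, ".")

    try:
        return Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"could not parse a number from {text!r}") from exc
```

Because every non-numeric character is stripped, a leading minus sign is dropped as well in this sketch. Text with no digits at all, such as `n/a`, raises `ValueError` instead of yielding a value.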
Caching & Performance
- No response caching: Each `get_current_value()` call performs a fresh HTTP request. This is intentional: scraped data may change frequently and the provider cannot predict staleness.
- Connection pooling: Uses `httpx.AsyncClient` with connection reuse across requests.
- Timeout handling: Configurable per-asset via the `timeout` parameter. The 30 s default prevents blocking on slow sites.
Use Cases
- Tracking the price of an asset from a financial news website.
- Scraping data from a niche market data provider that doesn't have an API.
- Tracking the value of a collectible from an auction site.
Limitations
- No Historical Data: `supports_history = False`. It can only fetch the current value.
- Fragile: If the website's layout changes, the CSS selector may break. Use the probe endpoint to test before saving.
- Requires Public Access: It cannot access pages that require a login.
- Rate limits: No built-in rate limiting. High-frequency sync may trigger the target site's anti-bot protection.
Related Documentation
- CSS Scraper - User Guide: End-user configuration guide
- Providers Overview: All available providers
- Asset Architecture: Sync pipeline and price queries
- Asset Plugin Guide: How to create a new provider