Parsing From Site

A supplier's price list often lacks the data needed to create full-fledged products on your site: images, attributes, product descriptions, or even product names.

For the plugin, however, it is enough that the price list contains links to the product pages. If the price list has no links, you can add them manually.

Here is an example of a very simple price list:

[Image: price list example for product import]

Adding links to the price list manually is not as hard as it might seem at first glance. Usually, 500 to 800 links can be added in one working day without much stress. Another way to collect the donor site's product URLs is to use its sitemap, if one is available.
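If the donor site publishes a standard XML sitemap, collecting the product links can be partly automated. Below is a minimal sketch in Python, not part of the plugin. The function name `urls_from_sitemap`, the `donor.example` addresses, and the `/product/` filter are assumptions made for illustration; a real donor site may organize its URLs differently.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text, must_contain=""):
    """Return every <loc> URL in the sitemap that contains `must_contain`."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        if url and must_contain in url:
            urls.append(url)
    return urls


# Hypothetical sitemap content, for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://donor.example/product/backpack-1</loc></url>
  <url><loc>https://donor.example/about</loc></url>
  <url><loc>https://donor.example/product/backpack-2</loc></url>
</urlset>"""

product_urls = urls_from_sitemap(SAMPLE_SITEMAP, must_contain="/product/")
```

The collected URLs still have to be matched to the right rows of the price list, so that each link ends up next to its SKU.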

The main goal is to tie each product SKU to its product link, so that the product SKUs in your store match the SKUs in the supplier's price list. This is necessary so that the plugin can later update product prices and quantities automatically from the price lists.

How to set the parser to collect data

The parser's task is to find the desired fragment in the page source code using your settings. There are only four settings:

1. The column number containing the product page link (in this case, column number 4)

2. Parsing parameters: two texts in the page code, between which the desired fragment is located. In our example, the product name "Рюкзак Walker Ray Hype Black, 30x50x16 см" is between parameters content=" and "

so the parsing parameters will be like this:

With these parameters alone, however, the plugin will extract the text website, because that text also sits between the same parameters and occurs earlier in the page code than the product name Рюкзак Walker Ray Hype Black, 30x50x16 см (see page code).

3. "Keytext to begin, end parsing" refines the parsing. These texts tell the plugin which piece of the page code to cut out, so that the parsing parameters are searched for only inside that fragment.

If the fragment runs from the text property="og:title to the end of the page, the parsing result will be correct, because the search for the parameters starts at the text property="og:title:

4. Instead of "Keytext to begin, end parsing", you can use "The number of occurrences" for the first parameter. In our example, the first parameter, the text content=", occurs twice in the page code, and the product name follows the second occurrence, so setting the number of occurrences to 2 will also give the correct result.
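The logic behind settings 2 to 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the plugin's actual code: the function `extract_between`, its argument names, and the two-line `PAGE` sample are hypothetical, and the sample only imitates the page code from the example above.

```python
def extract_between(page, start, end, key_begin="", key_end="", occurrence=1):
    """Return the text between `start` and `end`, or None if it is not found."""
    if occurrence < 1:
        raise ValueError("occurrence must be 1 or greater")
    # "Keytext to begin, end parsing": narrow the search window first.
    if key_begin:
        pos = page.find(key_begin)
        if pos == -1:
            return None
        page = page[pos:]
    if key_end:
        pos = page.find(key_end, len(key_begin))
        if pos != -1:
            page = page[:pos]
    # "The number of occurrences": skip to the requested match of `start`.
    idx = -1
    for _ in range(occurrence):
        idx = page.find(start, idx + 1)
        if idx == -1:
            return None
    begin = idx + len(start)
    stop = page.find(end, begin)
    return page[begin:stop] if stop != -1 else None


# Hypothetical page code imitating the example above.
PAGE = (
    '<meta property="og:type" content="website">\n'
    '<meta property="og:title" content="Рюкзак Walker Ray Hype Black, 30x50x16 см">\n'
)

plain = extract_between(PAGE, 'content="', '"')
by_keytext = extract_between(PAGE, 'content="', '"', key_begin='property="og:title')
by_count = extract_between(PAGE, 'content="', '"', occurrence=2)
```

Here `plain` is the unwanted text website, while `by_keytext` and `by_count` both contain the product name. When a page has many similar fragments, a key text is usually the more robust choice, because an occurrence count breaks as soon as the donor site adds one more matching tag above the target.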

Parsing a password-protected website

If you are an official partner of the supplier and the supplier's website is closed to unauthorized visitors, you will have a login and password for accessing the site pages.

This means that the supplier's site can be parsed. More precisely, the parsing is done not by you but by your server (the website on which the plugin is installed), using the supplier's form in which you specified the parsing parameters.

Set the login and password for the "closed" site in the supplier's form, on the "Cron" tab, in the "Main task" section. Do this even if you don't use cron.

Cron jobs can be disabled:

Parsing a price list cell

Sometimes suppliers put product attributes not in separate price list columns but in the "Description" column, as a table under the description text:

The plugin can parse this text into attributes in the same way as it parses a web page. The settings are as follows:

1. The price list column number containing such descriptions should be set here:

2. Enable adding product attributes:

3. Set up the parsing parameters:
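The idea of cutting attributes out of a description cell can be sketched as follows. This is a minimal Python illustration, not the plugin's code. It assumes that the cell holds a small HTML table after the description text; the real layout of your supplier's cell may differ, and the names `between`, `attributes_from_cell`, `CELL`, and `ATTRIBUTE_PARAMS` are hypothetical.

```python
def between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)
    if j == -1:
        return None
    return text[i:j]


def attributes_from_cell(cell, params):
    """Apply one pair of parsing parameters per attribute to the cell text."""
    result = {}
    for name, (start, end) in params.items():
        value = between(cell, start, end)
        if value is not None:
            result[name] = value.strip()
    return result


# Hypothetical "Description" cell: text followed by an attribute table.
CELL = (
    "Roomy city backpack."
    "<table>"
    "<tr><td>Color</td><td>Black</td></tr>"
    "<tr><td>Volume</td><td>24 l</td></tr>"
    "</table>"
)

# Hypothetical parsing parameters: (text before the value, text after it).
ATTRIBUTE_PARAMS = {
    "Color": ("<td>Color</td><td>", "</td>"),
    "Volume": ("<td>Volume</td><td>", "</td>"),
}

attributes = attributes_from_cell(CELL, ATTRIBUTE_PARAMS)
```

Each attribute gets its own pair of parameters, just as each parsed field of a web page does.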

Parsing. Troubleshooting

File parsing

The need to parse files arises when the donor site refuses to serve its web pages to your server because some kind of parsing protection is applied.

You should not suspect the donor site of malicious intent. Most likely it took your parsing attempts for a DDoS attack, because when parsing, your server moves through the donor site's pages at a speed that an ordinary user browsing product pages from a computer cannot reach.

You can set the plugin to pause between page requests this way:

But if you see this text in the plugin report:

 The Product passed: Row ~= 6  url =  Site no answer 

it means that the parsing pause was set too late. Your server is already banned, and now you have to parse the donor pages as files.
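What a parsing pause does can be shown with a short Python sketch. It is an illustration only and does not reproduce the plugin's setting: the function `fetch_with_pause`, its `pause_seconds` argument, and the injected `fetch` and `sleep` callables are assumptions made so that the example runs without touching the network.

```python
import time


def fetch_with_pause(urls, fetch, pause_seconds=2.0, sleep=time.sleep):
    """Fetch each URL in turn, waiting `pause_seconds` between requests."""
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(pause_seconds)
        pages[url] = fetch(url)
    return pages


# Usage with a stand-in fetch function; a real one would download the page.
recorded_pauses = []
demo_pages = fetch_with_pause(
    ["page-1", "page-2", "page-3"],
    fetch=lambda url: "<html>" + url + "</html>",
    pause_seconds=1.5,
    sleep=recorded_pauses.append,  # record the pauses instead of waiting
)
```

The longer the pause, the closer the request rate is to that of a human visitor, at the cost of a slower price list run.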

Step-by-step instructions:

1. Open the product page in your browser

2. In the page link, select the part that will become the file name, according to the following rules:

From the sign ? to the end of the link or to a period:

From the sign & to the end of the link or to a period:

From slash:

(if there is a slash at the end of the link, then before the slash).

3. Press these keys in sequence:

Ctrl+C, Ctrl+S, Ctrl+V, and Enter

The page will be saved as an HTML file on your computer:

4. Remove the extension from the file name (use "Group Rename"):

5. Upload the page files to the admin/uploads folder on the server.

6. Turn on this setting:

and save the supplier form.

7. Start price list processing.
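The file naming rules from step 2 can be sketched in Python as follows. This is one possible reading of those rules, not the plugin's code: it assumes that the name starts after the last ?, & or slash in the link, that a trailing slash is ignored, and that the name stops at the first period. The function name `file_name_for` and the `donor.example` links are hypothetical. Check the result against the file names the plugin actually expects before relying on it.

```python
def file_name_for(url):
    """Derive a file name from a product link (assumed reading of the rules)."""
    url = url.rstrip("/")  # a slash at the end of the link is ignored
    # Start after the last "?", "&" or "/" found in the link.
    cut = max(url.rfind("?"), url.rfind("&"), url.rfind("/"))
    tail = url[cut + 1:]
    # Stop at the first period, if there is one.
    return tail.split(".", 1)[0]


# Hypothetical links, for illustration only.
name_from_query = file_name_for(
    "https://donor.example/index.php?route=product/product&product_id=42"
)
name_from_path = file_name_for("https://donor.example/catalog/backpack-walker.html")
name_from_slash = file_name_for("https://donor.example/catalog/backpack-walker/")
```

With this reading, the first link gives product_id=42 and the other two give backpack-walker.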


You can specify proxy server addresses in the admin/model/catalog/suppler.php file. This helps avoid the problems associated with a ban.

By default, parsing through a proxy is disabled in the plugin, because it greatly slows down the work. To allow the use of a proxy, uncomment the relevant code (find the text $arr_proxi = array(); ):

Be sure to update the proxy server addresses.

Wrong page

If this message appears in the report when parsing a web page:

Parsing Product Name error: Row 12  Check your settings.

this does not necessarily indicate an error in the parsing parameters. The donor site may have returned the wrong page to your server instead of the requested one. For example, it may have asked for a captcha or (more often) sent some kind of nonsense.

To see exactly what the donor site sent, open the plugin file admin/model/catalog/suppler.php and insert the following code before the line return $body; (it occurs in two places):

$err = " Answer = " . $body ." \n"; 

Test this on a price list with a single product. The result can be seen in the Report (file admin/uploads/errors.tmp).

Usergio Copyright © 2022