14 Commits

Author SHA1 Message Date
c7d71a9ec2 Merge branch 'master' of https://github.com/misskey-dev/summaly 2023-04-20 04:02:59 +00:00
994f420b46 4.0.2 2023-04-20 04:02:55 +00:00
5a3321a04f fix: allow legacy allowfullscreen (#9) 2023-04-20 12:41:11 +09:00
1bab7afee6 Merge branch 'master' of https://github.com/misskey-dev/summaly 2023-03-16 03:26:10 +00:00
441e8c22f9 v4.0.1 2023-03-16 03:26:00 +00:00
376bba9c61 fix: give null when oEmbed access fails (#8) 2023-03-16 12:22:23 +09:00
028b2fed2f fix README.md 2023-03-13 18:03:16 +00:00
90d5d0f33b Fix README.md 2023-03-13 18:02:07 +00:00
9e955d8d04 fix readme 2023-03-13 17:57:37 +00:00
a36652c859 v4.0.0 2023-03-13 17:53:29 +00:00
eab3766db9 feat: add oEmbed support (#6)
* feat: add oEmbed support

* more safelisted features

* fix the syntax

* Update README.md

* permissions

* names

* use player

* fix type error

* support width (for size ratio)

* test for type: video

* nullable width

* restore max height test

* ignored permissions

* restore autoplay

* Use WHATWG URL

---------

Co-authored-by: tamaina <tamaina@hotmail.co.jp>
2023-03-14 02:46:41 +09:00
51f3870e1f fix changelog 2023-02-12 15:02:56 +00:00
5684f116c9 v3.0.4 2023-02-12 14:44:43 +00:00
709ca51b6c v3.0.3 2023-02-12 14:37:37 +00:00
51 changed files with 4139 additions and 1642 deletions

View File

@ -1,6 +1,27 @@
4.0.2 / 2023-04-20
------------------
* Fixed an issue where YouTube could not be made fullscreen
4.0.1 / 2023-03-16
------------------
* When loading oEmbed fails, the contents of player are now set to null instead of raising an error
4.0.0 / 2023-03-14
------------------
* Limited support for oEmbed type=rich
* Plugin arguments are now WHATWG URL
3.0.4 / 2023-02-12
------------------
* Removed unnecessary dependencies
3.0.3 / 2023-02-12
------------------
* Allow requests to private IPs when agent is specified or agent is an empty object
3.0.2 / 2023-02-12
------------------
* Changed Fastify routing from '/' to '*'
* Changed Fastify routing from '/url' to '/'
3.0.1 / 2023-02-12
------------------

View File

@ -21,7 +21,7 @@ import { summaly } from 'summaly';
summaly(url[, opts])
```
As Fastify plugin:
As Fastify plugin:
(will listen `GET` of `/`)
```javascript
@ -51,60 +51,85 @@ npm run serve
``` typescript
interface IPlugin {
test: (url: URL.Url) => boolean;
summarize: (url: URL.Url) => Promise<Summary>;
test: (url: URL) => boolean;
summarize: (url: URL) => Promise<Summary>;
}
```
URLs are WHATWG URL objects since v4.
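As a rough illustration of the interface above, a plugin written against WHATWG URL might look like the following sketch. The `Player` and `Summary` types here are trimmed-down local stand-ins rather than summaly's real declarations, and `examplePlugin` and the `example.com` site are hypothetical names used only for illustration.

```typescript
// Hypothetical, trimmed-down stand-ins for summaly's Player and Summary types,
// declared locally so that this sketch is self-contained.
type Player = {
  url: string | null;
  width: number | null;
  height: number | null;
  allow: string[];
};

type Summary = {
  title: string | null;
  icon: string | null;
  description: string | null;
  thumbnail: string | null;
  player: Player;
  sitename: string | null;
};

// Same shape as the IPlugin interface shown above.
interface IPlugin {
  test: (url: URL) => boolean;
  summarize: (url: URL) => Promise<Summary>;
}

// Hypothetical plugin for an imaginary site served from example.com.
const examplePlugin: IPlugin = {
  // A WHATWG URL exposes `hostname` directly, so no legacy url.parse() is needed.
  test: (url) => url.hostname === 'example.com',
  summarize: async (url) => ({
    title: `Example page ${url.pathname}`,
    // Relative references are resolved with the two-argument URL constructor.
    icon: new URL('/favicon.ico', url.href).href,
    description: null,
    thumbnail: null,
    player: { url: null, width: null, height: null, allow: [] },
    sitename: 'Example',
  }),
};
```

The sketch uses the global `URL` class, which in Node.js is the same WHATWG implementation exported by `node:url`.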
### Returns
A Promise of an Object that contains properties below:
※ Almost all values are nullable. player shoud not be null.
※ Almost all values are nullable. player should not be null.
#### Root
| Property | Type | Description |
| :-------------- | :------- | :--------------------------------------- |
| **description** | *string* | The description of the web page |
| **icon** | *string* | The url of the icon of the web page |
| **sitename** | *string* | The name of the web site |
| **thumbnail** | *string* | The url of the thumbnail of the web page |
| **player** | *Player* | The player of the web page |
| **title** | *string* | The title of the web page |
| **url** | *string* | The url of the web page |
| Property | Type | Description |
| :-------------- | :------- | :------------------------------------------ |
| **description** | *string* | The description of the web page |
| **icon** | *string* | The url of the icon of the web page |
| **sitename** | *string* | The name of the web site |
| **thumbnail** | *string* | The url of the thumbnail of the web page |
| **oEmbed** | *OEmbedRichIframe* | The oEmbed rich iframe info of the web page |
| **player** | *Player* | The player of the web page |
| **title** | *string* | The title of the web page |
| **url** | *string* | The url of the web page |
#### Player
| Property | Type | Description |
| :-------------- | :------- | :--------------------------------------- |
| **url** | *string* | The url of the player |
| **width** | *number* | The width of the player |
| **height** | *number* | The height of the player |
| Property | Type | Description |
| :-------------- | :--------- | :---------------------------------------------- |
| **url** | *string* | The url of the player |
| **width** | *number* | The width of the player |
| **height** | *number* | The height of the player |
| **allow** | *string[]* | The names of the allowed permissions for iframe |
Currently the possible items in `allow` are:
* `autoplay`
* `clipboard-write`
* `fullscreen`
* `encrypted-media`
* `picture-in-picture`
See [Permissions Policy](https://developer.mozilla.org/en-US/docs/Web/HTTP/Permissions_Policy) on MDN for details.
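A consumer will typically turn the `allow` array into the `allow` attribute of an `<iframe>`. The following is a minimal sketch of one way to do that, assuming the consumer keeps its own list of permissions it is willing to grant. The function name `toAllowAttribute` and the `granted` parameter are hypothetical and not part of summaly.

```typescript
// Builds the value for an iframe's `allow` attribute from Player.allow.
// `granted` is the caller's own list of permissions it is willing to pass on;
// anything outside that list is dropped defensively.
function toAllowAttribute(allow: string[], granted: string[]): string {
  return allow
    .filter((name) => granted.includes(name))
    // Directives in a serialized permissions policy are separated by semicolons.
    .join('; ');
}

// Example: grant only autoplay and fullscreen.
const attr = toAllowAttribute(
  ['autoplay', 'clipboard-write', 'fullscreen'],
  ['autoplay', 'fullscreen'],
);
console.log(attr);
```

Filtering on the consumer side is a design choice rather than a requirement: it keeps the embedding page in control even if the set of names returned in `allow` grows in a later version.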
### Example
``` javascript
```javascript
import { summaly } from 'summaly';
const summary = await summaly('https://www.youtube.com/watch?v=NMIEAhH_fTU');
console.log(summary); // will be ... ↓
/*
console.log(summary);
```
will be ... ↓
```json
{
title: '【楽曲試聴】「Stage Bye Stage」(歌:島村卯月、渋谷凛、本田未央)',
icon: 'https://s.ytimg.com/yts/img/favicon-vfl8qSV2F.ico',
description: 'http://columbia.jp/idolmaster/ 2018年7月18日発売予定 THE IDOLM@STER CINDERELLA GIRLS CG STAR LIVE Stage Bye Stage 歌:島村卯月、渋谷凛、本田未央 COCC-17495CD1枚組 ¥1,200税 収録内容 Tr...',
thumbnail: 'https://i.ytimg.com/vi/NMIEAhH_fTU/maxresdefault.jpg',
player: {
url: 'https://www.youtube.com/embed/NMIEAhH_fTU',
width: 1280,
height: 720
"title": "【アイドルマスター】「Stage Bye Stage」(歌:島村卯月、渋谷凛、本田未央)",
"icon": "https://www.youtube.com/s/desktop/9318de79/img/favicon.ico",
"description": "Website▶https://columbia.jp/idolmaster/Playlist▶https://www.youtube.com/playlist?list=PL83A2998CF3BBC86D2018年7月18日発売予定THE IDOLM@STER CINDERELLA GIRLS CG STAR...",
"thumbnail": "https://i.ytimg.com/vi/NMIEAhH_fTU/maxresdefault.jpg",
"player": {
"url": "https://www.youtube.com/embed/NMIEAhH_fTU?feature=oembed",
"width": 200,
"height": 113,
"allow": [
"autoplay",
"clipboard-write",
"encrypted-media",
"picture-in-picture",
"web-share"
]
},
sitename: 'YouTube',
url: 'https://www.youtube.com/watch?v=NMIEAhH_fTU'
"sitename": "YouTube",
"sensitive": false,
"url": "https://www.youtube.com/watch?v=NMIEAhH_fTU"
}
*/
```
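Because `width` may be null while `height` is set, a consumer has to decide how to size the player. The sketch below follows the guidance in the oEmbed code comments in this change (prefer a fixed height over a fixed aspect ratio when the width is unknown). The `Layout` type and `chooseLayout` function are hypothetical names, and the `Player` type is a local stand-in for the one exported by summaly.

```typescript
// Local stand-in for summaly's Player type, declared so the sketch is self-contained.
type Player = {
  url: string | null;
  width: number | null;
  height: number | null;
  allow: string[];
};

// Hypothetical description of how the embedding page should size the iframe.
type Layout =
  | { kind: 'none' }
  | { kind: 'aspect-ratio'; ratio: number }
  | { kind: 'fixed-height'; height: number };

function chooseLayout(player: Player): Layout {
  // Without a URL or a usable height there is nothing to embed.
  if (player.url === null || player.height === null || player.height <= 0) {
    return { kind: 'none' };
  }
  // Width unknown (e.g. the provider used a percentage): keep the height fixed.
  if (player.width === null) {
    return { kind: 'fixed-height', height: player.height };
  }
  // Both dimensions known: preserve the aspect ratio.
  return { kind: 'aspect-ratio', ratio: player.width / player.height };
}
```

For the YouTube summary shown above (`width: 200`, `height: 113`) this yields an aspect-ratio layout, while a player with `width: null` and `height: 152` yields a fixed-height layout.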
Testing
@ -115,12 +140,8 @@ License
----------------------------------------------------------------
[MIT](LICENSE)
[npm-link]: https://www.npmjs.com/package/summaly
[npm-badge]: https://img.shields.io/npm/v/summaly.svg?style=flat-square
[mit]: http://opensource.org/licenses/MIT
[mit-badge]: https://img.shields.io/badge/license-MIT-444444.svg?style=flat-square
[travis-link]: https://travis-ci.org/syuilo/summaly
[travis-badge]: http://img.shields.io/travis/syuilo/summaly.svg?style=flat-square
[himasaku]: https://himasaku.net
[himawari-badge]: https://img.shields.io/badge/%E5%8F%A4%E8%B0%B7-%E5%90%91%E6%97%A5%E8%91%B5-1684c5.svg?style=flat-square
[sakurako-badge]: https://img.shields.io/badge/%E5%A4%A7%E5%AE%A4-%E6%AB%BB%E5%AD%90-efb02a.svg?style=flat-square

6
built/general.d.ts vendored
View File

@ -1,4 +1,4 @@
import * as URL from 'node:url';
import Summary from './summary.js';
declare const _default: (url: URL.Url, lang?: string | null) => Promise<Summary | null>;
import { URL } from 'node:url';
import type { default as Summary } from './summary.js';
declare const _default: (_url: URL | string, lang?: string | null) => Promise<Summary | null>;
export default _default;

View File

@ -1,11 +1,126 @@
import * as URL from 'node:url';
import { URL } from 'node:url';
import clip from './utils/clip.js';
import cleanupTitle from './utils/cleanup-title.js';
import { decode as decodeHtml } from 'html-entities';
import { head, scpaping } from './utils/got.js';
export default async (url, lang = null) => {
import { get, head, scpaping } from './utils/got.js';
import * as cheerio from 'cheerio';
/**
* Contains only the html snippet for a sanitized iframe as the thumbnail is
* mostly covered in OpenGraph instead.
*
* Width should always be 100%.
*/
async function getOEmbedPlayer($, pageUrl) {
const href = $('link[type="application/json+oembed"]').attr('href');
if (!href) {
return null;
}
const oEmbedUrl = (() => {
try {
return new URL(href, pageUrl);
}
catch {
return null;
}
})();
if (!oEmbedUrl) {
return null;
}
const oEmbed = await get(oEmbedUrl.href).catch(() => null);
if (!oEmbed) {
return null;
}
const body = (() => {
try {
return JSON.parse(oEmbed);
}
catch { }
})();
if (!body || body.version !== '1.0' || !['rich', 'video'].includes(body.type)) {
// Not a well formed rich oEmbed
return null;
}
if (!body.html.startsWith('<iframe ') || !body.html.endsWith('</iframe>')) {
// It includes something else than an iframe
return null;
}
const oEmbedHtml = cheerio.load(body.html);
const iframe = oEmbedHtml("iframe");
if (iframe.length !== 1) {
// Somehow we either have multiple iframes or none
return null;
}
if (iframe.parents().length !== 2) {
// Should only have the body and html elements as the parents
return null;
}
const url = iframe.attr('src');
if (!url) {
// No src?
return null;
}
try {
if ((new URL(url)).protocol !== 'https:') {
// Allow only HTTPS for best security
return null;
}
}
catch (e) {
return null;
}
// Height is the most important, width is okay to be null. The implementer
// should choose fixed height instead of fixed aspect ratio if width is null.
//
// For example, Spotify's embed page does not strictly follow aspect ratio
// and thus keeping the height is better than keeping the aspect ratio.
//
// Spotify gives `width: 100%, height: 152px` for iframe while `width: 456,
// height: 152` for oEmbed data, and we treat any percentages as null here.
let width = Number(iframe.attr('width') ?? body.width);
if (Number.isNaN(width)) {
width = null;
}
const height = Math.min(Number(iframe.attr('height') ?? body.height), 1024);
if (Number.isNaN(height)) {
// No proper height info
return null;
}
// TODO: This implementation only allows basic syntax of `allow`.
// Might need to implement better later.
const safeList = [
'autoplay',
'clipboard-write',
'fullscreen',
'encrypted-media',
'picture-in-picture',
'web-share',
];
// YouTube has these but they are almost never used.
const ignoredList = [
'gyroscope',
'accelerometer',
];
const allowedPermissions = (iframe.attr('allow') ?? '').split(/\s*;\s*/g)
.filter(s => s)
.filter(s => !ignoredList.includes(s));
if (iframe.attr('allowfullscreen') === '') {
allowedPermissions.push('fullscreen');
}
if (allowedPermissions.some(allow => !safeList.includes(allow))) {
// This iframe is probably too powerful to be embedded
return null;
}
return {
url,
width,
height,
allow: allowedPermissions
};
}
export default async (_url, lang = null) => {
if (lang && !lang.match(/^[\w-]+(\s*,\s*[\w-]+)*$/))
lang = null;
const url = typeof _url === 'string' ? new URL(_url) : _url;
const res = await scpaping(url.href, { lang: lang || undefined });
const $ = res.$;
const twitterCard = $('meta[property="twitter:card"]').attr('content');
@ -21,7 +136,7 @@ export default async (url, lang = null) => {
$('link[rel="image_src"]').attr('href') ||
$('link[rel="apple-touch-icon"]').attr('href') ||
$('link[rel="apple-touch-icon image_src"]').attr('href');
image = image ? URL.resolve(url.href, image) : null;
image = image ? (new URL(image, url.href)).href : null;
const playerUrl = (twitterCard !== 'summary_large_image' && $('meta[property="twitter:player"]').attr('content')) ||
(twitterCard !== 'summary_large_image' && $('meta[name="twitter:player"]').attr('content')) ||
$('meta[property="og:video"]').attr('content') ||
@ -44,39 +159,30 @@ export default async (url, lang = null) => {
if (title === description) {
description = null;
}
let siteName = $('meta[property="og:site_name"]').attr('content') ||
let siteName = decodeHtml($('meta[property="og:site_name"]').attr('content') ||
$('meta[name="application-name"]').attr('content') ||
url.hostname;
siteName = siteName ? decodeHtml(siteName) : null;
url.hostname);
const favicon = $('link[rel="shortcut icon"]').attr('href') ||
$('link[rel="icon"]').attr('href') ||
'/favicon.ico';
const sensitive = $('.tweet').attr('data-possibly-sensitive') === 'true';
const find = async (path) => {
const target = URL.resolve(url.href, path);
const target = new URL(path, url.href);
try {
await head(target);
await head(target.href);
return target;
}
catch (e) {
return null;
}
};
// Convert a relative URL (ex. test) to an absolute one (ex. /test)
const toAbsolute = (relativeURLString) => {
const relativeURL = URL.parse(relativeURLString);
const isAbsolute = relativeURL.slashes || relativeURL.path !== null && relativeURL.path[0] === '/';
// If it is already absolute, return the value immediately
if (isAbsolute) {
return relativeURLString;
}
// Prepend a slash and return
return '/' + relativeURLString;
const getIcon = async () => {
return (await find(favicon)) || null;
};
const icon = await find(favicon) ||
// Convert the relative path to an absolute one and retry
await find(toAbsolute(favicon)) ||
null;
const [icon, oEmbed] = await Promise.all([
getIcon(),
getOEmbedPlayer($, url.href),
]);
// Clean up the title
title = cleanupTitle(title, siteName);
if (title === '') {
@ -84,13 +190,14 @@ export default async (url, lang = null) => {
}
return {
title: title || null,
icon: icon || null,
icon: icon?.href || null,
description: description || null,
thumbnail: image || null,
player: {
player: oEmbed ?? {
url: playerUrl || null,
width: Number.isNaN(playerWidth) ? null : playerWidth,
height: Number.isNaN(playerHeight) ? null : playerHeight
height: Number.isNaN(playerHeight) ? null : playerHeight,
allow: ['autoplay', 'encrypted-media', 'fullscreen'],
},
sitename: siteName || null,
sensitive,

View File

@ -2,7 +2,7 @@
* summaly
* https://github.com/syuilo/summaly
*/
import * as URL from 'node:url';
import { URL } from 'node:url';
import tracer from 'trace-redirect';
import general from './general.js';
import { setAgent } from './utils/got.js';
@ -30,7 +30,7 @@ export const summaly = async (url, options) => {
actualUrl = url;
}
}
const _url = URL.parse(actualUrl, true);
const _url = new URL(actualUrl);
// Find matching plugin
const match = plugins.filter(plugin => plugin.test(_url))[0];
// Get summary

6
built/iplugin.d.ts vendored
View File

@ -1,7 +1,7 @@
/// <reference types="node" />
import * as URL from 'node:url';
import type { URL } from 'node:url';
import Summary from './summary.js';
export interface IPlugin {
test: (url: URL.Url) => boolean;
summarize: (url: URL.Url, lang?: string) => Promise<Summary>;
test: (url: URL) => boolean;
summarize: (url: URL, lang?: string) => Promise<Summary>;
}

View File

@ -1,5 +1,5 @@
/// <reference types="node" />
import * as URL from 'node:url';
import { URL } from 'node:url';
import summary from '../summary.js';
export declare function test(url: URL.Url): boolean;
export declare function summarize(url: URL.Url): Promise<summary>;
export declare function test(url: URL): boolean;
export declare function summarize(url: URL): Promise<summary>;

View File

@ -36,8 +36,9 @@ export async function summarize(url) {
player: {
url: playerUrl || null,
width: playerWidth ? parseInt(playerWidth) : null,
height: playerHeight ? parseInt(playerHeight) : null
height: playerHeight ? parseInt(playerHeight) : null,
allow: playerUrl ? ['fullscreen', 'encrypted-media'] : [],
},
sitename: 'Amazon'
sitename: 'Amazon',
};
}

View File

@ -1,5 +1,5 @@
/// <reference types="node" />
import * as URL from 'node:url';
import { URL } from 'node:url';
import summary from '../summary.js';
export declare function test(url: URL.Url): boolean;
export declare function summarize(url: URL.Url): Promise<summary>;
export declare function test(url: URL): boolean;
export declare function summarize(url: URL): Promise<summary>;

View File

@ -29,8 +29,9 @@ export async function summarize(url) {
player: {
url: null,
width: null,
height: null
height: null,
allow: [],
},
sitename: 'Wikipedia'
sitename: 'Wikipedia',
};
}

4
built/summary.d.ts vendored
View File

@ -42,4 +42,8 @@ export declare type Player = {
* The height of the player
*/
height: number | null;
/**
* The allowed permissions of the iframe
*/
allow: string[];
};

View File

@ -28,8 +28,8 @@ export async function scpaping(url, opts) {
},
typeFilter: /^(text\/html|application\/xhtml\+xml)/,
});
// For testing
const allowPrivateIp = process.env.SUMMALY_ALLOW_PRIVATE_IP === 'true';
// SUMMALY_ALLOW_PRIVATE_IP is for testing
const allowPrivateIp = process.env.SUMMALY_ALLOW_PRIVATE_IP === 'true' || Object.keys(agent).length > 0;
if (!allowPrivateIp && response.ip && PrivateIp(response.ip)) {
throw new StatusError(`Private IP rejected ${response.ip}`, 400, 'Private IP Rejected');
}
@ -84,14 +84,15 @@ async function getResponse(args) {
limit: 0,
},
});
return await receiveResponce({ req, typeFilter: args.typeFilter });
return await receiveResponse({ req, typeFilter: args.typeFilter });
}
async function receiveResponce(args) {
async function receiveResponse(args) {
const req = args.req;
const maxSize = MAX_RESPONSE_SIZE;
req.on('response', (res) => {
// Check html
if (args.typeFilter && !res.headers['content-type']?.match(args.typeFilter)) {
// console.warn(res.headers['content-type']);
req.cancel(`Rejected by type filter ${res.headers['content-type']}`);
return;
}

View File

@ -1,4 +1,5 @@
export declare class StatusError extends Error {
name: string;
statusCode: number;
statusMessage?: string;
isPermanentError: boolean;

1515
package-lock.json generated

File diff suppressed because it is too large

View File

@ -1,6 +1,6 @@
{
"name": "summaly",
"version": "3.0.1",
"version": "4.0.2",
"description": "Get web page's summary",
"author": "syuilo <syuilotan@yahoo.co.jp>",
"license": "MIT",
@ -9,6 +9,7 @@
"main": "./built/index.js",
"type": "module",
"types": "./built/index.d.ts",
"packageManager": "pnpm@8.3.1",
"files": [
"built",
"LICENSE"
@ -18,33 +19,27 @@
"test": "node --experimental-vm-modules node_modules/jest/bin/jest.js --silent=false --verbose false",
"serve": "fastify start ./built/index.js"
},
"optionalDependencies": {
"fastify": "3.24.1"
},
"devDependencies": {
"@jest/globals": "^29.4.2",
"@swc/core": "^1.3.35",
"@swc/jest": "^0.2.24",
"@jest/globals": "^29.5.0",
"@swc/core": "^1.3.52",
"@swc/jest": "^0.2.26",
"@types/cheerio": "0.22.18",
"@types/debug": "4.1.7",
"@types/escape-regexp": "^0.0.1",
"@types/html-entities": "1.3.4",
"@types/node": "16.11.12",
"debug": "^4.3.4",
"express": "^4.18.2",
"fastify": "^4.13.0",
"fastify": "^4.15.0",
"fastify-cli": "^5.7.1",
"jest": "^29.4.2",
"jest": "^29.5.0",
"typescript": "4.5.3"
},
"dependencies": {
"cheerio": "^1.0.0-rc.12",
"cheerio": "1.0.0-rc.12",
"escape-regexp": "0.0.1",
"got": "^12.5.3",
"got": "^12.6.0",
"html-entities": "2.3.2",
"iconv-lite": "0.6.3",
"jschardet": "3.0.0",
"koa": "2.13.4",
"private-ip": "2.3.3",
"trace-redirect": "1.0.6"
}

3351
pnpm-lock.yaml generated Normal file

File diff suppressed because it is too large

View File

@ -1,15 +1,141 @@
import * as URL from 'node:url';
import { URL } from 'node:url';
import clip from './utils/clip.js';
import cleanupTitle from './utils/cleanup-title.js';
import { decode as decodeHtml } from 'html-entities';
import { head, scpaping } from './utils/got.js';
import Summary from './summary.js';
import { get, head, scpaping } from './utils/got.js';
import type { default as Summary, Player } from './summary.js';
import * as cheerio from 'cheerio';
export default async (url: URL.Url, lang: string | null = null): Promise<Summary | null> => {
/**
* Contains only the html snippet for a sanitized iframe as the thumbnail is
* mostly covered in OpenGraph instead.
*
* Width should always be 100%.
*/
async function getOEmbedPlayer($: cheerio.CheerioAPI, pageUrl: string): Promise<Player | null> {
const href = $('link[type="application/json+oembed"]').attr('href');
if (!href) {
return null;
}
const oEmbedUrl = (() => {
try {
return new URL(href, pageUrl);
} catch { return null }
})();
if (!oEmbedUrl) {
return null;
}
const oEmbed = await get(oEmbedUrl.href).catch(() => null);
if (!oEmbed) {
return null;
}
const body = (() => {
try {
return JSON.parse(oEmbed);
} catch {}
})();
if (!body || body.version !== '1.0' || !['rich', 'video'].includes(body.type)) {
// Not a well formed rich oEmbed
return null;
}
if (!body.html.startsWith('<iframe ') || !body.html.endsWith('</iframe>')) {
// It includes something else than an iframe
return null;
}
const oEmbedHtml = cheerio.load(body.html);
const iframe = oEmbedHtml("iframe");
if (iframe.length !== 1) {
// Somehow we either have multiple iframes or none
return null;
}
if (iframe.parents().length !== 2) {
// Should only have the body and html elements as the parents
return null;
}
const url = iframe.attr('src');
if (!url) {
// No src?
return null;
}
try {
if ((new URL(url)).protocol !== 'https:') {
// Allow only HTTPS for best security
return null;
}
} catch (e) {
return null;
}
// Height is the most important, width is okay to be null. The implementer
// should choose fixed height instead of fixed aspect ratio if width is null.
//
// For example, Spotify's embed page does not strictly follow aspect ratio
// and thus keeping the height is better than keeping the aspect ratio.
//
// Spotify gives `width: 100%, height: 152px` for iframe while `width: 456,
// height: 152` for oEmbed data, and we treat any percentages as null here.
let width: number | null = Number(iframe.attr('width') ?? body.width);
if (Number.isNaN(width)) {
width = null;
}
const height = Math.min(Number(iframe.attr('height') ?? body.height), 1024);
if (Number.isNaN(height)) {
// No proper height info
return null;
}
// TODO: This implementation only allows basic syntax of `allow`.
// Might need to implement better later.
const safeList = [
'autoplay',
'clipboard-write',
'fullscreen',
'encrypted-media',
'picture-in-picture',
'web-share',
];
// YouTube has these but they are almost never used.
const ignoredList = [
'gyroscope',
'accelerometer',
];
const allowedPermissions =
(iframe.attr('allow') ?? '').split(/\s*;\s*/g)
.filter(s => s)
.filter(s => !ignoredList.includes(s));
if (iframe.attr('allowfullscreen') === '') {
allowedPermissions.push('fullscreen');
}
if (allowedPermissions.some(allow => !safeList.includes(allow))) {
// This iframe is probably too powerful to be embedded
return null;
}
return {
url,
width,
height,
allow: allowedPermissions
}
}
export default async (_url: URL | string, lang: string | null = null): Promise<Summary | null> => {
if (lang && !lang.match(/^[\w-]+(\s*,\s*[\w-]+)*$/)) lang = null;
const url = typeof _url === 'string' ? new URL(_url) : _url;
const res = await scpaping(url.href, { lang: lang || undefined });
const $ = res.$;
const twitterCard = $('meta[property="twitter:card"]').attr('content');
@ -32,7 +158,7 @@ export default async (url: URL.Url, lang: string | null = null): Promise<Summary
$('link[rel="apple-touch-icon"]').attr('href') ||
$('link[rel="apple-touch-icon image_src"]').attr('href');
image = image ? URL.resolve(url.href, image) : null;
image = image ? (new URL(image, url.href)).href : null;
const playerUrl =
(twitterCard !== 'summary_large_image' && $('meta[property="twitter:player"]').attr('content')) ||
@ -66,12 +192,11 @@ export default async (url: URL.Url, lang: string | null = null): Promise<Summary
description = null;
}
let siteName =
let siteName = decodeHtml(
$('meta[property="og:site_name"]').attr('content') ||
$('meta[name="application-name"]').attr('content') ||
url.hostname;
siteName = siteName ? decodeHtml(siteName) : null;
url.hostname
);
const favicon =
$('link[rel="shortcut icon"]').attr('href') ||
@ -81,33 +206,23 @@ export default async (url: URL.Url, lang: string | null = null): Promise<Summary
const sensitive = $('.tweet').attr('data-possibly-sensitive') === 'true'
const find = async (path: string) => {
const target = URL.resolve(url.href, path);
const target = new URL(path, url.href);
try {
await head(target);
await head(target.href);
return target;
} catch (e) {
return null;
}
};
// Convert a relative URL (ex. test) to an absolute one (ex. /test)
const toAbsolute = (relativeURLString: string): string => {
const relativeURL = URL.parse(relativeURLString);
const isAbsolute = relativeURL.slashes || relativeURL.path !== null && relativeURL.path[0] === '/';
const getIcon = async () => {
return (await find(favicon)) || null;
}
// If it is already absolute, return the value immediately
if (isAbsolute) {
return relativeURLString;
}
// Prepend a slash and return
return '/' + relativeURLString;
};
const icon = await find(favicon) ||
// Convert the relative path to an absolute one and retry
await find(toAbsolute(favicon)) ||
null;
const [icon, oEmbed] = await Promise.all([
getIcon(),
getOEmbedPlayer($, url.href),
])
// Clean up the title
title = cleanupTitle(title, siteName);
@ -118,13 +233,14 @@ export default async (url: URL.Url, lang: string | null = null): Promise<Summary
return {
title: title || null,
icon: icon || null,
icon: icon?.href || null,
description: description || null,
thumbnail: image || null,
player: {
player: oEmbed ?? {
url: playerUrl || null,
width: Number.isNaN(playerWidth) ? null : playerWidth,
height: Number.isNaN(playerHeight) ? null : playerHeight
height: Number.isNaN(playerHeight) ? null : playerHeight,
allow: ['autoplay', 'encrypted-media', 'fullscreen'],
},
sitename: siteName || null,
sensitive,

View File

@ -3,7 +3,7 @@
* https://github.com/syuilo/summaly
*/
import * as URL from 'node:url';
import { URL } from 'node:url';
import tracer from 'trace-redirect';
import Summary from './summary.js';
import type { IPlugin as _IPlugin } from './iplugin.js';
@ -69,7 +69,7 @@ export const summaly = async (url: string, options?: Options): Promise<Result> =
}
}
const _url = URL.parse(actualUrl, true);
const _url = new URL(actualUrl);
// Find matching plugin
const match = plugins.filter(plugin => plugin.test(_url))[0];

View File

@ -1,7 +1,7 @@
import * as URL from 'node:url';
import type { URL } from 'node:url';
import Summary from './summary.js';
export interface IPlugin {
test: (url: URL.Url) => boolean;
summarize: (url: URL.Url, lang?: string) => Promise<Summary>;
test: (url: URL) => boolean;
summarize: (url: URL, lang?: string) => Promise<Summary>;
}

View File

@ -1,8 +1,8 @@
import * as URL from 'node:url';
import { URL } from 'node:url';
import { scpaping } from '../utils/got.js';
import summary from '../summary.js';
export function test(url: URL.Url): boolean {
export function test(url: URL): boolean {
return url.hostname === 'www.amazon.com' ||
url.hostname === 'www.amazon.co.jp' ||
url.hostname === 'www.amazon.ca' ||
@ -19,7 +19,7 @@ export function test(url: URL.Url): boolean {
url.hostname === 'www.amazon.au';
}
export async function summarize(url: URL.Url): Promise<summary> {
export async function summarize(url: URL): Promise<summary> {
const res = await scpaping(url.href);
const $ = res.$;
@ -51,8 +51,9 @@ export async function summarize(url: URL.Url): Promise<summary> {
player: {
url: playerUrl || null,
width: playerWidth ? parseInt(playerWidth) : null,
height: playerHeight ? parseInt(playerHeight) : null
height: playerHeight ? parseInt(playerHeight) : null,
allow: playerUrl ? ['fullscreen', 'encrypted-media'] : [],
},
sitename: 'Amazon'
sitename: 'Amazon',
};
}

View File

@ -1,4 +1,4 @@
import * as URL from 'node:url';
import { URL } from 'node:url';
import { get } from '../utils/got.js';
import debug from 'debug';
import summary from '../summary.js';
@ -6,12 +6,12 @@ import clip from './../utils/clip.js';
const log = debug('summaly:plugins:wikipedia');
export function test(url: URL.Url): boolean {
export function test(url: URL): boolean {
if (!url.hostname) return false;
return /\.wikipedia\.org$/.test(url.hostname);
}
export async function summarize(url: URL.Url): Promise<summary> {
export async function summarize(url: URL): Promise<summary> {
const lang = url.host ? url.host.split('.')[0] : null;
const title = url.pathname ? url.pathname.split('/')[2] : null;
const endpoint = `https://${lang}.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=${title}`;
@ -38,8 +38,9 @@ export async function summarize(url: URL.Url): Promise<summary> {
player: {
url: null,
width: null,
height: null
height: null,
allow: [],
},
sitename: 'Wikipedia'
sitename: 'Wikipedia',
};
}

View File

@ -52,4 +52,9 @@ export type Player = {
* The height of the player
*/
height: number | null;
/**
* The allowed permissions of the iframe
*/
allow: string[];
};

View File

@ -42,8 +42,8 @@ export async function scpaping(url: string, opts?: { lang?: string; }) {
typeFilter: /^(text\/html|application\/xhtml\+xml)/,
});
// For testing
const allowPrivateIp = process.env.SUMMALY_ALLOW_PRIVATE_IP === 'true';
// SUMMALY_ALLOW_PRIVATE_IP is for testing
const allowPrivateIp = process.env.SUMMALY_ALLOW_PRIVATE_IP === 'true' || Object.keys(agent).length > 0;
if (!allowPrivateIp && response.ip && PrivateIp(response.ip)) {
throw new StatusError(`Private IP rejected ${response.ip}`, 400, 'Private IP Rejected');
@ -108,16 +108,17 @@ async function getResponse(args: GotOptions) {
},
});
return await receiveResponce({ req, typeFilter: args.typeFilter });
return await receiveResponse({ req, typeFilter: args.typeFilter });
}
async function receiveResponce<T>(args: { req: Got.CancelableRequest<Got.Response<T>>, typeFilter?: RegExp }) {
async function receiveResponse<T>(args: { req: Got.CancelableRequest<Got.Response<T>>, typeFilter?: RegExp }) {
const req = args.req;
const maxSize = MAX_RESPONSE_SIZE;
req.on('response', (res: Got.Response) => {
// Check html
if (args.typeFilter && !res.headers['content-type']?.match(args.typeFilter)) {
// console.warn(res.headers['content-type']);
req.cancel(`Rejected by type filter ${res.headers['content-type']}`);
return;
}

View File

@ -1,4 +1,5 @@
export class StatusError extends Error {
public name: string;
public statusCode: number;
public statusMessage?: string;
public isPermanentError: boolean;

View File

@ -0,0 +1,3 @@
<!DOCTYPE html>
<meta property="og:video:url" content="https://example.com/embedurl" />
<link type="application/json+oembed" href="http://localhost:3060/oembed.json" />

View File

@ -0,0 +1,3 @@
<!DOCTYPE html>
<meta property="og:description" content="blobcats rule the world">
<link type="application/json+oembed" href="http://localhost:3060/oembed.json" />

View File

@ -0,0 +1,3 @@
<!DOCTYPE html>
<link type="application/json+oembed" href="http://localhost:3060/oembe.json" />
<meta property="og:description" content="nonexistent">

View File

@ -0,0 +1,2 @@
<!DOCTYPE html>
<link type="application/json+oembed" href="oembed.json" />

View File

@ -0,0 +1,3 @@
<!DOCTYPE html>
<link type="application/json+oembed" href="http://localhost:+3060/oembed.json" />
<meta property="og:description" content="wrong url">

2
test/htmls/oembed.html Normal file
View File

@ -0,0 +1,2 @@
<!DOCTYPE html>
<link type="application/json+oembed" href="http://localhost:3060/oembed.json" />

View File

@ -6,13 +6,16 @@
/* dependencies below */
import fs from 'node:fs';
import fs, { readdirSync } from 'node:fs';
import process from 'node:process';
import fastify from 'fastify';
import { summaly } from '../src/index.js';
import { dirname } from 'node:path';
import { fileURLToPath } from 'node:url';
import {expect, jest, test, describe, beforeEach, afterEach} from '@jest/globals';
import { expect, jest, test, describe, beforeEach, afterEach } from '@jest/globals';
import { Agent as httpAgent } from 'node:http';
import { Agent as httpsAgent } from 'node:https';
import { StatusError } from '../src/utils/status-error.js';
const _filename = fileURLToPath(import.meta.url);
const _dirname = dirname(_filename);
@ -31,10 +34,14 @@ const host = `http://localhost:${port}`;
// Display detail of unhandled promise rejection
process.on('unhandledRejection', console.dir);
let app: ReturnType<typeof fastify>;
let app: ReturnType<typeof fastify> | null = null;
let n = 0;
afterEach(() => {
if (app) return app.close();
afterEach(async () => {
if (app) {
await app.close();
app = null;
}
});
/* tests below */
@ -66,7 +73,7 @@ test('faviconがHTML上で指定されていなくて、ルートにも存在し
test('title is cleaned up', async () => {
app = fastify();
app.get('/', (request, reply) => {
return reply.send(fs.createReadStream(_dirname + '/htmls/ditry-title.html'));
return reply.send(fs.createReadStream(_dirname + '/htmls/dirty-title.html'));
});
await app.listen({ port });
@ -77,15 +84,39 @@ test('title is cleaned up', async () => {
describe('Private IP blocking', () => {
beforeEach(() => {
process.env.SUMMALY_ALLOW_PRIVATE_IP = 'false';
app = fastify();
app.get('*', (request, reply) => {
return reply.send(fs.createReadStream(_dirname + '/htmls/og-title.html'));
});
return app.listen({ port });
});
test('private ipなサーバーの情報を取得できない', async () => { // "cannot fetch information from a server on a private IP"
app = fastify();
app.get('/', (request, reply) => {
return reply.send(fs.createReadStream(_dirname + '/htmls/og-title.html'));
const summary = await summaly(host).catch((e: StatusError) => e);
if (summary instanceof StatusError) {
expect(summary.name).toBe('StatusError');
} else {
expect(summary).toBeInstanceOf(StatusError);
}
});
test('agentが指定されている場合はprivate ipを許可', async () => { // "private IPs are allowed when an agent is specified"
const summary = await summaly(host, {
agent: {
http: new httpAgent({ keepAlive: true }),
https: new httpsAgent({ keepAlive: true }),
}
});
await app.listen({ port });
expect(() => summaly(host)).rejects.toMatch('Private IP rejected 127.0.0.1');
expect(summary.title).toBe('Strawberry Pasta');
});
test('agentが空のオブジェクトの場合はprivate ipを許可しない', async () => { // "private IPs are not allowed when agent is an empty object"
const summary = await summaly(host, { agent: {} }).catch((e: StatusError) => e);
if (summary instanceof StatusError) {
expect(summary.name).toBe('StatusError');
} else {
expect(summary).toBeInstanceOf(StatusError);
}
});
afterEach(() => {
@ -96,7 +127,7 @@ describe('Private IP blocking', () => {
describe('OGP', () => {
test('title', async () => {
app = fastify();
app.get('/', (request, reply) => {
app.get('*', (request, reply) => {
return reply.send(fs.createReadStream(_dirname + '/htmls/og-title.html'));
});
await app.listen({ port });
@ -182,6 +213,7 @@ describe('TwitterCard', () => {
const summary = await summaly(host);
expect(summary.player.url).toBe('https://example.com/embedurl');
expect(summary.player.allow).toStrictEqual(['autoplay', 'encrypted-media', 'fullscreen']);
});
test('Player detection - Pleroma:video => video', async () => {
@ -193,6 +225,7 @@ describe('TwitterCard', () => {
const summary = await summaly(host);
expect(summary.player.url).toBe('https://example.com/embedurl');
expect(summary.player.allow).toStrictEqual(['autoplay', 'encrypted-media', 'fullscreen']);
});
test('Player detection - Pleroma:image => image', async () => {
@ -206,3 +239,133 @@ describe('TwitterCard', () => {
expect(summary.thumbnail).toBe('https://example.com/imageurl');
});
});
describe("oEmbed", () => {
const setUpFastify = async (oEmbedPath: string, htmlPath = 'htmls/oembed.html') => {
app = fastify();
app.get('/', (request, reply) => {
return reply.send(fs.createReadStream(new URL(htmlPath, import.meta.url)));
});
app.get('/oembed.json', (request, reply) => {
return reply.send(fs.createReadStream(
new URL(oEmbedPath, new URL('oembed/', import.meta.url))
));
});
await app.listen({ port });
}
for (const filename of readdirSync(new URL('oembed/invalid', import.meta.url))) {
test(`Invalidity test: ${filename}`, async () => {
await setUpFastify(`invalid/${filename}`);
const summary = await summaly(host);
expect(summary.player.url).toBe(null);
});
}
test('basic properties', async () => {
await setUpFastify('oembed.json');
const summary = await summaly(host);
expect(summary.player.url).toBe('https://example.com/');
expect(summary.player.width).toBe(500);
expect(summary.player.height).toBe(300);
});
test('type: video', async () => {
await setUpFastify('oembed-video.json');
const summary = await summaly(host);
expect(summary.player.url).toBe('https://example.com/');
expect(summary.player.width).toBe(500);
expect(summary.player.height).toBe(300);
});
test('max height', async () => {
await setUpFastify('oembed-too-tall.json');
const summary = await summaly(host);
expect(summary.player.height).toBe(1024);
});
test('children are ignored', async () => {
await setUpFastify('oembed-iframe-child.json');
const summary = await summaly(host);
expect(summary.player.url).toBe('https://example.com/');
});
test('allows fullscreen', async () => {
await setUpFastify('oembed-allow-fullscreen.json');
const summary = await summaly(host);
expect(summary.player.url).toBe('https://example.com/');
expect(summary.player.allow).toStrictEqual(['fullscreen']);
});
test('allows legacy allowfullscreen', async () => {
await setUpFastify('oembed-allow-fullscreen-legacy.json');
const summary = await summaly(host);
expect(summary.player.url).toBe('https://example.com/');
expect(summary.player.allow).toStrictEqual(['fullscreen']);
});
test('allows safelisted permissions', async () => {
await setUpFastify('oembed-allow-safelisted-permissions.json');
const summary = await summaly(host);
expect(summary.player.url).toBe('https://example.com/');
expect(summary.player.allow).toStrictEqual([
'autoplay', 'clipboard-write', 'fullscreen',
'encrypted-media', 'picture-in-picture', 'web-share',
]);
});
test('ignores rare permissions', async () => {
await setUpFastify('oembed-ignore-rare-permissions.json');
const summary = await summaly(host);
expect(summary.player.url).toBe('https://example.com/');
expect(summary.player.allow).toStrictEqual(['autoplay']);
});
test('oEmbed with relative path', async () => {
await setUpFastify('oembed.json', 'htmls/oembed-relative.html');
const summary = await summaly(host);
expect(summary.player.url).toBe('https://example.com/');
});
test('oEmbed with nonexistent path', async () => {
await setUpFastify('oembed.json', 'htmls/oembed-nonexistent-path.html');
const summary = await summaly(host);
expect(summary.player.url).toBe(null);
expect(summary.description).toBe('nonexistent');
});
test('oEmbed with wrong path', async () => {
await setUpFastify('oembed.json', 'htmls/oembed-wrong-path.html');
const summary = await summaly(host);
expect(summary.player.url).toBe(null);
expect(summary.description).toBe('wrong url');
});
test('oEmbed with OpenGraph', async () => {
await setUpFastify('oembed.json', 'htmls/oembed-and-og.html');
const summary = await summaly(host);
expect(summary.player.url).toBe('https://example.com/');
expect(summary.description).toBe('blobcats rule the world');
});
test('Invalid oEmbed with valid OpenGraph', async () => {
await setUpFastify('invalid/oembed-insecure.json', 'htmls/oembed-and-og.html');
const summary = await summaly(host);
expect(summary.player.url).toBe(null);
expect(summary.description).toBe('blobcats rule the world');
});
test('oEmbed with og:video', async () => {
await setUpFastify('oembed.json', 'htmls/oembed-and-og-video.html');
const summary = await summaly(host);
expect(summary.player.url).toBe('https://example.com/');
expect(summary.player.allow).toStrictEqual([]);
});
test('width: 100%', async () => {
await setUpFastify('oembed-percentage-width.json');
const summary = await summaly(host);
expect(summary.player.width).toBe(null);
expect(summary.player.height).toBe(300);
});
});
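
The `allow` expectations in the oEmbed tests above suggest a three-way split: safelisted permissions are kept in order, a few rare ones are dropped silently, and a legacy `allowfullscreen` attribute maps to `fullscreen`. The sketch below is one way that behaviour could be written. The function name `filterAllow`, both sets, and the rule that any other permission (for example `camera`) rejects the embed are assumptions inferred from the fixtures, not taken from the summaly source, which is not shown in this part of the diff.

```typescript
// Hypothetical sketch: lists are inferred from the test expectations above.
const SAFELISTED_PERMISSIONS = new Set([
  'autoplay', 'clipboard-write', 'fullscreen',
  'encrypted-media', 'picture-in-picture', 'web-share',
]);

// Permissions the 'ignores rare permissions' test expects to be dropped.
const IGNORED_PERMISSIONS = new Set(['gyroscope', 'accelerometer']);

/**
 * Returns the permissions to keep, or null when the embed should be rejected.
 * `legacyAllowFullscreen` stands for the presence of the `allowfullscreen` attribute.
 */
function filterAllow(allowAttr: string | undefined, legacyAllowFullscreen = false): string[] | null {
  const requested = (allowAttr ?? '')
    .split(';')
    .map((permission) => permission.trim())
    .filter((permission) => permission.length > 0);

  if (legacyAllowFullscreen && !requested.includes('fullscreen')) {
    requested.push('fullscreen');
  }

  const kept: string[] = [];
  for (const permission of requested) {
    if (IGNORED_PERMISSIONS.has(permission)) continue;
    // Assumption: anything outside both lists invalidates the whole embed.
    if (!SAFELISTED_PERMISSIONS.has(permission)) return null;
    kept.push(permission);
  }
  return kept;
}
```

The sketch only handles bare permission names separated by `;`. A value that carries an origin list, such as `fullscreen 'self'`, would be rejected here and would need extra parsing.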

@ -0,0 +1,7 @@
{
"version": "1.0",
"type": "rich",
"html": "<div><iframe src='https://example.com/'></iframe>",
"width": 500,
"height": 300
}

@ -0,0 +1,7 @@
{
"version": "1.0",
"type": "rich",
"html": "<iframe src='https://example.com/'></iframe><iframe src='https://example.com/'></iframe>",
"width": 500,
"height": 300
}

@ -0,0 +1,7 @@
{
"version": "11.0",
"type": "rich",
"html": "<iframe src='https://example.com/'></iframe>",
"width": 500,
"height": 300
}

@ -0,0 +1,7 @@
{
"version": "1.0",
"type": "rich",
"html": "<iframe src='http://example.com/'></iframe>",
"width": 500,
"height": 300
}

@ -0,0 +1,7 @@
{
"version": "1.0",
"type": "rich",
"html": "<iframe src='https://example.com/'></iframe>",
"width": 500,
"height": "blobcat"
}

@ -0,0 +1,6 @@
{
"version": "1.0",
"type": "rich",
"html": "<iframe src='https://example.com/'></iframe>",
"width": 500
}

@ -0,0 +1,6 @@
{
"type": "rich",
"html": "<iframe src='https://example.com/'></iframe>",
"width": 500,
"height": 300
}

@ -0,0 +1,7 @@
{
"version": "0.1",
"type": "rich",
"html": "<iframe src='https://example.com/'></iframe>",
"width": 500,
"height": 300
}

@ -0,0 +1,7 @@
{
"version": "1.0",
"type": "photo",
"url": "https://example.com/example.avif",
"width": 300,
"height": 300
}
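
The fixtures up to this point vary only `version` (`"11.0"`, `"0.1"`, missing) and `type` (`"photo"`), while the passing tests use `"1.0"` with `rich` or `video`. A minimal sketch of the check this implies follows. The name `isSupportedOEmbed` is hypothetical, and the assumption that these fixtures live under `oembed/invalid` comes from their position in the diff, since this view does not show their file names.

```typescript
// Hypothetical sketch: accepts only oEmbed 1.0 responses of type rich or video.
function isSupportedOEmbed(body: unknown): boolean {
  if (typeof body !== 'object' || body === null) return false;
  const { version, type } = body as { version?: unknown; type?: unknown };
  // The version must be exactly the string "1.0"; "11.0", "0.1", and a missing value fail.
  if (version !== '1.0') return false;
  // Assumption: photo (and link) responses are not turned into a player.
  return type === 'rich' || type === 'video';
}
```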

@ -0,0 +1,7 @@
{
"version": "1.0",
"type": "rich",
"html": "<iframe src='https://example.com/' allow='camera'></iframe>",
"width": 500,
"height": 300
}

@ -0,0 +1,7 @@
{
"version": "1.0",
"type": "rich",
"html": "<iframe src='https://example.com/' allow='fullscreen;camera'></iframe>",
"width": 500,
"height": 300
}

@ -0,0 +1,7 @@
{
"version": "1.0",
"type": "rich",
"html": "<iframe src='https://example.com/' allowfullscreen></iframe>",
"width": 500,
"height": 300
}

@ -0,0 +1,7 @@
{
"version": "1.0",
"type": "rich",
"html": "<iframe src='https://example.com/' allow='fullscreen'></iframe>",
"width": 500,
"height": 300
}

@ -0,0 +1,7 @@
{
"version": "1.0",
"type": "rich",
"html": "<iframe src='https://example.com/' allow='autoplay;clipboard-write;fullscreen;encrypted-media;picture-in-picture;web-share'></iframe>",
"width": 500,
"height": 300
}

@ -0,0 +1,7 @@
{
"version": "1.0",
"type": "rich",
"html": "<iframe src='https://example.com/'><script>alert('Hahaha I take this world')</script></iframe>",
"width": 500,
"height": 300
}

@ -0,0 +1,7 @@
{
"version": "1.0",
"type": "rich",
"html": "<iframe src='https://example.com/' allow='autoplay;gyroscope;accelerometer'></iframe>",
"width": 500,
"height": 300
}

@ -0,0 +1,7 @@
{
"version": "1.0",
"type": "rich",
"html": "<iframe src='https://example.com/'></iframe>",
"width": "100%",
"height": 300
}

@ -0,0 +1,6 @@
{
"version": "1.0",
"type": "rich",
"html": "<iframe src='https://example.com/'></iframe>",
"height": 3000
}
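
The 'max height' and 'width: 100%' tests expect a height of 3000 to come back as 1024 and a non-numeric width to come back as `null`. The sketch below shows one way to normalise the two values. `normalizeSize` and `MAX_PLAYER_HEIGHT` are hypothetical names, and treating a non-numeric or missing height as invalid is an assumption based on the `"blobcat"` and height-less fixtures, which appear to belong to the invalid set.

```typescript
// Taken from the 'max height' test expectation (3000 is clamped to 1024).
const MAX_PLAYER_HEIGHT = 1024;

interface PlayerSize {
  width: number | null;
  height: number;
}

// Hypothetical sketch: returns null when the embed should be rejected.
function normalizeSize(width: unknown, height: unknown): PlayerSize | null {
  // Assumption: a height that is not a finite number invalidates the embed.
  if (typeof height !== 'number' || !Number.isFinite(height)) return null;
  // A width such as "100%" or a missing width is kept as null rather than rejected.
  const safeWidth = typeof width === 'number' && Number.isFinite(width) ? width : null;
  return { width: safeWidth, height: Math.min(height, MAX_PLAYER_HEIGHT) };
}
```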

@ -0,0 +1,7 @@
{
"version": "1.0",
"type": "video",
"html": "<iframe src='https://example.com/'></iframe>",
"width": 500,
"height": 300
}

7
test/oembed/oembed.json Normal file
@ -0,0 +1,7 @@
{
"version": "1.0",
"type": "rich",
"html": "<iframe src='https://example.com/'></iframe>",
"width": 500,
"height": 300
}