Mask R-CNN: Automatically create GIF images of target objects from video

In the previous article, we introduced a method of automatically detecting and photographing birds using deep learning combined with a camera. Today, we use another novel deep learning model, Mask R-CNN, to automatically generate GIF images of the target object from the video.

Mask R-CNN has many applications, and Zhijun has previously covered Facebook's use of this model to build a full-body AR project. In this project, however, the author Kirk Kaiser uses the MatterPort version. It supports Python 3, comes with good sample code, and is perhaps the easiest version to install.

The first step is choosing the right input.


When I started building the GIF auto-generator, I chose to do the dumbest thing that could possibly work first, an approach that serves creative coding projects well.

First, the input video should contain only one person; don't try to track multiple people. This makes it easier to isolate the target from other objects and to judge how well the model's mask covers it. If the target disappears, we can also see how noisy the model's detected boundary is.

I chose the video I shot in the backyard.

Processing video in Python


Although there are other ways to process video in Python, I prefer to convert the video to a sequence of images and then convert it back using ffmpeg.

Using the following command, we can get a series of images from the input video. Depending on the source, it may be anywhere between 24 and 60 frames per second. Keep track of the source frame rate so the output stays in sync after converting back.

$ ffmpeg -i FILENAME.mp4 -qscale:v 2 %05d.jpg

This creates an image sequence with 5-digit, zero-padded filenames. If the input video produces more frames than 5 digits can hold, change %05d to %09d.

The number of images equals the video's length in seconds multiplied by the frame rate, so a three-second video at 24 frames per second yields 72 frames.
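If you are not sure of the source frame rate, one common way to check it is with ffprobe, which ships alongside ffmpeg (the filename here is a placeholder, as above):

$ ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate -of default=noprint_wrappers=1:nokey=1 FILENAME.mp4

This prints the frame rate as a fraction such as 60/1 or 30000/1001, which you can note down for the reassembly step later.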

This gives us a series of still images that we can feed into our Mask R-CNN code for static images.

Once the images have been processed, they can be assembled back into a video with the following command:

$ ffmpeg -r 60 -f image2 -i %05d.jpg OUTPUT.mp4

The -r parameter specifies the frame rate used to build the output video. Lower the value to slow the video down, or raise it to speed it up.
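For example, if the source was recorded at 60 frames per second, rebuilding the sequence at 30 produces a half-speed clip (the filenames are placeholders, as above):

$ ffmpeg -r 30 -f image2 -i %05d.jpg SLOW_OUTPUT.mp4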

First, though, let's use Mask R-CNN to detect and process the images.

Detect and mark images


The Matterport version of Mask R-CNN comes with Jupyter notebooks that help you understand how Mask R-CNN works.

Once you have the repo set up locally, I recommend running the demo notebook and evaluating how well the image detection works.

Usually the mask is an unsigned 8-bit image with the same shape as the input image: 0 (black) where no target object is detected and 255 (white) where one is.
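As a minimal sketch, assuming r is one result dictionary returned by model.detect() as in the full script below and b is the index of a detected instance, converting the model's per-instance mask into such a 0/255 image might look like this:

import numpy as np

mask = r['masks'][:, :, b]              # one instance mask, same height/width as the frame
mask_255 = mask.astype(np.uint8) * 255  # 0 (black) for background, 255 (white) for the object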

To make use of the mask, we copy it into another image as an alpha channel. With code like the following, we can cut a single person out of each video frame and produce an image with a transparent background:

import numpy as np
import os
import coco
import model as modellib
import glob
import imageio
import cv2

# Root directory of the project
ROOT_DIR = os.getcwd()

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Path to trained weights file
# Download this file and place it in the root of your
# project (see the README file for details)
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")

class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()

# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)

class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
               'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
               'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
               'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
               'kite', 'baseball bat', 'baseball glove', 'skateboard',
               'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
               'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
               'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
               'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
               'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
               'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
               'teddy bear', 'hair drier', 'toothbrush']

numFiles = len(glob.glob('extractGif/*.jpg'))
counter = 0

for i in range(1, numFiles):
    filename = 'extractGif/%05d.jpg' % i
    print("doing frame %s" % filename)
    frame = cv2.imread(filename)

    # Run detection on the single frame
    results = model.detect([frame], verbose=0)
    r = results[0]

    masky = np.zeros((frame.shape[0], frame.shape[1]), dtype='uint8')
    humans = []

    if r['rois'].shape[0] >= 1:
        for b in range(r['rois'].shape[0]):
            # Only keep detections of the 'person' class
            if r['class_ids'][b] == class_names.index('person'):
                masky += r['masks'][:, :, b] * 255
                humansM = r['masks'][:, :, b] * 255
                y1, x1, y2, x2 = r['rois'][b]
                # Crop the person out of the frame and use the mask as the alpha channel
                humansCut = frame[y1:y2, x1:x2]
                humansCut = cv2.cvtColor(humansCut.astype(np.uint8), cv2.COLOR_BGR2RGBA)
                humansCut[:, :, 3] = humansM[y1:y2, x1:x2]
                humans.append(humansCut)

    if len(humans) >= 1:
        counter += 1
        for j, human in enumerate(humans):
            # One output directory per detected person (giffer0, giffer1, ...)
            fileout = 'giffer%i/%05d.png' % (j, counter)
            if not os.path.exists('giffer%i' % j):
                os.makedirs('giffer%i' % j)
            print(fileout)
            #frame = cv2.cvtColor(frame.astype('uint8'), cv2.COLOR_BGRA2BGR)
            imageio.imwrite(fileout, human)

The class_names variable is important: it lists the COCO dataset categories that the model can separate out. To extract a different category, simply replace 'person' in the line if r['class_ids'][b] == class_names.index('person') with any other name from the list, and you will get a masked version of that object from the video instead.
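For instance, a hypothetical variation (not in the original script) that keeps both the person and the skateboard, two of the COCO categories listed above, could check against several indices at once:

# Hypothetical: accumulate masks for more than one target class
target_ids = [class_names.index('person'), class_names.index('skateboard')]
for b in range(r['rois'].shape[0]):
    if r['class_ids'][b] in target_ids:
        masky += (r['masks'][:, :, b] * 255).astype('uint8')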

Convert images to GIF

Now that we have a set of transparent images, we can open them and see how they look. My results are not perfect; the cut-out figures are a bit rough around the edges, but they are still fun.

Before turning the images into an animation, we need to find the maximum width and height in the image sequence and paste each frame onto a canvas of that size, producing a new image sequence:

import glob
from PIL import Image

maxW = 0
maxH = 0

DIRECTORY = 'wave-input'
numFiles = len(glob.glob(DIRECTORY + '/*.png'))

# First pass: find the largest width and height in the sequence
for num in range(numFiles - 1):
    im = Image.open(DIRECTORY + '/%05d.png' % (num + 1))
    if im.width > maxW:
        maxW = im.width
    if im.height > maxH:
        maxH = im.height

# Second pass: paste each image onto a canvas of that maximum size
for num in range(numFiles - 1):
    each_image = Image.new("RGBA", (maxW, maxH))
    im = Image.open(DIRECTORY + '/%05d.png' % (num + 1))
    each_image.paste(im, (0, 0))
    each_image.save('gifready/%05d.png' % num)

The code above opens the directory of extracted images (giffer0 in our case) and iterates through all of them to find the largest width and height. It then pastes each image onto a canvas of that size in a new directory (gifready), from which we can generate the GIF.

Here, we use ImageMagick to generate the GIF:

$ convert -dispose Background *.png outty.gif
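If you would rather stay in Python, a minimal sketch using imageio (already used in the extraction script) could assemble the same frames into a GIF; note that depending on your imageio version, the GIF writer may expect a duration argument instead of fps:

import glob
import imageio

# Read the padded frames in order and write them out as an animated GIF
frames = [imageio.imread(f) for f in sorted(glob.glob('gifready/*.png'))]
imageio.mimsave('outty.gif', frames, fps=24)  # older imageio; newer versions use duration=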

But just generating animations from the video isn't that exciting on its own. Let's keep mixing things and see what happens...

Using the generated GIFs in Pygame or video

Recently, I used these extracted images directly in Pygame. I did not convert them to gifs, but kept the original PNG format.

I created a small creative coding environment with a setup and a draw function that take an image sequence as input. With this setup, I can rotate, scale, or otherwise distort the images. Here is the code:

import pygame
import random
import time
import math
import os
import glob

imageseq = []

def setup(screen, etc):
    global imageseq
    numIn = len(glob.glob('wave-input/*.png'))
    # Load every extracted PNG, keeping its alpha channel
    for i in range(numIn):
        if os.path.exists('wave-input/%05d.png' % (i + 1)):
            imagey = pygame.image.load('wave-input/%05d.png' % (i + 1)).convert_alpha()
            imageseq.append(imagey)

counter = 1

def draw(screen, etc):
    # our current animation loop frame
    global counter
    current0 = imageseq[counter % len(imageseq)]
    counter += 1
    # Blit copies of the current frame across the screen
    for i in range(0, 1920, 200):
        screen.blit(current0, (i, 1080 // 2 - 230))

It loads the images from the directory into Pygame surfaces with an alpha channel. Each surface is appended to a list, and in the draw loop we can blit (draw) the current image onto the screen.

This is just a basic setup, see the code above for more advanced operations, or check out other interesting experiments on my GitHub.
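The setup and draw functions above are normally called by the author's environment. Outside of it, a minimal plain-Pygame driver loop might look like the following sketch; the window size, frame rate, and the None passed in place of the etc argument are assumptions:

import pygame

# Hypothetical driver for the setup()/draw() pair above
pygame.init()
screen = pygame.display.set_mode((1920, 1080))
clock = pygame.time.Clock()

setup(screen, None)
running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
    screen.fill((0, 0, 0))
    draw(screen, None)
    pygame.display.flip()
    clock.tick(30)  # roughly match the source frame rate
pygame.quit()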

Modify the input video with the extracted Mask

To produce the effect shown above, I recorded the first n frames of the video, along with the positions of the person and the skateboard.

Then I stacked the previously extracted figures one on top of another, finally pasting the most recent frame on top.

Beyond that, I tried mixing my masks with other videos. Here is one example of such a composite:

Here, the environment is removed from every frame of the skater using the mask, and the cut-out skater is overlaid on top of the other video. This code can also be found on my GitHub.
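As a rough illustration of this kind of compositing, a minimal sketch with PIL might look like the following. The directory names and the fixed paste position are assumptions: the extraction script above saves only the cropped cut-outs without their original offsets, so a real composite would also need to store and reuse the bounding-box coordinates.

import os
import glob
from PIL import Image

# Hypothetical: paste the extracted RGBA figures onto the frames of another video,
# assuming both were exported as numbered image sequences as described above
figure_files = sorted(glob.glob('giffer0/*.png'))
background_files = sorted(glob.glob('other-video/*.jpg'))
os.makedirs('composited', exist_ok=True)

for i, (fig_path, bg_path) in enumerate(zip(figure_files, background_files)):
    background = Image.open(bg_path).convert('RGBA')
    figure = Image.open(fig_path).convert('RGBA')
    # The figure's alpha channel (the Mask R-CNN mask) ensures only the person is pasted
    background.paste(figure, (0, 0), figure)
    background.convert('RGB').save('composited/%05d.jpg' % (i + 1))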

Conclusion

This project is just a first step in combining deep learning with art. There is also a model called OpenPose that estimates a person's pose and movements quite robustly; I plan to incorporate OpenPose into future projects to create more interesting work.
